Configuration

steep.yaml

The file steep.yaml contains the main configuration of Steep. This page describes all configuration keys and values you can set.

Note that keys are specified using the dot notation. You can use them as they are given here or use YAML notation instead. For example, the following configuration item

steep.cluster.eventBus.publicPort: 41187

is identical to:

steep:
  cluster:
    eventBus:
      publicPort: 41187

You may override items in your configuration file with environment variables. This is particularly useful if you are using Steep inside a Docker container. The environment variables use a slightly different naming scheme. All variables are in capital letters and dots are replaced by underscores. For example, the configuration key steep.http.host becomes STEEP_HTTP_HOST and steep.cluster.eventBus.publicPort becomes STEEP_CLUSTER_EVENTBUS_PUBLICPORT. You may use YAML syntax to specify environment variable values. For example, the array steep.agent.capabilities can be specified as follows:

STEEP_AGENT_CAPABILITIES='["docker", "python"]'
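
As a sketch of how such overrides might look in a container setup, the following Docker Compose fragment sets the keys mentioned above through environment variables. The service name, the image name steep/steep, and all values are assumptions chosen for illustration; adapt them to your deployment.

```yaml
# docker-compose.yaml (hypothetical file; image name and values are assumptions)
services:
  steep:
    image: steep/steep
    environment:
      # overrides steep.http.host
      STEEP_HTTP_HOST: "0.0.0.0"
      # overrides steep.cluster.eventBus.publicPort
      STEEP_CLUSTER_EVENTBUS_PUBLICPORT: "41187"
      # YAML syntax inside the value: an array for steep.agent.capabilities
      STEEP_AGENT_CAPABILITIES: '["docker", "python"]'
```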

General configuration

steep.tmpPath

The path to a directory where temporary files should be stored during processing. Steep generates names for the outputs of execute actions in a workflow. If the store flag of an output parameter is false (which is the default), the generated filename will be relative to this temporary directory.

steep.outPath

The path to a directory where output files should be stored. This path will be used instead of steep.tmpPath to generate a filename for an output parameter if its store flag is true.

steep.overrideConfigFile

The path to a file that keeps additional configuration. The values of the overrideConfigFile will be merged into the main configuration file, so it basically overrides the default values. Note that configuration items in this file can still be overridden with environment variables. This configuration item is useful if you don’t want to change the main configuration file (or if you cannot do so) but still want to set different configuration values. Use it if you run Steep in a Docker container and bind mount the overrideConfigFile as a volume.

steep.services

The path to the configuration files containing service metadata. Either a string pointing to a single file, a glob pattern (e.g. **/*.yaml), or an array of files or glob patterns.

steep.macros

The path to the configuration file(s) containing macros. Either a string pointing to a single file, a glob pattern (e.g. **/*.yaml), or an array of files or glob patterns.

steep.plugins

The path to the configuration file(s) containing plugin descriptors. Either a string pointing to a single file, a glob pattern (e.g. **/*.yaml), or an array of files or glob patterns.
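
To illustrate the three accepted forms (single file, glob pattern, array), a general configuration might look roughly like this. All directory and file names are hypothetical placeholders, not defaults.

```yaml
steep:
  # hypothetical directories for temporary and stored outputs
  tmpPath: /tmp/steep/tmp
  outPath: /tmp/steep/out
  # hypothetical file whose values are merged over this configuration
  overrideConfigFile: conf/steep-override.yaml
  # a glob pattern
  services: conf/services/*.yaml
  # a single file
  macros: conf/macros.yaml
  # an array mixing a file and a glob pattern
  plugins:
    - conf/plugins/common.yaml
    - conf/plugins/**/*.yaml
```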

Cluster settings

Use these configuration items to build up a cluster of Steep instances. Under the hood, Steep uses Vert.x and Hazelcast, so these configuration items are very similar to the ones found in these two frameworks. To build up a cluster, you need to configure an event bus connection and a cluster connection. They should use different ports. host typically refers to the machine your instance is running on, whereas publicHost and publicAddress specify the hostname or IP address that your Steep instance advertises in your network so that other instances can connect to it.

For more information, please read the documentation of Vert.x and Hazelcast.

steep.cluster.eventBus.host

The IP address (or hostname) to bind the clustered eventbus to

Default: Automatically detected local network interface

steep.cluster.eventBus.port

The port the clustered eventbus should listen on

Default: A random port

steep.cluster.eventBus.publicHost

The IP address (or hostname) the eventbus uses to announce itself within the cluster

Default: Same as steep.cluster.eventBus.host

steep.cluster.eventBus.publicPort

The port that the eventbus uses to announce itself within the cluster

Default: Same as steep.cluster.eventBus.port

steep.cluster.hazelcast.clusterName

An optional cluster name that can be used to separate clusters of Steep instances. Two instances from different clusters (with different names) cannot connect to each other.

By default, no cluster name is set, which means all instances can connect to each other. However, a Steep instance without a cluster name cannot connect to a named cluster.

Heads up: if you have a cluster name set and you’re using a cloud connection to deploy remote agents on demand, make sure these Steep instances use the same cluster name. Otherwise, you won’t be able to connect to them.

steep.cluster.hazelcast.publicAddress

The IP address (or hostname) and port Hazelcast uses to announce itself within the cluster

steep.cluster.hazelcast.port

The port that Hazelcast should listen on

steep.cluster.hazelcast.interfaces

A list of IP address patterns specifying valid interfaces Hazelcast should bind to

steep.cluster.hazelcast.members

A list of IP addresses (or hostnames) of Hazelcast cluster members

steep.cluster.hazelcast.tcpEnabled

true if Hazelcast should use TCP to connect to other instances, false if it should use multicast

Default: false
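
The following sketch shows how the event bus and Hazelcast settings described so far could be combined for an instance that joins a cluster over TCP. All addresses and port numbers are made-up example values; the only constraint taken from the text above is that the event bus and Hazelcast use different ports.

```yaml
steep:
  cluster:
    eventBus:
      # bind address and the address advertised to other instances
      host: 0.0.0.0
      port: 41187
      publicHost: 192.168.0.10
      publicPort: 41187
    hazelcast:
      # IP address (or hostname) and port advertised to other members
      publicAddress: "192.168.0.10:5701"
      port: 5701
      # only bind to interfaces matching this pattern
      interfaces:
        - 192.168.0.*
      # another (hypothetical) member of the cluster
      members:
        - 192.168.0.11
      # use TCP instead of multicast
      tcpEnabled: true
```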

steep.cluster.hazelcast.placementGroupName

An optional name specifying in which group this Hazelcast member should be placed. Steep uses distributed maps to share data between instances. Data in these maps is partitioned (i.e. distributed to the individual cluster members). In a large cluster, no member keeps all the data. Most nodes only keep a small fraction of the data (a partition).

To make sure data is not lost if a member goes down, Hazelcast uses backups to distribute copies of the data across the cluster. By specifying a placement group, you can control how Hazelcast distributes these backups. Hazelcast will always prefer creating backups in a group that does not own the data so that if all members of a group go down, the other group still has all the backup data.

Examples for sensible groups are racks, data centers, or availability zones.

For more information, see the Hazelcast documentation.

Note that if you configure a placement group name, all members in your cluster must also have a placement group name. Otherwise, you will receive an exception about mismatching configuration on startup.

steep.cluster.hazelcast.liteMember

true if this instance should be a Hazelcast lite member. Lite members do not own any in-memory data. They are mainly used for compute-intensive tasks. With regard to Steep, an instance with a controller and a scheduler should not be a lite member, because these components heavily rely on internal state. A Steep instance that only contains an agent and therefore only executes services, however, could be a lite member. See the architecture section for more information about these components.

Your cluster cannot consist of only lite members. Otherwise, it is not able to maintain internal state at all.

Note that since lite members cannot keep data, they are not suitable to keep backups either. See steep.cluster.hazelcast.placementGroupName for more information. For reasons of reliability, a cluster should contain at least three full (i.e. non-lite) members.
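
As an illustration of the agent-only case described above, an instance that just executes services could be configured along these lines. This is a sketch that combines keys documented on this page; whether it suits your cluster depends on how many full members remain.

```yaml
steep:
  cluster:
    hazelcast:
      # this instance does not own any in-memory data
      liteMember: true
  # no controller and no scheduler on this instance
  controller:
    enabled: false
  scheduler:
    enabled: false
  # only execute process chains
  agent:
    enabled: true
```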

steep.cluster.lookupOrphansInterval

The interval at which Steep’s main thread looks for orphaned entries in its internal remote agent registry (specified as a duration). Such entries may (very rarely) happen if there is a network failure during deregistration of an agent. You normally do not have to change this configuration.

Default: 5m

steep.cluster.hazelcast.restoreMembersOnStartup.enabled

true if Steep should try to load IP addresses of possibly still running VMs from its database during startup and add them to steep.cluster.hazelcast.members. This is useful if a Steep instance has crashed and should be reintegrated into an existing cluster when it’s back.

Default: false

steep.cluster.hazelcast.restoreMembersOnStartup.defaultPort

If steep.cluster.hazelcast.restoreMembersOnStartup.enabled is true, potential Hazelcast cluster members will be restored from the database. This configuration item specifies on which Hazelcast port these members are listening.

steep.cluster.hazelcast.splitBrainProtection.enabled

true if split-brain protection should be enabled. This mechanism makes sure the cluster is only able to operate if there are at least n members, where n is defined by steep.cluster.hazelcast.splitBrainProtection.minClusterSize. If there are fewer than n members, Steep instances in the cluster will not be able to access cluster-wide data structures and will stop operating until the issue has been resolved.

This mechanism protects against so-called split-brain situations where one part of the cluster loses connection to another part, and the cluster is therefore split into different partitions. If one partition becomes too small, it should stop operating to avoid doing any harm.

See the Hazelcast documentation for more information.

Default: false

steep.cluster.hazelcast.splitBrainProtection.minClusterSize

The minimum number of members the cluster must have to be able to operate if split-brain protection is enabled.

Recommendation: Your cluster should have an odd number of members, and the minimum cluster size should represent the majority of your cluster (i.e. more than half of its members). For example, if your cluster has 7 nodes, set this value to 4. This makes sure that when a split-brain situation happens, the majority of your cluster will be able to continue operating while the smaller part will stop.

This configuration item does not have a default value. It must be set if steep.cluster.hazelcast.splitBrainProtection.enabled equals true.

steep.cluster.hazelcast.splitBrainProtection.gracefulStartup

true if the split-brain protection mechanism should only start to be in effect once the cluster has reached its minimum size. This allows the cluster to start up gracefully even if the member count is temporarily lower than the defined minimum.

Default: true

steep.cluster.hazelcast.splitBrainProtection.exitProcessAfter

An optional timeout (specified as a duration) defining how long a Steep instance may keep running after a split-brain situation has been detected. When the timeout is reached and the split-brain situation has not been resolved in the meantime, the Steep instance shuts itself down with exit code 16. This mechanism can be used to prevent a Steep instance from doing any harm when it is in a split-brain situation.
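
For the 7-node cluster used in the recommendation above, the split-brain settings might be combined as follows. The value of exitProcessAfter is an arbitrary example; the item is optional and can be left out.

```yaml
steep:
  cluster:
    hazelcast:
      splitBrainProtection:
        enabled: true
        # 7 members in total, so the majority is 4
        minClusterSize: 4
        # do not enforce the minimum until the cluster has reached it once
        gracefulStartup: true
        # hypothetical timeout after which an affected instance shuts down
        exitProcessAfter: 5m
```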

HTTP configuration

steep.http.enabled

true if the HTTP interface should be enabled

Default: true

steep.http.host

The host to bind the HTTP server to

Default: localhost

steep.http.port

The port the HTTP server should listen on

Default: 8080

steep.http.postMaxSize

The maximum size of HTTP POST bodies in bytes

Default: 1048576 (1 MB)

steep.http.basePath

The path where the HTTP endpoints and the web-based user interface should be mounted

Default: "" (empty string, i.e. no base path)

steep.http.allowRoutes

A regular expression specifying a whitelist of enabled HTTP routes. Non-matching routes will be disabled. For example, the expression /processchains.* enables all endpoints starting with /processchains but disables all others.

Default: .* (all routes are enabled)

steep.http.cors.enable

true if Cross-Origin Resource Sharing (CORS) should be enabled

Default: false

steep.http.cors.allowOrigin

A regular expression specifying allowed CORS origins. Use * to allow all origins.

Default: "$." (match nothing by default)

steep.http.cors.allowCredentials

true if the Access-Control-Allow-Credentials response header should be returned.

Default: false

steep.http.cors.allowHeaders

A string or an array indicating which header field names can be used in a request.

steep.http.cors.allowMethods

A string or an array indicating which HTTP methods can be used in a request.

steep.http.cors.exposeHeaders

A string or an array indicating which response headers may be exposed to client-side scripts making a CORS request.

steep.http.cors.maxAgeSeconds

The number of seconds the results of a preflight request can be cached in a preflight result cache.
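
A possible HTTP configuration that restricts the enabled routes and turns on CORS for a single origin is sketched below. The origin, header, and method values are illustrative assumptions only.

```yaml
steep:
  http:
    enabled: true
    host: 0.0.0.0
    port: 8080
    # 2 MB instead of the default 1 MB
    postMaxSize: 2097152
    # only endpoints starting with /processchains stay enabled
    allowRoutes: "/processchains.*"
    cors:
      enable: true
      # regular expression matching a hypothetical origin
      allowOrigin: 'https://example\.com'
      allowCredentials: false
      allowHeaders:
        - Content-Type
      allowMethods:
        - GET
        - POST
      # cache preflight results for one hour
      maxAgeSeconds: 3600
```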

Controller configuration

steep.controller.enabled

true if the controller should be enabled. Set this value to false if your Steep instance does not have access to the shared database.

Default: true

steep.controller.lookupInterval

The interval at which the controller looks for accepted submissions, specified as a duration.

Default: 2s

steep.controller.lookupMaxErrors

The maximum number of consecutive errors (e.g. database connection issues) to tolerate when looking up the status of process chains of a running submission. If there are more errors, the submission will be aborted.

Default: 5

steep.controller.lookupOrphansInterval

The interval at which the controller looks for orphaned running submissions (i.e. submissions that are in the status RUNNING but that are currently not being processed by any controller instance), specified as a duration. If Steep finds such a submission it will try to resume it.

Default: 5m

steep.controller.lookupOrphansInitialDelay

The time the controller should wait after startup before it looks for orphaned running submissions for the first time (specified as a duration). This property is useful if you want to implement a rolling update from one Steep instance to another.

Default: 0s

Scheduler configuration

steep.scheduler.enabled

true if the scheduler should be enabled. Set this value to false if your Steep instance does not have access to the shared database.

Default: true

steep.scheduler.lookupInterval

The interval at which the scheduler looks for registered process chains, specified as a duration.

Default: 20s

steep.scheduler.lookupOrphansInterval

The interval at which the scheduler looks for orphaned running process chains (i.e. process chains that are in the status RUNNING but that are currently not being processed by any scheduler instance), specified as a duration. Note that the scheduler also always looks for orphaned process chains when it detects that another scheduler instance has just left the cluster (regardless of the configured interval).

Default: 5m

steep.scheduler.lookupOrphansInitialDelay

The time the scheduler should wait after startup before it looks for orphaned running process chains for the first time (specified as a duration). This property is useful if you want to implement a rolling update from one Steep instance to another. Note that the scheduler also looks for orphaned process chains when another scheduler instance has just left the cluster, even if the initial delay has not passed yet.

Default: 0s
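
For the rolling update scenario mentioned above, a new instance could be told to wait a little before it picks up orphaned work. The one-minute delay is a made-up value; the other items simply repeat the documented defaults.

```yaml
steep:
  controller:
    lookupInterval: 2s
    lookupOrphansInterval: 5m
    # hypothetical delay before the first search for orphaned submissions
    lookupOrphansInitialDelay: 1m
  scheduler:
    lookupInterval: 20s
    lookupOrphansInterval: 5m
    # hypothetical delay before the first search for orphaned process chains
    lookupOrphansInitialDelay: 1m
```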

Agent configuration

steep.agent.enabled

true if this Steep instance should be able to execute process chains (i.e. if one or more agents should be deployed)

Default: true

steep.agent.instances

The number of agents that should be deployed within this Steep instance (i.e. how many executables the Steep instance can run in parallel)

Default: 1

steep.agent.id

Unique identifier for the first agent instance deployed. Subsequent agent instances will have a consecutive number appended to their IDs.

Default: (an automatically generated unique ID)

steep.agent.capabilities

List of capabilities that the agents provide

Default: [] (empty list)

steep.agent.autoShutdownTimeout

The time any agent instance can remain idle until Steep shuts itself down gracefully (specified as a duration). By default, this value is 0s, which means Steep never shuts itself down.

Default: 0s

steep.agent.busyTimeout

The time that should pass before an idle agent decides that it is not busy anymore (specified as a duration). Normally, the scheduler allocates an agent, sends it a process chain, and then deallocates it after the process chain execution has finished. This value is important if the scheduler crashes while the process chain is being executed and does not deallocate the agent anymore. In this case, the agent deallocates itself after the configured time has passed.

Default: 1m

steep.agent.outputLinesToCollect

The number of output lines to collect at most from each executed service (also applies to error output)

Default: 100
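
An instance that runs two agents in parallel and shuts itself down after a period of inactivity could be configured roughly as follows. The agent ID, the capabilities, and the shutdown timeout are example values.

```yaml
steep:
  agent:
    enabled: true
    # two executables can run in parallel
    instances: 2
    # hypothetical ID of the first agent instance
    id: worker-a
    capabilities:
      - docker
      - python
    # hypothetical idle time before Steep shuts itself down
    autoShutdownTimeout: 30m
    busyTimeout: 1m
    outputLinesToCollect: 100
```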

Runtime settings

steep.runtimes.docker.env

Additional environment variables that will be passed to containers created by the Docker runtime

Example: ["key=value", "foo=bar"]

Default: [] (empty list)

steep.runtimes.docker.volumes

Additional volume mounts to be passed to the Docker runtime

Example: ["/data:/data"]

Default: [] (empty list)
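
Written in YAML notation, the two example values above would look like this:

```yaml
steep:
  runtimes:
    docker:
      # passed as environment variables to every container
      env:
        - key=value
        - foo=bar
      # mounts the host directory /data into the container at /data
      volumes:
        - "/data:/data"
```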

Database connection

steep.db.driver

The database driver

Valid values: inmemory, postgresql, mongodb

Default: inmemory

steep.db.url

The database URL

steep.db.username

The database username (only used by the postgresql driver)

steep.db.password

The database password (only used by the postgresql driver)

steep.db.connectionPool.maxSize

The maximum number of connections to keep open (i.e. to keep in the connection pool)

steep.db.connectionPool.maxIdleTime

The maximum time an idle connection should be kept in the connection pool before it is closed
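
A sketch of a PostgreSQL connection is shown below. The JDBC-style URL, host, database name, credentials, and pool size are assumptions for illustration; check the format your driver expects.

```yaml
steep:
  db:
    driver: postgresql
    # assumed URL format; host, port, and database name are placeholders
    url: "jdbc:postgresql://localhost:5432/steep"
    username: steep
    # placeholder; can also be supplied via the environment variable STEEP_DB_PASSWORD
    password: change-me
    connectionPool:
      # hypothetical upper limit of open connections
      maxSize: 5
```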

Cloud connection

steep.cloud.enabled

true if Steep should connect to a cloud to acquire remote agents on demand

Default: false

steep.cloud.driver

Defines which cloud driver to use

Valid values: openstack (see the OpenStack cloud driver for more information)

steep.cloud.createdByTag

A metadata tag that should be attached to virtual machines to indicate that they have been created by Steep

steep.cloud.syncInterval

The time that should pass before the cloud manager synchronizes its internal state with the cloud again, specified as a duration.

Default: 2m

steep.cloud.keepAliveInterval

The time that should pass before the cloud manager sends keep-alive messages to a minimum of remote agents again (so that they do not shut themselves down), specified as a duration. See the minVMs property of the setups data model.

Default: 30s

steep.cloud.setups.file

The path to the file that describes all available setups. See setups.yaml.

steep.cloud.setups.creation.retries

A retry policy that specifies how many attempts should be made to create a VM from a certain setup (if creation fails) as well as possible (exponential) delays between those attempts.

Default:

retries:
  maxAttempts: 5
  delay: 40s
  exponentialBackoff: 2

steep.cloud.setups.lockAfterRetries

When the maximum number of attempts to create a VM from a certain setup has been reached (see steep.cloud.setups.creation.retries), the setup will be locked and no other VM with this setup will be created. This parameter defines how long it will be locked, specified as a duration.

Default: 20m

steep.cloud.timeouts.sshReady

The maximum time the cloud manager should try to log in to a new VM via SSH (specified as a duration). The cloud manager will make a login attempt every 2 seconds until it is successful or until the maximum time has passed, in which case it will destroy the VM.

Default: 5m

steep.cloud.timeouts.agentReady

The maximum time the cloud manager should wait for an agent on a new VM to become available (i.e. how long a new Steep instance may take to register with the cluster) before it destroys the VM again (specified as a duration).

Default: 5m

steep.cloud.timeouts.createVM

The maximum time that creating a VM may take before it is aborted with an error (specified as a duration).

Default: 5m

steep.cloud.timeouts.destroyVM

The maximum time that destroying a VM may take before it is aborted with an error (specified as a duration).

Default: 5m

steep.cloud.timeouts.provisioning

The maximum time each individual provisioning step (i.e. executing a provisioning script or uploading files) may take before it is aborted. Running provisioning commands will be killed after this timeout regardless of whether they are still active or not. This value is specified as a duration.

Default: 10m

steep.cloud.agentPool

An array of agent pool parameters describing how many remote agents the cloud manager should keep in its pool and how many it is allowed to create for each given set of capabilities.

Default: [] (empty list)
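
Put together, a cloud connection using the documented default intervals and timeouts might look like the sketch below. The tag value and the path to the setups file are hypothetical.

```yaml
steep:
  cloud:
    enabled: true
    driver: openstack
    # hypothetical metadata tag attached to created VMs
    createdByTag: Steep
    syncInterval: 2m
    keepAliveInterval: 30s
    setups:
      # hypothetical location of the setups file
      file: conf/setups.yaml
      lockAfterRetries: 20m
    timeouts:
      sshReady: 5m
      agentReady: 5m
      createVM: 5m
      destroyVM: 5m
      provisioning: 10m
```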

OpenStack cloud driver

steep.cloud.openstack.endpoint

OpenStack authentication endpoint

steep.cloud.openstack.username

OpenStack username used for authentication

steep.cloud.openstack.password

OpenStack password used for authentication

steep.cloud.openstack.domainName

OpenStack domain name used for authentication

steep.cloud.openstack.projectId

The ID of the OpenStack project to which to connect. Either this configuration item or steep.cloud.openstack.projectName must be set but not both at the same time.

steep.cloud.openstack.projectName

The name of the OpenStack project to which to connect. This configuration item will be used in combination with steep.cloud.openstack.domainName if steep.cloud.openstack.projectId is not set.

steep.cloud.openstack.networkId

The ID of the OpenStack network to attach new VMs to

steep.cloud.openstack.usePublicIp

true if new VMs should have a public IP address

Default: false

steep.cloud.openstack.securityGroups

The OpenStack security groups that should be attached to new VMs.

Default: [] (empty list)

steep.cloud.openstack.keypairName

The name of the keypair to deploy to new VMs. The keypair must already exist in OpenStack.

SSH connection to VMs

steep.cloud.ssh.username

Username for SSH access to VMs. Can be overridden by the sshUsername property in each setup. May even be null if all setups define their own username.

steep.cloud.ssh.privateKeyLocation

Location of a private key to use for SSH
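
The OpenStack and SSH settings could be combined as in the following sketch. Every value (endpoint, names, IDs, key location) is a placeholder invented for illustration.

```yaml
steep:
  cloud:
    openstack:
      # placeholder authentication endpoint
      endpoint: "https://openstack.example.com:5000/v3"
      username: steep-user
      # placeholder; can also be supplied via STEEP_CLOUD_OPENSTACK_PASSWORD
      password: change-me
      domainName: Default
      # set either projectName or projectId, but not both
      projectName: my-project
      networkId: 00000000-0000-0000-0000-000000000000
      usePublicIp: false
      securityGroups:
        - default
      # must already exist in OpenStack
      keypairName: steep
    ssh:
      # can be overridden per setup through sshUsername
      username: ubuntu
      privateKeyLocation: conf/steep.pem
```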

Log configuration

steep.logs.level

The default log level for all loggers (console as well as file-based)

Valid values: TRACE, DEBUG, INFO, WARN, ERROR, OFF.

Default: DEBUG

steep.logs.main.enabled

true if logging to the main log file should be enabled

Default: false

steep.logs.main.logFile

The name of the main log file

Default: logs/steep.log

steep.logs.main.dailyRollover.enabled

true if main log files should be renamed every day. The file name will be based on steep.logs.main.logFile and the file’s date in the form YYYY-MM-DD (e.g. steep.2020-11-19.log)

Default: true

steep.logs.main.dailyRollover.maxDays

The maximum number of days’ worth of main log files to keep

Default: 7

steep.logs.main.dailyRollover.maxSize

The total maximum size of all main log files in bytes. Oldest log files will be deleted when this size is reached.

Default: 104857600 (100 MB)

steep.logs.processChains.enabled

true if the output of process chains should be logged separately to disk. The output will still also appear on the console and in the main log file (if enabled), but there, it’s not separated by process chain. This feature is useful if you want to record the output of individual process chains and make it available through the process chain logs endpoint.

Default: false

steep.logs.processChains.path

The path where process chain logs will be stored. Individual files will be named after the ID of the corresponding process chain (e.g. aprsqz6d5f4aiwsdzbsq.log). If a process chain has been executed more than once (for example, due to a retry), the file name will include the run number (e.g. aprsqz6d5f4aiwsdzbsq.2.log).

Default: logs/processchains

steep.logs.processChains.groupByPrefix

Set this configuration item to a value greater than 0 to group process chain log files by prefix in subdirectories under the directory configured through steep.logs.processChains.path. For example, if this configuration item is set to 3, Steep will create a separate subdirectory for all process chains whose ID starts with the same three characters. The name of this subdirectory will be these three characters. The process chains apomaokjbk3dmqovemwa and apomaokjbk3dmqovemsq will be put into a subdirectory called apo, and the process chain ao344a53oyoqwhdelmna will be put into ao3. Note that in practice, 3 is a reasonable value, which will create a new directory about every day. A value of 0 disables grouping.

Default: 0
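
A configuration that enables both the main log file and grouped process chain logs might look like this. The log level is an example choice; the remaining values repeat the documented defaults or the recommendation above.

```yaml
steep:
  logs:
    level: INFO
    main:
      enabled: true
      logFile: logs/steep.log
      dailyRollover:
        enabled: true
        maxDays: 7
        # 100 MB
        maxSize: 104857600
    processChains:
      enabled: true
      path: logs/processchains
      # e.g. apomaokjbk3dmqovemwa.log ends up in the subdirectory apo
      groupByPrefix: 3
```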

Garbage collector configuration

steep.garbageCollector.enabled

true if the garbage collector should be enabled. The garbage collector runs in the background and removes outdated objects from the database at the interval specified with steep.garbageCollector.cron

Default: false

steep.garbageCollector.cron

A UNIX-like cron expression specifying the interval at which the garbage collector should be executed. Cron expressions consist of six required fields and one optional field separated by white space:

SECONDS MINUTES HOURS DAY-OF-MONTH MONTH DAY-OF-WEEK [YEAR].

Use an asterisk * to specify all values (e.g. every second or every minute). Use a question mark ? for DAY-OF-MONTH or DAY-OF-WEEK to specify no value (only one of DAY-OF-MONTH or DAY-OF-WEEK can be specified at the same time). Use a slash / to specify increments (e.g. */5 for every 5 minutes).

More information about the format can be found in the javadoc of the org.quartz.CronExpression class.

Example: 0 0 0 * * ? (daily at 12am)

steep.garbageCollector.retention.submissions

The maximum time a submission should be kept in the database after it has finished (regardless of whether it was successful or not). The time can be specified as a human-readable duration.

Default: null (submissions will be kept indefinitely)

steep.garbageCollector.retention.vms

The maximum time a VM should be kept in the database after it has been destroyed (regardless of its status). The time can be specified as a human-readable duration.

Default: null (VMs will be kept indefinitely)
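
As a sketch, a garbage collector that runs once a day and removes old entries could be configured as follows. The retention periods are arbitrary, and the unit d for days is an assumption about the duration format that is not confirmed on this page.

```yaml
steep:
  garbageCollector:
    enabled: true
    # daily at 12am (the example expression from above)
    cron: "0 0 0 * * ?"
    retention:
      # hypothetical retention periods; the unit "d" is assumed
      submissions: 7d
      vms: 7d
```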

Cache configuration

steep.cache.plugins.enabled

true if the persistent compiled plugin cache should be enabled. Steep updates this cache on startup when it has first compiled a plugin script or when it detects that a previously compiled script has changed. On subsequent startups, Steep can utilize the cache to skip compilation of known plugins and, therefore, to reduce startup time.

Default: false

steep.cache.plugins.path

The path to a directory where Steep should store compiled plugin scripts if the persistent compiled plugin cache is enabled (see steep.cache.plugins.enabled).

Default: .cache/plugins
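
Enabling the cache with its documented default location would look like this in YAML notation:

```yaml
steep:
  cache:
    plugins:
      enabled: true
      path: .cache/plugins
```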

Agent pool parameters

Steep’s cloud manager component is able to create virtual machines and deploy remote agent instances to them. The cloud manager keeps every remote agent created in a pool. Use agent pool parameters to define a minimum and maximum number of instances per provided capability set.

| Property | Type | Description |
| --- | --- | --- |
| capabilities (required) | array | A set of strings specifying capabilities that a remote agent must provide so these parameters apply to it |
| min (optional) | number | An optional minimum number of remote agents that the cloud manager should create with the given capabilities |
| max (optional) | number | An optional maximum number of remote agents that the cloud manager is allowed to create with the given capabilities |

Example:

capabilities:
  - docker
  - python
min: 1
max: 5
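
Since steep.cloud.agentPool is an array, several such entries can be listed in steep.yaml. The sketch below embeds the example above and adds a second, hypothetical entry for a made-up capability gpu that only sets a maximum.

```yaml
steep:
  cloud:
    agentPool:
      # keep at least 1 and at most 5 agents providing docker and python
      - capabilities:
          - docker
          - python
        min: 1
        max: 5
      # hypothetical second entry: at most 2 agents providing gpu
      - capabilities:
          - gpu
        max: 2
```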