steep.yaml contains the main configuration of Steep. This page
describes all configuration keys and values you can set.
Note that keys are specified using dot notation. You can use them as they are given here or write them in nested YAML notation instead; both forms are identical.
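For example (using the `steep.http.port` key described further below; the value `8080` is just a placeholder), the dot notation

```yaml
steep.http.port: 8080
```

is identical to the nested form:

```yaml
steep:
  http:
    port: 8080
```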
You may override items in your configuration file with environment variables.
This is particularly useful if you are using Steep inside a Docker container.
The environment variables use a slightly different naming scheme. All variables
are in capital letters and dots are replaced by underscores. For example,
the configuration key `steep.http.port` corresponds to the environment variable `STEEP_HTTP_PORT`.
You may use YAML syntax to specify environment variable values. For example,
`steep.agent.capabilities` can be specified as follows:
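A sketch of how this might look in the `environment` section of a Docker Compose file (the capability values `docker` and `gpu` are placeholders for illustration):

```yaml
environment:
  STEEP_AGENT_CAPABILITIES: '["docker", "gpu"]'
```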
- The path to a directory where temporary files should be stored during processing. Steep generates names for the outputs of execute actions in a workflow. If the `store` flag of an output parameter is `false` (which is the default), the generated filename will be relative to this temporary directory.
- The path to a directory where output files should be stored. This path will be used instead of `steep.tmpPath` to generate a filename for an output parameter if its `store` flag is `true`.
- The path to a file that keeps additional configuration. The values of the `overrideConfigFile` will be merged into the main configuration file, so it basically overrides the default values. Note that configuration items in this file can still be overridden with environment variables. This configuration item is useful if you don’t want to change the main configuration file (or if you cannot do so) but still want to set different configuration values. Use it if you run Steep in a Docker container and bind-mount the `overrideConfigFile` as a volume.
- The path to the configuration file(s) containing service metadata. Either a string pointing to a single file, a glob pattern (e.g. `**/*.yaml`), or an array of files or glob patterns.
- The path to the configuration file(s) containing macros. Either a string pointing to a single file, a glob pattern (e.g. `**/*.yaml`), or an array of files or glob patterns.
- The path to the configuration file(s) containing plugin descriptors. Either a string pointing to a single file, a glob pattern (e.g. `**/*.yaml`), or an array of files or glob patterns.
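Putting these three items together, a configuration sketch might look like this (the key names `services`, `macros`, and `plugins` as well as the paths are assumptions for illustration):

```yaml
steep:
  services: conf/services/**/*.yaml
  macros:
    - conf/macros/common.yaml
    - conf/macros/extra/*.yaml
  plugins: conf/plugins/*.yaml
```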
Use these configuration items to build up a cluster of Steep instances. Under
the hood, Steep uses Vert.x and Hazelcast,
so these configuration items are very similar to the ones found in these two
frameworks. To build up a cluster, you need to configure an event bus connection
and a cluster connection. They should use different ports.
While `host` refers to the machine your instance is running on, `publicHost` and
`publicAddress` specify the hostname or IP address that your Steep instance will
use in your network to advertise itself so that other instances can connect to it.
The IP address (or hostname) to bind the clustered eventbus to
Default: Automatically detected local network interface
The port the clustered eventbus should listen on
Default: A random port
The IP address (or hostname) the eventbus uses to announce itself within the cluster
Default: Same as `steep.cluster.eventBus.host`
The port that the eventbus uses to announce itself within the cluster
Default: Same as `steep.cluster.eventBus.port`
An optional cluster name that can be used to separate clusters of Steep instances. Two instances from different clusters (with different names) cannot connect to each other.
By default, no cluster name is set, which means all instances can connect to each other. However, a Steep instance without a cluster name cannot connect to a named cluster.
Heads up: if you have a cluster name set and you’re using a cloud connection to deploy remote agents on demand, make sure these Steep instances use the same cluster name. Otherwise, you won’t be able to connect to them.
The IP address (or hostname) and port Hazelcast uses to announce itself within the cluster
The port that Hazelcast should listen on
A list of IP address patterns specifying valid interfaces Hazelcast should bind to
A list of IP addresses (or hostnames) of Hazelcast cluster members
`true` if Hazelcast should use TCP to connect to other instances,
`false` if it should use multicast
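A sketch of how the event bus and cluster connections might be configured for one instance (key names follow the item descriptions above but should be treated as assumptions; the addresses are placeholders):

```yaml
steep:
  cluster:
    eventBus:
      host: 10.0.0.1
      port: 41187
      publicHost: 10.0.0.1
      publicPort: 41187
    hazelcast:
      port: 5701
      publicAddress: 10.0.0.1:5701
      tcpEnabled: true
      members:
        - 10.0.0.2
        - 10.0.0.3
```

Note how the event bus and Hazelcast use different ports, as recommended above.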
An optional name specifying in which group this Hazelcast member should be placed. Steep uses distributed maps to share data between instances. Data in these maps is partitioned (i.e. distributed to the individual cluster members). In a large cluster, no member keeps all the data. Most nodes only keep a small fraction of the data (a partition).
To make sure data is not lost if a member goes down, Hazelcast uses backups to distribute copies of the data across the cluster. By specifying a placement group, you can control how Hazelcast distributes these backups. Hazelcast will always prefer creating backups in a group that does not own the data so that if all members of a group go down, the other group still has all the backup data.
Examples for sensible groups are racks, data centers, or availability zones.
For more information, see the following links:
Note that if you configure a placement group name, all members in your cluster must also have a placement group name. Otherwise, you will receive an exception about mismatching configuration on startup.
`true` if this instance should be a Hazelcast lite member. Lite members do not own any in-memory data. They are mainly used for compute-intensive tasks. With regard to Steep, an instance with a controller and a scheduler should not be a lite member, because these components heavily rely on internal state. A Steep instance that only contains an agent and therefore only executes services, however, could be a lite member. See the architecture section for more information about these components.
Your cluster cannot consist of only lite members. Otherwise, it would not be able to maintain internal state at all.
Note that since lite members cannot keep data, they are not suitable for keeping backups either. See `steep.cluster.hazelcast.placementGroupName` for more information. For reasons of reliability, a cluster should contain at least three full (i.e. non-lite) members.
The interval at which Steep’s main thread looks for orphaned entries in its internal remote agent registry (specified as a duration). Such entries may (very rarely) happen if there is a network failure during deregistration of an agent. You normally do not have to change this configuration.
`true` if Steep should try to load IP addresses of possibly still running VMs from its database during startup and add them to `steep.cluster.hazelcast.members`. This is useful if a Steep instance has crashed and should be reintegrated into an existing cluster when it’s back.
If the previous configuration item is set to `true`, potential Hazelcast cluster members will be restored from the database. This configuration item specifies on which Hazelcast port these members are listening.
`true` if split-brain protection should be enabled. This mechanism makes sure the cluster is only able to operate if there are at least n members, where n is defined by `steep.cluster.hazelcast.splitBrainProtection.minClusterSize`. If there are fewer than n members, Steep instances in the cluster will not be able to access cluster-wide data structures and will stop operating until the issue has been resolved.
This mechanism protects against so-called split-brain situations where one part of the cluster loses connection to another part, and the cluster is therefore split into different partitions. If one partition becomes too small, it should stop operating to avoid doing any harm.
See the Hazelcast documentation for more information.
The minimum number of members the cluster must have to be able to operate if split-brain protection is enabled.
Recommendation: Your cluster should have an odd number of members, and the minimum cluster size should represent the majority of your cluster (i.e. more than half of the members). For example, if your cluster has 7 nodes, set this value to 4. This makes sure that, when a split-brain situation happens, the majority of your cluster will be able to continue operating while the smaller part will stop.
This configuration item does not have a default value. It must be set if split-brain protection is enabled.
`true` if the split-brain protection mechanism should only start to be in effect once the cluster has reached its minimum size. This allows the cluster to start up gracefully even if the member count is temporarily lower than the defined minimum.
An optional timeout (specified as a duration) defining how long a Steep instance may keep running after a split-brain situation has been detected. When the timeout is reached and the split-brain situation has not been resolved in the meantime, the Steep instance shuts itself down with exit code 16. This mechanism can be used to prevent a Steep instance from doing any harm when it is in a split-brain situation.
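A sketch for a seven-node cluster (only `minClusterSize` is named explicitly above; the keys `enabled`, `gracefulStartup`, and `exitProcessAfter` are assumed names for the surrounding items):

```yaml
steep:
  cluster:
    hazelcast:
      splitBrainProtection:
        enabled: true
        minClusterSize: 4
        gracefulStartup: true
        exitProcessAfter: 5m
```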
`true` if the HTTP interface should be enabled
The host to bind the HTTP server to
The port the HTTP server should listen on
The maximum size of HTTP POST bodies in bytes
Default: 1048576 (1 MB)
The path where the HTTP endpoints and the web-based user interface should be mounted
""(empty string, i.e. no base path)
A regular expression specifying a whitelist of enabled HTTP routes. Non-matching routes will be disabled. For example, the expression `/processchains.*` enables all endpoints starting with `/processchains` but disables all others.
Default: `.*` (all routes are enabled)
`true` if Cross-Origin Resource Sharing (CORS) should be enabled
A regular expression specifying allowed CORS origins. Use `*` to allow all origins.
"$."(match nothing by default)
`true` if the `Access-Control-Allow-Credentials` response header should be returned.
A string or an array indicating which header field names can be used in a request.
A string or an array indicating which HTTP methods can be used in a request.
A string or an array indicating which headers are safe to expose to the API of a CORS API specification.
The number of seconds the results of a preflight request can be cached in a preflight result cache.
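A sketch combining the HTTP and CORS items above (the nested key names are assumptions for illustration; values are placeholders):

```yaml
steep:
  http:
    enabled: true
    host: localhost
    port: 8080
    cors:
      enable: true
      allowOrigin: "https://example\\.com"
      allowCredentials: true
      allowMethods:
        - GET
        - POST
```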
`true` if the controller should be enabled. Set this value to
`false` if your Steep instance does not have access to the shared database.
The interval at which the controller looks for accepted submissions, specified as a duration.
The maximum number of consecutive errors (e.g. database connection issues) to tolerate when looking up the status of process chains of a running submission. If there are more errors, the submission will be aborted.
The interval at which the controller looks for orphaned running submissions (i.e. submissions that are in the status `RUNNING` but that are currently not being processed by any controller instance), specified as a duration. If Steep finds such a submission, it will try to resume it.
The time the controller should wait after startup before it looks for orphaned running submissions for the first time (specified as a duration). This property is useful if you want to implement a rolling update from one Steep instance to another.
`true` if the scheduler should be enabled. Set this value to
`false` if your Steep instance does not have access to the shared database.
The interval at which the scheduler looks for registered process chains, specified as a duration.
The interval at which the scheduler looks for orphaned running process chains (i.e. process chains that are in the status `RUNNING` but that are currently not being processed by any scheduler instance), specified as a duration. Note that the scheduler also always looks for orphaned process chains when it detects that another scheduler instance has just left the cluster (regardless of the configured interval).
The time the scheduler should wait after startup before it looks for orphaned running process chains for the first time (specified as a duration). This property is useful if you want to implement a rolling update from one Steep instance to another. Note that the scheduler also looks for orphaned process chains when another scheduler instance has just left the cluster, even if the initial delay has not passed yet.
`true` if this Steep instance should be able to execute process chains (i.e. if one or more agents should be deployed)
The number of agents that should be deployed within this Steep instance (i.e. how many executables the Steep instance can run in parallel)
Unique identifier for the first agent instance deployed. Subsequent agent instances will have a consecutive number appended to their IDs.
Default: (an automatically generated unique ID)
List of capabilities that the agents provide
The time any agent instance can remain idle until Steep shuts itself down gracefully (specified as a duration). By default, this value is `0s`, which means Steep never shuts itself down.
The time that should pass before an idle agent decides that it is not busy anymore (specified as a duration). Normally, the scheduler allocates an agent, sends it a process chain, and then deallocates it after the process chain execution has finished. This value is important if the scheduler crashes while the process chain is being executed and does not deallocate the agent anymore. In this case, the agent deallocates itself after the configured time has passed.
The number of output lines to collect at most from each executed service (also applies to error output)
Additional environment variables that will be passed to containers created by the Docker runtime
Additional volume mounts to be passed to the Docker runtime
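A sketch of the two Docker-related items (the key path `steep.docker` and the exact entry format are assumptions; values are placeholders):

```yaml
steep:
  docker:
    env:
      - MY_VARIABLE=value
    volumes:
      - /data/input:/data/input:ro
```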
The database driver
The database URL
The database username (only used by the `postgresql` driver)
The database password (only used by the `postgresql` driver)
The maximum number of connections to keep open (i.e. to keep in the connection pool)
The maximum time an idle connection should be kept in the connection pool before it is closed
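A sketch of a PostgreSQL-backed database configuration (the driver name and URL format are assumptions; credentials are placeholders):

```yaml
steep:
  db:
    driver: postgresql
    url: jdbc:postgresql://localhost:5432/steep
    username: steep
    password: secret
```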
`true` if Steep should connect to a cloud to acquire remote agents on demand
Defines which cloud driver to use
`openstack` (see the OpenStack cloud driver for more information)
A metadata tag that should be attached to virtual machines to indicate that they have been created by Steep
The time that should pass before the cloud manager synchronizes its internal state with the cloud again, specified as a duration.
The time that should pass before the cloud manager sends keep-alive messages to a minimum of remote agents again (so that they do not shut down themselves), specified as a duration. See the `minVMs` property of the setups data model.
When the maximum number of attempts to create a VM from a certain setup has been reached (see `steep.cloud.setups.creation.retries`), the setup will be locked and no other VM with this setup will be created. This parameter defines how long it will be locked, specified as a duration.
The maximum time the cloud manager should try to log in to a new VM via SSH (specified as a duration). The cloud manager will make a login attempt every 2 seconds until it is successful or until the maximum time has passed, in which case it will destroy the VM.
The maximum time the cloud manager should wait for an agent on a new VM to become available (i.e. how long a new Steep instance may take to register with the cluster) before it destroys the VM again (specified as a duration).
The maximum time that creating a VM may take before it is aborted with an error (specified as a duration).
The maximum time that destroying a VM may take before it is aborted with an error (specified as a duration).
An array of agent pool parameters describing how many remote agents the cloud manager should keep in its pool and how many it is allowed to create for each given set of capabilities.
OpenStack cloud driver
OpenStack authentication endpoint
OpenStack username used for authentication
OpenStack password used for authentication
OpenStack domain name used for authentication
The ID of the OpenStack project to connect to. Either this configuration item or `steep.cloud.openstack.projectName` must be set but not both at the same time.
The name of the OpenStack project to connect to. This configuration item will be used in combination with the domain name if `steep.cloud.openstack.projectId` is not set.
The ID of the OpenStack network to attach new VMs to
`true` if new VMs should have a public IP address
The OpenStack security groups that should be attached to new VMs.
The name of the keypair to deploy to new VMs. The keypair must already exist in OpenStack.
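A sketch of the OpenStack driver items above (key names are assumptions derived from the item descriptions; all values are placeholders):

```yaml
steep:
  cloud:
    enabled: true
    driver: openstack
    openstack:
      endpoint: https://identity.example.com/v3
      username: steep
      password: secret
      domainName: Default
      projectName: my-project
      networkId: my-network-id   # placeholder network ID
      usePublicIp: true
      securityGroups:
        - default
      keypairName: steep-keypair
```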
SSH connection to VMs
Username for SSH access to VMs. Can be overridden by the `sshUsername` property in each setup. May even be `null` if all setups define their own username.
Location of a private key to use for SSH
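A sketch of the SSH items (the key names `username` and `privateKeyLocation` are assumptions; values are placeholders):

```yaml
steep:
  cloud:
    ssh:
      username: ubuntu
      privateKeyLocation: conf/steep.pem
```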
The default log level for all loggers (console as well as file-based)
`true` if logging to the main log file should be enabled
The name of the main log file
`true` if main log files should be renamed every day. The file name will be based on `steep.logs.main.logFile` and the file’s date.
The maximum number of days’ worth of main log files to keep
The total maximum size of all main log files in bytes. The oldest log files will be deleted when this size is reached.
Default: 104857600 (100 MB)
`true` if the output of process chains should be logged separately to disk. The output will still also appear on the console and in the main log file (if enabled), but there, it is not separated by process chain. This feature is useful if you want to record the output of individual process chains and make it available through the process chain logs endpoint.
The path where process chain logs will be stored. Individual files will be named after the ID of the corresponding process chain (e.g. `aprsqz6d5f4aiwsdzbsq.log`). If a process chain has been executed more than once (for example, due to a retry), the file name will also include the run number.
Set this configuration item to a value greater than `0` to group process chain log files by prefix in subdirectories under the directory configured through `steep.logs.processChains.path`. For example, if this configuration item is set to `3`, Steep will create a separate subdirectory for all process chains whose ID starts with the same three characters. The name of this subdirectory will be these three characters. The process chain `apomaokjbk3dmqovemsq` will be put into a subdirectory called `apo`, and the process chain `ao344a53oyoqwhdelmna` will be put into `ao3`. Note that in practice, `3` is a reasonable value, which will create a new directory about every day. A value of `0` disables grouping.
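A sketch of the process chain log items above (only `steep.logs.processChains.path` is named explicitly; `enabled` and `groupByPrefix` are assumed key names):

```yaml
steep:
  logs:
    processChains:
      enabled: true
      path: logs/processchains
      groupByPrefix: 3
```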
Garbage collector configuration
`true` if the garbage collector should be enabled. The garbage collector runs in the background and removes outdated objects from the database at the interval specified with the following cron expression.
A UNIX-like cron expression specifying the interval at which the garbage collector should be executed. Cron expressions consist of six required fields and one optional field separated by white space:
`SECONDS MINUTES HOURS DAY-OF-MONTH MONTH DAY-OF-WEEK [YEAR]`
Use an asterisk `*` to specify all values (e.g. every second or every minute). Use a question mark `?` in `DAY-OF-MONTH` or `DAY-OF-WEEK` to specify no value (only one of `DAY-OF-MONTH` and `DAY-OF-WEEK` can be specified at the same time). Use a slash `/` to specify increments (e.g. `*/5` for every 5 minutes).
More information about the format can be found in the respective javadoc.
Default: `0 0 0 * * ?` (daily at 12 am)
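A sketch of a garbage collector configuration that runs daily at 3 am (the key names `enabled` and `cron` are assumptions):

```yaml
steep:
  garbageCollector:
    enabled: true
    cron: "0 0 3 * * ?"
```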
Default: `null` (submissions will be kept indefinitely)
Default: `null` (VMs will be kept indefinitely)
`true` if the persistent compiled plugin cache should be enabled. Steep updates this cache on startup when it has first compiled a plugin script or when it detects that a previously compiled script has changed. On subsequent startups, Steep can utilize the cache to skip compilation of known plugins and, therefore, to reduce startup time.
The path to a directory where Steep should store compiled plugin scripts if the persistent compiled plugin cache is enabled (see the previous configuration item).
Agent pool parameters
Steep’s cloud manager component is able to create virtual machines and deploy remote agent instances to them. The cloud manager keeps every remote agent it creates in a pool. Use agent pool parameters to define a minimum and maximum number of instances per provided capability set.
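A sketch of agent pool parameters (the key names `min`/`max` and the capability values are assumptions for illustration):

```yaml
steep:
  cloud:
    agentPool:
      - capabilities:
          - docker
        min: 1
        max: 4
      - capabilities:
          - gpu
        min: 0
        max: 2
```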