Data models

Process chains

As described in the section on workflow scheduling, Steep transforms a workflow to one or more process chains. A process chain is a sequential list of instructions that will be sent to Steep’s remote agents to execute processing services in a distributed environment.

Some of the properties specified in the table below are only available once Steep has started executing a process chain (e.g. startTime) or after the execution has finished (e.g. endTime).

Note that the /processchains HTTP endpoint, which returns a list of process chains, omits some properties even though they are marked as required in the table below (e.g. executables or totalRuns). To get all required properties, use the /processchains/:id HTTP endpoint.

Steep records each execution of a process chain in a separate ‘run’. The property totalRuns specifies how often a process chain has been executed (including any currently running execution). If a process chain has just been created and still has the status REGISTERED, totalRuns equals 0. As soon as the status switches to RUNNING, a new run is created and totalRuns is incremented to 1. If the process chain fails and is later retried, another run is created and totalRuns becomes 2, and so on.
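The counting rule described above can be illustrated with a small toy model. This is only a sketch, not Steep's implementation: the class `ProcessChainModel` and its method `set_status` are hypothetical names, and the model assumes that every transition into RUNNING starts a new run.

```python
from dataclasses import dataclass


@dataclass
class ProcessChainModel:
    """Toy model of run counting (hypothetical, not Steep's actual code)."""

    status: str = "REGISTERED"
    total_runs: int = 0

    def set_status(self, new_status: str) -> None:
        # Assumption: each transition into RUNNING creates a new run.
        if new_status == "RUNNING" and self.status != "RUNNING":
            self.total_runs += 1
        self.status = new_status


chain = ProcessChainModel()
runs_when_registered = chain.total_runs  # no execution yet

chain.set_status("RUNNING")  # first run starts
runs_first_attempt = chain.total_runs

chain.set_status("ERROR")    # first run fails
chain.set_status("RUNNING")  # retry starts a second run
runs_after_retry = chain.total_runs

print(runs_when_registered, runs_first_attempt, runs_after_retry)
```

In this model, a failed run does not change the counter by itself. Only the start of the next attempt does.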

Requesting a process chain through the HTTP endpoints /processchains or /processchains/:id always renders the latest run. The properties agentId, startTime, endTime, status, errorMessage, and autoResumeAfter depend on the run that is rendered (e.g. different runs have different start times, or one run might have failed while a newer one has succeeded or is still running, so their statuses differ). If you want to list all runs of a process chain or retrieve information about a specific run, use the /processchains/:id/runs or /processchains/:id/runs/:runNumber HTTP endpoints, respectively. The property runNumber from the table below specifies which run out of totalRuns is rendered.
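The following sketch shows how the endpoint paths mentioned above could be assembled and queried from a client. The base URL, the helper names `process_chain_url` and `fetch_json`, and the assumption that the endpoints answer with JSON when asked for `application/json` are illustrative and should be adapted to your deployment.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

# Assumption: address of a Steep instance; change this for your setup.
BASE_URL = "http://localhost:8080"


def process_chain_url(chain_id: str,
                      run_number: Optional[int] = None,
                      all_runs: bool = False) -> str:
    """Build the URL for a process chain, its runs, or one specific run."""
    url = f"{BASE_URL}/processchains/{chain_id}"
    if run_number is not None:
        return f"{url}/runs/{run_number}"  # /processchains/:id/runs/:runNumber
    if all_runs:
        return f"{url}/runs"               # /processchains/:id/runs
    return url                             # /processchains/:id (latest run)


def fetch_json(url: str):
    """Fetch a URL and decode the body as JSON (assumed response format)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


print(process_chain_url("akpm646jjigral4cdyyq"))
print(process_chain_url("akpm646jjigral4cdyyq", all_runs=True))
print(process_chain_url("akpm646jjigral4cdyyq", run_number=1))
```

Calling `fetch_json(process_chain_url(chain_id))` would then return the latest run, while passing `run_number` selects an earlier one.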

Property	Type	Description
id (required)	string	Unique process chain identifier
executables (required)	array	A list of executable objects that describe what processing services should be called and with which arguments
submissionId (required)	string	The ID of the submission to which this process chain belongs
agentId (optional)	string	The ID of the agent that currently executes the process chain (if its `status` is `RUNNING`) or has executed it (if it is finished). May be `null` if the execution has not started yet.
startTime (optional)	string	An ISO 8601 timestamp denoting the date and time when the process chain execution was started. May be `null` if the execution has not started yet.
endTime (optional)	string	An ISO 8601 timestamp denoting the date and time when the process chain execution finished. May be `null` if the execution has not finished yet.
status (required)	string	The current status of the process chain
requiredCapabilities (optional)	array	A set of strings specifying capabilities a host system must provide to be able to execute this process chain. See also setups.
priority (optional)	number	A priority used during scheduling. Process chains with higher priorities will be scheduled before those with lower priorities. Negative values are allowed. The default value is `0`.
retries (optional)	object	An optional retry policy specifying how often this process chain will be rescheduled in case an error has occurred.
results (optional)	object	If `status` is `SUCCESS`, this property contains the list of process chain result files grouped by their output variable ID. Otherwise, it is `null`.
totalRuns (required)	number	The number of times the process chain has been executed (including any currently running execution).
runNumber (optional)	number	The number of the run currently rendered. May be `null` if the process chain has not been executed yet.
autoResumeAfter (optional)	string	If the process chain’s status is `PAUSED`, this optional property may specify a point in time (as an ISO 8601 timestamp) after which the process chain will be automatically resumed. It is typically only given if a retry policy is configured (see `retries` property): if a process chain run has failed and there are still attempts left, `autoResumeAfter` specifies when the next attempt will be performed.
estimatedProgress (optional)	number	A floating point number between `0.0` (0%) and `1.0` (100%) indicating the current execution progress of this process chain. This property will only be provided if the process chain is currently being executed (i.e. if its `status` equals `RUNNING`) and if progress could actually be estimated. Note that the value is an estimate based on various factors and does not necessarily reflect the actual progress. More precise values can be calculated with a progress estimator plugin. Sometimes, progress cannot be estimated at all. In this case, the value will be `null`.
errorMessage (optional)	string	If `status` is `ERROR`, this property contains a human-readable error message. Otherwise, it is `null`.

Example

id: akpm646jjigral4cdyyq
submissionId: akpm6yojjigral4cdxgq
startTime: '2020-05-18T08:44:19.221456Z'
endTime: '2020-05-18T08:44:19.446437Z'
status: SUCCESS
agentId: bakwqka7gk2vrjnxdo5a
requiredCapabilities:
  - nodejs
executables:
  - id: ayj5kegaxngbglzlxibq
    path: ./countdown.js
    serviceId: countdown
    runtime: other
    arguments:
      - id: input
        type: input
        dataType: file
        variable:
          id: input_file
          value: input.txt
      - id: output
        type: output
        dataType: fileOrEmptyList
        variable:
          id: output_file
          value: output.txt
    runtimeArgs: []
results:
  output_file:
    - output.txt
totalRuns: 1
runNumber: 1
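As a small illustration of how the timestamps of a run can be used, the sketch below computes the duration of the run from the example above. The helper names are hypothetical, the process chain is assumed to be available as a Python dictionary, and the timestamps are assumed to have at most microsecond precision, as in the example.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2020-05-18T08:44:19.221456Z'."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat(),
    # so it is replaced with an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def run_duration_seconds(chain: dict) -> Optional[float]:
    """Return the duration of the rendered run, or None if it is unknown."""
    start = chain.get("startTime")
    end = chain.get("endTime")
    if start is None or end is None:
        # The run has not started or has not finished yet.
        return None
    return (parse_timestamp(end) - parse_timestamp(start)).total_seconds()


# Values taken from the example process chain above.
chain = {
    "id": "akpm646jjigral4cdyyq",
    "startTime": "2020-05-18T08:44:19.221456Z",
    "endTime": "2020-05-18T08:44:19.446437Z",
    "status": "SUCCESS",
}
duration = run_duration_seconds(chain)
print(duration)
```

Because startTime and endTime depend on the run that is rendered, the result always refers to that run and not to the process chain as a whole.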

Executables

An executable is part of a process chain. It describes how a processing service should be executed and with which parameters.

Property	Type	Description
id (required)	string	An identifier (does not have to be unique). Typically refers to the `id` of the execute action, from which the executable was derived. Possibly suffixed with a dollar sign `$` and a number denoting the iteration of an enclosing for-each action (e.g. `myaction$1`) or nested for-each actions (e.g. `myaction$2$1`).
path (required)	string	The path to the binary of the service to be executed. This property is specific to the `runtime`. For example, for the `docker` and the `kubernetes` runtimes, this property refers to the container image.
serviceId (required)	string	The ID of the processing service to be executed.
arguments (required)	array	A list of arguments to pass to the service. May be empty.
runtime (required)	string	The name of the runtime that will execute the service. Built-in runtimes are currently `other` (for any service that is executable on the target system), `docker` (for Docker containers), and `kubernetes` (for Kubernetes jobs). More runtimes can be added through plugins.
runtimeArgs (optional)	array	A list of arguments to pass to the runtime. May be empty.
retries (optional)	object	An optional retry policy specifying how often this executable should be restarted in case of an error.
maxInactivity (optional)	object	An optional timeout policy that defines how long the executable can run without producing any output (i.e. without writing anything to the standard output and error streams) before it is automatically cancelled or aborted.
maxRuntime (optional)	object	An optional timeout policy that defines how long the executable can run before it is automatically cancelled or aborted, even if the service regularly writes to the standard output and error streams.
deadline (optional)	object	An optional timeout policy that defines how long the executable can run at all (including all retries and their associated delays) until it is cancelled or aborted.

Example

id: ayj5kiwaxngbglzlxica
path: my_docker_image:latest
serviceId: countdown
runtime: docker
arguments:
  - id: input
    type: input
    dataType: file
    variable:
      id: input_file
      value: /data/input.txt
  - id: output
    type: output
    dataType: directory
    variable:
      id: output_file
      value: /data/output
  - id: arg1
    type: input
    dataType: boolean
    label: '--foobar'
    variable:
      id: akqcqqoedcsaoescyhga
      value: 'true'
runtimeArgs:
  - id: akqcqqoedcsaoescyhgq
    type: input
    dataType: string
    label: '-v'
    variable:
      id: data_mount
      value: /data:/data
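The suffix convention described for the `id` property can be unpacked with a few lines of code. The following sketch uses a hypothetical helper `split_executable_id` and assumes that the ID of the underlying execute action does not itself contain a dollar sign and that every suffix is a number.

```python
from typing import List, Tuple


def split_executable_id(executable_id: str) -> Tuple[str, List[int]]:
    """Split an executable ID into its base ID and for-each iteration numbers.

    Assumption: the base ID contains no '$' and all suffixes are integers.
    """
    base, *suffixes = executable_id.split("$")
    return base, [int(suffix) for suffix in suffixes]


print(split_executable_id("myaction$1"))
print(split_executable_id("myaction$2$1"))
print(split_executable_id("ayj5kiwaxngbglzlxica"))
```

An ID without a suffix yields an empty list. Since executable IDs do not have to be unique, the base ID should not be used as a key on its own.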

Arguments

An argument is part of an executable.

Property	Type	Description
id (required)	string	An argument identifier
label (optional)	string	An optional label to use when the argument is passed to the service (e.g. `--input`).
variable (required)	object	A variable that holds the value of this argument.
type (required)	string	The type of this argument. Valid values: `input`, `output`
dataType (required)	string	The type of the argument value. If this property is `directory`, Steep will create a new directory for the service’s output and recursively search it for result files after the service has been executed. Otherwise, this property can be an arbitrary string. New data types with special handling can be added through output adapter plugins.

Example

id: akqcqqoedcsaoescyhgq
type: input
dataType: string
label: '-v'
variable:
  id: data_mount
  value: /data:/data
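To make the role of `label` more concrete, the sketch below flattens an argument into command-line tokens by placing the label, if any, in front of the variable's value. This is a simplified illustration with a hypothetical helper `argument_tokens`. It does not model how Steep treats individual data types (for example `boolean` or `directory`), which may be handled differently.

```python
from typing import List


def argument_tokens(argument: dict) -> List[str]:
    """Turn an argument into a list of command-line tokens.

    Simplification: the label (if present) is followed by the value.
    Data-type-specific handling is deliberately left out.
    """
    tokens = []
    label = argument.get("label")
    if label:
        tokens.append(label)
    tokens.append(argument["variable"]["value"])
    return tokens


# The argument from the example above.
argument = {
    "id": "akqcqqoedcsaoescyhgq",
    "type": "input",
    "dataType": "string",
    "label": "-v",
    "variable": {"id": "data_mount", "value": "/data:/data"},
}
tokens = argument_tokens(argument)
print(tokens)
```

An argument without a label contributes only its value.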

Argument variables

An argument variable holds the value of an argument.

Property	Type	Description
id (required)	string	The variable’s unique identifier
value (required)	string	The variable’s value

Example

id: data_mount
value: /data:/data

Process chain status

The following table shows the statuses a process chain can have:

Status	Description
`REGISTERED`	The process chain has been created but execution has not started yet
`RUNNING`	The process chain is currently being executed
`PAUSED`	The execution of the process chain is paused
`CANCELLED`	The execution of the process chain was cancelled
`SUCCESS`	The process chain was executed successfully
`ERROR`	The execution of the process chain failed
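Client code often needs to decide whether it is still worth polling a process chain. The sketch below models the statuses from the table as an enumeration and groups `CANCELLED`, `SUCCESS`, and `ERROR` as finished. This grouping and the helper `is_finished` are assumptions made for illustration. In particular, a process chain that has failed but will be retried is expected to show up as `PAUSED` rather than `ERROR` (see `autoResumeAfter`).

```python
from enum import Enum


class ProcessChainStatus(Enum):
    REGISTERED = "REGISTERED"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    CANCELLED = "CANCELLED"
    SUCCESS = "SUCCESS"
    ERROR = "ERROR"


# Assumption: these statuses mean that no further execution is pending.
FINISHED_STATUSES = frozenset({
    ProcessChainStatus.CANCELLED,
    ProcessChainStatus.SUCCESS,
    ProcessChainStatus.ERROR,
})


def is_finished(status: str) -> bool:
    """Return True if the given status string denotes a finished chain."""
    # Raises ValueError for strings that are not in the table above.
    return ProcessChainStatus(status) in FINISHED_STATUSES


print(is_finished("RUNNING"))
print(is_finished("SUCCESS"))
```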
