Data models

Process chains

As described in the section on workflow scheduling, Steep transforms a workflow into one or more process chains. A process chain is a sequential list of instructions that will be sent to Steep’s remote agents to execute processing services in a distributed environment.

Some of the properties specified in the table below are only available once Steep has started executing a process chain (e.g. startTime) or after the execution has finished (e.g. endTime).

Also, the /processchains HTTP endpoint, which provides a list of process chains, omits some properties although they are marked as required in the table below (e.g. executables or totalRuns). If you want to get all required properties, you have to use the /processchains/:id HTTP endpoint.

Steep records each execution of a process chain in a separate ‘run’. The property totalRuns specifies how often a process chain has been executed (including any currently running execution). If a process chain has just been created and still has the status REGISTERED, totalRuns equals 0, but as soon as the status switches to RUNNING, a new run is created and totalRuns is incremented to 1. If, for example, a process chain fails and is later retried, another run is created and totalRuns becomes 2, and so on.
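
The bookkeeping described above can be illustrated with a small toy model. This is only a sketch of the counting rule and not Steep’s implementation; the class name ProcessChainRuns and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessChainRuns:
    """Toy model of the run counting rule (hypothetical, not Steep's code)."""
    status: str = "REGISTERED"
    runs: list = field(default_factory=list)

    @property
    def total_runs(self) -> int:
        # totalRuns includes a currently running execution
        return len(self.runs)

    def start(self) -> None:
        # Switching to RUNNING creates a new run
        self.runs.append({"runNumber": len(self.runs) + 1, "status": "RUNNING"})
        self.status = "RUNNING"

    def finish(self, status: str) -> None:
        # Record the outcome of the latest run
        self.runs[-1]["status"] = status
        self.status = status


chain = ProcessChainRuns()
print(chain.total_runs)   # 0 while the chain is still REGISTERED
chain.start()
print(chain.total_runs)   # 1 as soon as the chain is RUNNING
chain.finish("ERROR")
chain.start()             # a retry creates a second run
print(chain.total_runs)   # 2
```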

Requesting a process chain through the HTTP endpoints /processchains or /processchains/:id always renders the latest run. The properties agentId, startTime, endTime, status, errorMessage, and autoResumeAfter depend on the run that is rendered (e.g. different runs have different start times, or one run might have failed while a newer one has succeeded or is still running, so their statuses differ). If you want to list all runs of a process chain or retrieve information about a specific run, use the /processchains/:id/runs or /processchains/:id/runs/:runNumber HTTP endpoints, respectively. The property runNumber from the table below specifies which run out of totalRuns is rendered.
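
As a rough sketch, the endpoints mentioned above could be addressed from Python as follows. The base URL http://localhost:8080, the helper names, and the assumption that the endpoints answer with JSON when asked via an Accept header are all illustrative and need to be adapted to the actual installation.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: adjust to wherever your Steep instance is reachable
BASE_URL = "http://localhost:8080"


def process_chain_url(chain_id: str, base_url: str = BASE_URL) -> str:
    """URL of a single process chain (renders the latest run)."""
    return f"{base_url}/processchains/{quote(chain_id, safe='')}"


def runs_url(chain_id: str, base_url: str = BASE_URL) -> str:
    """URL listing all runs of a process chain."""
    return f"{process_chain_url(chain_id, base_url)}/runs"


def run_url(chain_id: str, run_number: int, base_url: str = BASE_URL) -> str:
    """URL of one specific run of a process chain."""
    return f"{runs_url(chain_id, base_url)}/{run_number}"


def fetch_json(url: str):
    """Perform a GET request and decode the body as JSON."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

With a running instance, fetch_json(process_chain_url("akpm646jjigral4cdyyq")) would return the latest run, while fetch_json(run_url("akpm646jjigral4cdyyq", 1)) would return the first one.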

| Property | Type | Description |
| --- | --- | --- |
| id (required) | string | Unique process chain identifier |
| executables (required) | array | A list of executable objects that describe what processing services should be called and with which arguments |
| submissionId (required) | string | The ID of the submission to which this process chain belongs |
| agentId (optional) | string | The ID of the agent that currently executes the process chain (if its status is RUNNING) or has executed it (if it is finished). May be null if the execution has not started yet. |
| startTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the process chain execution was started. May be null if the execution has not started yet. |
| endTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the process chain execution finished. May be null if the execution has not finished yet. |
| status (required) | string | The current status of the process chain |
| requiredCapabilities (optional) | array | A set of strings specifying capabilities a host system must provide to be able to execute this process chain. See also setups. |
| priority (optional) | number | A priority used during scheduling. Process chains with higher priorities will be scheduled before those with lower priorities. Negative values are allowed. The default value is 0. |
| retries (optional) | object | An optional retry policy specifying how often this process chain will be rescheduled in case an error has occurred. |
| results (optional) | object | If status is SUCCESS, this property contains the list of process chain result files grouped by their output variable ID. Otherwise, it is null. |
| totalRuns (required) | number | The number of times the process chain has been executed (including any currently running execution). |
| runNumber (optional) | number | The number of the run currently rendered. May be null if the process chain has not been executed yet. |
| autoResumeAfter (optional) | string | If the process chain’s status is PAUSED, this optional property may specify a point in time (as an ISO 8601 timestamp) after which the process chain will be automatically resumed. It is typically only set if a retry policy is configured (see retries property): if a process chain run has failed and there are still attempts left, autoResumeAfter specifies when the next attempt will be performed. |
| estimatedProgress (optional) | number | A floating point number between 0.0 (0%) and 1.0 (100%) indicating the current execution progress of this process chain. This property will only be provided if the process chain is currently being executed (i.e. if its status equals RUNNING) and if progress could actually be estimated. Note that the value is an estimate based on various factors and may not reflect the real progress. More precise values can be calculated with a progress estimator plugin. Sometimes, progress cannot be estimated at all. In this case, the value will be null. |
| errorMessage (optional) | string | If status is ERROR, this property contains a human-readable error message. Otherwise, it is null. |
Example
id: akpm646jjigral4cdyyq
submissionId: akpm6yojjigral4cdxgq
startTime: '2020-05-18T08:44:19.221456Z'
endTime: '2020-05-18T08:44:19.446437Z'
status: SUCCESS
agentId: bakwqka7gk2vrjnxdo5a
requiredCapabilities:
  - nodejs
executables:
  - id: ayj5kegaxngbglzlxibq
    path: ./countdown.js
    serviceId: countdown
    runtime: other
    arguments:
      - id: input
        type: input
        dataType: file
        variable:
          id: input_file
          value: input.txt
      - id: output
        type: output
        dataType: fileOrEmptyList
        variable:
          id: output_file
          value: output.txt
    runtimeArgs: []
results:
  output_file:
    - output.txt
totalRuns: 1
runNumber: 1
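
Since startTime and endTime are ISO 8601 timestamps that may be null, a client typically has to guard against missing values before computing how long a run took. The following is a minimal sketch using the timestamps from the example above; the helper names are hypothetical, and the parsing assumes timestamps with at most microsecond precision.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2020-05-18T08:44:19.221456Z'."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so replace it with an explicit UTC offset
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def run_duration(process_chain: dict) -> Optional[timedelta]:
    """Return the duration of the rendered run, or None if it is unknown."""
    start = process_chain.get("startTime")
    end = process_chain.get("endTime")
    if start is None or end is None:
        # Execution has not started or has not finished yet
        return None
    return parse_timestamp(end) - parse_timestamp(start)


chain = {
    "startTime": "2020-05-18T08:44:19.221456Z",
    "endTime": "2020-05-18T08:44:19.446437Z",
    "status": "SUCCESS",
}
print(run_duration(chain).total_seconds())  # 0.224981
```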

Executables

An executable is part of a process chain. It describes how a processing service should be executed and with which parameters.

| Property | Type | Description |
| --- | --- | --- |
| id (required) | string | An identifier (does not have to be unique). Typically refers to the id of the execute action from which the executable was derived. Possibly suffixed with a dollar sign $ and a number denoting the iteration of an enclosing for-each action (e.g. myaction$1) or nested for-each actions (e.g. myaction$2$1). |
| path (required) | string | The path to the binary of the service to be executed. This property is specific to the runtime. For example, for the docker and the kubernetes runtimes, this property refers to the container image. |
| serviceId (required) | string | The ID of the processing service to be executed. |
| arguments (required) | array | A list of arguments to pass to the service. May be empty. |
| runtime (required) | string | The name of the runtime that will execute the service. Built-in runtimes are currently other (for any service that is executable on the target system), docker for Docker containers, and kubernetes for Kubernetes jobs. More runtimes can be added through plugins. |
| runtimeArgs (optional) | array | A list of arguments to pass to the runtime. May be empty. |
| retries (optional) | object | An optional retry policy specifying how often this executable should be restarted in case of an error. |
| maxInactivity (optional) | object | An optional timeout policy that defines how long the executable can run without producing any output (i.e. without writing anything to the standard output and error streams) before it is automatically cancelled or aborted. |
| maxRuntime (optional) | object | An optional timeout policy that defines how long the executable can run before it is automatically cancelled or aborted, even if the service regularly writes to the standard output and error streams. |
| deadline (optional) | object | An optional timeout policy that defines how long the executable can run at all (including all retries and their associated delays) until it is cancelled or aborted. |
Example
id: ayj5kiwaxngbglzlxica
path: my_docker_image:latest
serviceId: countdown
runtime: docker
arguments:
  - id: input
    type: input
    dataType: file
    variable:
      id: input_file
      value: /data/input.txt
  - id: output
    type: output
    dataType: directory
    variable:
      id: output_file
      value: /data/output
  - id: arg1
    type: input
    dataType: boolean
    label: '--foobar'
    variable:
      id: akqcqqoedcsaoescyhga
      value: 'true'
runtimeArgs:
  - id: akqcqqoedcsaoescyhgq
    type: input
    dataType: string
    label: '-v'
    variable:
      id: data_mount
      value: /data:/data

Arguments

An argument is part of an executable.

| Property | Type | Description |
| --- | --- | --- |
| id (required) | string | An argument identifier |
| label (optional) | string | An optional label to use when the argument is passed to the service (e.g. --input). |
| variable (required) | object | A variable that holds the value of this argument. |
| type (required) | string | The type of this argument. Valid values: input, output |
| dataType (required) | string | The type of the argument value. If this property is directory, Steep will create a new directory for the service’s output and recursively search it for result files after the service has been executed. Otherwise, this property can be an arbitrary string. New data types with special handling can be added through output adapter plugins. |
Example
id: akqcqqoedcsaoescyhgq
type: input
dataType: string
label: '-v'
variable:
  id: data_mount
  value: /data:/data
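
To illustrate how a label and a variable value relate, the following sketch turns an argument object into a list of command-line tokens. It is a simplified assumption of how such a mapping could look and not the logic Steep’s runtimes actually apply; in particular, any special handling of individual data types is ignored. The function name argument_tokens is hypothetical.

```python
from typing import List


def argument_tokens(argument: dict) -> List[str]:
    """Render an argument as the optional label followed by the value."""
    tokens = []
    label = argument.get("label")
    if label:
        # The label is optional and precedes the value if present
        tokens.append(label)
    tokens.append(str(argument["variable"]["value"]))
    return tokens


argument = {
    "id": "akqcqqoedcsaoescyhgq",
    "type": "input",
    "dataType": "string",
    "label": "-v",
    "variable": {"id": "data_mount", "value": "/data:/data"},
}
print(argument_tokens(argument))  # ['-v', '/data:/data']
```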

Argument variables

An argument variable holds the value of an argument.

| Property | Type | Description |
| --- | --- | --- |
| id (required) | string | The variable’s unique identifier |
| value (required) | string | The variable’s value |
Example
id: data_mount
value: /data:/data

Process chain status

The following table shows the statuses a process chain can have:

| Status | Description |
| --- | --- |
| REGISTERED | The process chain has been created but execution has not started yet |
| RUNNING | The process chain is currently being executed |
| PAUSED | The execution of the process chain is paused |
| CANCELLED | The execution of the process chain was cancelled |
| SUCCESS | The process chain was executed successfully |
| ERROR | The execution of the process chain failed |
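
Client code that polls process chains may find it convenient to model these statuses as an enumeration. The sketch below is one possible way to do so; treating CANCELLED, SUCCESS, and ERROR as the statuses in which a run is over is an interpretation of the table above, and the names ProcessChainStatus and is_finished are hypothetical.

```python
from enum import Enum


class ProcessChainStatus(Enum):
    REGISTERED = "REGISTERED"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    CANCELLED = "CANCELLED"
    SUCCESS = "SUCCESS"
    ERROR = "ERROR"


# Assumption: in these statuses the rendered run is no longer executing
FINISHED_STATUSES = frozenset({
    ProcessChainStatus.CANCELLED,
    ProcessChainStatus.SUCCESS,
    ProcessChainStatus.ERROR,
})


def is_finished(status: str) -> bool:
    """Return True if the given status string denotes a finished run."""
    # Raises ValueError for strings that are not a known status
    return ProcessChainStatus(status) in FINISHED_STATUSES


print(is_finished("SUCCESS"))  # True
print(is_finished("PAUSED"))   # False
```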