Download and get started

Steep is available as a binary package and as a Docker image. Choose one of the following options to get started:

If you downloaded the binary package of Steep, extract the ZIP file and run the start script:

cd steep-6.4.0
bin/steep

Or, start the Docker image as follows:

docker run --name steep -d --rm -p 8080:8080 \
    -e STEEP_HTTP_HOST=0.0.0.0 steep/steep:6.4.0

After a few seconds, you can access Steep’s web interface at http://localhost:8080/.

We will now submit a simple workflow to test if Steep is running correctly. The workflow consists of a single execute action that sleeps for 10 seconds and then quits. Execute the following command:

curl -X POST http://localhost:8080/workflows -d 'api: 4.5.0
actions:
  - type: execute
    service: sleep
    inputs:
      - id: seconds
        value: 10'

The command will return the ID of the submitted workflow. You can monitor the execution in the web interface or by issuing the following command:

curl http://localhost:8080/workflows/<workflow-id>

Replace <workflow-id> with the returned ID.

Congratulations! You successfully installed Steep and ran your first workflow.

Documentation

In this section, we describe the individual features of Steep. The documentation always applies to the latest software version.

Table of contents
  1. How does Steep work?
    1. Workflow scheduling
    2. Software architecture
    3. Processing services
  2. Example workflows
    1. Running two services in parallel
    2. Chaining two services
    3. Splitting and joining results
    4. Processing a dynamic number of results in parallel
    5. Feeding results back into the workflow (cycles/loops)
  3. Data models
    1. Workflows
    2. Variables
    3. Actions
      1. Execute actions
      2. For-each actions
      3. Parameters
      4. Output parameters
    4. Process chains
      1. Process chain status
    5. Executables
      1. Arguments
      2. Argument variables
    6. Submissions
      1. Submission status
    7. Service metadata
      1. Runtime environments
      2. Service parameters
      3. Runtime arguments
    8. Retry policies
    9. Timeout policies
    10. Creation policies
    11. Durations
    12. Agents
    13. VMs
      1. VM status
    14. Setups
      1. Volumes
    15. Pool agent parameters
  4. Full-text search
    1. Query language
    2. Search examples
    3. Search results
    4. Matches
  5. HTTP endpoints
    1. GET information
    2. GET health
    3. GET submissions
    4. GET submission by ID
    5. PUT submission
    6. POST workflow
    7. GET process chains
    8. GET process chain by ID
    9. GET process chain logs
    10. HEAD process chain logs
    11. PUT process chain
    12. GET agents
    13. GET agent by ID
    14. GET VMs
    15. GET VM by ID
    16. GET plugins
    17. GET plugin by name
    18. GET services
    19. GET service by ID
    20. GET search
    21. GET Prometheus metrics
  6. Web-based user interface
  7. Configuration
    1. steep.yaml
      1. General configuration
      2. Cluster settings
      3. HTTP configuration
      4. Controller configuration
      5. Scheduler configuration
      6. Agent configuration
      7. Runtime settings
      8. Database connection
      9. Cloud connection
      10. OpenStack cloud driver
      11. SSH connection to VMs
      12. Log configuration
      13. Garbage collector configuration
      14. Cache configuration
    2. setups.yaml
    3. services/services.yaml
    4. plugins/common.yaml
    5. Advanced configuration topics
      1. Provisioning scripts
        1. Global variables
        2. Environment variables
        3. Configuration values
        4. Read local files
        5. Upload local files to remote
      2. Using YAML anchors
      3. Using a template engine
  8. Extending Steep through plugins
    1. Initializers
    2. Output adapters
    3. Process chain adapters
    4. Process chain consistency checkers
    5. Custom runtime environments
    6. Progress estimators
    7. Parameter injection

1 How does Steep work?

In order to answer this question, we will first describe how Steep transforms scientific workflow graphs into executable units. After that, we will have a look at Steep’s software architecture and what kind of processing services it can execute.

(This section is based on the following publication: Krämer, M. (2020). Capability-Based Scheduling of Scientific Workflows in the Cloud. Proceedings of the 9th International Conference on Data Science, Technology, and Applications DATA, 43–54. https://doi.org/10.5220/0009805400430054)

1.1 Workflow scheduling

Steep is a scientific workflow management system that can be used to control the processing of very large data sets in a distributed environment.

A scientific workflow is typically represented by a directed acyclic graph that describes how an input data set is processed by certain tasks in a given order to produce a desired outcome. Such workflows can become very large, with hundreds to several thousand tasks processing data volumes ranging from gigabytes to terabytes. The following figure shows a simple example of such a workflow in an extended Petri Net notation proposed by van der Aalst and van Hee (2004).

[Figure: example workflow graph with tasks A, B, C, D, and E]

In this example, an input file is first processed by a task A. This task produces two results. The first one is processed by task B whose result is in turn sent to C. The second result of A is processed by D. The outcomes of C and D are finally processed by task E.

In order to be able to schedule such a workflow in a distributed environment, the graph has to be transformed to individual executable units. Steep follows a hybrid scheduling approach that applies heuristics on the level of the workflow graph and later on the level of individual executable units. We assume that tasks that access the same data should be executed on the same machine to reduce the communication overhead and to improve file reuse. We therefore group tasks into so-called process chains, which are linear sequential lists (without branches and loops).

Steep transforms workflows into process chains in an iterative manner. In each iteration, it finds the longest linear sequences of tasks and groups them into process chains. The following animation shows how this works for our example workflow:

[Animation: tasks A to E are grouped into four process chains over three iterations]

Task A will be put into a process chain in iteration 1. Steep then schedules the execution of this process chain. After the execution has finished, Steep uses the results to produce a process chain containing B and C and another one containing D. These process chains are then scheduled to be executed in parallel. The results are finally used to generate the fourth process chain containing task E, which is also scheduled for execution.
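To make this mapping concrete, the example graph could be written as the following workflow sketch. The service IDs a to e and their parameter IDs are hypothetical and would need corresponding service metadata; the shared variables model the edges of the graph. Since b reads resultA1 and c reads the output of b, both end up in the same process chain, while d only depends on resultA2 and forms a chain of its own.

api: 4.5.0
actions:
  - type: execute
    service: a
    inputs:
      - id: input_file
        value: input.txt
    outputs:
      - id: output_file1
        var: resultA1
      - id: output_file2
        var: resultA2
  - type: execute
    service: b
    inputs:
      - id: input_file
        var: resultA1
    outputs:
      - id: output_file
        var: resultB
  - type: execute
    service: c
    inputs:
      - id: input_file
        var: resultB
    outputs:
      - id: output_file
        var: resultC
  - type: execute
    service: d
    inputs:
      - id: input_file
        var: resultA2
    outputs:
      - id: output_file
        var: resultD
  - type: execute
    service: e
    inputs:
      - id: input_file1
        var: resultC
      - id: input_file2
        var: resultD
    outputs:
      - id: output_file
        var: resultE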

1.2 Software architecture

The following figure shows the main components of Steep: the HTTP server, the controller, the scheduler, the agent, and the cloud manager.

Together, these components form an instance of Steep. In practice, a single instance typically runs on a separate virtual machine, but multiple instances can also be started on the same machine. Each component can be enabled or disabled in a given instance (see the configuration options for more information). That means, in a cluster, there can be instances that have all five components enabled, and others that have only an agent, for example.

All components of all instances communicate with each other through messages sent over an event bus. Further, the HTTP server, the controller, and the scheduler are able to connect to a shared database.

The HTTP server provides information about scheduled, running, and finished workflows to clients. Clients can also upload a new workflow. In this case, the HTTP server puts the workflow into the database and sends a message to one of the instances of the controller.

The controller receives this message, loads the workflow from the database, and starts transforming it iteratively to process chains as described above. Whenever it has generated new process chains, it puts them into the database and sends a message to all instances of the scheduler.

The schedulers then select agents to execute the process chains. They load the process chains from the database, send them via the event bus to the selected agents for execution, and finally write the results into the database. The schedulers also send a message back to the controller so it can continue with the next iteration and generate more process chains until the workflow has been completely transformed.

In case a scheduler does not find an agent suitable for the execution of a process chain, it sends a message to the cloud manager (a component that interacts with the API of the Cloud infrastructure) and asks it to create a new agent.

1.3 Processing services

Steep is very flexible and allows a wide range of processing services (or microservices) to be integrated. A typical processing service is a program that reads one or more input files and writes one or more output files. The program may also accept generic parameters. The service can be implemented in any programming language (as long as the binary or script is executable on the machine the Steep agent runs on) or can be wrapped in a Docker container.

For a seamless integration, a processing service should adhere to the following guidelines:

  • Every processing service should be a microservice. It should run in its own process and serve one specific purpose.
  • As Steep needs to call the service in a distributed environment, it should not have a graphical user interface or require any human interaction at runtime. Suitable services are command-line applications that accept arguments to specify input files, output files, and parameters.
  • The service should read from input files, process the data, write results to output files, and then exit. It should not run continuously like a web service. If you need to integrate a web service in your workflow, we recommend using the curl command or something similar.
  • Steep does not require the processing services to implement a specific interface. Instead, the service’s input and output parameters should be described in a special data model called service metadata.
  • According to common conventions for exit codes, a processing service should return 0 (zero) upon successful execution and a non-zero number if an error occurred (e.g. 1, 2, 128, or 255).
  • In order to ensure deterministic workflow executions, services should be stateless and idempotent. This means that every execution of a service with the same input data and the same set of parameters should produce the same result.

2 Example workflows

In this section, we describe example workflows covering patterns we regularly see in real-world use cases. For each workflow, we also provide the required service metadata. For more information about the workflow model and service metadata, please read the section on data models.

2.1 Running two services in parallel

This example workflow consists of two execute actions that each copy a file. Since the two actions do not depend on each other (i.e. they do not share any variables), Steep converts them to two independent process chains and executes them in parallel (as long as there are at least two agents available).

The two execute actions refer to the copy service. The service metadata of copy defines that this processing service has an input parameter input_file and an output parameter output_file, both of which must be specified exactly once (cardinality equals 1..1).

The input parameters are given (example1.txt and example2.txt). The output parameters refer to the variables outputFile1 and outputFile2 and have no value. Steep will create unique values (output file names) for them during the workflow execution.

Workflow:
YAML:
api: 4.5.0
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        value: example1.txt
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        value: example2.txt
    outputs:
      - id: output_file
        var: outputFile2
JSON:
{
  "api": "4.5.0",
  "actions": [{
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "value": "example1.txt"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile1"
    }]
  }, {
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "value": "example2.txt"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile2"
    }]
  }]
}
Service metadata:
YAML:
- id: copy
  name: Copy
  description: Copy files
  path: cp
  runtime: other
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      dataType: file
    - id: output_file
      name: Output file name
      description: Output file name
      type: output
      cardinality: 1..1
      dataType: file
JSON:
[{
  "id": "copy",
  "name": "Copy",
  "description": "Copy files",
  "path": "cp",
  "runtime": "other",
  "parameters": [{
    "id": "input_file",
    "name": "Input file name",
    "description": "Input file name",
    "type": "input",
    "cardinality": "1..1",
    "dataType": "file"
  }, {
    "id": "output_file",
    "name": "Output file name",
    "description": "Output file name",
    "type": "output",
    "cardinality": "1..1",
    "dataType": "file"
  }]
}]

2.2 Chaining two services

The following example workflow makes a copy of a file and then a copy of the copy (i.e. the file is copied and the result is copied again). The workflow contains two actions that share the same variable: outputFile1 is used as the output of the first action and as the input of the second action. Steep executes them in sequence.

The service metadata for this workflow is the same as for the previous one.

Workflow:
YAML:
api: 4.5.0
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        value: example.txt
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile2
JSON:
{
  "api": "4.5.0",
  "actions": [{
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "value": "example.txt"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile1"
    }]
  }, {
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "outputFile1"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile2"
    }]
  }]
}

2.3 Splitting and joining results

This example starts with an action that copies a file. Two other actions then run in parallel and make copies of the result of the first action. A final action then joins these copies into a single file. The workflow has a split-and-join pattern because the graph is split into two branches after the first action. These branches are then joined into a single one by the final action.

Workflow:
YAML:
api: 4.5.0
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        value: example.txt
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile2
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile3
  - type: execute
    service: join
    inputs:
      - id: i
        var: outputFile2
      - id: i
        var: outputFile3
    outputs:
      - id: o
        var: outputFile4
JSON:
{
  "api": "4.5.0",
  "actions": [{
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "value": "example.txt"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile1"
    }]
  }, {
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "outputFile1"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile2"
    }]
  }, {
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "outputFile1"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile3"
    }]
  }, {
    "type": "execute",
    "service": "join",
    "inputs": [{
      "id": "i",
      "var": "outputFile2"
    }, {
      "id": "i",
      "var": "outputFile3"
    }],
    "outputs": [{
      "id": "o",
      "var": "outputFile4"
    }]
  }]
}
Service metadata:
YAML:
- id: copy
  name: Copy
  description: Copy files
  path: cp
  runtime: other
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      dataType: file
    - id: output_file
      name: Output file name
      description: Output file name
      type: output
      cardinality: 1..1
      dataType: file
- id: join
  name: Join
  description: Merge one or more files into one
  path: join.sh
  runtime: other
  parameters:
    - id: i
      name: Input files
      description: One or more input files to merge
      type: input
      cardinality: 1..n
      dataType: file
    - id: o
      name: Output file
      description: The output file
      type: output
      cardinality: 1..1
      dataType: file
JSON:
[{
  "id": "copy",
  "name": "Copy",
  "description": "Copy files",
  "path": "cp",
  "runtime": "other",
  "parameters": [{
    "id": "input_file",
    "name": "Input file name",
    "description": "Input file name",
    "type": "input",
    "cardinality": "1..1",
    "dataType": "file"
  }, {
    "id": "output_file",
    "name": "Output file name",
    "description": "Output file name",
    "type": "output",
    "cardinality": "1..1",
    "dataType": "file"
  }]
}, {
  "id": "join",
  "name": "Join",
  "description": "Merge one or more files into one",
  "path": "join.sh",
  "runtime": "other",
  "parameters": [{
    "id": "i",
    "name": "Input files",
    "description": "One or more input files to merge",
    "type": "input",
    "cardinality": "1..n",
    "dataType": "file"
  }, {
    "id": "o",
    "name": "Output file",
    "description": "The output file",
    "type": "output",
    "cardinality": "1..1",
    "dataType": "file"
  }]
}]

2.4 Processing a dynamic number of results in parallel

This example demonstrates how to process the results of an action in parallel even if the number of result files is unknown during the design of the workflow. The workflow starts with an action that splits an input file example.txt into multiple files (e.g. one file per line) stored in a directory outputDirectory. A for-each action then iterates over these files and creates copies. The for-each action has an iterator i that serves as the input for the individual instances of the copy service. The output files (outputFile1) of this service are collected via the yieldToOutput property in a variable called copies. The final join service merges these copies into a single file outputFile2.

Workflow:
YAML:
api: 4.5.0
actions:
  - type: execute
    service: split
    inputs:
      - id: file
        value: example.txt
      - id: lines
        value: 1
    outputs:
      - id: output_directory
        var: outputDirectory
  - type: for
    input: outputDirectory
    enumerator: i
    output: copies
    actions:
      - type: execute
        service: copy
        inputs:
          - id: input_file
            var: i
        outputs:
          - id: output_file
            var: outputFile1
    yieldToOutput: outputFile1
  - type: execute
    service: join
    inputs:
      - id: i
        var: copies
    outputs:
      - id: o
        var: outputFile2
JSON:
{
  "api": "4.5.0",
  "actions": [{
    "type": "execute",
    "service": "split",
    "inputs": [{
      "id": "file",
      "value": "example.txt"
    }, {
      "id": "lines",
      "value": 1
    }],
    "outputs": [{
      "id": "output_directory",
      "var": "outputDirectory"
    }]
  }, {
    "type": "for",
    "input": "outputDirectory",
    "enumerator": "i",
    "output": "copies",
    "actions": [{
      "type": "execute",
      "service": "copy",
      "inputs": [{
        "id": "input_file",
        "var": "i"
      }],
      "outputs": [{
        "id": "output_file",
        "var": "outputFile1"
      }]
    }],
    "yieldToOutput": "outputFile1"
  }, {
    "type": "execute",
    "service": "join",
    "inputs": [{
      "id": "i",
      "var": "copies"
    }],
    "outputs": [{
      "id": "o",
      "var": "outputFile2"
    }]
  }]
}
Service metadata:
YAML:
- id: split
  name: Split
  description: Split a file into pieces
  path: split
  runtime: other
  parameters:
    - id: lines
      name: Number of lines per file
      description: Create smaller files n lines in length
      type: input
      cardinality: 0..1
      dataType: integer
      label: '-l'
    - id: file
      name: Input file
      description: The input file to split
      type: input
      cardinality: 1..1
      dataType: file
    - id: output_directory
      name: Output directory
      description: The output directory
      type: output
      cardinality: 1..1
      dataType: directory
      fileSuffix: /
- id: copy
  name: Copy
  description: Copy files
  path: cp
  runtime: other
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      dataType: file
    - id: output_file
      name: Output file name
      description: Output file name
      type: output
      cardinality: 1..1
      dataType: file
- id: join
  name: Join
  description: Merge one or more files into one
  path: join.sh
  runtime: other
  parameters:
    - id: i
      name: Input files
      description: One or more input files to merge
      type: input
      cardinality: 1..n
      dataType: file
    - id: o
      name: Output file
      description: The output file
      type: output
      cardinality: 1..1
      dataType: file
JSON:
[{
  "id": "split",
  "name": "Split",
  "description": "Split a file into pieces",
  "path": "split",
  "runtime": "other",
  "parameters": [{
    "id": "lines",
    "name": "Number of lines per file",
    "description": "Create smaller files n lines in length",
    "type": "input",
    "cardinality": "0..1",
    "dataType": "integer",
    "label": "-l"
  }, {
    "id": "file",
    "name": "Input file",
    "description": "The input file to split",
    "type": "input",
    "cardinality": "1..1",
    "dataType": "file"
  }, {
    "id": "output_directory",
    "name": "Output directory",
    "description": "The output directory",
    "type": "output",
    "cardinality": "1..1",
    "dataType": "directory",
    "fileSuffix": "/"
  }]
}, {
  "id": "copy",
  "name": "Copy",
  "description": "Copy files",
  "path": "cp",
  "runtime": "other",
  "parameters": [{
    "id": "input_file",
    "name": "Input file name",
    "description": "Input file name",
    "type": "input",
    "cardinality": "1..1",
    "dataType": "file"
  }, {
    "id": "output_file",
    "name": "Output file name",
    "description": "Output file name",
    "type": "output",
    "cardinality": "1..1",
    "dataType": "file"
  }]
}, {
  "id": "join",
  "name": "Join",
  "description": "Merge one or more files into one",
  "path": "join.sh",
  "runtime": "other",
  "parameters": [{
    "id": "i",
    "name": "Input files",
    "description": "One or more input files to merge",
    "type": "input",
    "cardinality": "1..n",
    "dataType": "file"
  }, {
    "id": "o",
    "name": "Output file",
    "description": "The output file",
    "type": "output",
    "cardinality": "1..1",
    "dataType": "file"
  }]
}]

2.5 Feeding results back into the workflow (cycles/loops)

The following example shows how to create loops with a dynamic number of iterations. Suppose there is a processing service called countdown.js that reads a number from an input file, decreases this number by 1, and then writes the result to an output file. The service could be implemented in Node.js as follows:

#!/usr/bin/env node

const fs = require("fs").promises

async function countDown(input, output) {
  // read the current number from the input file
  let value = parseInt(await fs.readFile(input, "utf-8"))
  console.log(`Old value: ${value}`)

  value--
  if (value > 0) {
    // write the decreased number to the output file
    console.log(`New value: ${value}`)
    await fs.writeFile(output, "" + value, "utf-8")
  } else {
    // write no output file, which ends the for-each loop
    console.log("No new value")
  }
}

countDown(process.argv[2], process.argv[3])

The following workflow uses this service in a for-each action to continuously reprocess a file and decrease the number in it until it reaches 0.

In the first iteration of the for-each action, the service reads from a file called input.txt and writes to an output file with a name generated during runtime. The path of this output file is routed back into the for-each action via yieldToInput. In the second iteration, the service reads from the output file and produces another one. This process continues until the number equals 0. In this case, the service does not write an output file anymore and the workflow finishes.

Note that we use the data type fileOrEmptyList in the service metadata for the output parameter of the countdown service. This is a special data type that either returns the generated file or an empty list if the file does not exist. In the latter case, the for-each action does not have any more input values to process. Think of the input of a for-each action as a queue. If nothing is pushed into the queue and all elements have already been processed, the for-each action can finish.

Workflow:
YAML:
api: 4.5.0
vars:
  - id: input_file
    value: input.txt
actions:
  - type: for
    input: input_file
    enumerator: i
    yieldToInput: output_file
    actions:
      - type: execute
        service: countdown
        inputs:
          - id: input
            var: i
        outputs:
          - id: output
            var: output_file
JSON:
{
  "api": "4.5.0",
  "vars": [{
    "id": "input_file",
    "value": "input.txt"
  }],
  "actions": [{
    "type": "for",
    "input": "input_file",
    "enumerator": "i",
    "yieldToInput": "output_file",
    "actions": [{
      "type": "execute",
      "service": "countdown",
      "inputs": [{
        "id": "input",
        "var": "i"
      }],
      "outputs": [{
        "id": "output",
        "var": "output_file"
      }]
    }]
  }]
}
Service metadata:
YAML:
- id: countdown
  name: Count Down
  description: Read a number, subtract 1, and write the result
  path: ./countdown.js
  runtime: other
  parameters:
    - id: input
      name: Input file
      description: The input file containing the number to decrease
      type: input
      cardinality: 1..1
      dataType: file
    - id: output
      name: Output file
      description: The path to the output file
      type: output
      cardinality: 1..1
      dataType: fileOrEmptyList
JSON:
[{
  "id": "countdown",
  "name": "Count Down",
  "description": "Read a number, subtract 1, and write the result",
  "path": "./countdown.js",
  "runtime": "other",
  "parameters": [{
    "id": "input",
    "name": "Input file",
    "description": "The input file containing the number to decrease",
    "type": "input",
    "cardinality": "1..1",
    "dataType": "file"
  }, {
    "id": "output",
    "name": "Output file",
    "description": "The path to the output file",
    "type": "output",
    "cardinality": "1..1",
    "dataType": "fileOrEmptyList"
  }]
}]

3 Data models

This section contains a description of all data models used by Steep.

3.1 Workflows

The main components of the workflow model are variables and actions. Use variables to specify input files and parameters for your processing services. Variables for output files must also be declared but must not have a value. The names of output files will be generated by Steep during the runtime of the workflow.

Property | Type | Description
api (required) | string | The API (or data model) version. Should be 4.5.0.
name (optional) | string | An optional human-readable workflow name
priority (optional) | number | A priority used during scheduling. Process chains generated from workflows with higher priorities will be scheduled before those with lower priorities. Negative values are allowed. The default value is 0.
vars (optional) | array | An array of variables
actions (required) | array | An array of actions that make up the workflow
Example:

See the section on example workflows.
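Beyond those, a minimal sketch showing the optional name and priority properties (the copy service is assumed to be defined in your service metadata):

api: 4.5.0
name: My first workflow
priority: 10
vars:
  - id: myInputFile
    value: /data/input.txt
  - id: myOutputFile
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: myInputFile
    outputs:
      - id: output_file
        var: myOutputFile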

3.2 Variables

A variable holds a value for inputs and outputs of processing services. It can be defined (inputs) or undefined (outputs). Defined values are immutable. Undefined variables will be assigned a value by Steep during the runtime of a workflow.

Variables are also used to link two services together and to define the data flow in the workflow graph. For example, if the output parameter of a service A refers to a variable V, and the input parameter of service B refers to the same variable, Steep will first execute A to determine the value of V and then execute B.
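As a sketch of this paragraph (the services A and B and their parameter IDs are hypothetical), the shared variable V links the output of A to the input of B:

actions:
  - type: execute
    service: A
    outputs:
      - id: output
        var: V
  - type: execute
    service: B
    inputs:
      - id: input
        var: V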

Property | Type | Description
id (required) | string | A unique variable identifier
value (optional) | any | The variable’s value or null if the variable is undefined
Example:
YAML:
id: input_file
value: /data/input.txt
JSON:
{
  "id": "input_file",
  "value": "/data/input.txt"
}

3.3 Actions

There are two types of actions in a workflow: execute actions and for-each actions. They are differentiated by their type attribute.

3.3.1 Execute actions

An execute action instructs Steep to execute a certain service with given inputs and outputs.

Property | Type | Description
id (optional) | string | An optional string uniquely identifying the action within the workflow. If not given, a random identifier will be generated.
type (required) | string | The type of the action. Must be "execute".
service (required) | string | The ID of the service to execute
inputs (optional) | array | An array of input parameters
outputs (optional) | array | An array of output parameters
dependsOn (optional) | array | A list of identifiers of actions that must finish before this action is ready to be executed. Note that Steep is able to identify dependencies between actions itself based on outputs and inputs, so this attribute is normally not needed. However, it may be useful if a preceding action does not have an output parameter or if the depending action does not have an input parameter. Execute actions may depend on other execute actions but also on for-each actions and vice versa.
retries (optional) | object | An optional retry policy specifying how often this action should be retried in case of an error. Overrides any default retry policy defined in the service metadata.
maxInactivity (optional) | duration or object | An optional duration or timeout policy that defines how long the execution of the service can take without producing any output (i.e. without writing anything to the standard output and error streams) before it is automatically cancelled or aborted. Can be combined with maxRuntime and deadline (see below). Overrides any default inactivity timeout defined in the service metadata. Note that a service cancelled due to inactivity is still subject to any configured retry policy, which means its execution may be retried even if one attempt timed out. If you want to cancel a long-running service immediately even if there is a retry policy configured, use a deadline.
maxRuntime (optional) | duration or object | An optional duration or timeout policy that defines how long the execution of the service can take before it is automatically cancelled or aborted, even if the service regularly writes to the standard output and error streams. Can be combined with maxInactivity (see above) and deadline (see below). Overrides any default maximum runtime defined in the service metadata. Note that a service cancelled because it ran too long is still subject to any configured retry policy, which means its execution may be retried even if one attempt timed out. If you want to cancel a long-running service immediately even if there is a retry policy configured, use a deadline.
deadline (optional) | duration or object | An optional duration or timeout policy that defines how long the execution of the service can take at all (including all retries and their associated delays) until it is cancelled or aborted. Can be combined with maxInactivity and maxRuntime (see above). Overrides any default deadline defined in the service metadata.
Example:
YAML:
type: execute
service: my_service
inputs:
  - id: verbose
    var: is_verbose
  - id: resolution
    value: 10
  - id: input_file
    var: my_input_file
outputs:
  - id: output_file
    var: my_output_file
    store: true
JSON:
{
  "type": "execute",
  "service": "my_service",
  "inputs": [{
    "id": "verbose",
    "var": "is_verbose"
  }, {
    "id": "resolution",
    "value": 10
  }, {
    "id": "input_file",
    "var": "my_input_file"
  }],
  "outputs": [{
    "id": "output_file",
    "var": "my_output_file",
    "store": true
  }]
}
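The attributes dependsOn, retries, maxInactivity, maxRuntime, and deadline are not shown above. As a hypothetical sketch (the action ID copy_files and the service cleanup are assumed), an execute action that explicitly waits for another action and overrides the service’s defaults could look like this:

type: execute
service: cleanup
dependsOn:
  - copy_files
retries:
  maxAttempts: 3
  delay: 10s
maxInactivity: 5m
deadline:
  timeout: 2h
  errorOnTimeout: true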
3.3.2 For-each actions

A for-each action has an input, a list of sub-actions, and an output. It clones the sub-actions as many times as there are items in its input, executes the actions, and then collects the results in the output.

Although the action is called ‘for-each’, the execution order of the sub-actions is undefined (i.e. the execution is non-sequential and non-deterministic). Instead, Steep always tries to execute as many sub-actions as possible in parallel.

For-each actions may contain execute actions but also nested for-each actions.

Property | Type | Description
id (optional) | string | An optional string uniquely identifying the action within the workflow. If not given, a random identifier will be generated.
type (required) | string | The type of the action. Must be "for".
input (required) | string | The ID of a variable containing the items to which to apply the sub-actions
enumerator (required) | string | The ID of a variable that holds the current value from input for each iteration
output (optional) | string | The ID of a variable that will collect output values from all iterations (see yieldToOutput)
dependsOn (optional) | array | A list of identifiers of actions that must finish before this action is ready to be executed. Note that Steep is able to identify dependencies between actions itself based on outputs and inputs, so this attribute is normally not needed. However, it may be useful if a preceding action does not have an output parameter or if the depending action does not have an input parameter. For-each actions may depend on execute actions but also on other for-each actions and vice versa.
actions (optional) | array | An array of sub-actions to execute in each iteration
yieldToOutput (optional) | string | The ID of a sub-action’s output variable whose value should be appended to the for-each action’s output
yieldToInput (optional) | string | The ID of a sub-action’s output variable whose value should be appended to the for-each action’s input to generate further iterations
Example:
YAML:
type: for
input: all_input_files
output: all_output_files
enumerator: i
yieldToOutput: output_file
actions:
  - type: execute
    service: copy
    inputs:
      - id: input
        var: i
    outputs:
      - id: output
        var: output_file
JSON:
{
  "type": "for",
  "input": "all_input_files",
  "output": "all_output_files",
  "enumerator": "i",
  "yieldToOutput": "output_file",
  "actions": [{
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input",
      "var": "i"
    }],
    "outputs": [{
      "id": "output",
      "var": "output_file"
    }]
  }]
}
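As mentioned above, for-each actions can also be nested. The following sketch is hypothetical: it assumes a service list_files whose directory-typed output expands to the files it contains (similar to split in the example workflows), so that the inner for-each can iterate over them and the outer action can collect all copies via yieldToOutput:

type: for
input: all_directories
enumerator: d
output: all_copies
yieldToOutput: copies_in_d
actions:
  - type: execute
    service: list_files
    inputs:
      - id: directory
        var: d
    outputs:
      - id: files
        var: files_in_d
  - type: for
    input: files_in_d
    enumerator: f
    output: copies_in_d
    yieldToOutput: output_file
    actions:
      - type: execute
        service: copy
        inputs:
          - id: input_file
            var: f
        outputs:
          - id: output_file
            var: output_file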
3.3.3 Parameters

This data model represents inputs and generic parameters of execute actions.

Property | Type | Description
id (required) | string | The ID of the parameter as defined in the service metadata
var (optional) | string | The ID of a variable that holds the value for this parameter (required if value is not given)
value (optional) | any | The parameter value (required if var is not given)

Note: Either var or value must be given but not both!

Example:
YAML:
id: input
var: i
JSON:
{
  "id": "input",
  "var": "i"
}
3.3.4 Output parameters

Output parameters of execute actions have additional properties compared to inputs.

Property | Type | Description
id (required) | string | The ID of the parameter as defined in the service metadata
var (required) | string | The ID of a variable to which Steep will assign the generated name of the output file. This variable can then be used, for example, as an input parameter of a subsequent action.
prefix (optional) | string | An optional string to prepend to the generated name of the output file. For example, if Steep generates the name "name123abc" and the prefix is "my/dir/", the output filename will be "my/dir/name123abc". Note that the prefix must end with a slash if you want to create a directory. The output filename will be relative to the configured temporary directory or output directory (depending on the store property). You may even specify an absolute path: if the generated name is "name456fgh" and the prefix is "/absolute/dir/", the output filename will be "/absolute/dir/name456fgh".
store (optional) | boolean | If this property is true, Steep will generate an output filename that is relative to the configured output directory instead of the temporary directory. The default value is false.
Example:
YAML:
id: output
var: o
prefix: some_directory/
store: false
JSON:
{
  "id": "output",
  "var": "o",
  "prefix": "some_directory/",
  "store": false
}
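For comparison, a sketch of an output parameter whose file is persisted below the configured output directory (the subdirectory name results/ is arbitrary):

id: output
var: o
prefix: results/
store: true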

3.4 Process chains

As described above, Steep transforms a workflow to one or more process chains. A process chain is a sequential list of instructions that will be sent to Steep’s remote agents to execute processing services in a distributed environment.

Property | Type | Description
id (required) | string | Unique process chain identifier
executables (required) | array | A list of executable objects that describe what processing services should be called and with which arguments
submissionId (required) | string | The ID of the submission to which this process chain belongs
startTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the process chain execution was started. May be null if the execution has not started yet.
endTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the process chain execution finished. May be null if the execution has not finished yet.
status (required) | string | The current status of the process chain
requiredCapabilities (optional) | array | A set of strings specifying capabilities a host system must provide to be able to execute this process chain. See also setups.
priority (optional) | number | A priority used during scheduling. Process chains with higher priorities will be scheduled before those with lower priorities. Negative values are allowed. The default value is 0.
results (optional) | object | If status is SUCCESS, this property contains the list of process chain result files grouped by their output variable ID. Otherwise, it is null.
estimatedProgress (optional) | number | A floating point number between 0.0 (0%) and 1.0 (100%) indicating the current execution progress of this process chain. This property will only be provided if the process chain is currently being executed (i.e. if its status equals RUNNING) and if a progress could actually be estimated. Note that the value is an estimation based on various factors and does not have to represent the real progress. More precise values can be calculated with a progress estimator plugin. Sometimes, progress cannot be estimated at all. In this case, the value will be null.
errorMessage (optional) | string | If status is ERROR, this property contains a human-readable error message. Otherwise, it is null.
Example:
YAML:
id: akpm646jjigral4cdyyq
submissionId: akpm6yojjigral4cdxgq
startTime: '2020-05-18T08:44:19.221456Z'
endTime: '2020-05-18T08:44:19.446437Z'
status: SUCCESS
requiredCapabilities:
  - nodejs
executables:
  - id: ayj5kegaxngbglzlxibq
    path: ./countdown.js
    serviceId: countdown
    runtime: other
    arguments:
      - id: input
        type: input
        dataType: file
        variable:
          id: input_file
          value: input.txt
      - id: output
        type: output
        dataType: fileOrEmptyList
        variable:
          id: output_file
          value: output.txt
    runtimeArgs: []
results:
  output_file:
    - output.txt
JSON:
{
  "id": "akpm646jjigral4cdyyq",
  "submissionId": "akpm6yojjigral4cdxgq",
  "startTime": "2020-05-18T08:44:19.221456Z",
  "endTime": "2020-05-18T08:44:19.446437Z",
  "status": "SUCCESS",
  "requiredCapabilities": ["nodejs"],
  "executables": [{
    "id": "ayj5kegaxngbglzlxibq",
    "path": "./countdown.js",
    "serviceId": "countdown",
    "runtime": "other",
    "arguments": [{
      "id": "input",
      "type": "input",
      "dataType": "file",
      "variable": {
        "id": "input_file",
        "value": "input.txt"
      }
    }, {
      "id": "output",
      "type": "output",
      "dataType": "fileOrEmptyList",
      "variable": {
        "id": "output_file",
        "value": "output.txt"
      }
    }],
    "runtimeArgs": []
  }],
  "results": {
    "output_file": ["output.txt"]
  }
}
3.4.1 Process chain status

The following table shows the statuses a process chain can have:

Status | Description
REGISTERED | The process chain has been created but execution has not started yet
RUNNING | The process chain is currently being executed
CANCELLED | The execution of the process chain was cancelled
SUCCESS | The process chain was executed successfully
ERROR | The execution of the process chain failed

3.5 Executables

An executable is part of a process chain. It describes how a processing service should be executed and with which parameters.

Property | Type | Description
id (required) | string | An identifier (does not have to be unique). Typically refers to the id of the execute action from which the executable was derived. Possibly suffixed with a dollar sign $ and a number denoting the iteration of an enclosing for-each action (e.g. myaction$1) or nested for-each actions (e.g. myaction$2$1).
path (required) | string | The path to the binary of the service to be executed. This property is specific to the runtime. For example, for the docker runtime, this property refers to the Docker image.
serviceId (required) | string | The ID of the processing service to be executed.
arguments (required) | array | A list of arguments to pass to the service. May be empty.
runtime (required) | string | The name of the runtime that will execute the service. Built-in runtimes are currently other (for any service that is executable on the target system) and docker for Docker containers. More runtimes can be added through plugins.
runtimeArgs (optional) | array | A list of arguments to pass to the runtime. May be empty.
retries (optional) | object | An optional retry policy specifying how often this executable should be restarted in case of an error.
maxInactivity (optional) | object | An optional timeout policy that defines how long the executable can run without producing any output (i.e. without writing anything to the standard output and error streams) before it is automatically cancelled or aborted.
maxRuntime (optional) | object | An optional timeout policy that defines how long the executable can run before it is automatically cancelled or aborted, even if the service regularly writes to the standard output and error streams.
deadline (optional) | object | An optional timeout policy that defines how long the executable can run at all (including all retries and their associated delays) until it is cancelled or aborted.
Example:
YAML:
id: ayj5kiwaxngbglzlxica
path: my_docker_image:latest
serviceId: countdown
runtime: docker
arguments:
  - id: input
    type: input
    dataType: file
    variable:
      id: input_file
      value: /data/input.txt
  - id: output
    type: output
    dataType: directory
    variable:
      id: output_file
      value: /data/output
  - id: arg1
    type: input
    dataType: boolean
    label: '--foobar'
    variable:
      id: akqcqqoedcsaoescyhga
      value: 'true'
runtimeArgs:
  - id: akqcqqoedcsaoescyhgq
    type: input
    dataType: string
    label: '-v'
    variable:
      id: data_mount
      value: /data:/data
JSON:
{
  "id": "ayj5kiwaxngbglzlxica",
  "path": "my_docker_image:latest",
  "serviceId": "countdown",
  "runtime": "docker",
  "arguments": [{
    "id": "input",
    "type": "input",
    "dataType": "file",
    "variable": {
      "id": "input_file",
      "value": "/data/input.txt"
    }
  }, {
    "id": "output",
    "type": "output",
    "dataType": "directory",
    "variable": {
      "id": "output_file",
      "value": "/data/output"
    }
  }, {
    "id": "arg1",
    "type": "input",
    "dataType": "boolean",
    "label": "--foobar",
    "variable": {
      "id": "akqcqqoedcsaoescyhga",
      "value": "true"
    }
  }],
  "runtimeArgs": [{
    "id": "akqcqqoedcsaoescyhgq",
    "type": "input",
    "dataType": "string",
    "label": "-v",
    "variable": {
      "id": "data_mount",
      "value": "/data:/data"
    }
  }]
}
3.5.1 Arguments

An argument is part of an executable.

Property | Type | Description
id (required) | string | An argument identifier
label (optional) | string | An optional label to use when the argument is passed to the service (e.g. --input).
variable (required) | object | A variable that holds the value of this argument.
type (required) | string | The type of this argument. Valid values: input, output
dataType (required) | string | The type of the argument value. If this property is directory, Steep will create a new directory for the service’s output and recursively search it for result files after the service has been executed. Otherwise, this property can be an arbitrary string. New data types with special handling can be added through output adapter plugins.
Example:
YAML:
id: akqcqqoedcsaoescyhgq
type: input
dataType: string
label: '-v'
variable:
  id: data_mount
  value: /data:/data
JSON:
{
  "id": "akqcqqoedcsaoescyhgq",
  "type": "input",
  "dataType": "string",
  "label": "-v",
  "variable": {
    "id": "data_mount",
    "value": "/data:/data"
  }
}
3.5.2 Argument variables

An argument variable holds the value of an argument.

Property | Type | Description
id (required) | string | The variable’s unique identifier
value (required) | string | The variable’s value
Example:
YAML:
id: data_mount
value: /data:/data
JSON:
{
  "id": "data_mount",
  "value": "/data:/data"
}

3.6 Submissions

A submission is created when you submit a workflow through the /workflows endpoint. It contains information about the workflow execution such as the start and end time as well as the current status.

Property | Type | Description
id (required) | string | Unique submission identifier
name (optional) | string | An optional human-readable submission name. The value is derived from the submitted workflow’s name if it has one.
workflow (required) | object | The submitted workflow
priority (optional) | number | A priority used during scheduling. Process chains generated from submissions with higher priorities will be scheduled before those with lower priorities. Negative values are allowed. The value is derived from the submitted workflow but can be overridden. The submission’s priority always takes precedence over the workflow’s priority when generating process chains.
startTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the workflow execution was started. May be null if the execution has not started yet.
endTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the workflow execution finished. May be null if the execution has not finished yet.
status (required) | string | The current status of the submission
requiredCapabilities (required) | array | A set of strings specifying capabilities a host system must provide to be able to execute this workflow. See also setups.
source (optional) | string | The original, unaltered workflow source as it was submitted through the /workflows endpoint. May be null if the workflow was not submitted through the endpoint or if the source is unavailable.
runningProcessChains (required) | number | The number of process chains currently being executed
cancelledProcessChains (required) | number | The number of process chains that have been cancelled
succeededProcessChains (required) | number | The number of process chains that have finished successfully
failedProcessChains (required) | number | The number of process chains whose execution has failed
totalProcessChains (required) | number | The current total number of process chains in this submission. May increase during execution when new process chains are generated.
results (optional) | object | If status is SUCCESS or PARTIAL_SUCCESS, this property contains the list of workflow result files grouped by their output variable ID. Otherwise, it is null.
errorMessage (optional) | string | If status is ERROR, this property contains a human-readable error message. Otherwise, it is null.
Example:
YAML:
id: aiq7eios7ubxglkcqx5a
workflow:
  api: 4.5.0
  vars:
    - id: myInputFile
      value: /data/input.txt
    - id: myOutputFile
  actions:
    - type: execute
      service: cp
      inputs:
        - id: input_file
          var: myInputFile
      outputs:
        - id: output_file
          var: myOutputFile
          store: true
startTime: '2020-02-13T15:38:58.719382Z'
endTime: '2020-02-13T15:39:00.807715Z'
status: SUCCESS
runningProcessChains: 0
cancelledProcessChains: 0
succeededProcessChains: 1
failedProcessChains: 0
totalProcessChains: 1
results:
  myOutputFile:
    - /data/out/aiq7eios7ubxglkcqx5a/aiq7hygs7ubxglkcrf5a
JSON:
{
  "id": "aiq7eios7ubxglkcqx5a",
  "workflow": {
    "api": "4.5.0",
    "vars": [{
      "id": "myInputFile",
      "value": "/data/input.txt"
    }, {
      "id": "myOutputFile"
    }],
    "actions": [{
      "type": "execute",
      "service": "cp",
      "inputs": [{
        "id": "input_file",
        "var": "myInputFile"
      }],
      "outputs": [{
        "id": "output_file",
        "var": "myOutputFile",
        "store": true
      }]
    }]
  },
  "startTime": "2020-02-13T15:38:58.719382Z",
  "endTime": "2020-02-13T15:39:00.807715Z",
  "status": "SUCCESS",
  "runningProcessChains": 0,
  "cancelledProcessChains": 0,
  "succeededProcessChains": 1,
  "failedProcessChains": 0,
  "totalProcessChains": 1,
  "results": {
    "myOutputFile": [
      "/data/out/aiq7eios7ubxglkcqx5a/aiq7hygs7ubxglkcrf5a"
    ]
  }
}
3.6.1 Submission status

The following table shows the statuses a submission can have:

Status | Description
ACCEPTED | The submission has been accepted by Steep but execution has not started yet
RUNNING | The submission is currently being executed
CANCELLED | The submission was cancelled
SUCCESS | The execution of the submission finished successfully
PARTIAL_SUCCESS | The submission was executed completely but one or more process chains failed
ERROR | The execution of the submission failed

3.8 Retry policies

A retry policy specifies how often the execution of a workflow action should be retried in case of an error. Retry policies can be specified per service in the service metadata or per executable action in the workflow.

Property | Type | Description
maxAttempts (optional) | number | The maximum number of attempts to perform. This includes the initial attempt. For example, a value of 3 means 1 initial attempt and 2 retries. The default value is 1. A value of -1 means an unlimited (infinite) number of attempts. 0 means there will be no attempt at all (the service or action will be skipped).
delay (optional) | duration | The amount of time that should pass between two attempts. The default is 0, which means the operation will be retried immediately.
exponentialBackoff (optional) | number | A factor for an exponential backoff (see description below)
maxDelay (optional) | duration | The maximum amount of time that should pass between two attempts. Only applies if exponentialBackoff is larger than 1. By default, there is no upper limit.
Exponential backoff:

The exponential backoff factor can be used to gradually increase the delay. The actual delay between two attempts will be calculated as follows:

actualDelay = min(delay * pow(exponentialBackoff, nAttempt - 1), maxDelay)

For example, if delay equals 1s, exponentialBackoff equals 2, and maxDelay equals 10s, the following actual delays will apply:

  • Delay after attempt 1:

    min(1s * pow(2, 0), 10s) = 1s

  • Delay after attempt 2:

    min(1s * pow(2, 1), 10s) = 2s

  • Delay after attempt 3:

    min(1s * pow(2, 2), 10s) = 4s

  • Delay after attempt 4:

    min(1s * pow(2, 3), 10s) = 8s

  • Delay after attempt 5:

    min(1s * pow(2, 4), 10s) = 10s

  • Delay after attempt 6:

    min(1s * pow(2, 5), 10s) = 10s

The default value of exponentialBackoff is 1, which means there is no backoff and the actual delay always equals the specified one.

Example:
YAML:
maxAttempts: 5
delay: 1s
exponentialBackoff: 2
maxDelay: 10s
JSON:
{
  "maxAttempts": 5,
  "delay": "1s",
  "exponentialBackoff": 2,
  "maxDelay": "10s"
}

3.9 Timeout policies

A timeout policy defines how long a service or an executable may run before it is automatically cancelled or aborted (with an error). Timeout policies can be specified with the maxInactivity, maxRuntime and deadline attributes, either per service in the service metadata or per executable action in the workflow.

A timeout policy is either a string or an object. If it is a string, it represents a duration specifying a maximum amount of time until the execution is cancelled.

If specified as an object, the timeout policy has the following properties:

Property | Type | Description
timeout (required) | duration | The maximum amount of time that may pass until the execution is cancelled or aborted.
errorOnTimeout (optional) | boolean | true if an execution that is aborted due to a timeout should lead to an error (i.e. if the process chain’s status should be set to ERROR). false if it should just be cancelled (process chain status CANCELLED). By default, the execution will be cancelled.

Multiple timeout policies can be combined. For example, a service may be cancelled after 5 minutes of inactivity and aborted with an error if its total execution takes longer than 1 hour.

Example:
YAML:
maxInactivity: 5m
deadline:
  timeout: 1h
  errorOnTimeout: true
JSON:
{
  "maxInactivity": "5m",
  "deadline": {
    "timeout": "1h",
    "errorOnTimeout": true
  }
}

3.10 Creation policies

A creation policy defines rules for creating VMs from a certain setup.

Property | Type | Description
retries (optional) | object | An optional retry policy that specifies how many attempts should be made to create a VM (if creation fails) as well as possible (exponential) delays between those attempts. If this property is null, default values from the steep.yaml file will be used.
lockAfterRetries (optional) | duration | When the maximum number of attempts to create a VM from a certain setup has been reached, the setup will be locked and no other VM with this setup will be created. This property defines how long it will stay locked.
Example:
YAML:
retries:
  maxAttempts: 5
  delay: 40s
  exponentialBackoff: 2
lockAfterRetries: 20m
JSON:
{
  "retries": {
    "maxAttempts": 5,
    "delay": "40s",
    "exponentialBackoff": 2
  },
  "lockAfterRetries": "20m"
}

3.11 Durations

A duration consists of one or more number/unit pairs possibly separated by whitespace characters. Supported units are:

  • milliseconds, millisecond, millis, milli, ms
  • seconds, second, secs, sec, s
  • minutes, minute, mins, min, m
  • hours, hour, hrs, hr, h
  • days, day, d

Numbers must be positive integers. The default unit is milliseconds.

Examples:
1000ms
3 secs
5m
20mins
10h 30 minutes
1 hour 10minutes 5s
1d 5h
10 days 1hrs 30m 15 secs

3.12 Agents

An agent represents an instance of Steep that can execute process chains.

Property | Type | Description
id (required) | string | A unique agent identifier
available (required) | boolean | true if the agent is currently idle and new process chains can be assigned to it, false if it is busy executing a process chain
capabilities (required) | array | A set of strings specifying capabilities the agent provides. See also setups.
startTime (required) | string | An ISO 8601 timestamp denoting the date and time when the agent was started.
stateChangedTime (required) | string | An ISO 8601 timestamp denoting the date and time when the value of the available property changed from true to false (i.e. when the agent became busy) or from false to true (when it became available)
processChainId (optional) | string | The ID of the process chain currently allocated to this agent. May be null if no process chain has been allocated to the agent (i.e. if it is currently available).
Example:
YAML
id: akuxryerojbw7mnvovaa
available: true
capabilities:
  - docker
startTime: '2020-05-26T10:50:02.001998Z'
stateChangedTime: '2020-05-26T11:06:52.367121Z'

JSON
{
  "id": "akuxryerojbw7mnvovaa",
  "available": true,
  "capabilities": ["docker"],
  "startTime": "2020-05-26T10:50:02.001998Z",
  "stateChangedTime": "2020-05-26T11:06:52.367121Z"
}

3.13 VMs

This data model describes virtual machines created by Steep’s cloud manager.

id (string, required): A unique VM identifier
externalId (string, optional): An identifier generated by the cloud platform
ipAddress (string, optional): The VM’s IP address
setup (object, required): The setup used to create this VM
creationTime (string, optional): An ISO 8601 timestamp denoting the date and time when the VM was created. This property is null if the VM has not been created yet.
agentJoinTime (string, optional): An ISO 8601 timestamp denoting the date and time when a Steep agent has been deployed to the VM and has joined the cluster. This property is null if the agent has not joined the cluster yet.
destructionTime (string, optional): An ISO 8601 timestamp denoting the date and time when the VM was destroyed. This property is null if the VM has not been destroyed yet.
status (string, required): The status of the VM
reason (string, optional): The reason why the VM has the current status (e.g. an error message if it has the ERROR status or a simple message indicating why it has been DESTROYED)
3.13.1 VM status

The following table shows the statuses a VM can have:

CREATING: The VM is currently being created
PROVISIONING: The VM has been created and is currently being provisioned (i.e. provisioning scripts defined in the VM’s setup are being executed and the Steep agent is being deployed)
RUNNING: The VM has been created and provisioned successfully. It is currently running and registered as a remote agent.
LEFT: The remote agent on this VM has left. It will be destroyed eventually.
DESTROYING: The VM is currently being destroyed
DESTROYED: The VM has been destroyed
ERROR: The VM could not be created or provisioned, or it failed otherwise. See the VM’s reason property for more information.

3.14 Setups

A setup describes how a virtual machine (VM) should be created by Steep’s cloud manager.

id (string, required): A unique setup identifier
flavor (string, required): The flavor of the new VM
imageName (string, required): The name of the VM image to deploy
availabilityZone (string, required): The availability zone in which to create the VM
blockDeviceSizeGb (number, required): The size of the VM’s main block device in gigabytes
blockDeviceVolumeType (string, optional): An optional type of the VM’s main block device. By default, the type will be selected automatically.
minVMs (number, optional): An optional minimum number of VMs to create with this setup. The default value is 0.
maxVMs (number, required): The maximum number of VMs to create with this setup
maxCreateConcurrent (number, optional): The maximum number of VMs to create and provision concurrently. The default value is 1.
provisioningScripts (array, optional): An optional list of paths to provisioning scripts that should be executed on the VM after it has been created
providedCapabilities (array, optional): An optional set of strings specifying capabilities that VMs with this setup will have
sshUsername (string, optional): An optional username for the SSH connection to the created VM. Overrides the main configuration item steep.cloud.ssh.username if it is defined.
additionalVolumes (array, optional): An optional list of volumes that will be attached to the VM
parameters (object, optional): An optional custom object with user-defined properties. Use this object to pass arbitrary values to provisioning scripts where they can be accessed through the global variable setup.
creation (object, optional): An optional creation policy that defines rules for creating VMs from this setup. Default values for this parameter are defined in the main configuration.
Example:
YAML
id: default
flavor: 7d217779-4d7b-4689-8a40-c12a377b946d
imageName: Ubuntu 18.04
availabilityZone: nova
blockDeviceSizeGb: 50
minVMs: 0
maxVMs: 4
provisioningScripts:
  - conf/setups/default/01_docker.sh
  - conf/setups/default/02_steep.sh
providedCapabilities:
  - docker
additionalVolumes:
  - sizeGb: 50
    type: SSD

JSON
{
  "id": "default",
  "flavor": "7d217779-4d7b-4689-8a40-c12a377b946d",
  "imageName": "Ubuntu 18.04",
  "availabilityZone": "nova",
  "blockDeviceSizeGb": 50,
  "minVMs": 0,
  "maxVMs": 4,
  "provisioningScripts": [
    "conf/setups/default/01_docker.sh",
    "conf/setups/default/02_steep.sh"
  ],
  "providedCapabilities": ["docker"],
  "additionalVolumes": [{
    "sizeGb": 50,
    "type": "SSD"
  }]
}
3.14.1 Volumes

This data model describes an additional volume that can be attached to a virtual machine specified by a setup.

sizeGb (number, required): The volume’s size in gigabytes
type (string, optional): The volume’s type. By default, the type will be selected automatically.
availabilityZone (string, optional): The availability zone in which to create the volume. By default, it will be created in the same availability zone as the VM to which it will be attached.
Example:
YAML
sizeGb: 50
type: SSD
availabilityZone: nova

JSON
{
  "sizeGb": 50,
  "type": "SSD",
  "availabilityZone": "nova"
}

3.15 Pool agent parameters

Steep’s cloud manager component is able to create virtual machines and to deploy remote agent instances to them. The cloud manager keeps every remote agent it creates in a pool. Use pool agent parameters to define a minimum and maximum number of instances per provided capability set.

capabilities (array, required): A set of strings specifying capabilities that a remote agent must provide so these parameters apply to it
min (number, optional): An optional minimum number of remote agents that the cloud manager should create with the given capabilities
max (number, optional): An optional maximum number of remote agents that the cloud manager is allowed to create with the given capabilities
Example:
YAML
capabilities:
  - docker
  - python
min: 1
max: 5

JSON
{
  "capabilities": ["docker", "python"],
  "min": 1,
  "max": 5
}

5 HTTP endpoints

The main way to communicate with Steep (i.e. to submit workflows, monitor progress, fetch metadata, etc.) is through its HTTP interface. In this section, we describe all HTTP endpoints. By default, Steep listens for incoming connections on port 8080.

5.1 GET information

Get information about Steep. This includes:

  • Steep’s version number
  • A build ID
  • A SHA of the Git commit for which the build was created
  • A timestamp of the moment when the build was created
Resource URL
/
Parameters
None
Status codes
200: The operation was successful
Example request
GET / HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 131
content-type: application/json

{
    "build": "1253",
    "commit": "cba2995e4f1b0d5982a15c680cb7a6d677d8a555",
    "timestamp": 1667547288565,
    "version": "6.4.0"
}

5.2 GET health

An endpoint that can be used by external applications at regular intervals to monitor Steep’s health, i.e. to check if it works correctly and if the underlying database as well as all remote agents are reachable.

The response is a JSON object, which will always contain the attribute health. This attribute can be either true if Steep works correctly or false if some error has occurred. Apart from that, the response may contain additional information about data stored in the database (such as the number of submissions) as well as information about remote agents. However, these attributes are not defined and may change in future versions.

Resource URL
/health
Parameters
None
Status codes
200: The operation was successful
503: Service unavailable. Steep may not work as expected.
Example request
GET /health HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 105
content-type: application/json

{
    "agents": {
        "count": 4,
        "health": true
    },
    "health": true,
    "services": {
        "count": 14,
        "health": true
    },
    "submissions": {
        "count": 1960,
        "health": true
    },
    "vms": {
        "count": 4173,
        "health": true
    }
}
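
Since the endpoint returns HTTP status 503 when something is wrong, it integrates well with standard monitoring tools. For example, a simple shell probe (a sketch; host and port match the default configuration):

curl -fsS http://localhost:8080/health > /dev/null || echo "Steep is unhealthy"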

5.3 GET submissions

Get information about all submissions in the database. The response is a JSON array consisting of submission objects without the properties workflow, results, and errorMessage. In order to get the complete details of a submission, use the GET submission by ID endpoint.

The submissions are returned in the order in which they were added to the database with the newest ones at the top.

Resource URL
/workflows
Parameters
size (optional): The maximum number of submissions to return. The default value is 10.
offset (optional): The offset of the first submission to return. The default value is 0.
status (optional): If this parameter is defined, Steep will only return submissions with the given status. Otherwise, it will return all submissions from the database. See the list of submission statuses for valid values.
Response headers
x-page-size: The size of the current page (i.e. the maximum number of submission objects returned). See size request parameter.
x-page-offset: The offset of the first submission returned. See offset request parameter.
x-page-total: The total number of submissions in the database matching the given request parameters.
Status codes
200: The operation was successful
400: One of the parameters was invalid. See response body for error message.
Example request
GET /workflows HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 662
content-type: application/json
x-page-offset: 0
x-page-size: 10
x-page-total: 2

[
  {
    "id": "akpm6yojjigral4cdxgq",
    "startTime": "2020-05-18T08:44:01.045710Z",
    "endTime": "2020-05-18T08:44:21.218425Z",
    "status": "SUCCESS",
    "requiredCapabilities": [],
    "runningProcessChains": 0,
    "cancelledProcessChains": 0,
    "succeededProcessChains": 10,
    "failedProcessChains": 0,
    "totalProcessChains": 10
  },
  {
    "id": "akttc5kv575splk3ameq",
    "startTime": "2020-05-24T17:20:37.343072Z",
    "status": "RUNNING",
    "requiredCapabilities": [],
    "runningProcessChains": 1,
    "cancelledProcessChains": 0,
    "succeededProcessChains": 391,
    "failedProcessChains": 0,
    "totalProcessChains": 1000
  }
]
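
The parameters above can be combined for paging. For example, the following request (a curl sketch against the default host and port) fetches the second page of five running submissions:

curl "http://localhost:8080/workflows?size=5&offset=5&status=RUNNING"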

5.4 GET submission by ID

Get details about a single submission from the database.

Resource URL
/workflows/:id
Parameters
id: The ID of the submission to return
Status codes
200: The operation was successful
404: The submission was not found
Example request
GET /workflows/akpm6yojjigral4cdxgq HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 348
content-type: application/json

{
  "id": "akpm6yojjigral4cdxgq",
  "startTime": "2020-05-18T08:44:01.045710Z",
  "endTime": "2020-05-18T08:44:21.218425Z",
  "status": "SUCCESS",
  "requiredCapabilities": [],
  "runningProcessChains": 0,
  "cancelledProcessChains": 0,
  "succeededProcessChains": 10,
  "failedProcessChains": 0,
  "totalProcessChains": 10,
  "workflow": {
    "api": "4.5.0",
    "vars": [
      …
    ],
    "actions": [
      …
    ]
  }
}

5.5 PUT submission

Update a submission. The request body is a JSON object with the submission properties to update. At the moment, only the status and priority properties can be updated.

If the operation was successful, the response body contains the updated submission without the properties workflow, results, and errorMessage.

Notes:

  • You can use this endpoint to cancel the execution of a submission (see example below).
  • If you update a submission’s priority, the priorities of all process chains that belong to this submission and have a status of either REGISTERED or RUNNING will be updated too. Also, all process chains generated from this submission in the future will receive the updated priority. Finished process chains will not be modified.
Resource URL
/workflows/:id
Parameters
id: The ID of the submission to update
Status codes
200: The operation was successful
400: The request body was invalid
404: The submission was not found
Example request
PUT /workflows/akujvtkv575splk3saqa HTTP/1.1
Content-Length: 28
Content-Type: application/json

{
  "status": "CANCELLED"
}
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 195
content-type: application/json

{
  "id": "akujvtkv575splk3saqa",
  "startTime": "2020-05-25T19:02:21.610396Z",
  "endTime": "2020-05-25T19:02:33.414032Z",
  "status": "CANCELLED",
  "requiredCapabilities": [],
  "runningProcessChains": 0,
  "cancelledProcessChains": 314,
  "succeededProcessChains": 686,
  "failedProcessChains": 0,
  "totalProcessChains": 1000
}
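
The same endpoint can be used to change a running submission’s priority instead of cancelling it. A curl sketch (the submission ID is a placeholder; the priority value is an arbitrary example):

curl -X PUT http://localhost:8080/workflows/akujvtkv575splk3saqa \
    -H "Content-Type: application/json" \
    -d '{"priority": 10}'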

5.6 POST workflow

Create a new submission. The request body contains the workflow to execute.

If the operation was successful, the response body contains the stored submission.

Resource URL
/workflows
Status codes
202: The workflow has been accepted (i.e. stored in the database) and is scheduled for execution.
400: The posted workflow was invalid. See response body for more information.
Example request
POST /workflows HTTP/1.1
Content-Length: 231
Content-Type: application/json

{
  "api": "4.5.0",
  "vars": [{
    "id": "sleep_seconds",
    "value": 3
  }],
  "actions": [{
    "type": "execute",
    "service": "sleep",
    "inputs": [{
      "id": "seconds",
      "var": "sleep_seconds"
    }]
  }]
}
Example response
HTTP/1.1 202 Accepted
content-encoding: gzip
content-length: 374
content-type: application/json

{
  "id": "akukkcsv575splk3v2ma",
  "status": "ACCEPTED",
  "workflow": {
    "api": "4.5.0",
    "vars": [{
      "id": "sleep_seconds",
      "value": 3
    }],
    "actions": [{
      "type": "execute",
      "service": "sleep",
      "inputs": [],
      "outputs": [],
      "inputs": [{
        "id": "seconds",
        "var": "sleep_seconds"
      }]
    }]
  }
}

5.7 GET process chains

Get information about all process chains in the database. The response is a JSON array consisting of process chain objects without the properties executables and results. In order to get the complete details of a process chain, use the GET process chain by ID endpoint.

The process chains are returned in the order in which they were added to the database with the newest ones at the top.

Resource URL
/processchains
Parameters
size (optional): The maximum number of process chains to return. The default value is 10.
offset (optional): The offset of the first process chain to return. The default value is 0.
submissionId (optional): If this parameter is defined, Steep will only return process chains from the submission with the given ID. Otherwise, it will return process chains from all submissions. If there is no submission with the given ID, the result will be an empty array.
status (optional): If this parameter is defined, Steep will only return process chains with the given status. Otherwise, it will return all process chains from the database. See the list of process chain statuses for valid values.
Response headers
x-page-size: The size of the current page (i.e. the maximum number of process chain objects returned). See size request parameter.
x-page-offset: The offset of the first process chain returned. See offset request parameter.
x-page-total: The total number of process chains in the database matching the given request parameters.
Status codes
200: The operation was successful
400: One of the parameters was invalid. See response body for error message.
Example request
GET /processchains HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 262
content-type: application/json
x-page-offset: 0
x-page-size: 10
x-page-total: 7026

[
  {
    "id": "akukkcsv575splk3v2na",
    "submissionId": "akukkcsv575splk3v2ma",
    "startTime": "2020-05-25T19:46:02.532829Z",
    "endTime": "2020-05-25T19:46:05.546807Z",
    "status": "SUCCESS",
    "requiredCapabilities": []
  },
  {
    "id": "akujvtkv575splk3tppq",
    "submissionId": "akujvtkv575splk3saqa",
    "status": "CANCELLED",
    "requiredCapabilities": []
  },
  …
]

5.8 GET process chain by ID

Get details about a single process chain from the database.

Resource URL
/processchains/:id
Parameters
id: The ID of the process chain to return
Status codes
200: The operation was successful
404: The process chain was not found
Example request
GET /processchains/akukkcsv575splk3v2na HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 535
content-type: application/json

{
  "id": "akukkcsv575splk3v2na",
  "submissionId": "akukkcsv575splk3v2ma",
  "startTime": "2020-05-25T19:46:02.532829Z",
  "endTime": "2020-05-25T19:46:05.546807Z",
  "status": "SUCCESS",
  "requiredCapabilities": [],
  "results": {},
  "executables": [{
    "id": "ayj5kboaxngbglzlxiba",
    "path": "sleep",
    "serviceId": "sleep",
    "runtime": "other",
    "runtimeArgs": [],
    "arguments": [{
      "id": "seconds",
      "type": "input",
      "dataType": "integer",
      "variable": {
        "id": "sleep_seconds",
        "value": "3"
      }
    }]
  }]
}

5.9 GET process chain logs

Get contents of the log file of a process chain.

This endpoint will only return something if process chain logging is enabled in the log configuration. Otherwise, the endpoint will always return HTTP status code 404.

Also note that process chain logs are stored on the machine where the Steep agent that executed the process chain is running. The log files will only be available as long as the machine exists and the agent is still available. If you want to persist log files, use the log configuration and define a location on a shared file system.

This endpoint supports HTTP range requests, which allows you to only fetch a portion of a process chain’s log file.

Resource URL
/logs/processchains/:id
Parameters
id: The ID of the process chain whose log file to return
forceDownload (optional): true if the Content-Disposition header should be set in the response. This is useful if you link to a log file from a web page and want the browser to download the file instead of displaying its contents. The default value is false.
Request headers
Range (optional): The part of the log file that should be returned (see HTTP Range header)
Response headers
Accept-Ranges: A marker to advertise support of range requests (see HTTP Accept-Ranges header)
Content-Range (optional): Indicates what part of the log file is delivered (see HTTP Content-Range header)
Status codes
200: The operation was successful
206: Partial content will be delivered in response to a range request
404: Process chain log file could not be found. Possible reasons: (1) the process chain has not produced any output (yet), (2) the agent that has executed the process chain is not available anymore, (3) process chain logging is disabled in Steep’s configuration
416: Range not satisfiable
Example request
GET /logs/processchains/akukkcsv575splk3v2na HTTP/1.1
Example response
HTTP/1.1 200 OK
content-type: text/plain
accept-ranges: bytes
content-encoding: gzip
transfer-encoding: chunked

2021-03-04 15:34:17 - Hello world!
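
To fetch only a portion of a log file, add a Range header to the request. A curl sketch (reusing the process chain ID from the example above):

curl -H "Range: bytes=0-1023" \
    http://localhost:8080/logs/processchains/akukkcsv575splk3v2na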

5.10 HEAD process chain logs

This endpoint can be used to check if a process chain log file exists and how large it is. For more information, please refer to the GET process chain logs endpoint.

Resource URL
/logs/processchains/:id
Parameters
id: The ID of the process chain whose log file to check
Response headers
Content-Length: The total size of the log file
Accept-Ranges: A marker to advertise support of range requests (see HTTP Accept-Ranges header)
Status codes
200: The operation was successful
404: Process chain log file could not be found. Possible reasons: (1) the process chain has not produced any output (yet), (2) the agent that has executed the process chain is not available anymore, (3) process chain logging is disabled in Steep’s configuration
Example request
HEAD /logs/processchains/akukkcsv575splk3v2na HTTP/1.1
Example response
HTTP/1.1 200 OK
content-type: text/plain
content-length: 1291
accept-ranges: bytes

5.11 PUT process chain

Update a process chain. The request body is a JSON object with the process chain properties to update. At the moment, only the status and priority properties can be updated.

If the operation was successful, the response body contains the updated process chain without the properties executables and results.

Notes:

  • You can use this endpoint to cancel the execution of a process chain (see example below).
  • The priority can only be modified if the process chain status is REGISTERED or RUNNING. Otherwise, the operation will result in HTTP error 422.
Resource URL
/processchains/:id
Parameters
id: The ID of the process chain to update
Status codes
200: The operation was successful
400: The request body was invalid
404: The process chain was not found
422: The process chain’s priority could not be modified because the process chain is already finished, i.e. the process chain status is not REGISTERED or RUNNING
Example request
PUT /processchains/akuxzp4rojbw7mnvovcq HTTP/1.1
Content-Length: 28
Content-Type: application/json

{
  "status": "CANCELLED"
}
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 222
content-type: application/json

{
  "id": "akuxzp4rojbw7mnvovcq",
  "submissionId": "akuxzp4rojbw7mnvovbq",
  "startTime": "2020-05-26T11:06:24.055225Z",
  "endTime": "2020-05-26T11:06:52.367194Z",
  "status": "CANCELLED",
  "requiredCapabilities": []
}

5.12 GET agents

Get information about all agents currently connected to the cluster. In order to get details about a single agent, use the GET agent by ID endpoint.

Resource URL
/agents
Parameters
None
Status codes
200: The operation was successful
Example request
GET /agents HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 195
content-type: application/json

[
  {
    "id": "akuxryerojbw7mnvovaa",
    "available": false,
    "capabilities": [],
    "startTime": "2020-05-26T10:50:02.001998Z",
    "stateChangedTime": "2020-05-26T11:06:52.367121Z"
  },
  {
    "id": "akvn7r3szw5wiztrnotq",
    "available": true,
    "capabilities": [],
    "startTime": "2020-05-27T12:21:24.548640Z",
    "stateChangedTime": "2020-05-27T12:21:24.548640Z"
  }
]

5.13 GET agent by ID

Get details about a single agent.

Resource URL
/agents/:id
Parameters
id: The ID of the agent to return
Status codes
200: The operation was successful
404: The agent was not found
Example request
GET /agents/akuxryerojbw7mnvovaa HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 177
content-type: application/json

{
  "id": "akuxryerojbw7mnvovaa",
  "available": false,
  "capabilities": [],
  "startTime": "2020-05-26T10:50:02.001998Z",
  "stateChangedTime": "2020-05-26T11:06:52.367121Z"
}

5.14 GET VMs

Get information about all VMs in the database. To get details about a single VM, use the GET VM by ID endpoint.

The VMs are returned in the order in which they were added to the database with the newest ones at the top.

Resource URL
/vms
Parameters
size (optional): The maximum number of VMs to return. The default value is 10.
offset (optional): The offset of the first VM to return. The default value is 0.
status (optional): If this parameter is defined, Steep will only return VMs with the given status. Otherwise, it will return all VMs from the database. See the list of VM statuses for valid values.
Response headers
x-page-size: The size of the current page (i.e. the maximum number of VM objects returned). See size request parameter.
x-page-offset: The offset of the first VM returned. See offset request parameter.
x-page-total: The total number of VMs in the database matching the given request parameters.
Status codes
200: The operation was successful
400: One of the parameters was invalid. See response body for error message.
Example request
GET /vms HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 2402
content-type: application/json
x-page-offset: 0
x-page-size: 10
x-page-total: 614

[
  {
    "id": "akvn5rmvrozqzj5k3n7a",
    "externalId": "cc6bb115-5852-4646-87c0-d61a9e275722",
    "ipAddress": "10.90.5.10",
    "creationTime": "2020-05-27T12:17:01.861596Z",
    "agentJoinTime": "2020-05-27T12:18:27.957192Z",
    "status": "LEFT",
    "setup": {
      "id": "default",
      "flavor": "7d217779-4d7b-4689-8a40-c12a377b946d",
      "imageName": "Ubuntu 18.04",
      "availabilityZone": "nova",
      "blockDeviceSizeGb": 50,
      "minVMs": 0,
      "maxVMs": 4,
      "provisioningScripts": [
        "conf/setups/default/01_docker.sh",
        "conf/setups/default/02_steep.sh"
      ],
      "providedCapabilities": ["docker"]
    }
  },
  {
    "id": "akvnmkuvrozqzj5k3mza",
    "externalId": "f9ecfb9c-d0a1-45c9-87fc-3595bebc85c6",
    "ipAddress": "10.90.5.24",
    "creationTime": "2020-05-27T11:40:19.142991Z",
    "agentJoinTime": "2020-05-27T11:41:42.349354Z",
    "destructionTime": "2020-05-27T11:50:58.961455Z",
    "status": "DESTROYED",
    "reason": "Agent has left the cluster",
    "setup": {
      "id": "default",
      "flavor": "7d217779-4d7b-4689-8a40-c12a377b946d",
      "imageName": "Ubuntu 18.04",
      "availabilityZone": "nova",
      "blockDeviceSizeGb": 50,
      "minVMs": 0,
      "maxVMs": 4,
      "provisioningScripts": [
        "conf/setups/default/01_docker.sh",
        "conf/setups/default/02_steep.sh"
      ],
      "providedCapabilities": ["docker"]
    }
  },
  …
]

5.15 GET VM by ID

Get details about a single VM from the database.

Resource URL
/vms/:id
Parameters
id: The ID of the VM to return
Status codes
200: The operation was successful
404: The VM was not found
Example request
GET /vms/akvn5rmvrozqzj5k3n7a HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 617
content-type: application/json

{
  "id": "akvn5rmvrozqzj5k3n7a",
  "externalId": "cc6bb115-5852-4646-87c0-d61a9e275722",
  "ipAddress": "10.90.5.10",
  "creationTime": "2020-05-27T12:17:01.861596Z",
  "agentJoinTime": "2020-05-27T12:18:27.957192Z",
  "status": "LEFT",
  "setup": {
    "id": "default",
    "flavor": "7d217779-4d7b-4689-8a40-c12a377b946d",
    "imageName": "Ubuntu 18.04",
    "availabilityZone": "nova",
    "blockDeviceSizeGb": 50,
    "minVMs": 0,
    "maxVMs": 4,
    "provisioningScripts": [
      "conf/setups/default/01_docker.sh",
      "conf/setups/default/02_steep.sh"
    ],
    "providedCapabilities": ["docker"]
  }
}

5.16 GET plugins

Get information about all configured plugins. To get information about a single plugin, use the GET plugin by name endpoint.

Resource URL
/plugins
Parameters
None
Status codes
200: The operation was successful
Example request
GET /plugins HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 159
content-type: application/json

[
    {
        "type": "outputAdapter",
        "name": "fileOrEmptyList",
        "version": "1.0.0",
        "scriptFile": "conf/plugins/fileOrEmptyList.kt",
        "supportedDataType": "fileOrEmptyList"
    }
]

5.17 GET plugin by name

Get information about a single plugin.

Resource URL
/plugins/:name
Parameters
name: The name of the plugin to return
Status codes
200: The operation was successful
404: The plugin was not found
Example request
GET /plugins/fileOrEmptyList HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 136
content-type: application/json

{
    "type": "outputAdapter",
    "name": "fileOrEmptyList",
    "version": "1.0.0",
    "scriptFile": "conf/plugins/fileOrEmptyList.kt",
    "supportedDataType": "fileOrEmptyList"
}

5.18 GET services

Get information about all configured service metadata. To get metadata of a single service, use the GET service by ID endpoint.

Resource URL
/services
Parameters
None
Status codes
200: The operation was successful
Example request
GET /services HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 2167
content-type: application/json

[
  {
    "id": "cp",
    "name": "cp",
    "description": "Copies files",
    "path": "cp",
    "runtime": "other",
    "parameters": [{
      "id": "no_overwrite",
      "name": "No overwrite",
      "description": "Do not overwrite existing file",
      "type": "input",
      "cardinality": "1..1",
      "dataType": "boolean",
      "default": false,
      "label": "-n"
    }, {
      "id": "input_file",
      "name": "Input file name",
      "description": "Input file name",
      "type": "input",
      "cardinality": "1..1",
      "dataType": "file"
    }, {
      "id": "output_file",
      "name": "Output file name",
      "description": "Output file name",
      "type": "output",
      "cardinality": "1..1",
      "dataType": "file"
    }],
    "runtimeArgs": [],
    "requiredCapabilities": []
  }, {
    "id": "sleep",
    "name": "sleep",
    "description": "sleeps for the given amount of seconds",
    "path": "sleep",
    "runtime": "other",
    "parameters": [{
      "id": "seconds",
      "name": "seconds to sleep",
      "description": "The number of seconds to sleep",
      "type": "input",
      "cardinality": "1..1",
      "dataType": "integer"
    }],
    "runtimeArgs": [],
    "requiredCapabilities": []
  },
  …
]

5.19 GET service by ID

Get configured metadata of a single service.

Resource URL
/services/:id
Parameters
id: The ID of the service to return
Status codes
200: The operation was successful
404: The service metadata was not found
Example request
GET /services/sleep HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 401
content-type: application/json

{
  "id": "sleep",
  "name": "sleep",
  "description": "sleeps for the given amount of seconds",
  "path": "sleep",
  "runtime": "other",
  "parameters": [{
    "id": "seconds",
    "name": "seconds to sleep",
    "description": "The number of seconds to sleep",
    "type": "input",
    "cardinality": "1..1",
    "dataType": "integer"
  }],
  "runtimeArgs": [],
  "requiredCapabilities": []
}

5.21 GET Prometheus metrics

Steep can provide metrics to Prometheus. Besides statistics about the Java Virtual Machine that Steep is running in, the following metrics are included:

steep_remote_agents: The number of registered remote agents
steep_controller_process_chains: The number of generated process chains the controller is currently waiting for, grouped by submissionId
steep_scheduler_process_chains: The number of process chains with a given status (indicated by the label status)
steep_local_agent_retries: The total number of times an executable with a given service ID (indicated by the label serviceId) had to be retried
steep_eventbus_compressedjson_messages: Total number of compressed JSON messages sent/received over the event bus (label operation is either sent or recv)
steep_eventbus_compressedjson_bytes: Total number of compressed JSON bytes. Label operation is either sent or recv, and label stage is either compressed or uncompressed. Uncompressed bytes will be compressed before they are sent over the event bus, and received compressed bytes will be uncompressed.
steep_eventbus_compressedjson_time: Total number of milliseconds spent compressing or decompressing JSON (label operation is either sent or recv and specifies if sent bytes have been compressed or received bytes have been uncompressed)
Resource URL
/metrics
Parameters
None
Status codes
200: The operation was successful
Example request
GET /metrics HTTP/1.1
Example response
HTTP/1.1 200 OK
content-type: text/plain
content-encoding: gzip
content-length: 1674

# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 2.1695392E8
jvm_memory_bytes_used{area="nonheap",} 1.46509968E8
…
# HELP steep_remote_agents Number of registered remote agents
# TYPE steep_remote_agents gauge
steep_remote_agents 1.0
# HELP steep_scheduler_process_chains Number of process chains with a given status
# TYPE steep_scheduler_process_chains gauge
steep_scheduler_process_chains{status="SUCCESS",} 2.0
steep_scheduler_process_chains{status="RUNNING",} 0.0
steep_scheduler_process_chains{status="ERROR",} 1.0
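
To collect these metrics, point a Prometheus server at the /metrics endpoint. A minimal scrape configuration might look like this (a sketch; the target assumes Steep’s default host and port):

scrape_configs:
  - job_name: 'steep'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['localhost:8080']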

6 Web-based user interface

Steep has a web-based user interface that allows you to monitor the execution of running workflows, process chains, agents, and VMs, as well as to browse the database contents.

Start Steep and visit any of the above-mentioned HTTP endpoints with your web browser to open the user interface.

7 Configuration

After you have downloaded and extracted Steep, you can find its configuration files under the conf directory. The following sections describe each of these files in detail.

7.1 steep.yaml

The file steep.yaml contains the main configuration of Steep. In this section, we describe all configuration keys and values you can set.

Note that keys are specified using the dot notation. You can use them as they are given here or use YAML notation instead. For example, the following configuration item

steep.cluster.eventBus.publicPort: 41187

is identical to:

steep:
  cluster:
    eventBus:
      publicPort: 41187

You may override items in your configuration file with environment variables. This is particularly useful if you are using Steep inside a Docker container. The environment variables use a slightly different naming scheme. All variables are in capital letters and dots are replaced by underscores. For example, the configuration key steep.http.host becomes STEEP_HTTP_HOST and steep.cluster.eventBus.publicPort becomes STEEP_CLUSTER_EVENTBUS_PUBLICPORT. You may use YAML syntax to specify environment variable values. For example, the array steep.agent.capabilities can be specified as follows:

STEEP_AGENT_CAPABILITIES=["docker", "python"]
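
When starting Steep’s Docker image, such variables can be passed with -e flags. A sketch (adjust the image tag and values to your deployment):

docker run --name steep -d --rm -p 8080:8080 \
    -e STEEP_HTTP_HOST=0.0.0.0 \
    -e 'STEEP_AGENT_CAPABILITIES=["docker", "python"]' \
    steep/steep:6.4.0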
7.1.1 General configuration

steep.tmpPath

The path to a directory where temporary files should be stored during processing. Steep generates names for the outputs of execute actions in a workflow. If the store flag of an output parameter is false (which is the default), the generated filename will be relative to this temporary directory.

steep.outPath

The path to a directory where output files should be stored. This path will be used instead of steep.tmpPath to generate a filename for an output parameter if its store flag is true.

steep.overrideConfigFile

The path to a file that keeps additional configuration. The values of the overrideConfigFile will be merged into the main configuration file, so it basically overrides the default values. Note that configuration items in this file can still be overridden with environment variables. This configuration item is useful if you don’t want to change the main configuration file (or if you cannot do so) but still want to set different configuration values. Use it if you run Steep in a Docker container and bind mount the overrideConfigFile as a volume.

steep.services

The path to the configuration files containing service metadata. Either a string pointing to a single file, a glob pattern (e.g. **/*.yaml), or an array of files or glob patterns.

steep.plugins

The path to the configuration file(s) containing plugin descriptors. Either a string pointing to a single file, a glob pattern (e.g. **/*.yaml), or an array of files or glob patterns.

7.1.2 Cluster settings

Use these configuration items to build up a cluster of Steep instances. Under the hood, Steep uses Vert.x and Hazelcast, so these configuration items are very similar to the ones found in these two frameworks. To build up a cluster, you need to configure an event bus connection and a cluster connection. They should use different ports. host typically refers to the machine your instance is running on and publicHost or publicAddress specify the hostname or IP address that your Steep instance will use in your network to advertise itself so that other instances can connect to it.

For more information, please read the documentation of Vert.x and Hazelcast.
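
For illustration, a minimal TCP-based configuration for one instance of a hypothetical two-node cluster might look like this (all IP addresses and ports are placeholders to adapt to your network; the individual keys are described below):

steep:
  cluster:
    eventBus:
      host: 10.0.0.1
      publicHost: 10.0.0.1
      port: 41187
      publicPort: 41187
    hazelcast:
      port: 5701
      tcpEnabled: true
      members:
        - 10.0.0.2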

steep.cluster.eventBus.host

The IP address (or hostname) to bind the clustered eventbus to

Default: Automatically detected local network interface

steep.cluster.eventBus.port

The port the clustered eventbus should listen on

Default: A random port

steep.cluster.eventBus.publicHost

The IP address (or hostname) the eventbus uses to announce itself within the cluster

Default: Same as steep.cluster.eventBus.host

steep.cluster.eventBus.publicPort

The port that the eventbus uses to announce itself within the cluster

Default: Same as steep.cluster.eventBus.port

steep.cluster.hazelcast.clusterName

An optional cluster name that can be used to separate clusters of Steep instances. Two instances from different clusters (with different names) cannot connect to each other.

By default, no cluster name is set, which means all instances can connect to each other. However, a Steep instance without a cluster name cannot connect to a named cluster.

Heads up: if you have a cluster name set and you’re using a cloud connection to deploy remote agents on demand, make sure these Steep instances use the same cluster name. Otherwise, you won’t be able to connect to them.

steep.cluster.hazelcast.publicAddress

The IP address (or hostname) and port Hazelcast uses to announce itself within the cluster

steep.cluster.hazelcast.port

The port that Hazelcast should listen on

steep.cluster.hazelcast.interfaces

A list of IP address patterns specifying valid interfaces Hazelcast should bind to

steep.cluster.hazelcast.members

A list of IP addresses (or hostnames) of Hazelcast cluster members

steep.cluster.hazelcast.tcpEnabled

true if Hazelcast should use TCP to connect to other instances, false if it should use multicast

Default: false

steep.cluster.hazelcast.restoreMembersOnStartup.enabled

true if Steep should try to load IP addresses of possibly still running VMs from its database during startup and add them to steep.cluster.hazelcast.members. This is useful if a Steep instance has crashed and should be reintegrated into an existing cluster when it’s back.

Default: false

steep.cluster.hazelcast.restoreMembersOnStartup.defaultPort

If steep.cluster.hazelcast.restoreMembersOnStartup.enabled is true, potential Hazelcast cluster members will be restored from the database. This configuration item specifies on which Hazelcast port these members are listening.

steep.cluster.hazelcast.placementGroupName

An optional name specifying in which group this Hazelcast member should be placed. Steep uses distributed maps to share data between instances. Data in these maps is partitioned (i.e. distributed to the individual cluster members). In a large cluster, no member keeps all the data. Most nodes only keep a small fraction of the data (a partition).

To make sure data is not lost if a member goes down, Hazelcast uses backups to distribute copies of the data across the cluster. By specifying a placement group, you can control how Hazelcast distributes these backups. Hazelcast will always prefer creating backups in a group that does not own the data so that if all members of a group go down, the other group still has all the backup data.

Examples for sensible groups are racks, data centers, or availability zones.

Note that if you configure a placement group name, all members in your cluster must also have a placement group name. Otherwise, you will receive an exception about mismatching configuration on startup.

steep.cluster.hazelcast.liteMember

true if this instance should be a Hazelcast lite member. Lite members do not own any in-memory data. They are mainly used for compute-intensive tasks. With regard to Steep, an instance with a controller and a scheduler should not be a lite member, because these components heavily rely on internal state. A Steep instance that only contains an agent and therefore only executes services, however, could be a lite member. See the architecture section for more information about these components.

Your cluster cannot consist of only lite members. Otherwise, it is not able to maintain internal state at all.

Note that since lite members cannot keep data, they are not suitable to keep backups either. See steep.cluster.hazelcast.placementGroupName for more information. For reasons of reliability, a cluster should contain at least three full (i.e. non-lite) members.

steep.cluster.lookupOrphansInterval

The interval at which Steep’s main thread looks for orphaned entries in its internal remote agent registry (specified as a duration). Such entries may (very rarely) happen if there is a network failure during deregistration of an agent. You normally do not have to change this configuration.

Default: 5m

7.1.3 HTTP configuration

steep.http.enabled

true if the HTTP interface should be enabled

Default: true

steep.http.host

The host to bind the HTTP server to

Default: localhost

steep.http.port

The port the HTTP server should listen on

Default: 8080

steep.http.postMaxSize

The maximum size of HTTP POST bodies in bytes

Default: 1048576 (1 MB)

steep.http.basePath

The path where the HTTP endpoints and the web-based user interface should be mounted

Default: "" (empty string, i.e. no base path)

steep.http.cors.enable

true if Cross-Origin Resource Sharing (CORS) should be enabled

Default: false

steep.http.cors.allowOrigin

A regular expression specifying allowed CORS origins. Use * to allow all origins.

Default: "$." (match nothing by default)

steep.http.cors.allowCredentials

true if the Access-Control-Allow-Credentials response header should be returned.

Default: false

steep.http.cors.allowHeaders

A string or an array indicating which header field names can be used in a request.

steep.http.cors.allowMethods

A string or an array indicating which HTTP methods can be used in a request.

steep.http.cors.exposeHeaders

A string or an array indicating which headers are safe to expose to the API of a CORS API specification.

steep.http.cors.maxAgeSeconds

The number of seconds the results of a preflight request can be cached in a preflight result cache.
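
Putting several of these items together, a configuration that enables CORS for a single web origin could look like this (a sketch; the origin is a placeholder):

steep:
  http:
    cors:
      enable: true
      allowOrigin: "https://example\\.com"
      allowMethods:
        - GET
        - POST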

7.1.4 Controller configuration

steep.controller.enabled

true if the controller should be enabled. Set this value to false if your Steep instance does not have access to the shared database.

Default: true

steep.controller.lookupInterval

The interval at which the controller looks for accepted submissions, specified as a duration.

Default: 2s

steep.controller.lookupMaxErrors

The maximum number of consecutive errors (e.g. database connection issues) to tolerate when looking up the status of process chains of a running submission. If there are more errors, the submission will be aborted.

Default: 5

steep.controller.lookupOrphansInterval

The interval at which the controller looks for orphaned running submissions (i.e. submissions that are in the status RUNNING but that are currently not being processed by any controller instance), specified as a duration. If Steep finds such a submission, it will try to resume it.

Default: 5m

steep.controller.lookupOrphansInitialDelay

The time the controller should wait after startup before it looks for orphaned running submissions for the first time (specified as a duration). This property is useful if you want to implement a rolling update from one Steep instance to another.

Default: 0s

7.1.5 Scheduler configuration

steep.scheduler.enabled

true if the scheduler should be enabled. Set this value to false if your Steep instance does not have access to the shared database.

Default: true

steep.scheduler.lookupInterval

The interval at which the scheduler looks for registered process chains, specified as a duration.

Default: 20s

steep.scheduler.lookupOrphansInterval

The interval at which the scheduler looks for orphaned running process chains (i.e. process chains that are in the status RUNNING but that are currently not being processed by any scheduler instance), specified as a duration. Note that the scheduler also always looks for orphaned process chains when it detects that another scheduler instance has just left the cluster (regardless of the configured interval).

Default: 5m

steep.scheduler.lookupOrphansInitialDelay

The time the scheduler should wait after startup before it looks for orphaned running process chains for the first time (specified as a duration). This property is useful if you want to implement a rolling update from one Steep instance to another. Note that the scheduler also looks for orphaned process chains when another scheduler instance has just left the cluster, even if the initial delay has not yet passed.

Default: 0s

7.1.6 Agent configuration

steep.agent.enabled

true if this Steep instance should be able to execute process chains (i.e. if one or more agents should be deployed)

Default: true

steep.agent.instances

The number of agents that should be deployed within this Steep instance (i.e. how many executables the Steep instance can run in parallel)

Default: 1

steep.agent.id

Unique identifier for the first agent instance deployed. Subsequent agent instances will have a consecutive number appended to their IDs.

Default: (an automatically generated unique ID)

steep.agent.capabilities

List of capabilities that the agents provide

Default: [] (empty list)

steep.agent.autoShutdownTimeout

The time any agent instance can remain idle until Steep shuts itself down gracefully (specified as a duration). By default, this value is 0s, which means Steep never shuts itself down.

Default: 0s

steep.agent.busyTimeout

The time that should pass before an idle agent decides that it is not busy anymore (specified as a duration). Normally, the scheduler allocates an agent, sends it a process chain, and then deallocates it after the process chain execution has finished. This value is important if the scheduler crashes while the process chain is being executed and does not deallocate the agent anymore. In this case, the agent deallocates itself after the configured time has passed.

Default: 1m

steep.agent.outputLinesToCollect

The number of output lines to collect at most from each executed service (also applies to error output)

Default: 100

7.1.7 Runtime settings

steep.runtimes.docker.env

Additional environment variables that will be passed to containers created by the Docker runtime

Example: ["key=value", "foo=bar"]

Default: [] (empty list)

steep.runtimes.docker.volumes

Additional volume mounts to be passed to the Docker runtime

Example: ["/data:/data"]

Default: [] (empty list)

7.1.8 Database connection

steep.db.driver

The database driver

Valid values: inmemory, postgresql, mongodb

Default: inmemory

steep.db.url

The database URL

steep.db.username

The database username (only used by the postgresql driver)

steep.db.password

The database password (only used by the postgresql driver)

steep.db.connectionPool.maxSize

The maximum number of connections to keep open (i.e. to keep in the connection pool)

steep.db.connectionPool.maxIdleTime

The maximum time an idle connection should be kept in the connection pool before it is closed
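
For example, a PostgreSQL connection could be configured as follows (a sketch; the URL format, credentials, and pool values are illustrative assumptions to adapt to your environment):

steep:
  db:
    driver: postgresql
    url: jdbc:postgresql://localhost:5432/steep
    username: steep
    password: secret
    connectionPool:
      maxSize: 5
      maxIdleTime: 1m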

7.1.9 Cloud connection

steep.cloud.enabled

true if Steep should connect to a cloud to acquire remote agents on demand

Default: false

steep.cloud.driver

Defines which cloud driver to use

Valid values: openstack (see the OpenStack cloud driver for more information)

steep.cloud.createdByTag

A metadata tag that should be attached to virtual machines to indicate that they have been created by Steep

steep.cloud.syncInterval

The time that should pass before the cloud manager synchronizes its internal state with the cloud again, specified as a duration.

Default: 2m

steep.cloud.keepAliveInterval

The time that should pass before the cloud manager sends keep-alive messages to the minimum number of remote agents again (so that they do not shut down themselves), specified as a duration. See the minVMs property of the setups data model.

Default: 30s

steep.cloud.setups.file

The path to the file that describes all available setups. See setups.yaml.

steep.cloud.setups.creation.retries

A retry policy that specifies how many attempts should be made to create a VM from a certain setup (if creation fails) as well as possible (exponential) delays between those attempts.

Default:

retries:
  maxAttempts: 5
  delay: 40s
  exponentialBackoff: 2

steep.cloud.setups.lockAfterRetries

When the maximum number of attempts to create a VM from a certain setup has been reached (see steep.cloud.setups.creation.retries), the setup will be locked and no other VM with this setup will be created. This parameter defines how long it will be locked, specified as a duration.

Default: 20m

steep.cloud.timeouts.sshReady

The maximum time the cloud manager should try to log in to a new VM via SSH (specified as a duration). The cloud manager will make a login attempt every 2 seconds until it is successful or until the maximum time has passed, in which case it will destroy the VM.

Default: 5m

steep.cloud.timeouts.agentReady

The maximum time the cloud manager should wait for an agent on a new VM to become available (i.e. how long a new Steep instance may take to register with the cluster) before it destroys the VM again (specified as a duration).

Default: 5m

steep.cloud.timeouts.createVM

The maximum time that creating a VM may take before it is aborted with an error (specified as a duration).

Default: 5m

steep.cloud.timeouts.destroyVM

The maximum time that destroying a VM may take before it is aborted with an error (specified as a duration).

Default: 5m

steep.cloud.agentPool

An array of agent pool parameters describing how many remote agents the cloud manager should keep in its pool and how many it is allowed to create for each given set of capabilities.

Default: [] (empty list)

7.1.10 OpenStack cloud driver

steep.cloud.openstack.endpoint

OpenStack authentication endpoint

steep.cloud.openstack.username

OpenStack username used for authentication

steep.cloud.openstack.password

OpenStack password used for authentication

steep.cloud.openstack.domainName

OpenStack domain name used for authentication

steep.cloud.openstack.projectId

The ID of the OpenStack project to which to connect. Either this configuration item or steep.cloud.openstack.projectName must be set but not both at the same time.

steep.cloud.openstack.projectName

The name of the OpenStack project to which to connect. This configuration item will be used in combination with steep.cloud.openstack.domainName if steep.cloud.openstack.projectId is not set.

steep.cloud.openstack.networkId

The ID of the OpenStack network to attach new VMs to

steep.cloud.openstack.usePublicIp

true if new VMs should have a public IP address

Default: false

steep.cloud.openstack.securityGroups

The OpenStack security groups that should be attached to new VMs.

Default: [] (empty list)

steep.cloud.openstack.keypairName

The name of the keypair to deploy to new VMs. The keypair must already exist in OpenStack.

7.1.11 SSH connection to VMs

steep.cloud.ssh.username

Username for SSH access to VMs. Can be overridden by the sshUsername property in each setup. May even be null if all setups define their own username.

steep.cloud.ssh.privateKeyLocation

Location of a private key to use for SSH

7.1.12 Log configuration

steep.logs.level

The default log level for all loggers (console as well as file-based)

Valid values: TRACE, DEBUG, INFO, WARN, ERROR, OFF

Default: DEBUG

steep.logs.main.enabled

true if logging to the main log file should be enabled

Default: false

steep.logs.main.logFile

The name of the main log file

Default: logs/steep.log

steep.logs.main.dailyRollover.enabled

true if main log files should be renamed every day. The file name will be based on steep.logs.main.logFile and the file’s date in the form YYYY-MM-DD (e.g. steep.2020-11-19.log)

Default: true

steep.logs.main.dailyRollover.maxDays

The maximum number of days’ worth of main log files to keep

Default: 7

steep.logs.main.dailyRollover.maxSize

The total maximum size of all main log files in bytes. Oldest log files will be deleted when this size is reached.

Default: 104857600 (100 MB)

steep.logs.processChains.enabled

true if the output of process chains should be logged separately to disk. The output will still also appear on the console and in the main log file (if enabled), but there, it’s not separated by process chain. This feature is useful if you want to record the output of individual process chains and make it available through the process chain logs endpoint.

Default: false

steep.logs.processChains.path

The path where process chain logs will be stored. Individual files will be named after the ID of the corresponding process chain (e.g. aprsqz6d5f4aiwsdzbsq.log).

Default: logs/processchains

steep.logs.processChains.groupByPrefix

Set this configuration item to a value greater than 0 to group process chain log files by prefix in subdirectories under the directory configured through steep.logs.processChains.path. For example, if this configuration item is set to 3, Steep will create a separate subdirectory for all process chains whose ID starts with the same three characters. The name of this subdirectory will be these three characters. The process chains apomaokjbk3dmqovemwa and apomaokjbk3dmqovemsq will be put into a subdirectory called apo, and the process chain ao344a53oyoqwhdelmna will be put into ao3. Note that in practice, 3 is a reasonable value, which will create a new directory about every day. A value of 0 disables grouping.

Default: 0

7.1.13 Garbage collector configuration

steep.garbageCollector.enabled

true if the garbage collector should be enabled. The garbage collector runs in the background and removes outdated objects from the database at the interval specified with steep.garbageCollector.cron

Default: false

steep.garbageCollector.cron

A UNIX-like cron expression specifying the interval at which the garbage collector should be executed. Cron expressions consist of six required fields and one optional field separated by white space:

SECONDS MINUTES HOURS DAY-OF-MONTH MONTH DAY-OF-WEEK [YEAR]

Use an asterisk * to specify all values (e.g. every second or every minute). Use a question mark ? for DAY-OF-MONTH or DAY-OF-WEEK to specify no value (only one of DAY-OF-MONTH or DAY-OF-WEEK can be specified at the same time). Use a slash / to specify increments (e.g. */5 for every 5 minutes).

More information about the format can be found in the javadoc of the org.quartz.CronExpression class.

Example: 0 0 0 * * ? (daily at 12am)

steep.garbageCollector.retention.submissions

The maximum time a submission should be kept in the database after it has finished (regardless of whether it was successful or not). The time can be specified as a human-readable duration.

Default: null (submissions will be kept indefinitely)

steep.garbageCollector.retention.vms

The maximum time a VM should be kept in the database after it has been destroyed (regardless of its status). The time can be specified as a human-readable duration.

Default: null (VMs will be kept indefinitely)
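
Taken together, a garbage collector that runs every night and applies finite retention times could be configured like this (a sketch; the retention values are arbitrary examples):

steep:
  garbageCollector:
    enabled: true
    cron: '0 0 0 * * ?'
    retention:
      submissions: 90d
      vms: 30d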

7.1.14 Cache configuration

steep.cache.plugins.enabled

true if the persistent compiled plugin cache should be enabled. Steep updates this cache on startup when it has first compiled a plugin script or when it detects that a previously compiled script has changed. On subsequent startups, Steep can utilize the cache to skip compilation of known plugins and, therefore, to reduce startup time.

Default: false

steep.cache.plugins.path

The path to a directory where Steep should store compiled plugin scripts if the persistent compiled plugin cache is enabled (see steep.cache.plugins.enabled).

Default: .cache/plugins

7.2 setups.yaml

The configuration file setups.yaml contains an array of setup objects that Steep’s cloud manager component uses to create new virtual machines and to deploy remote agents to them.

7.3 services/services.yaml

The file services/services.yaml contains an array of service metadata objects describing the interfaces of all processing services Steep can execute.

7.4 plugins/common.yaml

The configuration file plugins/common.yaml describes all plugins so Steep can compile and use them during runtime. The file contains an array of descriptor objects with the properties specified in the section on extending Steep through plugins.

7.5 Advanced configuration topics

7.5.1 Provisioning scripts

After Steep’s cloud manager has created a VM, it uploads the provisioning scripts defined in the corresponding setup to the VM and executes them. The scripts can be used to deploy software to the VM and to make sure all required services (including Steep) are running.

Each provisioning script is a simple executable script file with a valid UNIX shebang at the beginning.

Script files will be processed by a template engine before they are uploaded. The following sections describe supported template instructions.

7.5.1.1 Global variables

Here is a list of the global variables defined in provisioning scripts:

Variable     Description
agentId      The ID of the agent that is about to be deployed on the provisioned VM
ipAddress    The IP address of the provisioned VM
setup        A Setup object that describes the configuration of the provisioned VM
Example provisioning script
#!/usr/bin/env bash

echo {{ ipAddress }}
7.5.1.2 Environment variables

Access any environment variable defined on the system where Steep is running (not the VM to be provisioned) as follows:

{{ env["name_of_the_environment_variable"] }}
Example provisioning script
#!/usr/bin/env bash

echo "{{ env["TOKEN"] }}" > /etc/token.conf
7.5.1.3 Configuration values

Use values from Steep’s main configuration file (steep.yaml) as follows:

{{ config["name_of_the_configuration_item"] }}
Example provisioning script
#!/usr/bin/env bash

docker run -u root -d -p 5701:5701 -p 41187:41187 \
  --restart=on-failure --name steep \
  -e "STEEP_OUTPATH={{ config["steep.outPath"] }}" \
  -e "STEEP_TMPPATH={{ config["steep.tmpPath"] }}" \
  steep:latest
7.5.1.4 Read local files

Insert the contents of a local file into the provisioning script.

{{ readFile("path_to_the_file_to_insert") }}
Example provisioning script
#!/usr/bin/env bash

echo "{{ readFile("hello_world.txt") }}"
7.5.1.5 Upload local files to remote

Upload one or more local files to the remote VM.

{{ upload("source", "destination") }}

source may contain a glob pattern (e.g. *.sh or **/*.yaml) if you want to upload multiple files or if you want to recursively transfer a whole directory tree. In this case, destination should be a directory, and it should already exist.

Example provisioning script
#!/usr/bin/env bash

mkdir -p /etc/steep
{{ upload("conf/**/*.yaml", "/etc/steep") }}
7.5.2 Using YAML anchors

Configuration files for setups (setups.yaml) and service metadata (services/services.yaml) often contain repeated information (e.g. two setups might offer the same capabilities). You can use YAML anchors to simplify your configuration files.

The following example shows two services sharing a parameter:

- id: sleep
  name: sleep
  description: sleeps for the given amount of seconds
  path: sleep
  runtime: other

  parameters:
    - &sleep_seconds
      id: seconds
      name: seconds to sleep
      description: The number of seconds to sleep
      type: input
      cardinality: 1..1
      dataType: integer

- id: docker_sleep
  name: Docker sleep
  description: Run sleep inside an alpine container
  path: alpine
  runtime: docker

  parameters:
    - id: sleep
      name: sleep command
      description: The sleep command
      type: input
      cardinality: 1..1
      dataType: string
      default: sleep

    - *sleep_seconds
7.5.3 Using a template engine

You can use a template engine in all of Steep’s YAML-based configuration files (steep.yaml, setups.yaml, services.yaml, etc.) to dynamically generate a configuration at startup.

You have to explicitly enable this feature by adding front matter to your YAML file and setting the attribute template to true:

---
template: true
---

Steep uses Pebble Templates to compile your configuration files. Please refer to their website for full documentation of the tags, functions, and filters you can use.

Within your template, you may access the following variables:

Variable     Description
env          A dictionary of environment variables. Use subscript notation ([]) to access its contents. Example: {{ env["PATH"] }}
config       A dictionary of configuration properties. This variable is not available in steep.yaml (or any override configuration file). Use subscript notation ([]) to access its contents. Example: {{ config["steep.cluster.hazelcast.publicAddress"] }}

Here is a full example of a setups.yaml file that uses the templating feature to create two setups with the same parameters and an image name taken from an environment variable, but different availability zones:

---
template: true
---

{% for az in ["us-east-1", "eu-central-1"] %}
- id: my_setup_{{ az }}
  flavor: m4.2xlarge
  availabilityZone: {{ az }}
  imageName: {{ env["MY_SETUP_IMAGE_NAME"] }}
  blockDeviceSizeGb: 50
  minVMs: 0
  maxVMs: 10
{% endfor %}

8 Extending Steep through plugins

Steep can be extended through plugins. In this section, we will describe the interfaces of all plugins and how they need to be configured so Steep can compile and execute them at runtime.

Each plugin is a Kotlin script with the file extension .kt or .kts. Inside this script, there should be a single function with the same name as the plugin and a signature that depends on the plugin type. Function interfaces are described in the sub-sections below.

All plugins must be referenced in the plugins/common.yaml file. This file is an array of descriptor objects with at least the following properties:

Property                Type      Description
name (required)         string    A unique name of the plugin (the function inside the plugin’s script file must have the same name)
type (required)         string    The plugin type. Valid values are: initializer, outputAdapter, processChainAdapter, processChainConsistencyChecker, progressEstimator, and runtime.
scriptFile (required)   string    The path to the plugin’s Kotlin script file. The file should have the extension .kt or .kts. The path is relative to Steep’s application directory, so a valid example is conf/plugins/fileOrEmptyList.kt.
version (optional)      string    The plugin’s version. Must follow the Semantic Versioning Specification.

Specific plugin types may require additional properties described in the sub-sections below.

Plugins are compiled on demand when Steep is starting. This can take some time depending on the number of plugins and their size. For faster start times, you can configure a persistent compiled plugin cache. Steep updates this cache on startup when it has first compiled a script or when it detects that a previously compiled script has changed. On subsequent startups, Steep can utilize the cache to skip compilation of known plugins and, therefore, to reduce startup time.
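
For example, a plugins/common.yaml file registering two of the plugins described in the sub-sections below could look like this:

- name: fileOrEmptyList
  type: outputAdapter
  scriptFile: conf/plugins/fileOrEmptyList.kt
  supportedDataType: fileOrEmptyList

- name: myProcessChainAdapter
  type: processChainAdapter
  scriptFile: conf/plugins/myProcessChainAdapter.kt
  version: 1.0.0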

8.1 Initializers

An initializer plugin is a function that will be called during the initialization phase of Steep just before components such as the controller or the scheduler are deployed. If required, the function can be a suspend function.

Type

initializer

Additional property     Type     Description
dependsOn (optional)    array    An optional list of names of other initializer plugins that need to be executed before this plugin. If this property is not given, the order in which initializer plugins are executed is undefined.
Function interface
suspend fun myInitializer(vertx: io.vertx.core.Vertx)
Example descriptor
- name: myInitializer
  type: initializer
  scriptFile: conf/plugins/myInitializer.kt
Example plugin script
suspend fun myInitializer(vertx: io.vertx.core.Vertx) {
  println("Hello from my initializer plugin")
}

8.2 Output adapters

An output adapter plugin is a function that can manipulate the output of services depending on their produced data type (see the dataType property of the service parameter data model, as well as the dataType property of the process chain argument data model).

In other words, if an output parameter of a processing service has a specific dataType defined in the service’s metadata and this data type matches the one given in the output adapter’s descriptor, then the plugin’s function will be called after the service has been executed. Steep will pass the output argument and the whole process chain (for reference) to the plugin. The output argument’s value will be the current result (i.e. the output file or directory). The plugin can modify this file or directory (if necessary) and return a new list of files that will then be used by Steep for further processing.

Steep has a built-in output adapter for the data type directory. Whenever you specify this data type in the service metadata, Steep will pass the output directory to an internal function that recursively collects all files from this directory and returns them as a list.
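
For instance, the following service metadata sketch (service and parameter names are hypothetical) declares an output parameter with the directory data type, so Steep collects all files the service writes into the directory and passes them on as a list:

- id: my_tiling_service
  name: my tiling service
  description: writes an unknown number of tiles into a directory
  path: my_tiling_service.sh
  runtime: other
  parameters:
    - id: output_dir
      name: output directory
      description: The directory to write result tiles to
      type: output
      cardinality: 1..1
      # triggers the built-in output adapter described above
      dataType: directory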

The example output adapter fileOrEmptyList described below is also included in Steep. It checks if an output file exists (i.e. if the processing service has actually created it) and either returns a list with a single element (the file) or an empty list. This is useful if a processing service has an optional output that you want to use as input of a subsequent for-each action or of the current for-each action via yieldToInput.

If required, the function can be a suspend function.

Type

outputAdapter

Additional property            Type      Description
supportedDataType (required)   string    The dataType that this plugin can handle
Function interface
suspend fun myOutputAdapter(output: model.processchain.Argument,
  processChain: model.processchain.ProcessChain,
  vertx: io.vertx.core.Vertx): List<Any>
Example descriptor
- name: fileOrEmptyList
  type: outputAdapter
  scriptFile: conf/plugins/fileOrEmptyList.kt
  supportedDataType: fileOrEmptyList
Example plugin script
import io.vertx.core.Vertx
import io.vertx.kotlin.core.file.existsAwait
import model.processchain.Argument
import model.processchain.ProcessChain

suspend fun fileOrEmptyList(output: Argument, processChain: ProcessChain,
    vertx: Vertx): List<String> {
  return if (!vertx.fileSystem().existsAwait(output.variable.value)) {
    emptyList()
  } else {
    listOf(output.variable.value)
  }
}

8.3 Process chain adapters

A process chain adapter plugin is a function that can manipulate generated process chains before they are executed.

It takes a list of generated process chains and a reference to the workflow from which the process chains have been generated. It returns a new list of process chains to execute or the given list if no modification was made. If required, the function can be a suspend function.

Type

processChainAdapter

Additional property     Type     Description
dependsOn (optional)    array    An optional list of names of other process chain adapter plugins that need to be executed before this plugin. If this property is not given, the order in which process chain adapter plugins are applied to process chains is undefined.
Function interface
suspend fun myProcessChainAdapter(processChains: List<model.processchain.ProcessChain>,
  workflow: model.workflow.Workflow, vertx: io.vertx.core.Vertx): List<model.processchain.ProcessChain>
Example descriptor
- name: myProcessChainAdapter
  type: processChainAdapter
  scriptFile: conf/plugins/myProcessChainAdapter.kt
Example plugin script
import model.processchain.ProcessChain
import model.workflow.Workflow
import io.vertx.core.Vertx

suspend fun myProcessChainAdapter(processChains: List<ProcessChain>,
    workflow: Workflow, vertx: Vertx): List<ProcessChain> {
  val result = mutableListOf<ProcessChain>()

  for (pc in processChains) {
    // never execute the 'sleep' service
    val executables = pc.executables.filter { e -> e.id != "sleep" }
    result.add(pc.copy(executables = executables))
  }

  return result
}

8.4 Process chain consistency checkers

A process chain consistency checker plugin is a function that checks if an execute action can be added to a process chain or if adding it would lead to inconsistencies that render the process chain inexecutable.

The function is called before the execute action is converted to an executable and before it is added to the process chain. It takes a list of executables that have been collected so far during process chain generation, the execute action that is about to be converted, the workflow from which the process chain is being created, and a Vert.x instance.

It returns true if the execute action can safely be converted and added to the list of executables to make up a new process chain, or false if adding it would lead to a process chain that could not be executed later.

Steep will not discard actions that a consistency checker plugin has rejected. Instead, it will try to add them to another process chain later. At this point, the plugin will be called again.

If required, the function can be a suspend function.

Type

processChainConsistencyChecker

Additional property     Type     Description
dependsOn (optional)    array    An optional list of names of other process chain consistency checker plugins that need to be executed before this plugin. If this property is not given, the order in which consistency checkers are applied to process chains is undefined.
Function interface
suspend fun myProcessChainConsistencyChecker(processChain: List<model.processchain.Executable>,
  action: model.workflow.ExecuteAction, workflow: model.workflow.Workflow,
  vertx: io.vertx.core.Vertx): Boolean
Example descriptor
- name: myProcessChainConsistencyChecker
  type: processChainConsistencyChecker
  scriptFile: conf/plugins/myProcessChainConsistencyChecker.kt
Example plugin script
import io.vertx.core.Vertx
import model.processchain.Executable
import model.workflow.ExecuteAction
import model.workflow.Workflow

fun myProcessChainConsistencyChecker(processChain: List<Executable>,
    action: ExecuteAction, workflow: Workflow, vertx: Vertx): Boolean {
  // do not add a service more than once to a process chain
  return processChain.none { it.serviceId == action.service }
}

8.5 Custom runtime environments

A runtime plugin is a function that can run process chain executables inside a certain runtime environment. See the runtime property of service metadata.

The plugin’s function takes an executable to run and an output collector. The output collector is a simple interface to which the executable’s standard output should be forwarded. If required, the function can be a suspend function.

Use this plugin if you want to implement a special way to execute processing services. For example, you can implement a remote web service call, or you can use one of the existing runtimes and run a certain service in a special way (like in the example plugin below).

Type

runtime

Additional property           Type      Description
supportedRuntime (required)   string    The name of the runtime this plugin provides. Use this value in your service metadata.
Function interface
suspend fun myRuntime(executable: model.processchain.Executable,
  outputCollector: helper.OutputCollector, vertx: io.vertx.core.Vertx)
Example descriptor
- name: ignoreError
  type: runtime
  scriptFile: conf/plugins/ignoreError.kt
  supportedRuntime: ignoreError
Example plugin script
import helper.OutputCollector
import helper.Shell.ExecutionException
import io.vertx.core.Vertx
import model.processchain.Executable
import runtime.OtherRuntime

fun ignoreError(executable: Executable, outputCollector: OutputCollector, vertx: Vertx) {
  try {
    OtherRuntime().execute(executable, outputCollector)
  } catch (e: ExecutionException) {
    if (e.exitCode != 1) {
      throw e
    }
  }
}

8.6 Progress estimators

A progress estimator plugin is a function that analyses the log output of a running processing service to estimate its current progress. For example, the plugin can look for percentages or the number of bytes processed. The returned value contributes to the execution progress of a process chain (see the estimatedProgress property of the process chain data model).

The function takes the executable that is currently being run and a list of recently collected output lines. It returns an estimated progress between 0.0 (0%) and 1.0 (100%) or null if the progress could not be determined. The function will be called for each output line collected and the newest line is always at the end of the given list. If required, the function can be a suspend function.

Type

progressEstimator

Additional property              Type     Description
supportedServiceIds (required)   array    An array of IDs of the services this estimator plugin supports
Function interface
suspend fun myProgressEstimator(executable: model.processchain.Executable,
  recentLines: List<String>, vertx: io.vertx.core.Vertx): Double?
Example descriptor
- name: extractArchiveProgressEstimator
  type: progressEstimator
  scriptFile: conf/plugins/extractArchiveProgressEstimator.kt
  supportedServiceIds:
    - extract-archive
Example plugin script
import model.processchain.Executable
import io.vertx.core.Vertx

suspend fun extractArchiveProgressEstimator(executable: Executable,
    recentLines: List<String>, vertx: Vertx): Double? {
  val lastLine = recentLines.last()
  val percentSign = lastLine.indexOf('%')
  if (percentSign > 0) {
    val percentStr = lastLine.substring(0, percentSign)
    val percent = percentStr.trim().toIntOrNull()
    if (percent != null) {
      return percent / 100.0
    }
  }
  return null
}

8.7 Parameter injection

For compatibility reasons, the plugin function signatures described above are not fixed. Whenever a plugin function is called, its arguments are injected based on their type. For example, if the function requires an instance of io.vertx.core.Vertx, such an object will be passed. This is independent of where this argument appears in the parameter list. In the function signatures described above, the Vert.x instance is always the last parameter, but it could actually be at any position and the injection mechanism would still work.

This feature allows us, as developers, to add new parameters to the function signatures in the future without requiring all users to change their plugins. It also allows plugin developers to specify parameters in any order and even to leave out parameters they do not need. For example, a process chain adapter could also have the following signature:

fun myProcessChainAdapter(workflow: Workflow)

Note that a parameter can only be injected into a plugin of a certain type if it is described in the function signature of this plugin type above. For example, the function signature of a runtime plugin does not specify a workflow, so a workflow cannot be injected. Trying to define a runtime plugin with such a parameter will result in an error.
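
As a further illustration, here is a sketch of an output adapter (with a hypothetical name) that declares only the parameter it actually needs; the processChain and vertx parameters from the full signature above are simply left out:

import model.processchain.Argument

// Only the output argument is declared. Steep injects it based on its
// type and skips the parameters this function does not declare.
fun myLeanOutputAdapter(output: Argument): List<String> {
  return listOf(output.variable.value)
}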

About

Steep’s development is led by the competence center for Spatial Information Management of the Fraunhofer Institute for Computer Graphics Research IGD in Darmstadt, Germany. Fraunhofer IGD is an internationally leading research institution for applied visual computing. The competence center for Spatial Information Management offers expertise and innovative technologies that enable successful communication and efficient cooperation with the help of geographic information.

Steep was initially developed within the research project “IQmulus” (A High-volume Fusion and Analysis Platform for Geospatial Point Clouds, Coverages and Volumetric Data Sets), funded under the 7th Framework Programme of the European Commission (call identifier FP7-ICT-2011-8, grant agreement no. 318787) from 2012 to 2016. It was previously called the ‘IQmulus JobManager’ or just the ‘JobManager’.

Presentations

This presentation was given by Michel Krämer at the DATA conference 2020, where he presented his paper entitled “Capability-based Scheduling of Scientific Workflows in the Cloud”. Michel talked about Steep’s software architecture and its scheduling algorithm that assigns process chains to virtual machines.

Publications

Steep and its predecessor JobManager have appeared in at least the following publications:

Krämer, M. (2021). Efficient Scheduling of Scientific Workflow Actions in the Cloud Based on Required Capabilities. In S. Hammoudi, C. Quix, & J. Bernardino (Eds.), Data Management Technologies and Applications. Communications in Computer and Information Science (Vol. 1446, pp. 32–55). Springer. https://doi.org/10.1007/978-3-030-83014-4_2

Krämer, M., Würz, H. M., & Altenhofen, C. (2021). Executing cyclic scientific workflows in the cloud. Journal of Cloud Computing, 10(25), 1–26. https://doi.org/10.1186/s13677-021-00229-7

Krämer, M. (2020). Capability-based Scheduling of Scientific Workflows in the Cloud. Proceedings of the 9th International Conference on Data Science, Technology, and Applications DATA, 43–54. https://doi.org/10.5220/0009805400430054

Krämer, M. (2018). A Microservice Architecture for the Processing of Large Geospatial Data in the Cloud (Doctoral dissertation). Technische Universität Darmstadt. https://doi.org/10.13140/RG.2.2.30034.66248

Böhm, J., Bredif, M., Gierlinger, T., Krämer, M., Lindenbergh, R., Liu, K., … Sirmacek, B. (2016). The IQmulus Urban Showcase: Automatic Tree Classification and Identification in Huge Mobile Mapping Point Clouds. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLI-B3, 301–307. https://doi.org/10.5194/isprs-archives-XLI-B3-301-2016

Krämer, M., & Senner, I. (2015). A modular software architecture for processing of big geospatial data in the cloud. Computers & Graphics, 49, 69–81. https://doi.org/10.1016/j.cag.2015.02.005