Download and get started

Choose one of the following options to download and run Steep:

If you downloaded the binary package of Steep, extract the ZIP file and run the start script:

cd steep-5.1.0
bin/steep

Or, start the Docker image as follows:

docker run --name steep -d --rm -p 8080:8080 \
    -e STEEP_HTTP_HOST=0.0.0.0 steep/steep:5.1.0

After a few seconds, you can access Steep’s web interface at http://localhost:8080/.

We will now submit a simple workflow to test if Steep is running correctly. The workflow consists of a single execute action that sleeps for 10 seconds and then quits. Execute the following command:

curl -X POST http://localhost:8080/workflows -d 'api: 4.0.0
vars:
  - id: sleep_seconds
    value: 10
actions:
  - type: execute
    service: sleep
    parameters:
      - id: seconds
        var: sleep_seconds'

The command will return the ID of the submitted workflow. You can monitor the execution in the web interface or by issuing the following command:

curl http://localhost:8080/workflows/<workflow-id>

Replace <workflow-id> with the returned ID.

Congratulations! You successfully installed Steep and ran your first workflow.

Documentation

In this section, we describe the individual features of Steep. The documentation always applies to the latest software version.

1 How does Steep work?

In order to answer this question, we will first describe how Steep transforms scientific workflow graphs into executable units. After that, we will have a look at Steep’s software architecture and the kinds of processing services it can execute.

(This section is based on the following publication: Krämer, M. (2020). Capability-Based Scheduling of Scientific Workflows in the Cloud. Proceedings of the 9th International Conference on Data Science, Technology, and Applications (DATA).)

1.1 Workflow scheduling

Steep is a scientific workflow management system that can be used to control the processing of very large data sets in a distributed environment.

A scientific workflow is typically represented by a directed acyclic graph that describes how an input data set is processed by certain tasks in a given order to produce a desired outcome. Such workflows can become very large, with hundreds or even thousands of tasks processing data volumes ranging from gigabytes to terabytes. The following figure shows a simple example of such a workflow in the extended Petri Net notation proposed by van der Aalst and van Hee (2004).

[Figure: an example workflow with tasks A, B, C, D, and E in extended Petri Net notation]

In this example, an input file is first processed by a task A. This task produces two results. The first one is processed by task B, whose result is in turn sent to C. The second result of A is processed by D. The outcomes of C and D are finally processed by task E.

In order to schedule such a workflow in a distributed environment, the graph has to be transformed into individual executable units. Steep follows a hybrid scheduling approach that applies heuristics both on the level of the workflow graph and later on the level of individual executable units. We assume that tasks that access the same data should be executed on the same machine to reduce communication overhead and to improve file reuse. We therefore group tasks into so-called process chains, which are linear sequential lists of tasks (without branches and loops).

Steep transforms workflows into process chains in an iterative manner. In each iteration, it finds the longest linear sequences of tasks and groups them into process chains. The following animation shows how this works for our example workflow:

[Animation: the example workflow is decomposed into four process chains over three iterations]

Task A is put into a process chain in iteration 1. Steep then schedules the execution of this process chain. After the execution has finished, Steep uses the results to produce a process chain containing B and C, and another one containing D. These two process chains are then scheduled to be executed in parallel. The results are finally used to generate the fourth process chain containing task E, which is also scheduled for execution.
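
The following JavaScript sketch illustrates this grouping step. It is a simplified model and not Steep’s actual implementation: tasks are nodes with dependencies, and each call collects the longest linear sequences of tasks whose dependencies have already finished:

// Simplified sketch of iterative process chain generation.
// tasks maps a task ID to the IDs of the tasks it depends on.
function findProcessChains(tasks, finished) {
  // a task is ready if it has not finished yet but all its dependencies have
  const ready = Object.keys(tasks).filter(id =>
    !finished.has(id) && tasks[id].every(d => finished.has(d)))
  return ready.map(start => {
    // extend each ready task to the longest linear sequence: follow
    // successors as long as there is no branch and no join
    const chain = [start]
    let current = start
    for (;;) {
      const successors = Object.keys(tasks).filter(id =>
        tasks[id].includes(current))
      if (successors.length !== 1) break            // branch or end of graph
      if (tasks[successors[0]].length !== 1) break  // join
      current = successors[0]
      chain.push(current)
    }
    return chain
  })
}

// the example workflow: A -> (B -> C, D) -> E
const tasks = { A: [], B: ["A"], C: ["B"], D: ["A"], E: ["C", "D"] }
console.log(findProcessChains(tasks, new Set()))                      // [ [ 'A' ] ]
console.log(findProcessChains(tasks, new Set(["A"])))                 // [ [ 'B', 'C' ], [ 'D' ] ]
console.log(findProcessChains(tasks, new Set(["A", "B", "C", "D"]))) // [ [ 'E' ] ]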

1.2 Software architecture

The following figure shows the main components of Steep: the HTTP server, the controller, the scheduler, the agent, and the cloud manager.

Together, these components form an instance of Steep. In practice, a single instance typically runs on a separate virtual machine, but multiple instances can also be started on the same machine. Each component can be enabled or disabled in a given instance (see the configuration options for more information). That means, in a cluster, there can be instances that have all five components enabled, and others that have only an agent, for example.

All components of all instances communicate with each other through messages sent over an event bus. Furthermore, the HTTP server, the controller, and the scheduler are able to connect to a shared database.

The HTTP server provides information about scheduled, running, and finished workflows to clients. Clients can also upload a new workflow. In this case, the HTTP server puts the workflow into the database and sends a message to one of the instances of the controller.

The controller receives this message, loads the workflow from the database, and starts transforming it iteratively into process chains as described above. Whenever it has generated new process chains, it puts them into the database and sends a message to all instances of the scheduler.

The schedulers then select agents to execute the process chains. They load the process chains from the database, send them via the event bus to the selected agents for execution, and finally write the results into the database. The schedulers also send a message back to the controller so it can continue with the next iteration and generate more process chains until the workflow has been completely transformed.

In case a scheduler does not find an agent suitable for the execution of a process chain, it sends a message to the cloud manager (a component that interacts with the API of the cloud infrastructure) and asks it to create a new agent.

1.3 Processing services

Steep is very flexible and allows a wide range of processing services (or microservices) to be integrated. A typical processing service is a program that reads one or more input files and writes one or more output files. The program may also accept generic parameters. The service can be implemented in any programming language (as long as the binary or script is executable on the machine on which the Steep agent is running) or can be wrapped in a Docker container.

For a seamless integration, a processing service should adhere to the following guidelines (a minimal example service is sketched after the list):

  • Every processing service should be a microservice. It should run in its own process and serve one specific purpose.
  • As Steep needs to call the service in a distributed environment, it should not have a graphical user interface or require any human interaction during its runtime. Suitable services are command-line applications that accept arguments to specify input files, output files, and parameters.
  • The service should read from its input files, process the data, write the results to its output files, and then exit. It should not run continuously like a web service. If you need to integrate a web service in your workflow, we recommend using the curl command or something similar.
  • Steep does not require the processing services to implement a specific interface. Instead, the service’s input and output parameters should be described in a special data model called service metadata.
  • According to common conventions for exit codes, a processing service should return 0 (zero) upon successful execution and any number but zero in case an error has occurred (e.g. 1, 2, 128, 255, etc.).
  • In order to ensure deterministic workflow executions, services should be stateless and idempotent. This means that every execution of a service with the same input data and the same set of parameters should produce the same result.
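
For illustration, here is a minimal, hypothetical Node.js service that follows these guidelines: a command-line program that reads an input file, writes an output file, and exits with an appropriate code:

#!/usr/bin/env node
// uppercase.js - a hypothetical processing service: it reads the input
// file given as the first argument, converts the text to upper case, and
// writes it to the output file given as the second argument.

const fs = require("fs").promises

async function main(input, output) {
  const text = await fs.readFile(input, "utf-8")
  await fs.writeFile(output, text.toUpperCase(), "utf-8")
}

// return 0 (zero) on success and a non-zero exit code on error
main(process.argv[2], process.argv[3]).catch(err => {
  console.error(err.message)
  process.exitCode = 1
})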

2 Example workflows

In this section, we describe example workflows covering patterns we regularly see in real-world use cases. For each workflow, we also provide the required service metadata. For more information about the workflow model and service metadata, please read the section on data models.

2.1 Running two services in parallel

This example workflow consists of two actions that each copy a file. Since the two actions do not depend on each other (i.e. they do not share any variable), Steep converts them to two independent process chains and executes them in parallel (as long as there are at least two agents available).

The workflow defines four variables. inputFile1 and inputFile2 point to the two files to be copied. outputFile1 and outputFile2 have no value. Steep will create unique values (output file names) for them during the workflow execution.

The workflow then specifies two execute actions for the copy service. The service metadata of copy defines that this processing service has an input parameter input_file and an output parameter output_file, both of which must be specified exactly once (the cardinality equals 1..1).

For each execute action, Steep assigns the input variables to the input parameters, generates file names for the output variables, and then executes the processing services.

Workflow:
YAML:
api: 4.0.0
vars:
  - id: inputFile1
    value: example1.txt
  - id: outputFile1
  - id: inputFile2
    value: example2.txt
  - id: outputFile2
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: inputFile1
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: inputFile2
    outputs:
      - id: output_file
        var: outputFile2
JSON:
{
  "api": "4.0.0",
  "vars": [{
    "id": "inputFile1",
    "value": "example1.txt"
  }, {
    "id": "outputFile1"
  }, {
    "id": "inputFile2",
    "value": "example2.txt"
  }, {
    "id": "outputFile2"
  }],
  "actions": [{
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "inputFile1"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile1"
    }]
  }, {
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "inputFile2"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile2"
    }]
  }]
}
Service metadata:
YAML:
- id: copy
  name: Copy
  description: Copy files
  path: cp
  runtime: other
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      data_type: file
    - id: output_file
      name: Output file name
      description: Output file name
      type: output
      cardinality: 1..1
      data_type: file
JSON:
[{
  "id": "copy",
  "name": "Copy",
  "description": "Copy files",
  "path": "cp",
  "runtime": "other",
  "parameters": [{
    "id": "input_file",
    "name": "Input file name",
    "description": "Input file name",
    "type": "input",
    "cardinality": "1..1",
    "data_type": "file"
  }, {
    "id": "output_file",
    "name": "Output file name",
    "description": "Output file name",
    "type": "output",
    "cardinality": "1..1",
    "data_type": "file"
  }]
}]

2.2 Chaining two services

The following example workflow makes a copy of a file and then a copy of the copy (i.e. the file is copied, and the result is copied again). The workflow contains two actions that share the same variable: outputFile1 is used as the output of the first action and as the input of the second action. Steep therefore executes the actions in sequence.

The service metadata for this workflow is the same as for the previous one.

Workflow:
YAML:
api: 4.0.0
vars:
  - id: inputFile
    value: example.txt
  - id: outputFile1
  - id: outputFile2
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: inputFile
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile2
JSON:
{
  "api": "4.0.0",
  "vars": [{
    "id": "inputFile",
    "value": "example.txt"
  }, {
    "id": "outputFile1"
  }, {
    "id": "outputFile2"
  }],
  "actions": [{
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "inputFile"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile1"
    }]
  }, {
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "outputFile1"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile2"
    }]
  }]
}

2.3 Splitting and joining results

This example starts with an action that copies a file. Two other actions then run in parallel and make copies of the result of the first action. A final action then joins these copies into a single file. The workflow has a split-and-join pattern: the graph is split into two branches after the first action, and these branches are then joined into a single one with the final action.

Workflow:
YAML:
api: 4.0.0
vars:
  - id: inputFile
    value: example.txt
  - id: outputFile1
  - id: outputFile2
  - id: outputFile3
  - id: outputFile4
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: inputFile
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile2
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile3
  - type: execute
    service: join
    inputs:
      - id: i
        var: outputFile2
      - id: i
        var: outputFile3
    outputs:
      - id: o
        var: outputFile4
JSON:
{
  "api": "4.0.0",
  "vars": [{
    "id": "inputFile",
    "value": "example.txt"
  }, {
    "id": "outputFile1"
  }, {
    "id": "outputFile2"
  }, {
    "id": "outputFile3"
  }, {
    "id": "outputFile4"
  }],
  "actions": [{
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "inputFile"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile1"
    }]
  }, {
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "outputFile1"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile2"
    }]
  }, {
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input_file",
      "var": "outputFile1"
    }],
    "outputs": [{
      "id": "output_file",
      "var": "outputFile3"
    }]
  }, {
    "type": "execute",
    "service": "join",
    "inputs": [{
      "id": "i",
      "var": "outputFile2"
    }, {
      "id": "i",
      "var": "outputFile3"
    }],
    "outputs": [{
      "id": "o",
      "var": "outputFile4"
    }]
  }]
}
Service metadata:
YAML:
- id: copy
  name: Copy
  description: Copy files
  path: cp
  runtime: other
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      data_type: file
    - id: output_file
      name: Output file name
      description: Output file name
      type: output
      cardinality: 1..1
      data_type: file
- id: join
  name: Join
  description: Merge one or more files into one
  path: join.sh
  runtime: other
  parameters:
    - id: i
      name: Input files
      description: One or more input files to merge
      type: input
      cardinality: 1..n
      data_type: file
    - id: o
      name: Output file
      description: The output file
      type: output
      cardinality: 1..1
      data_type: file
JSON:
[{
  "id": "copy",
  "name": "Copy",
  "description": "Copy files",
  "path": "cp",
  "runtime": "other",
  "parameters": [{
    "id": "input_file",
    "name": "Input file name",
    "description": "Input file name",
    "type": "input",
    "cardinality": "1..1",
    "data_type": "file"
  }, {
    "id": "output_file",
    "name": "Output file name",
    "description": "Output file name",
    "type": "output",
    "cardinality": "1..1",
    "data_type": "file"
  }]
}, {
  "id": "join",
  "name": "Join",
  "description": "Merge one or more files into one",
  "path": "join.sh",
  "runtime": "other",
  "parameters": [{
    "id": "i",
    "name": "Input files",
    "description": "One or more input files to merge",
    "type": "input",
    "cardinality": "1..n",
    "data_type": "file"
  }, {
    "id": "o",
    "name": "Output file",
    "description": "The output file",
    "type": "output",
    "cardinality": "1..1",
    "data_type": "file"
  }]
}]

2.4 Processing a dynamic number of results in parallel

This example demonstrates how to process the results of an action in parallel even if the number of result files is unknown when the workflow is designed. The workflow starts with an action that splits an input file inputFile into multiple files (e.g. one file per line) stored in a directory outputDirectory. A for-each action then iterates over these files and creates copies. The for-each action has an enumerator i that serves as the input for the individual instances of the copy service. The output files (outputFile1) of this service are collected via the yieldToOutput property in a variable called copies. The final join service merges these copies into a single file outputFile2.

Workflow:
YAML:
api: 4.0.0
vars:
  - id: inputFile
    value: example.txt
  - id: lines
    value: 1
  - id: outputDirectory
  - id: i
  - id: outputFile1
  - id: copies
  - id: outputFile2
actions:
  - type: execute
    service: split
    parameters:
      - id: lines
        var: lines
    inputs:
      - id: file
        var: inputFile
    outputs:
      - id: output_directory
        var: outputDirectory
  - type: for
    input: outputDirectory
    enumerator: i
    output: copies
    actions:
      - type: execute
        service: copy
        inputs:
          - id: input_file
            var: i
        outputs:
          - id: output_file
            var: outputFile1
    yieldToOutput: outputFile1
  - type: execute
    service: join
    inputs:
      - id: i
        var: copies
    outputs:
      - id: o
        var: outputFile2
JSON:
{
  "api": "4.0.0",
  "vars": [{
    "id": "inputFile",
    "value": "example.txt"
  }, {
    "id": "lines",
    "value": 1
  }, {
    "id": "outputDirectory"
  }, {
    "id": "i"
  }, {
    "id": "outputFile1"
  }, {
    "id": "copies"
  }, {
    "id": "outputFile2"
  }],
  "actions": [{
    "type": "execute",
    "service": "split",
    "parameters": [{
      "id": "lines",
      "var": "lines"
    }],
    "inputs": [{
      "id": "file",
      "var": "inputFile"
    }],
    "outputs": [{
      "id": "output_directory",
      "var": "outputDirectory"
    }]
  }, {
    "type": "for",
    "input": "outputDirectory",
    "enumerator": "i",
    "output": "copies",
    "actions": [{
      "type": "execute",
      "service": "copy",
      "inputs": [{
        "id": "input_file",
        "var": "i"
      }],
      "outputs": [{
        "id": "output_file",
        "var": "outputFile1"
      }]
    }],
    "yieldToOutput": "outputFile1"
  }, {
    "type": "execute",
    "service": "join",
    "inputs": [{
      "id": "i",
      "var": "copies"
    }],
    "outputs": [{
      "id": "o",
      "var": "outputFile2"
    }]
  }]
}
Service metadata:
YAML:
- id: split
  name: Split
  description: Split a file into pieces
  path: split
  runtime: other
  parameters:
    - id: lines
      name: Number of lines per file
      description: Create smaller files n lines in length
      type: argument
      cardinality: 0..1
      data_type: integer
      label: '-l'
    - id: file
      name: Input file
      description: The input file to split
      type: input
      cardinality: 1..1
      data_type: file
    - id: output_directory
      name: Output directory
      description: The output directory
      type: output
      cardinality: 1..1
      data_type: directory
      file_suffix: /
- id: copy
  name: Copy
  description: Copy files
  path: cp
  runtime: other
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      data_type: file
    - id: output_file
      name: Output file name
      description: Output file name
      type: output
      cardinality: 1..1
      data_type: file
- id: join
  name: Join
  description: Merge one or more files into one
  path: join.sh
  runtime: other
  parameters:
    - id: i
      name: Input files
      description: One or more input files to merge
      type: input
      cardinality: 1..n
      data_type: file
    - id: o
      name: Output file
      description: The output file
      type: output
      cardinality: 1..1
      data_type: file
JSON:
[{
  "id": "split",
  "name": "Split",
  "description": "Split a file into pieces",
  "path": "split",
  "runtime": "other",
  "parameters": [{
    "id": "lines",
    "name": "Number of lines per file",
    "description": "Create smaller files n lines in length",
    "type": "argument",
    "cardinality": "0..1",
    "data_type": "integer",
    "label": "-l"
  }, {
    "id": "file",
    "name": "Input file",
    "description": "The input file to split",
    "type": "input",
    "cardinality": "1..1",
    "data_type": "file"
  }, {
    "id": "output_directory",
    "name": "Output directory",
    "description": "The output directory",
    "type": "output",
    "cardinality": "1..1",
    "data_type": "directory",
    "file_suffix": "/"
  }]
}, {
  "id": "copy",
  "name": "Copy",
  "description": "Copy files",
  "path": "cp",
  "runtime": "other",
  "parameters": [{
    "id": "input_file",
    "name": "Input file name",
    "description": "Input file name",
    "type": "input",
    "cardinality": "1..1",
    "data_type": "file"
  }, {
    "id": "output_file",
    "name": "Output file name",
    "description": "Output file name",
    "type": "output",
    "cardinality": "1..1",
    "data_type": "file"
  }]
}, {
  "id": "join",
  "name": "Join",
  "description": "Merge one or more files into one",
  "path": "join.sh",
  "runtime": "other",
  "parameters": [{
    "id": "i",
    "name": "Input files",
    "description": "One or more input files to merge",
    "type": "input",
    "cardinality": "1..n",
    "data_type": "file"
  }, {
    "id": "o",
    "name": "Output file",
    "description": "The output file",
    "type": "output",
    "cardinality": "1..1",
    "data_type": "file"
  }]
}]

2.5 Feeding results back into the workflow (cycles/loops)

The following example shows how to create loops with a dynamic number of iterations. Suppose there is a processing service called countdown.js that reads a number from an input file, decreases this number by 1, and then writes the result to an output file. The service could be implemented in Node.js as follows:

#!/usr/bin/env node

const fs = require("fs").promises

async function countDown(input, output) {
  let value = parseInt(await fs.readFile(input, "utf-8"))
  console.log(`Old value: ${value}`)

  value--
  if (value > 0) {
    console.log(`New value: ${value}`)
    await fs.writeFile(output, "" + value, "utf-8")
  } else {
    console.log("No new value")
  }
}

countDown(process.argv[2], process.argv[3])
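
You can try the service standalone before using it in a workflow (assuming the script is executable):

echo 3 > input.txt
./countdown.js input.txt output.txt   # prints "Old value: 3" and "New value: 2"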

The following workflow uses this service in a for-each action to continuously reprocess a file and decrease the number in it until it reaches 0.

In the first iteration of the for-each action, the service reads from a file called input.txt and writes to an output file with a name generated at runtime. The path of this output file is routed back into the for-each action via yieldToInput. In the second iteration, the service reads from this output file and produces another one. This process continues until the number equals 0. In this case, the service does not write an output file anymore, and the workflow finishes.

Note that we use the data type fileOrEmptyList in the service metadata for the output parameter of the countdown service. This is a special data type that either returns the generated file or an empty list if the file does not exist. In the latter case, the for-each action does not have any more input values to process. Think of the input of a for-each action as a queue: if nothing is pushed into the queue and all elements have already been processed, the for-each action can finish.

Workflow:
YAML:
api: 4.0.0
vars:
  - id: input_file
    value: input.txt
  - id: i
  - id: output_file
actions:
  - type: for
    input: input_file
    enumerator: i
    yieldToInput: output_file
    actions:
      - type: execute
        service: countdown
        inputs:
          - id: input
            var: i
        outputs:
          - id: output
            var: output_file
JSON:
{
  "api": "4.0.0",
  "vars": [{
    "id": "input_file",
    "value": "input.txt"
  }, {
    "id": "i"
  }, {
    "id": "output_file"
  }],
  "actions": [{
    "type": "for",
    "input": "input_file",
    "enumerator": "i",
    "yieldToInput": "output_file",
    "actions": [{
      "type": "execute",
      "service": "countdown",
      "inputs": [{
        "id": "input",
        "var": "i"
      }],
      "outputs": [{
        "id": "output",
        "var": "output_file"
      }]
    }]
  }]
}
Service metadata:
YAML:
- id: countdown
  name: Count Down
  description: 'Read a number, subtract 1, and write the result'
  path: ./countdown.js
  runtime: other
  parameters:
    - id: input
      name: Input file
      description: The input file containing the number to decrease
      type: input
      cardinality: 1..1
      data_type: file
    - id: output
      name: Output file
      description: The path to the output file
      type: output
      cardinality: 1..1
      data_type: fileOrEmptyList
JSON:
[{
  "id": "countdown",
  "name": "Count Down",
  "description": "Read a number, subtract 1, and write the result",
  "path": "./countdown.js",
  "runtime": "other",
  "parameters": [{
    "id": "input",
    "name": "Input file",
    "description": "The input file containing the number to decrease",
    "type": "input",
    "cardinality": "1..1",
    "data_type": "file"
  }, {
    "id": "output",
    "name": "Output file",
    "description": "The path to the output file",
    "type": "output",
    "cardinality": "1..1",
    "data_type": "fileOrEmptyList"
  }]
}]

3 Data models

This section contains a description of all data models used by Steep.

3.1 Workflows

The main components of the workflow model are variables and actions. Use variables to specify input files and parameters for your processing services. Variables for output files must also be declared but must not have a value. The names of output files will be generated by Steep during the runtime of the workflow.

Properties:
  • api (string, required): The API (or data model) version. Should be 4.0.0.
  • name (string, optional): An optional human-readable workflow name
  • vars (array, required): An array of variables
  • actions (array, required): An array of actions that make up the workflow
Example:

See the section on example workflows.

3.2 Variables

A variable holds a value for inputs, outputs, and generic parameters of processing services. It can be defined (input variables and generic parameters) or undefined (output parameters). Defined values are immutable. Undefined variables will be assigned a value by Steep during the runtime of a workflow.

Variables are also used to link two services together and to define the data flow in the workflow graph. For example, if the output parameter of a service A refers to a variable V, and the input parameter of a service B refers to the same variable, Steep will first execute A to determine the value of V and then execute B.
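
For instance, in the following fragment (taken from the chaining example in section 2.2), the shared variable outputFile1 links the two copy actions, so the second action is executed after the first:

actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: inputFile
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile2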

Properties:
  • id (string, required): A unique variable identifier
  • value (any, optional): The variable’s value, or null if the variable is undefined
Example:
YAML:
id: input_file
value: /data/input.txt
JSON:
{
  "id": "input_file",
  "value": "/data/input.txt"
}

3.3 Actions

There are two types of actions in a workflow: execute actions and for-each actions. They are differentiated by their type attribute.

3.3.1 Execute actions

An execute action instructs Steep to execute a certain service with given inputs, outputs, and generic parameters.

Properties:
  • type (string, required): The type of the action. Must be "execute".
  • service (string, required): The ID of the service to execute
  • inputs (array, optional): An array of input parameters
  • outputs (array, optional): An array of output parameters
  • parameters (array, optional): An array of generic parameters
Example:
YAML:
type: execute
service: my_service
inputs:
  - id: input_file
    var: my_input_file
outputs:
  - id: output_file
    var: my_output_file
    store: true
parameters:
  - id: verbose
    var: is_verbose
  - id: resolution
    var: resolution_pixels
JSON:
{
  "type": "execute",
  "service": "my_service",
  "inputs": [{
    "id": "input_file",
    "var": "my_input_file"
  }],
  "outputs": [{
    "id": "output_file",
    "var": "my_output_file",
    "store": true
  }],
  "parameters": [{
    "id": "verbose",
    "var": "is_verbose"
  }, {
    "id": "resolution",
    "var": "resolution_pixels"
  }]
}
3.3.2 For-each actions

A for-each action has an input, a list of sub-actions, and an output. It clones the sub-actions as many times as there are items in its input, executes the actions, and then collects the results in the output.

Although the action is called ‘for-each’, the execution order of the sub-actions is undefined (i.e. the execution is non-sequential and non-deterministic). Instead, Steep always tries to execute as many sub-actions as possible in parallel.

For-each actions may contain execute actions but also nested for-each actions.

Properties:
  • type (string, required): The type of the action. Must be "for".
  • input (string, required): The ID of a variable containing the items to which to apply the sub-actions
  • enumerator (string, required): The ID of a variable that holds the current value from input in each iteration
  • output (string, optional): The ID of a variable that will collect output values from all iterations (see yieldToOutput)
  • actions (array, optional): An array of sub-actions to execute in each iteration
  • yieldToOutput (string, optional): The ID of a sub-action’s output variable whose value should be appended to the for-each action’s output
  • yieldToInput (string, optional): The ID of a sub-action’s output variable whose value should be appended to the for-each action’s input to generate further iterations
Example:
YAML:
type: for
input: all_input_files
output: all_output_files
enumerator: i
yieldToOutput: output_file
actions:
  - type: execute
    service: copy
    inputs:
      - id: input
        var: i
    outputs:
      - id: output
        var: output_file
JSON:
{
  "type": "for",
  "input": "all_input_files",
  "output": "all_output_files",
  "enumerator": "i",
  "yieldToOutput": "output_file",
  "actions": [{
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input",
      "var": "i"
    }],
    "outputs": [{
      "id": "output",
      "var": "output_file"
    }]
  }]
}
3.3.3 Parameters

This data model represents inputs and generic parameters of execute actions.

Properties:
  • id (string, required): The ID of the parameter as defined in the service metadata
  • var (string, required): The ID of a variable that holds the value for this parameter
Example:
YAML:
id: input
var: i
JSON:
{
  "id": "input",
  "var": "i"
}
3.3.4 Output parameters

Output parameters of execute actions have additional properties compared to input parameters and generic parameters.

Properties:
  • id (string, required): The ID of the parameter as defined in the service metadata
  • var (string, required): The ID of a variable to which Steep will assign the generated name of the output file. This variable can then be used, for example, as an input parameter of a subsequent action.
  • prefix (string, optional): An optional string to prepend to the generated name of the output file. For example, if Steep generates the name "name123abc" and the prefix is "my/dir/", the output filename will be "my/dir/name123abc". Note that the prefix must end with a slash if you want to create a directory. The output filename will be relative to the configured temporary directory or output directory (depending on the store property). You may even specify an absolute path: if the generated name is "name456fgh" and the prefix is "/absolute/dir/", the output filename will be "/absolute/dir/name456fgh".
  • store (boolean, optional): If this property is true, Steep will generate an output filename that is relative to the configured output directory instead of the temporary directory. The default value is false.
Example:
YAML:
id: output
var: o
prefix: some_directory/
store: false
JSON:
{
  "id": "output",
  "var": "o",
  "prefix": "some_directory/",
  "store": false
}

3.4 Process chains

As described above, Steep transforms a workflow into one or more process chains. A process chain is a sequential list of instructions that will be sent to Steep’s remote agents to execute processing services in a distributed environment.

Properties:
  • id (string, required): Unique process chain identifier
  • executables (array, required): A list of executable objects that describe what processing services should be called and with which arguments
  • submissionId (string, required): The ID of the submission to which this process chain belongs
  • startTime (string, optional): An ISO 8601 timestamp denoting the date and time when the process chain execution was started. May be null if the execution has not started yet.
  • endTime (string, optional): An ISO 8601 timestamp denoting the date and time when the process chain execution finished. May be null if the execution has not finished yet.
  • status (string, required): The current status of the process chain
  • requiredCapabilities (array, optional): A set of strings specifying capabilities a host system must provide to be able to execute this process chain. See also setups.
  • results (object, optional): If status is SUCCESS, this property contains the list of process chain result files grouped by their output variable ID. Otherwise, it is null.
  • errorMessage (string, optional): If status is ERROR, this property contains a human-readable error message. Otherwise, it is null.
Example:
YAML:
id: akpm646jjigral4cdyyq
submissionId: akpm6yojjigral4cdxgq
startTime: '2020-05-18T08:44:19.221456Z'
endTime: '2020-05-18T08:44:19.446437Z'
status: SUCCESS
requiredCapabilities:
  - nodejs
executables:
  - id: Count Down
    path: ./countdown.js
    runtime: other
    arguments:
      - id: input
        type: input
        dataType: file
        variable:
          id: input_file
          value: input.txt
      - id: output
        type: output
        dataType: fileOrEmptyList
        variable:
          id: output_file
          value: output.txt
    runtimeArgs: []
results:
  output_file:
    - output.txt
JSON:
{
  "id": "akpm646jjigral4cdyyq",
  "submissionId": "akpm6yojjigral4cdxgq",
  "startTime": "2020-05-18T08:44:19.221456Z",
  "endTime": "2020-05-18T08:44:19.446437Z",
  "status": "SUCCESS",
  "requiredCapabilities": ["nodejs"],
  "executables": [{
    "id": "Count Down",
    "path": "./countdown.js",
    "runtime": "other",
    "arguments": [{
      "id": "input",
      "type": "input",
      "dataType": "file",
      "variable": {
        "id": "input_file",
        "value": "input.txt"
      }
    }, {
      "id": "output",
      "type": "output",
      "dataType": "fileOrEmptyList",
      "variable": {
        "id": "output_file",
        "value": "output.txt"
      }
    }],
    "runtimeArgs": []
  }],
  "results": {
    "output_file": ["output.txt"]
  }
}

3.5 Executables

An executable is part of a process chain. It describes how a processing service should be executed and with which parameters.

Properties:
  • id (string, required): An identifier (does not have to be unique; typically refers to the ID or name of the service to be executed)
  • path (string, required): The path to the binary of the service to be executed. This property is specific to the runtime. For example, for the docker runtime, this property refers to the Docker image.
  • arguments (array, required): A list of arguments to pass to the service. May be empty.
  • runtime (string, required): The name of the runtime that will execute the service. Built-in runtimes are currently other (for any service that is executable on the target system) and docker for Docker containers. More runtimes can be added through plugins.
  • runtimeArgs (array, optional): A list of arguments to pass to the runtime. May be empty.
Example:
YAML:
id: Count Down
path: 'my_docker_image:latest'
runtime: docker
arguments:
  - id: input
    type: input
    dataType: file
    variable:
      id: input_file
      value: /data/input.txt
  - id: output
    type: output
    dataType: directory
    variable:
      id: output_file
      value: /data/output
  - id: arg1
    type: argument
    dataType: boolean
    label: '--foobar'
    variable:
      id: akqcqqoedcsaoescyhga
      value: 'true'
runtimeArgs:
  - id: akqcqqoedcsaoescyhgq
    type: argument
    dataType: string
    label: '-v'
    variable:
      id: data_mount
      value: '/data:/data'
JSON:
{
  "id": "Count Down",
  "path": "my_docker_image:latest",
  "runtime": "docker",
  "arguments": [{
    "id": "input",
    "type": "input",
    "dataType": "file",
    "variable": {
      "id": "input_file",
      "value": "/data/input.txt"
    }
  }, {
    "id": "output",
    "type": "output",
    "dataType": "directory",
    "variable": {
      "id": "output_file",
      "value": "/data/output"
    }
  }, {
    "id": "arg1",
    "type": "argument",
    "dataType": "boolean",
    "label": "--foobar",
    "variable": {
      "id": "akqcqqoedcsaoescyhga",
      "value": "true"
    }
  }],
  "runtimeArgs": [{
    "id": "akqcqqoedcsaoescyhgq",
    "type": "argument",
    "dataType": "string",
    "label": "-v",
    "variable": {
      "id": "data_mount",
      "value": "/data:/data"
    }
  }]
}
3.5.1 Arguments

An argument is part of an executable.

Properties:
  • id (string, required): An argument identifier
  • label (string, optional): An optional label to use when the argument is passed to the service (e.g. --input)
  • variable (object, required): A variable that holds the value of this argument
  • type (string, required): The type of this argument. Valid values: input, output, argument
  • dataType (string, required): The type of the argument value. If this property is directory, Steep will create a new directory for the service’s output and recursively search it for result files after the service has been executed. Otherwise, this property can be an arbitrary string. New data types with special handling can be added through output adapter plugins.
Example:
YAML:
id: akqcqqoedcsaoescyhgq
type: argument
dataType: string
label: '-v'
variable:
  id: data_mount
  value: '/data:/data'
JSON:
{
  "id": "akqcqqoedcsaoescyhgq",
  "type": "argument",
  "dataType": "string",
  "label": "-v",
  "variable": {
    "id": "data_mount",
    "value": "/data:/data"
  }
}
3.5.2 Argument variables

An argument variable holds the value of an argument.

Properties:
  • id (string, required): The variable’s unique identifier
  • value (string, required): The variable’s value
Example:
YAML:
id: data_mount
value: '/data:/data'
JSON:
{
  "id": "data_mount",
  "value": "/data:/data"
}

3.6 Submissions

A submission is created when you submit a workflow through the /workflows endpoint. It contains information about the workflow execution, such as the start and end time, as well as the current status.

Properties:
  • id (string, required): Unique submission identifier
  • workflow (object, required): The submitted workflow
  • startTime (string, optional): An ISO 8601 timestamp denoting the date and time when the workflow execution was started. May be null if the execution has not started yet.
  • endTime (string, optional): An ISO 8601 timestamp denoting the date and time when the workflow execution finished. May be null if the execution has not finished yet.
  • status (string, required): The current status of the submission
  • requiredCapabilities (array): A set of strings specifying capabilities a host system must provide to be able to execute this workflow. See also setups.
  • runningProcessChains (number, required): The number of process chains currently being executed
  • cancelledProcessChains (number, required): The number of process chains that have been cancelled
  • succeededProcessChains (number, required): The number of process chains that have finished successfully
  • failedProcessChains (number, required): The number of process chains whose execution has failed
  • totalProcessChains (number, required): The current total number of process chains in this submission. May increase during execution when new process chains are generated.
  • results (object, optional): If status is SUCCESS or PARTIAL_SUCCESS, this property contains the list of workflow result files grouped by their output variable ID. Otherwise, it is null.
  • errorMessage (string, optional): If status is ERROR, this property contains a human-readable error message. Otherwise, it is null.
Example:
YAML:
id: aiq7eios7ubxglkcqx5a
workflow:
  api: 4.0.0
  vars:
    - id: myInputFile
      value: /data/input.txt
    - id: myOutputFile
  actions:
    - type: execute
      service: cp
      inputs:
        - id: input_file
          var: myInputFile
      outputs:
        - id: output_file
          var: myOutputFile
          store: true
      parameters: []
startTime: '2020-02-13T15:38:58.719382Z'
endTime: '2020-02-13T15:39:00.807715Z'
status: SUCCESS
runningProcessChains: 0
cancelledProcessChains: 0
succeededProcessChains: 1
failedProcessChains: 0
totalProcessChains: 1
results:
  myOutputFile:
    - /data/out/aiq7eios7ubxglkcqx5a/aiq7hygs7ubxglkcrf5a
JSON:
{
  "id": "aiq7eios7ubxglkcqx5a",
  "workflow": {
    "api": "4.0.0",
    "vars": [{
      "id": "myInputFile",
      "value": "/data/input.txt"
    }, {
      "id": "myOutputFile"
    }],
    "actions": [{
      "type": "execute",
      "service": "cp",
      "inputs": [{
        "id": "input_file",
        "var": "myInputFile"
      }],
      "outputs": [{
        "id": "output_file",
        "var": "myOutputFile",
        "store": true
      }],
      "parameters": []
    }]
  },
  "startTime": "2020-02-13T15:38:58.719382Z",
  "endTime": "2020-02-13T15:39:00.807715Z",
  "status": "SUCCESS",
  "runningProcessChains": 0,
  "cancelledProcessChains": 0,
  "succeededProcessChains": 1,
  "failedProcessChains": 0,
  "totalProcessChains": 1,
  "results": {
    "myOutputFile": [
      "/data/out/aiq7eios7ubxglkcqx5a/aiq7hygs7ubxglkcrf5a"
    ]
  }
}

3.7 Submission status

A submission can have one of the following statuses:

  • ACCEPTED: The submission has been accepted by Steep, but execution has not started yet
  • RUNNING: The submission is currently being executed
  • CANCELLED: The submission was cancelled
  • SUCCESS: The execution of the submission finished successfully
  • PARTIAL_SUCCESS: The submission was executed completely, but one or more process chains failed
  • ERROR: The execution of the submission failed

3.8 Process chain status

A process chain can have one of the following statuses:

  • REGISTERED: The process chain has been created, but execution has not started yet
  • RUNNING: The process chain is currently being executed
  • CANCELLED: The execution of the process chain was cancelled
  • SUCCESS: The process chain was executed successfully
  • ERROR: The execution of the process chain failed

3.9 Service metadata

Service metadata is used to describe the interface of a processing service so it can be executed by Steep.

Properties:
  • id (string, required): A unique service identifier
  • name (string, required): A human-readable name
  • description (string, required): A human-readable description
  • path (string, required): Relative path to the service executable in the service artefact (or a Docker image if runtime equals docker)
  • runtime (string, required): The runtime environment
  • parameters (array, required): A list of service parameters
  • runtime_args (array, optional): An optional list of arguments to pass to the runtime
  • required_capabilities (array, optional): A set of strings specifying capabilities a host system must provide to be able to execute this service. See also setups.
Example:
YAML:
id: cp
name: cp
description: Copies files
path: cp
runtime: other
parameters:
  - id: no_overwrite
    name: No overwrite
    description: Do not overwrite existing file
    type: argument
    cardinality: 1..1
    label: '-n'
    data_type: boolean
    default: false
  - id: input_file
    name: Input file name
    description: Input file name
    type: input
    cardinality: 1..1
    data_type: file
  - id: output_file
    name: Output file name
    description: Output file name
    type: output
    cardinality: 1..1
    data_type: file
JSON:
{
  "id": "cp",
  "name": "cp",
  "description": "Copies files",
  "path": "cp",
  "runtime": "other",
  "parameters": [{
    "id": "no_overwrite",
    "name": "No overwrite",
    "description": "Do not overwrite existing file",
    "type": "argument",
    "cardinality": "1..1",
    "label": "-n",
    "data_type": "boolean",
    "default": false
  }, {
    "id": "input_file",
    "name": "Input file name",
    "description": "Input file name",
    "type": "input",
    "cardinality": "1..1",
    "data_type": "file"
  }, {
    "id": "output_file",
    "name": "Output file name",
    "description": "Output file name",
    "type": "output",
    "cardinality": "1..1",
    "data_type": "file"
  }]
}

3.10 Runtime environments

Steep provides a set of default runtime environments that define how processing services are executed. More environments can be added through runtime environment plugins.

  • docker: The service will be executed through Docker. The service metadata attribute path specifies the Docker image to run. The attribute runtime_args specifies parameters that should be forwarded to the docker run command.
  • other: The service will be executed like a normal executable program (binary or shell script)
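
For example, a service wrapped in a Docker container could be described with service metadata like the following sketch (the image name and the volume mount are placeholders; see the sections on service metadata and runtime arguments for the full data models):

- id: my_service
  name: My service
  description: An example service wrapped in a Docker container
  path: my_docker_image:latest
  runtime: docker
  runtime_args:
    - id: data_mount
      name: Data mount
      description: Mount the data directory into the container
      label: '-v'
      value: '/data:/data'
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      data_type: file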

3.11 Service parameters

This data model describes the parameters that can be passed to a processing service. It is part of the service metadata.

Properties:
  • id (string, required): A unique parameter identifier
  • name (string, required): A human-readable name
  • description (string, required): A human-readable description
  • type (string, required): The type of this parameter. Valid values: input, output, argument
  • cardinality (string, required): A string in the form lower..upper specifying how many times the parameter must appear at least (lower limit) and how many times it can appear at most (upper limit). The character n can be used for the upper limit to specify an arbitrary number. The lower limit must not be greater than the upper limit. Example cardinalities are listed below.
  • data_type (string, optional): The type of the parameter value. Steep treats parameters differently depending on the data type (see description below).
  • default (string, optional): An optional default value for this parameter that will be used if the lower limit of cardinality is 1 but no parameter value is given in the workflow.
  • file_suffix (string, optional): An optional suffix that should be appended to the generated filename of an output parameter. This property is typically used for file extensions (including the dot), e.g. ".xml" or ".json".
  • label (string, optional): An optional string that will be used as a label for the parameter in the service call. Examples are -i, --input, --resolution, etc.
Example cardinalities:
  • "0..1" means the parameter is optional (it can appear 0 times or 1 time)
  • "1..1" means the parameter is mandatory (it must appear exactly 1 time)
  • "1..n" means the parameter must appear at least 1 time (there is no upper limit)
Data type:

Steep treats parameters differently depending on the type and data_type:

  • If type is "output" and data_type is "directory", Steep will create a new directory for the service’s output and recursively search it for result files after the service has been executed.

  • If type is "input" and data_type is "directory", Steep will find the common parent directory of the files from the parameter’s value and pass it to the service. For example, if the parameter’s value is an array with the elements ["/tmp/a.txt", "/tmp/b.txt", "/tmp/subdir/c.txt"], Steep will pass "/tmp/" to the service.

  • If type is "input", data_type is not "directory", but the parameter’s value is an array, Steep will duplicate the parameter as many times as there are items in the array (given that the cardinality has no upper limit).

  • If type is "argument", data_type is "boolean", and the parameter has a label, Steep will pass the label to the service if the parameter’s value is true and ignore the parameter if the value is false.

  • Otherwise, this property can be an arbitrary string. New data types with special handling can be added through output adapter plugins.

Example:
YAML:
id: no_overwrite
name: No overwrite
description: Do not overwrite existing file
type: argument
cardinality: 1..1
label: '-n'
data_type: boolean
default: false
JSON:
{
  "id": "no_overwrite",
  "name": "No overwrite",
  "description": "Do not overwrite existing file",
  "type": "argument",
  "cardinality": "1..1",
  "label": "-n",
  "data_type": "boolean",
  "default": false
}

3.12 Runtime arguments

Runtime arguments are similar to service parameters, except they are passed to the runtime that executes the service (e.g. Docker) instead of the service itself.

Properties:
  • id (string, required): A unique argument identifier
  • name (string, required): A human-readable name
  • description (string, required): A human-readable description
  • data_type (string, optional): The type of the parameter value. Typically "string" or "boolean". The same rules apply as for service parameters.
  • label (string, optional): An optional string that will be used as a label for the parameter. Examples are -v, --volume, --entrypoint, etc.
  • value (string, optional): An optional value for this parameter.
Example:
YAML:
id: volume
name: Volume mount
description: Mount data directory
label: '-v'
value: '/data:/data'
JSON:
{
  "id": "volume",
  "name": "Volume mount",
  "description": "Mount data directory",
  "label": "-v",
  "value": "/data:/data"
}

3.13 Setups

A setup describes how a virtual machine (VM) should be created by Steep’s cloud manager.

Properties:
  • id (string, required): A unique setup identifier
  • flavor (string, required): The flavor of the new VM
  • imageName (string, required): The name of the VM image to deploy
  • availabilityZone (string, required): The availability zone in which to create the VM
  • blockDeviceSizeGb (number, required): The size of the VM’s block device in gigabytes
  • blockDeviceVolumeType (string, optional): An optional type of the VM’s block device. By default, the type will be selected automatically.
  • minVMs (number, optional): An optional minimum number of VMs to create with this setup. The default value is 0.
  • maxVMs (number, required): The maximum number of VMs to create with this setup
  • maxCreateConcurrent (number, optional): The maximum number of VMs to create and provision concurrently. The default value is 1.
  • provisioningScripts (array, optional): An optional list of paths to scripts that should be executed on the VM after it has been created
  • providedCapabilities (array, optional): An optional list of capabilities that VMs with this setup will have
  • sshUsername (string, optional): An optional username for the SSH connection to the created VM. Overrides the main configuration item steep.cloud.ssh.username if it is defined.
Example:
YAML:
id: default
flavor: 7d217779-4d7b-4689-8a40-c12a377b946d
imageName: Ubuntu 18.04
availabilityZone: nova
blockDeviceSizeGb: 50
minVMs: 0
maxVMs: 4
provisioningScripts:
  - conf/setups/default/01_docker.sh
  - conf/setups/default/02_steep.sh
providedCapabilities:
  - docker
JSON:
{
  "id": "default",
  "flavor": "7d217779-4d7b-4689-8a40-c12a377b946d",
  "imageName": "Ubuntu 18.04",
  "availabilityZone": "nova",
  "blockDeviceSizeGb": 50,
  "minVMs": 0,
  "maxVMs": 4,
  "provisioningScripts": [
    "conf/setups/default/01_docker.sh",
    "conf/setups/default/02_steep.sh"
  ],
  "providedCapabilities": ["docker"]
}

4 HTTP endpoints

The main way to communicate with Steep (i.e. to submit workflows, monitor progress, fetch metadata, etc.) is through its HTTP interface. In this section, we describe all HTTP endpoints. By default, Steep listens to incoming connections on port 8080.

4.1 GET information

Get information about Steep. This includes:

  • Steep’s version number
  • A build ID
  • The SHA of the Git commit for which the build was created
  • A timestamp of the moment when the build was created
Resource URL:
/

Parameters:
None

Status codes:
  • 200: The operation was successful

Example request:
GET / HTTP/1.1
Example response:
HTTP/1.1 200 OK
content-type: application/json
content-length: 136

{
    "build": "83",
    "commit": "2e54898b3e15da0015a1831bf6f6abc94a43eaee",
    "timestamp": 1590049676916,
    "version": "5.2.0-beta.1"
}

4.2 GET submissions

Get information about all submissions in the database. The response is a JSON array consisting of submission objects without the properties workflow, results, and errorMessage. In order to get the complete details of a submission, use the GET submission by ID endpoint.

The submissions are returned in the order in which they were added to the database, with the newest ones at the top.

Resource URL:
/workflows

Parameters:
  • size (optional): The maximum number of submissions to return. The default value is 10.
  • offset (optional): The offset of the first submission to return. The default value is 0.
  • status (optional): If this parameter is defined, Steep will only return submissions with the given status. See the list of submission statuses for valid values. Otherwise, it will return all submissions from the database.

Response headers:
  • x-page-size: The size of the current page (i.e. the maximum number of submission objects returned). See the size request parameter.
  • x-page-offset: The offset of the first submission returned. See the offset request parameter.
  • x-page-total: The total number of submissions in the database matching the given request parameters.

Status codes:
  • 200: The operation was successful
  • 400: One of the parameters was invalid. See the response body for the error message.

Example request:
GET /workflows HTTP/1.1
Example response:
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 707
content-type: application/json
x-page-offset: 0
x-page-size: 10
x-page-total: 2

[
    {
      "id": "akpm6yojjigral4cdxgq",
      "startTime": "2020-05-18T08:44:01.045710Z",
      "endTime": "2020-05-18T08:44:21.218425Z",
      "status": "SUCCESS",
      "requiredCapabilities": [],
      "runningProcessChains": 0,
      "cancelledProcessChains": 0,
      "succeededProcessChains": 10,
      "failedProcessChains": 0,
      "totalProcessChains": 10
    },
    {
      "id": "akttc5kv575splk3ameq",
      "startTime": "2020-05-24T17:20:37.343072Z",
      "status": "RUNNING",
      "requiredCapabilities": [],
      "runningProcessChains": 1,
      "cancelledProcessChains": 0,
      "succeededProcessChains": 391,
      "failedProcessChains": 0,
      "totalProcessChains": 1000
    }
]
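
For example, assuming Steep runs locally on port 8080 and the parameters described above are passed as query parameters, the following curl command would fetch up to five running submissions:

curl "http://localhost:8080/workflows?status=RUNNING&size=5&offset=0"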

4.3 GET submission by ID

Get details about a single submission from the database.

Resource URL:
/workflows/:id

Parameters:
  • id: The ID of the submission to return

Status codes:
  • 200: The operation was successful
  • 404: The submission was not found

Example request:
GET /workflows/akpm6yojjigral4cdxgq HTTP/1.1
Example response:
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 348
content-type: application/json

{
  "id": "akpm6yojjigral4cdxgq",
  "startTime": "2020-05-18T08:44:01.045710Z",
  "endTime": "2020-05-18T08:44:21.218425Z",
  "status": "SUCCESS",
  "requiredCapabilities": [],
  "runningProcessChains": 0,
  "cancelledProcessChains": 0,
  "succeededProcessChains": 10,
  "failedProcessChains": 0,
  "totalProcessChains": 10,
  "workflow": {
    "api": "4.0.0",
    "vars": [
      ...
    ],
    "actions": [
      ...
    ]
  }
}

4.4 PUT submission

Update a submission. The request body is a JSON object with the submission properties to update. At the moment, only the status property can be updated.

Note: You can use this endpoint to cancel the execution of a submission (see the example below).

Resource URL:
/workflows/:id

Parameters:
  • id: The ID of the submission to update

Status codes:
  • 200: The operation was successful
  • 400: The request body was invalid
  • 404: The submission was not found

Example request:
PUT /workflows/akujvtkv575splk3saqa HTTP/1.1
Content-Length: 28
Content-Type: application/json

{
  "status": "CANCELLED"
}
Example response:
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 168
content-type: application/json

{
  "id": "akujvtkv575splk3saqa",
  "startTime": "2020-05-25T19:02:21.610396Z",
  "endTime": "2020-05-25T19:02:33.414032Z",
  "status": "CANCELLED",
  "runningProcessChains": 0,
  "cancelledProcessChains": 314,
  "succeededProcessChains": 686,
  "failedProcessChains": 0,
  "totalProcessChains": 1000
}
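
The same cancellation can be issued with curl (assuming Steep runs locally on port 8080):

curl -X PUT http://localhost:8080/workflows/akujvtkv575splk3saqa \
    -H "Content-Type: application/json" \
    -d '{"status": "CANCELLED"}'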

4.5 POST workflow

Create a new submission. The request body contains the workflow to execute.

If the operation was successful, the response body contains the created submission.

Resource URL:
/workflows

Status codes:
  • 202: The workflow has been accepted (i.e. stored in the database) and is scheduled for execution
  • 400: The posted workflow was invalid. See the response body for more information.

Example request:
POST /workflows HTTP/1.1
Content-Length: 231
Content-Type: application/json

{
  "api": "3.0.0",
  "vars": [{
    "id": "sleep_seconds",
    "value": 3
  }],
  "actions": [{
    "type": "execute",
    "service": "sleep",
    "parameters": [{
      "id": "seconds",
      "var": "sleep_seconds"
    }]
  }]
}
Example response:
HTTP/1.1 202 Accepted
content-encoding: gzip
content-length: 374
content-type: application/json

{
  "id": "akukkcsv575splk3v2ma",
  "status": "ACCEPTED",
  "workflow": {
    "api": "3.0.0",
    "vars": [{
      "id": "sleep_seconds",
      "value": 3
    }],
    "actions": [{
      "type": "execute",
      "service": "sleep",
      "inputs": [],
      "outputs": [],
      "parameters": [{
        "id": "seconds",
        "var": "sleep_seconds"
      }]
    }]
  }
}

4.6 GET process chains

TODO

4.7 GET process chain by ID

TODO

4.8 PUT process chain

TODO

4.9 GET agents

TODO

4.10 GET agent by ID

TODO

4.11 GET VMs

TODO

4.12 GET VM by ID

TODO

4.13 GET services

TODO

4.14 GET service by ID

TODO

4.15 GET Prometheus metrics

TODO

5 Web-based user interface

TODO

6 Configuration

TODO: Overview of configuration files

6.1 steep.yaml

TODO

6.2 setups.yaml

TODO

6.3 services/services.yaml

TODO

6.4 plugins/commons.yaml

TODO

7 Extending Steep through plugins

TODO

7.1 Custom runtime environments

TODO

7.2 Output adapters

TODO

7.3 Process chain adapters

TODO

7.4 Initializers

TODO

About

Steep’s development is led by the competence center for Spatial Information Management of the Fraunhofer Institute for Computer Graphics Research IGD in Darmstadt, Germany. Fraunhofer IGD is the leading international research institution for applied visual computing. The competence center for Spatial Information Management offers expertise and innovative technologies that enable successful communication and efficient cooperation with the help of geographic information.

Steep was initially developed within the research project “IQmulus” (A High-volume Fusion and Analysis Platform for Geospatial Point Clouds, Coverages and Volumetric Data Sets), funded under the 7th Framework Programme of the European Commission, call identifier FP7-ICT-2011-8, grant agreement no. 318787, from 2012 to 2016. It was previously called the ‘IQmulus JobManager’ or just the ‘JobManager’.

Publications

Steep and its predecessor JobManager have appeared in at least the following publications:

Krämer, M. (2018). A Microservice Architecture for the Processing of Large Geospatial Data in the Cloud (Doctoral dissertation). Technische Universität Darmstadt. https://doi.org/10.13140/RG.2.2.30034.66248

Böhm, J., Bredif, M., Gierlinger, T., Krämer, M., Lindenbergh, R., Liu, K., … Sirmacek, B. (2016). The IQmulus Urban Showcase: Automatic Tree Classification and Identification in Huge Mobile Mapping Point Clouds. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLI-B3, 301–307. https://doi.org/10.5194/isprs-archives-XLI-B3-301-2016

Krämer, M., & Senner, I. (2015). A modular software architecture for processing of big geospatial data in the cloud. Computers & Graphics, 49, 69–81. https://doi.org/10.1016/j.cag.2015.02.005