Download and get started

Choose one of the following options to download Steep:

If you downloaded the binary package of Steep, extract the ZIP file and run the start script:

cd steep-5.4.0
bin/steep

Or, start the Docker image as follows:

docker run --name steep -d --rm -p 8080:8080 \
    -e STEEP_HTTP_HOST=0.0.0.0 steep/steep:5.4.0

After a few seconds, you can access Steep’s web interface at http://localhost:8080/.

We will now submit a simple workflow to test if Steep is running correctly. The workflow consists of a single execute action that sleeps for 10 seconds and then quits. Execute the following command:

curl -X POST http://localhost:8080/workflows -d 'api: 4.1.0
vars:
  - id: sleep_seconds
    value: 10
actions:
  - type: execute
    service: sleep
    inputs:
      - id: seconds
        var: sleep_seconds'

The command will return the ID of the submitted workflow. You can monitor the execution in the web interface or by issuing the following command:

curl http://localhost:8080/workflows/<workflow-id>

Replace <workflow-id> with the returned ID.
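If you prefer to script this check, the polling loop can be sketched in Python. Note that the terminal status names used here are assumptions for illustration; check the status values reported by your Steep version:

```python
import json
import time

def wait_for_workflow(fetch, interval=2.0,
                      terminal=("SUCCESS", "ERROR", "CANCELLED")):
    """Poll a workflow until it reaches a terminal status.

    `fetch` is a function that returns the JSON body of
    GET /workflows/<workflow-id>, e.g. via urllib or curl.
    """
    while True:
        submission = json.loads(fetch())
        status = submission.get("status")
        if status in terminal:
            return status
        time.sleep(interval)
```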

Congratulations! You successfully installed Steep and ran your first workflow.

Documentation

In this section, we describe the individual features of Steep. The documentation always applies to the latest software version.

1 How does Steep work?

To answer this question, we will first describe how Steep transforms scientific workflow graphs into executable units. After that, we will look at Steep’s software architecture and at the kinds of processing services it can execute.

(This section is based on the following publication: Krämer, M. (2020). Capability-Based Scheduling of Scientific Workflows in the Cloud. Proceedings of the 9th International Conference on Data Science, Technology, and Applications (DATA), 43–54. https://doi.org/10.5220/0009805400430054)

1.1 Workflow scheduling

Steep is a scientific workflow management system that can be used to control the processing of very large data sets in a distributed environment.

A scientific workflow is typically represented by a directed acyclic graph that describes how an input data set is processed by certain tasks in a given order to produce a desired outcome. Such workflows can become very large, with hundreds up to several thousands of tasks processing data volumes ranging from gigabytes to terabytes. The following figure shows a simple example of such a workflow in the extended Petri Net notation proposed by van der Aalst and van Hee (2004).

[Figure: an example workflow graph with tasks A, B, C, D, and E]

In this example, an input file is first processed by a task A. This task produces two results. The first one is processed by task B, whose result is in turn sent to C. The second result of A is processed by D. The outcomes of C and D are finally processed by task E.

In order to schedule such a workflow in a distributed environment, the graph has to be transformed into individual executable units. Steep follows a hybrid scheduling approach that applies heuristics at the level of the workflow graph and later at the level of individual executable units. We assume that tasks that access the same data should be executed on the same machine to reduce communication overhead and to improve file reuse. We therefore group tasks into so-called process chains, which are linear sequential lists (without branches and loops).

Steep transforms workflows into process chains in an iterative manner. In each iteration, it finds the longest linear sequences of tasks and groups them into process chains. The following animation shows how this works for our example workflow:

[Animation: the example workflow being transformed into process chains iteration by iteration]

Task A will be put into a process chain in iteration 1. Steep then schedules the execution of this process chain. After the execution has finished, Steep uses the results to produce a process chain containing B and C and another one containing D. These process chains are then scheduled to be executed in parallel. The results are finally used to generate the fourth process chain containing task E, which is also scheduled for execution.
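The iterative grouping can be sketched in Python. This is a simplified illustration of the idea, not Steep’s actual implementation:

```python
from collections import defaultdict

def find_process_chains(successors, completed):
    """Group all currently ready tasks into linear process chains.

    successors: dict mapping a task to the tasks that depend on it
    completed:  set of tasks that have already been executed
    """
    tasks = set(successors)
    preds = defaultdict(list)
    for task, succs in successors.items():
        for s in succs:
            tasks.add(s)
            preds[s].append(task)
    # a task is ready if it is not done yet but all its predecessors are
    ready = [t for t in sorted(tasks)
             if t not in completed
             and all(p in completed for p in preds[t])]
    chains = []
    for start in ready:
        chain = [start]
        cur = start
        # extend the chain while the path stays strictly linear
        while len(successors.get(cur, [])) == 1:
            nxt = successors[cur][0]
            if len(preds[nxt]) != 1:
                break  # nxt joins several branches -> later iteration
            chain.append(nxt)
            cur = nxt
        chains.append(chain)
    return chains
```

For the example graph (A feeding B and D, B feeding C, and C and D feeding E), the sketch yields [["A"]] in the first iteration, [["B", "C"], ["D"]] in the second, and [["E"]] in the last, matching the description above.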

1.2 Software architecture

The following figure shows the main components of Steep: the HTTP server, the controller, the scheduler, the agent, and the cloud manager.

Together, these components form an instance of Steep. In practice, a single instance typically runs on a separate virtual machine, but multiple instances can also be started on the same machine. Each component can be enabled or disabled in a given instance (see the configuration options for more information). This means that, in a cluster, there can be instances that have all five components enabled, and others that have only an agent, for example.

All components of all instances communicate with each other through messages sent over an event bus. Further, the HTTP server, the controller, and the scheduler are able to connect to a shared database.

The HTTP server provides information about scheduled, running, and finished workflows to clients. Clients can also upload a new workflow. In this case, the HTTP server puts the workflow into the database and sends a message to one of the instances of the controller.

The controller receives this message, loads the workflow from the database, and starts transforming it iteratively into process chains as described above. Whenever it has generated new process chains, it puts them into the database and sends a message to all instances of the scheduler.

The schedulers then select agents to execute the process chains. They load the process chains from the database, send them via the event bus to the selected agents for execution, and finally write the results into the database. The schedulers also send a message back to the controller so it can continue with the next iteration and generate more process chains until the workflow has been completely transformed.

If a scheduler does not find an agent suitable for the execution of a process chain, it sends a message to the cloud manager (a component that interacts with the API of the cloud infrastructure) and asks it to create a new agent.
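The selection step can be illustrated with a small sketch. The capability names are made up, and Steep’s real scheduler considers more criteria than simple set inclusion:

```python
def select_agent(required_capabilities, agents):
    """Pick an agent whose capability set covers the requirements of a
    process chain. A result of None means that no suitable agent exists
    and the cloud manager should be asked to create one.

    required_capabilities: set of capabilities the process chain needs
    agents: dict mapping agent names to their capability sets
    """
    for name, capabilities in sorted(agents.items()):
        if required_capabilities <= capabilities:
            return name
    return None
```

For example, an agent offering {"docker", "gpu"} can run a chain that requires {"gpu"}, while a chain requiring a capability no agent offers triggers the creation of a new one.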

1.3 Processing services

Steep is very flexible and allows a wide range of processing services (or microservices) to be integrated. A typical processing service is a program that reads one or more input files and writes one or more output files. The program may also accept generic parameters. The service can be implemented in any programming language (as long as the binary or script is executable on the machine the Steep agent is running on) or can be wrapped in a Docker container.

For a seamless integration, a processing service should adhere to the following guidelines:

  • Every processing service should be a microservice. It should run in its own process and serve one specific purpose.
  • As Steep needs to call the service in a distributed environment, it should not have a graphical user interface or require any human interaction at runtime. Suitable services are command-line applications that accept arguments to specify input files, output files, and parameters.
  • The service should read from input files, process the data, write results to output files, and then exit. It should not run continuously like a web service. If you need to integrate a web service into your workflow, we recommend using the curl command or something similar.
  • Steep does not require processing services to implement a specific interface. Instead, a service’s input and output parameters should be described in a special data model called service metadata.
  • Following common conventions for exit codes, a processing service should return 0 (zero) upon successful execution and any non-zero number (e.g. 1, 2, 128, 255) if an error has occurred.
  • To ensure deterministic workflow executions, services should be stateless and idempotent. This means that every execution of a service with the same input data and the same set of parameters should produce the same result.

2 Example workflows

In this section, we describe example workflows covering patterns we regularly see in real-world use cases. For each workflow, we also provide the required service metadata. For more information about the workflow model and service metadata, please read the section on data models.

2.1 Running two services in parallel

This example workflow consists of two actions that each copy a file. Since the actions do not depend on each other (i.e. they do not share any variable), Steep converts them to two independent process chains and executes them in parallel (as long as there are at least two agents available).

The workflow defines four variables. inputFile1 and inputFile2 point to the two files to be copied. outputFile1 and outputFile2 have no value. Steep will create unique values (output file names) for them during the workflow execution.

The workflow then specifies two execute actions for the copy service. The service metadata of copy defines that this processing service has an input parameter input_file and an output parameter output_file, both of which must be specified exactly once (cardinality 1..1).

For each execute action, Steep assigns the input variables to the input parameters, generates file names for the output variables, and then executes the processing service.
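This assignment can be illustrated with a sketch that combines service metadata and an execute action into a command line. The parameter ordering and label handling shown here are assumptions for illustration; Steep’s actual argument assembly may differ:

```python
def build_command(metadata, action, values):
    """Assemble an argument vector for an execute action.

    metadata: service metadata (id, path, parameters, ...)
    action:   an execute action from the workflow
    values:   resolved variable values (input files, generated outputs)
    """
    argv = [metadata["path"]]
    # map parameter IDs to the variable values bound by the action
    bound = {}
    for param in action.get("inputs", []) + action.get("outputs", []):
        bound[param["id"]] = values[param["var"]]
    # emit arguments in the order the metadata declares them
    for param in metadata["parameters"]:
        if param["id"] in bound:
            if "label" in param:
                argv.append(param["label"])  # e.g. '-l' for split
            argv.append(str(bound[param["id"]]))
    return argv
```

For the copy service below, an action binding inputFile1 and outputFile1 would resolve to something like ["cp", "example1.txt", "<generated output name>"].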

Workflow:
api: 4.1.0
vars:
  - id: inputFile1
    value: example1.txt
  - id: outputFile1
  - id: inputFile2
    value: example2.txt
  - id: outputFile2
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: inputFile1
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: inputFile2
    outputs:
      - id: output_file
        var: outputFile2
Service metadata:
- id: copy
  name: Copy
  description: Copy files
  path: cp
  runtime: other
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      data_type: file
    - id: output_file
      name: Output file name
      description: Output file name
      type: output
      cardinality: 1..1
      data_type: file

2.2 Chaining two services

The following example workflow makes a copy of a file and then a copy of the copy (i.e. the file is copied and the result is copied again). The workflow contains two actions that share the same variable: outputFile1 is used as the output of the first action and as the input of the second action. Steep therefore executes them in sequence.

The service metadata for this workflow is the same as for the previous one.

Workflow:
api: 4.1.0
vars:
  - id: inputFile
    value: example.txt
  - id: outputFile1
  - id: outputFile2
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: inputFile
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile2

2.3 Splitting and joining results

This example starts with an action that copies a file. Two other actions then run in parallel and make copies of the result of the first action. A final action then joins these copies into a single file. The workflow has a split-and-join pattern because the graph splits into two branches after the first action. These branches are then joined into a single one by the final action.

Workflow:
api: 4.1.0
vars:
  - id: inputFile
    value: example.txt
  - id: outputFile1
  - id: outputFile2
  - id: outputFile3
  - id: outputFile4
actions:
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: inputFile
    outputs:
      - id: output_file
        var: outputFile1
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile2
  - type: execute
    service: copy
    inputs:
      - id: input_file
        var: outputFile1
    outputs:
      - id: output_file
        var: outputFile3
  - type: execute
    service: join
    inputs:
      - id: i
        var: outputFile2
      - id: i
        var: outputFile3
    outputs:
      - id: o
        var: outputFile4
Service metadata:
- id: copy
  name: Copy
  description: Copy files
  path: cp
  runtime: other
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      data_type: file
    - id: output_file
      name: Output file name
      description: Output file name
      type: output
      cardinality: 1..1
      data_type: file
- id: join
  name: Join
  description: Merge one or more files into one
  path: join.sh
  runtime: other
  parameters:
    - id: i
      name: Input files
      description: One or more input files to merge
      type: input
      cardinality: 1..n
      data_type: file
    - id: o
      name: Output file
      description: The output file
      type: output
      cardinality: 1..1
      data_type: file

2.4 Processing a dynamic number of results in parallel

This example demonstrates how to process the results of an action in parallel even if the number of result files is unknown when the workflow is designed. The workflow starts with an action that splits an input file inputFile into multiple files (e.g. one file per line) stored in a directory outputDirectory. A for-each action then iterates over these files and creates copies. The for-each action has an enumerator i that serves as the input for the individual instances of the copy service. The output files (outputFile1) of this service are collected via the yieldToOutput property in a variable called copies. The final join service merges these copies into a single file outputFile2.

Workflow:
api: 4.1.0
vars:
  - id: inputFile
    value: example.txt
  - id: lines
    value: 1
  - id: outputDirectory
  - id: i
  - id: outputFile1
  - id: copies
  - id: outputFile2
actions:
  - type: execute
    service: split
    inputs:
      - id: file
        var: inputFile
      - id: lines
        var: lines
    outputs:
      - id: output_directory
        var: outputDirectory
  - type: for
    input: outputDirectory
    enumerator: i
    output: copies
    actions:
      - type: execute
        service: copy
        inputs:
          - id: input_file
            var: i
        outputs:
          - id: output_file
            var: outputFile1
    yieldToOutput: outputFile1
  - type: execute
    service: join
    inputs:
      - id: i
        var: copies
    outputs:
      - id: o
        var: outputFile2
Service metadata:
- id: split
  name: Split
  description: Split a file into pieces
  path: split
  runtime: other
  parameters:
    - id: lines
      name: Number of lines per file
      description: Create smaller files n lines in length
      type: input
      cardinality: 0..1
      data_type: integer
      label: '-l'
    - id: file
      name: Input file
      description: The input file to split
      type: input
      cardinality: 1..1
      data_type: file
    - id: output_directory
      name: Output directory
      description: The output directory
      type: output
      cardinality: 1..1
      data_type: directory
      file_suffix: /
- id: copy
  name: Copy
  description: Copy files
  path: cp
  runtime: other
  parameters:
    - id: input_file
      name: Input file name
      description: Input file name
      type: input
      cardinality: 1..1
      data_type: file
    - id: output_file
      name: Output file name
      description: Output file name
      type: output
      cardinality: 1..1
      data_type: file
- id: join
  name: Join
  description: Merge one or more files into one
  path: join.sh
  runtime: other
  parameters:
    - id: i
      name: Input files
      description: One or more input files to merge
      type: input
      cardinality: 1..n
      data_type: file
    - id: o
      name: Output file
      description: The output file
      type: output
      cardinality: 1..1
      data_type: file

2.5 Feeding results back into the workflow (cycles/loops)

The following example shows how to create loops with a dynamic number of iterations. Suppose there is a processing service called countdown.js that reads a number from an input file, decreases this number by 1, and then writes the result to an output file. The service could be implemented in Node.js as follows:

#!/usr/bin/env node

const fs = require("fs").promises

async function countDown(input, output) {
  let value = parseInt(await fs.readFile(input, "utf-8"))
  console.log(`Old value: ${value}`)

  value--
  if (value > 0) {
    console.log(`New value: ${value}`)
    await fs.writeFile(output, "" + value, "utf-8")
  } else {
    console.log("No new value")
  }
}

countDown(process.argv[2], process.argv[3])

The following workflow uses this service in a for-each action to continuously reprocess a file and decrease the number in it until it reaches 0.

In the first iteration of the for-each action, the service reads from a file called input.txt and writes to an output file with a name generated at runtime. The path of this output file is routed back into the for-each action via yieldToInput. In the second iteration, the service reads from the output file and produces another one. This process continues until the number equals 0. At that point, the service does not write an output file anymore and the workflow finishes.

Note that we use the data type fileOrEmptyList in the service metadata for the output parameter of the countdown service. This is a special data type that either returns the generated file or an empty list if the file does not exist. In the latter case, the for-each action does not have any more input values to process. Think of the input of a for-each action as a queue: if nothing is pushed into the queue and all elements have already been processed, the for-each action can finish.
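This queue behaviour can be modelled in a few lines of Python (a simplified sketch in which plain numbers stand in for files):

```python
from collections import deque

def run_foreach(initial_inputs, service):
    """Model of a for-each action with yieldToInput: every result that
    is not an empty list is pushed back into the input queue, creating
    a further iteration."""
    queue = deque(initial_inputs)
    processed = []
    while queue:
        item = queue.popleft()
        processed.append(item)
        result = service(item)
        if result != []:  # fileOrEmptyList: [] means nothing to feed back
            queue.append(result)
    return processed

def countdown(value):
    """Counterpart of the countdown.js service above, operating on
    numbers instead of files."""
    value -= 1
    return value if value > 0 else []
```

Starting with the value 3, the loop processes 3, then 2, then 1; the last iteration yields an empty list, the queue runs dry, and the for-each action finishes.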

Workflow:
api: 4.1.0
vars:
  - id: input_file
    value: input.txt
  - id: i
  - id: output_file
actions:
  - type: for
    input: input_file
    enumerator: i
    yieldToInput: output_file
    actions:
      - type: execute
        service: countdown
        inputs:
          - id: input
            var: i
        outputs:
          - id: output
            var: output_file
Service metadata:
- id: countdown
  name: Count Down
  description: 'Read a number, subtract 1, and write the result'
  path: ./countdown.js
  runtime: other
  parameters:
    - id: input
      name: Input file
      description: The input file containing the number to decrease
      type: input
      cardinality: 1..1
      data_type: file
    - id: output
      name: Output file
      description: The path to the output file
      type: output
      cardinality: 1..1
      data_type: fileOrEmptyList

3 Data models

This section contains a description of all data models used by Steep.

3.1 Workflows

The main components of the workflow model are variables and actions. Use variables to specify input files and parameters for your processing services. Variables for output files must also be declared but must not have a value. The names of output files will be generated by Steep during the runtime of the workflow.

  • api (string, required): The API (or data model) version. Should be 4.1.0.
  • name (string, optional): An optional human-readable workflow name
  • vars (array, required): An array of variables
  • actions (array, required): An array of actions that make up the workflow
Example:

See the section on example workflows.

3.2 Variables

A variable holds a value for inputs and outputs of processing services. It can be defined (inputs) or undefined (outputs). Defined values are immutable. Undefined variables will be assigned a value by Steep during the runtime of a workflow.

Variables are also used to link two services together and to define the data flow in the workflow graph. For example, if the output parameter of a service A refers to a variable V, and the input parameter of a service B refers to the same variable, Steep will first execute A to determine the value of V and then execute B.
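This rule can be illustrated with a hypothetical helper that derives dependency edges from shared variables:

```python
def dependency_edges(actions):
    """Derive execution dependencies from shared variables: if one
    action writes a variable that another action reads, the writer
    must run before the reader. (Illustrative sketch.)

    actions: dict mapping action names to their input/output
             variable IDs
    """
    writers = {}
    for name, action in actions.items():
        for var in action.get("outputs", []):
            writers[var] = name
    edges = set()
    for name, action in actions.items():
        for var in action.get("inputs", []):
            if var in writers and writers[var] != name:
                edges.add((writers[var], name))
    return sorted(edges)
```

In the example above, A writing V and B reading V produces the single edge (A, B), so A is executed first.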

  • id (string, required): A unique variable identifier
  • value (any, optional): The variable’s value, or null if the variable is undefined
Example:
id: input_file
value: /data/input.txt

3.3 Actions

There are two types of actions in a workflow: execute actions and for-each actions. They are differentiated by their type attribute.

3.3.1 Execute actions

An execute action instructs Steep to execute a certain service with given inputs and outputs.

  • type (string, required): The type of the action. Must be "execute".
  • service (string, required): The ID of the service to execute
  • inputs (array, optional): An array of input parameters
  • outputs (array, optional): An array of output parameters
Example:
type: execute
service: my_service
inputs:
  - id: verbose
    var: is_verbose
  - id: resolution
    var: resolution_pixels
  - id: input_file
    var: my_input_file
outputs:
  - id: output_file
    var: my_output_file
    store: true
3.3.2 For-each actions

A for-each action has an input, a list of sub-actions, and an output. It clones the sub-actions as many times as there are items in its input, executes the actions, and then collects the results in the output.

Although the action is called ‘for-each’, the execution order of the sub-actions is undefined (i.e. the execution is non-sequential and non-deterministic). Instead, Steep always tries to execute as many sub-actions as possible in parallel.

For-each actions may contain execute actions as well as nested for-each actions.

  • type (string, required): The type of the action. Must be "for".
  • input (string, required): The ID of a variable containing the items to which to apply the sub-actions
  • enumerator (string, required): The ID of a variable that holds the current value from input for each iteration
  • output (string, optional): The ID of a variable that will collect output values from all iterations (see yieldToOutput)
  • actions (array, optional): An array of sub-actions to execute in each iteration
  • yieldToOutput (string, optional): The ID of a sub-action’s output variable whose value should be appended to the for-each action’s output
  • yieldToInput (string, optional): The ID of a sub-action’s output variable whose value should be appended to the for-each action’s input to generate further iterations
Example:
type: for
input: all_input_files
output: all_output_files
enumerator: i
yieldToOutput: output_file
actions:
  - type: execute
    service: copy
    inputs:
      - id: input
        var: i
    outputs:
      - id: output
        var: output_file
Example (JSON):
{
  "type": "for",
  "input": "all_input_files",
  "output": "all_output_files",
  "enumerator": "i",
  "yieldToOutput": "output_file",
  "actions": [{
    "type": "execute",
    "service": "copy",
    "inputs": [{
      "id": "input",
      "var": "i"
    }],
    "outputs": [{
      "id": "output",
      "var": "output_file"
    }]
  }]
}
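As a rough illustration (not Steep's actual implementation), the expansion of the example above can be sketched in Python: one clone of the copy action per item in all_input_files, with the enumerator i bound to the current item and each output_file value appended to all_output_files. The ".copy" suffix for generated names is made up for this sketch.

```python
# Hypothetical sketch of how a for-each action is conceptually expanded.
def expand_for_each(input_items, enumerator, yield_to_output):
    clones = []            # one cloned sub-action (as variable bindings) per item
    collected_output = []  # values gathered into the for-each action's output
    for item in input_items:
        bindings = {enumerator: item}
        # pretend the cloned 'copy' action produced one output file
        result = item + ".copy"
        bindings[yield_to_output] = result
        clones.append(bindings)
        collected_output.append(result)
    return clones, collected_output

clones, all_output_files = expand_for_each(
    ["a.txt", "b.txt", "c.txt"], "i", "output_file")
```

Remember that Steep executes the clones in parallel; the deterministic order here is only an artifact of the sketch.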
3.3.3 Parameters

This data model represents inputs and generic parameters of execute actions.

Property | Type | Description
id (required) | string | The ID of the parameter as defined in the service metadata
var (required) | string | The ID of a variable that holds the value for this parameter
Example (YAML):
id: input
var: i
Example (JSON):
{
  "id": "input",
  "var": "i"
}
3.3.4 Output parameters

Output parameters of execute actions have additional properties compared to inputs.

Property | Type | Description
id (required) | string | The ID of the parameter as defined in the service metadata
var (required) | string | The ID of a variable to which Steep will assign the generated name of the output file. This variable can then be used, for example, as an input parameter of a subsequent action.
prefix (optional) | string | An optional string to prepend to the generated name of the output file. For example, if Steep generates the name "name123abc" and the prefix is "my/dir/", the output filename will be "my/dir/name123abc". Note that the prefix must end with a slash if you want to create a directory. The output filename will be relative to the configured temporary directory or output directory (depending on the store property). You may even specify an absolute path: if the generated name is "name456fgh" and the prefix is "/absolute/dir/", the output filename will be "/absolute/dir/name456fgh".
store (optional) | boolean | If this property is true, Steep will generate an output filename that is relative to the configured output directory instead of the temporary directory. The default value is false.
Example (YAML):
id: output
var: o
prefix: some_directory/
store: false
Example (JSON):
{
  "id": "output",
  "var": "o",
  "prefix": "some_directory/",
  "store": false
}
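To make the interplay of prefix and store concrete, here is a hedged Python sketch of how the final output path could be derived. The directory constants are assumptions for illustration, not Steep's actual configuration defaults.

```python
import posixpath

TMP_DIR = "/tmp/steep"   # assumed temporary directory (configurable in Steep)
OUT_DIR = "/data/out"    # assumed configured output directory

def output_path(generated_name, prefix="", store=False):
    name = prefix + generated_name        # the prefix is prepended verbatim
    if posixpath.isabs(name):             # an absolute prefix is used as-is
        return name
    base = OUT_DIR if store else TMP_DIR  # store=true switches the base dir
    return posixpath.join(base, name)
```

Note how a prefix ending in a slash effectively places the output file in a (sub-)directory under the chosen base directory.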

3.4 Process chains

As described above, Steep transforms a workflow into one or more process chains. A process chain is a sequential list of instructions that will be sent to Steep's remote agents to execute processing services in a distributed environment.

Property | Type | Description
id (required) | string | Unique process chain identifier
executables (required) | array | A list of executable objects that describe which processing services should be called and with which arguments
submissionId (required) | string | The ID of the submission to which this process chain belongs
startTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the process chain execution was started. May be null if the execution has not started yet.
endTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the process chain execution finished. May be null if the execution has not finished yet.
status (required) | string | The current status of the process chain
requiredCapabilities (optional) | array | A set of strings specifying capabilities a host system must provide to be able to execute this process chain. See also setups.
results (optional) | object | If status is SUCCESS, this property contains the list of process chain result files grouped by their output variable ID. Otherwise, it is null.
estimatedProgress (optional) | number | A floating point number between 0.0 (0%) and 1.0 (100%) indicating the current execution progress of this process chain. This property will only be provided if the process chain is currently being executed (i.e. if its status equals RUNNING) and if a progress could actually be estimated. Note that the value is an estimation based on various factors and does not have to represent the real progress. More precise values can be calculated with a progress estimator plugin. Sometimes, progress cannot be estimated at all. In this case, the value will be null.
errorMessage (optional) | string | If status is ERROR, this property contains a human-readable error message. Otherwise, it is null.
Example (YAML):
id: akpm646jjigral4cdyyq
submissionId: akpm6yojjigral4cdxgq
startTime: '2020-05-18T08:44:19.221456Z'
endTime: '2020-05-18T08:44:19.446437Z'
status: SUCCESS
requiredCapabilities:
  - nodejs
executables:
  - id: Count Down
    path: ./countdown.js
    runtime: other
    arguments:
      - id: input
        type: input
        dataType: file
        variable:
          id: input_file
          value: input.txt
      - id: output
        type: output
        dataType: fileOrEmptyList
        variable:
          id: output_file
          value: output.txt
    runtimeArgs: []
results:
  output_file:
    - output.txt
Example (JSON):
{
  "id": "akpm646jjigral4cdyyq",
  "submissionId": "akpm6yojjigral4cdxgq",
  "startTime": "2020-05-18T08:44:19.221456Z",
  "endTime": "2020-05-18T08:44:19.446437Z",
  "status": "SUCCESS",
  "requiredCapabilities": ["nodejs"],
  "executables": [{
    "id": "Count Down",
    "path": "./countdown.js",
    "runtime": "other",
    "arguments": [{
      "id": "input",
      "type": "input",
      "dataType": "file",
      "variable": {
        "id": "input_file",
        "value": "input.txt"
      }
    }, {
      "id": "output",
      "type": "output",
      "dataType": "fileOrEmptyList",
      "variable": {
        "id": "output_file",
        "value": "output.txt"
      }
    }],
    "runtimeArgs": []
  }],
  "results": {
    "output_file": ["output.txt"]
  }
}
3.4.1 Process chain status

The following table shows the statuses a process chain can have:

Status | Description
REGISTERED | The process chain has been created but execution has not started yet
RUNNING | The process chain is currently being executed
CANCELLED | The execution of the process chain was cancelled
SUCCESS | The process chain was executed successfully
ERROR | The execution of the process chain failed

3.5 Executables

An executable is part of a process chain. It describes how a processing service should be executed and with which parameters.

Property | Type | Description
id (required) | string | An identifier (does not have to be unique; typically refers to the name of the service to be executed)
path (required) | string | The path to the binary of the service to be executed. This property is specific to the runtime. For example, for the docker runtime, this property refers to the Docker image.
arguments (required) | array | A list of arguments to pass to the service. May be empty.
runtime (required) | string | The name of the runtime that will execute the service. Built-in runtimes are currently other (for any service that is executable on the target system) and docker for Docker containers. More runtimes can be added through plugins.
runtimeArgs (optional) | array | A list of arguments to pass to the runtime. May be empty.
serviceId (optional) | string | The ID of the processing service to be executed. May be null if the executable does not refer to a service.
Example (YAML):
id: Count Down
path: 'my_docker_image:latest'
runtime: docker
arguments:
  - id: input
    type: input
    dataType: file
    variable:
      id: input_file
      value: /data/input.txt
  - id: output
    type: output
    dataType: directory
    variable:
      id: output_file
      value: /data/output
  - id: arg1
    type: input
    dataType: boolean
    label: '--foobar'
    variable:
      id: akqcqqoedcsaoescyhga
      value: 'true'
runtimeArgs:
  - id: akqcqqoedcsaoescyhgq
    type: input
    dataType: string
    label: '-v'
    variable:
      id: data_mount
      value: '/data:/data'
Example (JSON):
{
  "id": "Count Down",
  "path": "my_docker_image:latest",
  "runtime": "docker",
  "arguments": [{
    "id": "input",
    "type": "input",
    "dataType": "file",
    "variable": {
      "id": "input_file",
      "value": "/data/input.txt"
    }
  }, {
    "id": "output",
    "type": "output",
    "dataType": "directory",
    "variable": {
      "id": "output_file",
      "value": "/data/output"
    }
  }, {
    "id": "arg1",
    "type": "input",
    "dataType": "boolean",
    "label": "--foobar",
    "variable": {
      "id": "akqcqqoedcsaoescyhga",
      "value": "true"
    }
  }],
  "runtimeArgs": [{
    "id": "akqcqqoedcsaoescyhgq",
    "type": "input",
    "dataType": "string",
    "label": "-v",
    "variable": {
      "id": "data_mount",
      "value": "/data:/data"
    }
  }]
}
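As a purely illustrative sketch (not Steep's real runtime logic), each argument contributes its optional label followed by its variable's value when an executable is rendered into a command line:

```python
# Illustrative only: render an executable's arguments into an argv list.
# Real Steep runtimes do more (e.g. wrapping the call in 'docker run'
# together with the runtimeArgs).
def to_argv(executable):
    argv = [executable["path"]]
    for arg in executable["arguments"]:
        label = arg.get("label")
        if label:                         # labels like '--foobar' come first
            argv.append(label)
        argv.append(arg["variable"]["value"])
    return argv

# trimmed-down version of the example executable above
example = {
    "path": "my_docker_image:latest",
    "arguments": [
        {"id": "input", "variable": {"value": "/data/input.txt"}},
        {"id": "output", "variable": {"value": "/data/output"}},
        {"id": "arg1", "label": "--foobar", "variable": {"value": "true"}},
    ],
}
```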
3.5.1 Arguments

An argument is part of an executable.

Property | Type | Description
id (required) | string | An argument identifier
label (optional) | string | An optional label to use when the argument is passed to the service (e.g. --input)
variable (required) | object | A variable that holds the value of this argument
type (required) | string | The type of this argument. Valid values: input, output
dataType (required) | string | The type of the argument value. If this property is directory, Steep will create a new directory for the service's output and recursively search it for result files after the service has been executed. Otherwise, this property can be an arbitrary string. New data types with special handling can be added through output adapter plugins.
Example (YAML):
id: akqcqqoedcsaoescyhgq
type: input
dataType: string
label: '-v'
variable:
  id: data_mount
  value: '/data:/data'
Example (JSON):
{
  "id": "akqcqqoedcsaoescyhgq",
  "type": "input",
  "dataType": "string",
  "label": "-v",
  "variable": {
    "id": "data_mount",
    "value": "/data:/data"
  }
}
3.5.2 Argument variables

An argument variable holds the value of an argument.

Property | Type | Description
id (required) | string | The variable's unique identifier
value (required) | string | The variable's value
Example (YAML):
id: data_mount
value: '/data:/data'
Example (JSON):
{
  "id": "data_mount",
  "value": "/data:/data"
}

3.6 Submissions

A submission is created when you submit a workflow through the /workflows endpoint. It contains information about the workflow execution, such as the start and end time, as well as the current status.

Property | Type | Description
id (required) | string | Unique submission identifier
workflow (required) | object | The submitted workflow
startTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the workflow execution was started. May be null if the execution has not started yet.
endTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the workflow execution finished. May be null if the execution has not finished yet.
status (required) | string | The current status of the submission
requiredCapabilities | array | A set of strings specifying capabilities a host system must provide to be able to execute this workflow. See also setups.
runningProcessChains (required) | number | The number of process chains currently being executed
cancelledProcessChains (required) | number | The number of process chains that have been cancelled
succeededProcessChains (required) | number | The number of process chains that have finished successfully
failedProcessChains (required) | number | The number of process chains whose execution has failed
totalProcessChains (required) | number | The current total number of process chains in this submission. May increase during execution when new process chains are generated.
results (optional) | object | If status is SUCCESS or PARTIAL_SUCCESS, this property contains the list of workflow result files grouped by their output variable ID. Otherwise, it is null.
errorMessage (optional) | string | If status is ERROR, this property contains a human-readable error message. Otherwise, it is null.
Example (YAML):
id: aiq7eios7ubxglkcqx5a
workflow:
  api: 4.1.0
  vars:
    - id: myInputFile
      value: /data/input.txt
    - id: myOutputFile
  actions:
    - type: execute
      service: cp
      inputs:
        - id: input_file
          var: myInputFile
      outputs:
        - id: output_file
          var: myOutputFile
          store: true
startTime: '2020-02-13T15:38:58.719382Z'
endTime: '2020-02-13T15:39:00.807715Z'
status: SUCCESS
runningProcessChains: 0
cancelledProcessChains: 0
succeededProcessChains: 1
failedProcessChains: 0
totalProcessChains: 1
results:
  myOutputFile:
    - /data/out/aiq7eios7ubxglkcqx5a/aiq7hygs7ubxglkcrf5a
Example (JSON):
{
  "id": "aiq7eios7ubxglkcqx5a",
  "workflow": {
    "api": "4.1.0",
    "vars": [{
      "id": "myInputFile",
      "value": "/data/input.txt"
    }, {
      "id": "myOutputFile"
    }],
    "actions": [{
      "type": "execute",
      "service": "cp",
      "inputs": [{
        "id": "input_file",
        "var": "myInputFile"
      }],
      "outputs": [{
        "id": "output_file",
        "var": "myOutputFile",
        "store": true
      }]
    }]
  },
  "startTime": "2020-02-13T15:38:58.719382Z",
  "endTime": "2020-02-13T15:39:00.807715Z",
  "status": "SUCCESS",
  "runningProcessChains": 0,
  "cancelledProcessChains": 0,
  "succeededProcessChains": 1,
  "failedProcessChains": 0,
  "totalProcessChains": 1,
  "results": {
    "myOutputFile": [
      "/data/out/aiq7eios7ubxglkcqx5a/aiq7hygs7ubxglkcrf5a"
    ]
  }
}
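The process chain counters allow a client to derive a coarse overall progress figure for a submission. The following is a client-side Python sketch, not an official Steep field:

```python
def submission_progress(submission):
    # chains that reached a terminal state, in whatever way
    done = (submission["succeededProcessChains"]
            + submission["failedProcessChains"]
            + submission["cancelledProcessChains"])
    total = submission["totalProcessChains"]
    # totalProcessChains may still grow while the workflow runs, so this
    # ratio can decrease again and is only a rough estimate
    return done / total if total else 0.0
```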
3.6.1 Submission status

The following table shows the statuses a submission can have:

Status | Description
ACCEPTED | The submission has been accepted by Steep but execution has not started yet
RUNNING | The submission is currently being executed
CANCELLED | The submission was cancelled
SUCCESS | The execution of the submission finished successfully
PARTIAL_SUCCESS | The submission was executed completely but one or more process chains failed
ERROR | The execution of the submission failed

3.8 Agents

An agent represents an instance of Steep that can execute process chains.

Property | Type | Description
id (required) | string | A unique agent identifier
available (required) | boolean | true if the agent is currently idle and new process chains can be assigned to it, false if it is busy executing a process chain
capabilities (required) | array | A set of strings specifying capabilities the agent provides. See also setups.
startTime (required) | string | An ISO 8601 timestamp denoting the date and time when the agent was started
stateChangedTime (required) | string | An ISO 8601 timestamp denoting the date and time when the value of the available property changed from true to false (i.e. when the agent became busy) or from false to true (when it became available)
Example (YAML):
id: akuxryerojbw7mnvovaa
available: true
capabilities:
  - docker
startTime: '2020-05-26T10:50:02.001998Z'
stateChangedTime: '2020-05-26T11:06:52.367121Z'
Example (JSON):
{
  "id": "akuxryerojbw7mnvovaa",
  "available": true,
  "capabilities": ["docker"],
  "startTime": "2020-05-26T10:50:02.001998Z",
  "stateChangedTime": "2020-05-26T11:06:52.367121Z"
}

3.9 VMs

This data model describes virtual machines created by Steep's cloud manager.

Property | Type | Description
id (required) | string | A unique VM identifier
externalId (optional) | string | An identifier generated by the cloud platform
ipAddress (optional) | string | The VM's IP address
setup (required) | object | The setup used to create this VM
creationTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the VM was created. This property is null if the VM has not been created yet.
agentJoinTime (optional) | string | An ISO 8601 timestamp denoting the date and time when a Steep agent was deployed to the VM and joined the cluster. This property is null if the agent has not joined the cluster yet.
destructionTime (optional) | string | An ISO 8601 timestamp denoting the date and time when the VM was destroyed. This property is null if the VM has not been destroyed yet.
status (required) | string | The status of the VM
reason (optional) | string | The reason why the VM has the current status (e.g. an error message if it has the ERROR status, or a simple message indicating why it has been DESTROYED)
3.9.1 VM status

The following table shows the statuses a VM can have:

Status | Description
CREATING | The VM is currently being created
PROVISIONING | The VM has been created and is currently being provisioned (i.e. provisioning scripts defined in the VM's setup are being executed and the Steep agent is being deployed)
RUNNING | The VM has been created and provisioned successfully. It is currently running and registered as a remote agent.
LEFT | The remote agent on this VM has left. The VM will be destroyed eventually.
DESTROYING | The VM is currently being destroyed
DESTROYED | The VM has been destroyed
ERROR | The VM could not be created or provisioned, or it failed otherwise. See the VM's reason property for more information.

3.10 Setups

A setup describes how a virtual machine (VM) should be created by Steep's cloud manager.

Property | Type | Description
id (required) | string | A unique setup identifier
flavor (required) | string | The flavor of the new VM
imageName (required) | string | The name of the VM image to deploy
availabilityZone (required) | string | The availability zone in which to create the VM
blockDeviceSizeGb (required) | number | The size of the VM's block device in gigabytes
blockDeviceVolumeType (optional) | string | An optional type of the VM's block device. By default, the type will be selected automatically.
minVMs (optional) | number | An optional minimum number of VMs to create with this setup. The default value is 0.
maxVMs (required) | number | The maximum number of VMs to create with this setup
maxCreateConcurrent (optional) | number | The maximum number of VMs to create and provision concurrently. The default value is 1.
provisioningScripts (optional) | array | An optional list of paths to scripts that should be executed on the VM after it has been created
providedCapabilities (optional) | array | An optional set of strings specifying capabilities that VMs with this setup will have
sshUsername (optional) | string | An optional username for the SSH connection to the created VM. Overrides the main configuration item steep.cloud.ssh.username if it is defined.
Example (YAML):
id: default
flavor: 7d217779-4d7b-4689-8a40-c12a377b946d
imageName: Ubuntu 18.04
availabilityZone: nova
blockDeviceSizeGb: 50
minVMs: 0
maxVMs: 4
provisioningScripts:
  - conf/setups/default/01_docker.sh
  - conf/setups/default/02_steep.sh
providedCapabilities:
  - docker
Example (JSON):
{
  "id": "default",
  "flavor": "7d217779-4d7b-4689-8a40-c12a377b946d",
  "imageName": "Ubuntu 18.04",
  "availabilityZone": "nova",
  "blockDeviceSizeGb": 50,
  "minVMs": 0,
  "maxVMs": 4,
  "provisioningScripts": [
    "conf/setups/default/01_docker.sh",
    "conf/setups/default/02_steep.sh"
  ],
  "providedCapabilities": ["docker"]
}

3.11 Pool agent parameters

Steep's cloud manager component is able to create virtual machines and deploy remote agent instances to them. The cloud manager keeps every remote agent it creates in a pool. Use pool agent parameters to define a minimum and maximum number of instances per provided capability set.

Property | Type | Description
capabilities (required) | array | A set of strings specifying capabilities that a remote agent must provide for these parameters to apply to it
min (optional) | number | An optional minimum number of remote agents that the cloud manager should create with the given capabilities
max (optional) | number | An optional maximum number of remote agents that the cloud manager is allowed to create with the given capabilities
Example (YAML):
capabilities:
  - docker
  - python
min: 1
max: 5
Example (JSON):
{
  "capabilities": ["docker", "python"],
  "min": 1,
  "max": 5
}

4 HTTP endpoints

The main way to communicate with Steep (i.e. to submit workflows, monitor progress, fetch metadata, etc.) is through its HTTP interface. In this section, we describe all HTTP endpoints. By default, Steep listens for incoming connections on port 8080.

4.1 GET information

Get information about Steep. This includes:

  • Steep's version number
  • A build ID
  • The SHA of the Git commit from which the build was created
  • A timestamp of the moment when the build was created

Resource URL
/
Parameters
None
Status codes
200 | The operation was successful
Example request
GET / HTTP/1.1
Example response
HTTP/1.1 200 OK
content-type: application/json
content-length: 127

{
  "build": "83",
  "commit": "2e54898b3e15da0015a1831bf6f6abc94a43eaee",
  "timestamp": 1590049676916,
  "version": "5.4.0"
}

4.2 GET submissions

Get information about all submissions in the database. The response is a JSON array consisting of submission objects without the properties workflow, results, and errorMessage. To get the complete details of a submission, use the GET submission by ID endpoint.

The submissions are returned in the order in which they were added to the database, with the newest ones at the top.

Resource URL
/workflows
Parameters
size (optional) | The maximum number of submissions to return. The default value is 10.
offset (optional) | The offset of the first submission to return. The default value is 0.
status (optional) | If this parameter is defined, Steep will only return submissions with the given status. Otherwise, it will return all submissions from the database. See the list of submission statuses for valid values.
Response headers
x-page-size | The size of the current page (i.e. the maximum number of submission objects returned). See the size request parameter.
x-page-offset | The offset of the first submission returned. See the offset request parameter.
x-page-total | The total number of submissions in the database matching the given request parameters.
Status codes
200 | The operation was successful
400 | One of the parameters was invalid. See the response body for an error message.
Example request
GET /workflows HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 662
content-type: application/json
x-page-offset: 0
x-page-size: 10
x-page-total: 2

[
  {
    "id": "akpm6yojjigral4cdxgq",
    "startTime": "2020-05-18T08:44:01.045710Z",
    "endTime": "2020-05-18T08:44:21.218425Z",
    "status": "SUCCESS",
    "requiredCapabilities": [],
    "runningProcessChains": 0,
    "cancelledProcessChains": 0,
    "succeededProcessChains": 10,
    "failedProcessChains": 0,
    "totalProcessChains": 10
  },
  {
    "id": "akttc5kv575splk3ameq",
    "startTime": "2020-05-24T17:20:37.343072Z",
    "status": "RUNNING",
    "requiredCapabilities": [],
    "runningProcessChains": 1,
    "cancelledProcessChains": 0,
    "succeededProcessChains": 391,
    "failedProcessChains": 0,
    "totalProcessChains": 1000
  }
]
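A client could compose the pagination and filter parameters like this (a Python sketch; the base URL assumes Steep's default port from the examples):

```python
from urllib.parse import urlencode

def submissions_url(size=10, offset=0, status=None,
                    base="http://localhost:8080"):
    # size and offset default to the documented defaults; status is optional
    params = {"size": size, "offset": offset}
    if status is not None:
        params["status"] = status
    return base + "/workflows?" + urlencode(params)
```

For example, submissions_url(size=5, offset=10, status="RUNNING") yields the URL for the second page of five currently running submissions.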

4.3 GET submission by ID

Get details about a single submission from the database.

Resource URL
/workflows/:id
Parameters
id | The ID of the submission to return
Status codes
200 | The operation was successful
404 | The submission was not found
Example request
GET /workflows/akpm6yojjigral4cdxgq HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 348
content-type: application/json

{
  "id": "akpm6yojjigral4cdxgq",
  "startTime": "2020-05-18T08:44:01.045710Z",
  "endTime": "2020-05-18T08:44:21.218425Z",
  "status": "SUCCESS",
  "requiredCapabilities": [],
  "runningProcessChains": 0,
  "cancelledProcessChains": 0,
  "succeededProcessChains": 10,
  "failedProcessChains": 0,
  "totalProcessChains": 10,
  "workflow": {
    "api": "4.1.0",
    "vars": [
      …
    ],
    "actions": [
      …
    ]
  }
}

4.4 PUT submission

Update a submission. The request body is a JSON object with the submission properties to update. At the moment, only the status property can be updated.

If the operation was successful, the response body contains the updated submission without the properties workflow, results, and errorMessage.

Note: you can use this endpoint to cancel the execution of a submission (see the example below).

Resource URL
/workflows/:id
Parameters
id | The ID of the submission to update
Status codes
200 | The operation was successful
400 | The request body was invalid
404 | The submission was not found
Example request
PUT /workflows/akujvtkv575splk3saqa HTTP/1.1
Content-Length: 28
Content-Type: application/json

{
  "status": "CANCELLED"
}
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 168
content-type: application/json

{
  "id": "akujvtkv575splk3saqa",
  "startTime": "2020-05-25T19:02:21.610396Z",
  "endTime": "2020-05-25T19:02:33.414032Z",
  "status": "CANCELLED",
  "runningProcessChains": 0,
  "cancelledProcessChains": 314,
  "succeededProcessChains": 686,
  "failedProcessChains": 0,
  "totalProcessChains": 1000
}
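The cancellation request above can also be built with Python's standard library. This is a sketch: the request object is constructed but not sent here.

```python
import json
import urllib.request

def cancel_submission(submission_id, base="http://localhost:8080"):
    # PUT /workflows/:id with {"status": "CANCELLED"} cancels the submission
    body = json.dumps({"status": "CANCELLED"}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/workflows/{submission_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = cancel_submission("akujvtkv575splk3saqa")
# send with: urllib.request.urlopen(req)
```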

4.5 POST workflow

Create a new submission. The request body contains the workflow to execute.

If the operation was successful, the response body contains the created submission.

Resource URL
/workflows
Status codes
202 | The workflow has been accepted (i.e. stored in the database) and is scheduled for execution.
400 | The posted workflow was invalid. See the response body for more information.
Example request
POST /workflows HTTP/1.1
Content-Length: 231
Content-Type: application/json

{
  "api": "3.0.0",
  "vars": [{
    "id": "sleep_seconds",
    "value": 3
  }],
  "actions": [{
    "type": "execute",
    "service": "sleep",
    "inputs": [{
      "id": "seconds",
      "var": "sleep_seconds"
    }]
  }]
}
Example response
HTTP/1.1 202 Accepted
content-encoding: gzip
content-length: 374
content-type: application/json

{
  "id": "akukkcsv575splk3v2ma",
  "status": "ACCEPTED",
  "workflow": {
    "api": "3.0.0",
    "vars": [{
      "id": "sleep_seconds",
      "value": 3
    }],
    "actions": [{
      "type": "execute",
      "service": "sleep",
      "inputs": [],
      "outputs": [],
      "inputs": [{
        "id": "seconds",
        "var": "sleep_seconds"
      }]
    }]
  }
}
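The workflow from the example request can also be assembled programmatically before posting it. A Python sketch (the payload mirrors the example above; the helper name is our own):

```python
import json

def sleep_workflow(seconds=3):
    # builds the request body for POST /workflows
    return {
        "api": "3.0.0",
        "vars": [{"id": "sleep_seconds", "value": seconds}],
        "actions": [{
            "type": "execute",
            "service": "sleep",
            "inputs": [{"id": "seconds", "var": "sleep_seconds"}],
        }],
    }

payload = json.dumps(sleep_workflow())
```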

4.6 GET process chains

Get information about all process chains in the database. The response is a JSON array consisting of process chain objects without the properties executables and results. To get the complete details of a process chain, use the GET process chain by ID endpoint.

The process chains are returned in the order in which they were added to the database, with the newest ones at the top.

Resource URL
/processchains
Parameters
size (optional) | The maximum number of process chains to return. The default value is 10.
offset (optional) | The offset of the first process chain to return. The default value is 0.
submissionId (optional) | If this parameter is defined, Steep will only return process chains from the submission with the given ID. Otherwise, it will return process chains from all submissions. If there is no submission with the given ID, the result will be an empty array.
status (optional) | If this parameter is defined, Steep will only return process chains with the given status. Otherwise, it will return all process chains from the database. See the list of process chain statuses for valid values.
Response headers
x-page-size | The size of the current page (i.e. the maximum number of process chain objects returned). See the size request parameter.
x-page-offset | The offset of the first process chain returned. See the offset request parameter.
x-page-total | The total number of process chains in the database matching the given request parameters.
Status codes
200 | The operation was successful
400 | One of the parameters was invalid. See the response body for an error message.
Example request
GET /processchains HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 262
content-type: application/json
x-page-offset: 0
x-page-size: 10
x-page-total: 7026

[
  {
    "id": "akukkcsv575splk3v2na",
    "submissionId": "akukkcsv575splk3v2ma",
    "startTime": "2020-05-25T19:46:02.532829Z",
    "endTime": "2020-05-25T19:46:05.546807Z",
    "status": "SUCCESS",
    "requiredCapabilities": []
  },
  {
    "id": "akujvtkv575splk3tppq",
    "submissionId": "akujvtkv575splk3saqa",
    "status": "CANCELLED",
    "requiredCapabilities": []
  },
  …
]

4.7 GET process chain by ID

Get details about a single process chain from the database.

Resource URL
/processchains/:id
Parameters
id | The ID of the process chain to return
Status codes
200 | The operation was successful
404 | The process chain was not found
Example request
GET /processchains/akukkcsv575splk3v2na HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 535
content-type: application/json

{
  "id": "akukkcsv575splk3v2na",
  "submissionId": "akukkcsv575splk3v2ma",
  "startTime": "2020-05-25T19:46:02.532829Z",
  "endTime": "2020-05-25T19:46:05.546807Z",
  "status": "SUCCESS",
  "requiredCapabilities": [],
  "results": {},
  "executables": [{
    "id": "sleep",
    "path": "sleep",
    "runtime": "other",
    "runtimeArgs": [],
    "arguments": [{
      "id": "seconds",
      "type": "input",
      "dataType": "integer",
      "variable": {
        "id": "sleep_seconds",
        "value": "3"
      }
    }]
  }]
}

4.8 PUT process chain

Update a process chain. The request body is a JSON object with the process chain properties to update. At the moment, only the status property can be updated.

If the operation was successful, the response body contains the updated process chain without the properties executables and results.

Note: you can use this endpoint to cancel the execution of a process chain (see the example below).

Resource URL
/processchains/:id
Parameters
id | The ID of the process chain to update
Status codes
200 | The operation was successful
400 | The request body was invalid
404 | The process chain was not found
Example request
PUT /processchains/akuxzp4rojbw7mnvovcq HTTP/1.1
Content-Length: 28
Content-Type: application/json

{
  "status": "CANCELLED"
}
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 222
content-type: application/json

{
  "id": "akuxzp4rojbw7mnvovcq",
  "submissionId": "akuxzp4rojbw7mnvovbq",
  "startTime": "2020-05-26T11:06:24.055225Z",
  "endTime": "2020-05-26T11:06:52.367194Z",
  "status": "CANCELLED",
  "requiredCapabilities": []
}

4.9 GET agents

Get information about all agents currently connected to the cluster. To get details about a single agent, use the GET agent by ID endpoint.

Resource URL
/agents
Parameters
None
Status codes
200 | The operation was successful
Example request
GET /agents HTTP/1.1
Example response
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 195
content-type: application/json

[
  {
    "id": "akuxryerojbw7mnvovaa",
    "available": false,
    "capabilities": [],
    "startTime": "2020-05-26T10:50:02.001998Z",
    "stateChangedTime": "2020-05-26T11:06:52.367121Z"
  },
  {
    "id": "akvn7r3szw5wiztrnotq",
    "available": true,
    "capabilities": [],
    "startTime": "2020-05-27T12:21:24.548640Z",
    "stateChangedTime": "2020-05-27T12:21:24.548640Z"
  }
]

4.10 GET agent by ID

Get details about a single agent.

Resource URL
/agents/:id
Parameters
id | The ID of the agent to return
Status codes
200 | The operation was successful
404 | The agent was not found
Example request
GET /agents/akuxryerojbw7mnvovaa HTTP/1.1
Ex­am­ple re­sponse
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 177
content-type: application/json

{
  "id": "akuxryerojbw7mnvovaa",
  "available": false,
  "capabilities": [],
  "startTime": "2020-05-26T10:50:02.001998Z",
  "stateChangedTime": "2020-05-26T11:06:52.367121Z"
}

4.11 GET VMs

Get in­for­ma­tion about all VMs in the data­base. To get de­tails about a sin­gle VM, use the GET VM by ID end­point.

The VMs are re­turned in the order in which they were added to the data­base with the newest ones at the top.

Re­source URL
/vms
Pa­ra­me­ters
size (optional): The maximum number of VMs to return. The default value is 10.
offset (optional): The offset of the first VM to return. The default value is 0.
status (optional): If this parameter is defined, Steep will only return VMs with the given status. Otherwise, it will return all VMs from the database. See the list of VM statuses for valid values.
Re­sponse head­ers
x-page-size: The size of the current page (i.e. the maximum number of VM objects returned). See the size request parameter.
x-page-offset: The offset of the first VM returned. See the offset request parameter.
x-page-total: The total number of VMs in the database matching the given request parameters.
Sta­tus codes
200: The operation was successful
400: One of the parameters was invalid. See the response body for the error message.
Ex­am­ple re­quest
GET /vms HTTP/1.1
Ex­am­ple re­sponse
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 2402
content-type: application/json
x-page-offset: 0
x-page-size: 10
x-page-total: 614

[
  {
    "id": "akvn5rmvrozqzj5k3n7a",
    "externalId": "cc6bb115-5852-4646-87c0-d61a9e275722",
    "ipAddress": "10.90.5.10",
    "creationTime": "2020-05-27T12:17:01.861596Z",
    "agentJoinTime": "2020-05-27T12:18:27.957192Z",
    "status": "LEFT",
    "setup": {
      "id": "default",
      "flavor": "7d217779-4d7b-4689-8a40-c12a377b946d",
      "imageName": "Ubuntu 18.04",
      "availabilityZone": "nova",
      "blockDeviceSizeGb": 50,
      "minVMs": 0,
      "maxVMs": 4,
      "provisioningScripts": [
        "conf/setups/default/01_docker.sh",
        "conf/setups/default/02_steep.sh"
      ],
      "providedCapabilities": ["docker"]
    }
  },
  {
    "id": "akvnmkuvrozqzj5k3mza",
    "externalId": "f9ecfb9c-d0a1-45c9-87fc-3595bebc85c6",
    "ipAddress": "10.90.5.24",
    "creationTime": "2020-05-27T11:40:19.142991Z",
    "agentJoinTime": "2020-05-27T11:41:42.349354Z",
    "destructionTime": "2020-05-27T11:50:58.961455Z",
    "status": "DESTROYED",
    "reason": "Agent has left the cluster",
    "setup": {
      "id": "default",
      "flavor": "7d217779-4d7b-4689-8a40-c12a377b946d",
      "imageName": "Ubuntu 18.04",
      "availabilityZone": "nova",
      "blockDeviceSizeGb": 50,
      "minVMs": 0,
      "maxVMs": 4,
      "provisioningScripts": [
        "conf/setups/default/01_docker.sh",
        "conf/setups/default/02_steep.sh"
      ],
      "providedCapabilities": ["docker"]
    }
  },
  …
]
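The paging headers can be used to retrieve all VMs page by page. The following sketch shows the loop; `fetch_page` and `fetch_all_vms` are our own names, and `fetch_page` is a stand-in for a real HTTP GET on /vms that here serves from an in-memory list so the example is self-contained:

```python
# Sketch: retrieve all VMs page by page using the x-page-* headers.
# `fetch_page` simulates `GET /vms?offset=...&size=...` from an
# in-memory list instead of calling a real Steep instance.

ALL_VMS = [{"id": "vm%d" % i} for i in range(25)]

def fetch_page(offset, size):
    """Return one page of VMs plus the paging response headers."""
    headers = {
        "x-page-offset": offset,
        "x-page-size": size,
        "x-page-total": len(ALL_VMS),
    }
    return ALL_VMS[offset:offset + size], headers

def fetch_all_vms(page_size=10):
    """Follow the paging headers until all VMs have been collected."""
    vms = []
    offset = 0
    while True:
        page, headers = fetch_page(offset, page_size)
        vms.extend(page)
        offset += headers["x-page-size"]
        if offset >= headers["x-page-total"]:
            break
    return vms

print(len(fetch_all_vms()))  # 25
```

Against a real instance, the same loop applies: increase offset by x-page-size until it reaches x-page-total.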

4.12 GET VM by ID

Get de­tails about a sin­gle VM from the data­base.

Re­source URL
/vms/:id
Pa­ra­me­ters
id: The ID of the VM to return
Sta­tus codes
200: The operation was successful
404: The VM was not found
Ex­am­ple re­quest
GET /vms/akvn5rmvrozqzj5k3n7a HTTP/1.1
Ex­am­ple re­sponse
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 617
content-type: application/json

{
  "id": "akvn5rmvrozqzj5k3n7a",
  "externalId": "cc6bb115-5852-4646-87c0-d61a9e275722",
  "ipAddress": "10.90.5.10",
  "creationTime": "2020-05-27T12:17:01.861596Z",
  "agentJoinTime": "2020-05-27T12:18:27.957192Z",
  "status": "LEFT",
  "setup": {
    "id": "default",
    "flavor": "7d217779-4d7b-4689-8a40-c12a377b946d",
    "imageName": "Ubuntu 18.04",
    "availabilityZone": "nova",
    "blockDeviceSizeGb": 50,
    "minVMs": 0,
    "maxVMs": 4,
    "provisioningScripts": [
      "conf/setups/default/01_docker.sh",
      "conf/setups/default/02_steep.sh"
    ],
    "providedCapabilities": ["docker"]
  }
}

4.13 GET services

Get in­for­ma­tion about all con­fig­ured ser­vice meta­data. To get meta­data of a sin­gle ser­vice, use the GET ser­vice by ID end­point.

Re­source URL
/services
Pa­ra­me­ters
None
Sta­tus codes
200: The operation was successful
Ex­am­ple re­quest
GET /services HTTP/1.1
Ex­am­ple re­sponse
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 2167
content-type: application/json

[
  {
    "id": "cp",
    "name": "cp",
    "description": "Copies files",
    "path": "cp",
    "runtime": "other",
    "parameters": [{
      "id": "no_overwrite",
      "name": "No overwrite",
      "description": "Do not overwrite existing file",
      "type": "input",
      "cardinality": "1..1",
      "data_type": "boolean",
      "default": false,
      "label": "-n"
    }, {
      "id": "input_file",
      "name": "Input file name",
      "description": "Input file name",
      "type": "input",
      "cardinality": "1..1",
      "data_type": "file"
    }, {
      "id": "output_file",
      "name": "Output file name",
      "description": "Output file name",
      "type": "output",
      "cardinality": "1..1",
      "data_type": "file"
    }],
    "runtime_args": [],
    "required_capabilities": []
  }, {
    "id": "sleep",
    "name": "sleep",
    "description": "sleeps for the given amount of seconds",
    "path": "sleep",
    "runtime": "other",
    "parameters": [{
      "id": "seconds",
      "name": "seconds to sleep",
      "description": "The number of seconds to sleep",
      "type": "input",
      "cardinality": "1..1",
      "data_type": "integer"
    }],
    "runtime_args": [],
    "required_capabilities": []
  },
  …
]

4.14 GET service by ID

Get con­fig­ured meta­data of a sin­gle ser­vice.

Re­source URL
/services/:id
Pa­ra­me­ters
id: The ID of the service to return
Sta­tus codes
200: The operation was successful
404: The service metadata was not found
Ex­am­ple re­quest
GET /services/sleep HTTP/1.1
Ex­am­ple re­sponse
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 401
content-type: application/json

{
  "id": "sleep",
  "name": "sleep",
  "description": "sleeps for the given amount of seconds",
  "path": "sleep",
  "runtime": "other",
  "parameters": [{
    "id": "seconds",
    "name": "seconds to sleep",
    "description": "The number of seconds to sleep",
    "type": "input",
    "cardinality": "1..1",
    "data_type": "integer"
  }],
  "runtime_args": [],
  "required_capabilities": []
}

4.15 GET Prometheus metrics

Steep can pro­vide met­rics to Prometheus. Be­sides sta­tis­tics about the Java Vir­tual Ma­chine that Steep is run­ning in, the fol­low­ing met­rics are in­cluded:

steep_remote_agents: The number of registered remote agents
steep_controller_process_chains: The number of process chains the controller is currently waiting for
Re­source URL
/metrics
Pa­ra­me­ters
None
Sta­tus codes
200: The operation was successful
Ex­am­ple re­quest
GET /metrics HTTP/1.1
Ex­am­ple re­sponse
HTTP/1.1 200 OK
content-type: text/plain
content-encoding: gzip
content-length: 1674

# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 2.1695392E8
jvm_memory_bytes_used{area="nonheap",} 1.46509968E8
…
# HELP steep_remote_agents Number of registered remote agents
# TYPE steep_remote_agents gauge
steep_remote_agents 1.0
…
# HELP steep_controller_process_chains Number of process chains the controller is waiting for
# TYPE steep_controller_process_chains gauge
steep_controller_process_chains 0.0
…
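To have Prometheus collect these metrics, point a scrape job at Steep's /metrics endpoint. A minimal prometheus.yml excerpt might look as follows (a sketch that assumes Steep listens on localhost:8080; Prometheus scrapes /metrics by default):

```yaml
# prometheus.yml (excerpt): scrape Steep's metrics endpoint
scrape_configs:
  - job_name: steep
    static_configs:
      - targets: ["localhost:8080"]
```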

5 Web-based user interface

Steep has a web-based user interface that allows you to monitor the execution of running workflows, process chains, agents, and VMs, as well as to browse the database contents.

Start Steep and visit any of the above-​mentioned HTTP end­points with your web browser to open the user in­ter­face.

6 Configuration

After you have down­loaded and ex­tracted Steep, you can find its con­fig­u­ra­tion files under the conf di­rec­tory. The fol­low­ing sec­tions de­scribe each of these files in de­tail.

6.1 steep.yaml

The file steep.​yaml con­tains the main con­fig­u­ra­tion of Steep. In this sec­tion, we de­scribe all con­fig­u­ra­tion keys and val­ues you can set.

Note that keys are spec­i­fied using the dot no­ta­tion. You can use them as they are given here or use YAML no­ta­tion in­stead. For ex­am­ple, the fol­low­ing con­fig­u­ra­tion item

steep.cluster.eventBus.publicPort: 41187

is iden­ti­cal to:

steep:
  cluster:
    eventBus:
      publicPort: 41187

You may over­ride items in your con­fig­u­ra­tion file with en­vi­ron­ment vari­ables. This is par­tic­u­larly use­ful if you are using Steep in­side a Docker con­tainer. The en­vi­ron­ment vari­ables use a slightly dif­fer­ent nam­ing scheme. All vari­ables are in cap­i­tal let­ters and dots are re­placed by un­der­scores. For ex­am­ple, the con­fig­u­ra­tion key steep.​http.​host be­comes STEEP_​HTTP_​HOST and steep.​cluster.​eventBus.​publicPort be­comes STEEP_​CLUSTER_​EVENTBUS_​PUBLICPORT. You may use YAML syn­tax to spec­ify en­vi­ron­ment vari­able val­ues. For ex­am­ple, the array steep.​agent.​capabilities can be spec­i­fied as fol­lows:

STEEP_AGENT_CAPABILITIES=["docker", "python"]
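The conversion between configuration keys and environment variable names is mechanical. The following sketch illustrates it (the function name config_key_to_env is ours, not part of Steep):

```python
def config_key_to_env(key: str) -> str:
    """Convert a dot-notation configuration key to the corresponding
    environment variable name: uppercase all letters and replace
    dots with underscores."""
    return key.upper().replace(".", "_")

print(config_key_to_env("steep.http.host"))
# STEEP_HTTP_HOST
print(config_key_to_env("steep.cluster.eventBus.publicPort"))
# STEEP_CLUSTER_EVENTBUS_PUBLICPORT
```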
6.1.1 General configuration

steep.​tmpPath

The path to a di­rec­tory where tem­po­rary files should be stored dur­ing pro­cess­ing. Steep gen­er­ates names for the out­puts of ex­e­cute ac­tions in a work­flow. If the store flag of an out­put pa­ra­me­ter is false (which is the de­fault), the gen­er­ated file­name will be rel­a­tive to this tem­po­rary di­rec­tory.

steep.​outPath

The path to a di­rec­tory where out­put files should be stored. This path will be used in­stead of steep.​tmpPath to gen­er­ate a file­name for an out­put pa­ra­me­ter if its store flag is true.

steep.​overrideConfigFile

The path to a file with additional configuration. The values from the overrideConfigFile are merged into the main configuration, so they override the default values. Note that configuration items in this file can still be overridden with environment variables. This configuration item is useful if you don't want to (or cannot) change the main configuration file but still want to set different configuration values, e.g. if you run Steep in a Docker container and bind-mount the overrideConfigFile as a volume.

steep.​services

The path to the con­fig­u­ra­tion files con­tain­ing ser­vice meta­data. Ei­ther a string point­ing to a sin­gle file, a glob pat­tern (e.g. **/​*.​yaml), or an array of files or glob pat­terns.

steep.​plugins

The path to the con­fig­u­ra­tion file(s) con­tain­ing plug­in de­scrip­tors. Ei­ther a string point­ing to a sin­gle file, a glob pat­tern (e.g. **/​*.​yaml), or an array of files or glob pat­terns.

6.1.2 Cluster settings

Use these configuration items to build up a cluster of Steep instances. Under the hood, Steep uses Vert.x and Hazelcast, so these configuration items are very similar to the ones found in those two frameworks. To build up a cluster, you need to configure an event bus connection and a cluster connection, which should use different ports. host typically refers to the machine your instance is running on, while publicHost or publicAddress specify the hostname or IP address that your Steep instance advertises in your network so that other instances can connect to it.

For more in­for­ma­tion, please read the doc­u­men­ta­tion of Vert.x and Hazel­cast.

steep.​cluster.​eventBus.​host

The IP ad­dress (or host­name) to bind the clus­tered event­bus to

De­fault: Au­to­mat­i­cally de­tected local net­work in­ter­face

steep.​cluster.​eventBus.​port

The port the clus­tered event­bus should lis­ten on

De­fault: A ran­dom port

steep.​cluster.​eventBus.​publicHost

The IP address (or hostname) the event bus uses to announce itself within the cluster

De­fault: Same as steep.​cluster.​eventBus.​host

steep.​cluster.​eventBus.​publicPort

The port that the event bus uses to announce itself within the cluster

De­fault: Same as steep.​cluster.​eventBus.​port

steep.​cluster.​hazelcast.​publicAddress

The IP address (or hostname) and port Hazelcast uses to announce itself within the cluster

steep.​cluster.​hazelcast.​port

The port that Hazel­cast should lis­ten on

steep.​cluster.​hazelcast.​interfaces

A list of IP ad­dress pat­terns spec­i­fy­ing valid in­ter­faces Hazel­cast should bind to

steep.​cluster.​hazelcast.​members

A list of IP ad­dresses (or host­names) of Hazel­cast clus­ter mem­bers

steep.​cluster.​hazelcast.​tcpEnabled

true if Hazel­cast should use TCP to con­nect to other in­stances, false if it should use mul­ti­cast

De­fault: false
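Putting these items together, a two-instance cluster connected via TCP could be configured as follows. This is only a sketch: the IP addresses and ports are placeholders that you need to adapt to your own network:

```yaml
# Sketch of a static two-node cluster (hypothetical addresses)
steep:
  cluster:
    eventBus:
      port: 41187
    hazelcast:
      port: 5701
      tcpEnabled: true
      members:
        - 10.0.0.1
        - 10.0.0.2
```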

6.1.3 HTTP configuration

steep.​http.​enabled

true if the HTTP in­ter­face should be en­abled

De­fault: true

steep.​http.​host

The host to bind the HTTP server to

De­fault: localhost

steep.​http.​port

The port the HTTP server should lis­ten on

De­fault: 8080

steep.​http.​postMaxSize

The max­i­mum size of HTTP POST bod­ies in bytes

De­fault: 1048576 (1 MB)

steep.​http.​basePath

The path where the HTTP end­points and the web-​based user in­ter­face should be mounted

De­fault: "" (empty string, i.e. no base path)

steep.​http.​cors.​enable

true if Cross-​Origin Re­source Shar­ing (CORS) should be en­abled

De­fault: false

steep.​http.​cors.​allowOrigin

A reg­u­lar ex­pres­sion spec­i­fy­ing al­lowed CORS ori­gins. Use *​ to allow all ori­gins.

De­fault: "$.​" (match noth­ing by de­fault)

steep.​http.​cors.​allowCredentials

true if the Access-​Control-​Allow-​Credentials re­sponse header should be re­turned.

De­fault: false

steep.​http.​cors.​allowHeaders

A string or an array in­di­cat­ing which header field names can be used in a re­quest.

steep.​http.​cors.​allowMethods

A string or an array in­di­cat­ing which HTTP meth­ods can be used in a re­quest.

steep.​http.​cors.​exposeHeaders

A string or an array in­di­cat­ing which head­ers are safe to ex­pose to the API of a CORS API spec­i­fi­ca­tion.

steep.​http.​cors.​maxAge

The num­ber of sec­onds the re­sults of a pre­flight re­quest can be cached in a pre­flight re­sult cache.
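For example, the following sketch enables CORS for a hypothetical frontend served from localhost:

```yaml
# Sketch: allow cross-origin requests from localhost on any port
steep:
  http:
    cors:
      enable: true
      allowOrigin: "https?://localhost(:[0-9]+)?"
      allowMethods: ["GET", "POST", "PUT"]
      allowHeaders: ["Content-Type"]
```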

6.1.4 Controller configuration

steep.​controller.​enabled

true if the con­troller should be en­abled. Set this value to false if your Steep in­stance does not have ac­cess to the shared data­base.

De­fault: true

steep.​controller.​lookupIntervalMilliseconds

The in­ter­val at which the con­troller looks for ac­cepted sub­mis­sions

De­fault: 2000 (2 sec­onds)

steep.​controller.​lookupOrphansIntervalMilliseconds

The interval at which the controller looks for orphaned running submissions (i.e. submissions that are in the status RUNNING but are currently not being processed by any controller instance). If Steep finds such a submission, it will try to resume it.

De­fault: 300000 (5 min­utes)

6.1.5 Scheduler configuration

steep.​scheduler.​enabled

true if the sched­uler should be en­abled. Set this value to false if your Steep in­stance does not have ac­cess to the shared data­base.

De­fault: true

steep.​scheduler.​lookupIntervalMilliseconds

The interval at which the scheduler looks for registered process chains

De­fault: 20000 (20 sec­onds)

6.1.6 Agent configuration

steep.​agent.​enabled

true if this Steep in­stance should be able to ex­e­cute process chains (i.e. if an agent should be de­ployed)

De­fault: true

steep.​agent.​id

Unique iden­ti­fier of this agent in­stance

De­fault: An au­to­mat­i­cally gen­er­ated unique ID

steep.​agent.​capabilities

List of ca­pa­bil­i­ties that this agent pro­vides

De­fault: [] (empty list)

steep.​agent.​autoShutdownTimeoutMinutes

The num­ber of min­utes an agent should re­main idle until it shuts it­self down grace­fully. By de­fault, this value is 0, which means the agent never shuts it­self down.

De­fault: 0

steep.​agent.​busyTimeoutSeconds

The number of seconds that should pass before an idle agent decides that it is not busy anymore. Normally, the scheduler allocates an agent, sends it a process chain, and then deallocates it after the process chain execution has finished. This value is important if the scheduler crashes while the process chain is being executed and therefore never deallocates the agent. In this case, the agent deallocates itself after the configured time has passed.

De­fault: 60

steep.​agent.​outputLinesToCollect

The maximum number of output lines to collect from each executed service (also applies to error output)

De­fault: 100

6.1.7 Runtime settings

steep.​runtimes.​docker.​env

Ad­di­tional en­vi­ron­ment vari­ables that will be passed to con­tain­ers cre­ated by the Docker run­time

Ex­am­ple: ["key=value", "foo=bar"]

De­fault: [] (empty list)

steep.​runtimes.​docker.​volumes

Ad­di­tional vol­ume mounts to be passed to the Docker run­time

Ex­am­ple: ["/​data:/​data"]

De­fault: [] (empty list)

6.1.8 Database connection

steep.​db.​driver

The data­base dri­ver

Valid val­ues: inmemory, postgresql, mongodb

De­fault: inmemory

steep.​db.​url

The data­base URL

steep.​db.​username

The data­base user­name (only used by the postgresql dri­ver)

steep.​db.​password

The data­base pass­word (only used by the postgresql dri­ver)
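A configuration for the postgresql driver could look as follows. This is a sketch: the URL format and credentials are placeholders and may differ in your environment:

```yaml
# Sketch: connect Steep to a hypothetical local PostgreSQL database
steep:
  db:
    driver: postgresql
    url: jdbc:postgresql://localhost:5432/steep
    username: steep
    password: secret
```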

6.1.9 Cloud connection

steep.​cloud.​enabled

true if Steep should con­nect to a cloud to ac­quire re­mote agents on de­mand

De­fault: false

steep.​cloud.​driver

De­fines which cloud dri­ver to use

Valid val­ues: openstack (see the Open­Stack cloud dri­ver for more in­for­ma­tion)

steep.​cloud.​createdByTag

A meta­data tag that should be at­tached to vir­tual ma­chines to in­di­cate that they have been cre­ated by Steep

steep.​cloud.​setupsFile

The path to the file that de­scribes all avail­able se­tups. See se­tups.yaml.

steep.​cloud.​syncIntervalSeconds

The num­ber of sec­onds that should pass be­fore the cloud man­ager syn­chro­nizes its in­ter­nal state with the cloud again

De­fault: 120 (2 min­utes)

steep.​cloud.​keepAliveIntervalSeconds

The number of seconds that should pass before the cloud manager again sends keep-alive messages to the minimum number of remote agents (so that they do not shut themselves down). See the minVMs property of the setups data model.

De­fault: 30

steep.​cloud.​agentPool

An array of agent pool parameters describing how many remote agents the cloud manager should keep in its pool and how many it is allowed to create for each given set of capabilities.

De­fault: [] (empty list)

6.1.10 OpenStack cloud driver

steep.​cloud.​openstack.​endpoint

Open­Stack au­then­ti­ca­tion end­point

steep.​cloud.​openstack.​username

Open­Stack user­name used for au­then­ti­ca­tion

steep.​cloud.​openstack.​password

Open­Stack pass­word used for au­then­ti­ca­tion

steep.​cloud.​openstack.​domainName

Open­Stack do­main name used for au­then­ti­ca­tion

steep.​cloud.​openstack.​projectId

The ID of the Open­Stack project to which to con­nect. Ei­ther this con­fig­u­ra­tion item or steep.​cloud.​openstack.​projectName must be set but not both at the same time.

steep.​cloud.​openstack.​projectName

The name of the Open­Stack project to which to con­nect. This con­fig­u­ra­tion item will be used in com­bi­na­tion with steep.​cloud.​openstack.​domainName if steep.​cloud.​openstack.​projectId is not set.

steep.​cloud.​openstack.​networkId

The ID of the Open­Stack net­work to at­tach new VMs to

steep.​cloud.​openstack.​usePublicIp

true if new VMs should have a pub­lic IP ad­dress

De­fault: false

steep.​cloud.​openstack.​securityGroups

The Open­Stack se­cu­rity groups that should be at­tached to new VMs.

De­fault: [] (empty list)

steep.​cloud.​openstack.​keypairName

The name of the key­pair to de­ploy to new VMs. The key­pair must al­ready exist in Open­Stack.

6.1.11 SSH connection to VMs

steep.​cloud.​ssh.​username

Username for SSH access to VMs. Can be overridden by the sshUsername property in each setup. May even be null if all setups define their own username.

steep.​cloud.​ssh.​privateKeyLocation

Lo­ca­tion of a pri­vate key to use for SSH

6.2 setups.yaml

The configuration file setups.yaml contains an array of setup objects that Steep's cloud manager component uses to create new virtual machines and to deploy remote agents to them.
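A single setup object mirrors the setup shown in the GET VMs example responses above. A sketch of one entry (the actual schema may contain further properties, e.g. sshUsername):

```yaml
# Sketch of a setups.yaml entry, based on the setup object
# returned by the GET VMs endpoint
- id: default
  flavor: 7d217779-4d7b-4689-8a40-c12a377b946d
  imageName: Ubuntu 18.04
  availabilityZone: nova
  blockDeviceSizeGb: 50
  minVMs: 0
  maxVMs: 4
  provisioningScripts:
    - conf/setups/default/01_docker.sh
    - conf/setups/default/02_steep.sh
  providedCapabilities:
    - docker
```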

6.3 services/services.yaml

The file services/​services.​yaml con­tains an array of ser­vice meta­data ob­jects de­scrib­ing the in­ter­faces of all pro­cess­ing ser­vices Steep can ex­e­cute.
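For example, the sleep service returned by the GET service by ID endpoint above corresponds to an entry like the following (a sketch derived from that metadata):

```yaml
# Sketch of a services.yaml entry for the sleep service
- id: sleep
  name: sleep
  description: sleeps for the given amount of seconds
  path: sleep
  runtime: other
  parameters:
    - id: seconds
      name: seconds to sleep
      description: The number of seconds to sleep
      type: input
      cardinality: 1..1
      data_type: integer
```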

6.4 plugins/common.yaml

The con­fig­u­ra­tion file plugins/​common.​yaml de­scribes all plu­g­ins so Steep can com­pile and use them dur­ing run­time. The file con­tains an array of de­scrip­tor ob­jects with the prop­er­ties spec­i­fied in the sec­tion on ex­tend­ing Steep through plu­g­ins.

7 Extending Steep through plugins

Steep can be ex­tended through plu­g­ins. In this sec­tion, we will de­scribe the in­ter­faces of all plu­g­ins and how they need to be con­fig­ured so Steep can com­pile and ex­e­cute them at run­time.

Each plug­in is a Kotlin script with the file ex­ten­sion .​kts. In­side this script, there should be a sin­gle func­tion with the same name as the plug­in and a sig­na­ture that de­pends on the plug­in type. Func­tion in­ter­faces are de­scribed in the sub-​sections below.

All plu­g­ins must be ref­er­enced in the plu­g­ins/com­mon.yaml file. This file is an array of de­scrip­tor ob­jects with at least the fol­low­ing prop­er­ties:

name (required, string): A unique name of the plugin (the function inside the plugin's script file must have the same name)
type (required, string): The plugin type. Valid values are: initializer, outputAdapter, processChainAdapter, progressEstimator, and runtime.
scriptFile (required, string): The path to the plugin's Kotlin script file. The file should have the extension .kts. The path is relative to Steep's application directory, so a valid example is conf/plugins/fileOrEmptyList.kts.

Spe­cific plug­in types may re­quire ad­di­tional prop­er­ties de­scribed in the sub-​sections below.

7.1 Initializers

An ini­tial­izer plug­in is a func­tion that will be called dur­ing the ini­tial­iza­tion phase of Steep just be­fore com­po­nents such as the con­troller or the sched­uler are de­ployed. If re­quired, the func­tion can be a suspend func­tion.

Type

initializer

Ad­di­tional prop­er­ties
None
Func­tion in­ter­face
suspend fun myInitializer(vertx: io.vertx.core.Vertx)
Ex­am­ple de­scrip­tor
- name: myInitializer
  type: initializer
  scriptFile: conf/plugins/myInitializer.kts
Ex­am­ple plug­in script
suspend fun myInitializer(vertx: io.vertx.core.Vertx) {
  println("Hello from my initializer plugin")
}

7.2 Output adapters

An out­put adapter plug­in is a func­tion that can ma­nip­u­late the out­put of ser­vices de­pend­ing on their pro­duced data type (see the data_​type prop­erty of the ser­vice pa­ra­me­ter data model, as well as the dataType prop­erty of the process chain ar­gu­ment data model).

In other words, if an out­put pa­ra­me­ter of a pro­cess­ing ser­vice has a spe­cific data_​type de­fined in the ser­vice’s meta­data and this data type matches the one given in the out­put adapter’s de­scrip­tor, then the plug­in’s func­tion will be called after the ser­vice has been ex­e­cuted. Steep will pass the out­put ar­gu­ment and the whole process chain (for ref­er­ence) to the plug­in. The out­put ar­gu­ment’s value will be the cur­rent re­sult (i.e. the out­put file or di­rec­tory). The plug­in can mod­ify this file or di­rec­tory (if nec­es­sary) and re­turn a new list of files that will then be used by Steep for fur­ther pro­cess­ing.

Steep has a built-​in out­put adapter for the data type directory. When­ever you spec­ify this data type in the ser­vice meta­data, Steep will pass the out­put di­rec­tory to an in­ter­nal func­tion that re­cur­sively col­lects all files from this di­rec­tory and re­turns them as a list.

The ex­am­ple out­put adapter fileOrEmptyList de­scribed below is also in­cluded in Steep. It checks if an out­put file ex­ists (i.e. if the pro­cess­ing ser­vice has ac­tu­ally cre­ated it) and ei­ther re­turns a list with a sin­gle el­e­ment (the file) or an empty list. This is use­ful if a pro­cess­ing ser­vice has an op­tional out­put that you want to use as input of a sub­se­quent for-​each ac­tion or of the cur­rent for-​each ac­tion via yieldToInput.

If re­quired, the func­tion can be a suspend func­tion.

Type

outputAdapter

Ad­di­tional prop­er­ties
supportedDataType (required): The data_type that this plugin can handle
Func­tion in­ter­face
suspend fun myOutputAdapter(output: model.processchain.Argument,
  processChain: model.processchain.ProcessChain,
  vertx: io.vertx.core.Vertx): List<Any>
Ex­am­ple de­scrip­tor
- name: fileOrEmptyList
  type: outputAdapter
  scriptFile: conf/plugins/fileOrEmptyList.kts
  supportedDataType: fileOrEmptyList
Ex­am­ple plug­in script
import io.vertx.core.Vertx
import io.vertx.kotlin.core.file.existsAwait
import model.processchain.Argument
import model.processchain.ProcessChain

suspend fun fileOrEmptyList(output: Argument, processChain: ProcessChain,
    vertx: Vertx): List<String> {
  return if (!vertx.fileSystem().existsAwait(output.variable.value)) {
    emptyList()
  } else {
    listOf(output.variable.value)
  }
}

7.3 Process chain adapters

A process chain adapter plug­in is a func­tion that can ma­nip­u­late gen­er­ated process chains be­fore they are ex­e­cuted.

It takes a list of gen­er­ated process chains and re­turns a new list of process chains to ex­e­cute or the given list if no mod­i­fi­ca­tion was made. If re­quired, the func­tion can be a sus­pend func­tion.

Type

processChainAdapter

Ad­di­tional prop­er­ties
None
Func­tion in­ter­face
suspend fun myProcessChainAdapter(processChains: List<model.processchain.ProcessChain>,
  vertx: io.vertx.core.Vertx): List<model.processchain.ProcessChain>
Ex­am­ple de­scrip­tor
- name: myProcessChainAdapter
  type: processChainAdapter
  scriptFile: conf/plugins/myProcessChainAdapter.kts
Ex­am­ple plug­in script
import model.processchain.ProcessChain
import io.vertx.core.Vertx

suspend fun myProcessChainAdapter(processChains: List<ProcessChain>,
    vertx: Vertx): List<ProcessChain> {
  val result = mutableListOf<ProcessChain>()

  for (pc in processChains) {
    // never execute the 'sleep' service
    val executables = pc.executables.filter { e -> e.id != "sleep" }
    result.add(pc.copy(executables = executables))
  }

  return result
}

7.4 Custom runtime environments

A run­time plug­in is a func­tion that can run process chain ex­e­cuta­bles in­side a cer­tain run­time en­vi­ron­ment. See the runtime prop­erty of ser­vice meta­data.

The plugin's function takes an executable to run and the number of lines to collect from the executable's output. It returns the output generated by the executable (trimmed down to the given number of lines). If required, the function can be a suspend function.

Use this plug­in if you want to im­ple­ment a spe­cial way to ex­e­cute pro­cess­ing ser­vices. For ex­am­ple, you can im­ple­ment a re­mote web ser­vice call, you can use one of the ex­ist­ing run­times and run a cer­tain ser­vice in a spe­cial way (like in the ex­am­ple plug­in below), etc.

Type

runtime

Ad­di­tional prop­er­ties
supportedRuntime (required): The name of the runtime this plugin provides. Use this value in your service metadata.
Func­tion in­ter­face
suspend fun myRuntime(executable: model.processchain.Executable,
  outputLinesToCollect: Int, vertx: io.vertx.core.Vertx): String
Ex­am­ple de­scrip­tor (Source)
- name: ignoreDiffFitError
  type: runtime
  scriptFile: conf/plugins/ignoreDiffFitError.kts
  supportedRuntime: ignoreDiffFitError
Ex­am­ple plug­in script (Source)
import helper.Shell.ExecutionException
import io.vertx.core.Vertx
import model.processchain.Executable
import runtime.OtherRuntime

fun ignoreDiffFitError(executable: Executable, outputLinesToCollect: Int, vertx: Vertx): String {
  return try {
    OtherRuntime().execute(executable, outputLinesToCollect)
  } catch (e: ExecutionException) {
    // ignore invalid exit code of the mDiffFit service
    if (e.exitCode == 1) {
      ""
    } else {
      throw e
    }
  }
}

7.5 Progress estimators

A progress estimator plugin is a function that analyses the log output of a running processing service to estimate its current progress. For example, the plugin can look for percentages or the number of bytes processed. The returned value contributes to the execution progress of a process chain (see the estimatedProgress property of the process chain data model).

The func­tion takes the ex­e­cutable that is cur­rently being run and a list of re­cently col­lected out­put lines. It re­turns an es­ti­mated progress be­tween 0.0 (0%) and 1.0 (100%) or null if the progress could not be de­ter­mined. The func­tion will be called for each out­put line col­lected and the newest line is al­ways at the end of the given list. If re­quired, the func­tion can be a suspend func­tion.

Type

progressEstimator

Ad­di­tional prop­er­ties
supportedServiceId (required): The ID of the service this estimator plugin supports
Func­tion in­ter­face
suspend fun myProgressEstimator(executable: model.processchain.Executable,
  recentLines: List<String>, vertx: io.vertx.core.Vertx): Double?
Ex­am­ple de­scrip­tor
- name: extractArchiveProgressEstimator
  type: progressEstimator
  scriptFile: conf/plugins/extractArchiveProgressEstimator.kts
  supportedServiceId: extract-archive
Ex­am­ple plug­in script
import model.processchain.Executable
import io.vertx.core.Vertx

suspend fun extractArchiveProgressEstimator(executable: Executable,
    recentLines: List<String>, vertx: Vertx): Double? {
  val lastLine = recentLines.last()
  val percentSign = lastLine.indexOf('%')
  if (percentSign > 0) {
    val percentStr = lastLine.substring(0, percentSign)
    val percent = percentStr.trim().toIntOrNull()
    if (percent != null) {
      return percent / 100.0
    }
  }
  return null
}

About

Steep's development is led by the competence center for Spatial Information Management of the Fraunhofer Institute for Computer Graphics Research IGD in Darmstadt, Germany. Fraunhofer IGD is the internationally leading research institution for applied visual computing. The competence center for Spatial Information Management offers expertise and innovative technologies that enable successful communication and efficient cooperation with the help of geographic information.

Steep was initially developed within the research project "IQmulus" (A High-volume Fusion and Analysis Platform for Geospatial Point Clouds, Coverages and Volumetric Data Sets), funded under the 7th Framework Programme of the European Commission (call identifier FP7-ICT-2011-8, grant agreement no. 318787) from 2012 to 2016. It was previously called the 'IQmulus JobManager' or just the 'JobManager'.

Presentations

This pre­sen­ta­tion was given by Michel Krämer at the DATA con­fer­ence 2020. He pre­sented his paper en­ti­tled “Capability-​based sched­ul­ing of sci­en­tific work­flows in the cloud”. Michel talked about Steep’s soft­ware ar­chi­tec­ture and its sched­ul­ing al­go­rithm that as­signs process chains to vir­tual ma­chines.

Publications

Steep and its predecessor JobManager have appeared in at least the following publications:

Krämer, M. (2020). Capability-based Scheduling of Scientific Workflows in the Cloud. Proceedings of the 9th International Conference on Data Science, Technology, and Applications DATA, 43–54. https://doi.org/10.5220/0009805400430054
Krämer, M. (2018). A Microservice Architecture for the Processing of Large Geospatial Data in the Cloud (Doctoral dissertation). Technische Universität Darmstadt. https://doi.org/10.13140/RG.2.2.30034.66248
Böhm, J., Bredif, M., Gierlinger, T., Krämer, M., Lindenbergh, R., Liu, K., … Sirmacek, B. (2016). The IQmulus Urban Showcase: Automatic Tree Classification and Identification in Huge Mobile Mapping Point Clouds. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLI-B3, 301–307. https://doi.org/10.5194/isprs-archives-XLI-B3-301-2016
Krämer, M., & Senner, I. (2015). A modular software architecture for processing of big geospatial data in the cloud. Computers & Graphics, 49, 69–81. https://doi.org/10.1016/j.cag.2015.02.005