Luigi in the ROMI Plant Scanner projectLink

Hereafter we explain how we used the luigi Python package in the ROMI Plant Scanner project. If you are not familiar with luigi, let's just say that it is useful to create long-running batch processes where you want to chain many tasks with dependencies and requirements.

In the context of the ROMI Plant Scanner project, we faced the challenge of creating complex pipelines, especially for the 3D reconstruction & analysis of a plant structure after its acquisition in the form of a series of RGB images. Several complex and fairly distinct algorithm are required to achieve our goals, and we thus decided to break down this chain of tasks to achieve greater modularity and robustness.

Using luigi led us to abstract several concepts like Task, Target & Parameter:

a task is limited to a single algorithmic operation with input(s), output(s) & parameter(s);
a target is a (set of) file(s) that can be the input required by a task or (one of) its output(s);
a parameter is a value controlling the algorithm;

Parameters & configurationLink

As we run luigi using the command-line tool, and the constructed workflow can be made of many tasks each with several parameters, we use TOML configuration files to define them. In addition to the TOML configuration file, the romi_run_task script requires the definition of two values:

the name of the ROMI task to run;
the name of the Scan dataset on which to run the ROMI task;

As we will see later, each "computational task" has an upstream task. Using these tasks dependencies luigi will create the required workflow. Hence you do not have to know or defines the required workflow to run a task, just call it and luigi will do the rest for you!

Target subclassLink

In order to check for a task requirement(s) or handle its output(s) luigi implement the Target class. As we use our own Python database implementation FSDB from plantdb, and it implements the concept of a set of files as a Fileset class, we subclassed the luigi.Target class as TargetFileset. It is thus used to get/create files from the FSDB database by our luigi.Task subclasses.

For example, the raw RGB images set obtained after a Scan task is the 'images' Fileset.

Task subclassesLink

This is where the computation is done with the run() method and targets are controlled with the requires() & output() methods.

Note

ROMI tasks do not use a Scan dataset identifier as it is assumed that they only work on one Scan at a time.