
Evaluating the repeatability of a reconstruction and quantification pipeline

Objective

The reconstruction and quantification pipeline converts a series of RGB images (the output of the plant-imager) into a 3D object, here a plant, with the ultimate goal of extracting quantitative phenotypic information.

Multiple pipelines can be created, as they are composed of a sequence of tasks, each with a specific function. Some of the algorithms used in these tasks are stochastic, so their output may vary even when given the same input. Since this can affect the subsequent quantification, it is important to identify the sources of variability and to quantify them.

Mainly, two things can be evaluated:

* using a dedicated metric (e.g. the chamfer distance on point clouds), quantify the differences in the outputs of a repeated task;
* quantify the final repercussion on the extracted phenotypic traits.
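As an illustration of the first point, the chamfer distance averages nearest-neighbour distances between two point clouds in both directions. The sketch below is only a minimal illustration using numpy and scipy; it is not the pipeline's own implementation, and definitions of the chamfer distance vary (mean vs. sum, plain vs. squared distances).

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(cloud_a, cloud_b):
    """Symmetric chamfer distance between two (N, 3) point clouds.

    Mean nearest-neighbour distance from A to B, averaged with the
    mean nearest-neighbour distance from B to A.
    """
    a_to_b, _ = cKDTree(cloud_b).query(cloud_a)  # closest point of B for each point of A
    b_to_a, _ = cKDTree(cloud_a).query(cloud_b)  # closest point of A for each point of B
    return 0.5 * (a_to_b.mean() + b_to_a.mean())

# Toy example standing in for two PointCloud replicates:
rng = np.random.default_rng(0)
cloud_a = rng.random((1000, 3))
cloud_b = cloud_a + rng.normal(scale=1e-3, size=cloud_a.shape)
print(chamfer_distance(cloud_a, cloud_b))
```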

Prerequisite

CLI overview

The robustness_evaluation script has been developed to quantify variability in the reconstruction and quantification pipeline. It may be used to test the variability of a specific task or of the full reconstruction (and quantification) pipeline.

In essence, it compares the outputs of a task given the same input (the previous task's output or the acquisition output, depending on the mode) over a configurable number of replicates.
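Conceptually, the comparison stage applies a metric to every pair of replicated outputs. The following Python sketch only illustrates that idea; the actual script also handles database access, file-by-file directory comparisons and task-specific metrics, and the names used here are purely illustrative.

```python
from itertools import combinations

def pairwise_comparison(replicate_outputs, metric):
    """Apply `metric` to every unordered pair of replicate outputs.

    `replicate_outputs` maps a replicate id (e.g. 'my_scan_0') to the task
    output loaded for that replicate; `metric` is any callable returning a
    scalar, such as a chamfer distance for point clouds.
    """
    return {
        f"{id_a} vs {id_b}": metric(out_a, out_b)
        for (id_a, out_a), (id_b, out_b) in combinations(replicate_outputs.items(), 2)
    }

# Hypothetical usage with three replicates and a trivial scalar "metric":
outputs = {"my_scan_0": 1.00, "my_scan_1": 1.02, "my_scan_2": 0.99}
print(pairwise_comparison(outputs, metric=lambda a, b: abs(a - b)))
```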

robustness_evaluation -h
usage: robustness_evaluation [-h] [-n REPLICATE_NUMBER] [-s SUFFIX] [-f] [-np] [-db TEST_DATABASE] [--models MODELS]
                             [--log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}]
                             scan task config_file

Robustness evaluation of the Reconstruction & Quantification pipelines.

Evaluating the repeatability of a Reconstruction & Quantification (R&Q) pipeline is made as follows:
 1. duplicate the selected scan dataset in a temporary folder (and clean it from previous R&Q if necessary)
 2. run the R&Q pipeline up to the previous task of the selected task to evaluate, if any
 3. copy/replicate this result to a new database (append a replicate id to the dataset name)
 4. run the task to evaluate for each replicated dataset
 5. compare the directories of the task to evaluate pair by pair
 6. apply the comparison metrics for the task to evaluate, as defined in `robustness_evaluation.json` 

Please note that:
 - Directory comparisons are done at the scale of the files generated by the selected task.
 - We use metrics to get a quantitative comparison on the output of the task.
 - It is possible to create fully independent repetitions by running the whole R&Q pipeline using `-f`.
 - In order to use the ML-based R&Q pipeline, you will have to:
   1. create an output directory
   2. use the `--models` argument to copy the CNN trained models

positional arguments:
  scan                  Scan dataset to use for repeatability evaluation.
  task                  Task to test, should be in: AnglesAndInternodes, ClusteredMesh, Colmap, CurveSkeleton,
                        ExtrinsicCalibration, IntrinsicCalibration, Masks, OrganSegmentation, PointCloud,
                        Segmentation2D, Segmentation2d, SegmentedPointCloud, TreeGraph, TriangleMesh, Undistorted,
                        Voxels
  config_file           Path to the pipeline TOML configuration file.

optional arguments:
  -h, --help            show this help message and exit
  -n REPLICATE_NUMBER, --replicate_number REPLICATE_NUMBER
                        Number of replicate to use for repeatability evaluation. Defaults to `30`.
  -s SUFFIX, --suffix SUFFIX
                        Suffix to append to the created database folder.
  -f, --full_pipe       Run the whole Reconstruction & Quantification pipeline on each replicate independently.
  -np, --no_pipeline    Do not run the pipeline, only compare tasks outputs.
  -db TEST_DATABASE, --test_database TEST_DATABASE
                        test database location to use. Use at your own risks!
  --models MODELS       Models database location to use with ML pipeline.
  --log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
                        Set message logging level. Defaults to `INFO`.

Detailed explanations here: https://docs.romi-project.eu/plant_imager/developer/pipeline_repeatability/
The metrics used are the same as the ones for an evaluation against a ground-truth

Step-by-step tutorial

1. Test a single task

Example with the TriangleMesh task, whose goal is to compute a mesh from a point cloud:

robustness_evaluation /path/db/my_scan TriangleMesh plant-3d-vision/config/pipeline.toml -n 10

To summarize, the pipeline.toml configuration defines the following order of tasks:
ImagesFilesetExists -> Colmap -> Undistorted -> Masks -> Voxels -> PointCloud -> TriangleMesh

The call to robustness_evaluation, as previously defined, should result in the following folder structure:

path/
├── 20210628123840_eval_TriangleMesh/
│   ├── my_scan_0/
│   ├── my_scan_1/
│   ├── my_scan_2/
│   ├── my_scan_3/
│   ├── my_scan_4/
│   ├── my_scan_5/
│   ├── my_scan_6/
│   ├── my_scan_7/
│   ├── my_scan_8/
│   ├── my_scan_9/
│   ├── filebyfile_comparison.json
│   ├── romidb
│   └── TriangleMesh_comparison.json
└── db/
    ├── my_scan/
    └── romidb

The scan datasets my_scan_* are identical up to the PointCloud task, since they are copies of the same temporary folder. The TriangleMesh task is then run separately on each of them. Quantitative results, using the appropriate metric(s), are stored in the TriangleMesh_comparison.json file.
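The exact layout of TriangleMesh_comparison.json is not described here, so the snippet below makes no assumption about its keys: it simply pools every numeric value found in the file to give a quick feel for the spread of the pairwise metric values. Treat it as a rough sanity check, not an analysis recipe.

```python
import json
from statistics import mean, stdev

def collect_numbers(node, found=None):
    """Recursively gather every numeric value from a nested JSON structure."""
    if found is None:
        found = []
    if isinstance(node, bool):
        pass  # booleans are ints in Python, skip them
    elif isinstance(node, (int, float)):
        found.append(float(node))
    elif isinstance(node, dict):
        for value in node.values():
            collect_numbers(value, found)
    elif isinstance(node, list):
        for value in node:
            collect_numbers(value, found)
    return found

with open("20210628123840_eval_TriangleMesh/TriangleMesh_comparison.json") as f:
    values = collect_numbers(json.load(f))

print(f"{len(values)} values, mean={mean(values):.4f}, sd={stdev(values):.4f}")
```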

2. Independent tests

If the goal is to evaluate the impact of stochasticity accumulated through the whole pipeline on the output of the TriangleMesh task, you should perform independent tests (running the whole pipeline for each replicate) by using the -f flag, as shown below; a sketch comparing the two evaluation runs follows the resulting folder tree:

robustness_evaluation /path/db/my_scan TriangleMesh plant-3d-vision/config/pipeline.toml -n 10 -f

This will yield a similar folder structure:

path/
├── 20210628123840_eval_TriangleMesh/
│   ├── my_scan_0/
│   ├── my_scan_1/
│   ├── my_scan_2/
│   ├── my_scan_3/
│   ├── my_scan_4/
│   ├── my_scan_5/
│   ├── my_scan_6/
│   ├── my_scan_7/
│   ├── my_scan_8/
│   ├── my_scan_9/
│   ├── filebyfile_comparison.json
│   ├── romidb
│   └── TriangleMesh_comparison.json
└── db/
    ├── my_scan/
    └── romidb
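Comparing the spread of the metric values from the single-task evaluation and from the -f (full pipeline) evaluation gives an idea of how much of the final variability originates upstream of TriangleMesh. The sketch below reuses the same pooling trick as above; the second folder name and the JSON layout are assumptions made for the example, not documented paths.

```python
import json
from statistics import mean, pstdev

def pooled_values(path):
    """Pool every numeric value found in a comparison JSON file."""
    def walk(node):
        if isinstance(node, bool):
            return
        if isinstance(node, (int, float)):
            yield float(node)
        elif isinstance(node, dict):
            for value in node.values():
                yield from walk(value)
        elif isinstance(node, list):
            for value in node:
                yield from walk(value)
    with open(path) as f:
        return list(walk(json.load(f)))

# Hypothetical evaluation folders: one single-task run, one run with `-f`:
runs = {
    "single task": "20210628123840_eval_TriangleMesh/TriangleMesh_comparison.json",
    "full pipeline": "20210629101500_eval_TriangleMesh/TriangleMesh_comparison.json",
}
for label, path in runs.items():
    values = pooled_values(path)
    print(f"{label}: n={len(values)}, mean={mean(values):.4f}, sd={pstdev(values):.4f}")
```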

Note

To run tests on an existing database, the -db parameter can be used, but be careful with it: use it at your own risk!