Generate large "virtual" training data for machine learning
How to use the Virtual Plant Imager to generate a large dataset of virtual plants for machine learning purposes
Objective
Working with virtual plants instead of real ones makes data acquisition inexpensive and allows the type of data to be parametrized. By design, ground truth can easily be extracted from virtual datasets, both for evaluation purposes and for building machine learning models. The Virtual Plant Imager is designed to address these two needs. After reading this tutorial, you should be able to generate a virtual plant dataset in order to evaluate the robustness of plant-3d-vision.
Prerequisite
If you have not already done so, make sure you can build and run the Docker image by following the instructions.
Step-by-step tutorial
Principle: Technically, the Virtual Plant Imager relies on Blender v2.93 to render images of a 3D model of the plant.
The 3D model can be provided as an input, or it can be generated by lpy based on biological rules.
An HTTP server acts as an interface to drive the Blender generation scripts.
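To make the client/server idea concrete, here is a minimal, purely illustrative Python sketch of a client posting a render request to such a server. The server address, endpoint name and payload fields are invented for illustration only and do not reflect the actual plant-imager API.

```python
# Illustrative only: the endpoint and payload below are HYPOTHETICAL and do not
# reflect the actual plant-imager HTTP API; they just show the client/server pattern.
import requests

BLENDER_SERVER = "http://localhost:9001"  # hypothetical address of the Blender HTTP interface


def request_render(scene_params: dict) -> bytes:
    """Ask the (hypothetical) Blender server to render one view of the virtual plant."""
    response = requests.post(f"{BLENDER_SERVER}/render", json=scene_params, timeout=300)
    response.raise_for_status()
    return response.content  # rendered image bytes


if __name__ == "__main__":
    image = request_render({"camera_angle": 45, "hdri": "example.hdr"})  # example parameters
    with open("view_045.png", "wb") as fh:
        fh.write(image)
```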
1. Preparing your scan data
First, you have to create a working database on your host machine, let's say `/home/host/path/database_example`.
You can find an example of this database here.
You can obtain sample data for the scanner here, and put it in the `data` folder.
```bash
wget https://db.romi-project.eu/models/arabidopsis_data.zip
unzip arabidopsis_data.zip -d data
```
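If you prefer to stay in Python, the same download and extraction can be done with the standard library. This is just a convenience equivalent of the two commands above, not part of the Virtual Plant Imager itself.

```python
# Python equivalent of the wget/unzip commands above: download the sample
# archive and extract it into the `data` folder.
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

URL = "https://db.romi-project.eu/models/arabidopsis_data.zip"

with urlopen(URL) as response:
    ZipFile(BytesIO(response.read())).extractall("data")
```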
To use custom data, it must consist of an `.obj` file in which each type of organ corresponds to a distinct mesh.
Each mesh must have a single material whose name is the name of the organ.
The `data` directory must contain both the `.obj` and `.mtl` files.
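Since each organ must map to a mesh with a single, organ-named material, it can be worth sanity-checking a custom `.obj` before rendering. The sketch below simply lists the material names referenced by the file (via its `usemtl` statements); it is a rough standalone check, not part of the Virtual Plant Imager, and the file path shown is only an example.

```python
# Rough sanity check for a custom OBJ: list the material (organ) names it references.
from collections import Counter
from pathlib import Path


def list_materials(obj_path: str) -> Counter:
    """Count how many times each material is referenced via `usemtl` lines."""
    counts = Counter()
    for line in Path(obj_path).read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0] == "usemtl":
            counts[parts[1].strip()] += 1
    return counts


if __name__ == "__main__":
    for organ, n in list_materials("data/my_plant.obj").items():  # example path
        print(f"{organ}: referenced {n} time(s)")
```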
Additionally, background HDRI files can be downloaded from HDRI Haven.
Download the `.hdr` files and put them in the `hdri` folder.
2. Generating a large dataset for machine learning purposes
After preparing your working database directory, you have to run the Docker container with the database mounted.
```bash
cd plant-imager/docker
./run.sh -db /home/host/path/database_example  # This will map to the `db` directory located in the Docker user's home
```
To generate a large dataset, you have to run the `generate_dataset.py` script, passing it the config file and the output folder.
```
(lpyEnv) user@5c9e389f223d python generate_dataset.py plant-imager/configs/vscan_lpy_blender.toml db/learning_set
```
After a while, and if the generation succeeded, the `learning_set` folder will be populated with virtual plant data.
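Once generation has finished, a quick way to check the result is to count what was produced. The sketch below walks `db/learning_set` and reports the number of files per sub-directory; it makes no assumption about the exact file names the generator writes, and is only a convenience check.

```python
# Quick inspection of the generated dataset: count files per sub-directory.
from pathlib import Path


def summarize(dataset_dir: str = "db/learning_set") -> None:
    """Print the number of files in each sub-directory of the generated dataset."""
    root = Path(dataset_dir)
    for scan in sorted(p for p in root.iterdir() if p.is_dir()):
        n_files = sum(1 for f in scan.rglob("*") if f.is_file())
        print(f"{scan.name}: {n_files} files")


if __name__ == "__main__":
    summarize()
```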