# Generate large "virtual" training data for machine learning

How to use the Virtual Plant Imager to generate a large dataset of virtual plants for machine learning purposes.

Working with virtual plants instead of real ones makes data acquisition inexpensive and lets you parametrize the type of data generated. By design, ground truth data can be easily extracted from virtual datasets, both for evaluation purposes and for building machine learning models. The Virtual Plant Imager is designed to address these two issues. After reading this tutorial, you should be able to generate a virtual plant dataset in order to evaluate the robustness of plant-3d-vision.

Before you start, make sure you can build and run the Docker image by following the instructions.

Principle: Technically, the Virtual Plant Imager relies on Blender v2.81a to generate images of 3D plant models. The 3D model can either be provided as an input or be generated by lpy from biological rules. An HTTP server acts as an interface to drive the Blender generation scripts.

First, create a working database on your host machine, for example /home/host/path/database_example. You can find an example of this database here.

You can obtain sample data for the scanner here, and put it in the data folder.

wget https://db.romi-project.eu/models/arabidopsis_data.zip
unzip arabidopsis_data.zip -d data


To use custom data, provide an .obj file in which each type of organ corresponds to a distinct mesh. Each mesh must have a single material whose name is the name of the organ. The data directory must contain both the .obj and .mtl files.
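
The constraints on custom data can be checked before launching a generation. The following is a minimal sketch, not part of the Virtual Plant Imager: it assumes meshes are declared with `o` statements and materials with `usemtl` statements (standard Wavefront OBJ keywords), and the function names `materials_per_mesh`, `check_obj` and `check_obj_file` are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict
from pathlib import Path


def materials_per_mesh(obj_text):
    """Map each mesh name (`o` statement) to the set of materials (`usemtl`) it uses."""
    materials = defaultdict(set)
    current = None
    for line in obj_text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        if parts[0] == "o":
            current = " ".join(parts[1:])
            materials[current]  # register the mesh even if no material follows
        elif parts[0] == "usemtl" and current is not None:
            materials[current].add(" ".join(parts[1:]))
    return dict(materials)


def check_obj(obj_text):
    """Return a list of problems; an empty list means every mesh has exactly one material."""
    problems = []
    for mesh, mats in materials_per_mesh(obj_text).items():
        if len(mats) != 1:
            problems.append(f"mesh {mesh!r} uses {len(mats)} materials, expected exactly 1")
    return problems


def check_obj_file(path):
    """Convenience wrapper (hypothetical): run the check on a file on disk."""
    return check_obj(Path(path).read_text())
```

Calling `check_obj_file` on your model returns an empty list when every mesh carries a single material. The material names reported by `materials_per_mesh` are the ones that will be read as organ names, so it is worth confirming they match the labels you expect. Files exported with `g` groups instead of `o` objects would need the parser adapted.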

Additionally, background HDRI files can be downloaded from HDRI Haven. Download .hdr files and put them in the hdri folder.
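
Before mounting the database, it can help to confirm that the input files are where the tutorial expects them. The sketch below assumes that the data and hdri folders sit directly under the database root; that layout is an assumption drawn from the steps above, and `check_database_layout` is a hypothetical helper, not a function of the Virtual Plant Imager.

```python
from pathlib import Path


def check_database_layout(db_root):
    """Return the expected input patterns that have no matching file."""
    root = Path(db_root)
    expected = {
        "data": (".obj", ".mtl"),  # plant model and its materials
        "hdri": (".hdr",),         # background environment maps
    }
    missing = []
    for folder, suffixes in expected.items():
        folder_path = root / folder
        # A missing folder is treated the same as an empty one.
        files = list(folder_path.iterdir()) if folder_path.is_dir() else []
        for suffix in suffixes:
            if not any(f.suffix.lower() == suffix for f in files):
                missing.append(f"{folder}/*{suffix}")
    return missing
```

For example, `check_database_layout("/home/host/path/database_example")` returns an empty list when at least one .obj, one .mtl and one .hdr file are in place, and otherwise lists the patterns that found no match.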

### 2. Generating a large dataset for machine learning purposes

After preparing your working database directory, run the Docker container with the database mounted.

cd plant-imager/docker
./run.sh -db /home/host/path/database_example  # This will map to the db directory located in the docker user's home


To generate a large dataset, run the script generate_dataset.py, passing it the config file and the output folder.

(lpyEnv) user@5c9e389f223d  python generate_dataset.py plant-imager/config/vscan_lpy_blender.toml db/learning_set

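If you want to launch the generation from another Python script, for instance to produce several output folders in a row, the command above can be wrapped as follows. This is only a sketch: `build_command` and `generate` are hypothetical helpers, and the only thing taken from the tutorial is the call convention of generate_dataset.py (config file first, output folder second). It assumes the wrapper runs inside the container, from the directory that holds generate_dataset.py.

```python
import subprocess
import sys
from pathlib import Path


def build_command(config, output_dir, script="generate_dataset.py"):
    """Build the argument list for one generation run."""
    # sys.executable reuses the interpreter of the active environment.
    return [sys.executable, script, str(config), str(output_dir)]


def generate(config, output_dir):
    """Run one generation; raises CalledProcessError if the script exits with an error."""
    if not Path(config).is_file():
        raise FileNotFoundError(f"config file not found: {config}")
    subprocess.run(build_command(config, output_dir), check=True)
```

With these helpers, `generate("plant-imager/config/vscan_lpy_blender.toml", "db/learning_set")` is equivalent to the command line shown above, with an early failure if the config path is wrong.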

After a while, if the generation succeeded, the learning_set folder will be populated with virtual plants.
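
A quick way to confirm that the folder was populated is to count the files under each of its top-level subfolders. The sketch below makes no assumption about how the generated plants are named or organised beyond the output being a folder tree; `summarize_output` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path


def summarize_output(output_dir):
    """Map each top-level subfolder of the output folder to its number of files."""
    root = Path(output_dir)
    summary = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Count files at any depth below this subfolder.
            summary[entry.name] = sum(1 for p in entry.rglob("*") if p.is_file())
    return summary
```

Running `summarize_output("db/learning_set")` after a generation returns one entry per subfolder. An empty dictionary, or entries with a count of zero, suggests the generation did not complete as expected.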