Data
This page describes how to use the ROMI package plantdb, accessible here.
A shared example dataset is accessible here.
Getting started
Installation
Warning
If you intend to contribute to the development of plantdb, or want to be able to edit the code and test your changes, you should choose editable mode.
Non-editable mode
Install from GitHub using pip:
pip install git+ssh://git@github.com/romi/plantdb.git@dev
Note
This uses SSH and thus requires you to be registered as a member of the project and to have deployed SSH keys.
Editable mode
Clone from GitHub and install using pip:
git clone https://github.com/romi/plantdb.git
cd plantdb
pip install -e .
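After either installation mode, a quick sanity check is to ask Python whether the package can be found. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical and not part of plantdb.

```python
from importlib.util import find_spec


def is_installed(package: str = "plantdb") -> bool:
    """Return True if the import system can locate `package`."""
    # find_spec returns None when a top-level package cannot be found.
    return find_spec(package) is not None


print("plantdb installed:", is_installed())
```

If this prints `False`, the package is not visible from the active Python environment.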
Minimal working example
Let's assume you have a list of images of a given object and that you want to add them to a ROMI database as a "scan".
1 - Initialize database
First create the directory for the database and add the romidb marker to it:
from os.path import join
from tempfile import mkdtemp
mydb = mkdtemp(prefix='romidb_')
open(join(mydb, 'romidb'), 'w').close()
Now you can initialize a ROMI FSDB database object:
from plantdb.fsdb import FSDB
db = FSDB(mydb)
db.connect() # Locks the database and allows access
2 - Create a new dataset
To create a new dataset, here named myscan_001, do:
scan = db.create_scan("myscan_001")
To add scan metadata (e.g. camera settings, biological metadata, hardware metadata...), do:
scan.set_metadata({"scanner": {"hardware": 'test'}})
This results in several changes in the local database:
- Add a myscan_001 sub-directory in the database root directory;
- Add a metadata sub-directory in myscan_001 and a metadata.json file gathering the given scan metadata.
3 - Add images as new fileset
- OPTIONAL - create a list of RGB images. If you do not have a scan dataset available, either download a shared dataset here or generate a list of images as follows:
  import numpy as np
  # Generate random noise images
  n_images = 99
  imgs = []
  for i in range(n_images):
      img = 256 * np.random.rand(256, 256, 3)
      img = np.array(img, dtype=np.uint8)
      imgs.append(img)
- Create a new fileset:
  fileset = scan.create_fileset("images")
- Add the images to the fileset. Load the file list (or skip if you generated random images):
  from os import listdir
  imgs = listdir("</path/to/my/scan/images>")
- Then loop over the images list and add them to the fileset, optionally attaching some metadata to each image:
  from plantdb import io
  for i, img in enumerate(imgs):
      file = fileset.create_file("%i"%i)
      io.write_image(file, img)
      file.set_metadata("key", "%i"%i)
This results in several changes in the local database:
- Reference the image by its file name by adding an entry in files.json;
- Write a scan_img_1.jpeg image in the images sub-directory of the scan "myscan_001";
- Add an images sub-directory in the metadata sub-directory, and JSON files named after the image id to store the image metadata.
4 - Access image files in a fileset
To access the image files in a fileset (in a dataset, itself in an existing and accessible database), proceed as follows:
from plantdb.fsdb import FSDB
from plantdb import io
db = FSDB(mydb)
db.connect() # Locks the database and allows access
scan = db.get_scan("myscan_001")
fileset = scan.get_fileset("images")
for f in fileset.get_files():
    im = io.read_image(f) # reads image data
    print(f.get_metadata("key")) # i
db.disconnect()
Examples
from plantdb.fsdb import FSDB
from plantdb import io
import numpy as np
# Generate random noise images
n_images = 100
imgs = []
for i in range(n_images):
    img = 256*np.random.rand(256, 256, 3)
    img = np.array(img, dtype=np.uint8)
    imgs.append(img)
from os import listdir
from os.path import join
from tempfile import mkdtemp
# Create a temporary DB directory:
mydb = mkdtemp(prefix='romidb_')
# Create the `romidb` file in previously created temporary DB directory:
open(join(mydb, 'romidb'), 'w').close()
listdir(mydb)
# Connect to the DB:
db = FSDB(mydb)
db.connect() # Locks the database and allows access
# Add a scan dataset to the DB:
scan = db.create_scan("myscan_001")
listdir(mydb)
# Add metadata to a scan dataset:
scan.set_metadata({"scanner": {"hardware": 'test'}})
listdir(join(mydb, "myscan_001"))
listdir(join(mydb, "myscan_001", "metadata"))
fileset = scan.create_fileset("images")
listdir(join(mydb, "myscan_001"))
for i, img in enumerate(imgs):
    file = fileset.create_file("%i"%i)
    io.write_image(file, img)
    file.set_metadata("key", "%i"%i) # Add some metadata
# read files in the fileset:
scan = db.get_scan("myscan_001")
fileset = scan.get_fileset("images")
for f in fileset.get_files():
    im = io.read_image(f) # reads image data
    print(f.get_metadata("key")) # i
db.disconnect()
Database structure
Overview
Below is an overview of the database structure, using the ROMI database terminology:
plantdb_root/
├── dataset_001/
│   ├── fileset_A/
│   │   ├── file_A_001.ext
│   │   ├── [...]
│   │   └── file_A_009.ext
│   ├── fileset_B/
│   │   ├── file_B_001.ext
│   │   ├── [...]
│   │   └── file_B_009.ext
│   ├── metadata/
│   │   ├── fileset_A.json
│   │   ├── fileset_B.json
│   │   └── metadata.json
│   └── files.json
├── dataset_002/
│   └── [...]
├── [...]
├── (lock)
└── romidb
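The layout above can be explored with the standard library alone. The following is a minimal sketch, not the plantdb implementation: it assumes, as in the tree, that every sub-directory of the root is a dataset and that every sub-directory of a dataset other than `metadata` is a fileset. The helper name `list_db` is hypothetical.

```python
import tempfile
from pathlib import Path


def list_db(root):
    """Map each dataset directory name to the sorted names of its fileset directories."""
    root = Path(root)
    if not (root / "romidb").is_file():
        raise ValueError(f"{root} has no 'romidb' marker file")
    layout = {}
    for dataset in sorted(p for p in root.iterdir() if p.is_dir()):
        # 'metadata' holds JSON side files, so it is not reported as a fileset.
        layout[dataset.name] = sorted(
            p.name for p in dataset.iterdir() if p.is_dir() and p.name != "metadata"
        )
    return layout


# Demo on a throwaway directory that mimics the overview tree.
demo_root = Path(tempfile.mkdtemp(prefix="romidb_"))
(demo_root / "romidb").touch()
for fileset_name in ("fileset_A", "fileset_B", "metadata"):
    (demo_root / "dataset_001" / fileset_name).mkdir(parents=True)
(demo_root / "dataset_002").mkdir()

layout = list_db(demo_root)
print(layout)
# {'dataset_001': ['fileset_A', 'fileset_B'], 'dataset_002': []}
```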
Database root directory
A root database directory is defined, e.g. mydb/.
Inside this directory we need to add the romidb marker file, so that it can be used by the FSDB class.
We may also find the lock file, used to limit access to the database to a single user.
Note that the database initialization is manual. To create the root directory and the marker file, run in a terminal:
mkdir mydb
touch mydb/romidb
We just created the following tree structure:
mydb/
└── romidb
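The same initialization can be done from Python. This is a small sketch using only `pathlib`; the helper name `init_db_root` is hypothetical and not part of plantdb.

```python
import tempfile
from pathlib import Path


def init_db_root(path):
    """Create the database root directory and its 'romidb' marker file (idempotent)."""
    root = Path(path)
    root.mkdir(parents=True, exist_ok=True)
    (root / "romidb").touch(exist_ok=True)
    return root


# Demo in a temporary location so that nothing is written to the working directory.
demo_root = init_db_root(Path(tempfile.mkdtemp()) / "mydb")
print(sorted(p.name for p in demo_root.iterdir()))  # ['romidb']
```

Calling the helper again on the same path leaves an existing root and marker untouched.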
Once you have created this root directory and the romidb marker file, you can initialize a ROMI FSDB database object in Python:
from plantdb.fsdb import FSDB
db = FSDB("mydb")
db.connect()
The method FSDB.connect() locks the database with a lock file at the root directory and allows access.
To disconnect and free the database, do:
db.disconnect()
If for some reason the Python interpreter terminates unexpectedly without a call to the disconnect method, you may have to remove the lock file manually.
Check that no one else is using the database!
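One way to reduce the risk of a stale lock is to guarantee that disconnect() runs even when your code raises. The sketch below wraps any object exposing connect() and disconnect() in a context manager. The names `connected` and `DummyDB` are hypothetical; `DummyDB` only stands in for a database object so that the example runs without plantdb.

```python
from contextlib import contextmanager


@contextmanager
def connected(db):
    """Connect `db`, yield it, and always disconnect, even if the body raises."""
    db.connect()
    try:
        yield db
    finally:
        db.disconnect()


class DummyDB:
    """Stand-in for illustration only; it is not part of plantdb."""

    def __init__(self):
        self.locked = False

    def connect(self):
        self.locked = True

    def disconnect(self):
        self.locked = False


dummy = DummyDB()
try:
    with connected(dummy):
        raise RuntimeError("simulated crash")
except RuntimeError:
    pass
print(dummy.locked)  # False
```

This only covers Python exceptions. If the process is killed outright, the lock file can still be left behind.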
Within this root database directory you will find other directories corresponding to datasets.
Dataset directories
At the next level, we find the dataset directories, e.g. one named myscan_001.
Their names must be unique, and you create them as follows:
scan = db.create_scan("myscan_001")
If you add scan metadata (e.g. camera settings, biological metadata, hardware metadata...) with scan.set_metadata(), you get another directory, metadata, with a metadata.json file.
scan.set_metadata({"scanner": {"hardware": 'test'}})
We now have the following tree structure:
mydb/
├── myscan_001/
│   └── metadata/
│       └── metadata.json
└── romidb
And the file metadata.json should look like this:
{
    "scanner": {
        "hardware": "test"
    }
}
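Since the scan metadata is plain JSON on disk, it can also be inspected without plantdb. The following sketch assumes the `<root>/<scan>/metadata/metadata.json` location shown in the tree above; the helper name `read_scan_metadata` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def read_scan_metadata(db_root, scan_id):
    """Load <db_root>/<scan_id>/metadata/metadata.json as a dictionary."""
    path = Path(db_root) / scan_id / "metadata" / "metadata.json"
    with path.open() as f:
        return json.load(f)


# Demo: recreate the small tree shown above in a temporary directory.
demo_root = Path(tempfile.mkdtemp(prefix="romidb_"))
md_dir = demo_root / "myscan_001" / "metadata"
md_dir.mkdir(parents=True)
(md_dir / "metadata.json").write_text(json.dumps({"scanner": {"hardware": "test"}}))

md = read_scan_metadata(demo_root, "myscan_001")
print(md["scanner"]["hardware"])  # test
```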
Images directories
Inside myscan_001/, we find the sets of data files, called filesets in plantdb terminology.
In the case of the "plant scanner", this is a list of RGB image files acquired by a camera moving around the plant.
To store these images, we thus name the created fileset "images":
fileset = scan.create_fileset("images")
This creates an images directory and a files.json file at the dataset root directory.
We now have the following tree structure:
mydb/
├── myscan_001/
│   ├── images/
│   ├── metadata/
│   │   └── metadata.json
│   └── files.json
└── romidb
The files.json file should look like this:
{
    "filesets": [
        {
            "files": [],
            "id": "images"
        }
    ]
}
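The structure above is easy to summarize with the `json` module. This sketch assumes only the two keys visible in the example (`filesets`, each with an `id` and a `files` list); the helper name `summarize_files_json` is hypothetical.

```python
import json


def summarize_files_json(text):
    """Return {fileset id: number of referenced files} from the content of a files.json."""
    index = json.loads(text)
    return {fs["id"]: len(fs["files"]) for fs in index.get("filesets", [])}


example = '{"filesets": [{"files": [], "id": "images"}]}'
summary = summarize_files_json(example)
print(summary)  # {'images': 0}
```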
We then create random RGB images to add to the dataset:
import numpy as np
# Generate random noise images
n_images = 100
imgs = []
for i in range(n_images):
    img = 256*np.random.rand(256, 256, 3)
    img = np.array(img, dtype=np.uint8)
    imgs.append(img)
And we add them with their metadata to the database:
from plantdb import io
for i, img in enumerate(imgs):
    fname = f"img_{str(i).zfill(2)}.png"
    file = fileset.create_file(fname)
    io.write_image(file, img)
    file.set_metadata("key", fname)
    file.set_metadata("id", i)
The images added to the database reside inside this images/ directory.
When you added the images, you also created an entry in a JSON file referencing the files.
If you added metadata along with the files (e.g. camera poses, JPEG metadata...), it should be referenced in metadata/images/, e.g. metadata/images/<scan_img_01>.json.
mydb/
├── myscan_001/
│   ├── files.json
│   ├── images/
│   │   ├── scan_img_01.jpg
│   │   ├── scan_img_02.jpg
│   │   ├── [...]
│   │   └── scan_img_99.jpg
│   ├── metadata/
│   │   ├── images
│   │   │   ├── scan_img_01.json
│   │   │   ├── scan_img_02.json
│   │   │   ├── [...]
│   │   │   └── scan_img_99.json
│   │   └── metadata.json
└── romidb
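Given this layout, a simple consistency check is to compare the files of a fileset with the JSON files in the matching metadata sub-directory. The sketch below assumes, as in the tree above, that a file and its metadata share the same base name; the helper name `files_missing_metadata` is hypothetical.

```python
import tempfile
from pathlib import Path


def files_missing_metadata(scan_dir, fileset="images"):
    """Return the sorted base names of files in `fileset` without a metadata JSON file."""
    scan_dir = Path(scan_dir)
    file_ids = {p.stem for p in (scan_dir / fileset).iterdir() if p.is_file()}
    md_dir = scan_dir / "metadata" / fileset
    md_ids = {p.stem for p in md_dir.glob("*.json")} if md_dir.is_dir() else set()
    return sorted(file_ids - md_ids)


# Demo: two images, only the first one has a metadata file.
scan_dir = Path(tempfile.mkdtemp()) / "myscan_001"
(scan_dir / "images").mkdir(parents=True)
(scan_dir / "metadata" / "images").mkdir(parents=True)
for name in ("scan_img_01", "scan_img_02"):
    (scan_dir / "images" / f"{name}.jpg").touch()
(scan_dir / "metadata" / "images" / "scan_img_01.json").touch()

missing = files_missing_metadata(scan_dir)
print(missing)  # ['scan_img_02']
```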
Example
mydb/
├── myscan_001/
│   ├── AnglesAndInternodes_1_0_2_0_0_1_dd8d67653a
│   │   └── AnglesAndInternodes.json
│   ├── Colmap_True____feature_extrac_3bbfcb1413
│   │   ├── cameras.json
│   │   ├── images.json
│   │   ├── points3d.json
│   │   └── sparse.ply
│   ├── CurveSkeleton_out__TriangleMesh_6a92751c20
│   │   └── CurveSkeleton.json
│   ├── files.json
│   ├── images
│   │   ├── pict20190201_110110_0.jpg
│   │   ├── [...]
│   │   └── pict20190201_111209_0.jpg
│   ├── Masks_True_5_out_9adb9db801
│   │   ├── pict20190201_110110_0.jpg
│   │   ├── [...]
│   │   └── pict20190201_111209_0.jpg
│   ├── measures.csv
│   ├── metadata
│   │   ├── AnglesAndInternodes_1_0_2_0_0_1_dd8d67653a.json
│   │   ├── Colmap_True____feature_extrac_3bbfcb1413.json
│   │   ├── CurveSkeleton_out__TriangleMesh_6a92751c20.json
│   │   ├── images
│   │   │   ├── pict20190201_110110_0.json
│   │   │   ├── [...]
│   │   │   └── pict20190201_111209_0.json
│   │   ├── images.json
│   │   ├── Masks_True_5_out_9adb9db801
│   │   │   ├── pict20190201_110110_0.json
│   │   │   ├── [...]
│   │   │   └── pict20190201_111209_0.json
│   │   ├── Masks_True_5_out_e90d1804eb.json
│   │   ├── metadata.json
│   │   ├── PointCloud_1_0_1_0_False_9ab5a15d9b
│   │   │   └── PointCloud.json
│   │   ├── PointCloud_1_0_1_0_False_9ab5a15d9b.json
│   │   ├── PointCloud__200_0_1_0_False_4ce2e46446.json
│   │   ├── TreeGraph_out__CurveSkeleton_5dca9a2821.json
│   │   ├── TriangleMesh_out__PointCloud_80dc94ac81.json
│   │   ├── Undistorted_out_____fb3e3fa0ff
│   │   │   ├── pict20190201_110110_0.json
│   │   │   ├── [...]
│   │   │   └── pict20190201_111209_0.json
│   │   ├── Undistorted_out_____fb3e3fa0ff.json
│   │   ├── Voxels_False____False_567dc7f48b
│   │   │   └── Voxels.json
│   │   ├── Voxels_False____False_567dc7f48b.json
│   │   ├── Voxels_False____True_af037e876e.json
│   │   └── Voxels_False____True_cd9a5ff06b.json
│   ├── pipeline.toml
│   ├── PointCloud_1_0_1_0_False_9ab5a15d9b
│   │   └── PointCloud.ply
│   ├── TreeGraph_out__CurveSkeleton_5dca9a2821
│   │   └── TreeGraph.p
│   ├── TriangleMesh_out__PointCloud_80dc94ac81
│   │   └── TriangleMesh.ply
│   └── Voxels_False____False_567dc7f48b
│       └── Voxels.npz
├── colmap_log.txt
├── lock
└── romidb
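The fileset names in this listing appear to follow a pattern: a task name, some parameter values, and a 10-character hexadecimal suffix. This pattern is inferred from the names shown here only and is an assumption, not a documented plantdb rule. Under that assumption, a small helper (the name `split_fileset_name` is hypothetical) can group filesets by task:

```python
import re

# Assumed pattern: <TaskName>_<parameters>_<10 hex characters>.
_TASK_FILESET = re.compile(r"^(?P<task>[A-Za-z]+)_(?P<params>.*)_(?P<digest>[0-9a-f]{10})$")


def split_fileset_name(name):
    """Return (task name, hex suffix), or (name, None) if the name does not fit the pattern."""
    match = _TASK_FILESET.match(name)
    if match is None:
        return name, None
    return match.group("task"), match.group("digest")


print(split_fileset_name("PointCloud_1_0_1_0_False_9ab5a15d9b"))  # ('PointCloud', '9ab5a15d9b')
print(split_fileset_name("images"))  # ('images', None)
```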