cellmap_segmentation_challenge.utils.datasplit
Functions
| get_csv_string | Get the csv string for a given dataset path, to be written to the datasplit csv file. |
| get_dataset_counts | Get the counts of each class in each dataset. |
| get_dataset_name | Get the name of the dataset from the raw path. |
| get_raw_path | Get the path to the raw data for a given crop path. |
| make_datasplit_csv | Make a datasplit csv file for the given classes and datasets. |
- cellmap_segmentation_challenge.utils.datasplit.get_dataset_name(raw_path: str, search_path: str = '/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8') → str
Get the name of the dataset from the raw path.
- Parameters:
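raw_path (str) – The path to the raw data.
search_path (str, optional) – The search path used to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME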
- Return type:
str
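The relationship between `raw_path`, `search_path`, and `raw_name` can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `dataset_name_from_raw_path` and the shortened template are assumptions made for illustration. The idea is to substitute `raw_name` for `{name}` in the template and read `{dataset}` back out of the path.

```python
import re

# Assumed, shortened stand-ins for the module defaults (illustrative only).
SEARCH_PATH = "data/{dataset}/{dataset}.zarr/recon-1/{name}"
RAW_NAME = "em/fibsem-uint8"


def dataset_name_from_raw_path(raw_path, search_path=SEARCH_PATH, raw_name=RAW_NAME):
    """Hypothetical helper: recover {dataset} by matching raw_path against the template."""
    # Fill in {name} so that only the {dataset} placeholders remain.
    template = search_path.replace("{name}", raw_name)
    # Escape the literal parts of the template for use in a regular expression.
    pattern = re.escape(template)
    placeholder = re.escape("{dataset}")
    # The first placeholder captures the name; later ones must repeat it.
    pattern = pattern.replace(placeholder, r"(?P<dataset>[^/]+)", 1)
    pattern = pattern.replace(placeholder, r"(?P=dataset)")
    match = re.search(pattern + r"$", raw_path)
    if match is None:
        raise ValueError(f"{raw_path!r} does not match template {template!r}")
    return match.group("dataset")


name = dataset_name_from_raw_path(
    "/some/root/data/jrc_hela-2/jrc_hela-2.zarr/recon-1/em/fibsem-uint8"
)
```

The backreference makes the match fail when the directory name and the `.zarr` name differ, which is one way to catch a path that does not follow the template.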
- cellmap_segmentation_challenge.utils.datasplit.get_raw_path(crop_path: str, raw_name: str = 'em/fibsem-uint8', label: str = '') → str
Get the path to the raw data for a given crop path.
- Parameters:
crop_path (str) – The path to the crop.
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
label (str, optional) – The label class at the crop_path, by default “”
- Returns:
The path to the raw data.
- Return type:
str
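As a rough sketch of the path arithmetic involved, assuming crops live under `labels/groundtruth/{crop}/{label}` (the default `crop_name` pattern shown for `make_datasplit_csv`) and that raw data sits beside the `labels` directory, the raw path can be derived as below. The helper `raw_path_from_crop_path` is hypothetical and may differ from what the library actually does.

```python
from pathlib import PurePosixPath

RAW_NAME = "em/fibsem-uint8"  # assumed to mirror the documented default


def raw_path_from_crop_path(crop_path, raw_name=RAW_NAME, label=""):
    """Hypothetical helper: map a crop (or crop/label) path to its raw-data path."""
    path = PurePosixPath(crop_path)
    # Drop a trailing label component so the path ends at the crop directory.
    if label and path.name == label:
        path = path.parent
    # Climb from <root>/labels/groundtruth/<crop> up to <root>.
    root = path.parents[2]
    return str(root / raw_name)


raw = raw_path_from_crop_path(
    "data/jrc_x/jrc_x.zarr/recon-1/labels/groundtruth/crop1/mito", label="mito"
)
```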
- cellmap_segmentation_challenge.utils.datasplit.get_csv_string(path: str, classes: list[str], usage: str, raw_name: str = 'em/fibsem-uint8')
Get the csv string for a given dataset path, to be written to the datasplit csv file.
- Parameters:
path (str) – The path to the dataset.
classes (list[str]) – The classes present in the dataset.
usage (str) – The usage of the dataset (train or validate).
raw_name (str, optional) – The name of the raw data. Default is RAW_NAME.
- Returns:
The csv string for the dataset.
- Return type:
str
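This page does not show the column layout of the returned string. Purely as an illustration, the sketch below assumes one row per dataset path with the columns usage, raw path, raw name, ground-truth path, and a bracketed class list. Both that layout and the helper name `csv_row` are assumptions, not the library's documented format.

```python
import csv
import io


def csv_row(usage, raw_path, raw_name, gt_path, classes):
    """Hypothetical helper: build one fully quoted csv line (assumed column layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    # The classes are packed into a single bracketed field so the row width stays fixed.
    writer.writerow([usage, raw_path, raw_name, gt_path, "[" + ",".join(classes) + "]"])
    return buffer.getvalue()


row = csv_row(
    "train",
    "a.zarr/recon-1",
    "em/fibsem-uint8",
    "a.zarr/recon-1/labels/groundtruth/crop1",
    ["nuc", "mito"],
)
```

Quoting every field keeps the commas inside the class list from being read as column separators.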
- cellmap_segmentation_challenge.utils.datasplit.make_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], search_path: str = '/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}', csv_path: str = 'datasplit.csv', dry_run: bool = False)
Make a datasplit csv file for the given classes and datasets.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file; only return the datapaths that were found. By default False.
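The interplay of `force_all_classes` and `validation_prob` described above can be sketched as follows. This is one reading of the parameter descriptions, not the library's code: the helpers `choose_usage` and `keep_crop` are hypothetical, and the order in which the real function assigns usage and filters crops is an assumption.

```python
import random


def choose_usage(rng, validation_prob=0.1):
    """Hypothetical helper: pick "validate" with probability validation_prob, else "train"."""
    return "validate" if rng.random() < validation_prob else "train"


def keep_crop(present, requested, usage, force_all_classes=False):
    """Hypothetical helper: decide whether a crop is included for the given usage."""
    requested = set(requested)
    found = requested & set(present)
    # True forces every class for both splits; "train"/"validate" only for that split.
    require_all = force_all_classes is True or force_all_classes == usage
    if require_all:
        return found == requested
    # Otherwise one requested class is enough.
    return len(found) > 0


rng = random.Random(0)  # seeded so the split is reproducible
usage = choose_usage(rng, validation_prob=0.1)
included = keep_crop(["nuc"], ["nuc", "mito"], usage, force_all_classes=False)
```

With `force_all_classes="validate"`, a crop that contains only `nuc` would still be accepted for training under this reading, but rejected for validation.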
- cellmap_segmentation_challenge.utils.datasplit.get_dataset_counts(classes: list[str] = ['nuc', 'mito'], search_path: str = '/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}')
Get the counts of each class in each dataset.
- Parameters:
classes (list[str], optional) – The classes to count, by default [“nuc”, “mito”]
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
- Returns:
A dictionary of the counts of each class in each dataset.
- Return type:
dict
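The exact shape of the returned dictionary is not spelled out here. A plausible shape, assumed for this sketch, is a nested mapping from dataset name to class name to the number of crops containing that class. The helper `count_classes` and its input format are hypothetical.

```python
from collections import defaultdict


def count_classes(crop_labels, classes=("nuc", "mito")):
    """Hypothetical helper: tally labels per dataset from (dataset, label) pairs."""
    # Each dataset starts with a zero count for every requested class.
    counts = defaultdict(lambda: {c: 0 for c in classes})
    for dataset, label in crop_labels:
        if label in classes:
            counts[dataset][label] += 1
    return dict(counts)


counts = count_classes(
    [("a", "nuc"), ("a", "mito"), ("a", "nuc"), ("b", "er"), ("b", "mito")]
)
```

Counts like these could help choose `classes` and `force_all_classes` so that the resulting split is not dominated by one dataset.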