cellmap_segmentation_challenge.utils.datasplit

Functions

get_csv_string(path, classes, usage[, raw_name])

Get the csv string for a given dataset path, to be written to the datasplit csv file.

get_dataset_counts([classes, search_path, ...])

Get the counts of each class in each dataset.

get_dataset_name(raw_path[, search_path, ...])

Get the name of the dataset from the raw path.

get_raw_path(crop_path[, raw_name, label])

Get the path to the raw data for a given crop path.

make_datasplit_csv([classes, ...])

Make a datasplit csv file for the given classes and datasets.

cellmap_segmentation_challenge.utils.datasplit.get_dataset_name(raw_path: str, search_path: str = '/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8') → str

Get the name of the dataset from the raw path.

Parameters:
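  • raw_path (str) – The path to the raw data.

  • search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH

  • raw_name (str, optional) – The name of the raw data, by default RAW_NAME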

Return type:

str
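As an illustration of how a dataset name can be recovered from a raw path, the sketch below matches the path against a search_path template. It is not the library's implementation: the helper name dataset_name_from_raw_path, the /data root, and the example dataset name are assumptions made for this example.

```python
import re

# Hypothetical template root; the real SEARCH_PATH depends on the installation.
SEARCH_PATH = "/data/{dataset}/{dataset}.zarr/recon-1/{name}"


def dataset_name_from_raw_path(raw_path, search_path=SEARCH_PATH,
                               raw_name="em/fibsem-uint8"):
    """Return the text that fills the {dataset} slot of the template."""
    # Substitute the raw data name, then split around every {dataset} slot.
    filled = search_path.replace("{name}", raw_name)
    pieces = [re.escape(piece) for piece in filled.split("{dataset}")]
    # The first slot captures the name; later slots must repeat the same text.
    pattern = (pieces[0] + r"(?P<dataset>[^/]+)"
               + r"(?P=dataset)".join(pieces[1:]))
    match = re.fullmatch(pattern, raw_path)
    if match is None:
        raise ValueError(f"{raw_path!r} does not match {search_path!r}")
    return match.group("dataset")


name = dataset_name_from_raw_path(
    "/data/jrc_hela-2/jrc_hela-2.zarr/recon-1/em/fibsem-uint8"
)
```

The backreference makes the match fail when the directory name and the .zarr name disagree, which is one way to reject paths that do not follow the template.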

cellmap_segmentation_challenge.utils.datasplit.get_raw_path(crop_path: str, raw_name: str = 'em/fibsem-uint8', label: str = '') → str

Get the path to the raw data for a given crop path.

Parameters:
  • crop_path (str) – The path to the crop.

  • raw_name (str, optional) – The name of the raw data, by default RAW_NAME

  • label (str, optional) – The label class at the crop_path, by default “”

Returns:

The path to the raw data.

Return type:

str
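The relationship between a crop path and its raw data can be sketched as follows, assuming crops live under labels/groundtruth/{crop}/{label} (the default crop_name shown elsewhere on this page) inside the same group as the raw data. The helper name raw_path_for_crop and the example paths are hypothetical; the library's own logic may differ.

```python
from pathlib import PurePosixPath


def raw_path_for_crop(crop_path, raw_name="em/fibsem-uint8", label=""):
    """Map .../labels/groundtruth/<crop>[/<label>] to .../<raw_name>."""
    path = PurePosixPath(crop_path)
    # If the path ends with the label class, step up to the crop directory.
    if label and path.name == label:
        path = path.parent
    # Assumed layout: <group>/labels/groundtruth/<crop>, so the group root
    # is three levels above the crop directory.
    group_root = path.parents[2]
    return str(group_root / raw_name)


with_label = raw_path_for_crop(
    "/data/d/d.zarr/recon-1/labels/groundtruth/crop1/nuc", label="nuc"
)
without_label = raw_path_for_crop(
    "/data/d/d.zarr/recon-1/labels/groundtruth/crop1"
)
```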

cellmap_segmentation_challenge.utils.datasplit.get_csv_string(path: str, classes: list[str], usage: str, raw_name: str = 'em/fibsem-uint8')

Get the csv string for a given dataset path, to be written to the datasplit csv file.

Parameters:
  • path (str) – The path to the dataset.

  • classes (list[str]) – The classes present in the dataset.

  • usage (str) – The usage of the dataset (train or validate).

  • raw_name (str, optional) – The name of the raw data. Default is RAW_NAME.

Returns:

The csv string for the dataset.

Return type:

str
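This page does not document the column layout of the datasplit csv. The sketch below only shows one way to build a single csv row from the same kinds of inputs, using the standard csv module. The column order (usage, raw path, crop path, classes) and the helper name csv_row are assumptions for illustration, not the format get_csv_string produces.

```python
import csv
import io


def csv_row(path, classes, usage, raw_path):
    """Build one csv line; the column order here is an assumed layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Join the classes into one field; csv.writer quotes it because it
    # contains commas, so the row still parses back into four columns.
    writer.writerow([usage, raw_path, path, ",".join(classes)])
    return buffer.getvalue()


row = csv_row(
    "/data/d/d.zarr/recon-1/labels/groundtruth/crop1",
    ["nuc", "mito"],
    "train",
    "/data/d/d.zarr/recon-1/em/fibsem-uint8",
)
```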

cellmap_segmentation_challenge.utils.datasplit.make_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], search_path: str = '/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}', csv_path: str = 'datasplit.csv', dry_run: bool = False)

Make a datasplit csv file for the given classes and datasets.

Parameters:
  • classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]

  • force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.

  • validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1

  • datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets

  • crops (list[str], optional) – The crops to include in the csv, by default [“*”], which includes all crops. Otherwise, only the crops in the list are included.

  • search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH

  • raw_name (str, optional) – The name of the raw data, by default RAW_NAME

  • crop_name (str, optional) – The name of the crop, by default CROP_NAME

  • csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”

  • dry_run (bool, optional) – If True, do not write the csv file; only return the found datapaths. By default False
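The interaction between force_all_classes and validation_prob can be sketched as below. This is one possible reading of the parameter descriptions above, not the library's code: the helper name assign_usage, the seed argument, the per-crop random draw, and the choice to route an incomplete crop to the unconstrained split are all assumptions.

```python
import random


def assign_usage(crops, classes, force_all_classes=False,
                 validation_prob=0.1, seed=0):
    """crops maps a crop id to the set of classes annotated in it."""
    rng = random.Random(seed)  # seeded only to keep the sketch reproducible
    wanted = set(classes)
    split = {}
    for crop_id, present in sorted(crops.items()):
        found = wanted & set(present)
        if not found:
            continue  # none of the requested classes is present
        has_all = found == wanted
        # A split is open to this crop unless that split requires all
        # classes and the crop is missing some of them.
        eligible = [
            usage for usage in ("train", "validate")
            if has_all
            or not (force_all_classes is True or force_all_classes == usage)
        ]
        if not eligible:
            continue
        if len(eligible) == 2:
            usage = "validate" if rng.random() < validation_prob else "train"
        else:
            usage = eligible[0]
        split[crop_id] = usage
    return split


crops = {"crop1": {"nuc", "mito"}, "crop2": {"nuc"}, "crop3": {"er"}}
loose = assign_usage(crops, ["nuc", "mito"], validation_prob=0.0)
strict = assign_usage(crops, ["nuc", "mito"], force_all_classes=True,
                      validation_prob=1.0)
```

With force_all_classes=False, crop2 is kept even though it lacks mito. With force_all_classes=True, only crop1 survives.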

cellmap_segmentation_challenge.utils.datasplit.get_dataset_counts(classes: list[str] = ['nuc', 'mito'], search_path: str = '/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}')

Get the counts of each class in each dataset.

Parameters:
  • classes (list[str], optional) – The classes to count, by default [“nuc”, “mito”]

  • search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH

  • raw_name (str, optional) – The name of the raw data, by default RAW_NAME

  • crop_name (str, optional) – The name of the crop, by default CROP_NAME

Returns:

A dictionary of the counts of each class in each dataset.

Return type:

dict
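A minimal sketch of counting classes per dataset follows. It expands the search_path and crop_name templates into glob patterns and tallies the matching label directories. The helper name count_classes and the nested {dataset: {label: count}} shape are assumptions; the page only states that a dict is returned.

```python
import os
from glob import glob


def count_classes(classes, search_path,
                  crop_name="labels/groundtruth/{crop}/{label}"):
    """Count matching label directories per dataset and class."""
    # Position of the first {dataset} slot among the path components.
    dataset_index = search_path.split("/").index("{dataset}")
    counts = {}
    for label in classes:
        # Wildcard the dataset and crop, keep the label fixed.
        name = crop_name.format(crop="*", label=label)
        pattern = search_path.format(dataset="*", name=name)
        for match in glob(pattern):
            parts = match.replace(os.sep, "/").split("/")
            dataset = parts[dataset_index]
            per_dataset = counts.setdefault(dataset, {})
            per_dataset[label] = per_dataset.get(label, 0) + 1
    return counts
```

Given a directory tree that follows the template, count_classes(["nuc", "mito"], root + "/{dataset}/{dataset}.zarr/recon-1/{name}") returns one inner dictionary per dataset that contains at least one requested class. Here root is a placeholder for wherever the data is stored.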