cellmap_segmentation_challenge.utils package
Submodules
cellmap_segmentation_challenge.utils.crops module
- cellmap_segmentation_challenge.utils.crops.fetch_manifest(url: str | URL, file_name: str, object: Self) tuple[str, ...] [source]#
- Parameters:
url (str | URL)
file_name (str)
object (Self)
- Return type:
tuple[str, …]
- class cellmap_segmentation_challenge.utils.crops.TestCropRow(id: int, dataset: str, class_label: str, voxel_size: tuple[float, ...], translation: tuple[float, ...], shape: tuple[int, ...])[source]#
Bases:
object
A dataclass representing a row in the test crop manifest file.
- Parameters:
id (int)
dataset (str)
class_label (str)
voxel_size (tuple[float, ...])
translation (tuple[float, ...])
shape (tuple[int, ...])
- id: int#
- dataset: str#
- class_label: str#
- voxel_size: tuple[float, ...]#
- translation: tuple[float, ...]#
- shape: tuple[int, ...]#
- cellmap_segmentation_challenge.utils.crops.fetch_test_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/test_crop_manifest.csv') tuple[TestCropRow, ...] [source]#
Fetch a test manifest file from a URL and return a tuple of TestCropRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of TestCropRow objects.
- Return type:
tuple[TestCropRow, …]
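As a rough illustration of how rows with the fields listed for TestCropRow might be consumed, the sketch below uses a local stand-in dataclass rather than the real class. `CropRowSketch`, `physical_extent`, `group_by_dataset`, and the sample values are all hypothetical; only the field names and types come from the listing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropRowSketch:
    """Hypothetical stand-in with the same fields as the documented TestCropRow."""

    id: int
    dataset: str
    class_label: str
    voxel_size: tuple[float, ...]
    translation: tuple[float, ...]
    shape: tuple[int, ...]


def physical_extent(row):
    """Size of the crop along each axis: voxel size times number of voxels."""
    return tuple(v * s for v, s in zip(row.voxel_size, row.shape))


def group_by_dataset(rows):
    """Map each dataset name to the class labels listed for it."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row.dataset, []).append(row.class_label)
    return grouped


# Made-up rows, only to exercise the helpers.
rows = (
    CropRowSketch(1, "dataset_a", "mito", (8.0, 8.0, 8.0), (0.0, 0.0, 0.0), (100, 200, 300)),
    CropRowSketch(1, "dataset_a", "nuc", (16.0, 16.0, 16.0), (0.0, 0.0, 0.0), (50, 100, 150)),
)
print(physical_extent(rows[0]))  # (800.0, 1600.0, 2400.0)
print(group_by_dataset(rows))    # {'dataset_a': ['mito', 'nuc']}
```

With the real package installed, the tuple returned by fetch_test_crop_manifest() could presumably be passed to helpers like these in place of the made-up rows.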
- class cellmap_segmentation_challenge.utils.crops.ZipDatasetRow(all_res: bool, padding: int, name: str, url: URL)[source]#
Bases:
object
A dataclass representing a row in the zip dataset manifest file.
- Parameters:
all_res (bool)
padding (int)
name (str)
url (URL)
- all_res: bool#
- padding: int#
- name: str#
- url: URL#
- cellmap_segmentation_challenge.utils.crops.fetch_zip_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/zip_manifest.csv') tuple[ZipDatasetRow, ...] [source]#
Fetch a manifest file from a URL and return a tuple of ZipDatasetRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of ZipDatasetRow objects.
- Return type:
tuple[ZipDatasetRow, …]
- class cellmap_segmentation_challenge.utils.crops.CropRow(id: int, dataset: str, alignment: str, gt_source: URL | TestCropRow, em_url: URL)[source]#
Bases:
object
A dataclass representing a row in the crop manifest file.
- Parameters:
id (int)
dataset (str)
alignment (str)
gt_source (URL | TestCropRow)
em_url (URL)
- id: int#
- dataset: str#
- alignment: str#
- gt_source: URL | TestCropRow#
- em_url: URL#
- cellmap_segmentation_challenge.utils.crops.fetch_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/manifest.csv') tuple[CropRow, ...] [source]#
Fetch a manifest file from a URL and return a tuple of CropRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of CropRow objects.
- Return type:
tuple[CropRow, …]
cellmap_segmentation_challenge.utils.dataloader module
- cellmap_segmentation_challenge.utils.dataloader.get_dataloader(datasplit_path: str, classes: Sequence[str], batch_size: int, array_info: Mapping[str, Sequence[int | float]] | None = None, input_array_info: Mapping[str, Sequence[int | float]] | None = None, target_array_info: Mapping[str, Sequence[int | float]] | None = None, spatial_transforms: Mapping[str, Any] | None = None, iterations_per_epoch: int = 1000, random_validation: bool = False, device: str | device | None = None) tuple[CellMapDataLoader, CellMapDataLoader] [source]#
Get the train and validation dataloaders.
This function builds the train and validation dataloaders from the given datasplit file, classes, batch size, array info, spatial transforms, iterations per epoch, and device.
- Parameters:
datasplit_path (str) – Path to the datasplit file that defines the train/val split the dataloader should use.
classes (Sequence[str]) – List of classes to segment.
batch_size (int) – Batch size for the dataloader.
array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the input and target. Either array_info or input_array_info & target_array_info must be provided.
input_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the input.
target_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the target.
spatial_transforms (Optional[Mapping[str, any]]) – Dictionary containing the spatial transformations to apply to the data. For example the dictionary could contain transformations like mirror, transpose, and rotate.
spatial_transforms = {
    # 3D
    # Probability of applying mirror for each axis.
    # Values range from 0 (no mirroring) to 1 (will always mirror).
    "mirror": {"axes": {"x": 0.5, "y": 0.5, "z": 0.5}},
    # Specifies the axes that will be involved in the transposition.
    "transpose": {"axes": ["x", "y", "z"]},
    # Defines the rotation range for each axis.
    # The rotation angle for each axis is randomly chosen within the specified range (-180, 180).
    "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180], "z": [-180, 180]}},
    # 2D (used when there is no z axis)
    # "mirror": {"axes": {"x": 0.5, "y": 0.5}},
    # "transpose": {"axes": ["x", "y"]},
    # "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180]}},
}
iterations_per_epoch (int) – Number of iterations per epoch.
random_validation (bool) – Whether or not to randomize the validation data draws. Useful if not evaluating on the entire validation set every time. Defaults to False.
device (Optional[str or torch.device]) – Device to use for training. If None, defaults to “cuda” if available, or “mps” if available, or “cpu”.
- Returns:
Tuple containing the train and validation dataloaders.
- Return type:
tuple[CellMapDataLoader, CellMapDataLoader]
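The spatial_transforms dictionary described above can be built programmatically. The following is a minimal sketch, assuming a hypothetical helper `make_spatial_transforms` (not part of the package) and assuming that array_info uses the keys "shape" and "scale", which the parameter description suggests but does not spell out. The call to get_dataloader is left commented out because it needs the package and a datasplit file.

```python
def make_spatial_transforms(ndim=3, mirror_prob=0.5, max_angle=180):
    """Build a spatial_transforms mapping in the layout shown in the docs.

    Hypothetical helper: ndim=3 uses the x, y, z axes; ndim=2 drops z.
    """
    if ndim not in (2, 3):
        raise ValueError("ndim must be 2 or 3")
    axes = ["x", "y", "z"][:ndim]
    return {
        # Per-axis probability of mirroring, from 0 (never) to 1 (always).
        "mirror": {"axes": {axis: mirror_prob for axis in axes}},
        # Axes that may take part in a transposition.
        "transpose": {"axes": list(axes)},
        # Per-axis rotation range from which the angle is drawn.
        "rotate": {"axes": {axis: [-max_angle, max_angle] for axis in axes}},
    }


spatial_transforms = make_spatial_transforms(ndim=3)

# Assumption: "shape" and "scale" are the expected key names.
array_info = {"shape": (128, 128, 128), "scale": (8, 8, 8)}

# With the package installed, the call would look roughly like this:
# from cellmap_segmentation_challenge.utils import get_dataloader
# train_loader, val_loader = get_dataloader(
#     datasplit_path="datasplit.csv",
#     classes=["nuc", "mito"],
#     batch_size=8,
#     array_info=array_info,
#     spatial_transforms=spatial_transforms,
# )
```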
cellmap_segmentation_challenge.utils.datasplit module
- cellmap_segmentation_challenge.utils.datasplit.get_dataset_name(raw_path: str, search_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8') str [source]#
Get the name of the dataset from the raw path.
- Parameters:
raw_path (str)
search_path (str)
raw_name (str)
- Return type:
str
- cellmap_segmentation_challenge.utils.datasplit.get_raw_path(crop_path: str, raw_name: str = 'em/fibsem-uint8', label: str = '') str [source]#
Get the path to the raw data for a given crop path.
- Parameters:
crop_path (str) – The path to the crop.
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
label (str, optional) – The label class at the crop_path, by default “”
- Returns:
The path to the raw data.
- Return type:
str
- cellmap_segmentation_challenge.utils.datasplit.get_formatted_fields(path: str, base_path: str, fields: list[str]) dict[str, str] [source]#
Get the formatted fields from the path.
- Parameters:
path (str) – The path to get the fields from.
base_path (str) – The unformatted path to find the fields in.
fields (list[str]) – The fields to get from the path.
- Returns:
The formatted fields.
- Return type:
dict[str, str]
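One way to recover format fields from a concrete path is to turn the unformatted template into a regular expression. The sketch below illustrates that idea only; `extract_fields` is a hypothetical name, the example paths are made up, and the package's own implementation may work differently.

```python
import re
import string


def extract_fields(path, base_path, fields):
    """Recover the values of {field} placeholders in base_path from path."""
    pattern_parts = []
    seen = set()
    for literal, name, _spec, _conversion in string.Formatter().parse(base_path):
        # Literal text between placeholders must match exactly.
        pattern_parts.append(re.escape(literal))
        if name is None:
            continue
        if name in seen:
            # A repeated placeholder must match the value captured earlier.
            pattern_parts.append(f"(?P={name})")
        else:
            pattern_parts.append(f"(?P<{name}>.+?)")
            seen.add(name)
    match = re.fullmatch("".join(pattern_parts), path)
    if match is None:
        raise ValueError(f"{path!r} does not match template {base_path!r}")
    return {field: match.group(field) for field in fields}


template = "/data/{dataset}/{dataset}.zarr/recon-1/{name}"
path = "/data/example_ds/example_ds.zarr/recon-1/em/fibsem-uint8"
print(extract_fields(path, template, ["dataset", "name"]))
# {'dataset': 'example_ds', 'name': 'em/fibsem-uint8'}
```

The backreference for the repeated {dataset} placeholder makes the match fail when the two occurrences differ, which is a design choice of this sketch rather than documented behavior.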
- cellmap_segmentation_challenge.utils.datasplit.get_s3_csv_string(path: str, classes: list[str], usage: str)[source]#
Get the csv string for a given dataset path, to be written to the datasplit csv file.
- Parameters:
path (str) – The path to the dataset.
classes (list[str]) – The classes present in the dataset.
usage (str) – The usage of the dataset (train or validate).
- Returns:
The csv string for the dataset.
- Return type:
str
- cellmap_segmentation_challenge.utils.datasplit.get_csv_string(path: str, classes: list[str], usage: str, raw_name: str = 'em/fibsem-uint8', search_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}')[source]#
Get the csv string for a given dataset path, to be written to the datasplit csv file.
- Parameters:
path (str) – The path to the dataset.
classes (list[str]) – The classes present in the dataset.
usage (str) – The usage of the dataset (train or validate).
raw_name (str, optional) – The name of the raw data. Default is RAW_NAME.
search_path (str, optional) – The search path to use to find the datasets. Default is SEARCH_PATH.
- Returns:
The csv string for the dataset.
- Return type:
str
- cellmap_segmentation_challenge.utils.datasplit.make_s3_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], csv_path: str = 'datasplit.csv', dry_run: bool = False, **kwargs)[source]#
Make a datasplit csv file for the given classes and datasets.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
**kwargs (dict) – Additional keyword arguments will be unused. Kept for compatibility with make_datasplit_csv.
- cellmap_segmentation_challenge.utils.datasplit.make_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], search_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}', csv_path: str = 'datasplit.csv', dry_run: bool = False)[source]#
Make a datasplit csv file for the given classes and datasets.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
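The interaction between validation_prob and force_all_classes described above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: `assign_usage`, the `crops` mapping from crop name to the set of classes present, and the `seed` argument are all hypothetical.

```python
import random


def assign_usage(crops, classes, force_all_classes=False, validation_prob=0.1, seed=None):
    """Return (usage, crop_name, found_classes) rows for crops that pass the class filter."""
    rng = random.Random(seed)
    rows = []
    for crop_name, present in crops.items():
        # Each crop lands in the validation set with probability validation_prob.
        usage = "validate" if rng.random() < validation_prob else "train"
        found = [c for c in classes if c in present]
        if not found:
            # No requested class is present, so the crop is never included.
            continue
        # True forces all classes everywhere; "train"/"validate" only for that usage.
        must_have_all = force_all_classes is True or force_all_classes == usage
        if must_have_all and len(found) < len(classes):
            continue
        rows.append((usage, crop_name, found))
    return rows


crops = {"crop1": {"nuc", "mito"}, "crop2": {"nuc"}, "crop3": {"er"}}
print(assign_usage(crops, ["nuc", "mito"], force_all_classes=True, validation_prob=0.0))
# [('train', 'crop1', ['nuc', 'mito'])]
```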
- cellmap_segmentation_challenge.utils.datasplit.get_dataset_counts(classes: list[str] = ['nuc', 'mito'], search_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}')[source]#
Get the counts of each class in each dataset.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
- Returns:
A dictionary of the counts of each class in each dataset.
- Return type:
dict
- cellmap_segmentation_challenge.utils.datasplit.get_tested_classes(csv_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/cellmap_segmentation_challenge/utils/tested_classes.csv')[source]#
Get the classes that will be tested for the challenge.
- Parameters:
csv_path (str, optional) – The path to the csv file, by default “tested_classes.csv”
- Returns:
A list of the classes that have been tested.
- Return type:
list[str]
- cellmap_segmentation_challenge.utils.datasplit.get_class_relations(csv_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/cellmap_segmentation_challenge/utils/classes.csv', named_classes: list[str] | None = None)[source]#
- Parameters:
csv_path (str)
named_classes (list[str] | None)
cellmap_segmentation_challenge.utils.fetch_data module
- cellmap_segmentation_challenge.utils.fetch_data.copy_store(*, keys: Iterable[str], source_store: Store, dest_store: Store)[source]#
Iterate over the keys, copying them from the source store to the dest store
- Parameters:
keys (Iterable[str])
source_store (Store)
dest_store (Store)
- cellmap_segmentation_challenge.utils.fetch_data.partition_copy_store(*, keys, source_store, dest_store, batch_size, pool: ThreadPoolExecutor)[source]#
- Parameters:
pool (ThreadPoolExecutor)
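The pattern suggested by copy_store and partition_copy_store, copying keys in batches on a thread pool, can be sketched with plain dictionaries standing in for stores. The helper names `copy_keys`, `batched`, and `partitioned_copy` are hypothetical, and returning the list of futures is an assumption of this sketch, since the return value of the real function is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from itertools import islice


def copy_keys(keys, source_store, dest_store):
    """Copy each key from the source mapping to the destination mapping."""
    for key in keys:
        dest_store[key] = source_store[key]


def batched(iterable, batch_size):
    """Yield successive tuples of at most batch_size items."""
    iterator = iter(iterable)
    while batch := tuple(islice(iterator, batch_size)):
        yield batch


def partitioned_copy(keys, source_store, dest_store, batch_size, pool):
    """Submit one copy job per batch of keys and return the futures."""
    return [
        pool.submit(copy_keys, batch, source_store, dest_store)
        for batch in batched(keys, batch_size)
    ]


# Dictionaries stand in for the stores in this sketch.
source = {f"0.{i}": bytes([i]) for i in range(10)}
dest = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = partitioned_copy(list(source), source, dest, batch_size=3, pool=pool)
    wait(futures)
    for future in futures:
        future.result()  # re-raise any exception from a worker thread
```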
- cellmap_segmentation_challenge.utils.fetch_data.get_store_url(store: BaseStore, path: str)[source]#
- Parameters:
store (BaseStore)
path (str)
- cellmap_segmentation_challenge.utils.fetch_data.get_chunk_keys(array: Array, region: tuple[slice, ...] = ()) Generator[str, None, None] [source]#
Get the keys for all the chunks in a Zarr array as a generator of strings. Returns keys relative to the path of the array.
Copied with modifications from janelia-cellmap/fibsem-tools.
- Parameters:
array (zarr.core.Array) – The zarr array to get the chunk keys from
region (tuple[slice, ...]) – The region of the zarr array to get chunk keys from. Defaults to (), which will result in all the chunk keys being returned.
- Return type:
Generator[str, None, None]
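The idea of enumerating the chunk keys that overlap a region can be sketched without Zarr itself. The example below is a simplified illustration, not the package's code: `chunk_keys` is a hypothetical function that takes the array shape and chunk shape directly, assumes "." as the dimension separator, and supports only step-1 slices.

```python
from itertools import product


def chunk_keys(shape, chunks, region=()):
    """Yield keys such as "0.1" for every chunk that overlaps region."""
    # Pad the region with full slices so it covers every dimension.
    region = tuple(region) + (slice(None),) * (len(shape) - len(region))
    index_ranges = []
    for length, chunk, sl in zip(shape, chunks, region):
        start, stop, step = sl.indices(length)
        if step != 1:
            raise ValueError("only step-1 slices are supported in this sketch")
        if stop <= start:
            index_ranges.append(range(0))  # empty selection along this axis
        else:
            # First and last chunk index touched along this axis.
            index_ranges.append(range(start // chunk, (stop - 1) // chunk + 1))
    for index in product(*index_ranges):
        yield ".".join(map(str, index))


print(list(chunk_keys((4, 4), (2, 2))))
# ['0.0', '0.1', '1.0', '1.1']
print(list(chunk_keys((4, 4), (2, 2), (slice(0, 2), slice(2, 4)))))
# ['0.1']
```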
- cellmap_segmentation_challenge.utils.fetch_data.read_group(path: str, **kwargs) Group [source]#
- Parameters:
path (str)
- Return type:
Group
- cellmap_segmentation_challenge.utils.fetch_data.subset_to_slice(outer_array, inner_array) tuple[slice, ...] [source]#
- Return type:
tuple[slice, …]
- cellmap_segmentation_challenge.utils.fetch_data.resolve_em_url(em_source_root: URL, em_source_paths: list[str])[source]#
- Parameters:
em_source_root (URL)
em_source_paths (list[str])
- cellmap_segmentation_challenge.utils.fetch_data.parse_s3_url(s3_url: str) (str, str) [source]#
- Parameters:
s3_url (str)
- Return type:
(str, str)
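The listing does not say what the two returned strings are. A common convention is to split an S3 URL into a bucket name and an object key, and the sketch below illustrates that convention only. `split_s3_url` and the example URL are hypothetical, and the real function may return something different.

```python
from urllib.parse import urlparse


def split_s3_url(s3_url):
    """Split "s3://bucket/key/parts" into ("bucket", "key/parts")."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    # netloc holds the bucket; the path carries the key with a leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_url("s3://example-bucket/dataset/data.zarr"))
# ('example-bucket', 'dataset/data.zarr')
```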
cellmap_segmentation_challenge.utils.loss module
- class cellmap_segmentation_challenge.utils.loss.CellMapLossWrapper(loss_fn: _Loss | _WeightedLoss, **kwargs)[source]#
Bases:
_Loss
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
loss_fn (_Loss | _WeightedLoss)
- forward(outputs: Tensor, target: Tensor)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
outputs (Tensor)
target (Tensor)
cellmap_segmentation_challenge.utils.security module
- cellmap_segmentation_challenge.utils.security.analyze_script(filepath)[source]#
Analyzes the script at filepath using ast for potentially unsafe imports and function calls. Returns a boolean indicating whether the script is safe and a list of detected issues.
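The kind of AST-based check described above can be sketched in a few lines. This is an illustration only: `analyze_source` is a hypothetical function that takes source text rather than a file path, and the two deny-lists are assumptions, not the lists the package actually uses.

```python
import ast

# Assumed deny-lists, chosen for illustration only.
UNSAFE_MODULES = {"os", "subprocess", "sys", "shutil"}
UNSAFE_CALLS = {"eval", "exec", "open", "__import__"}


def analyze_source(source):
    """Return (is_safe, issues) for the given Python source text."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in UNSAFE_MODULES:
                    issues.append(f"line {node.lineno}: import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in UNSAFE_MODULES:
                issues.append(f"line {node.lineno}: import from {node.module}")
        elif isinstance(node, ast.Call):
            # Only direct calls by bare name are checked in this sketch.
            if isinstance(node.func, ast.Name) and node.func.id in UNSAFE_CALLS:
                issues.append(f"line {node.lineno}: call to {node.func.id}()")
    return not issues, issues


print(analyze_source("x = 1 + 1\n"))
# (True, [])
print(analyze_source("import os\n")[0])
# False
```

A static check of this kind only flags what it can see in the syntax tree; aliased or dynamically constructed calls would pass unnoticed, so it is a first filter rather than a sandbox.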
Module contents
- class cellmap_segmentation_challenge.utils.CellMapLossWrapper(loss_fn: _Loss | _WeightedLoss, **kwargs)[source]#
Bases:
_Loss
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
loss_fn (_Loss | _WeightedLoss)
- forward(outputs: Tensor, target: Tensor)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
outputs (Tensor)
target (Tensor)
- class cellmap_segmentation_challenge.utils.TestCropRow(id: int, dataset: str, class_label: str, voxel_size: tuple[float, ...], translation: tuple[float, ...], shape: tuple[int, ...])[source]#
Bases:
object
A dataclass representing a row in the test crop manifest file.
- Parameters:
id (int)
dataset (str)
class_label (str)
voxel_size (tuple[float, ...])
translation (tuple[float, ...])
shape (tuple[int, ...])
- id: int#
- dataset: str#
- class_label: str#
- voxel_size: tuple[float, ...]#
- translation: tuple[float, ...]#
- shape: tuple[int, ...]#
- cellmap_segmentation_challenge.utils.analyze_script(filepath)[source]#
Analyzes the script at filepath using ast for potentially unsafe imports and function calls. Returns a boolean indicating whether the script is safe and a list of detected issues.
- cellmap_segmentation_challenge.utils.fetch_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/manifest.csv') tuple[CropRow, ...] [source]#
Fetch a manifest file from a URL and return a tuple of CropRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of CropRow objects.
- Return type:
tuple[CropRow, …]
- cellmap_segmentation_challenge.utils.fetch_test_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/test_crop_manifest.csv') tuple[TestCropRow, ...] [source]#
Fetch a test manifest file from a URL and return a tuple of TestCropRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of TestCropRow objects.
- Return type:
tuple[TestCropRow, …]
- cellmap_segmentation_challenge.utils.get_class_relations(csv_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/cellmap_segmentation_challenge/utils/classes.csv', named_classes: list[str] | None = None)[source]#
- Parameters:
csv_path (str)
named_classes (list[str] | None)
- cellmap_segmentation_challenge.utils.get_dataloader(datasplit_path: str, classes: Sequence[str], batch_size: int, array_info: Mapping[str, Sequence[int | float]] | None = None, input_array_info: Mapping[str, Sequence[int | float]] | None = None, target_array_info: Mapping[str, Sequence[int | float]] | None = None, spatial_transforms: Mapping[str, Any] | None = None, iterations_per_epoch: int = 1000, random_validation: bool = False, device: str | device | None = None) tuple[CellMapDataLoader, CellMapDataLoader] [source]#
Get the train and validation dataloaders.
This function builds the train and validation dataloaders from the given datasplit file, classes, batch size, array info, spatial transforms, iterations per epoch, and device.
- Parameters:
datasplit_path (str) – Path to the datasplit file that defines the train/val split the dataloader should use.
classes (Sequence[str]) – List of classes to segment.
batch_size (int) – Batch size for the dataloader.
array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the input and target. Either array_info or input_array_info & target_array_info must be provided.
input_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the input.
target_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the target.
spatial_transforms (Optional[Mapping[str, any]]) – Dictionary containing the spatial transformations to apply to the data. For example the dictionary could contain transformations like mirror, transpose, and rotate.
spatial_transforms = {
    # 3D
    # Probability of applying mirror for each axis.
    # Values range from 0 (no mirroring) to 1 (will always mirror).
    "mirror": {"axes": {"x": 0.5, "y": 0.5, "z": 0.5}},
    # Specifies the axes that will be involved in the transposition.
    "transpose": {"axes": ["x", "y", "z"]},
    # Defines the rotation range for each axis.
    # The rotation angle for each axis is randomly chosen within the specified range (-180, 180).
    "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180], "z": [-180, 180]}},
    # 2D (used when there is no z axis)
    # "mirror": {"axes": {"x": 0.5, "y": 0.5}},
    # "transpose": {"axes": ["x", "y"]},
    # "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180]}},
}
iterations_per_epoch (int) – Number of iterations per epoch.
random_validation (bool) – Whether or not to randomize the validation data draws. Useful if not evaluating on the entire validation set every time. Defaults to False.
device (Optional[str or torch.device]) – Device to use for training. If None, defaults to “cuda” if available, or “mps” if available, or “cpu”.
- Returns:
Tuple containing the train and validation dataloaders.
- Return type:
tuple[CellMapDataLoader, CellMapDataLoader]
- cellmap_segmentation_challenge.utils.get_test_crops() tuple[CropRow, ...] [source]#
- Return type:
tuple[CropRow, …]
- cellmap_segmentation_challenge.utils.get_tested_classes(csv_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/cellmap_segmentation_challenge/utils/tested_classes.csv')[source]#
Get the classes that will be tested for the challenge.
- Parameters:
csv_path (str, optional) – The path to the csv file, by default “tested_classes.csv”
- Returns:
A list of the classes that have been tested.
- Return type:
list[str]
- cellmap_segmentation_challenge.utils.load_safe_config(config_path, force_safe=True)[source]#
Loads the configuration script at config_path after verifying its safety. If force_safe is True, raises an error if the script is deemed unsafe.
- cellmap_segmentation_challenge.utils.make_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], search_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}', csv_path: str = 'datasplit.csv', dry_run: bool = False)[source]#
Make a datasplit csv file for the given classes and datasets.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
- cellmap_segmentation_challenge.utils.make_s3_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], csv_path: str = 'datasplit.csv', dry_run: bool = False, **kwargs)[source]#
Make a datasplit csv file for the given classes and datasets.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
**kwargs (dict) – Additional keyword arguments will be unused. Kept for compatibility with make_datasplit_csv.