cellmap_segmentation_challenge.utils package#
Submodules#
cellmap_segmentation_challenge.utils.crops module#
- cellmap_segmentation_challenge.utils.crops.fetch_manifest(url: str | URL, file_name: str, object: Self) tuple[str, ...] [source]#
- Parameters:
url (str | URL)
file_name (str)
object (Self)
- Return type:
tuple[str, …]
- class cellmap_segmentation_challenge.utils.crops.TestCropRow(id: int, dataset: str, class_label: str, voxel_size: tuple[float, ...], translation: tuple[float, ...], shape: tuple[int, ...])[source]#
Bases:
object
A dataclass representing a row in the test crop manifest file.
- Parameters:
id (int)
dataset (str)
class_label (str)
voxel_size (tuple[float, ...])
translation (tuple[float, ...])
shape (tuple[int, ...])
- id: int#
- dataset: str#
- class_label: str#
- voxel_size: tuple[float, ...]#
- translation: tuple[float, ...]#
- shape: tuple[int, ...]#
- cellmap_segmentation_challenge.utils.crops.fetch_test_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/test_crop_manifest.csv') tuple[TestCropRow, ...] [source]#
Fetch a test manifest file from a URL and return a tuple of TestCropRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of TestCropRow objects.
- Return type:
tuple[TestCropRow, …]
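As an illustration of working with the returned rows, the sketch below groups class labels by dataset and crop id. DemoTestCropRow is a hypothetical stand-in that only mirrors the documented fields of TestCropRow, and the sample values are invented. With the package installed, the tuple returned by fetch_test_crop_manifest() would presumably take the place of demo_rows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoTestCropRow:
    """Hypothetical stand-in mirroring the documented TestCropRow fields."""

    id: int
    dataset: str
    class_label: str
    voxel_size: tuple[float, ...]
    translation: tuple[float, ...]
    shape: tuple[int, ...]


def labels_by_crop(rows):
    """Map (dataset, crop id) to the sorted class labels listed for that crop."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row.dataset, row.id)].append(row.class_label)
    return {key: sorted(labels) for key, labels in grouped.items()}


# Invented sample rows, for illustration only.
demo_rows = (
    DemoTestCropRow(1, "dataset_a", "nuc", (8.0, 8.0, 8.0), (0.0, 0.0, 0.0), (64, 64, 64)),
    DemoTestCropRow(1, "dataset_a", "mito", (8.0, 8.0, 8.0), (0.0, 0.0, 0.0), (64, 64, 64)),
    DemoTestCropRow(2, "dataset_b", "nuc", (16.0, 16.0, 16.0), (0.0, 0.0, 0.0), (32, 32, 32)),
)
crop_labels = labels_by_crop(demo_rows)
```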
- class cellmap_segmentation_challenge.utils.crops.ZipDatasetRow(all_res: bool, padding: int, name: str, url: URL)[source]#
Bases:
object
A dataclass representing a row in the zip dataset manifest file.
- Parameters:
all_res (bool)
padding (int)
name (str)
url (URL)
- all_res: bool#
- padding: int#
- name: str#
- url: URL#
- cellmap_segmentation_challenge.utils.crops.fetch_zip_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/zip_manifest.csv') tuple[ZipDatasetRow, ...] [source]#
Fetch a manifest file from a URL and return a tuple of ZipDatasetRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of ZipDatasetRow objects.
- Return type:
tuple[ZipDatasetRow, …]
- class cellmap_segmentation_challenge.utils.crops.CropRow(id: int, dataset: str, alignment: str, gt_source: URL | TestCropRow, em_url: URL)[source]#
Bases:
object
A dataclass representing a row in the crop manifest file.
- Parameters:
id (int)
dataset (str)
alignment (str)
gt_source (URL | TestCropRow)
em_url (URL)
- id: int#
- dataset: str#
- alignment: str#
- gt_source: URL | TestCropRow#
- em_url: URL#
- cellmap_segmentation_challenge.utils.crops.fetch_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/manifest.csv') tuple[CropRow, ...] [source]#
Fetch a manifest file from a URL and return a tuple of CropRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of CropRow objects.
- Return type:
tuple[CropRow, …]
cellmap_segmentation_challenge.utils.dataloader module#
- cellmap_segmentation_challenge.utils.dataloader.get_dataloader(datasplit_path: str, classes: ~typing.Sequence[str], batch_size: int, input_array_info: ~typing.Mapping[str, ~typing.Sequence[int | float]] | None = None, target_array_info: ~typing.Mapping[str, ~typing.Sequence[int | float]] | None = None, spatial_transforms: ~typing.Mapping[str, ~typing.Any] | None = None, target_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( ToDtype(scale=False) Binarize(threshold=0) ), train_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( Normalize ToDtype(scale=True) NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), val_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( Normalize ToDtype(scale=True) NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), iterations_per_epoch: int = 1000, random_validation: bool = False, device: str | ~torch.device | None = None, use_mutual_exclusion: bool = False, weighted_sampler: bool = True, **kwargs) tuple[CellMapDataLoader, CellMapDataLoader] [source]#
Get the train and validation dataloaders.
This function gets the train and validation dataloaders for the given datasplit file, classes, batch size, array info, spatial transforms, iterations per epoch, number of workers, and device.
- Parameters:
datasplit_path (str) – Path to the datasplit file that defines the train/val split the dataloader should use.
classes (Sequence[str]) – List of classes to segment.
batch_size (int) – Batch size for the dataloader.
input_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the input.
target_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the target.
spatial_transforms (Optional[Mapping[str, any]]) –
Dictionary containing the spatial transformations to apply to the data. For example, the dictionary could contain transformations like mirror, transpose, and rotate:
spatial_transforms = {
    # 3D
    # Probability of applying mirror for each axis
    # Values range from 0 (no mirroring) to 1 (will always mirror)
    "mirror": {"axes": {"x": 0.5, "y": 0.5, "z": 0.5}},
    # Specifies the axes that will be involved in the transposition
    "transpose": {"axes": ["x", "y", "z"]},
    # Defines rotation range for each axis.
    # Rotation angle for each axis is randomly chosen within the specified range (-180, 180).
    "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180], "z": [-180, 180]}},
    # 2D (used when there is no z axis)
    # "mirror": {"axes": {"x": 0.5, "y": 0.5}},
    # "transpose": {"axes": ["x", "y"]},
    # "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180]}},
}
target_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the target values. Defaults to T.Compose([T.ToDtype(torch.float), Binarize()]), which converts the input masks to float32 and thresholds them at 0 (turning object IDs into binary masks for use with binary cross entropy loss).
train_raw_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the raw values for training. Defaults to T.Compose([Normalize(), T.ToDtype(torch.float, scale=True), NaNtoNum({“nan”: 0, “posinf”: None, “neginf”: None})]) which normalizes the input data, converts it to float32, and replaces NaNs with 0. This can be used to add augmentations such as random erasing, blur, noise, etc.
val_raw_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the raw values for validation. Defaults to T.Compose([Normalize(), T.ToDtype(torch.float, scale=True), NaNtoNum({“nan”: 0, “posinf”: None, “neginf”: None})]) which normalizes the input data, converts it to float32, and replaces NaNs with 0.
iterations_per_epoch (int) – Number of iterations per epoch.
random_validation (bool) – Whether or not to randomize the validation data draws. Useful if not evaluating on the entire validation set every time. Defaults to False.
device (Optional[str or torch.device]) – Device to use for training. If None, defaults to “cuda” if available, or “mps” if available, or “cpu”.
use_mutual_exclusion (bool) – Whether to use mutually exclusive class labels to infer non-present labels for the training data. Defaults to False.
weighted_sampler (bool) – Whether to weight sample draws based on the number of positive labels within a dataset. Defaults to True.
**kwargs (Any) – Additional keyword arguments to pass to the CellMapDataLoader.
- Returns:
Tuple containing the train and validation dataloaders.
- Return type:
tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader]
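The spatial_transforms dictionary described above has the same shape for 2D and 3D data; only the set of axes differs. The helper below is a hypothetical convenience (it is not part of the package) that builds such a dictionary for a chosen set of axes, using the keys and value ranges shown in the parameter description.

```python
def make_spatial_transforms(axes=("x", "y", "z"), mirror_prob=0.5, max_angle=180):
    """Build a spatial_transforms-style dict for the given axes (hypothetical helper)."""
    axes = list(axes)
    return {
        # Per-axis probability of mirroring, from 0 (never) to 1 (always).
        "mirror": {"axes": {axis: mirror_prob for axis in axes}},
        # Axes that may take part in a transposition.
        "transpose": {"axes": axes},
        # Per-axis rotation range in degrees.
        "rotate": {"axes": {axis: [-max_angle, max_angle] for axis in axes}},
    }


transforms_3d = make_spatial_transforms()
transforms_2d = make_spatial_transforms(axes=("x", "y"))
```

The resulting dictionary would then be passed as the spatial_transforms argument of get_dataloader.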
cellmap_segmentation_challenge.utils.datasplit module#
- cellmap_segmentation_challenge.utils.datasplit.get_dataset_name(raw_path: str, search_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8') str [source]#
Get the name of the dataset from the raw path.
- Parameters:
raw_path (str)
search_path (str)
raw_name (str)
- Return type:
str
- cellmap_segmentation_challenge.utils.datasplit.get_raw_path(crop_path: str, raw_name: str = 'em/fibsem-uint8', label: str = '') str [source]#
Get the path to the raw data for a given crop path.
- Parameters:
crop_path (str) – The path to the crop.
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
label (str, optional) – The label class at the crop_path, by default “”
- Returns:
The path to the raw data.
- Return type:
str
- cellmap_segmentation_challenge.utils.datasplit.get_formatted_fields(path: str, base_path: str, fields: list[str]) dict[str, str] [source]#
Get the formatted fields from the path.
- Parameters:
path (str) – The path to get the fields from.
base_path (str) – The unformatted path to find the fields in.
fields (list[str]) – The fields to get from the path.
- Returns:
The formatted fields.
- Return type:
dict[str, str]
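One way to recover field values from a concrete path, given an unformatted template, is to turn the template into a regular expression. The sketch below is an assumption about how such a lookup could work, not the package's implementation; extract_fields and the sample paths are hypothetical.

```python
import re


def extract_fields(path, base_path, fields):
    """Recover the values of `fields` from `path`, given the template `base_path`."""
    pattern = re.escape(base_path)
    for field in fields:
        placeholder = re.escape("{" + field + "}")
        # The first occurrence captures the value; later ones must repeat it.
        pattern = pattern.replace(placeholder, f"(?P<{field}>[^/]+)", 1)
        pattern = pattern.replace(placeholder, f"(?P={field})")
    match = re.fullmatch(pattern, path)
    return match.groupdict() if match else {}


found = extract_fields(
    "data/jrc_demo/jrc_demo.zarr/recon-1/em",
    "data/{dataset}/{dataset}.zarr/recon-1/{name}",
    ["dataset", "name"],
)
```

A path that does not fit the template yields an empty dictionary in this sketch; the real function may behave differently in that case.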
- cellmap_segmentation_challenge.utils.datasplit.get_s3_csv_string(path: str, classes: list[str], usage: str)[source]#
Get the csv string for a given dataset path, to be written to the datasplit csv file.
- Parameters:
path (str) – The path to the dataset.
classes (list[str]) – The classes present in the dataset.
usage (str) – The usage of the dataset (train or validate).
- Returns:
The csv string for the dataset.
- Return type:
str
- cellmap_segmentation_challenge.utils.datasplit.get_csv_string(path: str, classes: list[str], usage: str, raw_name: str = 'em/fibsem-uint8', search_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}')[source]#
Get the csv string for a given dataset path, to be written to the datasplit csv file.
- Parameters:
path (str) – The path to the dataset.
classes (list[str]) – The classes present in the dataset.
usage (str) – The usage of the dataset (train or validate).
raw_name (str, optional) – The name of the raw data. Default is RAW_NAME.
search_path (str, optional) – The search path to use to find the datasets. Default is SEARCH_PATH.
- Returns:
The csv string for the dataset.
- Return type:
str
- cellmap_segmentation_challenge.utils.datasplit.make_s3_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], csv_path: str = 'datasplit.csv', dry_run: bool = False, **kwargs)[source]#
Make a datasplit csv file for the given classes and datasets.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
**kwargs (dict) – Additional keyword arguments will be unused. Kept for compatibility with make_datasplit_csv.
- cellmap_segmentation_challenge.utils.datasplit.make_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], search_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}', csv_path: str = 'datasplit.csv', dry_run: bool = False)[source]#
Make a datasplit csv file for the given classes and datasets.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
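The force_all_classes rule described above can be expressed as a small predicate. This is a sketch of the documented behavior only; crop_is_eligible is a hypothetical name, and how the package actually discovers which classes are present in a crop is not shown here.

```python
def crop_is_eligible(present_classes, requested_classes, usage, force_all_classes=False):
    """Decide whether a crop qualifies for `usage` ("train" or "validate")."""
    present = set(present_classes)
    requested = set(requested_classes)
    # True applies to both splits; "train"/"validate" applies to that split only.
    must_have_all = force_all_classes is True or force_all_classes == usage
    if must_have_all:
        return requested <= present
    # Otherwise a single requested class is enough.
    return bool(requested & present)
```

For example, a crop that only contains "nuc" is accepted for requested classes ["nuc", "mito"] when force_all_classes is False, and rejected for training when it is True or "train".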
- cellmap_segmentation_challenge.utils.datasplit.get_dataset_counts(classes: list[str] = ['nuc', 'mito'], search_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}')[source]#
Get the counts of each class in each dataset.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
- Returns:
A dictionary of the counts of each class in each dataset.
- Return type:
dict
- cellmap_segmentation_challenge.utils.datasplit.get_tested_classes(csv_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/tested_classes.csv')[source]#
Get the classes that will be tested for the challenge.
- Parameters:
csv_path (str, optional) – The path to the csv file, by default “tested_classes.csv”
- Returns:
A list of the classes that have been tested.
- Return type:
list[str]
- cellmap_segmentation_challenge.utils.datasplit.get_class_relations(csv_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/classes.csv', named_classes: list[str] | None = None)[source]#
- Parameters:
csv_path (str)
named_classes (list[str] | None)
cellmap_segmentation_challenge.utils.fetch_data module#
- cellmap_segmentation_challenge.utils.fetch_data.copy_store(*, keys: Iterable[str], source_store: Store, dest_store: Store)[source]#
Iterate over the keys, copying them from the source store to the dest store
- Parameters:
keys (Iterable[str])
source_store (Store)
dest_store (Store)
- cellmap_segmentation_challenge.utils.fetch_data.partition_copy_store(*, keys, source_store, dest_store, batch_size, pool: ThreadPoolExecutor)[source]#
- Parameters:
pool (ThreadPoolExecutor)
- cellmap_segmentation_challenge.utils.fetch_data.get_store_url(store: BaseStore, path: str)[source]#
- Parameters:
store (BaseStore)
path (str)
- cellmap_segmentation_challenge.utils.fetch_data.get_chunk_keys(array: Array, region: tuple[slice, ...] = ()) Generator[str, None, None] [source]#
Get the keys for all the chunks in a Zarr array as a generator of strings. Returns keys relative to the path of the array.
Copied with modifications from janelia-cellmap/fibsem-tools.
- Parameters:
array (zarr.core.Array) – The zarr array to get the chunk keys from
region (tuple[slice, ...]) – The region in the Zarr array to get chunk keys from. Defaults to (), which will result in all the chunk keys being returned.
- Return type:
Generator[str, None, None]
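To make the idea of region-limited chunk keys concrete, the sketch below enumerates keys from a plain shape and chunk size rather than from a zarr array. It assumes the Zarr v2 default key layout with "." between chunk indices and unit-step slices; chunk_keys is a hypothetical stand-in, not the function documented above.

```python
from itertools import product


def chunk_keys(shape, chunks, region=(), separator="."):
    """Yield the keys of every chunk that overlaps `region` (unit-step slices assumed)."""
    # Pad the region with full slices so it covers every dimension.
    region = tuple(region) + (slice(None),) * (len(shape) - len(region))
    index_ranges = []
    for size, chunk, sl in zip(shape, chunks, region):
        start, stop, _ = sl.indices(size)
        if stop <= start:
            index_ranges.append(range(0))  # empty selection along this axis
        else:
            index_ranges.append(range(start // chunk, (stop - 1) // chunk + 1))
    for index in product(*index_ranges):
        yield separator.join(str(i) for i in index)


all_keys = list(chunk_keys((4, 4), (2, 2)))
corner_keys = list(chunk_keys((4, 4), (2, 2), (slice(0, 2), slice(2, 4))))
```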
- cellmap_segmentation_challenge.utils.fetch_data.read_group(path: str, **kwargs) Group [source]#
- Parameters:
path (str)
- Return type:
Group
- cellmap_segmentation_challenge.utils.fetch_data.subset_to_slice(outer_array, inner_array, force_nonempty=False) tuple[slice, ...] [source]#
- Return type:
tuple[slice, …]
- cellmap_segmentation_challenge.utils.fetch_data.resolve_em_url(em_source_root: URL, em_source_paths: list[str])[source]#
- Parameters:
em_source_root (URL)
em_source_paths (list[str])
- cellmap_segmentation_challenge.utils.fetch_data.parse_s3_url(s3_url: str) -> (<class 'str'>, <class 'str'>)[source]#
- Parameters:
s3_url (str)
- Return type:
(<class ‘str’>, <class ‘str’>)
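The documentation above only states that two strings are returned. A plausible reading, treated here as an assumption, is that they are the bucket name and the key (path) inside the bucket. The sketch below shows that split with the standard library; split_s3_url and the sample URL are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_url(s3_url):
    """Split an s3:// URL into (bucket, key); assumed semantics, see lead-in."""
    parsed = urlparse(s3_url)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    return bucket, key


bucket, key = split_s3_url("s3://demo-bucket/path/to/data.zarr")
```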
cellmap_segmentation_challenge.utils.loss module#
- class cellmap_segmentation_challenge.utils.loss.CellMapLossWrapper(loss_fn: _Loss | _WeightedLoss, **kwargs)[source]#
Bases:
_Loss
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
loss_fn (_Loss | _WeightedLoss)
- forward(outputs: dict | Tensor, targets: dict | Tensor)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
outputs (dict | Tensor)
targets (dict | Tensor)
cellmap_segmentation_challenge.utils.security module#
- cellmap_segmentation_challenge.utils.security.analyze_script(filepath)[source]#
Analyzes the script at filepath using ast for potentially unsafe imports and function calls. Returns a boolean indicating whether the script is safe and a list of detected issues.
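A static check of this kind can be sketched with the standard ast module. The version below works on source text rather than a file path, and its blocklists are invented for illustration; the names and rules the package actually uses are not documented here.

```python
import ast

# Hypothetical blocklists, for illustration only.
DEMO_BLOCKED_IMPORTS = {"os", "subprocess", "sys"}
DEMO_BLOCKED_CALLS = {"eval", "exec", "open"}


def scan_source(source):
    """Return (is_safe, issues) for Python source text, without executing it."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in DEMO_BLOCKED_IMPORTS:
                    issues.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in DEMO_BLOCKED_IMPORTS:
                issues.append(f"import from {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DEMO_BLOCKED_CALLS:
                issues.append(f"call to {node.func.id}()")
    return (not issues), issues


is_safe, issues = scan_source("import os\nvalue = eval('1 + 1')\n")
```

Because the script is only parsed and never run, a check like this can be applied to untrusted code before deciding whether to execute it.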
cellmap_segmentation_challenge.utils.submission module#
- cellmap_segmentation_challenge.utils.submission.save_numpy_class_labels_to_zarr(save_path, test_volume_name, label_name, labels, overwrite=False, attrs=None)[source]#
Save a single 3D numpy array of class labels to a Zarr-2 file with the required structure.
- Parameters:
save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_names (str) – The names of the labels.
labels (np.ndarray) – A 3D numpy array of class labels.
overwrite (bool) – Whether to overwrite the Zarr-2 file if it already exists.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.
- Example usage:
# Generate random class labels, with 0 as background
labels = np.random.randint(0, 4, (128, 128, 128))
save_numpy_class_labels_to_zarr('submission.zarr', 'test_volume', ['label1', 'label2', 'label3'], labels)
- cellmap_segmentation_challenge.utils.submission.save_numpy_class_arrays_to_zarr(save_path, test_volume_name, label_names, labels, mode='append', attrs=None)[source]#
Save a list of 3D numpy arrays of binary or instance labels to a Zarr-2 file with the required structure.
- Parameters:
save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_names (list) – A list of label names corresponding to the list of 3D numpy arrays.
labels (list) – A list of 3D numpy arrays of binary labels.
mode (str) – The mode to use when saving the Zarr-2 file. Options are ‘append’ or ‘overwrite’.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.
- Example usage:
label_names = ['label1', 'label2', 'label3']
# Generate random binary volumes for each label
labels = [np.random.randint(0, 2, (128, 128, 128)) for _ in range(len(label_names))]
save_numpy_class_arrays_to_zarr('submission.zarr', 'test_volume', label_names, labels)
- cellmap_segmentation_challenge.utils.submission.zip_submission(zarr_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/submission.zarr')[source]#
(Re-)Zip a submission zarr file.
- Parameters:
zarr_path (str | UPath) – The path to the submission zarr file (ending with <filename>.zarr). .zarr will be replaced with .zip.
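The ".zarr will be replaced with .zip" behavior can be pictured with the standard zipfile module. This is a minimal sketch under the assumption that the zarr store is a local directory; zip_directory is a hypothetical stand-in and does not reflect how the package handles UPath or remote stores.

```python
import zipfile
from pathlib import Path


def zip_directory(zarr_path):
    """Zip a local directory store next to itself, swapping .zarr for .zip."""
    zarr_path = Path(zarr_path)
    zip_path = zarr_path.with_suffix(".zip")
    # Mode "w" recreates the archive, so re-zipping replaces an older zip.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(zarr_path.rglob("*")):
            if file.is_file():
                # Store paths relative to the store root.
                archive.write(file, file.relative_to(zarr_path).as_posix())
    return zip_path
```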
- cellmap_segmentation_challenge.utils.submission.package_crop(crop, zarr_group, overwrite, input_search_path='/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/processed/{dataset}.zarr/{crop}')[source]#
- cellmap_segmentation_challenge.utils.submission.package_submission(input_search_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/processed/{dataset}.zarr/{crop}', output_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/submission.zarr', overwrite: bool = False, max_workers: int = 4)[source]#
Package a submission for the CellMap challenge. This will create a zarr file, combining all the processed volumes, and then zip it.
- Parameters:
input_search_path (str) – The base path to the processed volumes, with placeholders for dataset and crops.
output_path (str | UPath) – The path to save the submission zarr to. (ending with <filename>.zarr; .zarr will be appended if not present, and replaced with .zip when zipped).
overwrite (bool) – Whether to overwrite the submission zarr if it already exists.
max_workers (int) – The maximum number of workers to use for parallel processing. Defaults to 4.
cellmap_segmentation_challenge.utils.utils module#
- cellmap_segmentation_challenge.utils.utils.format_coordinates(coordinates)[source]#
Format the coordinates to a string.
- Parameters:
coordinates (list) – List of coordinates.
- Returns:
Formatted string.
- Return type:
str
- cellmap_segmentation_challenge.utils.utils.construct_test_crop_manifest(path_root: str, search_path: str = '{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}', write_path: str | None = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/test_crop_manifest.csv', verbose: bool = False) None | list[str] [source]#
Construct a manifest file for testing crops from a given path.
- Parameters:
path_root (str) – Path to the directory containing the datasets.
search_path (str, optional) – Format string to search for the crops. The default is “{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}”. The function assumes that the keys appear in the file tree in the following order: 1) “path_root”, 2) “dataset”, 3) “crop”, 4) “label”
write_path (str, optional) – Path to write the manifest file. The default is “test_crop_manifest.csv”.
verbose (bool, optional) – Print verbose output. The default is False.
- Return type:
None | list[str]
- cellmap_segmentation_challenge.utils.utils.construct_truth_dataset(path_root: str, search_path: str = '{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}', destination: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/ground_truth.zarr', write_path: str = '{crop}/{label}')[source]#
Construct a consolidated Zarr file for the groundtruth datasets, to use for evaluation.
- Parameters:
path_root (str) – Path to the directory containing the datasets.
search_path (str, optional) – Format string to search for the crops. The default is “{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}”. The function assumes that the keys appear in the file tree in the following order: 1) “path_root”, 2) “dataset”, 3) “crop”, 4) “label”
destination (str, optional) – Path to write the consolidated Zarr file. The default is “cellmap-segmentation-challenge/data/ground_truth.zarr”.
write_path (str, optional) – Format string to write the crops to within the destination Zarr. The default is “{crop}/{label}”.
- cellmap_segmentation_challenge.utils.utils.copy_gt(line, search_path, path_root, write_path, ground_truth)[source]#
- cellmap_segmentation_challenge.utils.utils.simulate_predictions_accuracy(true_labels, accuracy)[source]#
- cellmap_segmentation_challenge.utils.utils.perturb_instance_mask(true_labels, hd_target=None, accuracy=0.8)[source]#
Simulate a predicted instance segmentation mask with an approximate Hausdorff distance.
- Parameters:
true_labels (np.ndarray) – Ground-truth instance segmentation mask.
hd_target (float | None) – Desired approximate Hausdorff distance. If None, it will be calculated from the accuracy.
accuracy (float) – Desired accuracy of the perturbed mask.
- Returns:
Perturbed instance segmentation mask.
- Return type:
np.ndarray
- cellmap_segmentation_challenge.utils.utils.format_string(string: str, format_kwargs: dict) str [source]#
Convenience function to format a string with only the keys present in both the string and in the format_kwargs. When all keys in the format_kwargs are present in string (in brackets), the function will return string.format(**format_kwargs) exactly. When none of the keys in the format_kwargs are present in the string, the function will return the original string, without error.
- Parameters:
string (str) – The string to format.
format_kwargs (dict) – The dictionary of key-value pairs to format the string with.
- Returns:
The formatted string
- Return type:
str
Examples
format_string("this/{thing}", {})  # returns "this/{thing}"
format_string("this/{thing}", {"thing": "that", "but": "not this"})  # returns "this/that"
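The partial-formatting behavior described above can be reproduced with str.format_map and a dictionary that leaves unknown placeholders untouched. This is one possible approach, not necessarily how format_string is implemented; partial_format and _KeepMissing are hypothetical names, and the sketch only handles plain placeholders such as {thing} (no format specs or attribute access).

```python
class _KeepMissing(dict):
    """Dict that re-emits the placeholder for any key it does not contain."""

    def __missing__(self, key):
        return "{" + key + "}"


def partial_format(template, values):
    """Fill in the placeholders that `values` provides and keep the rest as-is."""
    return template.format_map(_KeepMissing(values))


untouched = partial_format("this/{thing}", {})
filled = partial_format("this/{thing}", {"thing": "that", "but": "not this"})
```

Keys in the dictionary that never appear in the template are simply ignored, which matches the second documented example.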
Module contents#
- class cellmap_segmentation_challenge.utils.CellMapLossWrapper(loss_fn: _Loss | _WeightedLoss, **kwargs)[source]#
Bases:
_Loss
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
loss_fn (_Loss | _WeightedLoss)
- forward(outputs: dict | Tensor, targets: dict | Tensor)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
outputs (dict | Tensor)
targets (dict | Tensor)
- class cellmap_segmentation_challenge.utils.TestCropRow(id: int, dataset: str, class_label: str, voxel_size: tuple[float, ...], translation: tuple[float, ...], shape: tuple[int, ...])[source]#
Bases:
object
A dataclass representing a row in the test crop manifest file.
- Parameters:
id (int)
dataset (str)
class_label (str)
voxel_size (tuple[float, ...])
translation (tuple[float, ...])
shape (tuple[int, ...])
- classmethod from_csv_row(row: str) Self [source]#
Create a TestCropRow object from a CSV row.
- Parameters:
row (str)
- Return type:
Self
- id: int#
- dataset: str#
- class_label: str#
- voxel_size: tuple[float, ...]#
- translation: tuple[float, ...]#
- shape: tuple[int, ...]#
- cellmap_segmentation_challenge.utils.analyze_script(filepath)[source]#
Analyzes the script at filepath using ast for potentially unsafe imports and function calls. Returns a boolean indicating whether the script is safe and a list of detected issues.
- cellmap_segmentation_challenge.utils.construct_test_crop_manifest(path_root: str, search_path: str = '{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}', write_path: str | None = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/test_crop_manifest.csv', verbose: bool = False) None | list[str] [source]#
Construct a manifest file for testing crops from a given path.
- Parameters:
path_root (str) – Path to the directory containing the datasets.
search_path (str, optional) – Format string to search for the crops. The default is “{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}”. The function assumes that the keys appear in the file tree in the following order: 1) “path_root”, 2) “dataset”, 3) “crop”, 4) “label”
write_path (str, optional) – Path to write the manifest file. The default is “test_crop_manifest.csv”.
verbose (bool, optional) – Print verbose output. The default is False.
- Return type:
None | list[str]
- cellmap_segmentation_challenge.utils.construct_truth_dataset(path_root: str, search_path: str = '{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}', destination: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/ground_truth.zarr', write_path: str = '{crop}/{label}')[source]#
Construct a consolidated Zarr file for the groundtruth datasets, to use for evaluation.
- Parameters:
path_root (str) – Path to the directory containing the datasets.
search_path (str, optional) – Format string to search for the crops. The default is “{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}”. The function assumes that the keys appear in the file tree in the following order: 1) “path_root”, 2) “dataset”, 3) “crop”, 4) “label”
destination (str, optional) – Path to write the consolidated Zarr file. The default is “cellmap-segmentation-challenge/data/ground_truth.zarr”.
write_path (str, optional) – Format string to write the crops to within the destination Zarr. The default is “{crop}/{label}”.
- cellmap_segmentation_challenge.utils.fetch_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/manifest.csv') tuple[CropRow, ...] [source]#
Fetch a manifest file from a URL and return a tuple of CropRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of CropRow objects.
- Return type:
tuple[CropRow, …]
- cellmap_segmentation_challenge.utils.fetch_test_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/test_crop_manifest.csv') tuple[TestCropRow, ...] [source]#
Fetch a test manifest file from a URL and return a tuple of TestCropRow objects.
- Parameters:
url (str or yarl.URL) – The URL to the manifest file.
- Returns:
A tuple of TestCropRow objects.
- Return type:
tuple[TestCropRow, …]
- cellmap_segmentation_challenge.utils.format_string(string: str, format_kwargs: dict) str [source]#
Convenience function to format a string with only the keys present in both the string and in the format_kwargs. When all keys in the format_kwargs are present in string (in brackets), the function will return string.format(**format_kwargs) exactly. When none of the keys in the format_kwargs are present in the string, the function will return the original string, without error.
- Parameters:
string (str) – The string to format.
format_kwargs (dict) – The dictionary of key-value pairs to format the string with.
- Returns:
The formatted string
- Return type:
str
Examples
format_string("this/{thing}", {})  # returns "this/{thing}"
format_string("this/{thing}", {"thing": "that", "but": "not this"})  # returns "this/that"
- cellmap_segmentation_challenge.utils.get_class_relations(csv_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/classes.csv', named_classes: list[str] | None = None)[source]#
- Parameters:
csv_path (str)
named_classes (list[str] | None)
- cellmap_segmentation_challenge.utils.get_dataloader(datasplit_path: str, classes: ~typing.Sequence[str], batch_size: int, input_array_info: ~typing.Mapping[str, ~typing.Sequence[int | float]] | None = None, target_array_info: ~typing.Mapping[str, ~typing.Sequence[int | float]] | None = None, spatial_transforms: ~typing.Mapping[str, ~typing.Any] | None = None, target_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( ToDtype(scale=False) Binarize(threshold=0) ), train_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( Normalize ToDtype(scale=True) NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), val_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( Normalize ToDtype(scale=True) NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), iterations_per_epoch: int = 1000, random_validation: bool = False, device: str | ~torch.device | None = None, use_mutual_exclusion: bool = False, weighted_sampler: bool = True, **kwargs) tuple[CellMapDataLoader, CellMapDataLoader] [source]#
Get the train and validation dataloaders.
This function gets the train and validation dataloaders for the given datasplit file, classes, batch size, array info, spatial transforms, iterations per epoch, number of workers, and device.
- Parameters:
datasplit_path (str) – Path to the datasplit file that defines the train/val split the dataloader should use.
classes (Sequence[str]) – List of classes to segment.
batch_size (int) – Batch size for the dataloader.
input_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the input.
target_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the target.
spatial_transforms (Optional[Mapping[str, Any]]) –
Dictionary containing the spatial transformations to apply to the data. For example, the dictionary could contain transformations like mirror, transpose, and rotate:
spatial_transforms = {
    # 3D
    # Probability of applying mirror for each axis
    # Values range from 0 (no mirroring) to 1 (will always mirror)
    "mirror": {"axes": {"x": 0.5, "y": 0.5, "z": 0.5}},
    # Specifies the axes that will be involved in the transposition
    "transpose": {"axes": ["x", "y", "z"]},
    # Defines rotation range for each axis.
    # Rotation angle for each axis is randomly chosen within the specified range (-180, 180).
    "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180], "z": [-180, 180]}},
    # 2D (used when there is no z axis)
    # "mirror": {"axes": {"x": 0.5, "y": 0.5}},
    # "transpose": {"axes": ["x", "y"]},
    # "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180]}},
}
target_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the target values. Defaults to T.Compose([T.ToDtype(torch.float), Binarize()]), which converts the input masks to float32 and thresholds them at 0 (turning object IDs into binary masks for use with binary cross entropy loss).
train_raw_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the raw values for training. Defaults to T.Compose([Normalize(), T.ToDtype(torch.float, scale=True), NaNtoNum({"nan": 0, "posinf": None, "neginf": None})]), which normalizes the input data, converts it to float32, and replaces NaNs with 0. This can be used to add augmentations such as random erasing, blur, noise, etc.
val_raw_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the raw values for validation. Defaults to T.Compose([Normalize(), T.ToDtype(torch.float, scale=True), NaNtoNum({"nan": 0, "posinf": None, "neginf": None})]), which normalizes the input data, converts it to float32, and replaces NaNs with 0.
iterations_per_epoch (int) – Number of iterations per epoch.
random_validation (bool) – Whether or not to randomize the validation data draws. Useful if not evaluating on the entire validation set every time. Defaults to False.
device (Optional[str or torch.device]) – Device to use for training. If None, defaults to “cuda” if available, or “mps” if available, or “cpu”.
use_mutual_exclusion (bool) – Whether to use mutually exclusive class labels to infer non-present labels for the training data. Defaults to False.
weighted_sampler (bool) – Whether to weight sample draws based on the number of positive labels within a dataset. Defaults to True.
**kwargs (Any) – Additional keyword arguments to pass to the CellMapDataLoader.
- Returns:
Tuple containing the train and validation dataloaders.
- Return type:
tuple[CellMapDataLoader, CellMapDataLoader]
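A call to get_dataloader might be assembled as follows. This is a sketch under stated assumptions: the "shape" and "scale" key names in the array info dictionaries are inferred from the parameter descriptions above, and every concrete value (array shape, voxel scale, batch size, file name) is hypothetical. The helper build_loaders is not part of the package.

```python
# Assumed key names ("shape", "scale") and hypothetical values throughout
input_array_info = {"shape": (128, 128, 128), "scale": (8, 8, 8)}
target_array_info = {"shape": (128, 128, 128), "scale": (8, 8, 8)}

# 3D spatial augmentations, as described for the spatial_transforms parameter
spatial_transforms = {
    "mirror": {"axes": {"x": 0.5, "y": 0.5, "z": 0.5}},
    "transpose": {"axes": ["x", "y", "z"]},
    "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180], "z": [-180, 180]}},
}

dataloader_kwargs = dict(
    datasplit_path="datasplit.csv",
    classes=["nuc", "mito"],
    batch_size=8,
    input_array_info=input_array_info,
    target_array_info=target_array_info,
    spatial_transforms=spatial_transforms,
    iterations_per_epoch=1000,
)


def build_loaders(kwargs=dataloader_kwargs):
    """Hypothetical helper: create the train and validation dataloaders."""
    # Imported lazily so the configuration above can be inspected
    # without the package (and its datasets) being available.
    from cellmap_segmentation_challenge.utils import get_dataloader

    train_loader, val_loader = get_dataloader(**kwargs)
    return train_loader, val_loader
```

Parameters left out of dataloader_kwargs, such as the value transforms and device, fall back to the defaults documented above.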
- cellmap_segmentation_challenge.utils.get_test_crops() tuple[CropRow, ...] [source]#
- Return type:
tuple[CropRow, …]
- cellmap_segmentation_challenge.utils.get_tested_classes(csv_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/tested_classes.csv')[source]#
Get the classes that will be tested for the challenge.
- Parameters:
csv_path (str, optional) – The path to the csv file, by default “tested_classes.csv”
- Returns:
A list of the classes that have been tested.
- Return type:
list[str]
- cellmap_segmentation_challenge.utils.load_safe_config(config_path, force_safe=False)[source]#
Loads the configuration script at config_path after verifying its safety. If force_safe is True, raises an error if the script is deemed unsafe.
- cellmap_segmentation_challenge.utils.make_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], search_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}', csv_path: str = 'datasplit.csv', dry_run: bool = False)[source]#
Make a datasplit csv file for the given classes and datasets.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
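The path templates and validation_prob can be illustrated with a small standalone sketch. It is not the package's implementation: expand_paths and assign_split are hypothetical helpers, the relative data/ root replaces the install-specific default shown in the signature, and the dataset, crop, and label names are placeholders.

```python
import random

# Templates mirroring the documented defaults; the relative "data/" root is an assumption
SEARCH_PATH = "data/{dataset}/{dataset}.zarr/recon-1/{name}"
RAW_NAME = "em/fibsem-uint8"
CROP_NAME = "labels/groundtruth/{crop}/{label}"


def expand_paths(dataset, crop, label):
    """Fill the templates for one dataset/crop/label combination."""
    raw_path = SEARCH_PATH.format(dataset=dataset, name=RAW_NAME)
    gt_path = SEARCH_PATH.format(
        dataset=dataset, name=CROP_NAME.format(crop=crop, label=label)
    )
    return raw_path, gt_path


def assign_split(items, validation_prob=0.1, seed=0):
    """Assign each item to "validate" with probability validation_prob."""
    rng = random.Random(seed)
    return {
        item: "validate" if rng.random() < validation_prob else "train"
        for item in items
    }


raw_path, gt_path = expand_paths("example_dataset", "crop1", "mito")
print(raw_path)  # data/example_dataset/example_dataset.zarr/recon-1/em/fibsem-uint8
print(gt_path)   # data/example_dataset/example_dataset.zarr/recon-1/labels/groundtruth/crop1/mito
print(assign_split(["dataset_a", "dataset_b", "dataset_c"]))
```

Because each assignment is an independent random draw, a small collection can end up with no validation entries at all when validation_prob is low.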
- cellmap_segmentation_challenge.utils.make_s3_datasplit_csv(classes: list[str] = ['nuc', 'mito'], force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], csv_path: str = 'datasplit.csv', dry_run: bool = False, **kwargs)[source]#
Make a datasplit csv file for the given classes and datasets.
- Parameters:
classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
**kwargs (dict) – Additional keyword arguments will be unused. Kept for compatibility with make_datasplit_csv.
- cellmap_segmentation_challenge.utils.package_submission(input_search_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/processed/{dataset}.zarr/{crop}', output_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/submission.zarr', overwrite: bool = False, max_workers: int = 4)[source]#
Package a submission for the CellMap challenge. This will create a zarr file, combining all the processed volumes, and then zip it.
- Parameters:
input_search_path (str) – The base path to the processed volumes, with placeholders for dataset and crops.
output_path (str | UPath) – The path to save the submission zarr to, ending with <filename>.zarr. The .zarr suffix is appended if not present and is replaced with .zip when the submission is zipped.
overwrite (bool) – Whether to overwrite the submission zarr if it already exists.
max_workers (int) – The maximum number of workers to use for parallel processing. Defaults to 4.
- cellmap_segmentation_challenge.utils.perturb_instance_mask(true_labels, hd_target=None, accuracy=0.8)[source]#
Simulate a predicted instance segmentation mask with an approximate Hausdorff distance.
- Parameters:
true_labels (np.ndarray) – Ground-truth instance segmentation mask.
hd_target (float | None) – Desired approximate Hausdorff distance. If None, it will be calculated from the accuracy.
accuracy (float) – Desired accuracy of the perturbed mask.
- Returns:
Perturbed instance segmentation mask.
- Return type:
np.ndarray
- cellmap_segmentation_challenge.utils.save_numpy_class_arrays_to_zarr(save_path, test_volume_name, label_names, labels, mode='append', attrs=None)[source]#
Save a list of 3D numpy arrays of binary or instance labels to a Zarr-2 file with the required structure.
- Parameters:
save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_names (list) – A list of label names corresponding to the list of 3D numpy arrays.
labels (list) – A list of 3D numpy arrays of binary labels.
mode (str) – The mode to use when saving the Zarr-2 file. Options are ‘append’ or ‘overwrite’.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.
- Example usage:
import numpy as np
label_names = ['label1', 'label2', 'label3']
# Generate random binary volumes for each label
labels = [np.random.randint(0, 2, (128, 128, 128)) for _ in range(len(label_names))]
save_numpy_class_arrays_to_zarr('submission.zarr', 'test_volume', label_names, labels)
- cellmap_segmentation_challenge.utils.save_numpy_class_labels_to_zarr(save_path, test_volume_name, label_name, labels, overwrite=False, attrs=None)[source]#
Save a single 3D numpy array of class labels to a Zarr-2 file with the required structure.
- Parameters:
save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_names (str) – The names of the labels.
labels (np.ndarray) – A 3D numpy array of class labels.
overwrite (bool) – Whether to overwrite the Zarr-2 file if it already exists.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.
- Example usage:
# Generate random class labels, with 0 as background
labels = np.random.randint(0, 4, (128, 128, 128))
save_numpy_class_labels_to_zarr('submission.zarr', 'test_volume', ['label1', 'label2', 'label3'], labels)