cellmap_segmentation_challenge.utils package

Submodules#

cellmap_segmentation_challenge.utils.crops module#

cellmap_segmentation_challenge.utils.crops.fetch_manifest(url: str | URL, file_name: str, object: Self) → tuple[str, ...][source]#

Parameters:

url (str | URL)
file_name (str)
object (Self)

Return type:

tuple[str, …]

class cellmap_segmentation_challenge.utils.crops.TestCropRow(id: int, dataset: str, class_label: str, voxel_size: tuple[float, ...], translation: tuple[float, ...], shape: tuple[int, ...])[source]#

Bases: object

A dataclass representing a row in the test crop manifest file.

Parameters:

id (int)
dataset (str)
class_label (str)
voxel_size (tuple[float, ...])
translation (tuple[float, ...])
shape (tuple[int, ...])

id: int#

dataset: str#

class_label: str#

voxel_size: tuple[float, ...]#

translation: tuple[float, ...]#

shape: tuple[int, ...]#

classmethod from_csv_row(row: str) → Self[source]#

Create a TestCropRow object from a CSV row.

Parameters:: row (str)
Return type:: Self
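As a rough illustration of the from_csv_row pattern, the sketch below defines a stand-in dataclass with the same fields as TestCropRow. The class name ExampleTestCropRow, the column order, the comma separator, and the use of ";" inside tuple-valued columns are all assumptions made for this example; the real manifest layout is defined by the package.

```python
from dataclasses import dataclass


def _parse_tuple(text: str, cast) -> tuple:
    """Parse a ';'-separated field such as '8;8;8' into a tuple (assumed format)."""
    return tuple(cast(part) for part in text.split(";"))


@dataclass(frozen=True)
class ExampleTestCropRow:
    """Hypothetical stand-in mirroring the documented TestCropRow fields."""

    id: int
    dataset: str
    class_label: str
    voxel_size: tuple
    translation: tuple
    shape: tuple

    @classmethod
    def from_csv_row(cls, row: str) -> "ExampleTestCropRow":
        # Assumed column order: id, dataset, class_label, voxel_size, translation, shape
        id_, dataset, class_label, voxel_size, translation, shape = row.strip().split(",")
        return cls(
            int(id_),
            dataset,
            class_label,
            _parse_tuple(voxel_size, float),
            _parse_tuple(translation, float),
            _parse_tuple(shape, int),
        )


# "jrc_example" is a made-up dataset name used only for this demonstration.
row = ExampleTestCropRow.from_csv_row("234,jrc_example,mito,8;8;8,0;0;0,64;64;64")
```

Keeping the parsing in a classmethod means each row type owns its own column layout, so a generic loader only needs to call from_csv_row on every line.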

cellmap_segmentation_challenge.utils.crops.fetch_test_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/test_crop_manifest.csv') → tuple[TestCropRow, ...][source]#

Fetch a test manifest file from a URL and return a tuple of TestCropRow objects.

Parameters:: url (str or yarl.URL) – The URL to the manifest file.
Returns:: A tuple of TestCropRow objects.
Return type:: tuple[TestCropRow, …]

class cellmap_segmentation_challenge.utils.crops.ZipDatasetRow(all_res: bool, padding: int, name: str, url: URL)[source]#

Bases: object

A dataclass representing a row in the zip dataset manifest file.

Parameters:

all_res (bool)
padding (int)
name (str)
url (URL)

all_res: bool#

padding: int#

name: str#

url: URL#

classmethod from_csv_row(row: str) → Self[source]#

Create a ZipDatasetRow object from a CSV row.

Parameters:: row (str)
Return type:: Self

cellmap_segmentation_challenge.utils.crops.fetch_zip_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/zip_manifest.csv') → tuple[ZipDatasetRow, ...][source]#

Fetch a manifest file from a URL and return a tuple of ZipDatasetRow objects.

Parameters:: url (str or yarl.URL) – The URL to the manifest file.
Returns:: A tuple of ZipDatasetRow objects.
Return type:: tuple[ZipDatasetRow, …]

class cellmap_segmentation_challenge.utils.crops.CropRow(id: int, dataset: str, alignment: str, gt_source: URL | TestCropRow, em_url: URL)[source]#

Bases: object

A dataclass representing a row in the crop manifest file.

Parameters:

id (int)
dataset (str)
alignment (str)
gt_source (URL | TestCropRow)
em_url (URL)

id: int#

dataset: str#

alignment: str#

gt_source: URL | TestCropRow#

em_url: URL#

classmethod from_csv_row(row: str) → Self[source]#

Create a CropRow object from a CSV row.

Parameters:: row (str)
Return type:: Self

cellmap_segmentation_challenge.utils.crops.fetch_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/manifest.csv') → tuple[CropRow, ...][source]#

Fetch a manifest file from a URL and return a tuple of CropRow objects.

Parameters:: url (str or yarl.URL) – The URL to the manifest file.
Returns:: A tuple of CropRow objects.
Return type:: tuple[CropRow, …]
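The fetch_* functions above all return a tuple of row objects built from manifest text. A minimal sketch of that conversion step, without any network access, might look like the following. The helper name parse_manifest, the presence of a header line, and the sample manifest content are assumptions for illustration only.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_manifest(
    text: str, from_csv_row: Callable[[str], T], has_header: bool = True
) -> tuple:
    """Convert manifest text into a tuple of row objects, one per non-empty line."""
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header:
        # Assumption: the first non-empty line holds column names.
        lines = lines[1:]
    return tuple(from_csv_row(line) for line in lines)


# Hypothetical manifest content and row parser, for illustration only.
manifest_text = "id,dataset\n1,dataset_a\n2,dataset_b\n"
rows = parse_manifest(manifest_text, lambda line: tuple(line.split(",")))
```

In practice the second argument would be a classmethod such as CropRow.from_csv_row, and the text would come from the manifest URL.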

cellmap_segmentation_challenge.utils.crops.get_test_crops() → tuple[CropRow, ...][source]#

Return type:: tuple[CropRow, …]

cellmap_segmentation_challenge.utils.dataloader module#

cellmap_segmentation_challenge.utils.dataloader.get_dataloader(config: ~typing.Any | None = None, datasplit_path: str = './datasplit.csv', classes: ~typing.Sequence[str] | None = None, batch_size: int = 1, input_array_info: ~typing.Mapping[str, ~typing.Sequence[int | float]] | None = None, target_array_info: ~typing.Mapping[str, ~typing.Sequence[int | float]] | None = None, spatial_transforms: ~typing.Mapping[str, ~typing.Any] | None = None, target_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( ToDtype(scale=False) Binarize(threshold=0) ), train_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( ToDtype(scale=False) Normalize NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), val_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( ToDtype(scale=False) Normalize NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), iterations_per_epoch: int = 1000, random_validation: bool = False, device: str | ~torch.device | None = None, use_mutual_exclusion: bool = False, weighted_sampler: bool = True, **kwargs) → tuple[CellMapDataLoader, CellMapDataLoader | None][source]#

Get the train and validation dataloaders.

This function gets the train and validation dataloaders for the given datasplit file, classes, batch size, array info, spatial transforms, iterations per epoch, number of workers, and device.

Parameters:

config (Optional[Any]) – Optional configuration object that can be used instead of the keyword arguments.
datasplit_path (str) – Path to the datasplit file that defines the train/val split the dataloader should use. Default is “./datasplit.csv”.
classes (Sequence[str] | None) – List of classes to segment. If None, assumes training on raw data. Default is None.
batch_size (int) – Batch size for the dataloader. Defaults to 1.
input_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the input.
target_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the target.
spatial_transforms (Optional[Mapping[str, any]]) – Dictionary containing the spatial transformations to apply to the data. For example, the dictionary could contain transformations such as mirror, transpose, and rotate:

    spatial_transforms = {
        # 3D

        # Probability of applying mirror for each axis.
        # Values range from 0 (no mirroring) to 1 (will always mirror).
        "mirror": {"axes": {"x": 0.5, "y": 0.5, "z": 0.5}},

        # Specifies the axes that will be involved in the transposition.
        "transpose": {"axes": ["x", "y", "z"]},

        # Defines the rotation range for each axis.
        # The rotation angle for each axis is randomly chosen within the specified range (-180, 180).
        "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180], "z": [-180, 180]}},

        # 2D (used when there is no z axis)
        # "mirror": {"axes": {"x": 0.5, "y": 0.5}},
        # "transpose": {"axes": ["x", "y"]},
        # "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180]}},
    }
target_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the target values. Defaults to T.Compose([T.ToDtype(torch.float), Binarize()]), which converts the input masks to float32 and thresholds them at 0 (turning object IDs into binary masks for use with binary cross entropy loss).
train_raw_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the raw values for training. Defaults to T.Compose([T.ToDtype(torch.float), Normalize(), NaNtoNum({“nan”: 0, “posinf”: None, “neginf”: None})]) which normalizes the input data, converts it to float32, and replaces NaNs with 0. This can be used to add augmentations such as random erasing, blur, noise, etc.
val_raw_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the raw values for validation. Defaults to T.Compose([T.ToDtype(torch.float), Normalize(), NaNtoNum({“nan”: 0, “posinf”: None, “neginf”: None})]) which normalizes the input data, converts it to float32, and replaces NaNs with 0.
iterations_per_epoch (int) – Number of iterations per epoch.
random_validation (bool) – Whether or not to randomize the validation data draws. Useful if not evaluating on the entire validation set every time. Defaults to False.
device (Optional[str or torch.device]) – Device to use for training. If None, defaults to “cuda” if available, or “mps” if available, or “cpu”.
use_mutual_exclusion (bool) – Whether to use mutually exclusive class labels to infer non-present labels for the training data. Defaults to False.
weighted_sampler (bool) – Whether to weight sample draws based on the number of positive labels within a dataset. Defaults to True.
**kwargs (Any) – Additional keyword arguments to pass to the CellMapDataLoader.

Returns:

Tuple containing the train and validation dataloaders.

Return type:

tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader]
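The spatial_transforms dictionary described in the parameter list has the same shape for 2D and 3D data; only the set of axes changes. The sketch below builds it from a list of axes. The helper name make_spatial_transforms and its arguments are hypothetical and not part of the package; only the resulting dictionary layout follows the documentation above.

```python
def make_spatial_transforms(axes=("x", "y", "z"), mirror_prob=0.5, max_angle=180):
    """Build a spatial_transforms dictionary for the given axes (hypothetical helper)."""
    return {
        # Per-axis probability of mirroring, from 0 (never) to 1 (always).
        "mirror": {"axes": {axis: mirror_prob for axis in axes}},
        # Axes that may take part in a transposition.
        "transpose": {"axes": list(axes)},
        # Per-axis rotation range in degrees.
        "rotate": {"axes": {axis: [-max_angle, max_angle] for axis in axes}},
    }


transforms_3d = make_spatial_transforms()
transforms_2d = make_spatial_transforms(axes=("x", "y"))
```

The resulting dictionary would then be passed as the spatial_transforms argument of get_dataloader.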

cellmap_segmentation_challenge.utils.datasplit module#

cellmap_segmentation_challenge.utils.datasplit.get_dataset_name(raw_path: str, search_path: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8') → str[source]#

Get the name of the dataset from the raw path.

Parameters:

raw_path (str)
search_path (str)
raw_name (str)

Return type:

str

cellmap_segmentation_challenge.utils.datasplit.get_raw_path(crop_path: str, raw_name: str = 'em/fibsem-uint8', label: str = '') → str[source]#

Get the path to the raw data for a given crop path.

Parameters:

crop_path (str) – The path to the crop.
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
label (str, optional) – The label class at the crop_path, by default “”

Returns:

The path to the raw data.

Return type:

str

cellmap_segmentation_challenge.utils.datasplit.get_formatted_fields(path: str, base_path: str, fields: list[str]) → dict[str, str][source]#

Get the formatted fields from the path.

Parameters:

path (str) – The path to get the fields from.
base_path (str) – The unformatted path to find the fields in.
fields (list[str]) – The fields to get from the path.

Returns:

The formatted fields.

Return type:

dict[str, str]
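One way to recover placeholder values from a concrete path is to turn the unformatted path into a regular expression with one named group per field. The sketch below takes that approach; it is not the package's implementation, the function name extract_fields is hypothetical, and it assumes that field values never contain "/".

```python
import re


def extract_fields(path: str, base_path: str, fields: list) -> dict:
    """Recover the values of {field} placeholders in base_path from a concrete path."""
    # Escape the template so literal characters such as "." match themselves.
    pattern = re.escape(base_path)
    for field in fields:
        placeholder = re.escape("{" + field + "}")
        # The first occurrence captures the value ...
        pattern = pattern.replace(placeholder, f"(?P<{field}>[^/]+)", 1)
        # ... and any later occurrence must repeat the same text.
        pattern = pattern.replace(placeholder, f"(?P={field})")
    match = re.fullmatch(pattern, path)
    if match is None:
        raise ValueError(f"{path!r} does not match {base_path!r}")
    return {field: match.group(field) for field in fields}


# Example paths are made up for this demonstration.
found = extract_fields(
    "/data/jrc_x/jrc_x.zarr/recon-1/em",
    "/data/{dataset}/{dataset}.zarr/recon-1/{name}",
    ["dataset", "name"],
)
```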

cellmap_segmentation_challenge.utils.datasplit.get_s3_csv_string(path: str, classes: list[str], usage: str)[source]#

Get the csv string for a given dataset path, to be written to the datasplit csv file.

Parameters:

path (str) – The path to the dataset.
classes (list[str]) – The classes present in the dataset.
usage (str) – The usage of the dataset (train or validate).

Returns:

The csv string for the dataset.

Return type:

str

cellmap_segmentation_challenge.utils.datasplit.get_csv_string(path: str, classes: list[str], usage: str, raw_name: str = 'em/fibsem-uint8', search_path: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}')[source]#

Get the csv string for a given dataset path, to be written to the datasplit csv file.

Parameters:

path (str) – The path to the dataset.
classes (list[str]) – The classes present in the dataset.
usage (str) – The usage of the dataset (train or validate).
raw_name (str, optional) – The name of the raw data. Default is RAW_NAME.
search_path (str, optional) – The search path to use to find the datasets. Default is SEARCH_PATH.

Returns:

The csv string for the dataset.

Return type:

str

cellmap_segmentation_challenge.utils.datasplit.make_s3_datasplit_csv(classes: list[str] = ['nuc', 'mito'], scale: float | list[float] = None, force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], csv_path: str = 'datasplit.csv', dry_run: bool = False, **kwargs)[source]#

Make a datasplit csv file for the given classes and datasets.

Parameters:

classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
scale (float | list[float], optional) – Single scalar or list of scalars defining the resolution (scale) used to filter out crops that don’t have data at the required scale. If only a scalar is specified, isotropic resolution is assumed. The default (None) is not to filter data by resolution.
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
**kwargs (dict) – Additional keyword arguments will be unused. Kept for compatibility with make_datasplit_csv.

cellmap_segmentation_challenge.utils.datasplit.make_datasplit_csv(classes: list[str] = ['nuc', 'mito'], scale: float | list[float] = None, force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], search_path: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}', csv_path: str = 'datasplit.csv', dry_run: bool = False)[source]#

Make a datasplit csv file for the given classes and datasets.

Parameters:

classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
scale (float | list[float], optional) – Single scalar or list of scalars defining the resolution (scale) used to filter out crops that don’t have data at the required scale. If only a scalar is specified, isotropic resolution is assumed. The default (None) is not to filter data by resolution.
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
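Because validation_prob is described as a probability, the size of the validation set is not fixed: each item is assigned independently. The sketch below illustrates that behavior with a hypothetical helper, assign_usage; the seeding and the exact unit being drawn (crop versus dataset) are assumptions and may differ from what the package does.

```python
import random


def assign_usage(crop_names, validation_prob=0.1, seed=None):
    """Label each crop 'train' or 'validate' using an independent random draw."""
    rng = random.Random(seed)
    return {
        name: "validate" if rng.random() < validation_prob else "train"
        for name in crop_names
    }


# Made-up crop names, for illustration only.
crops = [f"crop{i}" for i in range(1000)]
usage = assign_usage(crops, validation_prob=0.1, seed=0)
n_validate = sum(value == "validate" for value in usage.values())
```

With 1000 crops and validation_prob=0.1, the number of validation crops lands near 100 but varies from run to run unless a seed is fixed.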

cellmap_segmentation_challenge.utils.datasplit.check_scale(dataset, scale)[source]#

Check if the dataset has the required scale.

Parameters:

dataset (zarr.Group) – The dataset to check.
scale (float | list[float]) – The required scale.

Returns:

True if the dataset has the required scale, False otherwise.

Return type:

bool
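The scalar-or-list convention for scale can be sketched as follows. This example works on a plain list of per-level scales rather than a zarr.Group, and it treats "has the required scale" as an exact match within a tolerance; both are assumptions, and the names normalize_scale and has_scale are hypothetical.

```python
import math


def normalize_scale(scale, ndim=3):
    """Expand a scalar to an isotropic per-axis scale; pass sequences through."""
    if isinstance(scale, (int, float)):
        return [float(scale)] * ndim
    return [float(s) for s in scale]


def has_scale(level_scales, scale, rel_tol=1e-6):
    """Return True if any resolution level matches the requested scale on every axis."""
    if not level_scales:
        return False
    target = normalize_scale(scale, ndim=len(level_scales[0]))
    return any(
        len(level) == len(target)
        and all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(level, target))
        for level in level_scales
    )


# Two made-up resolution levels of a multiscale dataset.
levels = [[8.0, 8.0, 8.0], [16.0, 16.0, 16.0]]
```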

cellmap_segmentation_challenge.utils.datasplit.get_dataset_counts(classes: list[str] = ['nuc', 'mito'], search_path: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}')[source]#

Get the counts of each class in each dataset.

Parameters:

classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME

Returns:

A dictionary of the counts of each class in each dataset.

Return type:

dict

cellmap_segmentation_challenge.utils.datasplit.get_tested_classes(csv_path: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/tested_classes.csv')[source]#

Get the classes that will be tested for the challenge.

Parameters:: csv_path (str, optional) – The path to the csv file, by default “tested_classes.csv”
Returns:: A list of the classes that have been tested.
Return type:: list[str]

cellmap_segmentation_challenge.utils.datasplit.get_class_incl_ids(incl_ids_string)[source]#

cellmap_segmentation_challenge.utils.datasplit.get_class_relations(csv_path: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/classes.csv', named_classes: list[str] | None = None)[source]#

Parameters:

csv_path (str)
named_classes (list[str] | None)

cellmap_segmentation_challenge.utils.fetch_data module#

cellmap_segmentation_challenge.utils.fetch_data.copy_store(*, keys: Iterable[str], source_store: Store, dest_store: Store)[source]#

Iterate over the keys, copying them from the source store to the dest store

Parameters:

keys (Iterable[str])
source_store (Store)
dest_store (Store)

cellmap_segmentation_challenge.utils.fetch_data.partition_copy_store(*, keys, source_store, dest_store, batch_size, pool: ThreadPoolExecutor)[source]#

Parameters:: pool (ThreadPoolExecutor)

cellmap_segmentation_challenge.utils.fetch_data.get_store_url(store: BaseStore, path: str)[source]#

Parameters:

store (BaseStore)
path (str)

cellmap_segmentation_challenge.utils.fetch_data.get_chunk_keys(array: Array, region: tuple[slice, ...] = ()) → Generator[str, None, None][source]#

Get the keys for all the chunks in a Zarr array as a generator of strings. Returns keys relative to the path of the array.

Copied with modifications from janelia-cellmap/fibsem-tools.

Parameters:

array (zarr.core.Array) – The zarr array to get the chunk keys from
region (tuple[slice, ...]) – The region of the zarr array to get chunk keys from. Defaults to (), which will result in all the chunk keys being returned.

Return type:

Generator[str, None, None]
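To show how chunk keys relate to a region, the sketch below enumerates keys from an array shape and chunk shape alone, without a Zarr array. It assumes unit-step slices and the "." separator commonly used for Zarr v2 chunk keys; the function name chunk_keys is hypothetical and this is not the package's implementation.

```python
from itertools import product


def chunk_keys(shape, chunks, region=(), separator="."):
    """Yield the key of every chunk that overlaps `region` (unit-step slices assumed)."""
    # Pad the region with full slices so it covers every axis.
    region = tuple(region) + (slice(None),) * (len(shape) - len(region))
    index_ranges = []
    for size, chunk, sl in zip(shape, chunks, region):
        start, stop, _ = sl.indices(size)
        if stop <= start:
            return  # empty selection, so no chunks overlap
        # First and one-past-last chunk index touched along this axis.
        index_ranges.append(range(start // chunk, (stop + chunk - 1) // chunk))
    for index in product(*index_ranges):
        yield separator.join(map(str, index))


all_keys = list(chunk_keys((4, 4), (2, 2)))
some_keys = list(chunk_keys((10, 10), (5, 5), (slice(0, 5), slice(5, 10))))
```

Returning a generator keeps memory use flat even for arrays with millions of chunks, which matters when the keys are fed to a batched copy.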

cellmap_segmentation_challenge.utils.fetch_data.read_group(path: str, **kwargs) → Group[source]#

Parameters:: path (str)
Return type:: Group

cellmap_segmentation_challenge.utils.fetch_data.subset_to_slice(outer_array, inner_array, force_nonempty=False) → tuple[slice, ...][source]#

Return type:: tuple[slice, …]

cellmap_segmentation_challenge.utils.fetch_data.resolve_em_url(em_source_root: URL, em_source_paths: list[str])[source]#

Parameters:

em_source_root (URL)
em_source_paths (list[str])

cellmap_segmentation_challenge.utils.fetch_data.parse_s3_url(s3_url: str) -> (<class 'str'>, <class 'str'>)[source]#

Parameters:: s3_url (str)
Return type:: (<class ‘str’>, <class ‘str’>)
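The two-string return type suggests a split into bucket and key, although the documentation does not say so. Under that assumption, a standard-library sketch could look like this; split_s3_url is a hypothetical name and the URL is made up.

```python
from urllib.parse import urlparse


def split_s3_url(s3_url: str) -> tuple:
    """Split an s3:// URL into (bucket, key). Assumed behavior, for illustration."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URL: {s3_url!r}")
    # The bucket is the network location; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_url("s3://example-bucket/path/to/data.zip")
```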

cellmap_segmentation_challenge.utils.fetch_data.download_file_with_progress(s3_url, local_filename)[source]#

cellmap_segmentation_challenge.utils.fetch_data.get_zip_if_available(crops, raw_padding, fetch_all_em_resolutions, zips_from_manifest)[source]#

cellmap_segmentation_challenge.utils.loss module#

class cellmap_segmentation_challenge.utils.loss.CellMapLossWrapper(loss_fn: _Loss | _WeightedLoss, **kwargs)[source]#

Bases: _Loss

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:: loss_fn (_Loss | _WeightedLoss)

calc_loss(outputs: Tensor, target: Tensor)[source]#

Parameters:

outputs (Tensor)
target (Tensor)

forward(outputs: dict | Tensor, targets: dict | Tensor)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Parameters:

outputs (dict | Tensor)
targets (dict | Tensor)

cellmap_segmentation_challenge.utils.security module#

cellmap_segmentation_challenge.utils.security.analyze_script(filepath)[source]#: Analyzes the script at filepath using ast for potentially unsafe imports and function calls. Returns a boolean indicating whether the script is safe and a list of detected issues.
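A minimal sketch of this kind of ast-based check is shown below. It takes source text rather than a file path, and the deny-lists DISALLOWED_IMPORTS and DISALLOWED_CALLS are invented for the example; the package maintains its own rules, which may be stricter or structured differently.

```python
import ast

# Hypothetical deny-lists for illustration; the package defines its own.
DISALLOWED_IMPORTS = {"os", "subprocess", "sys"}
DISALLOWED_CALLS = {"eval", "exec", "open"}


def analyze_source(source: str) -> tuple:
    """Return (is_safe, issues) for a piece of Python source code."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # Collect the module names referenced by import statements.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] in DISALLOWED_IMPORTS:
                issues.append(f"line {node.lineno}: import of {name}")
        # Flag direct calls to disallowed built-ins such as eval().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DISALLOWED_CALLS:
                issues.append(f"line {node.lineno}: call to {node.func.id}()")
    return not issues, issues


safe, issues = analyze_source("import subprocess\nx = eval('1 + 1')\n")
```

Static checks like this only inspect the syntax tree, so they cannot catch every unsafe pattern (for example, names resolved dynamically at run time).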

cellmap_segmentation_challenge.utils.security.load_safe_config(config_path, force_safe=False)[source]#: Loads the configuration script at config_path after verifying its safety. If force_safe is True, raises an error if the script is deemed unsafe.

class cellmap_segmentation_challenge.utils.security.Config(**kwargs)[source]#

Bases: object

to_dict()[source]#: Returns the configuration as a dictionary.

serialize()[source]#: Serializes the configuration to a string representation.

get(key: str, default: any = None) → any[source]#

Gets the value of a configuration key.

Parameters:

key (str)
default (any)

Return type:

any
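The Config interface above (keyword construction, to_dict, and get with a default) can be mimicked with a few lines. The class below, ExampleConfig, is a hypothetical stand-in written only to show the access pattern; it does not reproduce the package's serialization or safety logic.

```python
class ExampleConfig:
    """Hypothetical stand-in exposing keyword arguments as attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def to_dict(self):
        # Return a copy so callers cannot mutate the configuration by accident.
        return dict(self.__dict__)

    def get(self, key, default=None):
        # Look only at stored settings, never at methods such as to_dict.
        return self.__dict__.get(key, default)


cfg = ExampleConfig(batch_size=8, classes=["nuc", "mito"])
```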

cellmap_segmentation_challenge.utils.submission module#

cellmap_segmentation_challenge.utils.submission.save_numpy_class_labels_to_zarr(save_path, test_volume_name, label_name, labels, overwrite=False, attrs=None)[source]#

Save a single 3D numpy array of class labels to a Zarr-2 file with the required structure.

Parameters:

save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_name (str) – The names of the labels.
labels (np.ndarray) – A 3D numpy array of class labels.
overwrite (bool) – Whether to overwrite the Zarr-2 file if it already exists.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.

Example usage::

    # Generate random class labels, with 0 as background
    labels = np.random.randint(0, 4, (128, 128, 128))
    save_numpy_class_labels_to_zarr("submission.zarr", "test_volume", ["label1", "label2", "label3"], labels)

cellmap_segmentation_challenge.utils.submission.save_numpy_class_arrays_to_zarr(save_path, test_volume_name, label_names, labels, mode='append', attrs=None)[source]#

Save a list of 3D numpy arrays of binary or instance labels to a Zarr-2 file with the required structure.

Parameters:

save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_names (list) – A list of label names corresponding to the list of 3D numpy arrays.
labels (list) – A list of 3D numpy arrays of binary labels.
mode (str) – The mode to use when saving the Zarr-2 file. Options are ‘append’ or ‘overwrite’.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.

Example usage::

    label_names = ["label1", "label2", "label3"]
    # Generate random binary volumes for each label
    labels = [np.random.randint(0, 2, (128, 128, 128)) for _ in range(len(label_names))]
    save_numpy_class_arrays_to_zarr("submission.zarr", "test_volume", label_names, labels)

cellmap_segmentation_challenge.utils.submission.zip_submission(zarr_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/submission.zarr')[source]#

(Re-)Zip a submission zarr file.

Parameters:: zarr_path (str | UPath) – The path to the submission zarr file (ending with <filename>.zarr). .zarr will be replaced with .zip.
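The .zarr to .zip renaming described above can be sketched with the standard library alone. The helper zip_directory below is hypothetical, and storing members relative to the Zarr root is an assumption about the archive layout rather than a documented requirement.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_directory(zarr_path):
    """Zip a directory store, writing <name>.zip next to <name>.zarr."""
    zarr_path = Path(zarr_path)
    zip_path = zarr_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(zarr_path.rglob("*")):
            if file.is_file():
                # Assumption: member names are relative to the Zarr root.
                archive.write(file, file.relative_to(zarr_path).as_posix())
    return zip_path


# Demonstrate on a throwaway directory standing in for a Zarr store.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "submission.zarr"
    (store / "crop1").mkdir(parents=True)
    (store / "crop1" / ".zgroup").write_text('{"zarr_format": 2}')
    zip_name = zip_directory(store).name
    with zipfile.ZipFile(store.with_suffix(".zip")) as archive:
        members = archive.namelist()
```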

cellmap_segmentation_challenge.utils.submission.package_crop(crop, zarr_group, overwrite, input_search_path='/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/processed/{dataset}.zarr/{crop}')[source]#

cellmap_segmentation_challenge.utils.submission.package_submission(input_search_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/processed/{dataset}.zarr/{crop}', output_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/submission.zarr', overwrite: bool = False, max_workers: int = 4)[source]#

Package a submission for the CellMap challenge. This will create a zarr file, combining all the processed volumes, and then zip it.

Parameters:

input_search_path (str) – The base path to the processed volumes, with placeholders for dataset and crops.
output_path (str | UPath) – The path to save the submission zarr to. (ending with <filename>.zarr; .zarr will be appended if not present, and replaced with .zip when zipped).
overwrite (bool) – Whether to overwrite the submission zarr if it already exists.
max_workers (int) – The maximum number of workers to use for parallel processing. Defaults to 4.

cellmap_segmentation_challenge.utils.utils module#

cellmap_segmentation_challenge.utils.utils.format_coordinates(coordinates)[source]#

Format the coordinates to a string.

Parameters:: coordinates (list) – List of coordinates.
Returns:: Formatted string.
Return type:: str

cellmap_segmentation_challenge.utils.utils.construct_test_crop_manifest(path_root: str, search_path: str = '{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}', write_path: str | None = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/test_crop_manifest.csv', verbose: bool = False) → None | list[str][source]#

Construct a manifest file for testing crops from a given path.

Parameters:

path_root (str) – Path to the directory containing the datasets.
search_path (str, optional) – Format string to search for the crops. The default is “{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}”. The function assumes that the keys appear in the file tree in the following order: 1) “path_root”, 2) “dataset”, 3) “crop”, 4) “label”
write_path (str, optional) – Path to write the manifest file. The default is “test_crop_manifest.csv”.
verbose (bool, optional) – Print verbose output. The default is False.

Return type:

None | list[str]

cellmap_segmentation_challenge.utils.utils.construct_truth_dataset(path_root: str, search_path: str = '{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}', destination: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/ground_truth.zarr', write_path: str = '{crop}/{label}')[source]#

Construct a consolidated Zarr file for the groundtruth datasets, to use for evaluation.

Parameters:

path_root (str) – Path to the directory containing the datasets.
search_path (str, optional) – Format string to search for the crops. The default is “{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}”. The function assumes that the keys appear in the file tree in the following order: 1) “path_root”, 2) “dataset”, 3) “crop”, 4) “label”
destination (str, optional) – Path to write the consolidated Zarr file. The default is “cellmap-segmentation-challenge/data/ground_truth.zarr”.
write_path (str, optional) – Format string to write the crops to within the destination Zarr. The default is “{crop}/{label}”.

cellmap_segmentation_challenge.utils.utils.copy_gt(line, search_path, path_root, write_path, ground_truth)[source]#

cellmap_segmentation_challenge.utils.utils.simulate_predictions_iou_binary(labels, iou)[source]#

cellmap_segmentation_challenge.utils.utils.simulate_predictions_iou(true_labels, iou)[source]#

cellmap_segmentation_challenge.utils.utils.simulate_predictions_accuracy(true_labels, accuracy)[source]#

cellmap_segmentation_challenge.utils.utils.perturb_instance_mask(true_labels, hd_target=None, accuracy=0.8)[source]#

Simulate a predicted instance segmentation mask with an approximate Hausdorff distance.

Parameters:

true_labels (np.ndarray) – Ground-truth instance segmentation mask.
hd_target (float | None) – Desired approximate Hausdorff distance. If None, it will be calculated from the accuracy.
accuracy (float) – Desired accuracy of the perturbed mask.

Returns:

Perturbed instance segmentation mask.

Return type:

np.ndarray

cellmap_segmentation_challenge.utils.utils.download_file(url, dest)[source]#

cellmap_segmentation_challenge.utils.utils.format_string(string: str, format_kwargs: dict) → str[source]#

Convenience function to format a string with only the keys present in both the string and the format_kwargs. When all keys in format_kwargs are present in the string (in braces), the function returns string.format(**format_kwargs) exactly. When none of the keys in format_kwargs are present in the string, the function returns the original string, without error.

Parameters:

string (str) – The string to format.
format_kwargs (dict) – The dictionary of key-value pairs to format the string with.

Returns:

The formatted string

Return type:

str

Examples

format_string("this/{thing}", {})  # returns "this/{thing}"
format_string("this/{thing}", {"thing": "that", "but": "not this"})  # returns "this/that"
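The behavior described above can be approximated by replacing only the placeholders whose keys are supplied. The function below, format_present_keys, is a hypothetical sketch under that reading; it ignores format specifiers such as {x:03d}, which the real function may or may not handle.

```python
def format_present_keys(string: str, format_kwargs: dict) -> str:
    """Fill in only the {key} placeholders that have a matching key in format_kwargs."""
    for key, value in format_kwargs.items():
        # Keys absent from the string leave it unchanged, so no KeyError can occur.
        string = string.replace("{" + key + "}", str(value))
    return string


unchanged = format_present_keys("this/{thing}", {})
filled = format_present_keys("this/{thing}", {"thing": "that", "but": "not this"})
partial = format_present_keys("{dataset}/{crop}", {"dataset": "jrc_x"})
```

Leaving unknown placeholders intact is what allows a search path to be formatted in stages, for example filling in the dataset first and the crop later.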

Module contents#

class cellmap_segmentation_challenge.utils.CellMapLossWrapper(loss_fn: _Loss | _WeightedLoss, **kwargs)[source]#

Bases: _Loss

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:: loss_fn (_Loss | _WeightedLoss)

calc_loss(outputs: Tensor, target: Tensor)[source]#

Parameters:

outputs (Tensor)
target (Tensor)

forward(outputs: dict | Tensor, targets: dict | Tensor)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

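Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.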

Parameters:

outputs (dict | Tensor)
targets (dict | Tensor)

class cellmap_segmentation_challenge.utils.CropRow(id: int, dataset: str, alignment: str, gt_source: URL | TestCropRow, em_url: URL)[source]#

Bases: object

A dataclass representing a row in the crop manifest file.

Parameters:

id (int)
dataset (str)
alignment (str)
gt_source (URL | TestCropRow)
em_url (URL)

classmethod from_csv_row(row: str) → Self[source]#

Create a CropRow object from a CSV row.

Parameters:: row (str)
Return type:: Self

id: int#

dataset: str#

alignment: str#

gt_source: URL | TestCropRow#

em_url: URL#

class cellmap_segmentation_challenge.utils.TestCropRow(id: int, dataset: str, class_label: str, voxel_size: tuple[float, ...], translation: tuple[float, ...], shape: tuple[int, ...])[source]#

Bases: object

A dataclass representing a row in the test crop manifest file.

Parameters:

id (int)
dataset (str)
class_label (str)
voxel_size (tuple[float, ...])
translation (tuple[float, ...])
shape (tuple[int, ...])

classmethod from_csv_row(row: str) → Self[source]#

Create a TestCropRow object from a CSV row.

Parameters:: row (str)
Return type:: Self

id: int#

dataset: str#

class_label: str#

voxel_size: tuple[float, ...]#

translation: tuple[float, ...]#

shape: tuple[int, ...]#

cellmap_segmentation_challenge.utils.analyze_script(filepath)[source]#: Analyzes the script at filepath using ast for potentially unsafe imports and function calls. Returns a boolean indicating whether the script is safe and a list of detected issues.

cellmap_segmentation_challenge.utils.construct_test_crop_manifest(path_root: str, search_path: str = '{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}', write_path: str | None = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/test_crop_manifest.csv', verbose: bool = False) → None | list[str][source]#

Construct a manifest file for testing crops from a given path.

Parameters:

path_root (str) – Path to the directory containing the datasets.
search_path (str, optional) – Format string to search for the crops. The default is “{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}”. The function assumes that the keys appear in the file tree in the following order: 1) “path_root”, 2) “dataset”, 3) “crop”, 4) “label”
write_path (str, optional) – Path to write the manifest file. The default is “test_crop_manifest.csv”.
verbose (bool, optional) – Print verbose output. The default is False.

Return type:

None | list[str]

cellmap_segmentation_challenge.utils.construct_truth_dataset(path_root: str, search_path: str = '{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}', destination: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/ground_truth.zarr', write_path: str = '{crop}/{label}')[source]#

Construct a consolidated Zarr file for the groundtruth datasets, to use for evaluation.

Parameters:

path_root (str) – Path to the directory containing the datasets.
search_path (str, optional) – Format string to search for the crops. The default is “{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}”. The function assumes that the keys appear in the file tree in the following order: 1) “path_root”, 2) “dataset”, 3) “crop”, 4) “label”
destination (str, optional) – Path to write the consolidated Zarr file. The default is “cellmap-segmentation-challenge/data/ground_truth.zarr”.
write_path (str, optional) – Format string to write the crops to within the destination Zarr. The default is “{crop}/{label}”.

cellmap_segmentation_challenge.utils.download_file(url, dest)[source]#

cellmap_segmentation_challenge.utils.fetch_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/manifest.csv') → tuple[CropRow, ...][source]#

Fetch a manifest file from a URL and return a tuple of CropRow objects.

Parameters:: url (str or yarl.URL) – The URL to the manifest file.
Returns:: A tuple of CropRow objects.
Return type:: tuple[CropRow, …]

cellmap_segmentation_challenge.utils.fetch_test_crop_manifest(url: str | URL = 'https://raw.githubusercontent.com/janelia-cellmap/cellmap-segmentation-challenge/refs/heads/main/src/cellmap_segmentation_challenge/utils/test_crop_manifest.csv') → tuple[TestCropRow, ...][source]#

Fetch a test manifest file from a URL and return a tuple of TestCropRow objects.

Parameters:: url (str or yarl.URL) – The URL to the manifest file.
Returns:: A tuple of TestCropRow objects.
Return type:: tuple[TestCropRow, …]

cellmap_segmentation_challenge.utils.format_string(string: str, format_kwargs: dict) → str[source]#

Parameters:

string (str) – The string to format.
format_kwargs (dict) – The dictionary of key-value pairs to format the string with.

Returns:

The formatted string

Return type:

str

Examples

format_string("this/{thing}", {})  # returns "this/{thing}"
format_string("this/{thing}", {"thing": "that", "but": "not this"})  # returns "this/that"
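The two examples above imply partial formatting: placeholders with no matching key are left in place, and extra keys are ignored. One way to get that behavior is sketched below. The name format_partial is hypothetical and this is not necessarily how format_string is implemented; it also assumes simple placeholder names without attribute access, indexing, or format specs.

```python
from string import Formatter


def format_partial(template: str, values: dict) -> str:
    """Fill only the placeholders that have a value; leave the rest intact."""
    # Collect the field names that actually occur in the template.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    # Map each field to its value, or back to its own placeholder if missing.
    mapping = {name: values.get(name, "{" + name + "}") for name in fields}
    # Keys in `values` that the template never mentions are simply not passed.
    return template.format(**mapping)
```

This is useful for search paths such as "{path_root}/{dataset}/groundtruth.zarr/{crop}/{label}", where path_root can be filled first and the remaining fields resolved later.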

cellmap_segmentation_challenge.utils.get_class_relations(csv_path: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/classes.csv', named_classes: list[str] | None = None)[source]#

Parameters:

csv_path (str)
named_classes (list[str] | None)

cellmap_segmentation_challenge.utils.get_dataloader(config: ~typing.Any | None = None, datasplit_path: str = './datasplit.csv', classes: ~typing.Sequence[str] | None = None, batch_size: int = 1, input_array_info: ~typing.Mapping[str, ~typing.Sequence[int | float]] | None = None, target_array_info: ~typing.Mapping[str, ~typing.Sequence[int | float]] | None = None, spatial_transforms: ~typing.Mapping[str, ~typing.Any] | None = None, target_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( ToDtype(scale=False) Binarize(threshold=0) ), train_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( ToDtype(scale=False) Normalize NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), val_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( ToDtype(scale=False) Normalize NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), iterations_per_epoch: int = 1000, random_validation: bool = False, device: str | ~torch.device | None = None, use_mutual_exclusion: bool = False, weighted_sampler: bool = True, **kwargs) → tuple[CellMapDataLoader, CellMapDataLoader | None][source]#

Get the train and validation dataloaders.

This function gets the train and validation dataloaders for the given datasplit file, classes, batch size, array info, spatial transforms, iterations per epoch, number of workers, and device.

Parameters:

config (Optional[Any]) – Optional configuration object that can be used instead of the keyword arguments.
datasplit_path (str) – Path to the datasplit file that defines the train/val split the dataloader should use. Default is “./datasplit.csv”.
classes (Sequence[str] | None) – List of classes to segment. If None, assumes training on raw data. Default is None.
batch_size (int) – Batch size for the dataloader. Defaults to 1.
input_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the input.
target_array_info (Optional[Mapping[str, Sequence[int | float]]]) – Dictionary containing the shape and scale of the data to load for the target.
spatial_transforms (Optional[Mapping[str, Any]]) – Dictionary containing the spatial transformations to apply to the data. For example, the dictionary could contain transformations like mirror, transpose, and rotate:

    spatial_transforms = {
        # 3D

        # Probability of applying mirror for each axis.
        # Values range from 0 (no mirroring) to 1 (will always mirror).
        "mirror": {"axes": {"x": 0.5, "y": 0.5, "z": 0.5}},

        # Specifies the axes that will be involved in the transposition.
        "transpose": {"axes": ["x", "y", "z"]},

        # Defines the rotation range for each axis.
        # The rotation angle for each axis is randomly chosen within the specified range (-180, 180).
        "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180], "z": [-180, 180]}},

        # 2D (used when there is no z axis)
        # "mirror": {"axes": {"x": 0.5, "y": 0.5}},
        # "transpose": {"axes": ["x", "y"]},
        # "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180]}},
    }
target_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the target values. Defaults to T.Compose([T.ToDtype(torch.float), Binarize()]), which converts the input masks to float32 and thresholds them at 0 (turning object IDs into binary masks for use with binary cross entropy loss).
train_raw_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the raw values for training. Defaults to T.Compose([T.ToDtype(torch.float), Normalize(), NaNtoNum({“nan”: 0, “posinf”: None, “neginf”: None})]) which normalizes the input data, converts it to float32, and replaces NaNs with 0. This can be used to add augmentations such as random erasing, blur, noise, etc.
val_raw_value_transforms (Optional[torchvision.transforms.v2.Transform]) – Transform to apply to the raw values for validation. Defaults to T.Compose([T.ToDtype(torch.float), Normalize(), NaNtoNum({“nan”: 0, “posinf”: None, “neginf”: None})]) which normalizes the input data, converts it to float32, and replaces NaNs with 0.
iterations_per_epoch (int) – Number of iterations per epoch.
random_validation (bool) – Whether or not to randomize the validation data draws. Useful if not evaluating on the entire validation set every time. Defaults to False.
device (Optional[str or torch.device]) – Device to use for training. If None, defaults to “cuda” if available, or “mps” if available, or “cpu”.
use_mutual_exclusion (bool) – Whether to use mutually exclusive class labels to infer non-present labels for the training data. Defaults to False.
weighted_sampler (bool) – Whether to weight sample draws based on the number of positive labels within a dataset. Defaults to True.
**kwargs (Any) – Additional keyword arguments to pass to the CellMapDataLoader.

Returns:

Tuple containing the train and validation dataloaders.

Return type:

tuple[CellMapDataLoader, CellMapDataLoader | None]
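The spatial_transforms dictionary from the parameter description can be written out as a plain Python object and reused for 2D data. The sketch below copies the 3D values from the description and derives the 2D variant by dropping the z axis. The helper drop_axis is hypothetical and not part of the package, and the commented call uses only parameters documented above; a real call will likely also need input_array_info and target_array_info, whose exact layout is not shown here.

```python
# 3D spatial augmentations, copied from the parameter description above.
spatial_transforms_3d = {
    "mirror": {"axes": {"x": 0.5, "y": 0.5, "z": 0.5}},
    "transpose": {"axes": ["x", "y", "z"]},
    "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180], "z": [-180, 180]}},
}


def drop_axis(transforms: dict, axis: str) -> dict:
    """Hypothetical helper: return a copy of `transforms` without `axis`."""
    out = {}
    for name, spec in transforms.items():
        axes = spec["axes"]
        if isinstance(axes, dict):
            # Per-axis settings (mirror probabilities, rotation ranges).
            axes = {k: v for k, v in axes.items() if k != axis}
        else:
            # Plain list of axis names (transpose).
            axes = [a for a in axes if a != axis]
        out[name] = {"axes": axes}
    return out


spatial_transforms_2d = drop_axis(spatial_transforms_3d, "z")

# Usage (needs the package and a datasplit file, so it is left commented out):
# from cellmap_segmentation_challenge.utils import get_dataloader
# train_loader, val_loader = get_dataloader(
#     datasplit_path="./datasplit.csv",
#     classes=["nuc", "mito"],
#     batch_size=1,
#     spatial_transforms=spatial_transforms_3d,
# )
```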

cellmap_segmentation_challenge.utils.get_formatted_fields(path: str, base_path: str, fields: list[str]) → dict[str, str][source]#

Get the formatted fields from the path.

Parameters:

path (str) – The path to get the fields from.
base_path (str) – The unformatted path to find the fields in.
fields (list[str]) – The fields to get from the path.

Returns:

The formatted fields.

Return type:

dict[str, str]
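The inverse of formatting, recovering field values from a concrete path given its template, can be sketched with a regular expression. The function extract_fields below is a hypothetical stand-in, not the package's get_formatted_fields. It assumes each placeholder matches exactly one path segment (no "/" inside a value), and the example paths are made up.

```python
import re


def extract_fields(path: str, template: str, fields: list[str]) -> dict[str, str]:
    """Recover the values of `fields` from `path`, given the unformatted `template`."""
    pattern = ""
    seen = set()
    # Split the template into literal text and {placeholder} tokens.
    for token in re.split(r"(\{\w+\})", template):
        if re.fullmatch(r"\{\w+\}", token):
            name = token[1:-1]
            if name in seen:
                # A repeated placeholder must match the same text again.
                pattern += f"(?P={name})"
            elif name in fields:
                # Wanted fields become named groups covering one path segment.
                pattern += f"(?P<{name}>[^/]+)"
                seen.add(name)
            else:
                # Unwanted fields still have to match, but are not captured.
                pattern += "[^/]+"
        else:
            pattern += re.escape(token)
    match = re.fullmatch(pattern, path)
    if match is None:
        raise ValueError(f"{path!r} does not match {template!r}")
    return match.groupdict()
```

For example, extract_fields("/data/dataset_a/groundtruth.zarr/crop1/mito", "/data/{dataset}/groundtruth.zarr/{crop}/{label}", ["dataset", "crop", "label"]) yields the dataset, crop, and label segments as a dictionary.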

cellmap_segmentation_challenge.utils.get_test_crops() → tuple[CropRow, ...][source]#

Return type:: tuple[CropRow, …]

cellmap_segmentation_challenge.utils.get_tested_classes(csv_path: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/cellmap_segmentation_challenge/utils/tested_classes.csv')[source]#

Get the classes that will be tested for the challenge.

Parameters:: csv_path (str, optional) – The path to the csv file, by default “tested_classes.csv”
Returns:: A list of the classes that have been tested.
Return type:: list[str]

cellmap_segmentation_challenge.utils.load_safe_config(config_path, force_safe=False)[source]#

Loads the configuration script at config_path after verifying its safety. If force_safe is True, raises an error if the script is deemed unsafe.

cellmap_segmentation_challenge.utils.make_datasplit_csv(classes: list[str] = ['nuc', 'mito'], scale: float | list[float] = None, force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], search_path: str = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}', csv_path: str = 'datasplit.csv', dry_run: bool = False)[source]#

Make a datasplit csv file for the given classes and datasets.

Parameters:

classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
scale (float | list[float], optional) – Single scalar or list of scalars defining the resolution (scale) used to filter out crops that don’t have data at the required scale. If only a scalar is specified, isotropic resolution is assumed. The default (None) does not filter data by resolution.
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
search_path (str, optional) – The search path to use to find the datasets, by default SEARCH_PATH
raw_name (str, optional) – The name of the raw data, by default RAW_NAME
crop_name (str, optional) – The name of the crop, by default CROP_NAME
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
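Two of the parameters above have behavior that is easy to show in isolation: a scalar scale is treated as isotropic, and validation_prob is a per-item probability of landing in the validation set. The sketch below illustrates both under stated assumptions. The helper names normalize_scale and assign_split, the three-dimensional default, and the "train"/"validate" return labels are chosen for this example; the package's own internals may differ.

```python
import random


def normalize_scale(scale, ndim: int = 3):
    """Expand a scalar scale to an isotropic list; pass lists and None through."""
    if scale is None:
        # None means "do not filter by resolution".
        return None
    if isinstance(scale, (int, float)):
        # A single scalar is assumed to apply to every axis.
        return [float(scale)] * ndim
    return [float(s) for s in scale]


def assign_split(validation_prob: float, rng: random.Random) -> str:
    """Draw "validate" with probability validation_prob, otherwise "train"."""
    # rng.random() is uniform on [0, 1), so 0.0 never validates and 1.0 always does.
    return "validate" if rng.random() < validation_prob else "train"
```

Passing an explicitly seeded random.Random makes the split reproducible, which helps when comparing runs that should share the same validation set.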

cellmap_segmentation_challenge.utils.make_s3_datasplit_csv(classes: list[str] = ['nuc', 'mito'], scale: float | list[float] = None, force_all_classes: bool | str = False, validation_prob: float = 0.1, datasets: list[str] = ['*'], crops: list[str] = ['*'], csv_path: str = 'datasplit.csv', dry_run: bool = False, **kwargs)[source]#

Make a datasplit csv file for the given classes and datasets.

Parameters:

classes (list[str], optional) – The classes to include in the csv, by default [“nuc”, “mito”]
scale (float | list[float], optional) – Single scalar or list of scalars defining the resolution (scale) used to filter out crops that don’t have data at the required scale. If only a scalar is specified, isotropic resolution is assumed. The default (None) does not filter data by resolution.
force_all_classes (bool | str, optional) – If True, force all classes to be present in the training/validation datasets. If False, as long as at least one requested class is present, a crop will be included. If “train” or “validate”, force all classes to be present in the training or validation datasets, respectively. By default False.
validation_prob (float, optional) – The probability of a dataset being in the validation set, by default 0.1
datasets (list[str], optional) – The datasets to include in the csv, by default [“*”], which includes all datasets
crops (list[str], optional) – The crops to include in the csv, by default all crops are included. Otherwise, only the crops in the list are included.
csv_path (str, optional) – The path to write the csv file to, by default “datasplit.csv”
dry_run (bool, optional) – If True, do not write the csv file - just return the found datapaths. By default False
**kwargs (dict) – Additional keyword arguments will be unused. Kept for compatibility with make_datasplit_csv.

cellmap_segmentation_challenge.utils.package_submission(input_search_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/processed/{dataset}.zarr/{crop}', output_path: str | UPath = '/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/data/submission.zarr', overwrite: bool = False, max_workers: int = 4)[source]#

Package a submission for the CellMap challenge. This will create a zarr file, combining all the processed volumes, and then zip it.

Parameters:

input_search_path (str) – The base path to the processed volumes, with placeholders for dataset and crops.
output_path (str | UPath) – The path to save the submission zarr to, ending with <filename>.zarr. The .zarr suffix will be appended if not present, and replaced with .zip when zipped.
overwrite (bool) – Whether to overwrite the submission zarr if it already exists.
max_workers (int) – The maximum number of workers to use for parallel processing. Defaults to the number of CPUs.

cellmap_segmentation_challenge.utils.perturb_instance_mask(true_labels, hd_target=None, accuracy=0.8)[source]#

Simulate a predicted instance segmentation mask with an approximate Hausdorff distance.

Parameters:

true_labels (np.ndarray) – Ground-truth instance segmentation mask.
hd_target (float | None) – Desired approximate Hausdorff distance. If None, it will be calculated from the accuracy.
accuracy (float) – Desired accuracy of the perturbed mask.

Returns:

Perturbed instance segmentation mask.

Return type:

np.ndarray

cellmap_segmentation_challenge.utils.save_numpy_class_arrays_to_zarr(save_path, test_volume_name, label_names, labels, mode='append', attrs=None)[source]#

Save a list of 3D numpy arrays of binary or instance labels to a Zarr-2 file with the required structure.

Parameters:

save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_names (list) – A list of label names corresponding to the list of 3D numpy arrays.
labels (list) – A list of 3D numpy arrays of binary labels.
mode (str) – The mode to use when saving the Zarr-2 file. Options are ‘append’ or ‘overwrite’.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.

Example usage:

label_names = ["label1", "label2", "label3"]
# Generate random binary volumes for each label
labels = [np.random.randint(0, 2, (128, 128, 128)) for _ in range(len(label_names))]
save_numpy_class_arrays_to_zarr("submission.zarr", "test_volume", label_names, labels)

cellmap_segmentation_challenge.utils.save_numpy_class_labels_to_zarr(save_path, test_volume_name, label_name, labels, overwrite=False, attrs=None)[source]#

Save a single 3D numpy array of class labels to a Zarr-2 file with the required structure.

Parameters:

save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_names (str) – The names of the labels.
labels (np.ndarray) – A 3D numpy array of class labels.
overwrite (bool) – Whether to overwrite the Zarr-2 file if it already exists.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.

Example usage:

# Generate random class labels, with 0 as background
labels = np.random.randint(0, 4, (128, 128, 128))
save_numpy_class_labels_to_zarr("submission.zarr", "test_volume", ["label1", "label2", "label3"], labels)

cellmap_segmentation_challenge.utils.simulate_predictions_accuracy(true_labels, accuracy)[source]#

cellmap_segmentation_challenge.utils.simulate_predictions_iou(true_labels, iou)[source]#

cellmap_segmentation_challenge.utils.simulate_predictions_iou_binary(labels, iou)[source]#
