cellmap_segmentation_challenge package#

Subpackages#

Submodules#

cellmap_segmentation_challenge.config module#

cellmap_segmentation_challenge.evaluate module#

class cellmap_segmentation_challenge.evaluate.spoof_precomputed(array, ids)[source]#

Bases: object

__getitem__(ids)[source]#
__len__()[source]#
cellmap_segmentation_challenge.evaluate.optimized_hausdorff_distances(truth_label, matched_pred_label, voxel_size, hausdorff_distance_max, method='standard')[source]#
cellmap_segmentation_challenge.evaluate.compute_hausdorff_distance(image0, image1, voxel_size, max_distance, method)[source]#

Compute the Hausdorff distance between two binary masks, optimized for pre-vectorized inputs.

cellmap_segmentation_challenge.evaluate.score_instance(pred_label, truth_label, voxel_size, hausdorff_distance_max=inf) dict[str, float][source]#

Score a single instance label volume against the ground truth instance label volume.

Parameters:
  • pred_label (np.ndarray) – The predicted instance label volume.

  • truth_label (np.ndarray) – The ground truth instance label volume.

  • voxel_size (tuple) – The size of a voxel in each dimension.

  • hausdorff_distance_max (float) – The maximum distance to consider for the Hausdorff distance.

Returns:

A dictionary of scores for the instance label volume.

Return type:

dict

Example usage:

scores = score_instance(pred_label, truth_label, voxel_size)
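
A fuller sketch than the one-liner above, using synthetic data; the array contents, voxel size, and printed keys are illustrative assumptions:

import numpy as np
from cellmap_segmentation_challenge.evaluate import score_instance

# Tiny synthetic instance volumes: integer IDs label distinct objects, 0 is background.
truth_label = np.zeros((16, 16, 16), dtype=np.uint64)
truth_label[2:6, 2:6, 2:6] = 1
truth_label[9:13, 9:13, 9:13] = 2

pred_label = np.zeros_like(truth_label)
pred_label[2:6, 2:6, 3:7] = 5  # predicted IDs need not match the ground truth IDs
pred_label[9:13, 9:13, 9:13] = 1

# voxel_size is given in world units per axis (e.g. nanometers).
scores = score_instance(pred_label, truth_label, voxel_size=(8, 8, 8))
print(scores)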

cellmap_segmentation_challenge.evaluate.score_semantic(pred_label, truth_label) dict[str, float][source]#

Score a single semantic label volume against the ground truth semantic label volume.

Parameters:
  • pred_label (np.ndarray) – The predicted semantic label volume.

  • truth_label (np.ndarray) – The ground truth semantic label volume.

Returns:

A dictionary of scores for the semantic label volume.

Return type:

dict

Example usage:

scores = score_semantic(pred_label, truth_label)
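
A minimal self-contained sketch, assuming only the signature shown above; the random binary masks are purely illustrative:

import numpy as np
from cellmap_segmentation_challenge.evaluate import score_semantic

# Binary masks for a single semantic class.
truth_label = (np.random.rand(32, 32, 32) > 0.5).astype(np.uint8)
pred_label = (np.random.rand(32, 32, 32) > 0.5).astype(np.uint8)

scores = score_semantic(pred_label, truth_label)
print(scores)  # e.g. {'iou': ..., 'dice_score': ...}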

cellmap_segmentation_challenge.evaluate.score_label(pred_label_path, label_name, crop_name, truth_path='/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/ground_truth.zarr', instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance']) dict[str, float][source]#

Score a single label volume against the ground truth label volume.

Parameters:
  • pred_label_path (str) – The path to the predicted label volume.

  • label_name (str) – The name of the label class being scored.

  • crop_name (str) – The name of the crop being scored.

  • truth_path (str) – The path to the ground truth label volume.

  • instance_classes (list) – A list of instance classes.

Returns:

A dictionary of scores for the label volume.

Return type:

dict

Example usage:

scores = score_label('pred.zarr/test_volume/label1')

cellmap_segmentation_challenge.evaluate.empty_label_score(label, crop_name, instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance'], truth_path='/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/ground_truth.zarr')[source]#
cellmap_segmentation_challenge.evaluate.get_evaluation_args(volumes, submission_path, truth_path='/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/ground_truth.zarr', instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance']) dict[str, dict[str, float]][source]#

Get the arguments for scoring each label in the submission.

Parameters:
  • volumes (list) – A list of volumes to score.

  • submission_path (str) – The path to the submission volume.

  • truth_path (str) – The path to the ground truth volume.

  • instance_classes (list) – A list of instance classes.

Returns:

A list of tuples containing the arguments for each label to be scored.

Return type:

dict[str, dict[str, float]]

cellmap_segmentation_challenge.evaluate.missing_volume_score(truth_volume_path, instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance']) dict[str, dict[str, float]][source]#

Score a missing volume as all 0s, consistent with the score_volume function.

Parameters:

truth_volume_path (str) – The path to the ground truth volume.

Returns:

A dictionary of scores for the volume.

Return type:

dict

Example usage:

scores = missing_volume_score('truth.zarr/test_volume')

cellmap_segmentation_challenge.evaluate.combine_scores(scores, include_missing=True, instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance'], cast_to_none=[nan, inf, -inf])[source]#

Combine scores across volumes, normalizing by the number of voxels.

Parameters:
  • scores (dict) – A dictionary of scores for each volume, as returned by score_volume.

  • include_missing (bool) – Whether to include missing volumes in the combined scores.

  • instance_classes (list) – A list of instance classes.

  • cast_to_none (list) – A list of values to cast to None in the combined scores.

Returns:

A dictionary of combined scores across all volumes.

Return type:

dict

Example usage:

combined_scores = combine_scores(scores)

cellmap_segmentation_challenge.evaluate.score_submission(submission_path='/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/submission.zip', result_file=None, truth_path='/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/ground_truth.zarr', instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance'])[source]#

Score a submission against the ground truth data.

Parameters:
  • submission_path (str) – The path to the zipped submission Zarr-2 file.

  • result_file (str) – The path to save the scores.

Returns:

A dictionary of scores for the submission.

Return type:

dict

Example usage:

scores = score_submission('submission.zip')

The results JSON is a dictionary with the following structure:

{
  "volume" (the name of the ground truth volume): {
    "label" (the name of the predicted class): {
      (for semantic segmentation)
      "iou": (the intersection over union score),
      "dice_score": (the dice score),
      OR
      (for instance segmentation)
      "accuracy": (the accuracy score),
      "hausdorff_distance": (the Hausdorff distance),
      "normalized_hausdorff_distance": (the normalized Hausdorff distance),
      "combined_score": (the geometric mean of the accuracy and normalized Hausdorff distance),
    },
    "num_voxels": (the number of voxels in the ground truth volume),
  },
  "label_scores": {
    (the name of the predicted class): {
      (for semantic segmentation)
      "iou": (the mean intersection over union score),
      "dice_score": (the mean dice score),
      OR
      (for instance segmentation)
      "accuracy": (the mean accuracy score),
      "hausdorff_distance": (the mean Hausdorff distance),
      "combined_score": (the mean geometric mean of the accuracy and Hausdorff distance),
    },
  },
  "overall_score": (the mean of the combined scores across all classes),
}
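
Given that structure, a small sketch for inspecting a saved results file; the file name is whatever was passed as result_file ('results.json' here is an assumption):

import json

with open("results.json") as f:
    results = json.load(f)

# Per-class scores averaged across volumes.
for label, label_scores in results["label_scores"].items():
    print(label, label_scores)

print("overall:", results["overall_score"])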

cellmap_segmentation_challenge.evaluate.resize_array(arr, target_shape, pad_value=0)[source]#

Resize an array to a target shape by padding or cropping as needed.

Parameters:
  • arr (np.ndarray) – Input array to resize.

  • target_shape (tuple) – Desired shape for the output array.

  • pad_value (int, float, etc.) – Value to use for padding if the array is smaller than the target shape.

Returns:

Resized array with the specified target shape.

Return type:

np.ndarray
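
A minimal sketch of the pad/crop behavior described above; only the output shapes are asserted, since the exact placement of the padding and cropping is not specified here:

import numpy as np
from cellmap_segmentation_challenge.evaluate import resize_array

arr = np.ones((4, 4, 4))

# Smaller than the target along every axis: padded with pad_value.
grown = resize_array(arr, (6, 6, 6), pad_value=0)
assert grown.shape == (6, 6, 6)

# Larger than the target: cropped.
shrunk = resize_array(arr, (2, 2, 2))
assert shrunk.shape == (2, 2, 2)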

cellmap_segmentation_challenge.evaluate.match_crop_space(path, class_label, voxel_size, shape, translation) ndarray[source]#

Match the resolution of a zarr array to a target resolution and shape, resampling as necessary with interpolation dependent on the class label. Instance segmentations will be resampled with nearest neighbor interpolation, while semantic segmentations will be resampled with linear interpolation and then thresholded.

Parameters:
  • path (str | UPath) – The path to the zarr array to be adjusted. The zarr can be an OME-NGFF multiscale zarr file, or a traditional single scale formatted zarr.

  • class_label (str) – The class label of the array.

  • voxel_size (tuple) – The target voxel size.

  • shape (tuple) – The target shape.

  • translation (tuple) – The translation (i.e. offset) of the array in world units.

Returns:

The rescaled array.

Return type:

np.ndarray

cellmap_segmentation_challenge.evaluate.unzip_file(zip_path)[source]#

Unzip a zip file to a specified directory.

Parameters:

zip_path (str) – The path to the zip file.

Example usage:

unzip_file('submission.zip')

cellmap_segmentation_challenge.predict module#

cellmap_segmentation_challenge.predict.predict_orthoplanes(model: Module, dataset_writer_kwargs: dict[str, Any], batch_size: int)[source]#
Parameters:
  • model (Module)

  • dataset_writer_kwargs (dict[str, Any])

  • batch_size (int)

cellmap_segmentation_challenge.predict.predict(config_path: str, crops: str = 'test', output_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/predictions/{dataset}.zarr/{crop}', do_orthoplanes: bool = True, overwrite: bool = False, search_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}')[source]#

Given a model configuration file and list of crop numbers, predicts the output of a model on a large dataset by splitting it into blocks and predicting each block separately.

Parameters:
  • config_path (str) – The path to the model configuration file. This can be the same as the config file used for training.

  • crops (str, optional) – A comma-separated list of crop numbers to predict on, or “test” to predict on the entire test set. Default is “test”.

  • output_path (str, optional) – The path to save the output predictions to, formatted as a string with placeholders for the dataset, crop number, and label. Default is PREDICTIONS_PATH set in cellmap-segmentation/config.py.

  • do_orthoplanes (bool, optional) – Whether to compute the average of predictions from x, y, and z orthogonal planes for the full 3D volume. This is sometimes called 2.5D prediction. It expects a model that takes 2D inputs and yields 2D outputs. Default is True for 2D models.

  • overwrite (bool, optional) – Whether to overwrite the output dataset if it already exists. Default is False.

  • search_path (str, optional) – The path to search for the raw dataset, with placeholders for dataset and name. Default is SEARCH_PATH set in cellmap-segmentation/config.py.

  • raw_name (str, optional) – The name of the raw dataset. Default is RAW_NAME set in cellmap-segmentation/config.py.

  • crop_name (str, optional) – The name of the crop dataset with placeholders for crop and label. Default is CROP_NAME set in cellmap-segmentation/config.py.
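
Example usage (a minimal sketch; the config file name and crop numbers are placeholders):

from cellmap_segmentation_challenge import predict

# Reuse the training config and predict on two specific crops.
predict("train_config.py", crops="116,117", overwrite=True)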

cellmap_segmentation_challenge.process module#

cellmap_segmentation_challenge.process.process(config_path: str | UPath, crops: str = 'test', input_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/predictions/{dataset}.zarr/{crop}', output_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/processed/{dataset}.zarr/{crop}', overwrite: bool = False, device: str | device | None = None, max_workers: int = 4) None[source]#

Process and save arrays using an arbitrary process function defined in a config python file.

Parameters:
  • config_path (str | UPath) – The path to the Python file containing the process function and other configurations. The script should specify the process function as process_func; input_array_info and target_array_info, corresponding to the chunk sizes and scales for the input and output datasets, respectively; batch_size; classes; and any other required configurations. The process function should take an array as input and return an array as output. A minimal example config is sketched at the end of this entry.

  • crops (str, optional) – A comma-separated list of crop numbers to process, or “test” to process the entire test set. Default is “test”.

  • input_path (str, optional) – The path to the data to process, formatted as a string with placeholders for the crop number, dataset, and label. Default is PREDICTIONS_PATH set in cellmap-segmentation/config.py.

  • output_path (str, optional) – The path to save the processed output to, formatted as a string with placeholders for the crop number, dataset, and label. Default is PROCESSED_PATH set in cellmap-segmentation/config.py.

  • overwrite (bool, optional) – Whether to overwrite the output dataset if it already exists. Default is False.

  • device (str | torch.device, optional) – The device to use for processing the data. Defaults to the device specified in the config; if none is specified, uses "cuda" if available, then "mps", otherwise "cpu".

  • max_workers (int, optional) – The maximum number of workers to use for processing the data. Default is 4.

Return type:

None
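
A minimal sketch of the config file described above, saved as e.g. process_config.py; the threshold and the array info values are illustrative assumptions:

# process_config.py
import numpy as np

input_array_info = {"shape": (1, 128, 128), "scale": (8, 8, 8)}
target_array_info = {"shape": (1, 128, 128), "scale": (8, 8, 8)}
batch_size = 8
classes = ["nuc", "er"]

def process_func(arr):
    # Threshold raw predictions into a binary mask.
    return (np.asarray(arr) > 0.5).astype(np.uint8)

It could then be run with process("process_config.py", crops="116,117"), where the crop numbers are placeholders.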

cellmap_segmentation_challenge.train module#

cellmap_segmentation_challenge.train.train(config_path: str)[source]#

Train a model using the configuration file at the specified path. The model checkpoints and training logs, as well as the datasets used for training, will be saved to the paths specified in the configuration file.

Parameters:

config_path (str) – Path to the configuration file to use for training the model. This file should be a Python file that defines the hyperparameters and other configurations for training the model. A minimal example config is sketched at the end of this entry. This may include:
  • model_save_path: Path to save the model checkpoints. Default is 'checkpoints/{model_name}_{epoch}.pth'.

  • logs_save_path: Path to save the logs for tensorboard. Default is 'tensorboard/{model_name}'. Training progress may be monitored by running tensorboard --logdir <logs_save_path> in the terminal.

  • datasplit_path: Path to the datasplit file that defines the train/val split the dataloader should use. Default is 'datasplit.csv'.

  • validation_prob: Proportion of the datasets to use for validation. This is used if the datasplit CSV specified by datasplit_path does not already exist. Default is 0.15.

  • learning_rate: Learning rate for the optimizer. Default is 0.0001.

  • batch_size: Batch size for the dataloader. Default is 8.

  • input_array_info: Dictionary containing the shape and scale of the input data. Default is {'shape': (1, 128, 128), 'scale': (8, 8, 8)}.

  • target_array_info: Dictionary containing the shape and scale of the target data. Default is to use input_array_info.

  • epochs: Number of epochs to train the model for. Default is 1000.

  • iterations_per_epoch: Number of iterations per epoch. Each iteration includes an independently generated random batch from the training set. Default is 1000.

  • random_seed: Random seed for reproducibility. Default is 42.

  • classes: List of classes to train the model to predict. This will be reflected in the data included in the datasplit, if generated de novo after calling this script. Default is ['nuc', 'er'].

  • model_name: Name of the model to use. If the config file constructs the PyTorch model, this name can be anything. Otherwise, it must specify one of the included architectures: '2d_unet', '2d_resnet', '3d_unet', '3d_resnet', or 'vitnet'. Default is '2d_unet'. See the models module README.md for more information.

  • model_to_load: Name of the pre-trained model to load. Default is the same as model_name.

  • model_kwargs: Dictionary of keyword arguments to pass to the model constructor. Default is {}. If the PyTorch model is passed directly, this is ignored. See the models module README.md for more information.

  • model: PyTorch model to use for training. If this is provided, model_name and model_to_load can be any string. Default is None.

  • load_model: Which model checkpoint to load if it exists. Options are 'latest' or 'best'. If no checkpoints exist, the already initialized model is silently used. Default is 'latest'.

  • spatial_transforms: Dictionary of spatial transformations to apply to the training data. Default is {'mirror': {'axes': {'x': 0.5, 'y': 0.5}}, 'transpose': {'axes': ['x', 'y']}, 'rotate': {'axes': {'x': [-180, 180], 'y': [-180, 180]}}}. See the dataloader module documentation for more information.

  • validation_time_limit: Maximum time to spend on validation in seconds. If None, there is no time limit. Default is None.

  • validation_batch_limit: Maximum number of validation batches to process. If None, there is no limit. Default is None.

  • device: Device to use for training. If None, will use 'cuda' if available, 'mps' if available, or 'cpu' otherwise. Default is None.

  • use_s3: Whether to use the S3 bucket for the datasplit. Default is False.

  • optimizer: PyTorch optimizer to use for training. Default is torch.optim.RAdam(model.parameters(), lr=learning_rate, decoupled_weight_decay=True).

  • criterion: Uninstantiated PyTorch loss function to use for training. Default is torch.nn.BCEWithLogitsLoss.

  • criterion_kwargs: Dictionary of keyword arguments to pass to the loss function constructor. Default is {}.

  • weight_loss: Whether to weight the loss function by class counts found in the datasets. Default is True.

  • use_mutual_exclusion: Whether to use mutual exclusion to infer labels for unannotated pixels. Default is False.

  • weighted_sampler: Whether to use a sampler weighted by class counts for the dataloader. Default is True.

  • train_raw_value_transforms: Transform to apply to the raw values for training. Defaults to T.Compose([T.ToDtype(torch.float, scale=True), NaNtoNum({"nan": 0, "posinf": None, "neginf": None})]), which normalizes the input data, converts it to float32, and replaces NaNs with 0. This can be used to add augmentations such as random erasing, blur, noise, etc.

  • val_raw_value_transforms: Transform to apply to the raw values for validation, similar to train_raw_value_transforms. Default is the same as train_raw_value_transforms.

  • target_value_transforms: Transform to apply to the target values. Default is T.Compose([T.ToDtype(torch.float), Binarize()]), which converts the input masks to float32 and thresholds at 0 (turning object IDs into binary masks for use with binary cross entropy loss). This can be used to specify other targets, such as distance transforms.

  • max_grad_norm: Maximum gradient norm for clipping. If None, no clipping is performed. Default is None. This can be useful to prevent exploding gradients, which would lead to NaNs in the weights.

  • force_all_classes: Whether to force all classes to be present in each batch provided by the dataloaders. Can be True to force this for both the training and validation dataloaders, False to force it for neither, or 'train' / 'validate' to restrict it to the training or validation dataloader, respectively. Default is 'validate'.

  • scheduler: PyTorch learning rate scheduler (or uninstantiated scheduler class) to use for training. Default is None. If provided, the scheduler will be called at the end of each epoch.

  • scheduler_kwargs: Dictionary of keyword arguments to pass to the scheduler constructor. Default is {}. If an instantiated scheduler is provided, this is ignored.

Return type:

None
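
A minimal example config, using only options documented above; all values are illustrative, and any option omitted falls back to its default:

# train_config.py
learning_rate = 1e-4
batch_size = 8
epochs = 100
iterations_per_epoch = 500
classes = ["nuc", "er"]
model_name = "2d_unet"
input_array_info = {"shape": (1, 128, 128), "scale": (8, 8, 8)}
spatial_transforms = {
    "mirror": {"axes": {"x": 0.5, "y": 0.5}},
    "transpose": {"axes": ["x", "y"]},
    "rotate": {"axes": {"x": [-180, 180], "y": [-180, 180]}},
}

Training would then be launched with train("train_config.py").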

cellmap_segmentation_challenge.visualize module#

cellmap_segmentation_challenge.visualize.visualize(datasets: str | Sequence[str] = '*', crops: int | list = ['*'], classes: str | Sequence[str] = '*', kinds: Sequence[str] = ['gt', 'predictions', 'processed'])[source]#

Visualize datasets and crops in Neuroglancer.

Parameters:
  • datasets (str | Sequence[str], optional) – The name of the dataset to visualize. Can be a string or a list of strings. Default is “*”. If “*”, all datasets will be visualized.

  • crops (int | Sequence[int], optional) – The crop number(s) to visualize. Can be an integer or a list of integers. Default is ["*"], in which case all crops will be visualized.

  • classes (str | Sequence[str], optional) – The class to visualize. Can be a string or a list of strings. Default is “*”. If “*”, all classes will be visualized.

  • kinds (Sequence[str], optional) – The types of layers to visualize. Can include "gt" for groundtruth, "predictions" for predictions, and "processed" for processed data. Default is ["gt", "predictions", "processed"].
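
Example usage (a minimal sketch; the dataset name and crop numbers are placeholders):

from cellmap_segmentation_challenge.visualize import visualize

# Open groundtruth and predictions for two crops of one dataset in Neuroglancer.
visualize(datasets="jrc_cos7-1a", crops=[234, 236], kinds=["gt", "predictions"])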

cellmap_segmentation_challenge.visualize.add_layers(viewer: Viewer, kind: str, dataset_name: str, crops: Sequence, classes: Sequence[str], search_paths={'gt': '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/labels/groundtruth/{crop}/{label}', 'predictions': '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/predictions/{dataset}.zarr/{crop}/{label}', 'processed': '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/processed/{dataset}.zarr/{crop}/{label}'}, visible: bool = False) Viewer | None[source]#

Add layers to a Neuroglancer viewer.

Parameters:
  • viewer (neuroglancer.Viewer) – The viewer to add layers to.

  • kind (str) – The type of layers to add. Can be “gt” for groundtruth, “predictions” for predictions, or “processed” for processed data.

  • dataset_name (str) – The name of the dataset to add layers for.

  • crops (Sequence) – The crops to add layers for.

  • classes (Sequence[str]) – The class(es) to add layers for.

  • search_paths (dict, optional) – The search paths to use for finding the data. Default is SEARCH_PATHS.

  • visible (bool, optional) – Whether the layers should be visible. Default is False.

Return type:

Viewer | None

cellmap_segmentation_challenge.visualize.get_layer(data_path: str, layer_type: str = 'image', multiscale: bool = True) Layer[source]#

Get a Neuroglancer layer from a zarr data path for a LocalVolume.

Parameters:
  • data_path (str) – The path to the zarr data.

  • layer_type (str) – The type of layer to get. Can be “image” or “segmentation”. Default is “image”.

  • multiscale (bool) – Whether the metadata is OME-NGFF multiscale. Default is True.

Returns:

The Neuroglancer layer.

Return type:

neuroglancer.Layer

cellmap_segmentation_challenge.visualize.get_image(data_path: str)[source]#
Parameters:

data_path (str)

cellmap_segmentation_challenge.visualize.parse_multiscale_metadata(data_path: str)[source]#
Parameters:

data_path (str)

Module contents#

cellmap_segmentation_challenge.match_crop_space(path, class_label, voxel_size, shape, translation) ndarray[source]#

Match the resolution of a zarr array to a target resolution and shape, resampling as necessary with interpolation dependent on the class label. Instance segmentations will be resampled with nearest neighbor interpolation, while semantic segmentations will be resampled with linear interpolation and then thresholded.

Parameters:
  • path (str | UPath) – The path to the zarr array to be adjusted. The zarr can be an OME-NGFF multiscale zarr file, or a traditional single scale formatted zarr.

  • class_label (str) – The class label of the array.

  • voxel_size (tuple) – The target voxel size.

  • shape (tuple) – The target shape.

  • translation (tuple) – The translation (i.e. offset) of the array in world units.

Returns:

The rescaled array.

Return type:

np.ndarray

cellmap_segmentation_challenge.predict(config_path: str, crops: str = 'test', output_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/predictions/{dataset}.zarr/{crop}', do_orthoplanes: bool = True, overwrite: bool = False, search_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}')[source]#

Given a model configuration file and list of crop numbers, predicts the output of a model on a large dataset by splitting it into blocks and predicting each block separately.

Parameters:
  • config_path (str) – The path to the model configuration file. This can be the same as the config file used for training.

  • crops (str, optional) – A comma-separated list of crop numbers to predict on, or “test” to predict on the entire test set. Default is “test”.

  • output_path (str, optional) – The path to save the output predictions to, formatted as a string with placeholders for the dataset, crop number, and label. Default is PREDICTIONS_PATH set in cellmap-segmentation/config.py.

  • do_orthoplanes (bool, optional) – Whether to compute the average of predictions from x, y, and z orthogonal planes for the full 3D volume. This is sometimes called 2.5D prediction. It expects a model that takes 2D inputs and yields 2D outputs. Default is True for 2D models.

  • overwrite (bool, optional) – Whether to overwrite the output dataset if it already exists. Default is False.

  • search_path (str, optional) – The path to search for the raw dataset, with placeholders for dataset and name. Default is SEARCH_PATH set in cellmap-segmentation/config.py.

  • raw_name (str, optional) – The name of the raw dataset. Default is RAW_NAME set in cellmap-segmentation/config.py.

  • crop_name (str, optional) – The name of the crop dataset with placeholders for crop and label. Default is CROP_NAME set in cellmap-segmentation/config.py.

cellmap_segmentation_challenge.process(config_path: str | UPath, crops: str = 'test', input_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/predictions/{dataset}.zarr/{crop}', output_path: str = '/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/processed/{dataset}.zarr/{crop}', overwrite: bool = False, device: str | device | None = None, max_workers: int = 4) None[source]#

Process and save arrays using an arbitrary process function defined in a config python file.

Parameters:
  • config_path (str | UPath) – The path to the Python file containing the process function and other configurations. The script should specify the process function as process_func; input_array_info and target_array_info, corresponding to the chunk sizes and scales for the input and output datasets, respectively; batch_size; classes; and any other required configurations. The process function should take an array as input and return an array as output.

  • crops (str, optional) – A comma-separated list of crop numbers to process, or “test” to process the entire test set. Default is “test”.

  • input_path (str, optional) – The path to the data to process, formatted as a string with placeholders for the crop number, dataset, and label. Default is PREDICTIONS_PATH set in cellmap-segmentation/config.py.

  • output_path (str, optional) – The path to save the processed output to, formatted as a string with placeholders for the crop number, dataset, and label. Default is PROCESSED_PATH set in cellmap-segmentation/config.py.

  • overwrite (bool, optional) – Whether to overwrite the output dataset if it already exists. Default is False.

  • device (str | torch.device, optional) – The device to use for processing the data. Defaults to the device specified in the config; if none is specified, uses "cuda" if available, then "mps", otherwise "cpu".

  • max_workers (int, optional) – The maximum number of workers to use for processing the data. Default is 4.

Return type:

None

cellmap_segmentation_challenge.score_submission(submission_path='/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/submission.zip', result_file=None, truth_path='/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/data/ground_truth.zarr', instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance'])[source]#

Score a submission against the ground truth data.

Parameters:
  • submission_path (str) – The path to the zipped submission Zarr-2 file.

  • result_file (str) – The path to save the scores.

Returns:

A dictionary of scores for the submission.

Return type:

dict

Example usage:

scores = score_submission('submission.zip')

The results JSON is a dictionary with the following structure:

{
  "volume" (the name of the ground truth volume): {
    "label" (the name of the predicted class): {
      (for semantic segmentation)
      "iou": (the intersection over union score),
      "dice_score": (the dice score),
      OR
      (for instance segmentation)
      "accuracy": (the accuracy score),
      "hausdorff_distance": (the Hausdorff distance),
      "normalized_hausdorff_distance": (the normalized Hausdorff distance),
      "combined_score": (the geometric mean of the accuracy and normalized Hausdorff distance),
    },
    "num_voxels": (the number of voxels in the ground truth volume),
  },
  "label_scores": {
    (the name of the predicted class): {
      (for semantic segmentation)
      "iou": (the mean intersection over union score),
      "dice_score": (the mean dice score),
      OR
      (for instance segmentation)
      "accuracy": (the mean accuracy score),
      "hausdorff_distance": (the mean Hausdorff distance),
      "combined_score": (the mean geometric mean of the accuracy and Hausdorff distance),
    },
  },
  "overall_score": (the mean of the combined scores across all classes),
}

cellmap_segmentation_challenge.train(config_path: str)[source]#

Train a model using the configuration file at the specified path. The model checkpoints and training logs, as well as the datasets used for training, will be saved to the paths specified in the configuration file.

Parameters:

config_path (str) – Path to the configuration file to use for training the model. This file should be a Python file that defines the hyperparameters and other configurations for training the model. This may include:
  • model_save_path: Path to save the model checkpoints. Default is 'checkpoints/{model_name}_{epoch}.pth'.

  • logs_save_path: Path to save the logs for tensorboard. Default is 'tensorboard/{model_name}'. Training progress may be monitored by running tensorboard --logdir <logs_save_path> in the terminal.

  • datasplit_path: Path to the datasplit file that defines the train/val split the dataloader should use. Default is 'datasplit.csv'.

  • validation_prob: Proportion of the datasets to use for validation. This is used if the datasplit CSV specified by datasplit_path does not already exist. Default is 0.15.

  • learning_rate: Learning rate for the optimizer. Default is 0.0001.

  • batch_size: Batch size for the dataloader. Default is 8.

  • input_array_info: Dictionary containing the shape and scale of the input data. Default is {'shape': (1, 128, 128), 'scale': (8, 8, 8)}.

  • target_array_info: Dictionary containing the shape and scale of the target data. Default is to use input_array_info.

  • epochs: Number of epochs to train the model for. Default is 1000.

  • iterations_per_epoch: Number of iterations per epoch. Each iteration includes an independently generated random batch from the training set. Default is 1000.

  • random_seed: Random seed for reproducibility. Default is 42.

  • classes: List of classes to train the model to predict. This will be reflected in the data included in the datasplit, if generated de novo after calling this script. Default is ['nuc', 'er'].

  • model_name: Name of the model to use. If the config file constructs the PyTorch model, this name can be anything. Otherwise, it must specify one of the included architectures: '2d_unet', '2d_resnet', '3d_unet', '3d_resnet', or 'vitnet'. Default is '2d_unet'. See the models module README.md for more information.

  • model_to_load: Name of the pre-trained model to load. Default is the same as model_name.

  • model_kwargs: Dictionary of keyword arguments to pass to the model constructor. Default is {}. If the PyTorch model is passed directly, this is ignored. See the models module README.md for more information.

  • model: PyTorch model to use for training. If this is provided, model_name and model_to_load can be any string. Default is None.

  • load_model: Which model checkpoint to load if it exists. Options are 'latest' or 'best'. If no checkpoints exist, the already initialized model is silently used. Default is 'latest'.

  • spatial_transforms: Dictionary of spatial transformations to apply to the training data. Default is {'mirror': {'axes': {'x': 0.5, 'y': 0.5}}, 'transpose': {'axes': ['x', 'y']}, 'rotate': {'axes': {'x': [-180, 180], 'y': [-180, 180]}}}. See the dataloader module documentation for more information.

  • validation_time_limit: Maximum time to spend on validation in seconds. If None, there is no time limit. Default is None.

  • validation_batch_limit: Maximum number of validation batches to process. If None, there is no limit. Default is None.

  • device: Device to use for training. If None, will use 'cuda' if available, 'mps' if available, or 'cpu' otherwise. Default is None.

  • use_s3: Whether to use the S3 bucket for the datasplit. Default is False.

  • optimizer: PyTorch optimizer to use for training. Default is torch.optim.RAdam(model.parameters(), lr=learning_rate, decoupled_weight_decay=True).

  • criterion: Uninstantiated PyTorch loss function to use for training. Default is torch.nn.BCEWithLogitsLoss.

  • criterion_kwargs: Dictionary of keyword arguments to pass to the loss function constructor. Default is {}.

  • weight_loss: Whether to weight the loss function by class counts found in the datasets. Default is True.

  • use_mutual_exclusion: Whether to use mutual exclusion to infer labels for unannotated pixels. Default is False.

  • weighted_sampler: Whether to use a sampler weighted by class counts for the dataloader. Default is True.

  • train_raw_value_transforms: Transform to apply to the raw values for training. Defaults to T.Compose([T.ToDtype(torch.float, scale=True), NaNtoNum({"nan": 0, "posinf": None, "neginf": None})]), which normalizes the input data, converts it to float32, and replaces NaNs with 0. This can be used to add augmentations such as random erasing, blur, noise, etc.

  • val_raw_value_transforms: Transform to apply to the raw values for validation, similar to train_raw_value_transforms. Default is the same as train_raw_value_transforms.

  • target_value_transforms: Transform to apply to the target values. Default is T.Compose([T.ToDtype(torch.float), Binarize()]), which converts the input masks to float32 and thresholds at 0 (turning object IDs into binary masks for use with binary cross entropy loss). This can be used to specify other targets, such as distance transforms.

  • max_grad_norm: Maximum gradient norm for clipping. If None, no clipping is performed. Default is None. This can be useful to prevent exploding gradients, which would lead to NaNs in the weights.

  • force_all_classes: Whether to force all classes to be present in each batch provided by the dataloaders. Can be True to force this for both the training and validation dataloaders, False to force it for neither, or 'train' / 'validate' to restrict it to the training or validation dataloader, respectively. Default is 'validate'.

  • scheduler: PyTorch learning rate scheduler (or uninstantiated scheduler class) to use for training. Default is None. If provided, the scheduler will be called at the end of each epoch.

  • scheduler_kwargs: Dictionary of keyword arguments to pass to the scheduler constructor. Default is {}. If an instantiated scheduler is provided, this is ignored.

Return type:

None