cellmap_segmentation_challenge package#
Subpackages#
- cellmap_segmentation_challenge.cli package
- Submodules
- cellmap_segmentation_challenge.cli.datasplit module
- cellmap_segmentation_challenge.cli.evaluate module
- cellmap_segmentation_challenge.cli.fetch_data module
- cellmap_segmentation_challenge.cli.predict module
- cellmap_segmentation_challenge.cli.process module
- cellmap_segmentation_challenge.cli.speedtest module
- cellmap_segmentation_challenge.cli.train module
- cellmap_segmentation_challenge.cli.visualize module
- Module contents
- cellmap_segmentation_challenge.models package
- Submodules
- cellmap_segmentation_challenge.models.model_load module
- cellmap_segmentation_challenge.models.resnet module
- cellmap_segmentation_challenge.models.unet_model_2D module
- cellmap_segmentation_challenge.models.unet_model_3D module
- cellmap_segmentation_challenge.models.vitnet module
- Module contents
- cellmap_segmentation_challenge.utils package
- Submodules
- cellmap_segmentation_challenge.utils.crops module
- cellmap_segmentation_challenge.utils.dataloader module
- cellmap_segmentation_challenge.utils.datasplit module
- cellmap_segmentation_challenge.utils.fetch_data module
- cellmap_segmentation_challenge.utils.loss module
- cellmap_segmentation_challenge.utils.security module
- Module contents
Submodules#
cellmap_segmentation_challenge.config module#
cellmap_segmentation_challenge.evaluate module#
- cellmap_segmentation_challenge.evaluate.unzip_file(zip_path)[source]#
Unzip a zip file to a specified directory.
- Parameters:
zip_path (str) – The path to the zip file.
- Example usage:
unzip_file('submission.zip')
- cellmap_segmentation_challenge.evaluate.save_numpy_class_labels_to_zarr(save_path, test_volume_name, label_name, labels, overwrite=False, attrs=None)[source]#
Save a single 3D numpy array of class labels to a Zarr-2 file with the required structure.
- Parameters:
save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_name (list) – The names of the labels corresponding to the class values in labels.
labels (np.ndarray) – A 3D numpy array of class labels.
overwrite (bool) – Whether to overwrite the Zarr-2 file if it already exists.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.
- Example usage:
# Generate random class labels, with 0 as background
labels = np.random.randint(0, 4, (128, 128, 128))
save_numpy_class_labels_to_zarr('submission.zarr', 'test_volume', ['label1', 'label2', 'label3'], labels)
- cellmap_segmentation_challenge.evaluate.save_numpy_class_arrays_to_zarr(save_path, test_volume_name, label_names, labels, mode='append', attrs=None)[source]#
Save a list of 3D numpy arrays of binary or instance labels to a Zarr-2 file with the required structure.
- Parameters:
save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_names (list) – A list of label names corresponding to the list of 3D numpy arrays.
labels (list) – A list of 3D numpy arrays of binary labels.
mode (str) – The mode to use when saving the Zarr-2 file. Options are ‘append’ or ‘overwrite’.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.
- Example usage:
label_names = ['label1', 'label2', 'label3']
# Generate random binary volumes for each label
labels = [np.random.randint(0, 2, (128, 128, 128)) for _ in range(len(label_names))]
save_numpy_class_arrays_to_zarr('submission.zarr', 'test_volume', label_names, labels)
- cellmap_segmentation_challenge.evaluate.resize_array(arr, target_shape, pad_value=0)[source]#
Resize an array to a target shape by padding or cropping as needed.
- Parameters:
arr (np.ndarray) – Input array to resize.
target_shape (tuple) – Desired shape for the output array.
pad_value (int, float, etc.) – Value to use for padding if the array is smaller than the target shape.
- Returns:
Resized array with the specified target shape.
- Return type:
np.ndarray
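- Example usage (a sketch; the shapes are hypothetical, and where padding/cropping falls within each axis is an implementation detail):
import numpy as np
arr = np.ones((100, 100, 100), dtype=np.uint8)
padded = resize_array(arr, (128, 128, 128))  # smaller than target: padded with pad_value
cropped = resize_array(arr, (64, 64, 64))    # larger than target: cropped
mixed = resize_array(arr, (128, 64, 100))    # padded on axis 0, cropped on axis 1, unchanged on axis 2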
- cellmap_segmentation_challenge.evaluate.hausdorff_distance(image0, image1, voxel_size, max_distance=inf, method='standard')[source]#
Calculate the Hausdorff distance between nonzero elements of given images.
Modified from the skimage.metrics.hausdorff_distance function by Jeff Rhoades (2024) to handle non-isotropic resolutions.
- Parameters:
image0 (ndarray) – Array where True represents a point that is included in a set of points. Both arrays must have the same shape.
image1 (ndarray) – Array where True represents a point that is included in a set of points. Both arrays must have the same shape.
voxel_size (tuple) – The size of a voxel in each dimension. Assumes the same resolution for both images.
max_distance (float, optional) – The maximum distance to consider. Default is infinity.
method ({'standard', 'modified'}, optional, default = 'standard') – The method to use for calculating the Hausdorff distance. 'standard' is the standard Hausdorff distance, while 'modified' is the modified Hausdorff distance.
- Returns:
distance – The Hausdorff distance between coordinates of nonzero pixels in image0 and image1, using the Euclidean distance.
- Return type:
float
Notes
The Hausdorff distance [1] is the maximum distance between any point on image0 and its nearest point on image1, and vice-versa. The Modified Hausdorff Distance (MHD) has been shown to perform better than the directed Hausdorff Distance (HD) in the work by Dubuisson et al. [2]. The function calculates forward and backward mean distances and returns the largest of the two.
References
[1] http://en.wikipedia.org/wiki/Hausdorff_distance
[2] M.-P. Dubuisson and A. K. Jain. A modified Hausdorff distance for object matching. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, pages 566-568, Jerusalem, Israel, 1994.
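- Example usage (a sketch, assuming voxel_size is used to convert voxel offsets into world units):
import numpy as np
image0 = np.zeros((64, 64, 64), dtype=bool)
image1 = np.zeros((64, 64, 64), dtype=bool)
image0[10, 10, 10] = True
image1[10, 10, 20] = True
# The single points are 10 voxels apart along the last axis, so with
# 8 nm isotropic voxels the expected distance is 80.0 in world units.
distance = hausdorff_distance(image0, image1, voxel_size=(8, 8, 8))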
- cellmap_segmentation_challenge.evaluate.score_instance(pred_label, truth_label, voxel_size, hausdorff_distance_max=inf) dict[str, float] [source]#
Score a single instance label volume against the ground truth instance label volume.
- Parameters:
pred_label (np.ndarray) – The predicted instance label volume.
truth_label (np.ndarray) – The ground truth instance label volume.
voxel_size (tuple) – The size of a voxel in each dimension.
hausdorff_distance_max (float) – The maximum distance to consider for the Hausdorff distance.
- Returns:
A dictionary of scores for the instance label volume.
- Return type:
dict
- Example usage:
scores = score_instance(pred_label, truth_label, voxel_size)
- cellmap_segmentation_challenge.evaluate.score_semantic(pred_label, truth_label) dict[str, float] [source]#
Score a single semantic label volume against the ground truth semantic label volume.
- Parameters:
pred_label (np.ndarray) – The predicted semantic label volume.
truth_label (np.ndarray) – The ground truth semantic label volume.
- Returns:
A dictionary of scores for the semantic label volume.
- Return type:
dict
- Example usage:
scores = score_semantic(pred_label, truth_label)
- cellmap_segmentation_challenge.evaluate.score_label(pred_label_path, truth_path='/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/ground_truth.zarr', instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance']) dict[str, float] [source]#
Score a single label volume against the ground truth label volume.
- Parameters:
pred_label_path (str) – The path to the predicted label volume.
truth_path (str) – The path to the ground truth data.
instance_classes (list) – A list of instance classes.
- Returns:
A dictionary of scores for the label volume.
- Return type:
dict
- Example usage:
scores = score_label('pred.zarr/test_volume/label1')
- cellmap_segmentation_challenge.evaluate.score_volume(pred_volume_path, truth_path='/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/ground_truth.zarr', instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance']) dict[str, dict[str, float]] [source]#
Score a single volume against the ground truth volume.
- Parameters:
pred_volume_path (str) – The path to the predicted volume.
truth_path (str) – The path to the ground truth data.
instance_classes (list) – A list of instance classes.
- Returns:
A dictionary of scores for the volume.
- Return type:
dict
- Example usage:
scores = score_volume('pred.zarr/test_volume')
- cellmap_segmentation_challenge.evaluate.missing_volume_score(truth_volume_path, instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance']) dict[str, dict[str, float]] [source]#
Score a missing volume as 0’s, congruent with the score_volume function.
- Parameters:
truth_volume_path (str) – The path to the ground truth volume.
instance_classes (list) – A list of instance classes.
- Returns:
A dictionary of scores for the volume.
- Return type:
dict
- Example usage:
scores = missing_volume_score('truth.zarr/test_volume')
- cellmap_segmentation_challenge.evaluate.combine_scores(scores, include_missing=True, instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance'])[source]#
Combine scores across volumes, normalizing by the number of voxels.
- Parameters:
scores (dict) – A dictionary of scores for each volume, as returned by score_volume.
include_missing (bool) – Whether to include missing volumes in the combined scores.
instance_classes (list) – A list of instance classes.
- Returns:
A dictionary of combined scores across all volumes.
- Return type:
dict
- Example usage:
combined_scores = combine_scores(scores)
- cellmap_segmentation_challenge.evaluate.score_submission(submission_path='/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/submission.zip', result_file=None, truth_path='/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/ground_truth.zarr', instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance'])[source]#
Score a submission against the ground truth data.
- Parameters:
submission_path (str) – The path to the zipped submission Zarr-2 file.
result_file (str) – The path to save the scores.
truth_path (str) – The path to the ground truth data.
instance_classes (list) – A list of instance classes.
- Returns:
A dictionary of scores for the submission.
- Return type:
dict
- Example usage:
scores = score_submission('submission.zip')
The results JSON is a dictionary with the following structure:
{
    "volume" (the name of the ground truth volume): {
        "label" (the name of the predicted class): {
            (For semantic segmentation)
            "iou": the intersection over union score,
            "dice_score": the dice score,
            OR
            (For instance segmentation)
            "accuracy": the accuracy score,
            "hausdorff_distance": the Hausdorff distance,
            "normalized_hausdorff_distance": the normalized Hausdorff distance,
            "combined_score": the geometric mean of the accuracy and normalized Hausdorff distance,
        },
        "num_voxels": the number of voxels in the ground truth volume,
    },
    "label_scores": {
        (the name of the predicted class): {
            (For semantic segmentation)
            "iou": the mean intersection over union score,
            "dice_score": the mean dice score,
            OR
            (For instance segmentation)
            "accuracy": the mean accuracy score,
            "hausdorff_distance": the mean Hausdorff distance,
            "combined_score": the mean geometric mean of the accuracy and Hausdorff distance,
        },
    },
    "overall_score": the mean of the combined scores across all classes,
}
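The saved scores can then be inspected as follows (a sketch, assuming score_submission was called with result_file='results.json'):
import json
with open('results.json') as f:
    results = json.load(f)
print(results['overall_score'])
for label, label_scores in results['label_scores'].items():
    print(label, label_scores)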
- cellmap_segmentation_challenge.evaluate.package_submission(input_search_path: str | UPath = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/processed/{dataset}.zarr/{crop}', output_path: str | UPath = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/submission.zarr', overwrite: bool = False)[source]#
Package a submission for the CellMap challenge. This will create a zarr file, combining all the processed volumes, and then zip it.
- Parameters:
input_search_path (str) – The base path to the processed volumes, with placeholders for dataset and crops.
output_path (str | UPath) – The path to save the submission zarr to. (ending with <filename>.zarr; .zarr will be appended if not present, and replaced with .zip when zipped).
overwrite (bool) – Whether to overwrite the submission zarr if it already exists.
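- Example usage (a sketch; the paths are hypothetical):
package_submission(input_search_path='data/processed/{dataset}.zarr/{crop}', output_path='submission.zarr', overwrite=True)
# Writes submission.zarr and then zips it to submission.zip.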
- cellmap_segmentation_challenge.evaluate.match_crop_space(path, class_label, voxel_size, shape, translation) ndarray [source]#
Match the resolution of a zarr array to a target resolution and shape, resampling as necessary with interpolation dependent on the class label. Instance segmentations will be resampled with nearest neighbor interpolation, while semantic segmentations will be resampled with linear interpolation and then thresholded.
- Parameters:
path (str | UPath) – The path to the zarr array to be adjusted. The zarr can be an OME-NGFF multiscale zarr file, or a traditional single scale formatted zarr.
class_label (str) – The class label of the array.
voxel_size (tuple) – The target voxel size.
shape (tuple) – The target shape.
translation (tuple) – The translation (i.e. offset) of the array in world units.
- Returns:
The rescaled array.
- Return type:
np.ndarray
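- Example usage (a sketch; the path, class label, and geometry are hypothetical):
rescaled = match_crop_space('pred.zarr/test_volume/mito', class_label='mito', voxel_size=(8, 8, 8), shape=(128, 128, 128), translation=(0, 0, 0))
assert rescaled.shape == (128, 128, 128)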
- cellmap_segmentation_challenge.evaluate.zip_submission(zarr_path: str | UPath = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/submission.zarr')[source]#
(Re-)Zip a submission zarr file.
- Parameters:
zarr_path (str | UPath) – The path to the submission zarr file (ending with <filename>.zarr). .zarr will be replaced with .zip.
cellmap_segmentation_challenge.predict module#
- cellmap_segmentation_challenge.predict.predict_orthoplanes(model: Module, dataset_writer_kwargs: dict[str, Any], batch_size: int)[source]#
Predict with a 2D model over the x, y, and z orthogonal planes and average the results into a single 3D volume (2.5D prediction), writing the output using the given dataset writer arguments.
- Parameters:
model (Module)
dataset_writer_kwargs (dict[str, Any])
batch_size (int)
- cellmap_segmentation_challenge.predict.predict(config_path: str, crops: str = 'test', output_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/predictions/{dataset}.zarr/{crop}', do_orthoplanes: bool = True, overwrite: bool = False, search_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}')[source]#
Given a model configuration file and list of crop numbers, predicts the output of a model on a large dataset by splitting it into blocks and predicting each block separately.
- Parameters:
config_path (str) – The path to the model configuration file. This can be the same as the config file used for training.
crops (str, optional) – A comma-separated list of crop numbers to predict on, or “test” to predict on the entire test set. Default is “test”.
output_path (str, optional) – The path to save the output predictions to, formatted as a string with placeholders for the dataset, crop number, and label. Default is PREDICTIONS_PATH set in cellmap-segmentation/config.py.
do_orthoplanes (bool, optional) – Whether to compute the average of predictions from x, y, and z orthogonal planes for the full 3D volume. This is sometimes called 2.5D predictions. It expects a model that yields 2D outputs. Similarly, it expects the input shape to the model to be 2D. Default is True for 2D models.
overwrite (bool, optional) – Whether to overwrite the output dataset if it already exists. Default is False.
search_path (str, optional) – The path to search for the raw dataset, with placeholders for dataset and name. Default is SEARCH_PATH set in cellmap-segmentation/config.py.
raw_name (str, optional) – The name of the raw dataset. Default is RAW_NAME set in cellmap-segmentation/config.py.
crop_name (str, optional) – The name of the crop dataset with placeholders for crop and label. Default is CROP_NAME set in cellmap-segmentation/config.py.
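- Example usage (a sketch; the config path and crop numbers are hypothetical):
# Reuse the training config to predict on two specific crops
predict('train_config.py', crops='116,117', overwrite=True)
# Or predict on the full test set with the default output paths
predict('train_config.py', crops='test')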
cellmap_segmentation_challenge.process module#
- cellmap_segmentation_challenge.process.process(config_path: str | UPath, crops: str = 'test', input_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/predictions/{dataset}.zarr/{crop}', output_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/processed/{dataset}.zarr/{crop}', overwrite: bool = False, device: str | device | None = None) None [source]#
Process and save arrays using an arbitrary process function defined in a config python file.
- Parameters:
config_path (str | UPath) – The path to the python file containing the process function and other configurations. The script should specify the process function as process_func; input_array_info and target_array_info corresponding to the chunk sizes and scales for the input and output datasets, respectively; batch_size; classes; and any other required configurations. The process function should take an array as input and return an array as output.
crops (str, optional) – A comma-separated list of crop numbers to process, or “test” to process the entire test set. Default is “test”.
input_path (str, optional) – The path to the data to process, formatted as a string with placeholders for the crop number, dataset, and label. Default is PREDICTIONS_PATH set in cellmap-segmentation/config.py.
output_path (str, optional) – The path to save the processed output to, formatted as a string with placeholders for the crop number, dataset, and label. Default is PROCESSED_PATH set in cellmap-segmentation/config.py.
overwrite (bool, optional) – Whether to overwrite the output dataset if it already exists. Default is False.
device (str | torch.device, optional) – The device to use for processing the data. Default is to use that specified in the config. If not specified, then defaults to “cuda” if available, then “mps”, otherwise “cpu”.
- Return type:
None
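A minimal sketch of such a config file, using the names documented above; the thresholding logic is purely illustrative:
# process_config.py
classes = ['nuc', 'er']
batch_size = 8
input_array_info = {'shape': (128, 128, 128), 'scale': (8, 8, 8)}
target_array_info = {'shape': (128, 128, 128), 'scale': (8, 8, 8)}
def process_func(x):
    # Binarize raw predictions; any array-in, array-out function works here.
    return x > 0.5
Processing then reduces to:
process('process_config.py', crops='test')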
cellmap_segmentation_challenge.train module#
- cellmap_segmentation_challenge.train.train(config_path: str)[source]#
Train a model using the configuration file at the specified path. The model checkpoints and training logs, as well as the datasets used for training, will be saved to the paths specified in the configuration file.
- Parameters:
config_path (str) – Path to the configuration file to use for training the model. This file should be a Python file that defines the hyperparameters and other configurations for training the model. This may include:
- model_save_path: Path to save the model checkpoints. Default is 'checkpoints/{model_name}_{epoch}.pth'.
- logs_save_path: Path to save the logs for tensorboard. Default is 'tensorboard/{model_name}'. Training progress may be monitored by running tensorboard --logdir <logs_save_path> in the terminal.
- datasplit_path: Path to the datasplit file that defines the train/val split the dataloader should use. Default is 'datasplit.csv'.
- validation_prob: Proportion of the datasets to use for validation. This is used if the datasplit CSV specified by datasplit_path does not already exist. Default is 0.15.
- learning_rate: Learning rate for the optimizer. Default is 0.0001.
- batch_size: Batch size for the dataloader. Default is 8.
- input_array_info: Dictionary containing the shape and scale of the input data. Default is {'shape': (1, 128, 128), 'scale': (8, 8, 8)}.
- target_array_info: Dictionary containing the shape and scale of the target data. Default is to use input_array_info.
- epochs: Number of epochs to train the model for. Default is 1000.
- iterations_per_epoch: Number of iterations per epoch. Each iteration includes an independently generated random batch from the training set. Default is 1000.
- random_seed: Random seed for reproducibility. Default is 42.
- classes: List of classes to train the model to predict. This will be reflected in the data included in the datasplit, if generated de novo after calling this script. Default is ['nuc', 'er'].
- model_name: Name of the model to use. If the config file constructs the PyTorch model, this name can be anything. Otherwise, it must specify one of the included architectures: '2d_unet', '2d_resnet', '3d_unet', '3d_resnet', or 'vitnet'. Default is '2d_unet'. See the models module README.md for more information.
- model_to_load: Name of the pre-trained model to load. Default is the same as model_name.
- model_kwargs: Dictionary of keyword arguments to pass to the model constructor. Default is {}. If the PyTorch model is passed, this will be ignored. See the models module README.md for more information.
- model: PyTorch model to use for training. If this is provided, model_name and model_to_load can be any string. Default is None.
- load_model: Which model checkpoint to load if it exists. Options are 'latest' or 'best'. If no checkpoints exist, will silently use the already initialized model. Default is 'latest'.
- spatial_transforms: Dictionary of spatial transformations to apply to the training data. Default is {'mirror': {'axes': {'x': 0.5, 'y': 0.5}}, 'transpose': {'axes': ['x', 'y']}, 'rotate': {'axes': {'x': [-180, 180], 'y': [-180, 180]}}}. See the dataloader module documentation for more information.
- validation_time_limit: Maximum time to spend on validation in seconds. If None, there is no time limit. Default is None.
- validation_batch_limit: Maximum number of validation batches to process. If None, there is no limit. Default is None.
- device: Device to use for training. If None, will use 'cuda' if available, 'mps' if available, or 'cpu' otherwise. Default is None.
- use_s3: Whether to use the S3 bucket for the datasplit. Default is False.
- optimizer: PyTorch optimizer to use for training. Default is torch.optim.RAdam(model.parameters(), lr=learning_rate).
- criterion: PyTorch loss function to use for training. Default is torch.nn.BCEWithLogitsLoss.
- Return type:
None
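A minimal sketch of such a config file, using only names documented above; the values shown are the documented defaults, so a real config would override whichever need to change:
# train_config.py
classes = ['nuc', 'er']
model_name = '2d_unet'
learning_rate = 0.0001
batch_size = 8
input_array_info = {'shape': (1, 128, 128), 'scale': (8, 8, 8)}
epochs = 1000
iterations_per_epoch = 1000
random_seed = 42
Training then reduces to:
train('train_config.py')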
cellmap_segmentation_challenge.visualize module#
- cellmap_segmentation_challenge.visualize.visualize(datasets: str | Sequence[str] = '*', crops: int | list = ['*'], classes: str | Sequence[str] = '*', kinds: Sequence[str] = ['gt', 'predictions', 'processed', 'submission'])[source]#
Visualize datasets and crops in Neuroglancer.
- Parameters:
datasets (str | Sequence[str], optional) – The name of the dataset to visualize. Can be a string or a list of strings. Default is “*”. If “*”, all datasets will be visualized.
crops (int | Sequence[int], optional) – The crop number(s) to visualize. Can be an integer or a list of integers. Default is ['*'], which visualizes all crops.
classes (str | Sequence[str], optional) – The class to visualize. Can be a string or a list of strings. Default is “*”. If “*”, all classes will be visualized.
kinds (Sequence[str], optional) – The type of layers to visualize. Can be “gt” for groundtruth, “predictions” for predictions, or “processed” for processed data. Default is [“gt”, “predictions”, “processed”, “submission”].
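- Example usage (a sketch; the dataset name, crops, and classes are hypothetical):
visualize(datasets='jrc_cos7-1a', crops=[116, 117], classes=['nuc', 'mito'], kinds=['gt', 'predictions'])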
- cellmap_segmentation_challenge.visualize.add_layers(viewer: Viewer, kind: str, dataset_name: str, crops: Sequence, classes: Sequence[str]) Viewer | None [source]#
Add layers to a Neuroglancer viewer.
- Parameters:
viewer (neuroglancer.Viewer) – The viewer to add layers to.
kind (str) – The type of layers to add. Can be “gt” for groundtruth, “predictions” for predictions, or “processed” for processed data.
dataset_name (str) – The name of the dataset to add layers for.
crops (Sequence) – The crops to add layers for.
classes (Sequence[str]) – The class(es) to add layers for.
- Return type:
Viewer | None
- cellmap_segmentation_challenge.visualize.get_layer(data_path: str, layer_type: str = 'image', multiscale: bool = True) Layer [source]#
Get a Neuroglancer layer from a zarr data path for a LocalVolume.
- Parameters:
data_path (str) – The path to the zarr data.
layer_type (str) – The type of layer to get. Can be “image” or “segmentation”. Default is “image”.
multiscale (bool) – Whether the metadata is OME-NGFF multiscale. Default is True.
- Returns:
The Neuroglancer layer.
- Return type:
neuroglancer.Layer
Module contents#
- cellmap_segmentation_challenge.package_submission(input_search_path: str | UPath = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/processed/{dataset}.zarr/{crop}', output_path: str | UPath = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/submission.zarr', overwrite: bool = False)[source]#
Package a submission for the CellMap challenge. This will create a zarr file, combining all the processed volumes, and then zip it.
- Parameters:
input_search_path (str) – The base path to the processed volumes, with placeholders for dataset and crops.
output_path (str | UPath) – The path to save the submission zarr to. (ending with <filename>.zarr; .zarr will be appended if not present, and replaced with .zip when zipped).
overwrite (bool) – Whether to overwrite the submission zarr if it already exists.
- cellmap_segmentation_challenge.predict(config_path: str, crops: str = 'test', output_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/predictions/{dataset}.zarr/{crop}', do_orthoplanes: bool = True, overwrite: bool = False, search_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/{dataset}/{dataset}.zarr/recon-1/{name}', raw_name: str = 'em/fibsem-uint8', crop_name: str = 'labels/groundtruth/{crop}/{label}')[source]#
Given a model configuration file and list of crop numbers, predicts the output of a model on a large dataset by splitting it into blocks and predicting each block separately.
- Parameters:
config_path (str) – The path to the model configuration file. This can be the same as the config file used for training.
crops (str, optional) – A comma-separated list of crop numbers to predict on, or “test” to predict on the entire test set. Default is “test”.
output_path (str, optional) – The path to save the output predictions to, formatted as a string with placeholders for the dataset, crop number, and label. Default is PREDICTIONS_PATH set in cellmap-segmentation/config.py.
do_orthoplanes (bool, optional) – Whether to compute the average of predictions from x, y, and z orthogonal planes for the full 3D volume. This is sometimes called 2.5D predictions. It expects a model that yields 2D outputs. Similarly, it expects the input shape to the model to be 2D. Default is True for 2D models.
overwrite (bool, optional) – Whether to overwrite the output dataset if it already exists. Default is False.
search_path (str, optional) – The path to search for the raw dataset, with placeholders for dataset and name. Default is SEARCH_PATH set in cellmap-segmentation/config.py.
raw_name (str, optional) – The name of the raw dataset. Default is RAW_NAME set in cellmap-segmentation/config.py.
crop_name (str, optional) – The name of the crop dataset with placeholders for crop and label. Default is CROP_NAME set in cellmap-segmentation/config.py.
- cellmap_segmentation_challenge.process(config_path: str | UPath, crops: str = 'test', input_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/predictions/{dataset}.zarr/{crop}', output_path: str = '/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/processed/{dataset}.zarr/{crop}', overwrite: bool = False, device: str | device | None = None) None [source]#
Process and save arrays using an arbitrary process function defined in a config python file.
- Parameters:
config_path (str | UPath) – The path to the python file containing the process function and other configurations. The script should specify the process function as process_func; input_array_info and target_array_info corresponding to the chunk sizes and scales for the input and output datasets, respectively; batch_size; classes; and any other required configurations. The process function should take an array as input and return an array as output.
crops (str, optional) – A comma-separated list of crop numbers to process, or “test” to process the entire test set. Default is “test”.
input_path (str, optional) – The path to the data to process, formatted as a string with placeholders for the crop number, dataset, and label. Default is PREDICTIONS_PATH set in cellmap-segmentation/config.py.
output_path (str, optional) – The path to save the processed output to, formatted as a string with placeholders for the crop number, dataset, and label. Default is PROCESSED_PATH set in cellmap-segmentation/config.py.
overwrite (bool, optional) – Whether to overwrite the output dataset if it already exists. Default is False.
device (str | torch.device, optional) – The device to use for processing the data. Default is to use that specified in the config. If not specified, then defaults to “cuda” if available, then “mps”, otherwise “cpu”.
- Return type:
None
- cellmap_segmentation_challenge.save_numpy_class_arrays_to_zarr(save_path, test_volume_name, label_names, labels, mode='append', attrs=None)[source]#
Save a list of 3D numpy arrays of binary or instance labels to a Zarr-2 file with the required structure.
- Parameters:
save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_names (list) – A list of label names corresponding to the list of 3D numpy arrays.
labels (list) – A list of 3D numpy arrays of binary labels.
mode (str) – The mode to use when saving the Zarr-2 file. Options are ‘append’ or ‘overwrite’.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.
- Example usage:
label_names = ['label1', 'label2', 'label3']
# Generate random binary volumes for each label
labels = [np.random.randint(0, 2, (128, 128, 128)) for _ in range(len(label_names))]
save_numpy_class_arrays_to_zarr('submission.zarr', 'test_volume', label_names, labels)
- cellmap_segmentation_challenge.save_numpy_class_labels_to_zarr(save_path, test_volume_name, label_name, labels, overwrite=False, attrs=None)[source]#
Save a single 3D numpy array of class labels to a Zarr-2 file with the required structure.
- Parameters:
save_path (str) – The path to save the Zarr-2 file (ending with <filename>.zarr).
test_volume_name (str) – The name of the test volume.
label_name (list) – The names of the labels corresponding to the class values in labels.
labels (np.ndarray) – A 3D numpy array of class labels.
overwrite (bool) – Whether to overwrite the Zarr-2 file if it already exists.
attrs (dict) – A dictionary of attributes to save with the Zarr-2 file.
- Example usage:
# Generate random class labels, with 0 as background
labels = np.random.randint(0, 4, (128, 128, 128))
save_numpy_class_labels_to_zarr('submission.zarr', 'test_volume', ['label1', 'label2', 'label3'], labels)
- cellmap_segmentation_challenge.score_submission(submission_path='/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/submission.zip', result_file=None, truth_path='/opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/data/ground_truth.zarr', instance_classes=['nuc', 'vim', 'ves', 'endo', 'lyso', 'ld', 'perox', 'mito', 'np', 'mt', 'cell', 'instance'])[source]#
Score a submission against the ground truth data.
- Parameters:
submission_path (str) – The path to the zipped submission Zarr-2 file.
result_file (str) – The path to save the scores.
truth_path (str) – The path to the ground truth data.
instance_classes (list) – A list of instance classes.
- Returns:
A dictionary of scores for the submission.
- Return type:
dict
- Example usage:
scores = score_submission('submission.zip')
The results JSON is a dictionary with the following structure:
{
    "volume" (the name of the ground truth volume): {
        "label" (the name of the predicted class): {
            (For semantic segmentation)
            "iou": the intersection over union score,
            "dice_score": the dice score,
            OR
            (For instance segmentation)
            "accuracy": the accuracy score,
            "hausdorff_distance": the Hausdorff distance,
            "normalized_hausdorff_distance": the normalized Hausdorff distance,
            "combined_score": the geometric mean of the accuracy and normalized Hausdorff distance,
        },
        "num_voxels": the number of voxels in the ground truth volume,
    },
    "label_scores": {
        (the name of the predicted class): {
            (For semantic segmentation)
            "iou": the mean intersection over union score,
            "dice_score": the mean dice score,
            OR
            (For instance segmentation)
            "accuracy": the mean accuracy score,
            "hausdorff_distance": the mean Hausdorff distance,
            "combined_score": the mean geometric mean of the accuracy and Hausdorff distance,
        },
    },
    "overall_score": the mean of the combined scores across all classes,
}
- cellmap_segmentation_challenge.train(config_path: str)[source]#
Train a model using the configuration file at the specified path. The model checkpoints and training logs, as well as the datasets used for training, will be saved to the paths specified in the configuration file.
- Parameters:
config_path (str) – Path to the configuration file to use for training the model. This file should be a Python file that defines the hyperparameters and other configurations for training the model. This may include:
- model_save_path: Path to save the model checkpoints. Default is 'checkpoints/{model_name}_{epoch}.pth'.
- logs_save_path: Path to save the logs for tensorboard. Default is 'tensorboard/{model_name}'. Training progress may be monitored by running tensorboard --logdir <logs_save_path> in the terminal.
- datasplit_path: Path to the datasplit file that defines the train/val split the dataloader should use. Default is 'datasplit.csv'.
- validation_prob: Proportion of the datasets to use for validation. This is used if the datasplit CSV specified by datasplit_path does not already exist. Default is 0.15.
- learning_rate: Learning rate for the optimizer. Default is 0.0001.
- batch_size: Batch size for the dataloader. Default is 8.
- input_array_info: Dictionary containing the shape and scale of the input data. Default is {'shape': (1, 128, 128), 'scale': (8, 8, 8)}.
- target_array_info: Dictionary containing the shape and scale of the target data. Default is to use input_array_info.
- epochs: Number of epochs to train the model for. Default is 1000.
- iterations_per_epoch: Number of iterations per epoch. Each iteration includes an independently generated random batch from the training set. Default is 1000.
- random_seed: Random seed for reproducibility. Default is 42.
- classes: List of classes to train the model to predict. This will be reflected in the data included in the datasplit, if generated de novo after calling this script. Default is ['nuc', 'er'].
- model_name: Name of the model to use. If the config file constructs the PyTorch model, this name can be anything. Otherwise, it must specify one of the included architectures: '2d_unet', '2d_resnet', '3d_unet', '3d_resnet', or 'vitnet'. Default is '2d_unet'. See the models module README.md for more information.
- model_to_load: Name of the pre-trained model to load. Default is the same as model_name.
- model_kwargs: Dictionary of keyword arguments to pass to the model constructor. Default is {}. If the PyTorch model is passed, this will be ignored. See the models module README.md for more information.
- model: PyTorch model to use for training. If this is provided, model_name and model_to_load can be any string. Default is None.
- load_model: Which model checkpoint to load if it exists. Options are 'latest' or 'best'. If no checkpoints exist, will silently use the already initialized model. Default is 'latest'.
- spatial_transforms: Dictionary of spatial transformations to apply to the training data. Default is {'mirror': {'axes': {'x': 0.5, 'y': 0.5}}, 'transpose': {'axes': ['x', 'y']}, 'rotate': {'axes': {'x': [-180, 180], 'y': [-180, 180]}}}. See the dataloader module documentation for more information.
- validation_time_limit: Maximum time to spend on validation in seconds. If None, there is no time limit. Default is None.
- validation_batch_limit: Maximum number of validation batches to process. If None, there is no limit. Default is None.
- device: Device to use for training. If None, will use 'cuda' if available, 'mps' if available, or 'cpu' otherwise. Default is None.
- use_s3: Whether to use the S3 bucket for the datasplit. Default is False.
- optimizer: PyTorch optimizer to use for training. Default is torch.optim.RAdam(model.parameters(), lr=learning_rate).
- criterion: PyTorch loss function to use for training. Default is torch.nn.BCEWithLogitsLoss.
- Return type:
None
- cellmap_segmentation_challenge.visualize(datasets: str | Sequence[str] = '*', crops: int | list = ['*'], classes: str | Sequence[str] = '*', kinds: Sequence[str] = ['gt', 'predictions', 'processed', 'submission'])[source]#
Visualize datasets and crops in Neuroglancer.
- Parameters:
datasets (str | Sequence[str], optional) – The name of the dataset to visualize. Can be a string or a list of strings. Default is “*”. If “*”, all datasets will be visualized.
crops (int | Sequence[int], optional) – The crop number(s) to visualize. Can be an integer or a list of integers. Default is ['*'], which visualizes all crops.
classes (str | Sequence[str], optional) – The class to visualize. Can be a string or a list of strings. Default is “*”. If “*”, all classes will be visualized.
kinds (Sequence[str], optional) – The type of layers to visualize. Can be “gt” for groundtruth, “predictions” for predictions, or “processed” for processed data. Default is [“gt”, “predictions”, “processed”, “submission”].