dacapo.experiments.datasplits

Classes

DataSplit

The abstract base class for data splits. It holds the training and validation datasets used to train and validate a model.

DataSplitConfig

A class used to create a DataSplit configuration object.

DummyDataSplit

A class for creating a simple train dataset and no validation dataset. It is derived from DataSplit class.

DummyDataSplitConfig

A simple class representing config for Dummy DataSplit.

TrainValidateDataSplit

A DataSplit that contains a list of training and validation datasets.

TrainValidateDataSplitConfig

This is the standard Train/Validate DataSplit config. It contains a list of training and validation dataset configs.

DataSplitGenerator

Generates DataSplitConfig for a given task config and datasets.

DatasetSpec

A class for dataset specification. It is used to specify the dataset.

SimpleDataSplitConfig

A convention over configuration datasplit that can handle many of the most basic cases.

Package Contents

class dacapo.experiments.datasplits.DataSplit

The abstract base class for data splits. It is used to split the data into training and validation datasets, which are used to train and validate the model respectively.

train

list The list containing training datasets. In this class, it contains only one dataset for training.

validate

list The list containing validation datasets. In this class, it is an empty list as no validation dataset is set.

__init__(self, datasplit_config)

The constructor for the DataSplit class. It initialises the training and validation dataset lists according to the input configuration.

Notes

This class is used to split the data into training and validation datasets.

train: List[dacapo.experiments.datasplits.datasets.Dataset]
validate: List[dacapo.experiments.datasplits.datasets.Dataset] | None
class dacapo.experiments.datasplits.DataSplitConfig

A class used to create a DataSplit configuration object.

name

str A name for the datasplit. This name will be saved so it can be found and reused easily. It is recommended to keep it short and avoid special characters.

verify() Tuple[bool, str]

Validates if it is a valid data split configuration.

Notes

This class is used to create a DataSplit configuration object.

name: str
verify() Tuple[bool, str]

Validates if the current configuration is a valid data split configuration.

Returns:

Tuple[bool, str]

A tuple of True and a confirmation message if the configuration is valid, or False and the corresponding validation error message otherwise.

Raises:

NotImplementedError – If the method is not implemented in the derived class.

Examples

>>> datasplit_config = DataSplitConfig(name="datasplit")
>>> datasplit_config.verify()
(True, "No validation for this DataSplit")

Notes

This method is used to validate the configuration of DataSplit.
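
The verify() contract above can be sketched with a minimal stand-in (the class names here are hypothetical, not part of dacapo):

```python
from typing import Tuple


class BaseSplitConfig:
    """Stand-in sketching the DataSplitConfig.verify() contract."""

    name: str

    def verify(self) -> Tuple[bool, str]:
        # Derived classes must override this; the base raises.
        raise NotImplementedError("verify() must be implemented by subclasses")


class PassThroughSplitConfig(BaseSplitConfig):
    """A subclass that accepts any configuration, mirroring the example above."""

    def __init__(self, name: str):
        self.name = name

    def verify(self) -> Tuple[bool, str]:
        return True, "No validation for this DataSplit"
```

Concrete configs follow this shape: return a (bool, str) pair rather than raising on invalid input, so callers can report the message.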

class dacapo.experiments.datasplits.DummyDataSplit(datasplit_config)

A class for creating a simple train dataset and no validation dataset. It is derived from the DataSplit class: the train list contains a single dataset and the validate list is left empty.

train

list The list containing training datasets. In this class, it contains only one dataset for training.

validate

list The list containing validation datasets. In this class, it is an empty list as no validation dataset is set.

__init__(self, datasplit_config)

The constructor for DummyDataSplit class. It initialises a list with training datasets according to the input configuration.

Notes

This class is used to split the data into training and validation datasets.

train: List[dacapo.experiments.datasplits.datasets.Dataset]
validate: List[dacapo.experiments.datasplits.datasets.Dataset]
class dacapo.experiments.datasplits.DummyDataSplitConfig

A simple class representing config for Dummy DataSplit.

This class is derived from ‘DataSplitConfig’ and is initialized with a ‘DatasetConfig’ for the training dataset.

datasplit_type

The DataSplit class that this configuration constructs (DummyDataSplit).

train_config

Config for the training dataset. Defaults to DummyDatasetConfig.

verify()

A method for verification. This method always returns ‘False’ plus a string indicating the condition.

Notes

This class is used to represent the configuration for Dummy DataSplit.

datasplit_type
train_config: dacapo.experiments.datasplits.datasets.DatasetConfig
verify() Tuple[bool, str]

A method for verification. This method always returns ‘False’ plus a string indicating the condition.

Returns:

A tuple containing the boolean ‘False’ and an explanatory string.

Return type:

Tuple[bool, str]

Examples

>>> dummy_datasplit_config = DummyDataSplitConfig(train_config)
>>> dummy_datasplit_config.verify()
(False, "This is a DummyDataSplit and is never valid")

Notes

This method is used to verify the configuration of DummyDataSplit.

class dacapo.experiments.datasplits.TrainValidateDataSplit(datasplit_config)

A DataSplit that contains a list of training and validation datasets. This class is used to split the data into training and validation datasets. The training and validation datasets are used to train and validate the model respectively.

train

list The list of training datasets.

validate

list The list of validation datasets.

__init__(datasplit_config)

Initializes the TrainValidateDataSplit class with specified config to split the data into training and validation datasets.

Notes

This class is used to split the data into training and validation datasets.

train: List[dacapo.experiments.datasplits.datasets.Dataset]
validate: List[dacapo.experiments.datasplits.datasets.Dataset]
class dacapo.experiments.datasplits.TrainValidateDataSplitConfig

This is the standard Train/Validate DataSplit config. It contains a list of training and validation datasets. This class is used to split the data into training and validation datasets. The training and validation datasets are used to train and validate the model respectively.

train_configs

list The list of training datasets.

validate_configs

list The list of validation datasets.

__init__(datasplit_config)

Initializes the TrainValidateDataSplitConfig class with specified config to split the data into training and validation datasets.

Notes

This class is used to split the data into training and validation datasets.

datasplit_type
train_configs: List[dacapo.experiments.datasplits.datasets.DatasetConfig]
validate_configs: List[dacapo.experiments.datasplits.datasets.DatasetConfig]
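
The mapping from a Train/Validate config to a split can be sketched with stand-in classes (hypothetical names; the real classes build Dataset objects from each DatasetConfig rather than plain strings):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TinyDatasetConfig:
    """Stand-in for a DatasetConfig: just a name."""
    name: str


@dataclass
class TinyTrainValidateConfig:
    """Stand-in for TrainValidateDataSplitConfig."""
    name: str
    train_configs: List[TinyDatasetConfig] = field(default_factory=list)
    validate_configs: List[TinyDatasetConfig] = field(default_factory=list)


class TinyTrainValidateSplit:
    """Stand-in for TrainValidateDataSplit: one dataset per config."""

    def __init__(self, cfg: TinyTrainValidateConfig):
        # Each config in train_configs/validate_configs yields one entry
        # in the corresponding dataset list.
        self.train = [c.name for c in cfg.train_configs]
        self.validate = [c.name for c in cfg.validate_configs]
```
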
class dacapo.experiments.datasplits.DataSplitGenerator(name: str, datasets: List[DatasetSpec], input_resolution: Sequence[int] | funlib.geometry.Coordinate, output_resolution: Sequence[int] | funlib.geometry.Coordinate, targets: List[str] | None = None, segmentation_type: str | SegmentationType = 'semantic', max_gt_downsample=32, max_gt_upsample=4, max_raw_training_downsample=16, max_raw_training_upsample=2, max_raw_validation_downsample=8, max_raw_validation_upsample=2, min_training_volume_size=8000, raw_min=0, raw_max=255, classes_separator_character='&', use_negative_class=False, max_validation_volume_size=None, binarize_gt=False)

Generates DataSplitConfig for a given task config and datasets.

Class names in gt_dataset should be within [] e.g. [mito&peroxisome&er] for multiple classes or [mito] for one class.

Currently only supports:
  • semantic segmentation.

Supports:
  • 2D and 3D datasets.

  • Zarr, N5 and OME-Zarr datasets.

  • Multi class targets.

  • Different resolutions for raw and ground truth datasets.

  • Different resolutions for training and validation datasets.

name

str The name of the data split generator.

datasets

list The list of dataset specifications.

input_resolution

obj The input resolution.

output_resolution

obj The output resolution.

targets

list The list of targets.

segmentation_type

obj The segmentation type.

max_gt_downsample

int The maximum ground truth downsample.

max_gt_upsample

int The maximum ground truth upsample.

max_raw_training_downsample

int The maximum raw training downsample.

max_raw_training_upsample

int The maximum raw training upsample.

max_raw_validation_downsample

int The maximum raw validation downsample.

max_raw_validation_upsample

int The maximum raw validation upsample.

min_training_volume_size

int The minimum training volume size.

raw_min

int The minimum raw value.

raw_max

int The maximum raw value.

classes_separator_character

str The classes separator character.

max_validation_volume_size

int The maximum validation volume size, in voxels. Default is None, in which case the validation volume size is not limited; otherwise it is capped at the specified value, e.g. 600**3 = 216_000_000 voxels.

__init__(name, datasets, input_resolution, output_resolution, targets, segmentation_type, max_gt_downsample, max_gt_upsample, max_raw_training_downsample, max_raw_training_upsample, max_raw_validation_downsample, max_raw_validation_upsample, min_training_volume_size, raw_min, raw_max, classes_separator_character)

Initializes the DataSplitGenerator class with the specified name, datasets, input and output resolutions, targets, segmentation type, maximum ground truth and raw down/upsampling factors for training and validation, minimum training volume size, raw intensity range, and classes separator character.

__str__(self)

A method to get the string representation of the class.

class_name(self)

A method to get the class name.

check_class_name(self, class_name)

A method to check the class name.

compute(self)

A method to compute the data split.

__generate_semantic_seg_datasplit(self)

A method to generate the semantic segmentation data split.

__generate_semantic_seg_dataset_crop(self, dataset)

A method to generate the semantic segmentation dataset crop.

generate_csv(datasets, csv_path)

A method to generate the CSV file.

generate_from_csv(csv_path, input_resolution, output_resolution, name, **kwargs)

A method to generate the data split from the CSV file.

Notes

  • This class is used to generate the DataSplitConfig for a given task config and datasets.

  • Class names in gt_dataset should be within [] e.g. [mito&peroxisome&er] for multiple classes or [mito] for one class.
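
The bracket convention for class names can be illustrated with a small parser (a sketch of the documented convention, not the library's actual parsing code):

```python
from typing import List


def parse_classes(gt_dataset: str, separator: str = "&") -> List[str]:
    """Extract class names from a gt_dataset path such as
    'labels/[mito&peroxisome&er]' using the documented [] convention."""
    start, end = gt_dataset.find("["), gt_dataset.find("]")
    if start == -1 or end == -1 or end < start:
        # No bracketed class list: treat the whole name as a single class.
        return [gt_dataset]
    return gt_dataset[start + 1 : end].split(separator)
```

The separator defaults to '&', matching the classes_separator_character default below.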

name
datasets
input_resolution
output_resolution
targets = None
segmentation_type = 'semantic'
max_gt_downsample = 32
max_gt_upsample = 4
max_raw_training_downsample = 16
max_raw_training_upsample = 2
max_raw_validation_downsample = 8
max_raw_validation_upsample = 2
min_training_volume_size = 8000
raw_min = 0
raw_max = 255
classes_separator_character = '&'
use_negative_class = False
max_validation_volume_size = None
binarize_gt = False
property class_name

Get the class name.

Parameters:

self – obj The object.

Returns:

The class name.

Return type:

obj

Raises:
  • ValueError

  • If the class name is already set, a ValueError is raised.

Examples

>>> class_name

Notes

This function is used to get the class name.

check_class_name(class_name)

Check the class name.

Parameters:
  • self – obj The object.

  • class_name – obj The class name.

Returns:

The class name.

Return type:

obj

Raises:
  • ValueError

  • If the class name is already set, a ValueError is raised.

Examples

>>> check_class_name(class_name)

Notes

This function is used to check the class name.

compute()

Compute the data split.

Parameters:

self – obj The object.

Returns:

The data split.

Return type:

obj

Raises:
  • NotImplementedError

  • If the segmentation type is not implemented, a NotImplementedError is raised.

Examples

>>> compute()

Notes

This function is used to compute the data split.

static generate_from_csv(csv_path: upath.UPath, input_resolution: Sequence[int] | funlib.geometry.Coordinate, output_resolution: Sequence[int] | funlib.geometry.Coordinate, name: str | None = None, **kwargs)

Generate the data split from the CSV file.

Parameters:
  • csv_path – obj The CSV file path.

  • input_resolution – obj The input resolution.

  • output_resolution – obj The output resolution.

  • name – str The name.

  • **kwargs – dict The keyword arguments.

Returns:

The data split.

Return type:

obj

Raises:
  • FileNotFoundError

  • If the file does not exist, a FileNotFoundError is raised.

Examples

>>> generate_from_csv(csv_path, input_resolution, output_resolution, name, **kwargs)

Notes

This function is used to generate the data split from the CSV file.

class dacapo.experiments.datasplits.DatasetSpec(dataset_type: str | DatasetType, raw_container: str | upath.UPath, raw_dataset: str, gt_container: str | upath.UPath, gt_dataset: str)

A class for dataset specification. It is used to specify the dataset.

dataset_type

obj The dataset type.

raw_container

obj The raw container.

raw_dataset

str The raw dataset.

gt_container

obj The ground truth container.

gt_dataset

str The ground truth dataset.

__init__(dataset_type, raw_container, raw_dataset, gt_container, gt_dataset)

Initializes the DatasetSpec class with the specified dataset type, raw container, raw dataset, ground truth container, and ground truth dataset.

__str__(self)

A method to get the string representation of the class.

Notes

This class is used to specify the dataset.
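
The shape of a DatasetSpec can be mirrored with a plain dataclass (a hypothetical stand-in following the documented fields, not the real class):

```python
from dataclasses import dataclass


@dataclass
class TinyDatasetSpec:
    """Stand-in mirroring DatasetSpec's documented fields."""
    dataset_type: str
    raw_container: str
    raw_dataset: str
    gt_container: str
    gt_dataset: str

    def __str__(self) -> str:
        # One compact line per spec: type, then raw and gt locations.
        return (
            f"{self.dataset_type}: "
            f"raw={self.raw_container}/{self.raw_dataset}, "
            f"gt={self.gt_container}/{self.gt_dataset}"
        )
```
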

dataset_type
raw_container
raw_dataset
gt_container
gt_dataset
class dacapo.experiments.datasplits.SimpleDataSplitConfig

A convention over configuration datasplit that can handle many of the most basic cases.

path: pathlib.Path
name: str
train_group_name: str
validate_group_name: str
raw_name: str
gt_name: str
mask_name: str
static datasplit_type(datasplit_config)
get_paths(group_name: str) list[pathlib.Path]
property train: list[dacapo.experiments.datasplits.datasets.simple.SimpleDataset]
property validate: list[dacapo.experiments.datasplits.datasets.simple.SimpleDataset]
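
The convention-over-configuration idea behind get_paths can be sketched as simple path composition (a hypothetical layout; the real implementation discovers the crops present inside the container):

```python
from pathlib import Path
from typing import List


def group_paths(container: Path, group_name: str, crops: List[str]) -> List[Path]:
    """Compose <container>/<group_name>/<crop> paths, one per crop in a group."""
    return [container / group_name / crop for crop in crops]
```

With the conventional group names, group_paths(root, train_group_name, ...) yields training crop paths and group_paths(root, validate_group_name, ...) yields validation crop paths, each expected to contain the raw_name, gt_name, and optional mask_name arrays.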