cellmap_data.CellMapDataSplit#

class cellmap_data.CellMapDataSplit(input_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], classes: Sequence[str], empty_value: int | float = nan, pad: bool | str = False, datasets: Mapping[str, Sequence[CellMapDataset]] | None = None, dataset_dict: Mapping[str, Sequence[Mapping[str, str]]] | None = None, csv_path: str | None = None, spatial_transforms: Mapping[str, Any] | None = None, train_raw_value_transforms: Callable | None = None, val_raw_value_transforms: Callable | None = None, target_value_transforms: Callable | Sequence[Callable] | Mapping[str, Callable] | None = None, class_relation_dict: Mapping[str, Sequence[str]] | None = None, force_has_data: bool = False, context: Context | None = None, device: str | device | None = None)[source]#

Initializes the CellMapDataSplit class.

Parameters:
  • input_arrays (dict[str, dict[str, Sequence[int | float]]]) –

    A dictionary containing the arrays of the dataset to input to the network. The dictionary should have the following structure:

    {
        "array_name": {
            "shape": tuple[int],
            "scale": Sequence[float],
        },
        ...
    }
    

  • target_arrays (dict[str, dict[str, Sequence[int | float]]]) – A dictionary containing the arrays of the dataset to use as targets for the network. The dictionary should have the same structure as input_arrays.

  • classes (Sequence[str]) – A list of classes for segmentation training. Class order will be preserved in the output arrays.

  • empty_value (int | float) – The value to use for empty data. Defaults to torch.nan.

  • pad (bool | str) – Whether to pad the data. If a string, it should be either “train” or “validate”. Defaults to False.

  • datasets (Optional[Mapping[str, Sequence[CellMapDataset]]]) –

    A dictionary containing the dataset objects. Defaults to None. The dictionary should have the following structure:

    {
        "train": Iterable[CellMapDataset],
        "validate": Iterable[CellMapDataset],
    }
    

  • dataset_dict (Optional[Mapping[str, Sequence[Mapping[str, str]]]]) –

    A dictionary containing the dataset data. Defaults to None. The dictionary should have the following structure:

    {
        "train" | "validate": [{
            "raw": str (path to raw data),
            "gt": str (path to ground truth data),
        }],
        ...
    }
    

  • csv_path (Optional[str]) –

    A path to a CSV file containing the dataset data. Defaults to None. Each row in the CSV file should have the following structure:

    train | validate, raw path, gt path

  • spatial_transforms (Optional[Mapping[str, Any]]) –

    A dictionary of the spatial transformations to apply to the data, keyed by transform name. Defaults to None. The dictionary should have the following structure:

    {transform_name: {transform_args}}
    

  • train_raw_value_transforms (Optional[Callable]) – A function to apply to the raw data in training datasets, for example to add Gaussian noise. Defaults to None.

  • val_raw_value_transforms (Optional[Callable]) – A function to apply to the raw data in validation datasets, for example to normalize it. Defaults to None.

  • target_value_transforms (Optional[Callable | Sequence[Callable] | Mapping[str, Callable]]) – A function to convert the ground truth data to target arrays, for example to a signed distance transform. Defaults to None. May be a single function, a list of functions, or a dictionary of functions keyed by class. A list of functions is assumed to follow the order of the classes list.

  • class_relation_dict (Optional[Mapping[str, Sequence[str]]]) –

    A dictionary containing the class relations. The dictionary should have the following structure:

    {
        "class_name": [mutually_exclusive_class_name, ...],
        ...
    }
    

  • force_has_data (bool) – Whether to force the datasets to have data even if no ground truth data is found. Defaults to False. Useful for training with only raw data.

  • context (Optional[tensorstore.Context]) – The TensorStore context for the image data. Defaults to None.

  • device (Optional[str | torch.device]) – Device to use for the dataloaders. Defaults to None.

Note

The csv_path, dataset_dict, and datasets arguments are mutually exclusive: exactly one of them must be supplied.
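
As a minimal sketch of the argument structures described above, the following builds input_arrays, target_arrays, and dataset_dict and checks the rule from the note. The array names, shapes, scales, class names, and paths are placeholder assumptions, and check_exactly_one_source is a hypothetical helper for illustration, not part of cellmap_data.

```python
# Placeholder array specifications; names, shapes, and scales are assumptions.
input_arrays = {
    "raw": {"shape": (64, 64, 64), "scale": (8.0, 8.0, 8.0)},
}
target_arrays = {
    "labels": {"shape": (32, 32, 32), "scale": (8.0, 8.0, 8.0)},
}
classes = ["mito", "er"]  # hypothetical class names

# Placeholder paths; replace with real raw and ground truth locations.
dataset_dict = {
    "train": [{"raw": "/path/to/train/raw", "gt": "/path/to/train/gt"}],
    "validate": [{"raw": "/path/to/val/raw", "gt": "/path/to/val/gt"}],
}


def check_exactly_one_source(datasets=None, dataset_dict=None, csv_path=None):
    """Hypothetical helper: return the name of the single supplied data source."""
    candidates = {
        "datasets": datasets,
        "dataset_dict": dataset_dict,
        "csv_path": csv_path,
    }
    supplied = [name for name, value in candidates.items() if value is not None]
    if len(supplied) != 1:
        raise ValueError(
            f"Expected exactly one of {list(candidates)}, got {supplied or 'none'}"
        )
    return supplied[0]


source = check_exactly_one_source(dataset_dict=dataset_dict)
print(source)

# With cellmap_data installed and real paths in place, the split could then be
# built along these lines:
# from cellmap_data import CellMapDataSplit
# split = CellMapDataSplit(input_arrays, target_arrays, classes, dataset_dict=dataset_dict)
```

The same check would reject a call that passes both dataset_dict and csv_path, or none of the three.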

__init__(input_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], classes: Sequence[str], empty_value: int | float = nan, pad: bool | str = False, datasets: Mapping[str, Sequence[CellMapDataset]] | None = None, dataset_dict: Mapping[str, Sequence[Mapping[str, str]]] | None = None, csv_path: str | None = None, spatial_transforms: Mapping[str, Any] | None = None, train_raw_value_transforms: Callable | None = None, val_raw_value_transforms: Callable | None = None, target_value_transforms: Callable | Sequence[Callable] | Mapping[str, Callable] | None = None, class_relation_dict: Mapping[str, Sequence[str]] | None = None, force_has_data: bool = False, context: Context | None = None, device: str | device | None = None) None[source]#

Initializes the CellMapDataSplit class.

Return type:

None

Methods

__init__(input_arrays, target_arrays, classes)

Initializes the CellMapDataSplit class.

construct(dataset_dict)

Constructs the datasets from the dataset dictionary.

from_csv(csv_path)

Loads the dataset_dict data from a csv file.

set_arrays(arrays[, type, usage])

Sets the input or target arrays for the training or validation datasets.

set_raw_value_transforms([train_transforms, ...])

Sets the raw value transforms for each dataset in the training/validation multi-datasets.

set_spatial_transforms([train_transforms, ...])

Sets the spatial transforms for each dataset in the training/validation multi-datasets.

set_target_value_transforms(transforms)

Sets the target value transforms for each dataset in the multi-datasets.

to(device)

Sets the device for the dataloaders.

verify_datasets()

Verifies that the datasets have data, and removes ones that don't from self.train_datasets and self.validation_datasets.

Attributes

class_counts

A dictionary containing the class counts for the training and validation datasets.

train_datasets_combined

A multi-dataset from the combination of all training datasets.

validation_blocks

A subset of the validation datasets, tiling the validation datasets with non-overlapping blocks.

validation_datasets_combined

A multi-dataset from the combination of all validation datasets.

property train_datasets_combined: CellMapMultiDataset#

A multi-dataset from the combination of all training datasets.

property validation_datasets_combined: CellMapMultiDataset#

A multi-dataset from the combination of all validation datasets.

property validation_blocks: CellMapSubset#

A subset of the validation datasets, tiling the validation datasets with non-overlapping blocks.
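
To illustrate what tiling with non-overlapping blocks means, here is a small sketch that lists the start indices of blocks covering a volume. The helper block_starts and the example shapes are assumptions for illustration. The source does not say how the library treats partial blocks at the volume edges, so this sketch simply drops them.

```python
from itertools import product


def block_starts(volume_shape, block_shape):
    """Return start indices of non-overlapping blocks that fit fully inside the volume."""
    axes = [
        range(0, size - block + 1, block)  # step by the block size so blocks never overlap
        for size, block in zip(volume_shape, block_shape)
    ]
    return list(product(*axes))


starts = block_starts((8, 8, 4), (4, 4, 4))
print(starts)  # [(0, 0, 0), (0, 4, 0), (4, 0, 0), (4, 4, 0)]
```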

property class_counts: dict[str, dict[str, float]]#

A dictionary containing the class counts for the training and validation datasets.

from_csv(csv_path) dict[str, Sequence[dict[str, str]]][source]#

Loads the dataset_dict data from a csv file.

Return type:

dict[str, Sequence[dict[str, str]]]
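
The mapping from CSV rows to the dataset_dict structure can be sketched with the standard library. This is a stand-in that follows the row format documented for csv_path ("train | validate, raw path, gt path"), not the library's own implementation. The helper name and the example paths are assumptions, and every non-blank row is assumed to have exactly three fields.

```python
import csv
import io


def dataset_dict_from_rows(lines):
    """Group 'usage, raw path, gt path' rows into a dataset_dict-style mapping."""
    dataset_dict = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        usage, raw_path, gt_path = (field.strip() for field in row)
        dataset_dict.setdefault(usage, []).append({"raw": raw_path, "gt": gt_path})
    return dataset_dict


# An in-memory file stands in for a CSV on disk; the paths are placeholders.
example_csv = io.StringIO(
    "train,/path/to/a/raw,/path/to/a/gt\n"
    "validate,/path/to/b/raw,/path/to/b/gt\n"
)
parsed = dataset_dict_from_rows(example_csv)
print(parsed)
```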

construct(dataset_dict) None[source]#

Constructs the datasets from the dataset dictionary.

Return type:

None

verify_datasets() None[source]#

Verifies that the datasets have data, and removes ones that don’t from self.train_datasets and self.validation_datasets.

Return type:

None

set_raw_value_transforms(train_transforms: Callable | None = None, val_transforms: Callable | None = None) None[source]#

Sets the raw value transforms for each dataset in the training/validation multi-datasets.

Parameters:
  • train_transforms (Callable | None)

  • val_transforms (Callable | None)

Return type:

None

set_target_value_transforms(transforms: Callable) None[source]#

Sets the target value transforms for each dataset in the multi-datasets.

Parameters:

transforms (Callable)

Return type:

None
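
The constructor's target_value_transforms parameter is documented as accepting a single function, a list of functions in class order, or a dictionary keyed by class. The sketch below shows one way those three forms could be reduced to one function per class. per_class_transforms, the class names, and the toy transforms are hypothetical, and the library may resolve them differently.

```python
from collections.abc import Mapping, Sequence


def per_class_transforms(transforms, classes):
    """Hypothetical helper: map each class name to its target value transform."""
    if transforms is None:
        return {name: None for name in classes}
    if callable(transforms):
        # A single function is shared by every class.
        return {name: transforms for name in classes}
    if isinstance(transforms, Mapping):
        missing = [name for name in classes if name not in transforms]
        if missing:
            raise KeyError(f"No transform given for classes: {missing}")
        return {name: transforms[name] for name in classes}
    if isinstance(transforms, Sequence):
        # A list is matched to the classes list by position.
        if len(transforms) != len(classes):
            raise ValueError("Expected one transform per class")
        return dict(zip(classes, transforms))
    raise TypeError(f"Unsupported transforms type: {type(transforms).__name__}")


def identity(value):
    return value


def negate(value):
    return -value


classes = ["mito", "er"]  # hypothetical class names
by_list = per_class_transforms([identity, negate], classes)
print(by_list["er"](2.0))  # -2.0
```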

set_spatial_transforms(train_transforms: dict[str, Any] | None = None, val_transforms: dict[str, Any] | None = None) None[source]#

Sets the spatial transforms for each dataset in the training/validation multi-datasets.

Parameters:
  • train_transforms (dict[str, Any] | None)

  • val_transforms (dict[str, Any] | None)

Return type:

None

set_arrays(arrays: Mapping[str, Mapping[str, Sequence[int | float]]], type: str = 'target', usage: str = 'validate') None[source]#

Sets the input or target arrays for the training or validation datasets.

Parameters:
  • arrays (Mapping[str, Mapping[str, Sequence[int | float]]])

  • type (str)

  • usage (str)

Return type:

None

to(device: str | device) None[source]#

Sets the device for the dataloaders.

Parameters:

device (str | device)

Return type:

None