cellmap_data.datasplit

Classes

CellMapDataSplit(input_arrays, ...)

Initializes the CellMapDataSplit class.

class cellmap_data.datasplit.CellMapDataSplit(input_arrays: ~typing.Mapping[str, ~typing.Mapping[str, ~typing.Sequence[int | float]]], target_arrays: ~typing.Mapping[str, ~typing.Mapping[str, ~typing.Sequence[int | float]]] | None = None, classes: ~typing.Sequence[str] | None = None, empty_value: int | float = nan, pad: bool | str = False, datasets: ~typing.Mapping[str, ~typing.Sequence[~cellmap_data.dataset.CellMapDataset]] | None = None, dataset_dict: ~typing.Mapping[str, ~typing.Sequence[~typing.Mapping[str, str]]] | None = None, csv_path: str | None = None, spatial_transforms: ~typing.Mapping[str, ~typing.Any] | None = None, train_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( Normalize ToDtype(scale=True) NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), val_raw_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( Normalize ToDtype(scale=True) NaNtoNum(params={'nan': 0, 'posinf': None, 'neginf': None}) ), target_value_transforms: ~torchvision.transforms.v2._transform.Transform | None = Compose( ToDtype(scale=False) Binarize(threshold=0) ), class_relation_dict: ~typing.Mapping[str, ~typing.Sequence[str]] | None = None, force_has_data: bool = False, context: ~tensorstore.Context | None = None, device: str | ~torch.device | None = None)

Initializes the CellMapDataSplit class.

Parameters:

input_arrays (dict[str, dict[str, Sequence[int | float]]]) –
A dictionary containing the arrays of the dataset to input to the network. The dictionary should have the following structure:
```
{
    "array_name": {
        "shape": tuple[int],
        "scale": Sequence[float],
    },
    ...
}
```
target_arrays (dict[str, dict[str, Sequence[int | float]]]) – A dictionary containing the arrays of the dataset to use as targets for the network. The dictionary should have the same structure as input_arrays.
classes (Sequence[str]) – A list of classes for segmentation training. Class order will be preserved in the output arrays.
empty_value (int | float) – The value to use for empty data. Defaults to torch.nan.
pad (bool | str) – Whether to pad the data. If a string, it should be either “train” or “validate”. Defaults to False.
datasets (Optional[Mapping[str, Sequence[CellMapDataset]]]) –
A dictionary containing the dataset objects. Defaults to None. The dictionary should have the following structure:
```
{
    "train": Iterable[CellMapDataset],
    "validate": Iterable[CellMapDataset],
}
```

dataset_dict (Optional[Mapping[str, Sequence[Mapping[str, str]]]]) –
A dictionary containing the dataset data. Defaults to None. The dictionary should have the following structure:
```
{
    "train" | "validate": [{
        "raw": str (path to raw data),
        "gt": str (path to ground truth data),
    }],
    ...
}
```

csv_path (Optional[str]) –
A path to a CSV file containing the dataset data. Defaults to None. Each row in the CSV file should have the following structure:
```
train | validate, raw path, gt path
```
spatial_transforms (Optional[Mapping[str, Any]]) –
A dictionary mapping spatial transformation names to their arguments. Defaults to None. The dictionary should have the following structure:
```
{transform_name: {transform_args}}
```
train_raw_value_transforms (Optional[Callable]) – A function to apply to the raw data in training datasets. Defaults to a Compose of Normalize, ToDtype(scale=True), and NaNtoNum mapping NaN to 0. A typical custom transform adds Gaussian noise to the raw data.
val_raw_value_transforms (Optional[Callable]) – A function to apply to the raw data in validation datasets. Defaults to the same Compose of Normalize, ToDtype(scale=True), and NaNtoNum mapping NaN to 0. A typical use is normalizing the raw data.
target_value_transforms (Optional[Callable | Sequence[Callable] | Mapping[str, Callable]]) – A function to convert the ground truth data to target arrays. Defaults to a Compose of ToDtype(scale=False) and Binarize(threshold=0). A custom transform might, for example, convert the ground truth data to a signed distance transform. May be a single function, a list of functions, or a dictionary mapping each class name to a function. A list is assumed to correspond, in order, to the classes in the classes list.
class_relation_dict (Optional[Mapping[str, Sequence[str]]]) –
A dictionary mapping each class to the classes that are mutually exclusive with it. Defaults to None. The dictionary should have the following structure:
```
{
    "class_name": [mutually_exclusive_class_name, ...],
    ...
}
```
force_has_data (bool) – Whether to force the datasets to have data even if no ground truth data is found. Defaults to False. Useful for training with only raw data.
context (Optional[tensorstore.Context]) – The TensorStore context for the image data. Defaults to None.
device (Optional[str | torch.device]) – Device to use for the dataloaders. Defaults to None.
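The three accepted forms of target_value_transforms (one function for every class, a list matched to classes by position, or a per-class dictionary) can be illustrated with a small resolver. This is a sketch of the documented rules only; the helper resolve_target_transforms and the TransformSpec alias are hypothetical and are not part of cellmap_data, which may resolve these forms differently internally.

```python
from collections.abc import Callable, Mapping, Sequence
from typing import Union

# Hypothetical alias covering the three documented forms.
TransformSpec = Union[Callable, Sequence[Callable], Mapping[str, Callable]]


def resolve_target_transforms(
    spec: TransformSpec, classes: Sequence[str]
) -> dict[str, Callable]:
    """Map each class name to its transform (illustrative helper, not library code)."""
    if isinstance(spec, Mapping):
        # Dictionary form: look up each class by name (KeyError if one is missing).
        return {name: spec[name] for name in classes}
    if callable(spec):
        # Single-function form: the same transform is shared by every class.
        return {name: spec for name in classes}
    # List form: functions are matched to classes by position.
    transforms = list(spec)
    if len(transforms) != len(classes):
        raise ValueError(
            f"expected {len(classes)} transforms, got {len(transforms)}"
        )
    return dict(zip(classes, transforms))


classes = ["mito", "er"]
per_class = resolve_target_transforms([abs, round], classes)
print(per_class["er"](2.6))  # 3
```

Checking for a mapping before checking callable() matters here, because a composed transform such as a torchvision Compose is itself callable and should be treated as the single-function form.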

Note

The csv_path, dataset_dict, and datasets arguments are mutually exclusive, but one must be supplied.
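To make the documented structures concrete, the sketch below builds example input_arrays, target_arrays, classes, class_relation_dict, and dataset_dict values, and checks the rule from the note that exactly one data source is supplied. All array names, shapes, scales, class names, and paths are placeholders, and check_data_source is a hypothetical helper for illustration, not a cellmap_data function.

```python
from typing import Optional


def check_data_source(
    csv_path: Optional[str] = None,
    dataset_dict: Optional[dict] = None,
    datasets: Optional[dict] = None,
) -> str:
    """Return the name of the single data source supplied (hypothetical helper)."""
    supplied = [
        name
        for name, value in (
            ("csv_path", csv_path),
            ("dataset_dict", dataset_dict),
            ("datasets", datasets),
        )
        if value is not None
    ]
    if len(supplied) != 1:
        raise ValueError(
            "supply exactly one of csv_path, dataset_dict, or datasets; "
            f"got {supplied or 'none'}"
        )
    return supplied[0]


# Placeholder configuration following the documented dictionary structures.
input_arrays = {"raw": {"shape": (64, 64, 64), "scale": (8.0, 8.0, 8.0)}}
target_arrays = {"gt": {"shape": (64, 64, 64), "scale": (8.0, 8.0, 8.0)}}
classes = ["mito", "er"]
# Each class maps to the classes it is mutually exclusive with.
class_relation_dict = {"mito": ["er"], "er": ["mito"]}
dataset_dict = {
    "train": [{"raw": "/data/crop1.zarr/raw", "gt": "/data/crop1.zarr/labels"}],
    "validate": [{"raw": "/data/crop2.zarr/raw", "gt": "/data/crop2.zarr/labels"}],
}

print(check_data_source(dataset_dict=dataset_dict))  # dataset_dict
```

Assuming these placeholder values match what a real dataset provides, they would be passed by keyword, for example CellMapDataSplit(input_arrays=input_arrays, target_arrays=target_arrays, classes=classes, class_relation_dict=class_relation_dict, dataset_dict=dataset_dict).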

property train_datasets_combined: CellMapMultiDataset

A multi-dataset combining all training datasets.

property validation_datasets_combined: CellMapMultiDataset

A multi-dataset combining all validation datasets.

property validation_blocks: CellMapSubset

A subset of the validation datasets that tiles them with non-overlapping blocks.

property class_counts: dict[str, dict[str, float]]

A dictionary containing the class counts for the training and validation datasets.

from_csv(csv_path) → dict[str, Sequence[dict[str, str]]]

Loads the dataset_dict data from a csv file.

Return type: dict[str, Sequence[dict[str, str]]]
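The CSV layout documented for csv_path (one `train | validate, raw path, gt path` row per dataset) maps directly onto the dataset_dict structure. The following is a sketch of that conversion using the standard csv module, not the library's own from_csv code; the helper name dataset_dict_from_rows is hypothetical, and it assumes the file has no header row and that whitespace around each field should be stripped.

```python
import csv
import io
from collections.abc import Iterable


def dataset_dict_from_rows(lines: Iterable[str]) -> dict[str, list[dict[str, str]]]:
    """Parse 'usage, raw path, gt path' rows into a dataset_dict (sketch)."""
    dataset_dict: dict[str, list[dict[str, str]]] = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        usage, raw_path, gt_path = (field.strip() for field in row)
        if usage not in ("train", "validate"):
            raise ValueError(f"unexpected usage {usage!r}")
        dataset_dict.setdefault(usage, []).append({"raw": raw_path, "gt": gt_path})
    return dataset_dict


example = io.StringIO(
    "train, /data/crop1.zarr/raw, /data/crop1.zarr/labels\n"
    "validate, /data/crop2.zarr/raw, /data/crop2.zarr/labels\n"
)
print(dataset_dict_from_rows(example))
```

With a real file, the equivalent step would be opening it with open(csv_path, newline="") and passing the file object to a parser like this one.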

construct(dataset_dict) → None

Constructs the datasets from the dataset dictionary.

Return type: None

verify_datasets() → None

Verifies that the datasets have data, and removes ones that don’t from self.train_datasets and self.validation_datasets.

Return type: None
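The filtering that verify_datasets() describes amounts to dropping every dataset that reports no data. A minimal sketch, assuming each dataset exposes a boolean has_data attribute; that attribute, the StubDataset class, and the keep_datasets_with_data helper are hypothetical stand-ins for illustration, and the real check inside cellmap_data may differ.

```python
from dataclasses import dataclass


@dataclass
class StubDataset:
    """Hypothetical stand-in for CellMapDataset; only has_data is modeled."""

    name: str
    has_data: bool


def keep_datasets_with_data(datasets: list[StubDataset]) -> list[StubDataset]:
    """Return only the datasets that report having data."""
    return [ds for ds in datasets if ds.has_data]


train_datasets = keep_datasets_with_data(
    [StubDataset("crop1", True), StubDataset("crop3", False)]
)
validation_datasets = keep_datasets_with_data([StubDataset("crop2", True)])
print([ds.name for ds in train_datasets])  # ['crop1']
```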

set_raw_value_transforms(train_transforms: Callable | None = None, val_transforms: Callable | None = None) → None

Sets the raw value transforms for each dataset in the training/validation multi-datasets.

Parameters:

train_transforms (Callable | None)
val_transforms (Callable | None)

Return type:

None

set_target_value_transforms(transforms: Callable) → None

Sets the target value transforms for each dataset in the multi-datasets.

Parameters: transforms (Callable)
Return type: None

set_spatial_transforms(train_transforms: dict[str, Any] | None = None, val_transforms: dict[str, Any] | None = None) → None

Sets the spatial transforms for each dataset in the training/validation multi-datasets.

Parameters:

train_transforms (dict[str, Any] | None)
val_transforms (dict[str, Any] | None)

Return type:

None

set_arrays(arrays: Mapping[str, Mapping[str, Sequence[int | float]]], type: str = 'target', usage: str = 'validate') → None

Sets the input or target arrays for the training or validation datasets.

Parameters:

arrays (Mapping[str, Mapping[str, Sequence[int | float]]])
type (str)
usage (str)

Return type:

None

to(device: str | device, non_blocking: bool = True) → None

Sets the device for the dataloaders.

Parameters:

device (str | device)
non_blocking (bool)

Return type:

None
