cellmap_data.CellMapDataSplit
- class cellmap_data.CellMapDataSplit(input_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], classes: Sequence[str], empty_value: int | float = nan, pad: bool | str = False, datasets: Mapping[str, Sequence[CellMapDataset]] | None = None, dataset_dict: Mapping[str, Sequence[Mapping[str, str]]] | None = None, csv_path: str | None = None, spatial_transforms: Mapping[str, Any] | None = None, train_raw_value_transforms: Callable | None = None, val_raw_value_transforms: Callable | None = None, target_value_transforms: Callable | Sequence[Callable] | Mapping[str, Callable] | None = None, class_relation_dict: Mapping[str, Sequence[str]] | None = None, force_has_data: bool = False, context: Context | None = None, device: str | device | None = None)
Initializes the CellMapDataSplit class.
- Parameters:
input_arrays (dict[str, dict[str, Sequence[int | float]]]) –
A dictionary containing the arrays of the dataset to input to the network. The dictionary should have the following structure:
{ "array_name": { "shape": tuple[int], "scale": Sequence[float], }, ... }
target_arrays (dict[str, dict[str, Sequence[int | float]]]) – A dictionary containing the arrays of the dataset to use as targets for the network. The dictionary should have the same structure as input_arrays.
classes (Sequence[str]) – A list of classes for segmentation training. Class order will be preserved in the output arrays.
empty_value (int | float) – The value to use for empty data. Defaults to torch.nan.
pad (bool | str) – Whether to pad the data. If a string, it should be either “train” or “validate”. Defaults to False.
datasets (Optional[Mapping[str, Sequence[CellMapDataset]]]) –
A dictionary containing the dataset objects. Defaults to None. The dictionary should have the following structure:
{ "train": Iterable[CellMapDataset], "validate": Iterable[CellMapDataset], }.
dataset_dict (Optional[Mapping[str, Sequence[Mapping[str, str]]]) –
A dictionary containing the dataset data. Defaults to None. The dictionary should have the following structure:
{ "train" | "validate": [{ "raw": str (path to raw data), "gt": str (path to ground truth data), }], ... }
csv_path (Optional[str]) –
A path to a csv file containing the dataset data. Defaults to None. Each row in the csv file should have the following structure:
train | validate, raw path, gt path
spatial_transforms (Optional[Mapping[str, Any]]) –
A dictionary containing the spatial transformations to apply to the data. Defaults to None. The dictionary should have the following structure:
{transform_name: {transform_args}}
train_raw_value_transforms (Optional[Callable]) – A function to apply to the raw data in training datasets, for example adding Gaussian noise. Defaults to None.
val_raw_value_transforms (Optional[Callable]) – A function to apply to the raw data in validation datasets, for example normalizing the raw data. Defaults to None.
target_value_transforms (Optional[Callable | Sequence[Callable] | Mapping[str, Callable]]) – A function to convert the ground truth data to target arrays, for example a signed distance transform. May be a single function, a list of functions, or a dictionary mapping each class to a function. A list is assumed to follow the order of the classes list. Defaults to None.
class_relation_dict (Optional[Mapping[str, Sequence[str]]]) –
A dictionary containing the class relations. The dictionary should have the following structure:
{ "class_name": [mutually_exclusive_class_name, ...], ... }
force_has_data (bool) – Whether to force the datasets to have data even if no ground truth data is found. Defaults to False. Useful for training with only raw data.
context (Optional[tensorstore.Context]) – The TensorStore context for the image data. Defaults to None.
device (Optional[str | torch.device]) – Device to use for the dataloaders. Defaults to None.
Note
The csv_path, dataset_dict, and datasets arguments are mutually exclusive, but one must be supplied.
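As a minimal sketch of the argument structures described above, the following builds the dictionaries in plain Python. Every concrete value (array names, shapes, scales, class names, paths) is an illustrative assumption, and count_sources is a hypothetical helper that only mirrors the mutual-exclusivity rule in the note; it is not part of cellmap_data.

```python
# All concrete values below (array names, shapes, scales, class names, paths)
# are illustrative assumptions, not values taken from cellmap_data.
input_arrays = {
    "input": {"shape": (1, 128, 128), "scale": (8.0, 8.0, 8.0)},
}
target_arrays = {
    "output": {"shape": (1, 128, 128), "scale": (8.0, 8.0, 8.0)},
}
classes = ["mito", "er"]

# One entry per dataset, grouped by usage ("train" or "validate").
dataset_dict = {
    "train": [{"raw": "/path/to/train_raw", "gt": "/path/to/train_gt"}],
    "validate": [{"raw": "/path/to/val_raw", "gt": "/path/to/val_gt"}],
}

# Each class maps to the classes it is mutually exclusive with.
class_relation_dict = {"mito": ["er"], "er": ["mito"]}


def count_sources(datasets=None, dataset_dict=None, csv_path=None):
    """Hypothetical helper: count how many dataset sources were supplied."""
    return sum(source is not None for source in (datasets, dataset_dict, csv_path))


kwargs = dict(
    input_arrays=input_arrays,
    target_arrays=target_arrays,
    classes=classes,
    dataset_dict=dataset_dict,
    class_relation_dict=class_relation_dict,
)

# Exactly one of datasets, dataset_dict and csv_path should be given.
assert count_sources(dataset_dict=kwargs.get("dataset_dict")) == 1

# With cellmap_data installed, the constructor call would then be:
# from cellmap_data import CellMapDataSplit
# datasplit = CellMapDataSplit(**kwargs)
```

The constructor call is left commented out so the sketch runs without the package or any data on disk.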
- __init__(input_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], classes: Sequence[str], empty_value: int | float = nan, pad: bool | str = False, datasets: Mapping[str, Sequence[CellMapDataset]] | None = None, dataset_dict: Mapping[str, Sequence[Mapping[str, str]]] | None = None, csv_path: str | None = None, spatial_transforms: Mapping[str, Any] | None = None, train_raw_value_transforms: Callable | None = None, val_raw_value_transforms: Callable | None = None, target_value_transforms: Callable | Sequence[Callable] | Mapping[str, Callable] | None = None, class_relation_dict: Mapping[str, Sequence[str]] | None = None, force_has_data: bool = False, context: Context | None = None, device: str | device | None = None) → None
Initializes the CellMapDataSplit class.
- Return type:
None
Methods
- __init__(input_arrays, target_arrays, classes): Initializes the CellMapDataSplit class.
- construct(dataset_dict): Constructs the datasets from the dataset dictionary.
- from_csv(csv_path): Loads the dataset_dict data from a csv file.
- set_arrays(arrays[, type, usage]): Sets the input or target arrays for the training or validation datasets.
- set_raw_value_transforms([train_transforms, ...]): Sets the raw value transforms for each dataset in the training/validation multi-datasets.
- set_spatial_transforms([train_transforms, ...]): Sets the spatial transforms for each dataset in the training/validation multi-datasets.
- set_target_value_transforms(transforms): Sets the target value transforms for each dataset in the multi-datasets.
- to(device): Sets the device for the dataloaders.
- verify_datasets(): Verifies that the datasets have data, and removes ones that don't from self.train_datasets and self.validation_datasets.
Attributes
- class_counts: A dictionary containing the class counts for the training and validation datasets.
- train_datasets_combined: A multi-dataset from the combination of all training datasets.
- validation_blocks: A subset of the validation datasets, tiling the validation datasets with non-overlapping blocks.
- validation_datasets_combined: A multi-dataset from the combination of all validation datasets.
- property train_datasets_combined: CellMapMultiDataset
A multi-dataset from the combination of all training datasets.
- property validation_datasets_combined: CellMapMultiDataset
A multi-dataset from the combination of all validation datasets.
- property validation_blocks: CellMapSubset
A subset of the validation datasets, tiling the validation datasets with non-overlapping blocks.
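To illustrate what tiling with non-overlapping blocks means, here is a small sketch that enumerates block start indices for a volume. It illustrates the general idea only, under the assumption that partial blocks at the edges are dropped; how cellmap_data actually selects or pads blocks is not stated here, and tile_starts is a hypothetical name.

```python
from itertools import product


def tile_starts(volume_shape, block_shape):
    """Return the start index of every full, non-overlapping block.

    Assumption: blocks that would run past the volume edge are skipped.
    """
    ranges = [
        range(0, size - block + 1, block)
        for size, block in zip(volume_shape, block_shape)
    ]
    return list(product(*ranges))


# A 4 x 256 x 256 volume tiled with 1 x 128 x 128 blocks.
starts = tile_starts((4, 256, 256), (1, 128, 128))
print(len(starts))  # 4 * 2 * 2 = 16 blocks
```

Because each axis advances by exactly one block length, no two blocks share a voxel, so every validation sample is evaluated once.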
- property class_counts: dict[str, dict[str, float]]
A dictionary containing the class counts for the training and validation datasets.
- from_csv(csv_path) → dict[str, Sequence[dict[str, str]]]
Loads the dataset_dict data from a csv file.
- Return type:
dict[str, Sequence[dict[str, str]]]
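The mapping from csv rows to the dataset_dict structure can be sketched with the standard library. This is an assumption about the general shape of the conversion, based only on the row format and return type documented here, and not the library's implementation; read_split_csv and the example paths are hypothetical.

```python
import csv
import os
import tempfile


def read_split_csv(csv_path):
    """Hypothetical helper: parse rows of `usage, raw path, gt path`.

    Assumes every non-empty row has at least three columns.
    """
    dataset_dict = {}
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            usage, raw_path, gt_path = (field.strip() for field in row[:3])
            dataset_dict.setdefault(usage, []).append(
                {"raw": raw_path, "gt": gt_path}
            )
    return dataset_dict


# Write a small example file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datasplit.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(
            [
                ["train", "/data/a/raw", "/data/a/gt"],
                ["train", "/data/b/raw", "/data/b/gt"],
                ["validate", "/data/c/raw", "/data/c/gt"],
            ]
        )
    example = read_split_csv(path)

print(sorted(example))  # ['train', 'validate']
```

The result has the same shape as the dataset_dict argument of the constructor: one list of raw/gt path pairs per usage.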
- construct(dataset_dict) → None
Constructs the datasets from the dataset dictionary.
- Return type:
None
- verify_datasets() → None
Verifies that the datasets have data, and removes ones that don’t from self.train_datasets and self.validation_datasets.
- Return type:
None
- set_raw_value_transforms(train_transforms: Callable | None = None, val_transforms: Callable | None = None) → None
Sets the raw value transforms for each dataset in the training/validation multi-datasets.
- Parameters:
train_transforms (Callable | None)
val_transforms (Callable | None)
- Return type:
None
- set_target_value_transforms(transforms: Callable) → None
Sets the target value transforms for each dataset in the multi-datasets.
- Parameters:
transforms (Callable)
- Return type:
None
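The constructor's target_value_transforms parameter is documented as accepting a single function, a list ordered like classes, or a per-class dictionary, while this method's signature takes a single Callable. The sketch below shows one way those three forms could be resolved to a per-class mapping. It is an assumption for illustration, not the library's code, and per_class_transforms is a hypothetical helper.

```python
from collections.abc import Mapping


def per_class_transforms(transforms, classes):
    """Hypothetical helper: resolve transforms to a {class: callable} dict."""
    if callable(transforms):
        # A single function is shared by every class.
        return {name: transforms for name in classes}
    if isinstance(transforms, Mapping):
        missing = [name for name in classes if name not in transforms]
        if missing:
            raise KeyError(f"no transform given for classes: {missing}")
        return {name: transforms[name] for name in classes}
    # Otherwise treat it as a sequence ordered like `classes`.
    transforms = list(transforms)
    if len(transforms) != len(classes):
        raise ValueError("need exactly one transform per class")
    return dict(zip(classes, transforms))


def identity(x):
    return x


def double(x):
    return 2 * x


resolved = per_class_transforms([identity, double], ["mito", "er"])
print(resolved["er"](3))  # 6
```

The class names and the two toy transforms are placeholders; a real target transform would typically operate on array data.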
- set_spatial_transforms(train_transforms: dict[str, Any] | None = None, val_transforms: dict[str, Any] | None = None) → None
Sets the spatial transforms for each dataset in the training/validation multi-datasets.
- Parameters:
train_transforms (dict[str, Any] | None)
val_transforms (dict[str, Any] | None)
- Return type:
None
- set_arrays(arrays: Mapping[str, Mapping[str, Sequence[int | float]]], type: str = 'target', usage: str = 'validate') → None
Sets the input or target arrays for the training or validation datasets.
- Parameters:
arrays (Mapping[str, Mapping[str, Sequence[int | float]]])
type (str)
usage (str)
- Return type:
None