cellmap_data.dataset_writer

Functions

split_target_path(path)

Splits a path to groundtruth data into the main path string, and the classes supplied for it.

Classes

CellMapDatasetWriter(raw_path, target_path, ...)

Initializes the CellMapDatasetWriter.

cellmap_data.dataset_writer.split_target_path(path: str) → tuple[str, list[str]][source]#

Splits a path to groundtruth data into the main path string, and the classes supplied for it.

Parameters:: path (str)
Return type:: tuple[str, list[str]]
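The reference above gives only the signature, so the exact path syntax is not stated here. The following is a minimal sketch of the idea, assuming (hypothetically) that classes are listed in square brackets inside the path and that the class position is replaced by a "{label}" placeholder. The function name split_bracketed_path is illustrative and is not part of cellmap_data.

```python
import os


def split_bracketed_path(path: str) -> tuple[str, list[str]]:
    """Hypothetical stand-in: split 'prefix[a,b]suffix' into a path template and classes.

    Assumption: the bracket syntax and the '{label}' placeholder are illustrative
    conventions, not confirmed behaviour of cellmap_data.split_target_path.
    """
    if "[" in path and "]" in path:
        prefix, rest = path.split("[", 1)
        class_part, suffix = rest.split("]", 1)
        # Drop empty entries so "[a,,b]" or "[a, b]" still yields clean names.
        classes = [name.strip() for name in class_part.split(",") if name.strip()]
        return prefix + "{label}" + suffix, classes
    # No bracketed list: treat the last path component as the only class.
    return path, [os.path.basename(path.rstrip("/"))]


path_string, classes = split_bracketed_path("/data/gt.zarr/[mito,er]")
print(path_string, classes)
```

Returning a template plus a class list lets a caller fill in one concrete path per class, for example with path_string.format(label=name).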

class cellmap_data.dataset_writer.CellMapDatasetWriter(raw_path: str, target_path: str, classes: Sequence[str], input_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_bounds: Mapping[str, Mapping[str, list[float]]], raw_value_transforms: Callable | None = None, axis_order: str = 'zyx', context: Context | None = None, rng: Generator | None = None, empty_value: float | int = 0, overwrite: bool = False, device: str | device | None = None)[source]#

Initializes the CellMapDatasetWriter.

Parameters:

raw_path (str) – The full path to the raw data zarr, excluding the multiscale level.
target_path (str) – The full path to the ground truth data zarr, excluding the multiscale level and the class name.
classes (Sequence[str]) – The classes in the dataset.

input_arrays (Mapping[str, Mapping[str, Sequence[int | float]]]) –

The input arrays to return for processing. The dictionary should have the following structure:

{
    "array_name": {
        "shape": tuple[int],
        "scale": Sequence[float],

        and optionally:
        "scale_level": int,
    },
    ...
}

where 'shape' is the shape of the array in voxels, and 'scale' is the scale of the array in world units. The 'scale_level' is the multiscale level to use for the array, otherwise set to 0 if not supplied.
target_arrays (Mapping[str, Mapping[str, Sequence[int | float]]]) – The target arrays to write to disk, with format matching that for input_arrays.
target_bounds (Mapping[str, Mapping[str, list[float]]]) – The bounding boxes for each target array, in world units. Example: {"array_1": {"x": [12.0, 102.0], "y": [12.0, 102.0], "z": [12.0, 102.0]}}.
raw_value_transforms (Optional[Callable]) – The value transforms to apply to the raw data.
axis_order (str) – The order of the axes in the data.
context (Optional[tensorstore.Context]) – The context to use for the tensorstore.
rng (Optional[torch.Generator]) – The random number generator to use.
empty_value (float | int) – The value to use for empty data in an array.
overwrite (bool) – Whether to overwrite existing data.
device (Optional[str | torch.device]) – The device to use for the dataset. If None, will default to “cuda” if available, then “mps”, otherwise “cpu”.
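To make the parameter descriptions above concrete, here is a small sketch of how the three mapping arguments might be assembled. The array names ("raw", "labels"), all numeric values, and the helper with_default_scale_level are hypothetical. Keying target_bounds by the same names as target_arrays is also an assumption, based on the target_bounds example in the parameter list.

```python
# Hypothetical array names and values, chosen only for illustration.
input_arrays = {
    "raw": {"shape": (64, 64, 64), "scale": (8.0, 8.0, 8.0)},
}
target_arrays = {
    "labels": {"shape": (32, 32, 32), "scale": (8.0, 8.0, 8.0), "scale_level": 1},
}
# Assumption: bounds are keyed by target array name, then by axis, in world units.
target_bounds = {
    "labels": {"x": [12.0, 102.0], "y": [12.0, 102.0], "z": [12.0, 102.0]},
}


def with_default_scale_level(arrays: dict) -> dict:
    """Return a copy where every array spec has 'scale_level' (0 if not supplied)."""
    return {name: {"scale_level": 0, **info} for name, info in arrays.items()}


normalized_inputs = with_default_scale_level(input_arrays)
normalized_targets = with_default_scale_level(target_arrays)
print(normalized_inputs["raw"]["scale_level"], normalized_targets["labels"]["scale_level"])
```

These mappings would then be passed to CellMapDatasetWriter as the input_arrays, target_arrays, and target_bounds keyword arguments, alongside raw_path, target_path, and classes.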

property center: Mapping[str, float] | None#: Returns the center of the dataset in world units.

property smallest_voxel_sizes: Mapping[str, float]#: Returns the smallest voxel size of the dataset.

property smallest_target_array: Mapping[str, float]#: Returns the smallest target array in world units.

property bounding_box: Mapping[str, list[float]]#: Returns the bounding box inclusive of all the target images.
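A bounding box "inclusive of all the target images" can be read as the per-axis union of the individual target bounds. The sketch below illustrates that reading with plain dictionaries. The helper name union_bounding_box is hypothetical, and the library may compute the property differently.

```python
def union_bounding_box(boxes: list[dict[str, list[float]]]) -> dict[str, list[float]]:
    """Per-axis union of {axis: [start, stop]} boxes given in world units."""
    union: dict[str, list[float]] = {}
    for box in boxes:
        for axis, (start, stop) in box.items():
            if axis not in union:
                union[axis] = [start, stop]
            else:
                union[axis][0] = min(union[axis][0], start)
                union[axis][1] = max(union[axis][1], stop)
    return union


# Hypothetical bounds for two target arrays.
box_a = {"x": [12.0, 102.0], "y": [12.0, 102.0], "z": [12.0, 102.0]}
box_b = {"x": [0.0, 50.0], "y": [20.0, 120.0], "z": [12.0, 60.0]}
print(union_bounding_box([box_a, box_b]))
```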

property bounding_box_shape: Mapping[str, int]#: Returns the shape of the bounding box of the dataset in voxels of the smallest voxel size requested.

property sampling_box: Mapping[str, list[float]]#: Returns the sampling box of the dataset (i.e., the region from which centers should be drawn so that every sample lies fully within the bounding box).

property sampling_box_shape: dict[str, int]#: Returns the shape of the sampling box of the dataset in voxels of the smallest voxel size requested.
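One way to picture the sampling box: shrink the bounding box on each side by half the world-space extent of the requested array, so that any array centered inside the result stays within the bounds. The sketch below follows that assumption. The helper names and the ceil-based conversion to voxels are illustrative, not taken from the library.

```python
import math


def shrink_to_sampling_box(bounds, shape, scale, axis_order="zyx"):
    """Shrink {axis: [start, stop]} by half the array extent (shape * scale / 2) per axis."""
    box = {}
    for axis, n_voxels, voxel_size in zip(axis_order, shape, scale):
        half_extent = n_voxels * voxel_size / 2
        start, stop = bounds[axis]
        box[axis] = [start + half_extent, stop - half_extent]
    return box


def box_shape_in_voxels(box, voxel_sizes):
    """Number of voxel-sized steps that fit along each axis of the box (never negative)."""
    return {
        axis: max(0, math.ceil((stop - start) / voxel_sizes[axis]))
        for axis, (start, stop) in box.items()
    }


# Hypothetical values: a 100-unit cube, a 4-voxel array at 5 units per voxel.
bounds = {"z": [0.0, 100.0], "y": [0.0, 100.0], "x": [0.0, 100.0]}
box = shrink_to_sampling_box(bounds, shape=(4, 4, 4), scale=(5.0, 5.0, 5.0))
shape = box_shape_in_voxels(box, {"z": 5.0, "y": 5.0, "x": 5.0})
print(box, shape)
```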

property size: int#: Returns the size of the dataset in voxels of the smallest voxel size requested.

property writer_indices: Sequence[int]#: Returns the indices of the dataset that will produce non-overlapping tiles for use in the writer, based on the smallest requested target array.

property blocks: Subset#: A subset of the validation datasets, tiling the validation datasets with non-overlapping blocks.

loader(batch_size: int = 1, num_workers: int = 0, **kwargs)[source]#

Returns a DataLoader for the dataset.

Parameters:

batch_size (int)
num_workers (int)

collate_fn(batch: list[dict]) → dict[str, Tensor][source]#

Combine a list of dictionaries from different sources into a single dictionary for output.

Parameters:: batch (list[dict])
Return type:: dict[str, Tensor]
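The collation step can be pictured as grouping values by key across the samples in a batch. The sketch below does this with plain Python values so that it runs without PyTorch. In a PyTorch setting each resulting list would typically then be stacked into a single tensor (for example with torch.stack), but how the library's collate_fn does this is not shown in this reference. The function name group_by_key is hypothetical.

```python
def group_by_key(batch: list[dict]) -> dict[str, list]:
    """Merge a list of per-sample dictionaries into one dictionary of lists."""
    merged: dict[str, list] = {}
    for sample in batch:
        for key, value in sample.items():
            merged.setdefault(key, []).append(value)
    return merged


# Hypothetical samples; real samples would hold tensors rather than numbers.
batch = [{"raw": 1.0, "idx": 0}, {"raw": 2.0, "idx": 1}]
print(group_by_key(batch))
```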

property device: device#: Returns the device for the dataset.

__len__() → int[source]#

Returns the length of the dataset, determined by the number of coordinates that could be sampled as the center for an array request.

Return type:: int

get_center(idx: int) → dict[str, float][source]#

Parameters:: idx (int)
Return type:: dict[str, float]
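Given that the dataset length is the number of coordinates that could serve as a center, a linear index can be mapped to a center by unravelling it over the sampling box shape and converting voxel steps to world units. The sketch below assumes row-major (C-order) unravelling in axis_order and an origin at the start of the sampling box. The helper names and all values are hypothetical.

```python
import math


def unravel(idx: int, dims: list[int]) -> tuple[int, ...]:
    """Row-major (C-order) conversion of a linear index to per-axis coordinates."""
    coords = []
    for dim in reversed(dims):
        idx, remainder = divmod(idx, dim)
        coords.append(remainder)
    return tuple(reversed(coords))


def center_from_index(idx, box_shape, voxel_sizes, box, axis_order="zyx"):
    """Map a linear index to a center {axis: world coordinate} inside the box."""
    dims = [box_shape[axis] for axis in axis_order]
    total = math.prod(dims)
    if idx < 0:
        idx += total  # allow negative indices, as with Python sequences
    if not 0 <= idx < total:
        raise IndexError(f"index out of range for {total} positions")
    coords = unravel(idx, dims)
    return {
        axis: float(coords[i] * voxel_sizes[axis] + box[axis][0])
        for i, axis in enumerate(axis_order)
    }


# Hypothetical sampling box: 2 x 3 x 4 positions, 8 world units per voxel.
box_shape = {"z": 2, "y": 3, "x": 4}
voxel_sizes = {"z": 8.0, "y": 8.0, "x": 8.0}
box = {"z": [10.0, 26.0], "y": [20.0, 44.0], "x": [30.0, 62.0]}
print(center_from_index(5, box_shape, voxel_sizes, box))
```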

__getitem__(idx: int) → dict[str, Tensor][source]#

Returns a crop of the input and target data as PyTorch tensors, corresponding to the coordinate of the unwrapped index.

Parameters:: idx (int)
Return type:: dict[str, Tensor]

get_target_array_writer(array_name: str, array_info: Mapping[str, Sequence[int | float]]) → dict[str, ImageWriter][source]#

Returns a dictionary of ImageWriter for the target images (per class) for a given array.

Parameters:

array_name (str)
array_info (Mapping[str, Sequence[int | float]])

Return type:

dict[str, ImageWriter]

get_image_writer(array_name: str, label: str, array_info: Mapping[str, Sequence[int | float] | int]) → ImageWriter[source]#

Parameters:

array_name (str)
label (str)
array_info (Mapping[str, Sequence[int | float] | int])

Return type:

ImageWriter

verify() → bool[source]#

Verifies that the dataset is valid to draw samples from.

Return type:: bool

get_indices(chunk_size: Mapping[str, float]) → Sequence[int][source]#

Returns the indices of the dataset that will tile the dataset according to the chunk_size (supplied in world units).

Parameters:: chunk_size (Mapping[str, float])
Return type:: Sequence[int]
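Tiling with a chunk size given in world units can be sketched as: convert the chunk size to a step in voxels per axis, take every step-th position along each axis, and flatten each combination to a linear index. The code below is an illustration under those assumptions (row-major flattening in axis_order, rounding to the nearest whole voxel). The function name tile_indices and all values are hypothetical.

```python
from itertools import product


def tile_indices(box_shape, voxel_sizes, chunk_size, axis_order="zyx"):
    """Linear indices of tile origins spaced chunk_size (world units) apart."""
    dims = [box_shape[axis] for axis in axis_order]
    starts = []
    for axis in axis_order:
        # Convert the world-unit chunk size to a whole number of voxels (at least 1).
        step = max(1, int(round(chunk_size[axis] / voxel_sizes[axis])))
        starts.append(range(0, box_shape[axis], step))
    indices = []
    for coords in product(*starts):
        linear = 0
        for coord, dim in zip(coords, dims):
            linear = linear * dim + coord  # row-major flattening
        indices.append(linear)
    return indices


# Hypothetical 1 x 4 x 4 sampling box at 8 units per voxel, tiled in 16-unit chunks.
box_shape = {"z": 1, "y": 4, "x": 4}
voxel_sizes = {"z": 8.0, "y": 8.0, "x": 8.0}
chunk_size = {"z": 8.0, "y": 16.0, "x": 16.0}
print(tile_indices(box_shape, voxel_sizes, chunk_size))
```

Because the steps equal the chunk size in voxels, tiles placed at these positions do not overlap along any axis.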

to(device: str | device, non_blocking: bool = True) → CellMapDatasetWriter[source]#

Sets the device for the dataset.

Parameters:

device (str | device)
non_blocking (bool)

Return type:

CellMapDatasetWriter

set_raw_value_transforms(transforms: Callable) → None[source]#

Sets the raw value transforms for the dataset.

Parameters:: transforms (Callable)
Return type:: None
