cellmap_data.dataset_writer

Functions

split_target_path(path)

Splits a path to groundtruth data into the main path string, and the classes supplied for it.

Classes

CellMapDatasetWriter(raw_path, target_path, ...)

Initializes the CellMapDatasetWriter.

cellmap_data.dataset_writer.split_target_path(path: str) → tuple[str, list[str]]

Splits a path to groundtruth data into the main path string, and the classes supplied for it.

Parameters:

path (str)

Return type:

tuple[str, list[str]]
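The sketch below illustrates the kind of split described above. It is a hypothetical stand-in, not the library's implementation: the bracketed class list (`[mito,er]`), the `{label}` placeholder, and the name `split_target_path_sketch` are all assumptions made for illustration.

```python
def split_target_path_sketch(path: str) -> tuple[str, list[str]]:
    """Hypothetical stand-in for split_target_path (assumed path convention)."""
    if "[" in path and "]" in path:
        # Assumed convention: classes are listed in brackets, comma-separated.
        prefix, rest = path.split("[", 1)
        class_part, suffix = rest.split("]", 1)
        classes = [c.strip() for c in class_part.split(",") if c.strip()]
        return prefix + "{label}" + suffix, classes
    # No bracketed list: treat the last path component as the only class.
    head, _, tail = path.rstrip("/").rpartition("/")
    return head + "/{label}", [tail]


multi = split_target_path_sketch("/data/gt.zarr/[mito,er]")
single = split_target_path_sketch("/data/gt.zarr/mito")
```

Under these assumptions, both calls yield the same path template, and only the class list differs.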

class cellmap_data.dataset_writer.CellMapDatasetWriter(raw_path: str, target_path: str, classes: Sequence[str], input_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_bounds: Mapping[str, Mapping[str, list[float]]], raw_value_transforms: Callable | None = None, axis_order: str = 'zyx', context: Context | None = None, rng: Generator | None = None, empty_value: float | int = 0, overwrite: bool = False)

Initializes the CellMapDatasetWriter.

Parameters:
  • raw_path (str) – The full path to the raw data zarr, excluding the multiscale level.

  • target_path (str) – The full path to the ground truth data zarr, excluding the multiscale level and the class name.

  • classes (Sequence[str]) – The classes in the dataset.

  • input_arrays (Mapping[str, Mapping[str, Sequence[int | float]]]) –

    The input arrays to return for processing. The dictionary should have the following structure:

    {
        "array_name": {
            "shape": tuple[int],
            "scale": Sequence[float],
            # and optionally:
            "scale_level": int,
        },
        ...
    }

    'shape' is the shape of the array in voxels, and 'scale' is the scale of the array in world units. The 'scale_level' is the multiscale level to use for the array, otherwise set to 0 if not supplied.

  • target_arrays (Mapping[str, Mapping[str, Sequence[int | float]]]) – The target arrays to write to disk, with format matching that for input_arrays.

  • target_bounds (Mapping[str, Mapping[str, list[float]]]) – The bounding boxes for each target array, in world units. Example: {"array_1": {"x": [12.0, 102.0], "y": [12.0, 102.0], "z": [12.0, 102.0]}}.

  • raw_value_transforms (Optional[Callable]) – The value transforms to apply to the raw data.

  • axis_order (str) – The order of the axes in the data.

  • context (Optional[tensorstore.Context]) – The context to use for the tensorstore.

  • rng (Optional[torch.Generator]) – The random number generator to use.

  • empty_value (float | int) – The value to use for empty data in an array.

  • overwrite (bool) – Whether to overwrite existing data.
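As a minimal sketch of how these arguments fit together, the following builds the three dictionaries in the documented structure. The array names, shapes, scales, bounds, and paths are hypothetical values chosen for illustration, and `world_extent` is a helper introduced here, not part of the library.

```python
# Hypothetical array specifications, following the documented structure.
input_arrays = {
    "input": {"shape": (64, 64, 64), "scale": (8.0, 8.0, 8.0)},
}
target_arrays = {
    # "scale_level" is optional and is treated as 0 when omitted.
    "output": {"shape": (32, 32, 32), "scale": (8.0, 8.0, 8.0), "scale_level": 0},
}
# Bounding box for each target array, in world units.
target_bounds = {
    "output": {"x": [0.0, 1024.0], "y": [0.0, 1024.0], "z": [0.0, 512.0]},
}


def world_extent(array_info, axis_order="zyx"):
    """Illustrative helper: size of one array request in world units (shape * scale)."""
    return {
        axis: n * s
        for axis, n, s in zip(axis_order, array_info["shape"], array_info["scale"])
    }


output_extent = world_extent(target_arrays["output"])

# With cellmap_data installed, the writer could then be constructed along
# these lines (paths and class names are placeholders):
#
# from cellmap_data.dataset_writer import CellMapDatasetWriter
# writer = CellMapDatasetWriter(
#     raw_path="/path/to/raw.zarr/em",
#     target_path="/path/to/predictions.zarr",
#     classes=["mito", "er"],
#     input_arrays=input_arrays,
#     target_arrays=target_arrays,
#     target_bounds=target_bounds,
# )
```

Every key in `target_bounds` matches a key in `target_arrays`, since the bounds are given per target array.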

property center: Mapping[str, float] | None

Returns the center of the dataset in world units.

property smallest_voxel_sizes: Mapping[str, float]

Returns the smallest voxel size of the dataset.

property smallest_target_array: Mapping[str, float]

Returns the size of the smallest target array in world units.

property bounding_box: Mapping[str, list[float]]

Returns the bounding box inclusive of all the target images.

property bounding_box_shape: Mapping[str, int]

Returns the shape of the bounding box of the dataset in voxels of the smallest voxel size requested.

property sampling_box: Mapping[str, list[float]]

Returns the sampling box of the dataset (i.e. the region from which centers should be drawn so that every sample lies fully within the bounding box).
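One way to picture the relationship between the bounding box and the sampling box is to shrink the bounding box by half an array extent on each side. The sketch below does that under stated assumptions: `sampling_box_sketch` and its arguments are hypothetical, and the library may compute the box differently (for example, with rounding or per-array handling).

```python
def sampling_box_sketch(bounding_box, array_extent):
    """Shrink each axis of bounding_box by half of array_extent on both sides.

    bounding_box: {"axis": [start, stop]} in world units.
    array_extent: {"axis": size} of one array request in world units.
    """
    box = {}
    for axis, (start, stop) in bounding_box.items():
        half = array_extent[axis] / 2
        low, high = start + half, stop - half
        if low > high:
            raise ValueError(f"array is larger than the bounding box along {axis!r}")
        box[axis] = [low, high]
    return box


example_box = sampling_box_sketch(
    {"z": [0.0, 512.0], "y": [0.0, 1024.0], "x": [0.0, 1024.0]},
    {"z": 256.0, "y": 256.0, "x": 256.0},
)
```

A center drawn anywhere inside `example_box` gives a 256-unit-wide request that stays inside the original bounds.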

property sampling_box_shape: dict[str, int]

Returns the shape of the sampling box of the dataset in voxels of the smallest voxel size requested.

property size: int

Returns the size of the dataset in voxels of the smallest voxel size requested.

property writer_indices: Sequence[int]

Returns the indices of the dataset that will produce non-overlapping tiles for use in the writer, based on the smallest requested target array.

property blocks: Subset

A subset of the dataset that tiles it with non-overlapping blocks.

loader(batch_size: int = 1, num_workers: int = 0, **kwargs)

Returns a DataLoader for the dataset.

Parameters:
  • batch_size (int)

  • num_workers (int)

collate_fn(batch: list[dict]) → dict[str, Tensor]

Combine a list of dictionaries from different sources into a single dictionary for output.

Parameters:

batch (list[dict])

Return type:

dict[str, Tensor]

property device: device

Returns the device for the dataset.

__len__() → int

Returns the length of the dataset, determined by the number of coordinates that could be sampled as the center for an array request.

Return type:

int

get_center(idx: int) → dict[str, float]

Parameters:

idx (int)

Return type:

dict[str, float]
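The mapping from a flat index to a center coordinate can be sketched as unravelling the index over the sampling box shape and scaling by the voxel size. This is an illustration only: `center_from_index_sketch` and its arguments are hypothetical, C-order (last axis fastest) is assumed, and the library may apply additional offsets or rounding.

```python
from math import prod


def center_from_index_sketch(idx, sampling_box, sampling_box_shape, voxel_size, axis_order="zyx"):
    """Unravel a flat index (C order assumed) into a center in world units."""
    total = prod(sampling_box_shape[axis] for axis in axis_order)
    if not 0 <= idx < total:
        raise IndexError(f"index {idx} is out of range for {total} positions")
    center = {}
    # The last axis in axis_order varies fastest, so peel it off first.
    for axis in reversed(axis_order):
        idx, position = divmod(idx, sampling_box_shape[axis])
        center[axis] = sampling_box[axis][0] + position * voxel_size[axis]
    # Return the axes in the requested order.
    return {axis: center[axis] for axis in axis_order}


example_center = center_from_index_sketch(
    42,
    sampling_box={"z": [128.0, 160.0], "y": [128.0, 160.0], "x": [128.0, 160.0]},
    sampling_box_shape={"z": 4, "y": 4, "x": 4},
    voxel_size={"z": 8.0, "y": 8.0, "x": 8.0},
)
```

Here index 42 unravels to voxel position (2, 2, 2), which is 16 world units from the start of the sampling box along each axis.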

__getitem__(idx: int) → dict[str, Tensor]

Returns a crop of the input and target data as PyTorch tensors, corresponding to the coordinate of the unwrapped index.

Parameters:

idx (int)

Return type:

dict[str, Tensor]

get_target_array_writer(array_name: str, array_info: Mapping[str, Sequence[int | float]]) → dict[str, ImageWriter]

Returns a dictionary of ImageWriter for the target images (per class) for a given array.

Parameters:
  • array_name (str)

  • array_info (Mapping[str, Sequence[int | float]])

Return type:

dict[str, ImageWriter]

get_image_writer(array_name: str, label: str, array_info: Mapping[str, Sequence[int | float] | int]) → ImageWriter

Parameters:
  • array_name (str)

  • label (str)

  • array_info (Mapping[str, Sequence[int | float] | int])

Return type:

ImageWriter

verify() → bool

Verifies that the dataset is valid to draw samples from.

Return type:

bool

get_indices(chunk_size: Mapping[str, float]) → Sequence[int]

Returns the indices of the dataset that will tile the dataset according to the chunk_size (supplied in world units).

Parameters:

chunk_size (Mapping[str, float])

Return type:

Sequence[int]
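A rough sketch of how such tiling indices could be derived: convert the chunk size from world units to a step in voxels, walk the sampling box at that step, and flatten each position into a single index. The function name, its arguments, and the C-order flattening are assumptions for illustration, not the library's implementation.

```python
from itertools import product


def tiling_indices_sketch(sampling_box_shape, chunk_size, voxel_size, axis_order="zyx"):
    """Flat indices (C order assumed) of positions spaced one chunk apart."""
    shape = [sampling_box_shape[axis] for axis in axis_order]
    # Chunk size in world units -> step in voxels, at least one voxel.
    steps = [max(1, round(chunk_size[axis] / voxel_size[axis])) for axis in axis_order]
    indices = []
    for position in product(*(range(0, n, step) for n, step in zip(shape, steps))):
        flat = 0
        for coordinate, n in zip(position, shape):
            flat = flat * n + coordinate  # row-major flattening
        indices.append(flat)
    return indices


example_indices = tiling_indices_sketch(
    sampling_box_shape={"z": 4, "y": 4, "x": 4},
    chunk_size={"z": 16.0, "y": 16.0, "x": 16.0},
    voxel_size={"z": 8.0, "y": 8.0, "x": 8.0},
)
```

With a 16-unit chunk and 8-unit voxels, the step is two voxels per axis, so a 4 × 4 × 4 box yields eight tile positions.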

to(device: str | device) → CellMapDatasetWriter

Sets the device for the dataset.

Parameters:

device (str | device)

Return type:

CellMapDatasetWriter

set_raw_value_transforms(transforms: Callable) → None

Sets the raw value transforms for the dataset.

Parameters:

transforms (Callable)

Return type:

None