cellmap_data.dataset_writer
Functions
- split_target_path(path) – Splits a path to ground-truth data into the main path string and the classes supplied for it.
Classes
- CellMapDatasetWriter – Initializes the CellMapDatasetWriter.
- cellmap_data.dataset_writer.split_target_path(path: str) → tuple[str, list[str]]
Splits a path to ground-truth data into the main path string and the classes supplied for it.
- Parameters:
path (str)
- Return type:
tuple[str, list[str]]
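The exact path syntax that split_target_path accepts is not documented here. The sketch below is a hypothetical stand-in (split_target_path_sketch is not part of cellmap_data) that assumes the classes are embedded in the path as a bracketed, comma-separated list such as "/data/crop1/[mito,er]", and that the returned path string carries a "{label}" placeholder to be filled per class. Both conventions are assumptions made for illustration.

```python
def split_target_path_sketch(path: str) -> tuple[str, list[str]]:
    """Hypothetical illustration only, not the cellmap_data implementation.

    Assumes classes are written as a bracketed list, e.g. "/data/crop1/[mito,er]".
    """
    if "[" in path and "]" in path:
        prefix, rest = path.split("[", 1)
        class_str, suffix = rest.split("]", 1)
        classes = [c.strip() for c in class_str.split(",") if c.strip()]
        # Assumed convention: replace the bracketed list with a placeholder that
        # can later be filled per class via str.format(label=...).
        return prefix + "{label}" + suffix, classes
    # No bracketed list: treat the last path component as the single class.
    return path, [path.rstrip("/").split("/")[-1]]


path_string, classes = split_target_path_sketch("/data/crop1/[mito,er]")
print(path_string)  # /data/crop1/{label}
print(classes)      # ['mito', 'er']
```

Under this assumption, path_string.format(label="mito") would give the location of the "mito" ground truth.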
- class cellmap_data.dataset_writer.CellMapDatasetWriter(raw_path: str, target_path: str, classes: Sequence[str], input_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_arrays: Mapping[str, Mapping[str, Sequence[int | float]]], target_bounds: Mapping[str, Mapping[str, list[float]]], raw_value_transforms: Callable | None = None, axis_order: str = 'zyx', context: Context | None = None, rng: Generator | None = None, empty_value: float | int = 0, overwrite: bool = False, device: str | device | None = None)
Initializes the CellMapDatasetWriter.
- Parameters:
raw_path (str) – The full path to the raw data zarr, excluding the multiscale level.
target_path (str) – The full path to the ground-truth data zarr, excluding the multiscale level and the class name.
classes (Sequence[str]) – The classes in the dataset.
input_arrays (Mapping[str, Mapping[str, Sequence[int | float]]]) –
The input arrays to return for processing. The dictionary should have the following structure:
{ "array_name": { "shape": tuple[int], "scale": Sequence[float], "scale_level": int (optional) }, ... }
where 'shape' is the shape of the array in voxels and 'scale' is the scale of the array in world units. The optional 'scale_level' is the multiscale level to use for the array; it is set to 0 if not supplied.
target_arrays (Mapping[str, Mapping[str, Sequence[int | float]]]) – The target arrays to write to disk, with format matching that for input_arrays.
target_bounds (Mapping[str, Mapping[str, list[float]]]) – The bounding boxes for each target array, in world units. Example: {"array_1": {"x": [12.0, 102.0], "y": [12.0, 102.0], "z": [12.0, 102.0]}}.
raw_value_transforms (Optional[Callable]) – The value transforms to apply to the raw data.
axis_order (str) – The order of the axes in the data.
context (Optional[tensorstore.Context]) – The context to use for the tensorstore.
rng (Optional[torch.Generator]) – The random number generator to use.
empty_value (float | int) – The value to use for empty data in an array.
overwrite (bool) – Whether to overwrite existing data.
device (Optional[str | torch.device]) – The device to use for the dataset. If None, will default to “cuda” if available, then “mps”, otherwise “cpu”.
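The parameter descriptions above imply that each array's field of view in world units is its shape (in voxels) multiplied by its scale (world units per voxel). The sketch below builds example dictionaries in the documented layout and computes that extent. The array names ("raw", "pred"), shapes, scales, and bounds are made-up values, and world_extent is a hypothetical helper, not part of cellmap_data.

```python
# Illustrative values only: array names, shapes, scales, and bounds are made up.
input_arrays = {
    "raw": {"shape": (64, 64, 64), "scale": (8.0, 8.0, 8.0)},
}
target_arrays = {
    "pred": {"shape": (32, 32, 32), "scale": (8.0, 8.0, 8.0), "scale_level": 0},
}
# Bounds are given per target array and per axis, in world units: [start, stop].
target_bounds = {
    "pred": {"z": [0.0, 512.0], "y": [0.0, 512.0], "x": [0.0, 512.0]},
}


def world_extent(array_info: dict, axis_order: str = "zyx") -> dict[str, float]:
    """Hypothetical helper: field of view in world units = shape (voxels) * scale."""
    return {
        axis: n * s
        for axis, n, s in zip(axis_order, array_info["shape"], array_info["scale"])
    }


print(world_extent(input_arrays["raw"]))    # {'z': 512.0, 'y': 512.0, 'x': 512.0}
print(world_extent(target_arrays["pred"]))  # {'z': 256.0, 'y': 256.0, 'x': 256.0}
```

Dictionaries of this form would be passed as the input_arrays, target_arrays, and target_bounds arguments of CellMapDatasetWriter, alongside real raw_path, target_path, and classes values.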
- property center: Mapping[str, float] | None
Returns the center of the dataset in world units.
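A natural reading of "center" is the per-axis midpoint of the bounding box, with None when no box is available (matching the Mapping[str, float] | None type). The sketch below assumes that definition; center_of is a hypothetical helper, and the library may compute the center differently.

```python
from typing import Mapping, Optional


def center_of(bounding_box: Optional[Mapping[str, list[float]]]) -> Optional[dict[str, float]]:
    """Assumed definition: midpoint of each per-axis [start, stop] range."""
    if bounding_box is None:
        return None
    return {axis: (start + stop) / 2 for axis, (start, stop) in bounding_box.items()}


print(center_of({"z": [0.0, 512.0], "y": [12.0, 102.0], "x": [12.0, 102.0]}))
# {'z': 256.0, 'y': 57.0, 'x': 57.0}
```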
- property smallest_voxel_sizes: Mapping[str, float]
Returns the smallest voxel size of the dataset.
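One way to obtain the smallest voxel size per axis is to take the minimum "scale" entry along each axis over all requested arrays. The sketch below does that; whether the writer considers input arrays, target arrays, or both is not stated here, so the choice of arrays is an assumption, and smallest_voxel_sizes_sketch is a hypothetical helper.

```python
def smallest_voxel_sizes_sketch(arrays: dict, axis_order: str = "zyx") -> dict[str, float]:
    """Per-axis minimum of the 'scale' entries across the given arrays."""
    smallest = {axis: float("inf") for axis in axis_order}
    for info in arrays.values():
        for axis, s in zip(axis_order, info["scale"]):
            smallest[axis] = min(smallest[axis], s)
    return smallest


arrays = {
    "raw": {"shape": (64, 64, 64), "scale": (8.0, 4.0, 4.0)},
    "pred": {"shape": (32, 32, 32), "scale": (16.0, 8.0, 2.0)},
}
print(smallest_voxel_sizes_sketch(arrays))  # {'z': 8.0, 'y': 4.0, 'x': 2.0}
```

Taking the minimum per axis, rather than the array with the smallest overall voxel volume, handles anisotropic data where different arrays are finest along different axes.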
- property smallest_target_array: Mapping[str, float]
Returns the smallest target array in world units.
- property bounding_box: Mapping[str, list[float]]
Returns the bounding box inclusive of all the target images.
- property bounding_box_shape: Mapping[str, int]
Returns the shape of the bounding box of the dataset in voxels of the smallest voxel size requested.
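A box "inclusive of all the target images" can be computed as the per-axis union of the target_bounds entries, and its shape in voxels as the box length divided by the smallest voxel size. The sketch below assumes that definition and rounds partial voxels up; union_box and box_shape are hypothetical helpers, and the library's rounding rule may differ.

```python
import math


def union_box(target_bounds: dict[str, dict[str, list[float]]]) -> dict[str, list[float]]:
    """Smallest per-axis [start, stop] box containing every target array's bounds."""
    box: dict[str, list[float]] = {}
    for bounds in target_bounds.values():
        for axis, (start, stop) in bounds.items():
            if axis not in box:
                box[axis] = [start, stop]
            else:
                box[axis] = [min(box[axis][0], start), max(box[axis][1], stop)]
    return box


def box_shape(box: dict[str, list[float]], voxel_sizes: dict[str, float]) -> dict[str, int]:
    """Voxels spanned per axis at the given voxel size, rounding partial voxels up."""
    return {axis: math.ceil((stop - start) / voxel_sizes[axis]) for axis, (start, stop) in box.items()}


bounds = {
    "a": {"z": [0.0, 100.0], "y": [0.0, 100.0], "x": [0.0, 100.0]},
    "b": {"z": [50.0, 160.0], "y": [-20.0, 80.0], "x": [10.0, 90.0]},
}
box = union_box(bounds)
print(box)  # {'z': [0.0, 160.0], 'y': [-20.0, 100.0], 'x': [0.0, 100.0]}
print(box_shape(box, {"z": 8.0, "y": 8.0, "x": 8.0}))  # {'z': 20, 'y': 15, 'x': 13}
```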
- property sampling_box: Mapping[str, list[float]]
Returns the sampling box of the dataset, i.e. the region from which array centers are drawn so that every sampled array lies fully within the bounding box.
- property sampling_box_shape: dict[str, int]
Returns the shape of the sampling box of the dataset in voxels of the smallest voxel size requested.
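For every array to stay inside the bounding box, a center must keep a margin of half the array's world extent from each edge, so the largest array along each axis determines the sampling box. The sketch below assumes that rule and counts candidate centers at one-voxel spacing, inclusive of both ends and never fewer than one. Both the shrinking rule and the counting convention are assumptions, and sampling_box_sketch and sampling_box_shape_sketch are hypothetical helpers.

```python
import math


def sampling_box_sketch(
    bounding_box: dict[str, list[float]],
    array_extents: list[dict[str, float]],
) -> dict[str, list[float]]:
    """Shrink the bounding box by half the largest array extent on each side."""
    box = {}
    for axis, (start, stop) in bounding_box.items():
        half = max(ext[axis] for ext in array_extents) / 2
        box[axis] = [start + half, stop - half]
    return box


def sampling_box_shape_sketch(box: dict[str, list[float]], voxel_sizes: dict[str, float]) -> dict[str, int]:
    """Candidate centers per axis at one-voxel spacing, both ends included (min 1)."""
    return {
        axis: max(1, math.floor((stop - start) / voxel_sizes[axis]) + 1)
        for axis, (start, stop) in box.items()
    }


bbox = {"z": [0.0, 512.0], "y": [0.0, 512.0], "x": [0.0, 512.0]}
extents = [{"z": 256.0, "y": 256.0, "x": 256.0}, {"z": 128.0, "y": 128.0, "x": 128.0}]
sb = sampling_box_sketch(bbox, extents)
print(sb)  # {'z': [128.0, 384.0], 'y': [128.0, 384.0], 'x': [128.0, 384.0]}
print(sampling_box_shape_sketch(sb, {"z": 8.0, "y": 8.0, "x": 8.0}))  # {'z': 33, 'y': 33, 'x': 33}
```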
- property size: int
Returns the size of the dataset in voxels of the smallest voxel size requested.
- property writer_indices: Sequence[int]
Returns the indices of the dataset that produce non-overlapping tiles for use by the writer, based on the smallest requested target array.
- property blocks: Subset
A subset of the dataset that tiles it with non-overlapping blocks.
- loader(batch_size: int = 1, num_workers: int = 0, **kwargs)
Returns a DataLoader for the dataset.
- Parameters:
batch_size (int)
num_workers (int)
- collate_fn(batch: list[dict]) → dict[str, Tensor]
Combines a list of dictionaries from different sources into a single dictionary for output.
- Parameters:
batch (list[dict])
- Return type:
dict[str, Tensor]
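Conceptually, collating groups each sample's values by key. The pure-Python sketch below (collate_sketch is a hypothetical helper) returns lists so that it runs without PyTorch; since the documented collate_fn returns dict[str, Tensor], a tensor-based version would typically stack each list, for example with torch.stack. How the real method handles keys missing from some samples is not documented; this sketch simply skips them.

```python
from typing import Any


def collate_sketch(batch: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Group values from per-sample dicts by key (illustration only).

    Keys absent from a sample are skipped for that sample.
    """
    keys = {key for sample in batch for key in sample}
    return {key: [sample[key] for sample in batch if key in sample] for key in sorted(keys)}


batch = [{"raw": [1, 2], "idx": 0}, {"raw": [3, 4], "idx": 1}]
print(collate_sketch(batch))  # {'idx': [0, 1], 'raw': [[1, 2], [3, 4]]}
```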
- property device: device
Returns the device for the dataset.
- __len__() → int
Returns the length of the dataset, determined by the number of coordinates that could be sampled as the center for an array request.
- Return type:
int
- __getitem__(idx: int) → dict[str, Tensor]
Returns a crop of the input and target data as PyTorch tensors, centered on the coordinate obtained by unraveling the flat index.
- Parameters:
idx (int)
- Return type:
dict[str, Tensor]
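Taken together, __len__ and __getitem__ describe a flat index over a grid of candidate centers: the length is the number of grid positions, and each index is unraveled into per-axis positions and then into a world coordinate. The sketch below assumes C order (last axis varies fastest) and one-voxel spacing starting at the sampling box's lower corner; unravel_index_sketch and index_to_center are hypothetical helpers, and the library's exact conventions may differ.

```python
def unravel_index_sketch(idx: int, shape: dict[str, int], axis_order: str = "zyx") -> dict[str, int]:
    """Convert a flat index into per-axis grid positions (C order: last axis fastest)."""
    total = 1
    for axis in axis_order:
        total *= shape[axis]
    if not 0 <= idx < total:
        raise IndexError(f"index {idx} out of range for {total} positions")
    pos = {}
    for axis in reversed(axis_order):
        idx, pos[axis] = divmod(idx, shape[axis])
    return {axis: pos[axis] for axis in axis_order}


def index_to_center(
    idx: int,
    sampling_box: dict[str, list[float]],
    sampling_shape: dict[str, int],
    voxel_sizes: dict[str, float],
    axis_order: str = "zyx",
) -> dict[str, float]:
    """World-space center for a flat index, one voxel step per grid position."""
    grid = unravel_index_sketch(idx, sampling_shape, axis_order)
    return {axis: sampling_box[axis][0] + grid[axis] * voxel_sizes[axis] for axis in axis_order}


box = {"z": [128.0, 384.0], "y": [128.0, 384.0], "x": [128.0, 384.0]}
shape = {"z": 33, "y": 33, "x": 33}
voxels = {"z": 8.0, "y": 8.0, "x": 8.0}
print(33 * 33 * 33)                                # 35937 candidate centers
print(index_to_center(0, box, shape, voxels))      # {'z': 128.0, 'y': 128.0, 'x': 128.0}
print(index_to_center(35936, box, shape, voxels))  # {'z': 384.0, 'y': 384.0, 'x': 384.0}
```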
- get_target_array_writer(array_name: str, array_info: Mapping[str, Sequence[int | float]]) → dict[str, ImageWriter]
Returns a dictionary of ImageWriter objects, one per class, for the target images of a given array.
- Parameters:
array_name (str)
array_info (Mapping[str, Sequence[int | float]])
- Return type:
dict[str, ImageWriter]
- get_image_writer(array_name: str, label: str, array_info: Mapping[str, Sequence[int | float] | int]) → ImageWriter
Returns an ImageWriter for a single class label of a given array.
- Parameters:
array_name (str)
label (str)
array_info (Mapping[str, Sequence[int | float] | int])
- Return type:
ImageWriter
- get_indices(chunk_size: Mapping[str, float]) → Sequence[int]
Returns the indices of the dataset that will tile the dataset according to the chunk_size (supplied in world units).
- Parameters:
chunk_size (Mapping[str, float])
- Return type:
Sequence[int]
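Tiling with a chunk size given in world units amounts to stepping through the center grid by chunk_size / voxel_size positions along each axis and raveling each selected position back into a flat index. The sketch below assumes the same C-order grid as a flat dataset index; tile_indices is a hypothetical helper. It does not add an extra tile at the far edge when the grid length is not a multiple of the step, a case a real implementation would likely need to cover.

```python
import itertools


def tile_indices(
    sampling_shape: dict[str, int],
    voxel_sizes: dict[str, float],
    chunk_size: dict[str, float],
    axis_order: str = "zyx",
) -> list[int]:
    """Flat indices of centers spaced one chunk apart along every axis.

    Step per axis = chunk_size (world units) / voxel size (world units per voxel),
    so consecutive tiles of width chunk_size neither overlap nor leave gaps.
    """
    steps = {axis: max(1, round(chunk_size[axis] / voxel_sizes[axis])) for axis in axis_order}
    per_axis = [range(0, sampling_shape[axis], steps[axis]) for axis in axis_order]
    indices = []
    for pos in itertools.product(*per_axis):
        flat = 0
        for axis, p in zip(axis_order, pos):
            flat = flat * sampling_shape[axis] + p  # C-order ravel
        indices.append(flat)
    return indices


print(tile_indices({"z": 1, "y": 4, "x": 4}, {"z": 8.0, "y": 8.0, "x": 8.0}, {"z": 16.0, "y": 16.0, "x": 16.0}))
# [0, 2, 8, 10]
```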
- to(device: str | device) → CellMapDatasetWriter
Sets the device for the dataset.
- Parameters:
device (str | device)
- Return type:
CellMapDatasetWriter