YAML Configuration

`cellmap_flow_yaml` lets you define and run multiple models from a single YAML file. It is the recommended way to launch inference jobs, and the blockwise processor (`cellmap_flow_blockwise`) reads the same YAML format.

Usage

```bash
# Run inference
cellmap_flow_yaml config.yaml

# Validate without running
cellmap_flow_yaml config.yaml --validate-only

# List available model types
cellmap_flow_yaml --list-types

# Set log level
cellmap_flow_yaml config.yaml --log-level DEBUG
```
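If you keep several configuration files, you can script the documented `--validate-only` and `--log-level` flags. The sketch below is a hypothetical Python wrapper (`build_yaml_command` and `validate_all` are not part of cellmap_flow). It assumes that `cellmap_flow_yaml` is on `PATH` and that it exits with a non-zero status when validation fails.

```python
import subprocess
from typing import List, Optional


def build_yaml_command(config: str, validate_only: bool = False,
                       log_level: Optional[str] = None) -> List[str]:
    """Assemble a cellmap_flow_yaml command line from the documented flags."""
    cmd = ["cellmap_flow_yaml", config]
    if validate_only:
        cmd.append("--validate-only")
    if log_level is not None:
        cmd.extend(["--log-level", log_level])
    return cmd


def validate_all(configs: List[str]) -> List[str]:
    """Run --validate-only on each config and return the ones that failed.

    ASSUMPTION: a failed validation produces a non-zero exit status.
    """
    failed = []
    for config in configs:
        result = subprocess.run(build_yaml_command(config, validate_only=True))
        if result.returncode != 0:
            failed.append(config)
    return failed
```

For example, `validate_all(["mito.yaml", "nuc.yaml"])` (placeholder file names) returns the files that did not pass validation.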

YAML Structure

A configuration file has the following top-level fields:

| Field | Required | Description |
|---|---|---|
| `data_path` | Yes | Path to the input dataset (zarr/n5). |
| `charge_group` | Yes | Project billing group. |
| `queue` | No | Job queue (default: `gpu_h100`). |
| `models` | Yes | Dict or list of model entries (see below). |
| `json_data` | No | Input normalizers and postprocessors. |
| `wrap_raw` | No | Wrap raw data in neuroglancer (default: `true`). |
| `output_path` | No | Output zarr path (used by blockwise processing). |
| `task_name` | No | Task name (used by blockwise processing). |
| `workers` | No | Number of GPU workers (blockwise). |
| `cpu_workers` | No | Number of CPU workers (blockwise). |
| `tmp_dir` | No | Temporary directory for intermediate files. |
| `bounding_boxes` | No | List of bounding boxes to process (blockwise). |
| `separate_bounding_boxes_zarrs` | No | Write each bounding box to a separate zarr (blockwise). |
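A parsed configuration is a plain mapping, so the required fields and defaults from the table can be checked before submitting a job. This is a minimal sketch, not the checks that `--validate-only` actually performs: `check_top_level` and `with_defaults` are hypothetical helpers, and in practice the dict would come from `yaml.safe_load` (PyYAML).

```python
from typing import Any, Dict, List

# Top-level fields marked "Yes" in the table above.
REQUIRED_FIELDS = ("data_path", "charge_group", "models")

# Defaults stated in the table for optional fields.
DEFAULTS: Dict[str, Any] = {"queue": "gpu_h100", "wrap_raw": True}


def check_top_level(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed configuration dict."""
    problems = [f"missing required field: {field}"
                for field in REQUIRED_FIELDS if field not in config]
    models = config.get("models")
    if models is not None and not isinstance(models, (dict, list)):
        problems.append("models must be a dict or a list")
    return problems


def with_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in the documented defaults without overwriting explicit values."""
    return {**DEFAULTS, **config}
```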

Model Entries

Each model entry requires a `type` field plus the parameters that model type needs. Run `cellmap_flow_yaml --list-types` to see all available types and their required parameters.

Models can be specified as a dict (keys become model names) or a list (each entry must include a `name` field).

Dict format (recommended):

```yaml
models:
  my_mito_model:
    type: fly
    checkpoint: /path/to/checkpoint
    resolution: 16
    classes:
      - mito
  my_dacapo_model:
    type: dacapo
    run_name: my_run
    iteration: 100
```

List format:

```yaml
models:
  - name: my_mito_model
    type: fly
    checkpoint: /path/to/checkpoint
    resolution: 16
    classes:
      - mito
```
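Both layouts carry the same information. The hypothetical helper below (not part of cellmap_flow) shows one way to reduce either form to a single `{name: entry}` mapping; rejecting duplicate names in list form is a choice made for this sketch.

```python
from typing import Any, Dict, List, Union

ModelsField = Union[Dict[str, Dict[str, Any]], List[Dict[str, Any]]]


def models_by_name(models: ModelsField) -> Dict[str, Dict[str, Any]]:
    """Turn either accepted `models` layout into a {name: entry} mapping."""
    if isinstance(models, dict):
        # Dict format: each key becomes the model's name.
        return {name: {**entry, "name": name} for name, entry in models.items()}
    result: Dict[str, Dict[str, Any]] = {}
    for i, entry in enumerate(models):
        # List format: every entry must carry its own `name` field.
        if "name" not in entry:
            raise ValueError(f"models[{i}] has no 'name' field")
        if entry["name"] in result:
            raise ValueError(f"duplicate model name: {entry['name']}")
        result[entry["name"]] = dict(entry)
    return result
```

Applied to the two examples above, both forms produce the same mapping for `my_mito_model`.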

Available Model Types

| Type | Class | Key Parameters |
|---|---|---|
| `script` | ScriptModelConfig | `script_path` (required) |
| `dacapo` | DaCapoModelConfig | `run_name` (required), `iteration` (required) |
| `fly` | FlyModelConfig | `checkpoint` (required), `classes` (required), `resolution` (required) |
| `bio` | BioModelConfig | `model_path` (required) |
| `cellmap` | CellMapModelConfig | `config_folder` (required) |

Common optional parameters: `name`, `scale`.
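The required parameters in the table can be encoded as a lookup for a quick pre-flight check. This sketch mirrors only the table above; `REQUIRED_PARAMS` and `missing_params` are hypothetical, and `cellmap_flow_yaml --list-types` remains the authoritative source.

```python
from typing import Any, Dict, List

# Required parameters per model type, copied from the table above.
REQUIRED_PARAMS: Dict[str, List[str]] = {
    "script": ["script_path"],
    "dacapo": ["run_name", "iteration"],
    "fly": ["checkpoint", "classes", "resolution"],
    "bio": ["model_path"],
    "cellmap": ["config_folder"],
}


def missing_params(entry: Dict[str, Any]) -> List[str]:
    """List the required parameters absent from one model entry."""
    model_type = entry.get("type")
    if model_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown model type: {model_type!r}")
    return [param for param in REQUIRED_PARAMS[model_type] if param not in entry]
```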

Normalizers and Postprocessors

Define input normalization and output postprocessing under `json_data`:

```yaml
json_data:
  input_norm:
    MinMaxNormalizer:
      min_value: 0
      max_value: 250
      invert: false
    LambdaNormalizer:
      expression: "x*2-1"
  postprocess:
    DefaultPostprocessor:
      clip_min: 0
      clip_max: 1.0
      bias: 0.0
      multiplier: 127.5
    ThresholdPostprocessor:
      threshold: 0.5
```

Normalizers are applied in the order listed, before inference. Postprocessors are applied in the order listed, after inference.
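To make the ordering concrete, here is a sketch of how the chains in the example might compose. The formulas are assumptions inferred from the parameter names, not cellmap_flow's implementation: `MinMaxNormalizer` is taken to rescale `[min_value, max_value]` to `[0, 1]` with clipping, `LambdaNormalizer` to evaluate `expression` with the value bound to `x`, `DefaultPostprocessor` to clip, add `bias`, and multiply, and `ThresholdPostprocessor` to output 1 above `threshold`. All function names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Step = Callable[[float], float]


def min_max(min_value: float, max_value: float, invert: bool = False) -> Step:
    # ASSUMED: rescale [min_value, max_value] linearly to [0, 1], then clip.
    def step(x: float) -> float:
        y = (x - min_value) / (max_value - min_value)
        y = min(max(y, 0.0), 1.0)
        return 1.0 - y if invert else y
    return step


def lambda_norm(expression: str) -> Step:
    # ASSUMED: evaluate the expression with the current value bound to `x`.
    # Builtins are hidden here, but this is not a security sandbox.
    code = compile(expression, "<expression>", "eval")
    return lambda x: eval(code, {"__builtins__": {}}, {"x": x})


def default_post(clip_min: float, clip_max: float,
                 bias: float, multiplier: float) -> Step:
    # ASSUMED: clip to [clip_min, clip_max], add bias, then scale.
    return lambda x: (min(max(x, clip_min), clip_max) + bias) * multiplier


def threshold_post(threshold: float) -> Step:
    # ASSUMED: 1.0 where the value exceeds the threshold, else 0.0.
    return lambda x: 1.0 if x > threshold else 0.0


# Hypothetical registry mapping YAML keys to the sketch functions above.
BUILDERS: Dict[str, Callable[..., Step]] = {
    "MinMaxNormalizer": min_max,
    "LambdaNormalizer": lambda_norm,
    "DefaultPostprocessor": default_post,
    "ThresholdPostprocessor": threshold_post,
}


def build_chain(section: Dict[str, Dict[str, Any]]) -> List[Step]:
    """Build steps in mapping order, which is the order of application."""
    return [BUILDERS[name](**params) for name, params in section.items()]


def apply_chain(chain: List[Step], x: float) -> float:
    """Feed a value through each step in turn."""
    for step in chain:
        x = step(x)
    return x
```

Because the chain is built by iterating the mapping, the order of keys under `input_norm` and `postprocess` is the order of application; `yaml.safe_load` preserves key order when it builds Python dicts. Under these assumptions, the input chain in the example maps raw values 0, 125, and 250 to -1, 0, and 1.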

Bounding Boxes

For blockwise processing, you can specify regions of interest:

```yaml
bounding_boxes:
  - offset: [59611, 52237, 5627]
    shape: [4674, 11566, 10067]
  - offset: [64285, 26408, 15695]
    shape: [11626, 12405, 26847]
```

Set `separate_bounding_boxes_zarrs: true` to write each bounding box to its own zarr subdirectory (`box_1`, `box_2`, etc.).
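Assuming the usual offset/shape convention, each box spans from `offset` to `offset + shape` along every axis. The sketch below computes that end corner and lists per-box output locations. `box_end` and `box_outputs` are hypothetical helpers, and placing `box_1`, `box_2`, ... directly under `output_path`, numbered in list order starting from 1, is an assumption about the layout.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Sequence


def box_end(offset: Sequence[int], shape: Sequence[int]) -> List[int]:
    """Exclusive end corner of a box: offset + shape along each axis."""
    return [o + s for o, s in zip(offset, shape)]


def box_outputs(output_path: str, boxes: List[Dict[str, List[int]]]) -> List[str]:
    """Per-box output locations when separate_bounding_boxes_zarrs is true.

    ASSUMED layout: box_1, box_2, ... as subdirectories of output_path,
    numbered from 1 in list order.
    """
    root = PurePosixPath(output_path)
    return [str(root / f"box_{i}") for i in range(1, len(boxes) + 1)]
```

For the first box above, the end corner is `[64285, 63803, 15694]`.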

Examples

Minimal configuration

```yaml
data_path: /nrs/cellmap/data/my_dataset/my_dataset.zarr/recon-1/em/fibsem-uint8
charge_group: cellmap
queue: gpu_h100

models:
  my_model:
    type: dacapo
    run_name: my_run
    iteration: 50000
```

Full configuration with normalizers

```yaml
data_path: /nrs/cellmap/data/jrc_mus-salivary-1/jrc_mus-salivary-1.zarr/recon-1/em/fibsem-uint8
queue: gpu_h100
charge_group: cellmap

json_data:
  input_norm:
    MinMaxNormalizer:
      min_value: 0
      max_value: 250
      invert: false
    LambdaNormalizer:
      expression: "x*2-1"
  postprocess:
    DefaultPostprocessor:
      clip_min: 0
      clip_max: 1.0
      bias: 0.0
      multiplier: 127.5
    ThresholdPostprocessor:
      threshold: 127.5

models:
  model_tmp1:
    type: fly
    checkpoint: /path/to/model_checkpoint_362000
    resolution: 16
    classes:
      - mito
```

Blockwise processing

```yaml
data_path: /nrs/cellmap/data/jrc_mus-salivary-1/jrc_mus-salivary-1.zarr/recon-1/em/fibsem-uint8
output_path: /path/to/output.zarr
task_name: cellmap_flow_mito_task
charge_group: cellmap
queue: gpu_h100
workers: 14
cpu_workers: 12
tmp_dir: /path/to/tmp

models:
  - name: model_tmp1
    type: fly
    channels:
      - mito
    checkpoint_path: /path/to/model_checkpoint_362000
    input_size: [178, 178, 178]
    input_voxel_size: [16, 16, 16]
    output_size: [56, 56, 56]
    output_voxel_size: [16, 16, 16]

bounding_boxes:
  - offset: [59611, 52237, 5627]
    shape: [4674, 11566, 10067]
  - offset: [64285, 26408, 15695]
    shape: [11626, 12405, 26847]

json_data:
  input_norm:
    MinMaxNormalizer:
      invert: false
      max_value: 250
      min_value: 0
    LambdaNormalizer:
      expression: "x*2-1"
  postprocess:
    ThresholdPostprocessor:
      threshold: 0.5
```
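The sizes in this example allow a rough estimate of the work involved. This is back-of-the-envelope arithmetic under stated assumptions, not how the blockwise scheduler actually tiles: the output block is assumed to be centered in the input block, `offset` and `shape` are assumed to be in the same physical units as the voxel sizes, and blocks are assumed to tile each box from its offset without overlap. Both helpers are hypothetical.

```python
import math
from typing import List, Sequence


def context_voxels(input_size: Sequence[int], output_size: Sequence[int]) -> List[int]:
    """Voxels of context read on each side of an output block.

    ASSUMES the output block sits centered inside the input block.
    """
    return [(i - o) // 2 for i, o in zip(input_size, output_size)]


def blocks_per_box(shape: Sequence[int], output_size: Sequence[int],
                   output_voxel_size: Sequence[int]) -> int:
    """Number of output blocks needed to cover one bounding box.

    ASSUMES shape is in the same physical units as output_voxel_size.
    """
    total = 1
    for extent, size, voxel in zip(shape, output_size, output_voxel_size):
        block_extent = size * voxel  # physical extent of one output block
        total *= math.ceil(extent / block_extent)
    return total
```

With the values above, each 56-voxel output block reads 61 voxels of context per side, and tiling with 896-unit (56 × 16) blocks gives 6 × 13 × 12 = 936 blocks for the first box and 13 × 14 × 30 = 5460 for the second.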

Run blockwise processing with:

```bash
cellmap_flow_blockwise config.yaml
cellmap_flow_blockwise config.yaml --log-level DEBUG
```