YAML Configuration

cellmap_flow_yaml lets you define and run multiple models from a single YAML file. It is the recommended way to launch inference jobs, and the same YAML format is used by the blockwise processor (cellmap_flow_blockwise).

Usage

# Run inference
cellmap_flow_yaml config.yaml

# Validate without running
cellmap_flow_yaml config.yaml --validate-only

# List available model types
cellmap_flow_yaml --list-types

# Set log level
cellmap_flow_yaml config.yaml --log-level DEBUG

YAML Structure

A configuration file has the following top-level fields:

Field                          Required  Description
data_path                      Yes       Path to the input dataset (zarr/n5).
charge_group                   Yes       Project billing group.
queue                          No        Job queue (default: gpu_h100).
models                         Yes       Dict or list of model entries (see below).
json_data                      No        Input normalizers and postprocessors.
wrap_raw                       No        Wrap raw data in neuroglancer (default: true).
output_path                    No        Output zarr path (used by blockwise processing).
task_name                      No        Task name (used by blockwise processing).
workers                        No        Number of GPU workers (blockwise).
cpu_workers                    No        Number of CPU workers (blockwise).
tmp_dir                        No        Temporary directory for intermediate files.
bounding_boxes                 No        List of bounding boxes to process (blockwise).
separate_bounding_boxes_zarrs  No        Write each bounding box to a separate zarr (blockwise).
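The table above can be read as a small schema. The Python below sketches a check on a configuration that has already been parsed into a dict (for example with PyYAML's yaml.safe_load). It verifies the three required fields and fills in the two documented defaults. The names check_top_level, REQUIRED_FIELDS, and DEFAULTS are hypothetical and not part of cellmap_flow. This is not the check performed by --validate-only.

```python
# Hypothetical pre-flight check derived from the field table; this is not the
# validation performed by `cellmap_flow_yaml --validate-only`.
REQUIRED_FIELDS = ("data_path", "charge_group", "models")
DEFAULTS = {"queue": "gpu_h100", "wrap_raw": True}  # documented defaults


def check_top_level(config: dict) -> dict:
    """Raise ValueError if a required field is missing; otherwise return a
    copy of the config with the documented defaults filled in."""
    missing = [field for field in REQUIRED_FIELDS if field not in config]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    # Values present in the config take precedence over the defaults.
    return {**DEFAULTS, **config}


# Usage: the minimal configuration example, parsed into a dict, with queue omitted.
minimal = {
    "data_path": "/nrs/cellmap/data/my_dataset/my_dataset.zarr/recon-1/em/fibsem-uint8",
    "charge_group": "cellmap",
    "models": {"my_model": {"type": "dacapo", "run_name": "my_run", "iteration": 50000}},
}
print(check_top_level(minimal)["queue"])  # gpu_h100
```

Merging the defaults first and the user's config second means any explicitly set value wins. An omitted optional field falls back to its documented default.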

Model Entries

Each model entry requires a type field and the parameters for that model type. Use cellmap_flow_yaml --list-types to see all available types and their required parameters.

Models can be specified as a dict (keys become model names) or a list (each entry must include a name field).

Dict format (recommended):

models:
  my_mito_model:
    type: fly
    checkpoint: /path/to/checkpoint
    resolution: 16
    classes:
      - mito
  my_dacapo_model:
    type: dacapo
    run_name: my_run
    iteration: 100

List format:

models:
  - name: my_mito_model
    type: fly
    checkpoint: /path/to/checkpoint
    resolution: 16
    classes:
      - mito
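The two formats carry the same information. The Python below sketches how both could be reduced to one list of named entries, enforcing the rules stated above: list entries must include name, and every entry needs type. The function normalize_models is hypothetical. In particular, how cellmap_flow resolves a dict key that conflicts with a name inside the entry is an assumption here (the key wins).

```python
def normalize_models(models) -> list:
    """Hypothetical helper: convert the dict or list form of `models` into a
    list of entries that each carry `name` and `type`."""
    if isinstance(models, dict):
        # Dict form: each key becomes the model name (assumed to take priority
        # over any `name` written inside the entry).
        entries = [{**entry, "name": key} for key, entry in models.items()]
    elif isinstance(models, list):
        # List form: every entry must name itself.
        entries = [dict(entry) for entry in models]
    else:
        raise TypeError("models must be a dict or a list")
    for entry in entries:
        if "name" not in entry:
            raise ValueError(f"list entry without a name: {entry}")
        if "type" not in entry:
            raise ValueError(f"model {entry['name']!r} has no type field")
    return entries


# Usage: the dict and list examples above yield the same entries.
as_dict = {"my_mito_model": {"type": "fly", "checkpoint": "/path/to/checkpoint",
                             "resolution": 16, "classes": ["mito"]}}
as_list = [{"name": "my_mito_model", "type": "fly", "checkpoint": "/path/to/checkpoint",
            "resolution": 16, "classes": ["mito"]}]
print(normalize_models(as_dict) == normalize_models(as_list))  # True
```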

Available Model Types

Type     Class               Key Parameters
script   ScriptModelConfig   script_path (required)
dacapo   DaCapoModelConfig   run_name (required), iteration (required)
fly      FlyModelConfig      checkpoint (required), classes (required), resolution (required)
bio      BioModelConfig      model_path (required)
cellmap  CellMapModelConfig  config_folder (required)

Optional parameters shared by all model types: name, scale.
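The table can also serve as a lookup for a quick check before submitting a job. The Python below is a sketch that hard-codes the required parameters listed above. REQUIRED_PARAMS and missing_params are hypothetical names. The installed version of cellmap_flow may expect different parameter names, so cellmap_flow_yaml --list-types remains the authoritative source.

```python
# Required parameters per model type, copied from the table above. The
# installed cellmap_flow may differ; `cellmap_flow_yaml --list-types` is authoritative.
REQUIRED_PARAMS = {
    "script": {"script_path"},
    "dacapo": {"run_name", "iteration"},
    "fly": {"checkpoint", "classes", "resolution"},
    "bio": {"model_path"},
    "cellmap": {"config_folder"},
}


def missing_params(entry: dict) -> list:
    """Hypothetical helper: return, sorted, the required parameters that a
    model entry lacks for its declared type."""
    model_type = entry.get("type")
    if model_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown model type: {model_type!r}")
    return sorted(REQUIRED_PARAMS[model_type] - set(entry))


print(missing_params({"type": "dacapo", "run_name": "my_run"}))  # ['iteration']
```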

Normalizers and Postprocessors

Define input normalization and output postprocessing under json_data:

json_data:
  input_norm:
    MinMaxNormalizer:
      min_value: 0
      max_value: 250
      invert: false
    LambdaNormalizer:
      expression: "x*2-1"
  postprocess:
    DefaultPostprocessor:
      clip_min: 0
      clip_max: 1.0
      bias: 0.0
      multiplier: 127.5
    ThresholdPostprocessor:
      threshold: 0.5

Normalizers are applied in the order they are listed, before inference. Postprocessors are applied in the order they are listed, after inference.
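To illustrate what "in order" means, the Python below sketches the example json_data applied to single values. The per-step formulas are assumptions inferred from the parameter names, not the actual cellmap_flow classes:

- MinMaxNormalizer rescales [min_value, max_value] to [0, 1] and flips it when invert is true.
- LambdaNormalizer evaluates its expression with x bound to the value.
- DefaultPostprocessor clips, adds bias, then multiplies.
- ThresholdPostprocessor keeps values above threshold.

The helper names (min_max, apply_chain, and so on) are hypothetical.

```python
# Assumed per-step semantics; the real normalizer and postprocessor classes
# may behave differently (e.g. in clipping or dtype handling).
def min_max(x, min_value, max_value, invert):
    # Rescale [min_value, max_value] to [0, 1], optionally flipped.
    scaled = (x - min_value) / (max_value - min_value)
    return 1.0 - scaled if invert else scaled


def lambda_norm(x, expression):
    # Evaluate the expression with `x` bound to the value. eval is only
    # acceptable for trusted configs; builtins are disabled here.
    return eval(expression, {"__builtins__": {}}, {"x": x})


def default_post(x, clip_min, clip_max, bias, multiplier):
    # Clip, shift by bias, then scale.
    return (min(max(x, clip_min), clip_max) + bias) * multiplier


def threshold_post(x, threshold):
    # Binarize: True where the value exceeds the threshold.
    return x > threshold


STEPS = {
    "MinMaxNormalizer": min_max,
    "LambdaNormalizer": lambda_norm,
    "DefaultPostprocessor": default_post,
    "ThresholdPostprocessor": threshold_post,
}


def apply_chain(x, chain):
    # Python dicts keep insertion order, so steps run in the order listed.
    for step_name, params in chain.items():
        x = STEPS[step_name](x, **params)
    return x


# The json_data example above, as a parsed dict.
json_data = {
    "input_norm": {
        "MinMaxNormalizer": {"min_value": 0, "max_value": 250, "invert": False},
        "LambdaNormalizer": {"expression": "x*2-1"},
    },
    "postprocess": {
        "DefaultPostprocessor": {"clip_min": 0, "clip_max": 1.0, "bias": 0.0, "multiplier": 127.5},
        "ThresholdPostprocessor": {"threshold": 0.5},
    },
}

print(apply_chain(125, json_data["input_norm"]))   # 0.0  (125/250 = 0.5, then 0.5*2-1)
print(apply_chain(0.5, json_data["postprocess"]))  # True (0.5*127.5 = 63.75 > 0.5)
```

Under these assumed semantics, ordering matters. ThresholdPostprocessor sees the value after DefaultPostprocessor has scaled it, so its threshold is compared against the scaled range rather than the raw model output.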

Bounding Boxes

For blockwise processing, you can restrict processing to specific regions of interest. Each bounding box is given as an offset and a shape:

bounding_boxes:
  - offset: [59611, 52237, 5627]
    shape: [4674, 11566, 10067]
  - offset: [64285, 26408, 15695]
    shape: [11626, 12405, 26847]

Set separate_bounding_boxes_zarrs: true to write each bounding box to its own zarr subdirectory (box_1, box_2, and so on).
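The Python below is a sketch of the arithmetic behind each box. It assumes a box spans from offset to offset + shape on each axis (end exclusive), in whatever units offset and shape use, which this page does not state. It also assumes the box_N subdirectories sit directly under output_path. The helpers box_extents and box_output_dirs are hypothetical.

```python
def box_extents(boxes):
    """Return (start, end) corners per box, assuming end = offset + shape on
    each axis (end exclusive, same units as offset and shape)."""
    extents = []
    for box in boxes:
        start = list(box["offset"])
        end = [o + s for o, s in zip(box["offset"], box["shape"])]
        extents.append((start, end))
    return extents


def box_output_dirs(output_path, n_boxes):
    """Hypothetical layout for separate_bounding_boxes_zarrs: true, assuming
    box_1, box_2, ... sit directly under output_path."""
    root = output_path.rstrip("/")
    return [f"{root}/box_{i}" for i in range(1, n_boxes + 1)]


# The two regions from the example above.
boxes = [
    {"offset": [59611, 52237, 5627], "shape": [4674, 11566, 10067]},
    {"offset": [64285, 26408, 15695], "shape": [11626, 12405, 26847]},
]
dirs = box_output_dirs("/path/to/output.zarr", len(boxes))
for (start, end), out_dir in zip(box_extents(boxes), dirs):
    print(out_dir, start, end)
# /path/to/output.zarr/box_1 [59611, 52237, 5627] [64285, 63803, 15694]
# /path/to/output.zarr/box_2 [64285, 26408, 15695] [75911, 38813, 42542]
```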

Examples

Minimal configuration

data_path: /nrs/cellmap/data/my_dataset/my_dataset.zarr/recon-1/em/fibsem-uint8
charge_group: cellmap
queue: gpu_h100

models:
  my_model:
    type: dacapo
    run_name: my_run
    iteration: 50000

Full configuration with normalizers

data_path: /nrs/cellmap/data/jrc_mus-salivary-1/jrc_mus-salivary-1.zarr/recon-1/em/fibsem-uint8
queue: gpu_h100
charge_group: cellmap

json_data:
  input_norm:
    MinMaxNormalizer:
      min_value: 0
      max_value: 250
      invert: false
    LambdaNormalizer:
      expression: "x*2-1"
  postprocess:
    DefaultPostprocessor:
      clip_min: 0
      clip_max: 1.0
      bias: 0.0
      multiplier: 127.5
    ThresholdPostprocessor:
      threshold: 127.5

models:
  model_tmp1:
    type: fly
    checkpoint: /path/to/model_checkpoint_362000
    resolution: 16
    classes:
      - mito

Blockwise processing

data_path: /nrs/cellmap/data/jrc_mus-salivary-1/jrc_mus-salivary-1.zarr/recon-1/em/fibsem-uint8
output_path: /path/to/output.zarr
task_name: cellmap_flow_mito_task
charge_group: cellmap
queue: gpu_h100
workers: 14
cpu_workers: 12
tmp_dir: /path/to/tmp

models:
  - name: model_tmp1
    type: fly
    channels:
      - mito
    checkpoint_path: /path/to/model_checkpoint_362000
    input_size: [178, 178, 178]
    input_voxel_size: [16, 16, 16]
    output_size: [56, 56, 56]
    output_voxel_size: [16, 16, 16]

bounding_boxes:
  - offset: [59611, 52237, 5627]
    shape: [4674, 11566, 10067]
  - offset: [64285, 26408, 15695]
    shape: [11626, 12405, 26847]

json_data:
  input_norm:
    MinMaxNormalizer:
      invert: false
      max_value: 250
      min_value: 0
    LambdaNormalizer:
      expression: "x*2-1"
  postprocess:
    ThresholdPostprocessor:
      threshold: 0.5

Run blockwise processing with:

cellmap_flow_blockwise config.yaml
cellmap_flow_blockwise config.yaml --log-level DEBUG