YAML Configuration =================== ``cellmap_flow_yaml`` lets you define and run multiple models from a single YAML file. It is the recommended way to launch inference jobs, and the same YAML format is used by the blockwise processor (``cellmap_flow_blockwise``). Usage ----- .. code-block:: bash # Run inference cellmap_flow_yaml config.yaml # Validate without running cellmap_flow_yaml config.yaml --validate-only # List available model types cellmap_flow_yaml --list-types # Set log level cellmap_flow_yaml config.yaml --log-level DEBUG YAML Structure -------------- A configuration file has the following top-level fields: .. list-table:: :header-rows: 1 :widths: 25 10 65 * - Field - Required - Description * - ``data_path`` - Yes - Path to the input dataset (zarr/n5). * - ``charge_group`` - Yes - Project billing group. * - ``queue`` - No - Job queue (default: ``gpu_h100``). * - ``models`` - Yes - Dict or list of model entries (see below). * - ``json_data`` - No - Input normalizers and postprocessors. * - ``wrap_raw`` - No - Wrap raw data in neuroglancer (default: ``true``). * - ``output_path`` - No - Output zarr path (used by blockwise processing). * - ``task_name`` - No - Task name (used by blockwise processing). * - ``workers`` - No - Number of GPU workers (blockwise). * - ``cpu_workers`` - No - Number of CPU workers (blockwise). * - ``tmp_dir`` - No - Temporary directory for intermediate files. * - ``bounding_boxes`` - No - List of bounding boxes to process (blockwise). * - ``separate_bounding_boxes_zarrs`` - No - Write each bounding box to a separate zarr (blockwise). Model Entries ------------- Each model entry requires a ``type`` field and the parameters for that model type. Use ``cellmap_flow_yaml --list-types`` to see all available types and their required parameters. Models can be specified as a **dict** (keys become model names) or a **list** (each entry must include a ``name`` field). **Dict format** (recommended): .. code-block:: yaml models: my_mito_model: type: fly checkpoint: /path/to/checkpoint resolution: 16 classes: - mito my_dacapo_model: type: dacapo run_name: my_run iteration: 100 **List format**: .. code-block:: yaml models: - name: my_mito_model type: fly checkpoint: /path/to/checkpoint resolution: 16 classes: - mito Available Model Types ~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 15 20 65 * - Type - Class - Key Parameters * - ``script`` - ScriptModelConfig - ``script_path`` (required) * - ``dacapo`` - DaCapoModelConfig - ``run_name`` (required), ``iteration`` (required) * - ``fly`` - FlyModelConfig - ``checkpoint`` (required), ``classes`` (required), ``resolution`` (required) * - ``bio`` - BioModelConfig - ``model_path`` (required) * - ``cellmap`` - CellMapModelConfig - ``config_folder`` (required) Common optional parameters: ``name``, ``scale``. Normalizers and Postprocessors ------------------------------ Define input normalization and output postprocessing under ``json_data``: .. code-block:: yaml json_data: input_norm: MinMaxNormalizer: min_value: 0 max_value: 250 invert: false LambdaNormalizer: expression: "x*2-1" postprocess: DefaultPostprocessor: clip_min: 0 clip_max: 1.0 bias: 0.0 multiplier: 127.5 ThresholdPostprocessor: threshold: 0.5 Normalizers are applied in order before inference. Postprocessors are applied in order after inference. Bounding Boxes -------------- For blockwise processing, you can specify regions of interest: .. code-block:: yaml bounding_boxes: - offset: [59611, 52237, 5627] shape: [4674, 11566, 10067] - offset: [64285, 26408, 15695] shape: [11626, 12405, 26847] Set ``separate_bounding_boxes_zarrs: true`` to write each bounding box to its own zarr subdirectory (``box_1``, ``box_2``, etc). Examples -------- Minimal configuration ~~~~~~~~~~~~~~~~~~~~~ .. code-block:: yaml data_path: /nrs/cellmap/data/my_dataset/my_dataset.zarr/recon-1/em/fibsem-uint8 charge_group: cellmap queue: gpu_h100 models: my_model: type: dacapo run_name: my_run iteration: 50000 Full configuration with normalizers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: yaml data_path: /nrs/cellmap/data/jrc_mus-salivary-1/jrc_mus-salivary-1.zarr/recon-1/em/fibsem-uint8 queue: gpu_h100 charge_group: cellmap json_data: input_norm: MinMaxNormalizer: min_value: 0 max_value: 250 invert: false LambdaNormalizer: expression: "x*2-1" postprocess: DefaultPostprocessor: clip_min: 0 clip_max: 1.0 bias: 0.0 multiplier: 127.5 ThresholdPostprocessor: threshold: 127.5 models: model_tmp1: type: fly checkpoint: /path/to/model_checkpoint_362000 resolution: 16 classes: - mito Blockwise processing ~~~~~~~~~~~~~~~~~~~~ .. code-block:: yaml data_path: /nrs/cellmap/data/jrc_mus-salivary-1/jrc_mus-salivary-1.zarr/recon-1/em/fibsem-uint8 output_path: /path/to/output.zarr task_name: cellmap_flow_mito_task charge_group: cellmap queue: gpu_h100 workers: 14 cpu_workers: 12 tmp_dir: /path/to/tmp models: - name: model_tmp1 type: fly channels: - mito checkpoint_path: /path/to/model_checkpoint_362000 input_size: [178, 178, 178] input_voxel_size: [16, 16, 16] output_size: [56, 56, 56] output_voxel_size: [16, 16, 16] bounding_boxes: - offset: [59611, 52237, 5627] shape: [4674, 11566, 10067] - offset: [64285, 26408, 15695] shape: [11626, 12405, 26847] json_data: input_norm: MinMaxNormalizer: invert: false max_value: 250 min_value: 0 LambdaNormalizer: expression: "x*2-1" postprocess: ThresholdPostprocessor: threshold: 0.5 Run blockwise processing with: .. code-block:: bash cellmap_flow_blockwise config.yaml cellmap_flow_blockwise config.yaml --log-level DEBUG