1. Postprocessing

[1]:
from pathlib import Path
import sys
import os
import pickle

import veloev as vev

To begin, define the benchmark_info configuration dictionary, which controls the inputs and parameters of the evaluation pipeline. Apart from methods and methods_map, every value is a list with one entry per dataset, in the same order as datasets_name. The keys are:

  • methods (list): A list of method names. These names must match the velocity layer keys in adata.layers and the method identifiers in the filenames.

  • methods_map (dict, optional): A mapping used for visualization, from the internal method names (keys) to their display labels (values).

  • datasets_name (list): A list of dataset names. Each name must match a folder name in your data directory.

  • tasks (list): The task to run on each dataset. The package supports eight tasks: directional, temporal, directional_temporal, negative_control, seq_depth_directional, seq_depth_temporal, seq_depth_directional_temporal, and simulation.

  • k_fold (list): The number of cross-validation folds for each dataset. Use 0 to evaluate on the full data.

  • cluster_key (list): For each dataset, the column in adata.obs that holds cluster annotations. Required for directional tasks; use None when a dataset does not need it.

  • time_key (list): For each dataset, the column in adata.obs that holds latent time or pseudotime. Required for temporal tasks; use None when a dataset does not need it.

  • cell_type_transitions (list): For each directional dataset, the ground-truth transitions between cell types as (source, target) pairs; None for other datasets.

  • time_transitions (list): For each temporal dataset, the ground-truth transitions between time points as (earlier, later) pairs; None for other datasets.
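These rules can be checked mechanically. The sketch below is a hypothetical helper, not part of veloev (the package's own validation is vev.pp.check_save_summarize_info, used further down). It assumes every per-dataset key is a list aligned with datasets_name, and it treats any task whose name contains "directional" or "temporal" as needing cluster_key or time_key, respectively.

```python
# Hypothetical helper, NOT part of veloev: a sketch of the rules listed above.
# Assumes every per-dataset key holds a list aligned with 'datasets_name'.
SUPPORTED_TASKS = {
    "directional", "temporal", "directional_temporal", "negative_control",
    "seq_depth_directional", "seq_depth_temporal",
    "seq_depth_directional_temporal", "simulation",
}
PER_DATASET_KEYS = ["tasks", "k_fold", "cluster_key", "time_key",
                    "cell_type_transitions", "time_transitions"]


def sketch_check_benchmark_info(info: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    missing = [k for k in ["methods", "datasets_name", *PER_DATASET_KEYS]
               if k not in info]
    if missing:
        return [f"missing key: {k}" for k in missing]

    n = len(info["datasets_name"])
    # Every per-dataset list must have exactly one entry per dataset.
    problems = [f"{k} has {len(info[k])} entries, expected {n}"
                for k in PER_DATASET_KEYS if len(info[k]) != n]
    if problems:
        return problems

    for i, name in enumerate(info["datasets_name"]):
        task, k = info["tasks"][i], info["k_fold"][i]
        if not (isinstance(k, int) and k >= 0):
            problems.append(f"{name}: k_fold must be a non-negative int (0 = full data)")
        if task not in SUPPORTED_TASKS:
            problems.append(f"{name}: unsupported task {task!r}")
            continue  # the key checks below only make sense for known tasks
        # Task names containing 'directional' / 'temporal' need the matching key.
        if "directional" in task and info["cluster_key"][i] is None:
            problems.append(f"{name}: directional task needs a cluster_key")
        if "temporal" in task and info["time_key"][i] is None:
            problems.append(f"{name}: temporal task needs a time_key")
    return problems
```

Applied to the example configuration below, this sketch returns an empty list.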

If you are using our toy example (available via this link), you can use the configuration below directly. For custom datasets, please adapt the dictionary to follow this template.

[2]:
benchmark_info = {
    # Methods to benchmark (velocity layer keys) and their display labels
    'methods': ['sdevelo', 'unitvelo_uni', 'velocyto'],
    'methods_map': {'sdevelo': 'SDEvelo',
                    'unitvelo_uni': 'UniTVelo (uni)',
                    'velocyto': 'Velocyto'},
    # Each of the following lists has one entry per dataset, in this order
    'datasets_name': ['01_bone_marrow',
                      '11_fucci_u2os',
                      '17_pbmc68k'],
    'tasks': ['directional',
              'temporal',
              'negative_control'],
    'k_fold': [3, 3, 5],  # to use the full data for a dataset, set its k_fold to 0
    'cluster_key': ['clusters', None, 'celltype'],
    'time_key': [None, 'dtime', None],
    'cell_type_transitions': [[("HSC_1", "Ery_1"), ("HSC_1", "HSC_2"), ("Ery_1", "Ery_2")],
                              None,
                              None],
    'time_transitions': [None,
                         [(0, 1), (1, 2), (2, 3), (3, 4)],
                         None],
}
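Because the per-dataset settings are spread across parallel lists, it can help to regroup them by dataset when reading or editing the configuration. The helper below is hypothetical and not part of veloev; it only assumes the keys shown in the cell above.

```python
# Hypothetical helper, NOT part of veloev: regroups the aligned lists in
# benchmark_info into one record per dataset, to make the alignment visible.
PER_DATASET_KEYS = ["tasks", "k_fold", "cluster_key", "time_key",
                    "cell_type_transitions", "time_transitions"]


def per_dataset_records(info: dict) -> list[dict]:
    """Return one dict per dataset holding its entry from each aligned list."""
    return [
        {"dataset": name, **{key: info[key][i] for key in PER_DATASET_KEYS}}
        for i, name in enumerate(info["datasets_name"])
    ]


# A one-dataset excerpt of the configuration above
excerpt = {
    "datasets_name": ["11_fucci_u2os"],
    "tasks": ["temporal"], "k_fold": [3],
    "cluster_key": [None], "time_key": ["dtime"],
    "cell_type_transitions": [None],
    "time_transitions": [[(0, 1), (1, 2), (2, 3), (3, 4)]],
}
for record in per_dataset_records(excerpt):
    print(record["dataset"], record["tasks"], record["k_fold"], record["time_key"])
    # prints: 11_fucci_u2os temporal 3 dtime
```

Called on the full benchmark_info above, it yields three records; the first pairs 01_bone_marrow with the directional task, 3 folds, and the cluster key 'clusters'.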

Next, validate the contents of benchmark_info and summarize the benchmark configuration with vev.pp.check_save_summarize_info. As its output shows, the function also saves the configuration to benchmark_info.pkl.

[3]:
vev.pp.check_save_summarize_info(benchmark_info)

==================================================
🔹 STEP 1: VALIDATION CHECK
==================================================
✅ Optional Check: 'methods_map' correctly covers all methods.
✅ Validation Successful: Configuration is valid.
💾 File Saved: benchmark_info.pkl

==================================================
🔹 STEP 2: SUMMARY REPORT
==================================================
• Total Methods:  3
  └─ sdevelo (SDEvelo), unitvelo_uni (UniTVelo (uni)), velocyto (Velocyto)
• Total Datasets: 3

📋 Dataset Summary:
   Dataset Task       Dataset Names   Count
0  directional        01_bone_marrow  1
1  temporal           11_fucci_u2os   1
2  negative_control   17_pbmc68k      1
[4]:
# Reload the configuration saved by check_save_summarize_info
with open("benchmark_info.pkl", "rb") as f:
    benchmark_info = pickle.load(f)
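Reloading from benchmark_info.pkl lets a later session reuse the validated configuration without redefining it. A minimal round trip with the standard pickle module is sketched below. The helper names are hypothetical, and it is an assumption that veloev writes the file with a plain pickle.dump; the cell above only confirms that pickle.load reads it back.

```python
import pickle
import tempfile
from pathlib import Path


# Hypothetical helpers, NOT part of veloev: a plain pickle round trip.
def save_benchmark_info(info: dict, path: Path) -> None:
    with open(path, "wb") as f:
        pickle.dump(info, f)


def load_benchmark_info(path: Path) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)


# Demonstrate in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "benchmark_info.pkl"
    original = {"methods": ["velocyto"], "k_fold": [0]}
    save_benchmark_info(original, path)
    restored = load_benchmark_info(path)
    print(restored == original)  # True
```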

You are now ready to begin post-processing.
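Because each dataset name must match a folder in your data directory, a quick pre-flight check can catch typos before a long run. The helper below is hypothetical and not part of veloev; it assumes base_dir is that data directory and only checks for the folders, not for the files inside them.

```python
import tempfile
from pathlib import Path


# Hypothetical pre-flight check, NOT part of veloev. Assumes base_dir is the
# data directory that holds one folder per entry in datasets_name.
def missing_dataset_dirs(info: dict, base_dir: str) -> list[str]:
    """Return the dataset names that have no matching folder under base_dir."""
    root = Path(base_dir)
    return [name for name in info["datasets_name"] if not (root / name).is_dir()]


# Demonstrate in a throwaway directory containing only one dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "01_bone_marrow").mkdir()
    info = {"datasets_name": ["01_bone_marrow", "11_fucci_u2os"]}
    print(missing_dataset_dirs(info, tmp))  # ['11_fucci_u2os']
```

In this tutorial the call would be missing_dataset_dirs(benchmark_info, './demo'); an empty list means every dataset has a folder.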

[5]:
# base_dir: data directory with one folder per dataset; n_jobs: number of parallel jobs
vev.pp.run_postprocessing(benchmark_info, base_dir='./demo', n_jobs=20)

🚀 Starting post-processing for 3 datasets...
Processing 11_fucci_u2os:  33%|███▎      | 1/3 [00:16<00:33, 16.61s/dataset]
✅ Successfully processed 01_bone_marrow.
Processing 17_pbmc68k:  67%|██████▋   | 2/3 [00:22<00:10, 10.31s/dataset]
✅ Successfully processed 11_fucci_u2os.
Processing 17_pbmc68k: 100%|██████████| 3/3 [02:20<00:00, 46.84s/dataset]
✅ Successfully processed 17_pbmc68k.

✅ Post-processing completed.

In the next tutorial, we will cover the evaluation process!