Each example is a self-contained directory under
examples/ in the registry repo, with a pinned
bv.toml, a bv.lock, and a runnable script.
bv sync reproduces the exact bytes on any machine; the example's
run.sh drives the pipeline. The examples come in two flavors:
bv-authored pipelines
Germline SNV and indel calling from short reads, mirroring the
GATK best-practices recipe through annotated VCF.
Tools: fastp, bwa-mem2, samtools, picard, gatk4, bcftools, vcftools, ensembl-vep, multiqc
examples/variant-calling →
Selective-alignment quantification with Salmon. Output drops straight
into tximport / DESeq2.
Tools: fastp, salmon, samtools, multiqc
examples/rnaseq-bulk →
Oxford Nanopore genome assembly + polish + QC. Flye, then medaka, then
BUSCO and QUAST to score the result.
Tools: chopper, nanostat, flye, minimap2, medaka, busco, quast, samtools
examples/longread-assembly →
Shotgun metagenomics: read-based taxonomy (kraken2/bracken,
metaphlan4), function (humann), assembly + binning (megahit,
metabat2), and MAG QC (checkm2, gtdb-tk, bakta).
Tools: fastp, kraken2, bracken, metaphlan4, humann, megahit, metabat2, checkm2, gtdb-tk, bakta, multiqc
examples/metagenomics →
Backbone → ProteinMPNN → ColabFold → TM-align filter → Foldseek search
for natural homologs. Uses bv's GPU-layer sharing across MPNN and
ColabFold.
Tools: proteinmpnn, colabfold, foldseek, tmalign, usalign
examples/protein-design →
"What does this protein do?" Fold first with ColabFold, then search by
structure with Foldseek and re-rank with US-align.
Tools: colabfold, foldseek, tmalign, usalign, mmseqs2
examples/structure-search →
Genomes → orthogroups → per-OG MSA + trimming + ML gene trees →
concatenated species tree with bootstrap support.
Tools: prodigal, orthofinder, mafft, trimal, iqtree2, fasttree, treetime
examples/phylo-pipeline →
Single-end ChIP-seq with input control: bowtie2 → dedup → MACS3 narrow
peaks → HOMER motifs → deepTools coverage tracks.
Tools: fastp, bowtie2, samtools, picard, macs3, homer, deeptools, multiqc
examples/chipseq →
Bacterial AMR profiling from short reads: assembly with SPAdes,
annotation with bakta, and AMR gene detection with three tools
(abricate, AMRFinder+, RGI/CARD) for cross-validation.
Tools: fastp, spades, bakta, abricate, amrfinderplus, rgi
examples/amr-surveillance →
Ports of published pipelines
Each port pins exact tool versions for one published paper so the
pipeline is reproducible by digest. The science still belongs
to the original authors: cite their papers, not bv.
Foldseek (2024)
van Kempen et al., Nat Biotechnol
Reproduces the SCOPe40 structure-search benchmarks: fold queries with
ColabFold, search with Foldseek and MMseqs2, ground-truth with
US-align.
Tools: colabfold, foldseek, mmseqs2, tmalign, usalign
examples/papers/foldseek-2024 →
RFdiffusion (2023) - validation half
Watson et al., Nature
The half of the design loop that's fully reproducible without
RFdiffusion's custom kernels: ProteinMPNN sequence design + ColabFold
validation + structural-novelty Foldseek search.
Tools: proteinmpnn, colabfold, foldseek, tmalign, usalign
examples/papers/rfdiffusion-2023 →
CheckM2 (2023)
Chklovski et al., Nat Methods
Recovers MAGs from a metagenome and scores them with CheckM2's ML
quality classifier. Goes through the full assemble + bin + score +
dereplicate + GTDB place + annotate pipeline so the comparison with
CheckM1 can be reproduced locally.
Tools: fastp, megahit, bwa-mem2, metabat2, checkm2, drep, gtdb-tk, bakta, samtools
examples/papers/checkm2-2023 →
Nextstrain (2018)
Hadfield et al., Bioinformatics
The phylogenetic build layer that augur orchestrates, replaced with
direct bv exec calls so every step is digest-pinned. Aligns,
infers an ML tree, dates it with treetime, and assigns clades.
Tools: mafft, iqtree2, treetime, nextclade, minimap2, mash, seqkit, samtools
examples/papers/nextstrain-2018 →
More candidates in
SHORTLIST.md
(Tara Oceans, GATK best-practices, ESM-Atlas, EMP 16S meta-analysis,
HOMER ChIP-seq motifs). Pull requests welcome.
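
Every example listed here is described as shipping the same three files (bv.toml, bv.lock, run.sh), so a quick layout check before running one can save a confusing failure. The following is a minimal sketch, not part of bv: example_ready is a hypothetical helper name, and the usage comment assumes that bv sync and run.sh are invoked without arguments from inside the example directory, which this page does not confirm.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with bv): returns 0 only when DIR
# contains the three files each example is described as having.
example_ready() {
    dir=$1
    for f in bv.toml bv.lock run.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# Usage, from a checkout of the registry repo. The exact arguments to
# `bv sync` and run.sh are an assumption; check the example's own notes.
#   example_ready examples/variant-calling &&
#     (cd examples/variant-calling && bv sync && sh run.sh)
```

The check only looks at file presence; it does not validate that bv.lock matches bv.toml, which is left to bv sync.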