dev setup
contributing to the bv codebase or registry

getting started
You need Rust stable, Docker (for integration tests), and git.
```sh
git clone https://github.com/tejasprabhune/bv
cd bv
cargo build                                          # build all workspace crates
cargo test                                           # unit tests only (no Docker needed)
cargo test --test integration -- --include-ignored   # integration tests (needs Docker)
```
Set BV_CACHE_DIR to a temp path so each integration test run starts with a clean cache:
```sh
BV_CACHE_DIR=/tmp/bv-test cargo test --test integration -- --include-ignored
```
Build a release binary and put it on your PATH for manual testing:
```sh
cargo build --release -p bv-cli
export PATH="$PWD/target/release:$PATH"
bv doctor
```
workspace layout
| crate | what it is |
|---|---|
| bv-cli | the bv binary. All commands live here. Depends on everything below. |
| bv-core | shared types: Manifest, Lockfile, CacheLayout, error types. No I/O. |
| bv-index | IndexBackend trait plus GitIndex, which clones bv-registry and resolves tool lookups. |
| bv-runtime | ContainerRuntime trait plus DockerRuntime. Handles pull, run, inspect, image availability. |
| bv-runtime-apptainer | Apptainer/Singularity backend. Converts OCI images to SIF files, runs with --nv for GPU. |
| bv-types | The bv-types vocabulary: FASTA, FASTQ, BAM, VCF, and ~40 others. Used for typed I/O in manifests. |
| bv-conformance | Library for smoke-testing a tool's declared binaries. Used by bv conformance and the registry CI. |
| bv-builder | Builds factored OCI images from conda specs (one layer per package). Used only in registry CI, not shipped in the user-facing binary. |
| bv-bench | Install-path benchmark harness: compares bv against mamba, conda, and pixi on install time, footprint, and cold-run latency. |
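The index and runtime crates share one pattern: a small trait in the crate, with concrete backends behind it. A minimal sketch of that shape, assuming a simplified lookup method (the real IndexBackend signature in bv-index may differ):

```rust
// Hypothetical sketch of the trait-per-backend pattern used by bv-index
// and bv-runtime. StaticIndex is a stand-in for GitIndex, which in the
// real crate clones bv-registry before resolving lookups.
trait IndexBackend {
    /// Resolve a tool name to an image reference, if the index knows it.
    fn resolve(&self, tool: &str) -> Option<String>;
}

struct StaticIndex;

impl IndexBackend for StaticIndex {
    fn resolve(&self, tool: &str) -> Option<String> {
        match tool {
            "samtools" => Some("ghcr.io/tejasprabhune/bv-pkg/samtools:1.21.0".into()),
            _ => None,
        }
    }
}

fn main() {
    let index = StaticIndex;
    assert!(index.resolve("samtools").is_some());
    assert!(index.resolve("no-such-tool").is_none());
    println!("ok");
}
```

Because bv-cli only depends on the trait, a new backend (like the Apptainer runtime) slots in without touching command code.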
bv codebase
working on the bv CLI and runtime

dev commands
| command | when to use it |
|---|---|
| cargo build -p bv-cli | build just the CLI (faster than building the whole workspace) |
| cargo test -p bv-core | unit tests for a single crate |
| cargo test --test integration -- --include-ignored | full integration suite; needs a running Docker daemon |
| cargo clippy -- -D warnings | must be clean before opening a PR |
| cargo fmt --check | formatting check; run cargo fmt to fix |
| cargo deny check | license and vulnerability audit (uses deny.toml) |
adding a command
- Add a variant to the relevant Commands or sub-command enum in bv-cli/src/cli.rs.
- Add a pub (async) fn run(...) in bv-cli/src/commands/<name>.rs.
- Wire it in bv-cli/src/commands/mod.rs and the dispatch match in bv-cli/src/main.rs.
- Add an integration test in bv-cli/tests/integration.rs. Mark it #[ignore] if Docker is required.
All user-visible output goes to stderr. Only structured data (tables, JSON) goes to stdout.
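In code, that convention looks like the following (an illustration of the split, not an excerpt from bv-cli):

```rust
// Illustration of the stdout/stderr convention; the JSON payload here
// is made up, not a real bv command's output.
fn structured_output() -> String {
    // Structured output the user may pipe or parse goes to stdout.
    r#"{"tool":"samtools","status":"ok"}"#.to_string()
}

fn main() {
    // Progress and status messages go to stderr, so `bv ... | jq`
    // never sees them mixed into the data stream.
    eprintln!("pulling image...");
    println!("{}", structured_output());
}
```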
conformance (bv-conformance)
bv-conformance is the library behind bv conformance <tool>. It pulls the tool's image and smoke-checks each binary declared in [tool.binaries] by trying --version, -version, --help, -h, -v, version, and a bare invocation in order. A binary passes if any probe exits 0 or produces more than 30 bytes of output.
```sh
# run locally against the live registry
bv conformance samtools
bv conformance samtools --backend apptainer

# run against a local registry clone (useful when reviewing a new manifest)
bv conformance samtools --registry /path/to/bv-registry

# run against a specific image digest (skip the registry lookup)
bv conformance samtools --digest sha256:abc123...
```
The library is in bv-conformance/src/runner.rs. The entry point is runner::run(manifest, image_digest, runtime), which returns a ConformanceResult with per-binary pass/fail messages.
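The probe order and pass rule described above can be sketched as follows (a simplified model; the real runner executes each probe inside the tool's container):

```rust
// Simplified model of the bv-conformance probe loop: a binary passes if
// any probe exits 0 or writes more than 30 bytes of output. `run` is a
// stand-in for actually invoking the binary inside the container.
const PROBES: &[&str] = &["--version", "-version", "--help", "-h", "-v", "version", ""];

struct ProbeResult {
    exit_code: i32,
    output_len: usize,
}

fn binary_passes(mut run: impl FnMut(&str) -> ProbeResult) -> bool {
    PROBES.iter().any(|probe| {
        let r = run(probe);
        r.exit_code == 0 || r.output_len > 30
    })
}

fn main() {
    // A fake binary that only answers `--help`, with a long usage message.
    let passes = binary_passes(|probe| ProbeResult {
        exit_code: if probe == "--help" { 0 } else { 1 },
        output_len: if probe == "--help" { 120 } else { 0 },
    });
    assert!(passes);

    // A binary that answers no probe at all fails.
    assert!(!binary_passes(|_| ProbeResult { exit_code: 1, output_len: 0 }));
    println!("ok");
}
```

The lenient pass rule exists because many bioinformatics tools print usage to stderr and exit non-zero when run without arguments.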
In CI, conformance runs on every PR that touches tools/**, and on a weekly schedule for the full set of core tools. See .github/workflows/conformance.yml in bv-registry.
conformance sweep
A conformance sweep runs bv conformance --all against every tool in the registry to verify that each tool's declared binaries still respond to at least one probe. Run a sweep before merging changes that affect many tools, or after a builder upgrade.
prerequisites
- Docker running locally
- bv installed and on your PATH
- The registry cloned locally (only needed if you pass --registry)
running a sweep
```sh
# full sweep, skipping GPU-only tools and tools that require reference data
bv conformance --all --skip-gpu --skip-reference-data

# single tool
bv conformance samtools
```
output status codes
| status | meaning |
|---|---|
| PASS | all declared binaries responded to at least one probe |
| FAIL | one or more binaries did not respond to any probe |
| SKIP | tool was excluded by a flag (--skip-gpu, --skip-reference-data) or has no declared binaries |
| ERR | the tool's image could not be pulled or the container failed to start |
when a tool fails
- Check the tool's manifest conformance section to see which binaries are declared and what probes are expected.
- Rebuild the image using bv-builder or by re-running the build-factored.yml workflow.
- Re-run bv conformance <tool-name> to confirm the fix before opening a PR.
The conformance.yml CI workflow enforces this automatically on every PR that touches tools/**.
benchmark (bv-bench)
bv-bench compares bv against mamba, conda, and pixi across three metrics: install time, disk footprint, and cold-run latency. It uses two fixture suites:
- mac suite (default): all tools available on osx-arm64. mamba, conda, and pixi succeed on macOS.
- linux suite: includes Linux-only tools (blast, diamond, hmmer, mafft, seqkit 2.8.1). mamba/conda/pixi will fail on macOS for some fixtures, which illustrates bv's main advantage.
Each suite has three fixture sizes: 1 tool, 5 tools, and 10 or 20 tools.
```sh
# build the harness
cargo build --release -p bv-bench

# mac suite: bv + mamba + conda + pixi
./target/release/bv-bench --suite mac --mamba --conda --pixi

# linux suite: bv + mamba (pixi/conda will show n/a for linux-only fixtures on macOS)
./target/release/bv-bench --suite linux --mamba --conda --pixi

# output as JSON instead of a table
./target/release/bv-bench --suite mac --mamba --json-out results.json

# override the work directory (default: /tmp/bv-bench)
./target/release/bv-bench --suite mac --mamba --work-dir /tmp/my-bench
```
how footprint is measured
For bv, the harness reads image_size_bytes from the bv.lock file that bv add writes; this is the compressed OCI image size for each tool. For mamba, conda, and pixi, it reports the uncompressed size of the conda environment directory, so the two numbers are not directly comparable.
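As a rough illustration, a per-suite footprint is just the sum of the per-tool sizes (the entries below are made up; only the image_size_bytes field name comes from the description above):

```rust
// Illustrative only: sums per-tool compressed image sizes the way the
// harness aggregates image_size_bytes entries from bv.lock. The tool
// names and sizes here are invented for the example.
fn total_footprint(entries: &[(&str, u64)]) -> u64 {
    entries.iter().map(|(_, size)| size).sum()
}

fn main() {
    let lock = [("samtools", 48_000_000u64), ("seqtk", 9_500_000)];
    assert_eq!(total_footprint(&lock), 57_500_000);
    println!("ok");
}
```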
The fixtures are defined in bv-bench/src/fixture.rs. The install paths (bv, mamba, conda, pixi) are in bv-bench/src/main.rs.
bv-registry
adding tools, rebuilding images, understanding CI

layout
```
bv-registry/
  tools/        # manifests, one .toml per (tool, version)
    samtools/
      1.21.0.toml
  specs/        # build specs for the factored OCI builder
    samtools/
      1.21.0.toml
  web/          # static website (GitHub Pages)
  index.json    # pre-computed search index, updated by CI
  scripts/      # update-index.py, update-popularity.py
```
A tool entry needs both a manifest (tools/) and a build spec (specs/). The manifest describes the tool's I/O, hardware, and entrypoint. The spec tells the builder which conda packages to include and which platform to target.
adding a tool spec
For conda-based tools (the common case), add two files:
```sh
# 1. build spec: tells bv-builder what to build
cat > specs/seqtk/1.4.0.toml << 'EOF'
name = "seqtk"
version = "1.4.0"
channels = [
  "https://conda.anaconda.org/bioconda",
  "https://conda.anaconda.org/conda-forge",
]
packages = ["seqtk ==1.4.0"]
platform = "linux/amd64"

[entrypoint]
command = "/opt/conda/bin/seqtk"
EOF

# 2. tool manifest: describes the tool for users
cat > tools/seqtk/1.4.0.toml << 'EOF'
[tool]
id = "seqtk"
version = "1.4.0"
description = "seqtk: toolkit for processing FASTA/FASTQ sequences"
homepage = "https://github.com/lh3/seqtk"
license = "MIT"
tier = "community"
maintainers = ["github:lh3"]

[[tool.inputs]]
name = "reads"
type = "fastq"
cardinality = "one"
description = "Input reads"

[[tool.outputs]]
name = "reads_out"
type = "fastq"
cardinality = "one"
description = "Processed reads"

[tool.entrypoint]
command = "seqtk"
EOF
```
Push to a branch and open a PR. CI builds the image and runs conformance automatically. You do not need to run the builder locally.
The [tool.image] section (digest, reference) is written by CI after the build, not by hand. Leave it out of the manifest you submit.
image builder (bv-builder)
bv-builder turns a specs/<tool>/<version>.toml into a factored OCI image: one layer per conda package, pushed to ghcr.io/tejasprabhune/bv-pkg/<tool>:<version>. CI runs the builder automatically on every new or changed spec file.
You rarely need to run it locally, but when you do:
```sh
cargo build --release -p bv-builder

# resolve a spec to a pinned package list (dry run, no image built)
./target/release/bv-builder resolve specs/samtools/1.21.0.toml

# build and save as a tar archive (does not push)
./target/release/bv-builder build specs/samtools/1.21.0.toml --out /tmp/samtools.tar

# build and push to GHCR (needs GHCR_TOKEN or GITHUB_TOKEN in env)
./target/release/bv-builder build specs/samtools/1.21.0.toml --push
```
how it works
- resolve: downloads repodata.json for each channel, BFS-resolves transitive deps with numeric version comparison.
- build layers: downloads each conda package (.conda or .tar.bz2), extracts it into a reproducible tar layer. One layer per package.
- push: pushes all layers to GHCR, retrying on 429 rate-limit responses with exponential backoff (up to 8 attempts).
- update manifest: CI writes the pushed digest back to tools/<tool>/<version>.toml and commits.
Large tools (metaphlan4 has ~530 packages) take 20+ minutes to push due to GHCR's per-minute blob upload limit.
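The retry behaviour in the push step can be sketched as follows (an illustrative model with a made-up base delay; not the builder's actual code, and the sleep is omitted so the example runs instantly):

```rust
use std::time::Duration;

// Sketch of retry-on-429 with exponential backoff, capped at 8 attempts,
// as described for the push step. `push` stands in for one blob upload;
// Err carries an HTTP status code.
fn push_with_retry(mut push: impl FnMut() -> Result<(), u16>) -> Result<(), u16> {
    let mut delay = Duration::from_millis(500); // illustrative base delay
    for attempt in 1..=8 {
        match push() {
            Ok(()) => return Ok(()),
            // Retry only on 429 (rate limited), and only while attempts remain.
            Err(429) if attempt < 8 => {
                // The real builder would sleep for `delay` here.
                delay *= 2;
            }
            Err(code) => return Err(code),
        }
    }
    unreachable!("every path returns by attempt 8")
}

fn main() {
    // Fails with 429 twice, then succeeds on the third attempt.
    let mut calls = 0;
    let result = push_with_retry(|| {
        calls += 1;
        if calls < 3 { Err(429) } else { Ok(()) }
    });
    assert!(result.is_ok());
    assert_eq!(calls, 3);

    // A non-rate-limit error is returned immediately.
    assert_eq!(push_with_retry(|| Err(500)), Err(500));
    println!("ok");
}
```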
CI workflows
| workflow | trigger | what it does |
|---|---|---|
| build-factored.yml | push to specs/**, or manual dispatch | builds and pushes OCI images for changed specs; updates digest in tools/ |
| conformance.yml | PR touching tools/**, weekly schedule | smoke-tests each declared binary using bv conformance |
| update-index.yml | push to tools/** | regenerates index.json for the website and bv search |
| update-popularity.yml | weekly schedule | refreshes download counts from conda-stats |
| pages.yml | push to web/** | deploys the static site to GitHub Pages |
rerunning a failed build
Build failures are usually transient (network error downloading a conda package, or GHCR rate limit on large images). To rerun a single tool:
```sh
gh workflow run build-factored.yml \
  --repo tejasprabhune/bv-registry \
  -f spec=specs/metaphlan4/4.1.1.toml
```
To rebuild all tools at once (after a breaking change to the builder, for example):
```sh
gh workflow run build-factored.yml --repo tejasprabhune/bv-registry
```
The detect job runs first and lists every specs/**/*.toml; each spec becomes a parallel matrix job.
conventions
code style
- Rust edition 2024; let chains are used throughout.
- All user-visible output goes to stderr. Only table data and machine-readable output go to stdout.
- Color output uses owo_colors with if_supports_color; ANSI is stripped automatically in CI.
- No em-dashes in comments or strings.
- No // ---- separator blocks or banner comments. Doc comments (///) on public items are welcome.
- No multi-line comments where a well-named function or type does the same job.
- No backwards-compatibility shims for removed code. Delete cleanly.
PR conventions
- One logical change per PR.
- Integration tests must pass: cargo test --test integration -- --include-ignored.
- cargo clippy -- -D warnings must be clean.
- cargo fmt --check must pass.
- New commands need an integration test. New crate-level features need a unit test.
- Registry manifests: include a build spec (specs/) and a tool manifest (tools/). Do not hand-write the image digest; CI fills it in.