dev setup
contributing to the bv codebase or registry

getting started
You need Rust stable, Docker (for integration tests), and git.
```sh
git clone https://github.com/tejasprabhune/bv
cd bv
cargo build                                          # build all workspace crates
cargo test                                           # unit tests only (no Docker needed)
cargo test --test integration -- --include-ignored   # integration tests (needs Docker)
```
Set BV_CACHE_DIR to a temp path so each integration test run starts with a clean cache:
```sh
BV_CACHE_DIR=/tmp/bv-test cargo test --test integration -- --include-ignored
```
Build a release binary and put it on your PATH for manual testing:
```sh
cargo build --release -p bv-cli
export PATH="$PWD/target/release:$PATH"
bv doctor
```
workspace layout
| crate | what it is |
|---|---|
| bv-cli | the bv binary. All commands live here. Depends on everything below. |
| bv-core | shared types: Manifest, Lockfile, CacheLayout, error types. No I/O. |
| bv-index | IndexBackend trait plus GitIndex, which clones bv-registry and resolves tool lookups. |
| bv-runtime | ContainerRuntime trait plus DockerRuntime. Handles pull, run, inspect, image availability. |
| bv-runtime-apptainer | Apptainer/Singularity backend. Converts OCI images to SIF files, runs with --nv for GPU. |
| bv-types | The bv-types vocabulary: FASTA, FASTQ, BAM, VCF, and ~40 others. Used for typed I/O in manifests. |
| bv-conformance | Library for smoke-testing a tool's declared binaries. Used by bv conformance and the registry CI. |
| bv-builder | Builds factored OCI images from conda specs (one layer per package). Used only in registry CI, not shipped in the user-facing binary. |
| bv-bench | Install-path benchmark harness: compares bv against mamba, conda, and pixi on install time, footprint, and cold-run latency. |
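The index and runtime crates share one pattern: a small trait in the crate, with concrete backends behind it. A minimal sketch of that shape, assuming a simplified lookup method (the real IndexBackend signature in bv-index may differ):

```rust
// Hypothetical sketch of the trait-per-backend pattern used by bv-index
// and bv-runtime. StaticIndex is a stand-in for GitIndex, which in the
// real crate clones bv-registry before resolving lookups.
trait IndexBackend {
    /// Resolve a tool name to an image reference, if the index knows it.
    fn resolve(&self, tool: &str) -> Option<String>;
}

struct StaticIndex;

impl IndexBackend for StaticIndex {
    fn resolve(&self, tool: &str) -> Option<String> {
        match tool {
            "samtools" => Some("ghcr.io/tejasprabhune/bv-pkg/samtools:1.21.0".into()),
            _ => None,
        }
    }
}

fn main() {
    let index = StaticIndex;
    assert!(index.resolve("samtools").is_some());
    assert!(index.resolve("no-such-tool").is_none());
    println!("ok");
}
```

Because bv-cli only depends on the trait, a new backend (like the Apptainer runtime) slots in without touching command code.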
bv codebase
working on the bv CLI and runtime

dev commands
| command | when to use it |
|---|---|
| cargo build -p bv-cli | build just the CLI (faster than building the whole workspace) |
| cargo test -p bv-core | unit tests for a single crate |
| cargo test --test integration -- --include-ignored | full integration suite; needs a running Docker daemon |
| cargo clippy -- -D warnings | must be clean before opening a PR |
| cargo fmt --check | formatting check; run cargo fmt to fix |
| cargo deny check | license and vulnerability audit (uses deny.toml) |
adding a command
- Add a variant to the relevant Commands or sub-command enum in bv-cli/src/cli.rs.
- Add a pub (async) fn run(...) in bv-cli/src/commands/<name>.rs.
- Wire it in bv-cli/src/commands/mod.rs and the dispatch match in bv-cli/src/main.rs.
- Add an integration test in bv-cli/tests/integration.rs. Mark it #[ignore] if Docker is required.
All user-visible output goes to stderr. Only structured data (tables, JSON) goes to stdout.
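In code, that convention looks like the following (an illustration of the split, not an excerpt from bv-cli):

```rust
// Illustration of the stdout/stderr convention; the JSON payload here
// is made up, not a real bv command's output.
fn structured_output() -> String {
    // Structured output the user may pipe or parse goes to stdout.
    r#"{"tool":"samtools","status":"ok"}"#.to_string()
}

fn main() {
    // Progress and status messages go to stderr, so `bv ... | jq`
    // never sees them mixed into the data stream.
    eprintln!("pulling image...");
    println!("{}", structured_output());
}
```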
conformance (bv-conformance)
bv-conformance is the library behind bv conformance <tool>. It pulls the tool's image and smoke-checks each binary declared in [tool.binaries] by trying --version, -version, --help, -h, -v, version, and a bare invocation in order. A binary passes if any probe exits 0 or produces more than 30 bytes of output.
```sh
# run locally against the live registry
bv conformance samtools
bv conformance samtools --backend apptainer

# run against a local registry clone (useful when reviewing a new manifest)
bv conformance samtools --registry /path/to/bv-registry

# run against a specific image digest (skip the registry lookup)
bv conformance samtools --digest sha256:abc123...
```
The library is in bv-conformance/src/runner.rs. The entry point is runner::run(manifest, image_digest, runtime), which returns a ConformanceResult with per-binary pass/fail messages.
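The probe order and pass rule described above can be sketched as follows (a simplified model; the real runner executes each probe inside the tool's container):

```rust
// Simplified model of the bv-conformance probe loop: a binary passes if
// any probe exits 0 or writes more than 30 bytes of output. `run` is a
// stand-in for actually invoking the binary inside the container.
const PROBES: &[&str] = &["--version", "-version", "--help", "-h", "-v", "version", ""];

struct ProbeResult {
    exit_code: i32,
    output_len: usize,
}

fn binary_passes(mut run: impl FnMut(&str) -> ProbeResult) -> bool {
    PROBES.iter().any(|probe| {
        let r = run(probe);
        r.exit_code == 0 || r.output_len > 30
    })
}

fn main() {
    // A fake binary that only answers `--help`, with a long usage message.
    let passes = binary_passes(|probe| ProbeResult {
        exit_code: if probe == "--help" { 0 } else { 1 },
        output_len: if probe == "--help" { 120 } else { 0 },
    });
    assert!(passes);

    // A binary that answers no probe at all fails.
    assert!(!binary_passes(|_| ProbeResult { exit_code: 1, output_len: 0 }));
    println!("ok");
}
```

The lenient pass rule exists because many bioinformatics tools print usage to stderr and exit non-zero when run without arguments.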
In CI, conformance runs on every PR that touches tools/**, and on a weekly schedule for the full set of core tools. See .github/workflows/conformance.yml in bv-registry.
conformance sweep
A conformance sweep runs bv conformance --all against every tool in the registry to verify that each tool's declared binaries still respond to at least one probe. Run a sweep before merging changes that affect many tools, or after a builder upgrade.
prerequisites
- Docker running locally
- bv installed and on your PATH
- The registry cloned locally (only needed if you pass --registry)
running a sweep
```sh
# full sweep, skipping GPU-only tools and tools that require reference data
bv conformance --all --skip-gpu --skip-reference-data

# single tool
bv conformance samtools
```
output status codes
| status | meaning |
|---|---|
| PASS | all declared binaries responded to at least one probe |
| FAIL | one or more binaries did not respond to any probe |
| SKIP | tool was excluded by a flag (--skip-gpu, --skip-reference-data) or has no declared binaries |
| ERR | the tool's image could not be pulled or the container failed to start |
when a tool fails
- Check the tool's manifest conformance section to see which binaries are declared and what probes are expected.
- Rebuild the image using bv-builder or by re-running the build-factored.yml workflow.
- Re-run bv conformance <tool-name> to confirm the fix before opening a PR.
The conformance.yml CI workflow enforces this automatically on every PR that touches tools/**.
benchmark (bv-bench)
bv-bench compares bv against mamba, conda, and pixi across three metrics: install time, disk footprint, and cold-run latency. It uses two fixture suites:
- mac suite (default): all tools available on osx-arm64. mamba, conda, and pixi succeed on macOS.
- linux suite: includes Linux-only tools (blast, diamond, hmmer, mafft, seqkit 2.8.1). mamba/conda/pixi will fail on macOS for some fixtures, which illustrates bv's main advantage.
Each suite has three fixture sizes: 1 tool, 5 tools, and 10 or 20 tools.
```sh
# build the harness
cargo build --release -p bv-bench

# mac suite: bv + mamba + conda + pixi
./target/release/bv-bench --suite mac --mamba --conda --pixi

# linux suite: bv + mamba (pixi/conda will show n/a for linux-only fixtures on macOS)
./target/release/bv-bench --suite linux --mamba --conda --pixi

# output as JSON instead of a table
./target/release/bv-bench --suite mac --mamba --json-out results.json

# override the work directory (default: /tmp/bv-bench)
./target/release/bv-bench --suite mac --mamba --work-dir /tmp/my-bench
```
how footprint is measured
For bv, the harness reads image_size_bytes from the bv.lock file that bv add writes; this is the compressed OCI image size for each tool. For mamba, conda, and pixi, it reports the uncompressed size of the conda environment directory, so the two numbers are not directly comparable.
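As a rough illustration, a per-suite footprint is just the sum of the per-tool sizes (the entries below are made up; only the image_size_bytes field name comes from the description above):

```rust
// Illustrative only: sums per-tool compressed image sizes the way the
// harness aggregates image_size_bytes entries from bv.lock. The tool
// names and sizes here are invented for the example.
fn total_footprint(entries: &[(&str, u64)]) -> u64 {
    entries.iter().map(|(_, size)| size).sum()
}

fn main() {
    let lock = [("samtools", 48_000_000u64), ("seqtk", 9_500_000)];
    assert_eq!(total_footprint(&lock), 57_500_000);
    println!("ok");
}
```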
The fixtures are defined in bv-bench/src/fixture.rs. The install paths (bv, mamba, conda, pixi) are in bv-bench/src/main.rs.
bv-registry
adding tools, rebuilding images, understanding CI

layout
```
bv-registry/
  tools/        # manifests, one .toml per (tool, version)
    samtools/
      1.21.0.toml
  specs/        # build specs for the factored OCI builder
    samtools/
      1.21.0.toml
  web/          # static website (GitHub Pages)
  index.json    # pre-computed search index, updated by CI
  scripts/      # update-index.py, update-popularity.py
```
A tool entry needs both a manifest (tools/) and a build spec (specs/). The manifest describes the tool's I/O, hardware, and entrypoint. The spec tells the builder which conda packages to include and which platform to target.
adding a tool spec
For conda-based tools (the common case), add two files:
```sh
# 1. build spec: tells bv-builder what to build
cat > specs/seqtk/1.4.0.toml << 'EOF'
name = "seqtk"
version = "1.4.0"
channels = [
  "https://conda.anaconda.org/bioconda",
  "https://conda.anaconda.org/conda-forge",
]
packages = ["seqtk ==1.4.0"]
platform = "linux/amd64"

[entrypoint]
command = "/opt/conda/bin/seqtk"
EOF

# 2. tool manifest: describes the tool for users
cat > tools/seqtk/1.4.0.toml << 'EOF'
[tool]
id = "seqtk"
version = "1.4.0"
description = "seqtk: toolkit for processing FASTA/FASTQ sequences"
homepage = "https://github.com/lh3/seqtk"
license = "MIT"
tier = "community"
maintainers = ["github:lh3"]

[[tool.inputs]]
name = "reads"
type = "fastq"
cardinality = "one"
description = "Input reads"

[[tool.outputs]]
name = "reads_out"
type = "fastq"
cardinality = "one"
description = "Processed reads"

[tool.entrypoint]
command = "seqtk"
EOF
```
Push to a branch and open a PR. CI builds the image and runs conformance automatically. You do not need to run the builder locally.
The [tool.image] section (digest, reference) is written by CI after the build, not by hand. Leave it out of the manifest you submit.
image builder (bv-builder)
bv-builder turns a specs/<tool>/<version>.toml into a factored OCI image: one layer per conda package, pushed to ghcr.io/tejasprabhune/bv-pkg/<tool>:<version>. CI runs the builder automatically on every new or changed spec file.
You rarely need to run it locally, but when you do:
```sh
cargo build --release -p bv-builder

# resolve a spec to a pinned package list (dry run, no image built)
./target/release/bv-builder resolve specs/samtools/1.21.0.toml

# build and save as a tar archive (does not push)
./target/release/bv-builder build specs/samtools/1.21.0.toml --out /tmp/samtools.tar

# build and push to GHCR (needs GHCR_TOKEN or GITHUB_TOKEN in env)
./target/release/bv-builder build specs/samtools/1.21.0.toml --push
```
how it works
- resolve: downloads repodata.json for each channel, BFS-resolves transitive deps with numeric version comparison.
- build layers: downloads each conda package (.conda or .tar.bz2), extracts it into a reproducible tar layer. One layer per package.
- push: pushes all layers to GHCR, retrying on 429 rate-limit responses with exponential backoff (up to 8 attempts).
- update manifest: CI writes the pushed digest back to tools/<tool>/<version>.toml and commits.
Large tools (metaphlan4 has ~530 packages) take 20+ minutes to push due to GHCR's per-minute blob upload limit.
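The retry behaviour in the push step can be sketched as follows (an illustrative model with a made-up base delay; not the builder's actual code, and the sleep is omitted so the example runs instantly):

```rust
use std::time::Duration;

// Sketch of retry-on-429 with exponential backoff, capped at 8 attempts,
// as described for the push step. `push` stands in for one blob upload;
// Err carries an HTTP status code.
fn push_with_retry(mut push: impl FnMut() -> Result<(), u16>) -> Result<(), u16> {
    let mut delay = Duration::from_millis(500); // illustrative base delay
    for attempt in 1..=8 {
        match push() {
            Ok(()) => return Ok(()),
            // Retry only on 429 (rate limited), and only while attempts remain.
            Err(429) if attempt < 8 => {
                // The real builder would sleep for `delay` here.
                delay *= 2;
            }
            Err(code) => return Err(code),
        }
    }
    unreachable!("every path returns by attempt 8")
}

fn main() {
    // Fails with 429 twice, then succeeds on the third attempt.
    let mut calls = 0;
    let result = push_with_retry(|| {
        calls += 1;
        if calls < 3 { Err(429) } else { Ok(()) }
    });
    assert!(result.is_ok());
    assert_eq!(calls, 3);

    // A non-rate-limit error is returned immediately.
    assert_eq!(push_with_retry(|| Err(500)), Err(500));
    println!("ok");
}
```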
CI workflows
| workflow | trigger | what it does |
|---|---|---|
| build-factored.yml | push to specs/**, or manual dispatch | builds and pushes OCI images for changed specs; updates digest in tools/ |
| conformance.yml | PR touching tools/**, weekly schedule | smoke-tests each declared binary using bv conformance |
| update-index.yml | push to tools/** | regenerates index.json for the website and bv search |
| update-popularity.yml | weekly schedule | refreshes download counts from conda-stats |
| pages.yml | push to web/** | deploys the static site to GitHub Pages |
rerunning a failed build
Build failures are usually transient (network error downloading a conda package, or GHCR rate limit on large images). To rerun a single tool:
```sh
gh workflow run build-factored.yml \
  --repo tejasprabhune/bv-registry \
  -f spec=specs/metaphlan4/4.1.1.toml
```
To rebuild all tools at once (after a breaking change to the builder, for example):
```sh
gh workflow run build-factored.yml --repo tejasprabhune/bv-registry
```
The detect job runs first and lists every specs/**/*.toml; each spec becomes a parallel matrix job.
conventions
code style
- Rust edition 2024; let chains are used throughout.
- All user-visible output goes to stderr. Only table data and machine-readable output go to stdout.
- Color output uses owo_colors with if_supports_color; ANSI is stripped automatically in CI.
- No em-dashes in comments or strings.
- No // ---- separator blocks or banner comments. Doc comments (///) on public items are welcome.
- No multi-line comments where a well-named function or type does the same job.
- No backwards-compatibility shims for removed code. Delete cleanly.
PR conventions
- One logical change per PR.
- Integration tests must pass: cargo test --test integration -- --include-ignored.
- cargo clippy -- -D warnings must be clean.
- cargo fmt --check must pass.
- New commands need an integration test. New crate-level features need a unit test.
- Registry manifests: include a build spec (specs/) and a tool manifest (tools/). Do not hand-write the image digest; CI fills it in.