for end users
running tools in your project
tl;dr
# 1. install a runtime (pick one)
curl -fsSL https://get.docker.com/rootless | sh # docker, on a laptop
conda install -c conda-forge apptainer # apptainer, on HPC
# 2. install bv
curl -fsSL https://raw.githubusercontent.com/tejasprabhune/bv/main/install.sh | sh
# 3. add a tool, call its binary, commit the lockfile
mkdir myproj && cd myproj
bv add blast
bv run blastn -version
git add bv.toml bv.lock && git commit -m "pin tools"
# 4. on any other machine: same images, by digest
bv sync
install
bv needs Docker or Apptainer/Singularity, plus git.
Docker is typical on a laptop. Use the rootless installer when you can; on a GPU box install nvidia-container-toolkit too.
Apptainer is typical on shared HPC nodes since it does not need a daemon or root. Install with conda: conda install -c conda-forge apptainer.
Then install bv itself:
curl -fsSL https://raw.githubusercontent.com/tejasprabhune/bv/main/install.sh | sh
# or, with cargo:
cargo install biov
Verify with bv doctor, which prints the runtimes it can see and any missing pieces.
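A doctor-style runtime check reduces to probing PATH and running a version flag. A minimal sketch with hypothetical helper names (the real bv doctor also checks GPU support and project state):

```python
import shutil
import subprocess

def detect_runtimes():
    """Container runtimes visible on PATH (what a doctor-style check reports)."""
    found = {}
    for name in ("docker", "apptainer", "singularity"):
        path = shutil.which(name)
        if path:
            found[name] = path
    return found

def probe_version(binary):
    """Best-effort version string for a binary; None if missing or failing."""
    try:
        out = subprocess.run([binary, "--version"],
                             capture_output=True, text=True, timeout=10)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    if out.returncode != 0:
        return None
    return (out.stdout or out.stderr).strip()
```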
core commands
| command | what it does |
|---|---|
bv add <tool>[@ver] [--jobs N] | resolve from the registry, pull the image, write bv.toml and bv.lock, generate shims. --jobs controls concurrent image pulls (default: min(8, num_cpus)); also settable via BV_JOBS |
bv run <binary> <args> | run a binary inside its container; current directory is mounted at /workspace |
bv exec [--no-sync] <cmd> | run any command with the project's binaries on PATH (good for scripts, Make, Snakemake, CI). auto-syncs before running unless --no-sync or BV_EXEC_NO_SYNC=1 is set |
bv shell | interactive subshell with the project active |
bv sync [--jobs N] | pull every image pinned in bv.lock by digest and regenerate shims. --jobs controls concurrent pulls |
bv list [--binaries] | show installed tools, or the binary routing table |
bv search <query> | search the registry by name, description, or I/O type |
bv show <tool> [--format json|mcp|json-schema] | typed I/O schema and metadata |
bv lock [--check] | regenerate bv.lock; --check exits 1 if anything would change |
bv cache size | list | prune | inspect and clean the local bv cache (tool manifests, SIFs, indexes, datasets). see cache management |
bv export --format conda [-o file] | emit a conda environment.yml from bv.lock for sharing with conda users. see bv export |
bv doctor | environment check (runtimes, GPU, project state) |
bv.toml & bv.lock
bv.toml is what you wrote (or what bv add wrote for you). bv.lock is the resolved, pinned state. Commit both. .bv/ (generated shim directory) is gitignored automatically.
# bv.toml
[project]
name = "myproj"
[[tools]]
id = "blast"
version = "=2.15.0"
[[tools]]
id = "hmmer"
[runtime]
backend = "auto" # docker | apptainer | auto (default)
bv exec automatically runs bv sync before the command if the lockfile has drifted (stale images or bv.toml newer than bv.lock), matching uv run-style behavior. Pass --no-sync or set BV_EXEC_NO_SYNC=1 to skip this check and run immediately.
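The drift check can be approximated by comparing file modification times; a hypothetical sketch (lockfile_drifted is not bv's API, and the real check also verifies every pinned image is present locally):

```python
import os

def lockfile_drifted(project_dir="."):
    """True when a sync would be needed: bv.lock missing, or bv.toml
    modified more recently than bv.lock. Illustrative sketch only."""
    toml_path = os.path.join(project_dir, "bv.toml")
    lock_path = os.path.join(project_dir, "bv.lock")
    if not os.path.exists(lock_path):
        return True
    if not os.path.exists(toml_path):
        return False  # nothing to compare against
    return os.path.getmtime(toml_path) > os.path.getmtime(lock_path)
```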
If two tools expose the same binary name, bv lock fails with a clear error. Resolve it with [binary_overrides]:
[binary_overrides]
samtools = "samtools" # this tool wins for the `samtools` shim
caches
Apptainer runs containers with a read-only root filesystem, so any tool that downloads model weights or scratches to disk inside the image will fail (think ColabFold writing to /cache/colabfold). bv binds writable host directories at the right paths automatically. The set of paths is resolved in three layers:
- Tool manifest (cache_paths): the tool author's authoritative list. ColabFold's manifest declares cache_paths = ["/cache/colabfold"].
- Your [[cache]] entries in bv.toml: add new paths or redirect any existing path to a different host directory.
- Apptainer fallbacks: for tools that haven't declared cache paths yet, bv auto-binds /cache and /root/.cache.
Default host path: ~/.cache/bv/<tool>/<slug>. Docker skips the apptainer fallbacks because its writable upper layer already covers them; manifest and user entries apply on both backends.
# bv.toml : redirect colabfold weights to a shared NFS cache
[[cache]]
match = "colabfold"
container_path = "/cache/colabfold"
host_path = "/srv/shared/colabfold-weights"
# add an extra path for every tool
[[cache]]
match = "*"
container_path = "/tmp/scratch"
host_path = "~/.cache/bv/{tool}/scratch"
The {tool} token is replaced with the tool id; ~ expands to $HOME.
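The layered merge can be sketched as a dict built lowest-priority-first, so user [[cache]] entries override manifest paths, which override the apptainer fallbacks. Function names and the slug derivation here are illustrative guesses, not bv internals:

```python
import os

APPTAINER_FALLBACKS = ["/cache", "/root/.cache"]

def default_host_path(tool_id, container_path, home="~"):
    # guessed slug scheme: container path flattened into a directory name
    slug = container_path.strip("/").replace("/", "-")
    return os.path.expanduser(f"{home}/.cache/bv/{tool_id}/{slug}")

def resolve_cache_binds(tool_id, manifest_paths, user_entries, backend):
    """container_path -> host_path, merged in layer order:
    apptainer fallbacks (lowest) < tool manifest < user [[cache]] entries."""
    binds = {}
    if backend == "apptainer" and not manifest_paths:
        # fallbacks apply only to tools without declared cache paths
        for p in APPTAINER_FALLBACKS:
            binds[p] = default_host_path(tool_id, p)
    for p in manifest_paths:
        binds[p] = default_host_path(tool_id, p)
    for entry in user_entries:  # [[cache]] tables from bv.toml
        if entry["match"] in ("*", tool_id):
            host = entry["host_path"].replace("{tool}", tool_id)
            binds[entry["container_path"]] = os.path.expanduser(host)
    return binds
```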
cache management
bv's local cache holds tool manifests, Apptainer SIF files, registry index clones, and reference datasets. Three subcommands let you inspect and clean it:
| command | what it does |
|---|---|
bv cache size | print a per-category breakdown (manifests, SIFs, indexes, datasets, tmp) and total bytes |
bv cache list | tabular view of every cached entry: category, name, size, and path |
bv cache prune [flags] | remove cache entries not referenced by any known bv.lock (prompts for confirmation before deleting) |
bv cache prune flags:
| flag | effect |
|---|---|
--dry-run | print what would be removed without deleting anything |
--yes | skip the confirmation prompt (good for CI/scripts) |
--all | remove everything regardless of reachability (full wipe) |
--keep-recent N | retain the N most recently used unreferenced manifest versions per tool |
Reachability is computed from the union of bv.lock files bv can find: $PWD/bv.lock plus every bv.lock under directories listed in BV_KNOWN_PROJECTS (colon-separated paths). Tmp entries older than one hour are always swept. Non-default index clones older than 30 days are pruned regardless of reachability. Docker images pulled by bv that are no longer referenced by any lockfile are also offered for removal via docker rmi.
bv cache size
bv cache list
bv cache prune --dry-run
bv cache prune --yes
BV_KNOWN_PROJECTS=/srv/projects/proj1:/srv/projects/proj2 bv cache prune
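Reachability collection can be sketched as: gather every known bv.lock, then keep only cache entries referenced by one of them. Helper names are hypothetical:

```python
import os

def known_lockfiles(cwd, env):
    """bv.lock files bv considers: $PWD/bv.lock plus every bv.lock under
    directories listed in BV_KNOWN_PROJECTS (colon-separated)."""
    locks = []
    candidate = os.path.join(cwd, "bv.lock")
    if os.path.exists(candidate):
        locks.append(candidate)
    for root in env.get("BV_KNOWN_PROJECTS", "").split(":"):
        if not root:
            continue
        for dirpath, _dirnames, filenames in os.walk(root):
            if "bv.lock" in filenames:
                locks.append(os.path.join(dirpath, "bv.lock"))
    return locks

def unreferenced(cache_entries, referenced_ids):
    """Cache entries safe to offer for pruning: referenced by no lockfile."""
    return [e for e in cache_entries if e["id"] not in referenced_ids]
```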
bv export
bv export reads bv.lock and produces a conda environment.yml so collaborators who use conda instead of bv can reproduce the same tool versions.
bv export --format conda # print to stdout
bv export --format conda -o environment.yml
For every tool whose image came from quay.io/biocontainers/*, the YAML lists a pinned bioconda::<name>=<version> dependency. Tools from custom OCI images (e.g. ghcr.io/*) have no conda equivalent; they appear in a comment block at the bottom of the file so you know what to install by hand.
example output for a project with blast and a custom image:
name: myproj
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::blast=2.15.0
- bioconda::hmmer=3.4
# Tools that have no known conda/bioconda equivalent. These come from
# custom OCI images and would need to be installed by hand:
# - genie2 (image: ghcr.io/tejasprabhune/genie2:1.0.0)
only --format conda is supported today; pixi and Dockerfile export are planned.
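The mapping rule is mechanical enough to sketch: biocontainers images become pinned bioconda dependencies, everything else goes into a trailing comment block. A simplified rendering, not bv's actual implementation:

```python
def export_conda(project_name, locked_tools):
    """Render an environment.yml body from locked tool entries."""
    deps, unmapped = [], []
    for t in locked_tools:
        if t["image"].startswith("quay.io/biocontainers/"):
            deps.append(f"- bioconda::{t['id']}={t['version']}")
        else:
            unmapped.append(f"# - {t['id']} (image: {t['image']})")
    lines = [f"name: {project_name}", "channels:",
             "- conda-forge", "- bioconda", "dependencies:"]
    lines += deps
    if unmapped:
        lines.append("# Tools that have no known conda/bioconda equivalent:")
        lines += unmapped
    return "\n".join(lines) + "\n"
```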
backends
bv auto-detects what runtime is available. Pin it explicitly with --backend, the BV_BACKEND env var, or [runtime] backend = "apptainer" in bv.toml.
| feature | docker | apptainer |
|---|---|---|
| root needed | daemon needs privileges (rootless mode available) | no |
| GPU flag | --gpus all | --nv |
| image cache | docker image store | SIF files in ~/.local/share/bv/sif |
| writable container FS | yes (upper layer) | no (use cache mounts) |
For ghcr.io images, bv can pull blobs directly via HTTPS instead of going through the Docker Desktop VM. It fetches all layer blobs concurrently, assembles an OCI Image Layout tar in memory, and loads it via docker load. This native pull path requires Docker 28+ and is used automatically when the image registry is ghcr.io. Docker Hub and quay.io images still go through the normal docker pull path. Credentials are read from ~/.docker/config.json (credential helpers, credHelpers, and static base64 auths entries are all supported).
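The in-memory artifact follows the OCI Image Layout convention: an oci-layout marker file, an index.json pointing at the manifest, and content-addressed blobs. A rough sketch under that assumption (the real pull also fetches layer blobs concurrently over HTTPS):

```python
import io
import json
import hashlib
import tarfile

def oci_layout_tar(manifest_bytes, config_bytes, layers):
    """Assemble an OCI Image Layout tar in memory from already-fetched blobs.
    Illustrative only; bv's real assembly is done in Rust."""
    def digest(data):
        return "sha256:" + hashlib.sha256(data).hexdigest()

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        def add(name, data):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        add("oci-layout", json.dumps({"imageLayoutVersion": "1.0.0"}).encode())
        index = {"schemaVersion": 2, "manifests": [{
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": digest(manifest_bytes),
            "size": len(manifest_bytes)}]}
        add("index.json", json.dumps(index).encode())
        # every blob lives at blobs/sha256/<hex digest>
        for blob in [manifest_bytes, config_bytes, *layers]:
            add("blobs/sha256/" + digest(blob).split(":")[1], blob)
    return buf.getvalue()
```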
reference data
Some tools need large reference databases (kraken2, blast pdbaa, etc.). The manifest declares them; bv add tells you what's needed; bv data fetch downloads them.
bv add kraken2
bv data fetch pdbaa --yes
bv run kraken2 ... # bv mounts the data directory automatically
troubleshooting
- Image not available locally after bv add: run bv sync to pull by digest.
- Read-only filesystem errors from a tool on apptainer: the tool writes to a path bv hasn't bound. Add a [[cache]] entry, or open an issue against the registry asking the maintainer to add it to cache_paths.
- GPU not detected: check nvidia-smi first, then bv doctor. On Docker you need nvidia-container-toolkit; on Apptainer, the host's NVIDIA libraries.
- Conflicting binary names across tools: bv lock tells you exactly which binaries collide. Use [binary_overrides].
for publishers
adding your tool to the registry
tl;dr
# in a directory with a Dockerfile (or pointing at a github repo)
bv publish .
# or:
bv publish github:user/repo@v1.0.0
# answer the prompts (name, version, description, typed I/O),
# bv builds the image, pushes to ghcr, and opens a PR to bv-registry.
manifest schema
A manifest lives at tools/<id>/<version>.toml in bv-registry. The full reference is in SCHEMA.md; below is the cheat sheet.
[tool]
id = "colabfold"
version = "1.6.0"
description = "ColabFold: fast protein structure prediction"
homepage = "https://github.com/sokrypton/ColabFold"
license = "MIT"
tier = "core" # core | community | experimental
maintainers = ["github:sokrypton"]
[tool.image]
backend = "docker"
reference = "ghcr.io/sokrypton/colabfold:1.6.0-cuda12"
# digest is added automatically at lock time
[tool.hardware]
cpu_cores = 8
ram_gb = 16.0
disk_gb = 10.0
[tool.hardware.gpu]
required = true
min_vram_gb = 8
cuda_version = "12.0"
[[tool.inputs]]
name = "fasta"
type = "fasta[protein]"
cardinality = "one"
description = "Input protein sequences"
[[tool.outputs]]
name = "output_dir"
type = "dir"
cardinality = "one"
description = "Predicted structures and confidence scores"
[tool.entrypoint]
command = "colabfold_batch"
args_template = "--num-recycle 3 {fasta} {output_dir}"
# Container paths the tool writes to. bv binds these to writable host
# dirs (critical on apptainer's read-only SIF root). Skip if the tool
# does not write inside the image.
cache_paths = ["/cache/colabfold"]
[tool.binaries]
exposed = ["colabfold_batch"]
Input and output types must come from the bv-types vocabulary (fasta, fasta[protein], blast_tab, pdb, etc.). Tools without typed I/O sit in the experimental tier and are hidden from default search results.
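Rendering args_template is simple token substitution over the declared input and output names; a hypothetical sketch:

```python
def render_args(template, io_values):
    """Substitute {name} tokens in an entrypoint args_template with the
    resolved paths for the tool's typed inputs and outputs."""
    args = []
    for token in template.split():
        if token.startswith("{") and token.endswith("}"):
            args.append(io_values[token[1:-1]])
        else:
            args.append(token)
    return args
```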
bv publish
bv publish handles fetching the source, generating a Dockerfile if you don't have one, building the image, pushing to GHCR, and opening the registry PR. It can run interactively or read a bv-publish.toml config.
bv publish ./my-tool # local directory, interactive
bv publish github:user/repo@v2.1.0 # github source, clones at the tag
bv publish . --non-interactive # CI mode (reads bv-publish.toml)
bv publish . --no-push --no-pr # dry run: build, print manifest, exit
sources it accepts
- local directory: bv publish ./my-tool uses the directory as-is. The directory name becomes the tool-name hint; git describe --tags in that directory becomes the version hint.
- github: bv publish github:owner/repo[@ref] shallow-clones the repo into a tempdir. repo becomes the tool-name hint. The version hint comes from the @ref (with leading v stripped) or, if omitted, the latest tag in the clone.
build systems it knows about
If your repo doesn't ship a Dockerfile, bv generates a Dockerfile.bv based on whichever of these it finds first (in order):
| looks for | base image | generated build step |
|---|---|---|
Dockerfile | (used as-is) | no generation |
environment.yml / environment.yaml | mambaorg/micromamba:1.5 | micromamba install -y -n base -f env.yml |
pyproject.toml with [build-system] | python:3.11-slim | pip install --no-cache-dir . |
requirements.txt | python:3.11-slim | pip install --no-cache-dir -r requirements.txt |
Cargo.toml | rust:1.75 → debian:bookworm-slim | multi-stage: cargo build --release, copy binaries to /usr/local/bin |
Makefile | debian:bookworm-slim + build-essential | make |
If none match and there's no Dockerfile, bv publish fails with a clear error: add a Dockerfile or write a bv-publish.toml. The generated Dockerfile.bv is left in your working directory; commit it (or replace it with your own) before re-running.
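First-match detection over the table above can be sketched like this (a simplification: for pyproject.toml the real check also requires a [build-system] table, not just the file's presence):

```python
import os

# (marker file, base image) in priority order, mirroring the table above;
# None means the existing Dockerfile is used as-is
BUILD_RULES = [
    ("Dockerfile", None),
    ("environment.yml", "mambaorg/micromamba:1.5"),
    ("environment.yaml", "mambaorg/micromamba:1.5"),
    ("pyproject.toml", "python:3.11-slim"),
    ("requirements.txt", "python:3.11-slim"),
    ("Cargo.toml", "rust:1.75"),
    ("Makefile", "debian:bookworm-slim"),
]

def detect_build(repo_dir):
    """Return (marker, base_image) for the first matching rule, or fail."""
    for marker, base in BUILD_RULES:
        if os.path.exists(os.path.join(repo_dir, marker)):
            return marker, base
    raise SystemExit("no Dockerfile and no recognized build system: "
                     "add a Dockerfile or write a bv-publish.toml")
```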
where it gets the manifest fields
| manifest field | source |
|---|---|
id (tool name) | directory name or repo name; overridable with --tool-name or the interactive prompt. |
version | --tool-version → git tag from @ref → git describe --tags → prompt. |
description, homepage, license | prompts in interactive mode; from bv-publish.toml in non-interactive. |
[[tool.inputs]] / [[tool.outputs]] | prompts (one I/O per add); types must be in the bv-types vocabulary. |
[tool.hardware] | prompts for cpu / ram / disk / GPU; sensible defaults filled in. |
[tool.entrypoint] | prompted; usually the binary name your Dockerfile installs. |
tier | always starts as community. Promotion is a separate registry PR. |
[tool.image].digest | computed automatically from docker manifest inspect after the push. |
what it pushes, and where
- image: built and pushed to ghcr.io/<your-github-username>/<tool>:<version> by default, so you don't need write access to any shared org: a normal GitHub token (with write:packages scope) is enough. Override the namespace with --push-to <org> to push somewhere else, like a lab-shared GHCR org. bv logs in to GHCR with your GHCR_TOKEN, falling back to GITHUB_TOKEN. Multi-arch builds use --platform.
- digest: resolved by docker manifest inspect immediately after the push and embedded in the manifest's [tool.image].digest for reproducibility.
- registry PR: a new branch add-<tool>-<version> is opened against tejasprabhune/bv-registry (override with --registry-repo owner/repo), adding tools/<tool>/<version>.toml. The PR body links back to your source URL (the local path or github.com/owner/repo).
Skip stages with flags: --no-push stops after building (manifest is printed; no GHCR write); --no-pr pushes the image but doesn't open the PR; passing both is a useful dry run.
For a release-on-tag GitHub Action, drop this into .github/workflows/bv-publish.yml:
on:
release:
types: [published]
jobs:
publish:
uses: tejasprabhune/bv/.github/workflows/bv-publish.yml@main
with:
tool-name: my-tool
secrets:
GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
BV_REGISTRY_TOKEN: ${{ secrets.BV_REGISTRY_TOKEN }}
conformance tests
By default, bv conformance <tool> pulls your image and smoke-tests every binary you declared in [tool.binaries]. For each one it tries --version, -version, --help, -h, -v, version in order, and considers the binary alive if any of them exits 0. Most tools need no extra config.
Run it locally before opening the PR:
bv conformance my-tool
bv conformance my-tool --backend apptainer
For unusual binaries, add a [tool.smoke] block:
[tool.smoke]
# Pin a specific probe arg for binaries that don't accept any of the defaults.
probes = { weird-tool = "--check", another = "" } # "" runs the binary with no args
# Skip binaries with no safe non-destructive invocation (daemons, REPLs that
# wait on stdin forever). They still get shims; conformance just skips them.
skip = ["server-daemon"]
Conformance runs in CI on every registry PR. Today it's a smoke check only; running tools on canonical inputs and validating typed outputs is on the v2 roadmap.
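The probe loop can be sketched as follows; here the binaries run on the host for illustration, while bv runs them inside the tool's container:

```python
import subprocess

DEFAULT_PROBES = ["--version", "-version", "--help", "-h", "-v", "version"]

def smoke_test(binaries, probes=None, skip=()):
    """Probe each binary; it is alive if any probe invocation exits 0.
    `probes` maps binary -> pinned probe arg ("" means run with no args),
    mirroring the [tool.smoke] overrides described above."""
    probes = probes or {}
    results = {}
    for b in binaries:
        if b in skip:
            results[b] = "skipped"
            continue
        candidates = [probes[b]] if b in probes else DEFAULT_PROBES
        alive = False
        for arg in candidates:
            cmd = [b] + ([arg] if arg else [])
            try:
                if subprocess.run(cmd, capture_output=True,
                                  timeout=30).returncode == 0:
                    alive = True
                    break
            except (FileNotFoundError, subprocess.TimeoutExpired):
                break  # binary missing or hung; no point trying more probes
        results[b] = "alive" if alive else "dead"
    return results
```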
tiers
| tier | requirements |
|---|---|
experimental | basic checks pass; may lack typed I/O. Hidden from default search. |
community | typed I/O present, conformance passes, manifest valid. Default for new submissions. |
core | actively maintained, recognized publisher, runs on docker and apptainer, conformance passes on both. |
Promotion is a separate PR by a registry maintainer. See GOVERNANCE.md for the full criteria.
new versions
One file per version. Add tools/<id>/<newver>.toml; do not edit the old one. The website and bv search surface the latest available version per tool by default; users can request older versions explicitly with bv add <tool>@<ver>.
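Surfacing the latest version amounts to comparing numeric version components, not strings; a rough sketch (a real resolver would follow full semver ordering, including pre-release tags):

```python
import re

def latest_version(versions):
    """Pick the highest version by numeric component comparison,
    so "2.15.0" beats "2.9.1" (which plain string sort would get wrong)."""
    def key(v):
        return [int(x) for x in re.findall(r"\d+", v)]
    return max(versions, key=key)
```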
for maintainers
running bv-ingest, reviewing PRs, promoting tools
tl;dr
# nightly auto-ingestion (also runs from .github/workflows/)
bv-ingest run --limit 50
# review the staging PRs that need typed I/O
bv-ingest review --staging-dir ./bv-registry-staging
# promote a reviewed tool from staging to bv-registry
bv-ingest promote samtools 1.20 --staging-dir ./bv-registry-staging
the pipeline
bv-ingest turns Bioconda recipes into draft manifests so the registry stays current without manual scraping. Two repos cooperate:
- bv-registry-staging: auto-generated drafts. Manifests start in community tier without typed I/O. PRs land here automatically.
- bv-registry: the live registry users pull from. Tools graduate here only after a maintainer reviews and promotes them.
End-to-end:
- fetch recipes: clone bioconda-recipes, parse meta.yaml for build / test / run_exports, derive binary names.
- resolve images: query quay.io/biocontainers for the matching tag and digest. Skip tools without a published BioContainer.
- generate manifest: write tools/<id>/<version>.toml in staging with hardware defaults and exposed binaries, but no typed I/O.
- open PR: one PR per (tool, version) against bv-registry-staging.
- review: maintainer adds [[tool.inputs]] / [[tool.outputs]] using the bv-types vocabulary, and (if needed) a [tool.smoke] override.
- promote: bv-ingest promote opens a PR to bv-registry. CI runs conformance; merge ships it.
bv-ingest commands
| command | what it does |
|---|---|
bv-ingest run [--dry-run] [--limit N] [--tool ID] | full pipeline. Default opens PRs against BV_STAGING_REPO. --dry-run prints what would happen. |
bv-ingest review --staging-dir <path> | list manifests still missing typed I/O, or with --show TOOL/VERSION, dump one for review. |
bv-ingest promote <tool> <version> --staging-dir <path> | copy the reviewed manifest from staging to bv-registry and open a PR. |
bv-ingest status --staging-dir <path> | count of staged, reviewed, and promoted manifests. |
Common env vars:
BV_STAGING_REPO = "tejasprabhune/bv-registry-staging"
BV_REGISTRY_REPO = "tejasprabhune/bv-registry"
BV_BIOCONDA_CACHE = "/var/tmp/bioconda-recipes" # local clone, optional
GITHUB_TOKEN = ... # falls back to `gh auth token`
reviewing PRs
Auto-generated PRs arrive tagged auto-ingest. The fast path:
- Check the upstream tool's docs to identify what file types its main entrypoint reads and writes.
- Add [[tool.inputs]] / [[tool.outputs]] blocks. If a needed type does not exist in bv-types, add it there first (separate PR).
- If the tool downloads model weights or writes large scratch state inside the image, add cache_paths = [...]. Look for clues in the upstream Dockerfile (WORKDIR, VOLUME) or run bv run <tool> on apptainer and watch for read-only-fs errors.
- Add a [tool.smoke] override only if a binary needs a non-default probe (the default loop covers --version, -version, --help, -h, -v, version).
- Run bv conformance <tool> locally on both backends.
- Approve the staging PR. Once merged, run bv-ingest promote.
promoting to core
A tool moves from community to core only when:
- typed I/O is complete and uses canonical types (no ad-hoc string placeholders);
- the maintainer is a recognized publisher (project author, lab, or accepted volunteer maintainer);
- conformance passes on both docker and apptainer in CI;
- the tool has had at least one published version active for 30 days without an unfixed issue tagged broken.
Open a separate PR labelled tier-promote that flips tier = "core". Two maintainer approvals merge it.