for end users
running tools in your project
tl;dr
# 1. install a runtime (pick one)
curl -fsSL https://get.docker.com/rootless | sh # docker, on a laptop
conda install -c conda-forge apptainer # apptainer, on HPC
# 2. install bv
curl -fsSL https://raw.githubusercontent.com/tejasprabhune/bv/main/install.sh | sh
# 3. add a tool, call its binary, commit the lockfile
mkdir myproj && cd myproj
bv add blast
bv run blastn -version
git add bv.toml bv.lock && git commit -m "pin tools"
# 4. on any other machine: same images, by digest
bv sync
install
bv needs Docker or Apptainer/Singularity, plus git.
Docker is typical on a laptop. Use the rootless installer when you can; on a GPU box install nvidia-container-toolkit too.
Apptainer is typical on shared HPC nodes since it does not need a daemon or root. Install with conda: conda install -c conda-forge apptainer.
Then install bv itself:
curl -fsSL https://raw.githubusercontent.com/tejasprabhune/bv/main/install.sh | sh
# or, with cargo:
cargo install biov
Verify with bv doctor, which prints the runtimes it can see and any missing pieces.
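A doctor-style runtime check reduces to probing PATH and running a version flag. A minimal sketch with hypothetical helper names (the real bv doctor also checks GPU support and project state):

```python
import shutil
import subprocess

def detect_runtimes():
    """Container runtimes visible on PATH (what a doctor-style check reports)."""
    found = {}
    for name in ("docker", "apptainer", "singularity"):
        path = shutil.which(name)
        if path:
            found[name] = path
    return found

def probe_version(binary):
    """Best-effort version string for a binary; None if missing or failing."""
    try:
        out = subprocess.run([binary, "--version"],
                             capture_output=True, text=True, timeout=10)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    if out.returncode != 0:
        return None
    return (out.stdout or out.stderr).strip()
```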
core commands
| command | what it does |
|---|---|
bv add <tool>[@ver] [--jobs N] | resolve from the registry, pull the image, write bv.toml and bv.lock, generate shims. --jobs controls concurrent image pulls (default: min(8, num_cpus)); also settable via BV_JOBS |
bv run <binary> <args> | run a binary inside its container; current directory is mounted at /workspace |
bv exec [--no-sync] <cmd> | run any command with the project's binaries on PATH (good for scripts, Make, Snakemake, CI). auto-syncs before running unless --no-sync or BV_EXEC_NO_SYNC=1 is set |
bv shell | interactive subshell with the project active |
bv sync [--jobs N] | pull every image pinned in bv.lock by digest and regenerate shims. --jobs controls concurrent pulls |
bv list [--binaries] | show installed tools, or the binary routing table |
bv search <query> | search the registry by name, description, or I/O type |
bv show <tool> [--format json|mcp|json-schema] | typed I/O schema and metadata |
bv lock [--check] | regenerate bv.lock; --check exits 1 if anything would change |
bv cache size | list | prune | inspect and clean the local bv cache (tool manifests, SIFs, indexes, datasets). see cache management |
bv export --format conda [-o file] | emit a conda environment.yml from bv.lock for sharing with conda users. see bv export |
bv doctor | environment check (runtimes, GPU, project state) |
bv.toml & bv.lock
bv.toml is what you wrote (or what bv add wrote for you). bv.lock is the resolved, pinned state. Commit both. .bv/ (generated shim directory) is gitignored automatically.
# bv.toml
[project]
name = "myproj"
[[tools]]
id = "blast"
version = "=2.15.0"
[[tools]]
id = "hmmer"
[runtime]
backend = "auto" # docker | apptainer | auto (default)
bv exec automatically runs bv sync before the command if the lockfile has drifted (stale images or bv.toml newer than bv.lock), matching uv run-style behavior. Pass --no-sync or set BV_EXEC_NO_SYNC=1 to skip this check and run immediately.
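The drift check can be approximated by comparing file modification times; a hypothetical sketch (lockfile_drifted is not bv's API, and the real check also verifies every pinned image is present locally):

```python
import os

def lockfile_drifted(project_dir="."):
    """True when a sync would be needed: bv.lock missing, or bv.toml
    modified more recently than bv.lock. Illustrative sketch only."""
    toml_path = os.path.join(project_dir, "bv.toml")
    lock_path = os.path.join(project_dir, "bv.lock")
    if not os.path.exists(lock_path):
        return True
    if not os.path.exists(toml_path):
        return False  # nothing to compare against
    return os.path.getmtime(toml_path) > os.path.getmtime(lock_path)
```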
If two tools expose the same binary name, bv lock fails with a clear error. Resolve it with [binary_overrides]:
[binary_overrides]
samtools = "samtools" # this tool wins for the `samtools` shim
caches
Apptainer runs containers with a read-only root filesystem, so any tool that downloads model weights or scratches to disk inside the image will fail (think ColabFold writing to /cache/colabfold). bv binds writable host directories at the right paths automatically. The set of paths is resolved in three layers:
- Tool manifest (cache_paths): the tool author's authoritative list. ColabFold's manifest declares cache_paths = ["/cache/colabfold"].
- Your [[cache]] entries in bv.toml: add new paths or redirect any existing path to a different host directory.
- Apptainer fallbacks: for tools that haven't declared cache paths yet, bv auto-binds /cache and /root/.cache.
Default host path: ~/.cache/bv/<tool>/<slug>. Docker skips the apptainer fallbacks because its writable upper layer already covers them; manifest and user entries apply on both backends.
# bv.toml : redirect colabfold weights to a shared NFS cache
[[cache]]
match = "colabfold"
container_path = "/cache/colabfold"
host_path = "/srv/shared/colabfold-weights"
# add an extra path for every tool
[[cache]]
match = "*"
container_path = "/tmp/scratch"
host_path = "~/.cache/bv/{tool}/scratch"
The {tool} token is replaced with the tool id; ~ expands to $HOME.
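The layered merge can be sketched as a dict built lowest-priority-first, so user [[cache]] entries override manifest paths, which override the apptainer fallbacks. Function names and the slug derivation here are illustrative guesses, not bv internals:

```python
import os

APPTAINER_FALLBACKS = ["/cache", "/root/.cache"]

def default_host_path(tool_id, container_path, home="~"):
    # guessed slug scheme: container path flattened into a directory name
    slug = container_path.strip("/").replace("/", "-")
    return os.path.expanduser(f"{home}/.cache/bv/{tool_id}/{slug}")

def resolve_cache_binds(tool_id, manifest_paths, user_entries, backend):
    """container_path -> host_path, merged in layer order:
    apptainer fallbacks (lowest) < tool manifest < user [[cache]] entries."""
    binds = {}
    if backend == "apptainer" and not manifest_paths:
        # fallbacks apply only to tools without declared cache paths
        for p in APPTAINER_FALLBACKS:
            binds[p] = default_host_path(tool_id, p)
    for p in manifest_paths:
        binds[p] = default_host_path(tool_id, p)
    for entry in user_entries:  # [[cache]] tables from bv.toml
        if entry["match"] in ("*", tool_id):
            host = entry["host_path"].replace("{tool}", tool_id)
            binds[entry["container_path"]] = os.path.expanduser(host)
    return binds
```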
cache management
bv's local cache holds tool manifests, Apptainer SIF files, registry index clones, and reference datasets. Three subcommands let you inspect and clean it:
| command | what it does |
|---|---|
bv cache size | print a per-category breakdown (manifests, SIFs, indexes, datasets, tmp) and total bytes |
bv cache list | tabular view of every cached entry: category, name, size, and path |
bv cache prune [flags] | remove cache entries not referenced by any known bv.lock (prompts for confirmation before deleting) |
bv cache prune flags:
| flag | effect |
|---|---|
--dry-run | print what would be removed without deleting anything |
--yes | skip the confirmation prompt (good for CI/scripts) |
--all | remove everything regardless of reachability (full wipe) |
--keep-recent N | retain the N most recently used unreferenced manifest versions per tool |
Reachability is computed from the union of bv.lock files bv can find: $PWD/bv.lock plus every bv.lock under directories listed in BV_KNOWN_PROJECTS (colon-separated paths). Tmp entries older than one hour are always swept. Non-default index clones older than 30 days are pruned regardless of reachability. Docker images pulled by bv that are no longer referenced by any lockfile are also offered for removal via docker rmi.
bv cache size
bv cache list
bv cache prune --dry-run
bv cache prune --yes
BV_KNOWN_PROJECTS=/srv/projects/proj1:/srv/projects/proj2 bv cache prune
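Reachability collection can be sketched as: gather every known bv.lock, then keep only cache entries referenced by one of them. Helper names are hypothetical:

```python
import os

def known_lockfiles(cwd, env):
    """bv.lock files bv considers: $PWD/bv.lock plus every bv.lock under
    directories listed in BV_KNOWN_PROJECTS (colon-separated)."""
    locks = []
    candidate = os.path.join(cwd, "bv.lock")
    if os.path.exists(candidate):
        locks.append(candidate)
    for root in env.get("BV_KNOWN_PROJECTS", "").split(":"):
        if not root:
            continue
        for dirpath, _dirnames, filenames in os.walk(root):
            if "bv.lock" in filenames:
                locks.append(os.path.join(dirpath, "bv.lock"))
    return locks

def unreferenced(cache_entries, referenced_ids):
    """Cache entries safe to offer for pruning: referenced by no lockfile."""
    return [e for e in cache_entries if e["id"] not in referenced_ids]
```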
bv export
bv export reads bv.lock and produces a conda environment.yml so collaborators who use conda instead of bv can reproduce the same tool versions.
bv export --format conda # print to stdout
bv export --format conda -o environment.yml
For every tool whose image came from quay.io/biocontainers/*, the YAML lists a pinned bioconda::<name>=<version> dependency. Tools from custom OCI images (e.g. ghcr.io/*) have no conda equivalent; they appear in a comment block at the bottom of the file so you know what to install by hand.
example output for a project with blast and a custom image:
name: myproj
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::blast=2.15.0
- bioconda::hmmer=3.4
# Tools that have no known conda/bioconda equivalent. These come from
# custom OCI images and would need to be installed by hand:
# - genie2 (image: ghcr.io/tejasprabhune/genie2:1.0.0)
only --format conda is supported today; pixi and Dockerfile export are planned.
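The mapping rule is mechanical enough to sketch: biocontainers images become pinned bioconda dependencies, everything else goes into a trailing comment block. A simplified rendering, not bv's actual implementation:

```python
def export_conda(project_name, locked_tools):
    """Render an environment.yml body from locked tool entries."""
    deps, unmapped = [], []
    for t in locked_tools:
        if t["image"].startswith("quay.io/biocontainers/"):
            deps.append(f"- bioconda::{t['id']}={t['version']}")
        else:
            unmapped.append(f"# - {t['id']} (image: {t['image']})")
    lines = [f"name: {project_name}", "channels:",
             "- conda-forge", "- bioconda", "dependencies:"]
    lines += deps
    if unmapped:
        lines.append("# Tools that have no known conda/bioconda equivalent:")
        lines += unmapped
    return "\n".join(lines) + "\n"
```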
backends
bv auto-detects what runtime is available. Pin it explicitly with --backend, the BV_BACKEND env var, or [runtime] backend = "apptainer" in bv.toml.
| feature | docker | apptainer |
|---|---|---|
| root needed | daemon needs privileges (rootless mode available) | no |
| GPU flag | --gpus all | --nv |
| image cache | docker image store | SIF files in ~/.local/share/bv/sif |
| writable container FS | yes (upper layer) | no (use cache mounts) |
For ghcr.io images, bv can pull blobs directly via HTTPS instead of going through the Docker Desktop VM. It fetches all layer blobs concurrently, assembles an OCI Image Layout tar in memory, and loads it via docker load. This native pull path requires Docker 28+ and is used automatically when the image registry is ghcr.io. Docker Hub and quay.io images still go through the normal docker pull path. Credentials are read from ~/.docker/config.json (credential helpers, credHelpers, and static base64 auths entries are all supported).
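The in-memory artifact follows the OCI Image Layout convention: an oci-layout marker file, an index.json pointing at the manifest, and content-addressed blobs. A rough sketch under that assumption (the real pull also fetches layer blobs concurrently over HTTPS):

```python
import io
import json
import hashlib
import tarfile

def oci_layout_tar(manifest_bytes, config_bytes, layers):
    """Assemble an OCI Image Layout tar in memory from already-fetched blobs.
    Illustrative only; bv's real assembly is done in Rust."""
    def digest(data):
        return "sha256:" + hashlib.sha256(data).hexdigest()

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        def add(name, data):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        add("oci-layout", json.dumps({"imageLayoutVersion": "1.0.0"}).encode())
        index = {"schemaVersion": 2, "manifests": [{
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": digest(manifest_bytes),
            "size": len(manifest_bytes)}]}
        add("index.json", json.dumps(index).encode())
        # every blob lives at blobs/sha256/<hex digest>
        for blob in [manifest_bytes, config_bytes, *layers]:
            add("blobs/sha256/" + digest(blob).split(":")[1], blob)
    return buf.getvalue()
```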
reference data
Some tools need large reference databases (kraken2, blast pdbaa, etc.). The manifest declares them; bv add tells you what's needed; bv data fetch downloads them.
bv add kraken2
bv data fetch pdbaa --yes
bv run kraken2 ... # bv mounts the data directory automatically
troubleshooting
- Image not available locally after bv add: run bv sync to pull by digest.
- Read-only filesystem errors from a tool on apptainer: the tool writes to a path bv hasn't bound. Add a [[cache]] entry, or open an issue against the registry asking the maintainer to add it to cache_paths.
- GPU not detected: check nvidia-smi first, then bv doctor. On Docker you need nvidia-container-toolkit; on Apptainer, the host's NVIDIA libraries.
- Conflicting binary names across tools: bv lock tells you exactly which binaries collide. Use [binary_overrides].
for publishers
adding your tool to the registry
tl;dr
# in a directory with a Dockerfile (or pointing at a github repo)
bv publish .
# or:
bv publish github:user/repo@v1.0.0
# answer the prompts (name, version, description, typed I/O),
# bv builds the image, pushes to ghcr, and opens a PR to bv-registry.
manifest schema
A manifest lives at tools/<id>/<version>.toml in bv-registry. The full reference is in SCHEMA.md; below is the cheat sheet.
[tool]
id = "colabfold"
version = "1.6.0"
description = "ColabFold: fast protein structure prediction"
homepage = "https://github.com/sokrypton/ColabFold"
license = "MIT"
tier = "core" # core | community | experimental
maintainers = ["github:sokrypton"]
[tool.image]
backend = "docker"
reference = "ghcr.io/sokrypton/colabfold:1.6.0-cuda12"
# digest is added automatically at lock time
[tool.hardware]
cpu_cores = 8
ram_gb = 16.0
disk_gb = 10.0
[tool.hardware.gpu]
required = true
min_vram_gb = 8
cuda_version = "12.0"
[[tool.inputs]]
name = "fasta"
type = "fasta[protein]"
cardinality = "one"
description = "Input protein sequences"
[[tool.outputs]]
name = "output_dir"
type = "dir"
cardinality = "one"
description = "Predicted structures and confidence scores"
[tool.entrypoint]
command = "colabfold_batch"
args_template = "--num-recycle 3 {fasta} {output_dir}"
# Container paths the tool writes to. bv binds these to writable host
# dirs (critical on apptainer's read-only SIF root). Skip if the tool
# does not write inside the image.
cache_paths = ["/cache/colabfold"]
[tool.binaries]
exposed = ["colabfold_batch"]
Input and output types must come from the bv-types vocabulary (fasta, fasta[protein], blast_tab, pdb, etc.). Tools without typed I/O sit in the experimental tier and are hidden from default search results.
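Rendering args_template is simple token substitution over the declared input and output names; a hypothetical sketch:

```python
def render_args(template, io_values):
    """Substitute {name} tokens in an entrypoint args_template with the
    resolved paths for the tool's typed inputs and outputs."""
    args = []
    for token in template.split():
        if token.startswith("{") and token.endswith("}"):
            args.append(io_values[token[1:-1]])
        else:
            args.append(token)
    return args
```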
bv publish
bv publish handles fetching the source, generating a Dockerfile if you don't have one, building the image, pushing to GHCR, and opening the registry PR. It can run interactively or read a bv-publish.toml config.
bv publish ./my-tool # local directory, interactive
bv publish github:user/repo@v2.1.0 # github source, clones at the tag
bv publish . --non-interactive # CI mode (reads bv-publish.toml)
bv publish . --no-push --no-pr # dry run: build, print manifest, exit
sources it accepts
- local directory: bv publish ./my-tool uses the directory as-is. The directory name becomes the tool-name hint; git describe --tags in that directory becomes the version hint.
- github: bv publish github:owner/repo[@ref] shallow-clones the repo into a tempdir. repo becomes the tool-name hint. The version hint comes from the @ref (with leading v stripped) or, if omitted, the latest tag in the clone.
build systems it knows about
If your repo doesn't ship a Dockerfile, bv generates a Dockerfile.bv based on whichever of these it finds first (in order):
| looks for | base image | generated build step |
|---|---|---|
Dockerfile | (used as-is) | no generation |
environment.yml / environment.yaml | mambaorg/micromamba:1.5 | micromamba install -y -n base -f env.yml |
pyproject.toml with [build-system] | python:3.11-slim | pip install --no-cache-dir . |
requirements.txt | python:3.11-slim | pip install --no-cache-dir -r requirements.txt |
Cargo.toml | rust:1.75 → debian:bookworm-slim | multi-stage: cargo build --release, copy binaries to /usr/local/bin |
Makefile | debian:bookworm-slim + build-essential | make |
If none match and there's no Dockerfile, bv publish fails with a clear error: add a Dockerfile or write a bv-publish.toml. The generated Dockerfile.bv is left in your working directory; commit it (or replace it with your own) before re-running.
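First-match detection over the table above can be sketched like this (a simplification: for pyproject.toml the real check also requires a [build-system] table, not just the file's presence):

```python
import os

# (marker file, base image) in priority order, mirroring the table above;
# None means the existing Dockerfile is used as-is
BUILD_RULES = [
    ("Dockerfile", None),
    ("environment.yml", "mambaorg/micromamba:1.5"),
    ("environment.yaml", "mambaorg/micromamba:1.5"),
    ("pyproject.toml", "python:3.11-slim"),
    ("requirements.txt", "python:3.11-slim"),
    ("Cargo.toml", "rust:1.75"),
    ("Makefile", "debian:bookworm-slim"),
]

def detect_build(repo_dir):
    """Return (marker, base_image) for the first matching rule, or fail."""
    for marker, base in BUILD_RULES:
        if os.path.exists(os.path.join(repo_dir, marker)):
            return marker, base
    raise SystemExit("no Dockerfile and no recognized build system: "
                     "add a Dockerfile or write a bv-publish.toml")
```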
where it gets the manifest fields
| manifest field | source |
|---|---|
id (tool name) | directory name or repo name; overridable with --tool-name or the interactive prompt. |
version | --tool-version → git tag from @ref → git describe --tags → prompt. |
description, homepage, license | prompts in interactive mode; from bv-publish.toml in non-interactive. |
[[tool.inputs]] / [[tool.outputs]] | prompts (one I/O per add); types must be in the bv-types vocabulary. |
[tool.hardware] | prompts for cpu / ram / disk / GPU; sensible defaults filled in. |
[tool.entrypoint] | prompted; usually the binary name your Dockerfile installs. |
tier | always starts as community. Promotion is a separate registry PR. |
[tool.image].digest | computed automatically from docker manifest inspect after the push. |
what it pushes, and where
- image: built and pushed to ghcr.io/<your-github-username>/<tool>:<version> by default, so you don't need write access to any shared org: a normal GitHub token (with write:packages scope) is enough. Override the namespace with --push-to <org> to push somewhere else, like a lab-shared GHCR org. bv logs in to GHCR with your GHCR_TOKEN, falling back to GITHUB_TOKEN. Multi-arch builds use --platform.
- digest: resolved by docker manifest inspect immediately after the push and embedded in the manifest's [tool.image].digest for reproducibility.
- registry PR: a new branch add-<tool>-<version> is opened against tejasprabhune/bv-registry (override with --registry-repo owner/repo), adding tools/<tool>/<version>.toml. The PR body links back to your source URL (the local path or github.com/owner/repo).
Skip stages with flags: --no-push stops after building (manifest is printed; no GHCR write); --no-pr pushes the image but doesn't open the PR; passing both is a useful dry run.
For a release-on-tag GitHub Action, drop this into .github/workflows/bv-publish.yml:
on:
release:
types: [published]
jobs:
publish:
uses: tejasprabhune/bv/.github/workflows/bv-publish.yml@main
with:
tool-name: my-tool
secrets:
GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
BV_REGISTRY_TOKEN: ${{ secrets.BV_REGISTRY_TOKEN }}
conformance tests
By default, bv conformance <tool> pulls your image and smoke-tests every binary you declared in [tool.binaries]. For each one it tries --version, -version, --help, -h, -v, version in order, and considers the binary alive if any of them exits 0. Most tools need no extra config.
Run it locally before opening the PR:
bv conformance my-tool
bv conformance my-tool --backend apptainer
For unusual binaries, add a [tool.smoke] block:
[tool.smoke]
# Pin a specific probe arg for binaries that don't accept any of the defaults.
probes = { weird-tool = "--check", another = "" } # "" runs the binary with no args
# Skip binaries with no safe non-destructive invocation (daemons, REPLs that
# wait on stdin forever). They still get shims; conformance just skips them.
skip = ["server-daemon"]
Conformance runs in CI on every registry PR. Today it's a smoke check only; running tools on canonical inputs and validating typed outputs is on the v2 roadmap.
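The probe loop can be sketched as follows; here the binaries run on the host for illustration, while bv runs them inside the tool's container:

```python
import subprocess

DEFAULT_PROBES = ["--version", "-version", "--help", "-h", "-v", "version"]

def smoke_test(binaries, probes=None, skip=()):
    """Probe each binary; it is alive if any probe invocation exits 0.
    `probes` maps binary -> pinned probe arg ("" means run with no args),
    mirroring the [tool.smoke] overrides described above."""
    probes = probes or {}
    results = {}
    for b in binaries:
        if b in skip:
            results[b] = "skipped"
            continue
        candidates = [probes[b]] if b in probes else DEFAULT_PROBES
        alive = False
        for arg in candidates:
            cmd = [b] + ([arg] if arg else [])
            try:
                if subprocess.run(cmd, capture_output=True,
                                  timeout=30).returncode == 0:
                    alive = True
                    break
            except (FileNotFoundError, subprocess.TimeoutExpired):
                break  # binary missing or hung; no point trying more probes
        results[b] = "alive" if alive else "dead"
    return results
```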
tiers
| tier | requirements |
|---|---|
experimental | basic checks pass; may lack typed I/O. Hidden from default search. |
community | typed I/O present, conformance passes, manifest valid. Default for new submissions. |
core | actively maintained, recognized publisher, runs on docker and apptainer, conformance passes on both. |
Promotion is a separate PR by a registry maintainer. See GOVERNANCE.md for the full criteria.
new versions
One file per version. Add tools/<id>/<newver>.toml; do not edit the old one. The website and bv search surface the latest available version per tool by default; users can request older versions explicitly with bv add <tool>@<ver>.
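Surfacing the latest version amounts to comparing numeric version components, not strings; a rough sketch (a real resolver would follow full semver ordering, including pre-release tags):

```python
import re

def latest_version(versions):
    """Pick the highest version by numeric component comparison,
    so "2.15.0" beats "2.9.1" (which plain string sort would get wrong)."""
    def key(v):
        return [int(x) for x in re.findall(r"\d+", v)]
    return max(versions, key=key)
```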
for maintainers
running bv-ingest, reviewing PRs, promoting tools
tl;dr
# nightly auto-ingestion (also runs from .github/workflows/)
bv-ingest run --limit 50
# review the staging PRs that need typed I/O
bv-ingest review --staging-dir ./bv-registry-staging
# promote a reviewed tool from staging to bv-registry
bv-ingest promote samtools 1.20 --staging-dir ./bv-registry-staging
the pipeline
bv-ingest turns Bioconda recipes into draft manifests so the registry stays current without manual scraping. Two repos cooperate:
- bv-registry-staging: auto-generated drafts. Manifests start in community tier without typed I/O. PRs land here automatically.
- bv-registry: the live registry users pull from. Tools graduate here only after a maintainer reviews and promotes them.
End-to-end:
- fetch recipes: clone bioconda-recipes, parse meta.yaml for build / test / run_exports, derive binary names.
- resolve images: query quay.io/biocontainers for the matching tag and digest. Skip tools without a published BioContainer.
- generate manifest: write tools/<id>/<version>.toml in staging with hardware defaults and exposed binaries, but no typed I/O.
- open PR: one PR per (tool, version) against bv-registry-staging.
- review: maintainer adds [[tool.inputs]] / [[tool.outputs]] using the bv-types vocabulary, and (if needed) a [tool.smoke] override.
- promote: bv-ingest promote opens a PR to bv-registry. CI runs conformance; merge ships it.
bv-ingest commands
| command | what it does |
|---|---|
bv-ingest run [--dry-run] [--limit N] [--tool ID] | full pipeline. Default opens PRs against BV_STAGING_REPO. --dry-run prints what would happen. |
bv-ingest review --staging-dir <path> | list manifests still missing typed I/O, or with --show TOOL/VERSION, dump one for review. |
bv-ingest promote <tool> <version> --staging-dir <path> | copy the reviewed manifest from staging to bv-registry and open a PR. |
bv-ingest status --staging-dir <path> | count of staged, reviewed, and promoted manifests. |
Common env vars:
BV_STAGING_REPO = "tejasprabhune/bv-registry-staging"
BV_REGISTRY_REPO = "tejasprabhune/bv-registry"
BV_BIOCONDA_CACHE = "/var/tmp/bioconda-recipes" # local clone, optional
GITHUB_TOKEN = ... # falls back to `gh auth token`
reviewing PRs
Auto-generated PRs arrive tagged auto-ingest. The fast path:
- Check the upstream tool's docs to identify what file types its main entrypoint reads and writes.
- Add [[tool.inputs]] / [[tool.outputs]] blocks. If a needed type does not exist in bv-types, add it there first (separate PR).
- If the tool downloads model weights or writes large scratch state inside the image, add cache_paths = [...]. Look for clues in the upstream Dockerfile (WORKDIR, VOLUME) or run bv run <tool> on apptainer and watch for read-only-fs errors.
- Add a [tool.smoke] override only if a binary needs a non-default probe (the default loop covers --version, -version, --help, -h, -v, version).
- Run bv conformance <tool> locally on both backends.
- Approve the staging PR. Once merged, run bv-ingest promote.
promoting to core
A tool moves from community to core only when:
- typed I/O is complete and uses canonical types (no ad-hoc string placeholders);
- the maintainer is a recognized publisher (project author, lab, or accepted volunteer maintainer);
- conformance passes on both docker and apptainer in CI;
- the tool has had at least one published version active for 30 days without an unfixed issue tagged broken.
Open a separate PR labelled tier-promote that flips tier = "core". Two maintainer approvals merge it.