git-arx – Internals & Design

Implementation details, design decisions, and architectural notes for contributors and maintainers.

For end-user documentation, see README.md.

Overview
File Structure
Why a Bash Script
Safety Flags
Entry Point and Dispatch
Abstraction Layer
File Backend
Refs Backend
Command Notes
Shell Completion
Testing
Benchmarking
Versioning
Known Limitations

Overview

git-arx is a single self-contained bash script. There are no dependencies beyond git and bash 4+. The script is structured in six sections separated by comment headers:

# --- CONFIG HELPERS ---
# --- BACKEND: FILE ---
# --- BACKEND: REFS ---
# --- ABSTRACTION LAYER ---
# --- COMMANDS ---
# --- ENTRY POINT ---

All commands go through an internal abstraction layer and never touch storage directly. This makes adding a new storage backend a matter of implementing three functions and wiring them into the layer – no commands need to change.

File Structure

git-arx                    Single executable bash script – the entire implementation
install.sh                 Installs git-arx to PATH and sets the git alias
uninstall.sh               Removes installed files and the git alias
git-arx-completion.bash    Bash tab completion script
test.sh                    Test suite
bench.sh                   Performance benchmark (see Benchmarking)
README.md                  End-user documentation
INTERNALS.md               This file
LICENSE                    MIT License

Why a Bash Script

No install dependencies – bash and git are already present everywhere this tool would be used
Git aliases with ! prefix (git config alias.arx '!git-arx') invoke external scripts on $PATH natively
The logic is simple enough that bash’s limitations (no proper data structures, string-heavy) are acceptable
A compiled binary (Go, Rust, etc.) would be the right choice if distribution to non-developers were a goal; it’s not

The one meaningful bash requirement is associative arrays (declare -A), which need bash 4+. Git for Windows ships bash 4.4+. macOS ships bash 3.2 (due to GPL licensing), but /usr/bin/env bash on modern macOS with Homebrew resolves to bash 5.x. This is a known trade-off.

Safety Flags

set -euo pipefail

-e: exit immediately on any command error
-u: treat unset variables as errors
-o pipefail: propagate errors through pipes (e.g. false | true fails)

This is important for a tool that writes to storage – silent failures would corrupt the archive or leave it in a partial state.

Caveat: Commands that are expected to return non-zero must be wrapped. Examples:

git cat-file -e "$sha" – used for existence checks, returns 1 if the object is missing. Wrapped in if ! ....
git update-ref -d – used when deleting refs that may not exist. Followed by || true.
(( counter++ )) – arithmetic (( expr )) returns 1 when the expression evaluates to 0. Use counter=$(( counter + 1 )) instead.
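[[ $dry_run -eq 1 ]] && printf '...\n' – when dry_run=0, [[ ]] returns 1, and that becomes the exit status of the whole && list. set -e does not fire on the list itself, but when the line is the last command in a function, the function returns 1 and set -e aborts at the call site. Always append || true: [[ $dry_run -eq 1 ]] && printf '...\n' || true.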
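The arithmetic and && caveats can be reproduced in isolation. The following is a minimal sketch, not code from git-arx: run_case is a hypothetical helper that runs each snippet in a fresh bash process so that a set -e abort does not end the demonstration itself.

```shell
#!/usr/bin/env bash
# run_case (hypothetical helper): run a snippet in a fresh bash process with
# set -e and report whether the snippet ran to completion.
run_case() {
    if bash -c "set -e; $1" >/dev/null 2>&1; then
        echo "completed"
    else
        echo "aborted"
    fi
}

# Post-increment from 0 evaluates to 0, so (( )) exits 1 and set -e aborts.
echo "post-increment:      $(run_case 'counter=0; (( counter++ )); echo reached')"

# A plain assignment always exits 0.
echo "assignment:          $(run_case 'counter=0; counter=$(( counter + 1 )); echo reached')"

# A failing test at the end of a function makes the function return 1,
# and the call to f is what set -e reacts to.
echo "&& last in function: $(run_case 'f() { [[ 0 -eq 1 ]] && echo dry; }; f; echo reached')"

# || true pins the exit status of the list to 0.
echo "&& with || true:     $(run_case 'f() { [[ 0 -eq 1 ]] && echo dry || true; }; f; echo reached')"
```

Running each case in a child process matters here: a subshell used as an if condition or on the left of || would inherit a context in which set -e is ignored, and the demonstration would report every case as completed.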

Implementing --dry-run on a command:

Add a local dry_run=0 variable and a --dry-run) dry_run=1 ;; case in the option parser.
Keep all output (printf) statements identical to the non-dry-run path – the user sees the same output either way.
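Guard every write/delete with if [[ $dry_run -eq 0 ]]; then ... fi. The short form [[ $dry_run -eq 0 ]] && ... also works, but it returns 1 during a dry run, so it needs a trailing || true whenever it is the last command in a function (see the caveat above).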
Append a single trailing line after the normal summary:
```
[[ $dry_run -eq 1 ]] && printf '(dry run – no changes written)\n' || true
```
The || true is mandatory – see the caveat above.
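Put together, the recipe above might look like the following sketch. cmd_demo_remove, demo_delete, and DEMO_STORE are hypothetical names invented for illustration; they stand in for a real command and its storage calls.

```shell
#!/usr/bin/env bash
# Hypothetical in-memory store standing in for the archive.
declare -A DEMO_STORE=( ["feature/login"]=1 ["fix/bug-42"]=1 )

# Hypothetical write primitive: the only side effect in this sketch.
demo_delete() { unset "DEMO_STORE[$1]"; }

cmd_demo_remove() {
    local dry_run=0
    local -a targets=()
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --dry-run) dry_run=1 ;;
            *) targets+=("$1") ;;
        esac
        shift
    done

    local removed=0 t
    for t in "${targets[@]}"; do
        printf 'Removed: %s\n' "$t"        # identical output in both modes
        if [[ $dry_run -eq 0 ]]; then
            demo_delete "$t"               # guarded write
        fi
        removed=$(( removed + 1 ))         # not (( removed++ )), see caveat
    done
    printf 'Removed %d entries\n' "$removed"
    [[ $dry_run -eq 1 ]] && printf '(dry run – no changes written)\n' || true
}

cmd_demo_remove --dry-run "feature/login"
echo "entries still stored: ${#DEMO_STORE[@]}"
```

The dry run prints the same per-entry lines as a real run and leaves the store untouched; only the trailing notice differs.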

Entry Point and Dispatch

main() {
    _arx_require_git
    ARX_GIT_ROOT=$(_arx_git_root)
    ...
}

ARX_GIT_ROOT is set once at startup and used by _arx_load_config() to resolve the archive path relative to the repo root rather than the current working directory. This means git arx list works correctly regardless of which subdirectory the user is in when they run it.

_arx_load_config() runs right after, reading all arx.* settings into globals in one pass: ARX_STOREREFS, ARX_STOREFILE, ARX_FILE (full archive path), ARX_REFSPREFIX, and ARX_REMOTE_PREFIX (the remote-tracking namespace derived from the refs prefix). Everything downstream reads these variables directly – config is never re-read during a run. Before this, each _arx_config_* helper spawned a git config subprocess per call; a single _arx_write cost four of them, so arx update archiving N branches paid ~4N subprocesses for values that cannot change mid-run.

Commands that don’t apply to the configured storage call _arx_require_storage at the top of their function, which prints a descriptive error and exits:

git-arx: this command requires refs storage (set: git config arx.storerefs true)

Unknown commands print an error referencing git arx help.

Abstraction Layer

The core of the architecture is four functions that all commands call exclusively:

`_arx_read_all()`

Reads from configured backend(s) and emits normalized records to stdout, one per line:

<branch-name> <full-sha> <ISO-8601-date>

This is a streaming interface – callers pipe or redirect it with while read. No temporary files are needed for reads.

When both arx.storerefs and arx.storefile are enabled, the function performs a union merge:

Emit everything from the refs backend, recording branch names in a declare -A seen associative array
Emit file-only entries (those whose branch name is not in seen)

Refs are treated as primary in the union merge. This reflects the refs backend’s stronger guarantees (gc-safe, native git). The sync command surfaces conflicts between backends explicitly; _arx_read_all silently prefers refs to avoid making every command into a conflict reporter.
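The union merge can be sketched as follows. This is an illustration under assumptions, not the actual _arx_read_all: demo_refs_read and demo_file_read are hypothetical stand-ins for the backend readers, and the SHAs are shortened placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the two backend readers (sample data only).
demo_refs_read() {
    printf '%s\n' \
        'feature/login aaaa 2025-11-15T10:30:00+01:00' \
        'fix/bug-42 bbbb 2025-10-01T08:00:00+00:00'
}
demo_file_read() {
    printf '%s\n' \
        'feature/login cccc 2025-11-01T09:00:00+01:00' \
        'old/experiment dddd 2025-09-01T12:00:00+00:00'
}

demo_read_all() {
    local -A seen=()
    local branch sha date

    # Pass 1: refs are primary. Emit everything and remember the names.
    while read -r branch sha date; do
        seen["$branch"]=1
        printf '%s %s %s\n' "$branch" "$sha" "$date"
    done < <(demo_refs_read)

    # Pass 2: emit only file entries whose name was not seen in refs.
    while read -r branch sha date; do
        [[ -n "${seen[$branch]:-}" ]] && continue
        printf '%s %s %s\n' "$branch" "$sha" "$date"
    done < <(demo_file_read)
}

demo_read_all
```

In the sample data, feature/login exists in both backends with different SHAs; the refs copy (aaaa) is emitted and the file copy (cccc) is silently dropped, which is the behaviour the paragraph above describes.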

`_arx_write(branch, sha, date)`

Writes to all enabled backends. When both are enabled, writes to file first, then refs. Order doesn’t matter for correctness; file first means a crash between the two writes leaves the more portable copy updated.

`_arx_write_bulk()`

Reads branch sha date records from stdin and writes them to all enabled backends in one bulk pass per backend (file first, same order as _arx_write): _arx_file_write_bulk does a single filter-then-append rewrite of .gitarchive, and _arx_refs_write_bulk updates all refs in a single atomic git update-ref --stdin transaction. The transaction is all-or-nothing – if any ref update fails (e.g. a directory/file ref-name conflict), none of the refs are written. Used by update to flush all archive writes at once.
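For the refs half, a bulk write amounts to translating records into the instruction format that git update-ref --stdin reads. The sketch below only builds those instructions; demo_refs_bulk_commands is a hypothetical name, and the real _arx_refs_write_bulk may differ.

```shell
#!/usr/bin/env bash
# Turn "branch sha date" records on stdin into "update <ref> <sha>" lines,
# the instruction format accepted by git update-ref --stdin.
demo_refs_bulk_commands() {
    local prefix="$1" branch sha date
    while read -r branch sha date; do
        [[ -z "$branch" ]] && continue     # skip blank input lines
        printf 'update %s%s %s\n' "$prefix" "$branch" "$sha"
    done
}

# Sample records; in a repository the output would be piped onward:
#   ... | demo_refs_bulk_commands refs/arx/ | git update-ref --stdin
printf '%s\n' \
    'feature/login a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2 2025-11-15T10:30:00+01:00' \
    'fix/bug-42 deadbeefdeadbeefdeadbeefdeadbeefdeadbeef 2025-10-01T08:00:00+00:00' |
    demo_refs_bulk_commands refs/arx/
```

Because git applies all instructions from one --stdin stream as a single transaction, one invalid line is enough to reject the whole batch.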

`_arx_delete(branch)`

Removes from all enabled backends. When both are enabled, removes from file first, then refs.

Helper Functions

Several helper functions are defined between the abstraction layer and the commands:

_arx_lookup_branch(branch) – calls _arx_read_all and returns sha date for the named branch. Used by single-branch commands (remove, rename, log, checkout); add and update scan the archive themselves in one pass instead (see the arx add and Performance sections).
_arx_sha_exists(sha) – checks object existence via git cat-file -e; used by log and checkout before operating on an archived SHA.

File Backend

Format

# git-arx archive – do not edit manually
feature/login a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2 2025-11-15T10:30:00+01:00
fix/bug-42 deadbeefdeadbeefdeadbeefdeadbeefdeadbeef 2025-10-01T08:00:00+00:00

Space-delimited, three fields: branch sha date
Full 40-character SHA (abbreviated SHAs become ambiguous as repos grow)
ISO-8601 date with timezone offset from git log -1 --format=%aI (author date of the archived branch's tip commit)
# lines are comments, skipped on read
Blank lines are skipped on read

The format is intentionally simple. Since git branch names cannot contain spaces (git itself rejects them), space as delimiter is unambiguous and requires no quoting or escaping.
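Reading this format needs nothing beyond read with its default whitespace splitting. A minimal sketch, assuming a reader shaped like the one below (demo_file_read is a hypothetical name, not the actual _arx_file_read):

```shell
#!/usr/bin/env bash
# Emit "branch sha date" records from an archive file, skipping comment
# lines and blank lines. A missing file is treated as an empty archive.
demo_file_read() {
    local archive="$1" branch sha date
    [[ -f "$archive" ]] || return 0
    # "|| [[ -n $branch ]]" keeps a final line that lacks a trailing newline.
    while read -r branch sha date || [[ -n "$branch" ]]; do
        [[ -z "$branch" || "$branch" == \#* ]] && continue
        printf '%s %s %s\n' "$branch" "$sha" "$date"
    done < "$archive"
}

demo_archive=$(mktemp)
printf '%s\n' \
    '# git-arx archive – do not edit manually' \
    'feature/login a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2 2025-11-15T10:30:00+01:00' \
    '' \
    'fix/bug-42 deadbeefdeadbeefdeadbeefdeadbeefdeadbeef 2025-10-01T08:00:00+00:00' \
    > "$demo_archive"
demo_file_read "$demo_archive"
rm -f "$demo_archive"
```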

Atomic Writes

The file is never modified in place. Every write uses a filter-then-append pattern:

Read the full archive into a temp file (${archive}.tmp.$$), skipping the entry being updated
Append the new entry to the temp file
Replace the archive with the temp file

The temp file name uses $$ (the shell’s PID) to avoid collisions if multiple instances run simultaneously (unlikely for an interactive CLI, but safe practice).

The temp file is created (and truncated) up front with : > "$tmpfile". The filter loops only append, so without this a delete that filters out every line (e.g. a header-less archive whose only entry is removed) would never create the temp file — the final mv would then fail after the old archive was already removed, aborting the script before the refs backend delete runs. Truncating up front also discards any stale temp file left behind by a crashed run with a recycled PID.

The replace step on Windows/MINGW64 requires an explicit rm -f before mv:

[[ -f "$archive" ]] && rm -f "$archive"
mv "$tmpfile" "$archive"

On Linux/macOS, mv over an existing file is atomic at the filesystem level. On Windows NTFS via Git Bash, mv can fail if the destination exists; the explicit remove makes it reliable.
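The whole write path described in this section can be sketched in a few lines. demo_file_write is a hypothetical name and the body is an approximation under the stated rules (truncate up front, filter, append, remove, move), not a copy of _arx_file_write.

```shell
#!/usr/bin/env bash
demo_file_write() {
    local archive="$1" branch="$2" sha="$3" date="$4"
    local tmpfile="${archive}.tmp.$$" line

    : > "$tmpfile"                               # create and truncate up front
    if [[ -f "$archive" ]]; then
        while IFS= read -r line || [[ -n "$line" ]]; do
            # Filter: drop the entry being replaced, keep everything else.
            [[ "${line%% *}" == "$branch" ]] && continue
            printf '%s\n' "$line" >> "$tmpfile"
        done < "$archive"
    fi
    printf '%s %s %s\n' "$branch" "$sha" "$date" >> "$tmpfile"   # append

    [[ -f "$archive" ]] && rm -f "$archive"      # required on Windows/MINGW64
    mv "$tmpfile" "$archive"
}

demo_dir=$(mktemp -d)
demo_file_write "$demo_dir/.gitarchive" fix/bug-42 aaaa 2025-10-01T08:00:00+00:00
demo_file_write "$demo_dir/.gitarchive" fix/bug-42 bbbb 2025-10-02T08:00:00+00:00
cat "$demo_dir/.gitarchive"
rm -rf "$demo_dir"
```

Writing the same branch twice leaves a single line carrying the second SHA, since the first entry is filtered out before the new one is appended.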

Remove = Filter Out

Deleted entries are removed from the file entirely, not marked with a prefix like #archived. Rationale:

The git object itself still exists in the repository (until gc) – the SHA in the record is the real audit trail
Keeping deleted entries would mean the file grows unboundedly
_arx_file_write already implements filter-then-append, so delete is just filter-without-append – no new code path

Refs Backend

Namespace

Archived branches are stored as git refs under a configurable prefix, defaulting to refs/arx/. For a branch named feature/login, the default ref path is refs/arx/feature/login. The prefix is read from arx.refsprefix once at startup into ARX_REFSPREFIX (see _arx_load_config()).

Git ref names allow forward slashes and use them to create directory structure. refs/arx/feature/login is stored as the file .git/refs/arx/feature/login. This is the same mechanism used by refs/remotes/origin/feature/login – no special handling is needed.

Git rejects ref names containing a space, ~, ^, :, ?, *, [, \, control characters, or the sequences .. and @{, along with a few positional rules (see git check-ref-format). Since git itself rejects branch names that break these rules, any valid local branch name is a valid ref name in our namespace.

Why Refs Protect from gc

git gc prunes unreachable objects – commits, trees, and blobs that cannot be reached by following refs (branches, tags, stash, reflogs). When a local branch is deleted, its commits become unreachable unless something else references them. A ref under the arx prefix (e.g., refs/arx/) is a real git ref, so any commit it points to (and all ancestors of that commit) remain reachable and will not be pruned.

Reading Dates from Refs

The refs backend does not store dates explicitly – the date is read from the commit object at query time:

git for-each-ref \
    --format='%(refname) %(objectname) %(authordate:iso-strict)' \
    "$refsprefix"

%(refname) gives the full ref path (e.g., refs/arx/feature/login). The configured prefix is then stripped in _arx_refs_read to recover the branch name.

%(authordate:iso-strict) gives the ISO-8601 author date of the commit the ref points to — the same date the file backend stores (git log -1 --format=%aI), so the normalized output of both _arx_file_read and _arx_refs_read is identical. %(creatordate) would be the committer date, which diverges from the author date after rebase, amend, or cherry-pick — the two backends would then report different dates for the same entry.
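Stripping the prefix is a single parameter expansion per line. A minimal sketch, assuming input in the for-each-ref format shown above (demo_refs_normalize is a hypothetical name; the sample line replaces real git output):

```shell
#!/usr/bin/env bash
# Convert "<refname> <sha> <date>" lines into "<branch> <sha> <date>" records
# by removing the configured prefix from the front of the ref name.
demo_refs_normalize() {
    local prefix="$1" refname sha date
    while read -r refname sha date; do
        # Quoting "$prefix" makes it a literal string, not a glob pattern.
        printf '%s %s %s\n' "${refname#"$prefix"}" "$sha" "$date"
    done
}

# In a repository the input would come from the git for-each-ref call above.
printf '%s\n' \
    'refs/arx/feature/login a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2 2025-11-15T10:30:00+01:00' |
    demo_refs_normalize refs/arx/
```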

Remote Operations

Refs under the arx prefix are not pushed by default. A plain git push only sends branches under refs/heads/ (and tags under refs/tags/ when asked with --tags or --follow-tags), so a custom namespace needs an explicit refspec. The push command builds one from arx.refsprefix (default: refs/arx/):

git push origin 'refs/arx/*:refs/arx/*'

This pushes all refs under the prefix to the same path on the remote. Supported by GitHub, GitLab, Gitea, and Bitbucket.

After a successful push, arx push updates a local remote-tracking namespace derived from arx.refsprefix — e.g. refs/arx/ → refs/arx-remote/origin/. Each pushed ref is mirrored there via git update-ref so that arx list and arx status --all can report an accurate REMOTE column without a network call.

arx push --delete <branch> deletes a single ref from the remote using git push origin --delete refs/arx/<branch>, then removes the corresponding local remote-tracking ref.

arx push --prune adds --prune to the glob refspec push, which causes git to delete any remote refs under the prefix that have no local counterpart. After the push, the local remote-tracking namespace is rebuilt from scratch to match the new remote state (delete all tracking refs, then re-mirror from current local refs).

arx pull inverts this: it fetches refs/arx/* from the remote into the tracking namespace first, then copies those refs into refs/arx/* locally. This preserves a clean record of the last known remote state regardless of any local-only archive entries that were added between pulls. The fetch refspec is forced (+ prefix) so that refs re-archived at a different SHA and pushed with arx push --force from another machine are accepted (a plain refspec would reject them as non-fast-forward and abort the pull). The fetch also uses --prune, so tracking refs whose remote counterpart was deleted (e.g. via arx push --delete elsewhere) are removed — without it the REMOTE column would keep reporting pushed for refs that no longer exist on the remote. Note that pruning only affects the tracking namespace: the corresponding local refs/arx/* entry is kept, since the local archive is authoritative for local entries.

# pull fetches into tracking namespace (forced + pruned), then promotes to local refs
git fetch --prune origin '+refs/arx/*:refs/arx-remote/origin/*'
# then: git update-ref refs/arx/<branch> <sha>  for each tracking ref

When the file backend is also enabled, pull syncs all fetched entries to .gitarchive via _arx_file_write_bulk — one filter-then-append rewrite for the whole batch instead of one full file rewrite per entry (which would be O(n²)). File-only entries are preserved, matching the per-entry write semantics.

arx fetch is a read-only preview of what arx pull would bring in. It uses git ls-remote to query the remote ref list without downloading any objects, then compares against local refs/arx/*:

new — on the remote, not locally; pull would add it
up to date — same SHA on both sides; pull is a no-op for this branch
changed — different SHAs; pull would overwrite local with the remote SHA
local — local only, not on the remote; unaffected by pull
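The four states reduce to a comparison of two name-to-SHA maps. In the sketch below, REMOTE_SHA stands in for parsed git ls-remote output and LOCAL_SHA for the local refs; both maps, the function name, and the shortened SHAs are illustrative assumptions.

```shell
#!/usr/bin/env bash
declare -A REMOTE_SHA=( ["feature/login"]=aaaa ["fix/bug-42"]=bbbb ["new/idea"]=cccc )
declare -A LOCAL_SHA=( ["feature/login"]=aaaa ["fix/bug-42"]=ffff ["old/experiment"]=dddd )

# Classify one branch. Expects the branch to exist in at least one map.
demo_fetch_state() {
    local branch="$1"
    local remote="${REMOTE_SHA[$branch]:-}" loc="${LOCAL_SHA[$branch]:-}"
    if [[ -n "$remote" && -z "$loc" ]]; then
        echo "new"
    elif [[ -z "$remote" && -n "$loc" ]]; then
        echo "local"
    elif [[ "$remote" == "$loc" ]]; then
        echo "up to date"
    else
        echo "changed"
    fi
}

# Walk the union of both key sets; sort -u removes the duplicate rows
# produced by branches that appear in both maps.
for branch in "${!REMOTE_SHA[@]}" "${!LOCAL_SHA[@]}"; do
    printf '%s\t%s\n' "$branch" "$(demo_fetch_state "$branch")"
done | sort -u
```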

Separately, when both backends are in use, fully automatic remote sync is possible without git arx push/pull: if .gitarchive is committed to the repository, it syncs as part of the normal git object graph.

Command Notes

`arx update` and `arx status`

Both commands use %(upstream) from git for-each-ref to classify local branches:

Empty %(upstream) — no upstream ever configured: the branch was never pushed (“local only”).
Non-empty %(upstream), ref does not exist locally — upstream was configured but the tracking ref no longer exists: the remote branch was deleted (after git fetch --prune or manual pruning).
Non-empty %(upstream), ref exists — live remote branch, skip.

The existence check is a lookup in an existing_refs set preloaded from a single git for-each-ref refs/heads/ refs/remotes/ call (see Performance section) — refs/remotes/ for normal remote-tracking upstreams, refs/heads/ because an upstream can also be a local branch (branch.<name>.remote = ., e.g. after git branch --track a b).

This approach is more robust than checking %(upstream:track) for the string [gone] because:

[gone] can vary by git version or locale
The ref-existence check is a direct, binary fact about the ref store

Note: git remote prune origin removes the remote tracking ref (refs/remotes/origin/branch) but does not clear branch.<name>.remote or branch.<name>.merge from .git/config. So %(upstream) still outputs the full ref path for pruned branches. Checking whether that ref resolves handles both the “never had a remote” and “remote was deleted” cases.

arx update only processes remote-deleted branches (non-empty upstream that doesn’t resolve). Never-pushed branches are skipped — they require an explicit git arx add if the user wants to archive them.
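The three-way classification can be sketched as a pure lookup. Here existing_refs is filled with sample values instead of the single git for-each-ref call, and demo_classify_upstream is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sample stand-in for the set loaded from:
#   git for-each-ref --format='%(refname)' refs/heads/ refs/remotes/
declare -A existing_refs=( ["refs/remotes/origin/main"]=1 ["refs/heads/main"]=1 )

demo_classify_upstream() {
    local upstream_ref="$1"
    if [[ -z "$upstream_ref" ]]; then
        echo "local only"          # no upstream ever configured
    elif [[ -z "${existing_refs[$upstream_ref]:-}" ]]; then
        echo "remote deleted"      # configured, but the ref is gone
    else
        echo "live"                # upstream ref still exists: skip
    fi
}

# name<TAB>upstream pairs, as a trimmed-down for-each-ref output would give.
while IFS=$'\t' read -r name upstream_ref; do
    printf '%s: %s\n' "$name" "$(demo_classify_upstream "$upstream_ref")"
done < <(printf '%s\t%s\n' \
    main refs/remotes/origin/main \
    feature/x refs/remotes/origin/feature/x \
    scratch '')
```

Only the "remote deleted" case is a candidate for arx update; the other two are skipped as described above.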

arx status (default) shows only remote-deleted branches, with archive states Not archived, Archived, Archived as "<name>", or Conflict. Nothing is written.

arx status --all additionally shows:

Never-pushed local branches — shown with status Local only when not yet archived (or Archived / Conflict if they have been archived manually).
Archived branches that no longer exist locally — after the git for-each-ref loop, the command compares arc_by_name keys against the local_branches set to find orphan entries. Their authors are fetched in a single git log --no-walk call, with (gc) as a fallback for pruned commits. These rows are appended to the same rows array and go through the same sort and print path.
A REMOTE column (when the refs backend is active) — loaded once from refs/arx-remote/origin/* into remote_sha_by_branch[branch]=sha before the print loop. Each row is classified as pushed (remote SHA matches archive SHA), ahead (remote SHA differs), local (no tracking ref), remote (not in local archive but remote tracking ref exists — branch was removed locally but not yet deleted from remote), or - (not in archive and no remote tracking ref).

arx status accepts --sort=name|date and --order=asc|desc. The default sort is name; the default order depends on the sort key — asc for name, desc for date — unless overridden explicitly. When sorting by date, name is used as a tiebreaker. Rows are collected first, then sorted as a post-processing step before printing. arx list uses the same sort/order logic.

Color output. Both arx status and arx list emit ANSI color codes only when stdout is a terminal ([[ -t 1 ]]). Piped or redirected output is always plain text. arx status colors the STATUS column: red (Not archived), green (Archived), light blue / bright cyan (Archived as "..."), yellow (Conflict), dim (Local only). Both commands color the REMOTE column: green (pushed), yellow (ahead), red (local), cyan (remote), dim (-).

printf byte-vs-character width. printf %-Ns pads a field to N bytes, not N display columns. Author names containing multibyte UTF-8 characters (e.g. ć, ž) are longer in bytes than in characters, so subsequent columns shift left for those rows. arx status corrects for this before printing each row: it measures the string in both character count (${#s} with the active locale) and byte count (${#s} with LC_ALL=C), then widens the format field by the difference. The same correction is applied to the STATUS column when the REMOTE column is present — ANSI color codes add invisible bytes to the colored STATUS string, which would otherwise throw off its fixed-width padding.
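The width correction can be sketched as follows, assuming printf pads by bytes as described above and that the active locale is UTF-8. demo_byte_len and demo_pad are hypothetical names, not functions from git-arx.

```shell
#!/usr/bin/env bash
# Byte length of a string: in the C locale, ${#var} counts bytes.
demo_byte_len() {
    local LC_ALL=C
    printf '%s' "${#1}"
}

# Left-align s in a field of the given display width, widening the printf
# field by the number of "extra" bytes so the closing bar stays aligned.
demo_pad() {
    local s="$1" width="$2"
    local chars=${#s}              # characters under the active locale
    local bytes
    bytes=$(demo_byte_len "$s")
    printf '%-*s|\n' "$(( width + bytes - chars ))" "$s"
}

demo_pad "Ana" 8
demo_pad "Žiga" 8
```

For a pure ASCII string the two lengths are equal and the correction is zero, so the function degrades to a plain fixed-width printf.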

Performance (`arx update`, `arx status`, `arx list --author`)

For repos with many branches or archived entries, naive per-branch subprocess calls add up to tens of seconds. Three commands use bulk operations to avoid this.

arx update and arx status share the same three optimisations:

Archive loaded once – _arx_read_all is called once before the branch loop and its output is stored in two in-memory associative arrays: arc_by_name[branch]=sha and arc_by_sha[sha]=name. All per-branch archive lookups are then O(1) bash hash table reads instead of O(n) subprocess calls.
Single git for-each-ref call – a single call retrieves branch name, SHA, author date, and (for status) author name for every branch at once, replacing per-branch git rev-parse and git log calls:

# arx status (includes authorname for display)
git for-each-ref \
    --format='%(refname:short)%09%(objectname)%09%(authordate:iso-strict)%09%(authorname)%09%(upstream)' \
    refs/heads/

# arx update (authorname not needed)
git for-each-ref \
    --format='%(refname:short)%09%(objectname)%09%(authordate:iso-strict)%09%(upstream)' \
    refs/heads/

Upstream existence as a set lookup – before the branch loop, all refs an upstream could point to are loaded into an existing_refs associative array with one git for-each-ref --format='%(refname)' refs/heads/ refs/remotes/ call. The per-branch “does the upstream ref still exist” check is then a bash hash lookup instead of a git rev-parse --verify subprocess per branch — previously the dominant cost on repos with many tracked branches, even when there was nothing to archive.

arx update batches its writes. Branches to archive are collected into a pending array during the loop and flushed once at the end through _arx_write_bulk – one .gitarchive rewrite instead of a full rewrite per branch (which would be O(n²)), and one atomic git update-ref --stdin transaction instead of one subprocess per ref. Per-branch messages are printed as branches are classified, before the flush; if the flush fails, the exit status is non-zero even though Archived: lines were already shown.

arx update also keeps the in-memory maps current after processing each branch – arc_by_name and arc_by_sha are updated regardless of --dry-run – so the simulation is accurate, and subsequent branches see the correct in-memory state whether or not writes are actually happening.

%(upstream) must be last. Tab (%09) is an IFS whitespace character, and consecutive IFS whitespace characters collapse into a single separator. If an empty %(upstream) sat between two other fields, it would produce <TAB><TAB>, which read treats as one separator – the empty field disappears and all subsequent fields shift left. Placing %(upstream) last avoids this: the trailing tab is stripped cleanly, and upstream_ref is assigned an empty string, which is the correct behaviour.
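The field shift is easy to reproduce with read alone. The variable names and sample values below are illustrative.

```shell
#!/usr/bin/env bash
# Empty field in the middle: the two adjacent tabs collapse into one
# separator, so the SHA lands in the upstream variable.
IFS=$'\t' read -r name upstream sha <<< $'main\t\tabc123'
middle_result="name=[$name] upstream=[$upstream] sha=[$sha]"

# Empty field last: the trailing tab is stripped and upstream stays empty.
IFS=$'\t' read -r name sha upstream <<< $'main\tabc123\t'
last_result="name=[$name] sha=[$sha] upstream=[$upstream]"

echo "middle: $middle_result"
echo "last:   $last_result"
```

In the first case upstream receives abc123 and sha is left empty; in the second every field lands where it belongs.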

set -u and associative arrays. Accessing a missing key in an associative array with set -u enabled triggers an “unbound variable” error in bash 4.x. All array reads use ${arr[key]:-} to provide an explicit empty-string default and suppress the error.

arx list --author – archived SHAs are not local branch refs, so git for-each-ref does not apply. Instead, all SHAs are collected from the sorted entries and passed to a single git log --no-walk call, reducing N subprocess calls to one:

git log --no-walk --format='%H %an' sha1 sha2 sha3 ...

The result is stored in author_by_sha[sha]=name and looked up during rendering. gc’d commits are absent from the output and fall back to (gc) via ${author_by_sha[$sha]:-(gc)}.

`arx add` – Conflict Detection

Before writing, add scans _arx_read_all once, resolving two things in the same pass: the SHA stored under the target name (which may be a custom archive name), and any names the branch’s current SHA is already stored under. Four outcomes:

Not archived – write and report Archived:.
Archived with same SHA – exit 0 with Already archived:. Idempotent; safe to call repeatedly.
Archived with different SHA – conflict. Exit 1 with an error and hints. --force overwrites; an archive-name argument stores under a different name instead.
Not archived by target name, but SHA already present under a different name – the same-pass reverse lookup finds the duplicate. Prints a Note: line, then writes anyway (the user explicitly requested this archive entry).

arx update applies the same conflict logic for every candidate branch, using the in-memory arc_by_name and arc_by_sha maps (see Performance section). If the current SHA is already stored under a different name, the branch is skipped with an Already safe: message and counted separately in the summary. This prevents silent duplicate SHA storage during automatic archiving. If the user wants the branch indexed under its natural name too, they can run git arx add <branch> explicitly.
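The single-pass scan behind the four outcomes can be sketched like this. demo_add_outcome is a hypothetical function that reads normalized records on stdin; the outcome labels and shortened SHAs are invented for the example.

```shell
#!/usr/bin/env bash
# Decide the outcome of archiving <sha> under <target>, given the current
# archive as "branch sha date" records on stdin. One pass resolves both the
# SHA stored under the target name and any other name holding the same SHA.
demo_add_outcome() {
    local target="$1" sha="$2"
    local stored_sha="" other_name="" branch rec_sha date
    while read -r branch rec_sha date; do
        if [[ "$branch" == "$target" ]]; then
            stored_sha="$rec_sha"
        elif [[ "$rec_sha" == "$sha" ]]; then
            other_name="$branch"
        fi
    done

    if [[ -n "$stored_sha" && "$stored_sha" == "$sha" ]]; then
        echo "already archived"
    elif [[ -n "$stored_sha" ]]; then
        echo "conflict"
    elif [[ -n "$other_name" ]]; then
        echo "new name, sha already stored as $other_name"
    else
        echo "not archived"
    fi
}

demo_archive() {
    printf '%s\n' \
        'feature/login aaaa 2025-11-15T10:30:00+01:00' \
        'fix/bug-42 bbbb 2025-10-01T08:00:00+00:00'
}

demo_archive | demo_add_outcome feature/login aaaa
demo_archive | demo_add_outcome feature/login ffff
demo_archive | demo_add_outcome login-copy aaaa
demo_archive | demo_add_outcome brand/new cccc
```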

`arx rename`

Implemented as _arx_write(new) + _arx_delete(old) – the abstraction layer fans out to all enabled backends automatically. There is no dedicated rename primitive in either backend; write-then-delete is equivalent.

`arx log` – Argument Passthrough

cmd_log() {
    local branch="$1"
    shift          # $@ now contains only the git log flags
    ...
    git log "$sha" "$@"
}

shift removes the branch name argument, leaving "$@" as whatever the user typed after the branch name. All git log flags, format strings, file path filters, and revision ranges work as expected.

exec is intentionally not used here. On Windows/MINGW64, exec does not properly transfer the pipe file descriptor to the replacement process, so git detects a terminal instead of a pipe, opens the pager, and the output never reaches the caller. A plain git log call avoids this and behaves correctly on all platforms.

`arx checkout` – gc Detection

Before attempting to restore, the script checks whether the commit still exists:

if ! git cat-file -e "$sha"; then
    # commit was garbage collected
fi

git cat-file -e <object> exits 0 if the object exists in the object store, non-zero otherwise. It does not print anything. This is the correct low-level check – it works for any object type (commit, tree, blob) and does not require the object to be reachable.

`arx prune`

Finds all archived branches that still exist as local branches, then deletes those local branches (the archive entries are kept).

Both halves are bulk operations: local branches are loaded once into a local_branches set with a single git for-each-ref refs/heads/ call (the per-archive-entry existence check is a hash lookup, not a git rev-parse --verify subprocess), and the deletion is a single git branch -D b1 b2 ... call for the whole batch – git itself prints the per-branch Deleted branch ... (was ...). lines.

Key behaviors:

The currently checked-out branch is always skipped (git would reject the deletion anyway). It is listed separately in the output with a “Skipped (currently checked out)” notice.
Without --force, the full list is printed and the user must type "yes" to proceed. This is intentional – git branch -D is irreversible from git’s perspective (the archive is the only recovery path).
--dry-run prints the same list and count as a real run but skips the confirmation prompt and does not delete anything.

`arx sync` – Union Merge Algorithm

sync is only meaningful when both arx.storerefs and arx.storefile are enabled, since it reconciles two backends that can theoretically drift.

When drift happens:

In normal usage, drift should not occur – every write operation updates both backends within the same invocation, though not atomically across the two. Drift can arise from:

Someone manually edits .gitarchive with a text editor
Someone manually creates/deletes refs with raw git commands
A script crash between the file write and the ref write
Running git fetch origin 'refs/arx/*:refs/arx/*' directly instead of git arx pull (updates local arx refs but bypasses the tracking namespace and the file backend)

Algorithm:

for each branch in (refs ∪ file):
    refs-only → write to file
    file-only → write to refs
    both, same SHA → no-op
    both, different SHA → conflict

Non-conflicting entries are always processed. A conflict does not block other entries from being synced. After processing all entries, if any conflicts occurred, sync exits with status 1.
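The comparison step can be sketched as a dry planner over two in-memory maps. REFS_SHA, FILE_SHA, and demo_sync_plan are hypothetical names; the sketch prints what would be done and performs no writes.

```shell
#!/usr/bin/env bash
declare -A REFS_SHA=( ["feature/login"]=aaaa ["fix/bug-42"]=bbbb ["refs/only"]=cccc )
declare -A FILE_SHA=( ["feature/login"]=aaaa ["fix/bug-42"]=ffff ["file/only"]=dddd )

# Print one line per branch in (refs ∪ file); return 1 if any conflict
# was found, after every other entry has still been reported.
demo_sync_plan() {
    local -A all=()
    local b r f conflicts=0
    for b in "${!REFS_SHA[@]}" "${!FILE_SHA[@]}"; do
        all["$b"]=1
    done
    while read -r b; do
        [[ -z "$b" ]] && continue
        r="${REFS_SHA[$b]:-}"
        f="${FILE_SHA[$b]:-}"
        if [[ -z "$f" ]]; then
            echo "$b: refs-only, write to file"
        elif [[ -z "$r" ]]; then
            echo "$b: file-only, write to refs"
        elif [[ "$r" == "$f" ]]; then
            echo "$b: in sync"
        else
            echo "$b: conflict"
            conflicts=$(( conflicts + 1 ))
        fi
    done < <(printf '%s\n' "${!all[@]}" | sort)
    [[ $conflicts -eq 0 ]]
}

demo_sync_plan || echo "exit status 1: conflicts found"
```

A conflict only affects the final status; the refs-only and file-only entries around it are still reported, matching the rule that non-conflicting entries are always processed.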

--dry-run: Runs the same comparison logic and prints the same output as a real sync, but skips all writes. A trailing (dry run – no changes written) line is appended. Works with or without --force-file / --force-refs – output shows exactly what would happen if the flag were run without --dry-run.

--force-file / --force-refs: When a SHA conflict is detected and a force flag is present, the designated backend is treated as the source of truth and the other is overwritten. This is an escape hatch for the rare case where the user knows which side is correct.

Shell Completion

git-arx-completion.bash defines _git_arx() — the function name git-completion.bash looks for when the user presses Tab after git arx. The naming convention is the external command name with hyphens replaced by underscores.

The function is context-aware: it offers different completions depending on the subcommand at words[2]:

Subcommand names at cword == 2 (including aliases: ls, rm, mv)
Archived branch names (via git arx list) for checkout, log, remove/rm, rename/mv
Local branch names (via __git_heads) for add
Per-subcommand flags for everything else

Archived branch names are fetched by calling git arx list at tab-press time and stripping the two header lines with awk NR > 2. This is a subprocess invocation on every completion for those commands — fast enough in practice, but noticeable on repos with very large archives.

Testing

The test suite lives in test.sh and is an integration test suite – it runs the actual git-arx script against real git repositories created in a temporary directory. No mocking.

Running the tests

bash test.sh

No install required. The script resolves the path to git-arx relative to its own location, so it works from any working directory.

Structure

The suite is organized into sections, each exercising one command or scenario:

test_help          git arx help / -h
test_add           git arx add (normal, conflict, --force, archive-name)
test_remove        git arx remove
test_rename        git arx rename
test_list          git arx list (sorting, --author, --storage filter)
test_update        git arx update (--dry-run, --force, conflicts, already-safe)
test_sort_tiebreak name tiebreak when sorting by date
test_log           git arx log (passthrough flags)
test_checkout      git arx checkout (restore, gc'd commit)
test_prune         git arx prune (--dry-run, --force, current branch skipped)
test_merge         git arx merge (dedup, conflicts)
test_refs_backend  refs-only storage
test_both_backend  both backends enabled (union reads, sync)
test_push_pull     git arx push / fetch / pull (requires a bare remote)
test_sync          git arx sync (--dry-run, --force-file, --force-refs)
test_slashed_branches  branch names with slashes
test_double_add    idempotency of add
test_config_bool   git boolean spellings (yes/on/1/...) for storage flags
test_error_cases   unknown commands, missing args, bad config
test_overwrite_guard   bytes past the final { main; exit; } are never executed

Each section uses assert_ok, assert_fails, and assert_out helpers. assert_out greps the combined stdout+stderr for a fixed string – tests are intentionally coarse-grained (output substring match) rather than exact, so minor wording changes in messages don’t break the suite.

Test isolation

Each test section resets the archive state via reset_archive() before running. This deletes .gitarchive and removes all refs/arx/ refs, then resets storage to file-only. Branches deleted during a test are recreated by recreate_branches() where needed.

The entire repo lives in a mktemp -d temporary directory and is cleaned up via a trap ... EXIT at the end of the run.

Benchmarking

bench.sh times the subprocess-heavy commands (status, status --all, update, list, prune --dry-run) against a synthetic repository with many branches – each at a distinct commit (update dedups identical SHAs), each simulating a deleted remote (upstream configured, tracking ref absent). Like the test suite, it runs in a mktemp -d directory and cleans up after itself.

bash bench.sh                  # time the working-tree git-arx
bash bench.sh <rev>            # compare <rev> against the working tree
bash bench.sh <rev-a> <rev-b>  # compare two versions
N=200 bash bench.sh            # override the branch count (default: 80)

Arguments may be git revisions (git-arx is extracted from them via git show) or paths to git-arx executables, so an installed copy can be compared directly: bash bench.sh ~/bin/git-arx.

When to run it: when a change touches the branch loops in update, status, or prune, or adds a subprocess call anywhere reachable from them – not on a schedule. The failure mode it guards against is a subprocess inside a per-branch loop, which shows up as times scaling with N instead of staying flat.

Times are not tracked over time. Absolute numbers depend on machine, filesystem, and platform (process spawning on Windows/MINGW64 is an order of magnitude slower than on Linux, which is exactly why the bulk patterns matter there most). For reference, the one-time snapshot that motivated the bulk-operation work (N=80, Windows 11 / MINGW64, 3e11639 vs 85738a4):

| Command | Before | After | Speedup |
| --- | --- | --- | --- |
| `status` | 5.4 s | 1.8 s | 3× |
| `status --all` | 5.8 s | 2.5 s | 2.4× |
| `update` (archiving 80) | 23.4 s | 1.1 s | 22× |
| `list` | 0.8 s | 0.7 s | – (already bulk) |
| `prune --dry-run` | 4.2 s | 0.7 s | 6× |

The ratios are the signal; the absolute numbers are not.

Versioning

git-arx uses commit hashes as its version identifier. The source file always contains VERSION="dev" — this placeholder is replaced at install time by install.sh.

How the stamp works:

Local install (bash install.sh): install.sh reads the short commit hash from the repo alongside it via git rev-parse --short HEAD and writes it into the installed file using sed.
Remote install (curl ... | bash): install.sh queries the GitHub API for the latest commit on master and writes that hash into the installed file.

After install, git arx --version reports the short commit hash that was current at install time (e.g. git-arx abc1234). To check whether the installed version is up to date, run git arx upgrade.

How upgrade works:

cmd_upgrade uses git ls-remote (the same mechanism as install.sh remote install) to fetch the latest commit hash on master without cloning the repo. It compares the 7-character prefix of that hash against $VERSION. If they differ, the user is prompted to confirm (or -y skips the prompt). On confirmation, install.sh is downloaded via curl and executed with the install directory derived from command -v git-arx, so the updated file lands in the same location as the running binary.

cmd_upgrade does not call _arx_require_git and is dispatched in main() before that check, so it works from any directory — not just inside a git repo.

Self-overwrite guard. upgrade overwrites the very script bash is executing. Bash reads scripts lazily: functions are parsed in full before they run, but after the last top-level command finishes, bash returns to the file to look for more input — at its saved byte offset, which now points into the middle of the replaced (differently sized) file, and it will try to execute whatever bytes it lands on. That is why the script’s last line is { main "$@"; exit; } rather than a bare main "$@": the braces force the whole block to be parsed up front, and the exit ends the process without ever reading the file again. test_overwrite_guard in the test suite asserts that bytes appended after this line are never executed.

VERSION="dev" in the source is intentional — it is never manually edited. Do not commit a real hash into the source file. Running upgrade when VERSION="dev" exits with an error.

Known Limitations

No locking. The archive file has no write lock. Concurrent invocations (unlikely for an interactive tool) could corrupt it. Acceptable trade-off.
bash 4+ required. Uses declare -A associative arrays. macOS ships bash 3.2; users need to install bash via Homebrew and ensure it’s on their PATH.
merge does not resolve SHA conflicts. When the same branch appears in two files with different SHAs, the conflict is reported and the entry is skipped. The user must manually decide which SHA is correct and edit the output file. There is no --force-file / --force-refs equivalent for merge (unlike sync) because merge has no concept of a “primary” source.
push/pull hardcodes origin. The remote name is not configurable. This could be added via arx.remote config in a future version.
Bash completion only. git-arx-completion.bash covers bash. zsh and fish completion scripts are not provided.