git-arx – Internals & Design
Implementation details, design decisions, and architectural notes for contributors and maintainers.
For end-user documentation, see README.md.
Contents
- Overview
- File Structure
- Why a Bash Script
- Safety Flags
- Entry Point and Dispatch
- Abstraction Layer
- File Backend
- Refs Backend
- Command Notes
- Shell Completion
- Testing
- Versioning
- Known Limitations
Overview
git-arx is a single self-contained bash script. There are no dependencies beyond git and bash 4+. The script is structured in six sections separated by comment headers:
# --- CONFIG HELPERS ---
# --- BACKEND: FILE ---
# --- BACKEND: REFS ---
# --- ABSTRACTION LAYER ---
# --- COMMANDS ---
# --- ENTRY POINT ---
All commands go through an internal abstraction layer and never touch storage directly. This makes adding a new storage backend a matter of implementing three functions and wiring them into the layer – no commands need to change.
File Structure
git-arx                   Single executable bash script – the entire implementation
install.sh                Installs git-arx to PATH and sets the git alias
uninstall.sh              Removes installed files and the git alias
git-arx-completion.bash   Bash tab completion script
test.sh                   Test suite
README.md                 End-user documentation
INTERNALS.md              This file
LICENSE                   MIT License
Why a Bash Script
- No install dependencies – bash and git are already present everywhere this tool would be used
- Git aliases with a ! prefix (git config alias.arx '!git-arx') invoke external scripts on $PATH natively
- The logic is simple enough that bash’s limitations (no proper data structures, string-heavy) are acceptable
- A compiled binary (Go, Rust, etc.) would be the right choice if distribution to non-developers were a goal; it’s not
The one meaningful bash requirement is associative arrays (declare -A), which need bash 4+. Git for Windows ships bash 4.4+. macOS ships bash 3.2 (due to GPL licensing); once bash is installed via Homebrew and its bin directory comes first on PATH, /usr/bin/env bash resolves to bash 5.x. This is a known trade-off.
Safety Flags
set -euo pipefail
- -e: exit immediately on any command error
- -u: treat unset variables as errors
- -o pipefail: propagate errors through pipes (e.g. false | true fails)
This is important for a tool that writes to storage – silent failures would corrupt the archive or leave it in a partial state.
Caveat: Commands that are expected to return non-zero must be wrapped. Examples:
- git cat-file -e "$sha" – used for existence checks, returns 1 if the object is missing. Wrapped in if ! ....
- git update-ref -d – used when deleting refs that may not exist. Followed by || true.
- (( counter++ )) – arithmetic (( expr )) returns 1 when the expression evaluates to 0. Use counter=$(( counter + 1 )) instead.
- [[ $dry_run -eq 1 ]] && printf '...\n' – when dry_run=0, [[ ]] returns 1 and that becomes the exit status of the whole && list. set -e does not abort on the list itself, but when the list is the last statement of a function, the function returns 1 and set -e aborts at the call site. Always append || true: [[ $dry_run -eq 1 ]] && printf '...\n' || true.
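The wrapped forms listed above can be tried in isolation. The following is a minimal sketch that uses no git-arx code; the names counter, report, and missing are illustrative only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Arithmetic: (( counter++ )) would return status 1 here because the
# expression evaluates to the old value 0. The assignment form returns 0.
counter=0
counter=$(( counter + 1 ))

# Conditional output as the last statement of a function: without "|| true"
# the function would return 1 when dry_run=0 and set -e would stop the script
# at the call to report.
dry_run=0
report() {
    [[ $dry_run -eq 1 ]] && printf '(dry run)\n' || true
}
report

# Expected failure used as a condition: set -e ignores commands tested by "if".
missing=0
if ! false; then
    missing=1
fi

printf 'counter=%s missing=%s\n' "$counter" "$missing"
```

Removing the || true from report makes the script exit silently before the final printf, which is the failure mode the caveat describes.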
Implementing --dry-run on a command:
- Add a local dry_run=0 variable and a --dry-run) dry_run=1 ;; case in the option parser.
- Keep all output (printf) statements identical to the non-dry-run path – the user sees the same output either way.
- Guard every write/delete with [[ $dry_run -eq 0 ]] && ... or wrap in if [[ $dry_run -eq 0 ]]; then ... fi.
- Append a single trailing line after the normal summary: [[ $dry_run -eq 1 ]] && printf '(dry run – no changes written)\n' || true
  The || true is mandatory – see the caveat above.
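Put together, the steps might look like the sketch below. cmd_demo_remove, its arguments, and the scratch file are hypothetical stand-ins, not functions from git-arx; the point is only the shape of the option parser, the guarded write, and the trailing line.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical command: removes the entry named <name> from <file>.
cmd_demo_remove() {
    local dry_run=0 file="" name=""
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --dry-run) dry_run=1 ;;
            *) if [[ -z $file ]]; then file=$1; else name=$1; fi ;;
        esac
        shift
    done

    # Output is identical on both paths.
    printf 'Removed: %s\n' "$name"

    # Only the write itself is guarded.
    if [[ $dry_run -eq 0 ]]; then
        # grep -v exits 1 when nothing is left to print, hence "|| true".
        grep -v "^${name} " "$file" > "${file}.tmp.$$" || true
        mv "${file}.tmp.$$" "$file"
    fi

    [[ $dry_run -eq 1 ]] && printf '(dry run – no changes written)\n' || true
}

scratch=$(mktemp)
printf 'a 111\nb 222\n' > "$scratch"
cmd_demo_remove --dry-run "$scratch" a   # file is left untouched
cmd_demo_remove "$scratch" a             # entry "a" is removed
```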
Entry Point and Dispatch
main() {
_arx_require_git
ARX_GIT_ROOT=$(_arx_git_root)
...
}
ARX_GIT_ROOT is set once at startup and used by _arx_config_file() to resolve the archive path relative to the repo root rather than the current working directory. This means git arx list works correctly regardless of which subdirectory the user is in when they run it.
Commands that don’t apply to the configured storage call _arx_require_storage at the top of their function, which prints a descriptive error and exits:
git-arx: this command requires refs storage (set: git config arx.storerefs true)
Unknown commands print an error referencing git arx help.
Abstraction Layer
The core of the architecture is three functions that all commands call exclusively:
_arx_read_all()
Reads from configured backend(s) and emits normalized records to stdout, one per line:
<branch-name> <full-sha> <ISO-8601-date>
This is a streaming interface – callers pipe or redirect it with while read. No temporary files are needed for reads.
When both arx.storerefs and arx.storefile are enabled, the function performs a union merge:
- Emit everything from the refs backend, recording branch names in a declare -A seen associative array
- Emit file-only entries (those whose branch name is not in seen)
Refs are treated as primary in the union merge. This reflects the refs backend’s stronger guarantees (gc-safe, native git). The sync command surfaces conflicts between backends explicitly; _arx_read_all silently prefers refs to avoid making every command into a conflict reporter.
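The two-pass union merge can be sketched as follows. demo_refs_read and demo_file_read are hypothetical stand-ins for the backend readers and return made-up sample records; the shortened SHAs are for readability only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stand-ins for the two backend readers (sample data is invented).
demo_refs_read() {
    printf '%s\n' 'feature/login aaa1 2025-11-15T10:30:00+01:00'
}
demo_file_read() {
    printf '%s\n' 'feature/login bbb2 2025-11-01T09:00:00+01:00' \
                  'fix/bug-42 ccc3 2025-10-01T08:00:00+00:00'
}

demo_read_all() {
    local -A seen=()
    local branch sha date
    # Pass 1: refs are primary, so emit them all and remember the names.
    while read -r branch sha date; do
        seen[$branch]=1
        printf '%s %s %s\n' "$branch" "$sha" "$date"
    done < <(demo_refs_read)
    # Pass 2: emit only file entries whose name was not seen in refs.
    while read -r branch sha date; do
        [[ -n ${seen[$branch]:-} ]] && continue
        printf '%s %s %s\n' "$branch" "$sha" "$date"
    done < <(demo_file_read)
}

demo_read_all
```

With this sample data the file's conflicting feature/login entry (bbb2) is dropped without comment, while fix/bug-42 is passed through.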
_arx_write(branch, sha, date)
Writes to all enabled backends. When both are enabled, writes to file first, then refs. Order doesn’t matter for correctness; file first means a crash between the two writes leaves the more portable copy updated.
_arx_delete(branch)
Removes from all enabled backends. When both are enabled, removes from file first, then refs.
Helper Functions
Several helper functions are defined between the abstraction layer and the commands:
- _arx_lookup_branch(branch) – calls _arx_read_all and returns sha date for the named branch.
- _arx_sha_exists(sha) – checks object existence via git cat-file -e; used by log and checkout before operating on an archived SHA.
- _arx_lookup_sha(sha) – reverse-lookup: scans _arx_read_all output for all entries matching the target SHA. Used by arx add to detect when a commit is already archived under a different name. arx update uses the in-memory arc_by_sha map instead (see Performance section).
File Backend
Format
# git-arx archive – do not edit manually
feature/login a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2 2025-11-15T10:30:00+01:00
fix/bug-42 deadbeefdeadbeefdeadbeefdeadbeefdeadbeef 2025-10-01T08:00:00+00:00
- Space-delimited, three fields: branch sha date
- Full 40-character SHA (abbreviated SHAs become ambiguous as repos grow)
- ISO-8601 date with timezone offset from git log -1 --format=%aI (author date of HEAD commit)
- # lines are comments, skipped on read
- Blank lines are skipped on read
The format is intentionally simple. Since git branch names cannot contain spaces (git itself rejects them), space as delimiter is unambiguous and requires no quoting or escaping.
Atomic Writes
The file is never modified in place. Every write uses a filter-then-append pattern:
- Read the full archive into a temp file (
${archive}.tmp.$$), skipping the entry being updated - Append the new entry to the temp file
- Replace the archive with the temp file
The temp file name uses $$ (the shell’s PID) to avoid collisions if multiple instances run simultaneously (unlikely for an interactive CLI, but safe practice).
The replace step on Windows/MINGW64 requires an explicit rm -f before mv:
[[ -f "$archive" ]] && rm -f "$archive"
mv "$tmpfile" "$archive"
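On Linux/macOS, mv over an existing file on the same filesystem is a single rename and therefore atomic; the temp file sits next to the archive, so this always applies. On Windows NTFS via Git Bash, mv can fail if the destination exists; the explicit remove makes it reliable, at the cost of a brief window in which the archive file does not exist.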
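The filter-then-append pattern can be sketched as below. demo_file_write and its argument order are assumptions made for illustration; the real _arx_file_write may differ in detail.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: demo_file_write <archive> <branch> <sha> <date>
demo_file_write() {
    local archive=$1 branch=$2 sha=$3 date=$4
    local tmpfile="${archive}.tmp.$$" line

    : > "$tmpfile"
    if [[ -f $archive ]]; then
        # Filter: copy every line except the entry being replaced.
        # Comment lines survive because their first field is "#".
        while IFS= read -r line; do
            [[ ${line%% *} == "$branch" ]] && continue
            printf '%s\n' "$line" >> "$tmpfile"
        done < "$archive"
    fi

    # Append the new entry, then swap the temp file into place.
    printf '%s %s %s\n' "$branch" "$sha" "$date" >> "$tmpfile"
    [[ -f $archive ]] && rm -f "$archive"
    mv "$tmpfile" "$archive"
}

workdir=$(mktemp -d)
archive="$workdir/.gitarchive"
demo_file_write "$archive" fix/bug-42 aaa1 2025-10-01T08:00:00+00:00
demo_file_write "$archive" fix/bug-42 bbb2 2025-10-02T08:00:00+00:00
cat "$archive"
```

The second call replaces the first entry rather than adding a duplicate, so the archive ends up with a single fix/bug-42 line.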
Remove = Filter Out
Deleted entries are removed from the file entirely, not marked with a prefix like #archived. Rationale:
- The git object itself still exists in the repository (until gc) – the SHA in the record is the real audit trail
- Keeping deleted entries would mean the file grows unboundedly
_arx_file_writealready implements filter-then-append, so delete is just filter-without-append – no new code path
Refs Backend
Namespace
Archived branches are stored as git refs under a configurable prefix, defaulting to refs/arx/. For a branch named feature/login, the default ref path is refs/arx/feature/login. The prefix is read from arx.refsprefix via _arx_config_refsprefix().
Git ref names allow forward slashes and use them to create directory structure. refs/arx/feature/login is stored as the file .git/refs/arx/feature/login while the ref is loose (git may later move it into .git/packed-refs). This is the same mechanism used by refs/remotes/origin/feature/login – no special handling is needed.
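Git ref names may not contain space, ~, ^, :, ?, *, [, \, control characters, or the sequences .. and @{, and a few positional rules apply as well (for example, no path component may start with a dot or end with .lock; see git check-ref-format for the full list). Since git applies the same rules to branch names, any valid local branch name is a valid ref name in our namespace.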
Why Refs Protect from gc
git gc prunes unreachable objects – commits, trees, and blobs that cannot be reached by following refs (branches, tags, stash, reflogs). When a local branch is deleted, its commits become unreachable unless something else references them. A ref under the arx prefix (e.g., refs/arx/) is a real git ref, so any commit it points to (and all ancestors of that commit) remain reachable and will not be pruned.
Reading Dates from Refs
The refs backend does not store dates explicitly – the date is read from the commit object at query time:
git for-each-ref \
--format='%(refname) %(objectname) %(creatordate:iso-strict)' \
"$refsprefix"
%(refname) gives the full ref path (e.g., refs/arx/feature/login). The configured prefix is then stripped in _arx_refs_read to recover the branch name.
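%(creatordate:iso-strict) gives an ISO-8601 date for the commit the ref points to, so the normalized output of both _arx_file_read and _arx_refs_read is identical in format. For commit objects creatordate is the committer date, whereas the file backend records the author date (%aI); the two values match unless the commit was amended, rebased, or cherry-picked after it was authored.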
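The prefix-stripping step can be sketched as a small filter. demo_strip_prefix is a hypothetical name, and the input line is sample data in the shape git for-each-ref would produce with the format string shown earlier.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical filter: reads "refname sha date" lines on stdin and strips the
# given prefix from the ref name, leaving the normalized record.
demo_strip_prefix() {
    local prefix=$1 ref sha date
    while read -r ref sha date; do
        printf '%s %s %s\n' "${ref#"$prefix"}" "$sha" "$date"
    done
}

printf '%s\n' 'refs/arx/feature/login a1b2c3 2025-11-15T10:30:00+01:00' |
    demo_strip_prefix 'refs/arx/'
```

Quoting the prefix inside the expansion makes it match literally, so a configured prefix containing glob characters would not be interpreted as a pattern.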
Remote Operations
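Refs under the arx prefix are not pushed by default. A plain git push only updates branches under refs/heads/ (and tags under refs/tags/ when requested with --tags or --follow-tags); other namespaces are left alone unless a refspec names them. The push command uses an explicit refspec built from arx.refsprefix (default: refs/arx/):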
git push origin 'refs/arx/*:refs/arx/*'
This pushes all refs under the prefix to the same path on the remote. Supported by GitHub, GitLab, Gitea, and Bitbucket.
After a successful push, arx push updates a local remote-tracking namespace derived from arx.refsprefix — e.g. refs/arx/ → refs/arx-remote/origin/. Each pushed ref is mirrored there via git update-ref so that arx list and arx status --all can report an accurate REMOTE column without a network call.
arx push --delete <branch> deletes a single ref from the remote using git push origin --delete refs/arx/<branch>, then removes the corresponding local remote-tracking ref.
arx push --prune adds --prune to the glob refspec push, which causes git to delete any remote refs under the prefix that have no local counterpart. After the push, the local remote-tracking namespace is rebuilt from scratch to match the new remote state (delete all tracking refs, then re-mirror from current local refs).
arx pull inverts this: it fetches refs/arx/* from the remote into the tracking namespace first, then copies those refs into refs/arx/* locally. This preserves a clean record of the last known remote state regardless of any local-only archive entries that were added between pulls.
# pull fetches into tracking namespace, then promotes to local refs
git fetch origin 'refs/arx/*:refs/arx-remote/origin/*'
# then: git update-ref refs/arx/<branch> <sha> for each tracking ref
arx fetch is a read-only preview of what arx pull would bring in. It uses git ls-remote to query the remote ref list without downloading any objects, then compares against local refs/arx/*:
- new – on the remote, not locally; pull would add it
- up to date – same SHA on both sides; pull is a no-op for this branch
- changed – different SHAs; pull would overwrite local with the remote SHA
- local – local only, not on the remote; unaffected by pull
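The four states reduce to a comparison of two SHAs per branch. A minimal sketch, assuming an empty string stands for "not present on that side"; demo_fetch_state is a hypothetical name and the SHAs are placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical classifier: demo_fetch_state <remote-sha> <local-sha>
demo_fetch_state() {
    local remote_sha=$1 local_sha=$2
    if [[ -z $local_sha ]]; then
        printf 'new\n'
    elif [[ -z $remote_sha ]]; then
        printf 'local\n'
    elif [[ $remote_sha == "$local_sha" ]]; then
        printf 'up to date\n'
    else
        printf 'changed\n'
    fi
}

demo_fetch_state abc ''     # on the remote only
demo_fetch_state abc abc    # same SHA on both sides
demo_fetch_state abc def    # SHAs differ
demo_fetch_state '' def     # local only
```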
Fully automatic remote sync is also possible without git arx push/pull when using both backends: if .gitarchive is committed to the repository, it travels with ordinary pushes and pulls like any other tracked file.
Command Notes
arx update and arx status
Both commands use %(upstream) from git for-each-ref to classify local branches:
- Empty %(upstream) – no upstream ever configured: the branch was never pushed (“local only”).
- Non-empty %(upstream), git rev-parse --verify fails – upstream was configured but the tracking ref no longer exists locally: the remote branch was deleted (after git fetch --prune or manual pruning).
- Non-empty %(upstream), git rev-parse --verify succeeds – tracking ref exists: live remote branch, skip.
This approach is more robust than checking %(upstream:track) for the string [gone] because:
- [gone] can vary by git version or locale
- The ref-existence check is a direct, binary fact about the local ref store
Note: git remote prune origin removes the remote tracking ref (refs/remotes/origin/branch) but does not clear branch.<name>.remote or branch.<name>.merge from .git/config. So %(upstream) still outputs the full ref path for pruned branches. Checking whether that ref resolves handles both the “never had a remote” and “remote was deleted” cases.
arx update only processes remote-deleted branches (non-empty upstream that doesn’t resolve). Never-pushed branches are skipped — they require an explicit git arx add if the user wants to archive them.
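The three-way classification can be sketched as a single function. demo_classify_upstream and its output labels are hypothetical; only git rev-parse --verify --quiet is a real git invocation.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: classify a branch from its %(upstream) value.
demo_classify_upstream() {
    local upstream_ref=$1
    if [[ -z $upstream_ref ]]; then
        printf 'local-only\n'          # no upstream was ever configured
    elif git rev-parse --verify --quiet "$upstream_ref" >/dev/null 2>&1; then
        printf 'live\n'                # tracking ref still resolves
    else
        printf 'remote-deleted\n'      # configured, but the ref is gone
    fi
}

demo_classify_upstream ''
demo_classify_upstream 'refs/remotes/origin/no-such-branch'
```

Because the rev-parse call is the condition of an elif, its non-zero exit status does not trigger set -e.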
arx status (default) shows only remote-deleted branches, with archive states Not archived, Archived, Archived as "<name>", or Conflict. Nothing is written.
arx status --all additionally shows:
- Never-pushed local branches – shown with status Local only when not yet archived (or Archived / Conflict if they have been archived manually).
- Archived branches that no longer exist locally – after the git for-each-ref loop, the command compares arc_by_name keys against the local_branches set to find orphan entries. Their authors are fetched in a single git log --no-walk call, with (gc) as a fallback for pruned commits. These rows are appended to the same rows array and go through the same sort and print path.
- A REMOTE column (when the refs backend is active) – loaded once from refs/arx-remote/origin/* into remote_sha_by_branch[branch]=sha before the print loop. Each row is classified as pushed (remote SHA matches archive SHA), ahead (remote SHA differs), local (no tracking ref), remote (not in local archive but remote tracking ref exists – branch was removed locally but not yet deleted from remote), or - (not in archive and no remote tracking ref).
arx status accepts --sort=name|date and --order=asc|desc. The default sort is name; the default order depends on the sort key — asc for name, desc for date — unless overridden explicitly. When sorting by date, name is used as a tiebreaker. Rows are collected first, then sorted as a post-processing step before printing. arx list uses the same sort/order logic.
Color output. Both arx status and arx list emit ANSI color codes only when stdout is a terminal ([[ -t 1 ]]). Piped or redirected output is always plain text. arx status colors the STATUS column: red (Not archived), green (Archived), light blue / bright cyan (Archived as "..."), yellow (Conflict), dim (Local only). Both commands color the REMOTE column: green (pushed), yellow (ahead), red (local), cyan (remote), dim (-).
printf byte-vs-character width. printf %-Ns pads a field to N bytes, not N display columns. Author names containing multibyte UTF-8 characters (e.g. ć, ž) are longer in bytes than in characters, so subsequent columns shift left for those rows. arx status corrects for this before printing each row: it measures the string in both character count (${#s} with the active locale) and byte count (${#s} with LC_ALL=C), then widens the format field by the difference. The same correction is applied to the STATUS column when the REMOTE column is present — ANSI color codes add invisible bytes to the colored STATUS string, which would otherwise throw off its fixed-width padding.
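The width correction can be sketched with the common local LC_ALL=C idiom for switching ${#s} from characters to bytes. demo_pad is a hypothetical helper, and the exact mechanism used inside arx status may differ.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: left-justify $1 in a field of $2 characters.
demo_pad() {
    local s=$1 width=$2
    local chars=${#s}          # characters under the active locale
    local LC_ALL=C             # from here on, ${#s} counts bytes
    local bytes=${#s}
    # Widen the byte-based field by the number of extra bytes in the string.
    printf '%-*s' $(( width + bytes - chars )) "$s"
}

printf '%s|\n' "$(demo_pad 'Ana' 8)"
printf '%s|\n' "$(demo_pad 'Željko' 8)"
```

In a UTF-8 locale both rows should end at the same column. The correction counts characters, not terminal cells, so double-width glyphs would still misalign.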
Performance (arx update, arx status, arx list --author)
For repos with many branches or archived entries, naive per-branch subprocess calls add up to tens of seconds. Three commands use bulk operations to avoid this.
arx update and arx status share the same two optimisations:
- Archive loaded once – _arx_read_all is called once before the branch loop and its output is stored in two in-memory associative arrays: arc_by_name[branch]=sha and arc_by_sha[sha]=name. All per-branch archive lookups are then O(1) bash hash table reads instead of O(n) subprocess calls.
- Single git for-each-ref call – a single call retrieves branch name, SHA, author date, and (for status) author name for every branch at once, replacing per-branch git rev-parse and git log calls:
# arx status (includes authorname for display)
git for-each-ref \
--format='%(refname:short)%09%(objectname)%09%(authordate:iso-strict)%09%(authorname)%09%(upstream)' \
refs/heads/
# arx update (authorname not needed)
git for-each-ref \
--format='%(refname:short)%09%(objectname)%09%(authordate:iso-strict)%09%(upstream)' \
refs/heads/
arx update also keeps the in-memory maps current after processing each branch – arc_by_name and arc_by_sha are updated regardless of --dry-run – so the simulation is accurate, and subsequent branches see the correct in-memory state whether or not writes are actually happening.
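Loading the archive into the two maps might look like the sketch below. demo_read_all is a hypothetical stand-in for _arx_read_all with invented sample records; the map names follow the text.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for _arx_read_all (sample data is invented).
demo_read_all() {
    printf '%s\n' 'feature/login aaa1 2025-11-15T10:30:00+01:00' \
                  'fix/bug-42 bbb2 2025-10-01T08:00:00+00:00'
}

# One pass over the archive fills both lookup directions.
declare -A arc_by_name=() arc_by_sha=()
while read -r branch sha _date; do
    arc_by_name[$branch]=$sha
    arc_by_sha[$sha]=$branch
done < <(demo_read_all)

# Lookups are now plain array reads; ":-" keeps set -u quiet on a miss.
printf 'by name: %s\n' "${arc_by_name[fix/bug-42]:-<none>}"
printf 'by sha:  %s\n' "${arc_by_sha[aaa1]:-<none>}"
printf 'missing: %s\n' "${arc_by_name[no/such]:-<none>}"
```

Process substitution keeps the while loop in the current shell, so the arrays are still populated after the loop; piping into while would fill them in a subshell and lose them.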
%(upstream) must be last. Tab (%09) is an IFS whitespace character, and runs of IFS whitespace collapse into a single separator. If %(upstream) sat in the middle of the format and was empty, the line would contain two consecutive tabs; read would treat <TAB><TAB> as one separator, the empty field would disappear, and all subsequent fields would shift left. Placing %(upstream) last avoids this: the trailing tab is stripped cleanly, and upstream_ref is assigned an empty string, which is the correct behaviour.
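The field shift is easy to reproduce without git. In this sketch the input lines are hand-written stand-ins for for-each-ref output, and the variable names are illustrative.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Empty field in the middle: the two tabs collapse, so the SHA lands in the
# upstream variable and the last variable is left empty.
IFS=$'\t' read -r mid_name mid_upstream mid_sha <<< $'topic\t\tabc123'
printf 'middle: name=%s upstream=%s sha=%s\n' "$mid_name" "$mid_upstream" "$mid_sha"

# Empty field at the end: the earlier fields stay where they belong.
IFS=$'\t' read -r last_name last_sha last_upstream <<< $'topic\tabc123\t'
printf 'last:   name=%s upstream=%s sha=%s\n' "$last_name" "$last_upstream" "$last_sha"
```

The first line reports upstream=abc123 with an empty sha, which is the mis-assignment described above; the second reports an empty upstream and sha=abc123.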
set -u and associative arrays. Accessing a missing key in an associative array with set -u enabled triggers an “unbound variable” error in bash 4.x. All array reads use ${arr[key]:-} to provide an explicit empty-string default and suppress the error.
arx list --author – archived SHAs are not local branch refs, so git for-each-ref does not apply. Instead, all SHAs are collected from the sorted entries and passed to a single git log --no-walk call, reducing N subprocess calls to one:
git log --no-walk --format='%H %an' sha1 sha2 sha3 ...
The result is stored in author_by_sha[sha]=name and looked up during rendering. gc’d commits are absent from the output and fall back to (gc) via ${author_by_sha[$sha]:-\(gc\)}.
arx add – Conflict Detection
Before writing, add calls _arx_lookup_branch against the target name (which may be a custom archive name). Four outcomes:
- Not archived – write and report Archived:.
- Archived with same SHA – exit 0 with Already archived:. Idempotent; safe to call repeatedly.
- Archived with different SHA – conflict. Exit 1 with an error and hints. --force overwrites; an archive-name argument stores under a different name instead.
- Not archived by target name, but SHA already present under a different name – _arx_lookup_sha finds the duplicate. Prints a Note: line, then writes anyway (the user explicitly requested this archive entry).
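The four outcomes can be expressed as one decision function. In this sketch two in-memory maps stand in for the _arx_lookup_branch and _arx_lookup_sha calls, and demo_add_outcome with its output labels is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical in-memory archive for the demo.
declare -A arc_by_name=( [feature/login]=aaa1 )
declare -A arc_by_sha=( [aaa1]=feature/login )

# Prints which of the four outcomes applies to (name, sha).
demo_add_outcome() {
    local name=$1 sha=$2
    local stored=${arc_by_name[$name]:-}
    if [[ -n $stored && $stored == "$sha" ]]; then
        printf 'already-archived\n'   # idempotent, exit 0
    elif [[ -n $stored ]]; then
        printf 'conflict\n'           # exit 1 unless --force is given
    elif [[ -n ${arc_by_sha[$sha]:-} ]]; then
        printf 'duplicate-sha\n'      # print a note, then write anyway
    else
        printf 'new\n'                # write and report
    fi
}

demo_add_outcome topic ccc3
demo_add_outcome feature/login aaa1
demo_add_outcome feature/login zzz9
demo_add_outcome login-copy aaa1
```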
arx update applies the same conflict logic for every candidate branch, using the in-memory arc_by_name and arc_by_sha maps (see Performance section) rather than calling _arx_lookup_branch and _arx_lookup_sha per branch. If the current SHA is already stored under a different name, the branch is skipped with an Already safe: message and counted separately in the summary. This prevents silent duplicate SHA storage during automatic archiving. If the user wants the branch indexed under its natural name too, they can run git arx add <branch> explicitly.
arx rename
Implemented as _arx_write(new) + _arx_delete(old) – the abstraction layer fans out to all enabled backends automatically. There is no dedicated rename primitive in either backend; write-then-delete is equivalent.
arx log – Argument Passthrough
cmd_log() {
local branch="$1"
shift # $@ now contains only the git log flags
...
git log "$sha" "$@"
}
shift removes the branch name argument, leaving "$@" as whatever the user typed after the branch name. All git log flags, format strings, file path filters, and revision ranges work as expected.
exec is intentionally not used here. On Windows/MINGW64, exec does not properly transfer the pipe file descriptor to the replacement process, so git detects a terminal instead of a pipe, opens the pager, and the output never reaches the caller. A plain git log call avoids this and behaves correctly on all platforms.
arx checkout – gc Detection
Before attempting to restore, the script checks whether the commit still exists:
if ! git cat-file -e "$sha"; then
    : # commit was garbage collected – report the error here
fi
git cat-file -e <object> exits 0 if the object exists in the object store, non-zero otherwise. It does not print anything. This is the correct low-level check – it works for any object type (commit, tree, blob) and does not require the object to be reachable.
arx prune
Finds all archived branches that still exist as local branches, then deletes those local branches. The archive entries themselves are kept.
Key behaviors:
- The currently checked-out branch is always skipped (git would reject the deletion anyway). It is listed separately in the output with a “Skipped (currently checked out)” notice.
- Without --force, the full list is printed and the user must type "yes" to proceed. This is intentional – git branch -D is irreversible from git’s perspective (the archive is the only recovery path).
- --dry-run prints the same list and count as a real run but skips the confirmation prompt and does not delete anything.
arx sync – Union Merge Algorithm
sync is only meaningful when both arx.storerefs and arx.storefile are enabled, since it reconciles two backends that can theoretically drift.
When drift happens:
In normal usage, drift should not occur – every write operation updates both backends within the same script invocation. The two writes are not atomic as a pair, however. Drift can arise from:
- Someone manually edits .gitarchive with a text editor
- Someone manually creates/deletes refs with raw git commands
- A script crash between the file write and the ref write
- Running git fetch origin 'refs/arx/*:refs/arx/*' directly instead of git arx pull (updates local arx refs but bypasses the tracking namespace and the file backend)
Algorithm:
for each branch in (refs ∪ file):
refs-only → write to file
file-only → write to refs
both, same SHA → no-op
both, different SHA → conflict
Non-conflicting entries are always processed. A conflict does not block other entries from being synced. After processing all entries, if any conflicts occurred, sync exits with status 1.
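The algorithm can be sketched in bash as a comparison over the union of both key sets. The two maps hold invented sample data, and the sketch only reports what a sync would do instead of writing anything.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical backend snapshots (branch -> sha).
declare -A refs=( [a]=111 [b]=222 [c]=333 )
declare -A file=( [b]=222 [c]=999 [d]=444 )

# Union of branch names from both sides.
declare -A all=()
for b in "${!refs[@]}" "${!file[@]}"; do all[$b]=1; done

conflicts=0
# Sorted iteration so the report order is stable.
while read -r b; do
    r=${refs[$b]:-}
    f=${file[$b]:-}
    if   [[ -z $f ]];      then printf '%s: refs-only -> write to file\n' "$b"
    elif [[ -z $r ]];      then printf '%s: file-only -> write to refs\n' "$b"
    elif [[ $r == "$f" ]]; then printf '%s: in sync\n' "$b"
    else
        # A conflict is counted but does not stop the loop.
        printf '%s: conflict (%s vs %s)\n' "$b" "$r" "$f"
        conflicts=$(( conflicts + 1 ))
    fi
done < <(printf '%s\n' "${!all[@]}" | sort)

# A real sync would exit 1 at this point when conflicts > 0.
printf 'conflicts: %d\n' "$conflicts"
```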
--dry-run: Runs the same comparison logic and prints the same output as a real sync, but skips all writes. A trailing (dry run – no changes written) line is appended. Works with or without --force-file / --force-refs – output shows exactly what would happen if the flag were run without --dry-run.
--force-file / --force-refs: When a SHA conflict is detected and a force flag is present, the designated backend is treated as the source of truth and the other is overwritten. This is an escape hatch for the rare case where the user knows which side is correct.
Shell Completion
git-arx-completion.bash defines _git_arx() — the function name git-completion.bash looks for when the user presses Tab after git arx. The naming convention is the external command name with hyphens replaced by underscores.
The function is context-aware: it offers different completions depending on the subcommand at words[2]:
- Subcommand names at cword == 2 (including aliases: ls, rm, mv)
- Archived branch names (via git arx list) for checkout, log, remove/rm, rename/mv
- Local branch names (via __git_heads) for add
- Per-subcommand flags for everything else
Archived branch names are fetched by calling git arx list at tab-press time and stripping the two header lines with awk NR > 2. This is a subprocess invocation on every completion for those commands — fast enough in practice, but noticeable on repos with very large archives.
Testing
The test suite lives in test.sh and is an integration test suite – it runs the actual git-arx script against real git repositories created in a temporary directory. No mocking.
Running the tests
bash test.sh
No install required. The script resolves the path to git-arx relative to its own location, so it works from any working directory.
Structure
The suite is organized into sections, each exercising one command or scenario:
test_help               git arx help / -h
test_add                git arx add (normal, conflict, --force, archive-name)
test_remove             git arx remove
test_rename             git arx rename
test_list               git arx list (sorting, --author, --storage filter)
test_update             git arx update (--dry-run, --force, conflicts, already-safe)
test_log                git arx log (passthrough flags)
test_checkout           git arx checkout (restore, gc'd commit)
test_prune              git arx prune (--dry-run, --force, current branch skipped)
test_merge              git arx merge (dedup, conflicts)
test_refs_backend       refs-only storage
test_both_backend       both backends enabled (union reads, sync)
test_push_pull          git arx push / fetch / pull (requires a bare remote)
test_sync               git arx sync (--dry-run, --force-file, --force-refs)
test_slashed_branches   branch names with slashes
test_double_add         idempotency of add
test_error_cases        unknown commands, missing args, bad config
Each section uses assert_ok, assert_fails, and assert_out helpers. assert_out greps the combined stdout+stderr for a fixed string – tests are intentionally coarse-grained (output substring match) rather than exact, so minor wording changes in messages don’t break the suite.
Test isolation
Each test section resets the archive state via reset_archive() before running. This deletes .gitarchive and removes all refs/arx/ refs, then resets storage to file-only. Branches deleted during a test are recreated by recreate_branches() where needed.
The entire repo lives in a mktemp -d temporary directory and is cleaned up via a trap ... EXIT at the end of the run.
Versioning
git-arx uses commit hashes as its version identifier. The source file always contains VERSION="dev" — this placeholder is replaced at install time by install.sh.
How the stamp works:
- Local install (bash install.sh): install.sh reads the short commit hash from the repo alongside it via git rev-parse --short HEAD and writes it into the installed file using sed.
- Remote install (curl ... | bash): install.sh queries the GitHub API for the latest commit on master and writes that hash into the installed file.
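The stamping step itself is a one-line substitution. The sketch below is an assumption about its general shape, not a copy of install.sh: demo_stamp_version is a hypothetical helper, the hash is passed in as a plain argument, and the files live in a scratch directory.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy <src> to <dst>, replacing the VERSION placeholder.
demo_stamp_version() {
    local src=$1 dst=$2 hash=$3
    sed "s/^VERSION=\"dev\"\$/VERSION=\"${hash}\"/" "$src" > "$dst"
}

workdir=$(mktemp -d)
printf '%s\n' '#!/usr/bin/env bash' 'VERSION="dev"' > "$workdir/git-arx"

# For a local install the hash would come from: git rev-parse --short HEAD
demo_stamp_version "$workdir/git-arx" "$workdir/installed" abc1234
grep '^VERSION=' "$workdir/installed"
```

Writing to a separate destination file avoids sed -i, whose syntax differs between GNU and BSD sed, and leaves the source file with its VERSION="dev" placeholder intact.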
After install, git arx --version reports the short commit hash that was current at install time (e.g. git-arx abc1234). To check whether the installed version is up to date, compare against the latest commit on master in the repository.
VERSION="dev" in the source is intentional — it is never manually edited. Do not commit a real hash into the source file.
Known Limitations
- No locking. The archive file has no write lock. Concurrent invocations (unlikely for an interactive tool) could corrupt it. Acceptable trade-off.
- bash 4+ required. Uses declare -A associative arrays. macOS ships bash 3.2; users need to install bash via Homebrew and ensure it’s on their PATH.
- merge does not resolve SHA conflicts. When the same branch appears in two files with different SHAs, the conflict is reported and the entry is skipped. The user must manually decide which SHA is correct and edit the output file. There is no --force-file / --force-refs equivalent for merge (unlike sync) because merge has no concept of a “primary” source.
- push/pull hardcodes origin. The remote name is not configurable. This could be added via arx.remote config in a future version.
- Bash completion only. git-arx-completion.bash covers bash. zsh and fish completion scripts are not provided.