tar over SSH
Introduction
As described on my backup page, I headlessly manage backups of my family's Macs over SSH, and occasionally I need to add or update configuration files, scripts and binaries. Instead of dealing with files individually, I prefer to transfer all relevant files in one operation, make changes locally, then transfer everything back in one operation, preserving file ownership and permissions. This is possible by running tar over SSH.
What follows is an example of managing backup-related files on a family member's Mac laptop.
Example
Note
All commands are run on my own Mac, not on the remote Mac.
Set some shell variables that we'll reference throughout.
SSH_ALIAS=foo-mac # (1)!
MANIFEST_ORIGINAL="${SSH_ALIAS}.original.manifest" # (2)!
MANIFEST_MODIFIED="${SSH_ALIAS}.modified.manifest" # (3)!
TAR_ORIGINAL="${SSH_ALIAS}.original.tar" # (4)!
TAR_MODIFIED="${SSH_ALIAS}.modified.tar" # (5)!
USER_MAP="${SSH_ALIAS}.user.map" # (6)!
GROUP_MAP="${SSH_ALIAS}.group.map" # (7)!
1. This is an SSH host alias in ~/.ssh/config such that I can SSH to the remote Mac via ssh foo-mac.
2. This file is a manifest of remote files and directories to transfer locally.
3. This file is a manifest of local files and directories to transfer remotely.
4. This file is a TAR of the files in ${MANIFEST_ORIGINAL}.
5. This file is a TAR of the files in ${MANIFEST_MODIFIED} to be transferred remotely.
6. This file maps local user IDs to user names and user IDs within the TAR.
7. This file maps local group IDs to group names and group IDs within the TAR.
Before going further, define these shell functions as aliases of the following programs.
- awk → gawk (GNU awk)
- find → gfind (GNU find)
- sed → gsed (GNU sed)
- tar → gtar (GNU tar)
awk() { "$(whence -p gawk)" "${@}" ; }
find() { "$(whence -p gfind)" "${@}" ; }
sed() { "$(whence -p gsed)" "${@}" ; }
tar() { "$(whence -p gtar)" --format=posix "${@}" ; }
Create a file manifest named ${MANIFEST_ORIGINAL} and add the resolved absolute pathnames of the files to archive. Pathnames must not have a trailing slash.
# vim: set ft=cfg :
/Library/Preferences/com.soma-zone.LaunchControl.fdautil.plist
/Users/foo/.config/rclone
/Users/foo/.config/resticprofile
/Users/foo/Library/LaunchAgents/com.manselmi.resticprofile.foo_mac.backup.plist
/usr/local/bin/exec-rclone
/usr/local/bin/exec-resticprofile
/usr/local/bin/rclone
/usr/local/bin/restic
/usr/local/bin/resticprofile
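Because a trailing slash or a relative pathname in the manifest is easy to miss, a quick check before transferring anything may help. The following is only a sketch: manifest_lint is a hypothetical helper name, not part of the workflow described here, and it does not verify that pathnames are resolved (free of symlinks). It skips comments and blank lines with the same sed filter used later when the manifest is read.

```shell
# Hypothetical helper: print manifest entries that are relative or that end
# with a slash. Prints nothing when the manifest looks fine.
manifest_lint() {
  # Drop comment lines and blank lines, then report entries that do not start
  # with "/" or that end with "/".
  sed -E '/^[[:blank:]]*(#|$)/d' < "$1" | grep -E -e '^[^/]|/$' || true
}

# Throwaway manifest with two deliberately bad entries, for illustration only.
example="$(mktemp)"
cat > "${example}" << 'EOF'
# vim: set ft=cfg :
/usr/local/bin/rclone
/Users/foo/.config/rclone/
relative/path
EOF

manifest_lint "${example}"
```

Run against the throwaway manifest, this reports the entry with the trailing slash and the relative entry, and stays silent about /usr/local/bin/rclone.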
Create this Zsh script, saved as tar-create.sh (the name the later commands use), which accepts null-byte-terminated pathnames on stdin and emits a TAR on stdout. See comments for details.
#!/usr/bin/env -S -- zsh -f
# vim: set ft=zsh :
# Stop at any error, treat unset vars as errors and make pipelines exit with a non-zero exit code if
# any command in the pipeline exits with a non-zero exit code.
set -o ERR_EXIT
set -o NO_UNSET
set -o PIPE_FAIL
# If macOS, define the following shell functions as aliases of the following programs (available via
# Homebrew):
#
# awk → gawk (GNU awk)
# find → gfind (GNU find)
# sed → gsed (GNU sed)
# tar → gtar (GNU tar)
#
# https://zsh.sourceforge.io/Doc/Release/Shell-Builtin-Commands.html#index-whence
if [[ "${OSTYPE}" == darwin* ]]; then
  awk() { "$(whence -p gawk)" "${@}" ; }
  find() { "$(whence -p gfind)" "${@}" ; }
  sed() { "$(whence -p gsed)" "${@}" ; }
  tar() { "$(whence -p gtar)" --format=posix "${@}" ; }
fi
# For each input pathname such as
#
#   foo/bar/baz
#
# print
#
#   foo
#   foo/bar
#   foo/bar/baz
#
# When combined with GNU tar's `--no-recursion` option, this allows us to ensure inclusion of parent
# directories in order to, upon extraction and if necessary, create missing parent directories with
# correct ownership and permissions.
#
# https://www.gnu.org/software/tar/manual/html_section/recurse.html
# https://serverfault.com/a/877313
read -r -d '' AWK_PROG << 'EOF' || true
BEGIN {
  FS = "/"
  RS = "\0"
  ORS = "\0"
}
{
  path_component = $1
  for (i = 2; i <= NF; i++) {
    print path_component
    path_component = path_component "/" $i
  }
  print path_component
}
EOF
# 1. `sed`: Accept null-byte-terminated pathnames from stdin and strip a leading slash (`/`) from
#    each pathname.
#
# 2. `find`: Relative to ${TAR_DIRECTORY}, search the directory trees rooted at the supplied
#    pathnames and print the pathnames of directories, regular files and symlinks pointing to
#    directories or regular files. Suppress warnings regarding nonexistent files.
#
# 3. `awk`: Run the AWK program described above if ${TAR_PARENT_DIRS} is non-empty, otherwise no-op.
#
# 4. `sort`: Sort and deduplicate pathnames.
#
# 5. `tar`: Archive files with given pathnames relative to ${TAR_DIRECTORY}, suppressing warnings
#    regarding files that cannot be read. Print the archive to stdout.
typeset -a TAR_PARENT_DIRS_CMD
TAR_PARENT_DIRS_CMD=('cat')
if [[ -n "${TAR_PARENT_DIRS-}" ]]; then
  TAR_PARENT_DIRS_CMD=('awk' '--' "${AWK_PROG}")
fi
sed -z -- 's|^/||' \
  | (
    pushd -q -- "${TAR_DIRECTORY}"
    find -- -files0-from - -xtype d,f -print0 2> /dev/null
  ) \
  | "${TAR_PARENT_DIRS_CMD[@]}" \
  | env -- LC_ALL=POSIX sort -uz \
  | tar \
    -cf - \
    ${TAR_USER_MAP:+"--owner-map=${TAR_USER_MAP}"} \
    ${TAR_GROUP_MAP:+"--group-map=${TAR_GROUP_MAP}"} \
    --directory="${TAR_DIRECTORY}" \
    --format=posix \
    --ignore-failed-read \
    --warning=no-failed-read \
    --no-recursion \
    --sort=none \
    --null \
    --files-from=-
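To see what the parent-directory expansion in the script does, it can help to run the same logic on a sample pathname. The sketch below is a simplified, newline-delimited variant (the script itself uses null-byte record separators, which need GNU awk); expand_parents is a hypothetical name used only for this illustration.

```shell
# Simplified variant of the script's AWK program: newline-delimited records
# instead of null bytes, so any POSIX awk can run it.
expand_parents() {
  awk -F '/' '{
    path = $1
    for (i = 2; i <= NF; i++) {
      print path            # emit each parent directory in turn
      path = path "/" $i
    }
    print path              # finally emit the full pathname
  }'
}

# Prints foo, foo/bar and foo/bar/baz, one per line.
printf '%s\n' 'foo/bar/baz' | expand_parents
```

With several input pathnames the same parent appears more than once, which is why the script sorts and deduplicates the result before handing it to tar.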
Run the script remotely over SSH, optionally piping the output through Pipe Viewer to monitor throughput. Here, printf is a Zsh builtin.
< "${MANIFEST_ORIGINAL}" \
  sed -E -- '/^[[:blank:]]*(#|$)/d' \
  | tr '\n' '\0' \
  | ssh -o RequestTTY=no -- "${SSH_ALIAS}" sudo -- env -- \
    PATH='/var/manselmi/.prefix/bin:/var/manselmi/.prefix/sw/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin' \
    TAR_DIRECTORY=/ \
    TAR_PARENT_DIRS=1 \
    zsh -fc "$(printf '%q' "$(< tar-create.sh)")" \
  | pv -W \
  > "${TAR_ORIGINAL}"
We now have a TAR we can inspect. List the archive members and ensure what's needed is present.
drwxr-xr-x root/wheel 0 2024-01-06 00:10 Library/
drwxr-xr-x root/wheel 0 2024-01-20 21:25 Library/Preferences/
-rw-r--r-- root/wheel 420 2024-01-19 14:30 Library/Preferences/com.soma-zone.LaunchControl.fdautil.plist
drwxr-xr-x root/admin 0 2024-01-05 21:35 Users/
drwxr-x--- foo/staff 0 2024-01-20 21:26 Users/foo/
drwxr-xr-x foo/staff 0 2024-01-20 21:25 Users/foo/.config/
drwxr-xr-x foo/staff 0 2024-01-05 20:56 Users/foo/.config/rclone/
-rw------- foo/staff 602 2024-01-14 22:25 Users/foo/.config/rclone/rclone.conf
drwxr-xr-x foo/staff 0 2024-01-19 14:30 Users/foo/.config/resticprofile/
drwxr-xr-x foo/staff 0 2024-01-14 22:28 Users/foo/.config/resticprofile/curlrc/
-rw-r--r-- foo/staff 192 2024-01-05 20:30 Users/foo/.config/resticprofile/curlrc/foo_mac
drwxr-xr-x foo/staff 0 2024-01-14 22:28 Users/foo/.config/resticprofile/exclude/
-rw-r--r-- foo/staff 984 2023-12-18 08:22 Users/foo/.config/resticprofile/exclude/base.txt
-rw-r--r-- foo/staff 813 2024-01-06 04:03 Users/foo/.config/resticprofile/exclude/foo_mac.txt
drwxr-xr-x foo/staff 0 2024-01-14 22:28 Users/foo/.config/resticprofile/log/
drwxr-xr-x foo/staff 0 2023-08-20 18:02 Users/foo/.config/resticprofile/log/foo_mac/
-rw-r--r-- foo/staff 4695 2024-01-20 14:03 Users/foo/.config/resticprofile/log/foo_mac/backup.log
-rw-r--r-- foo/staff 2994 2024-01-19 14:30 Users/foo/.config/resticprofile/profiles.toml
drwxr-xr-x foo/staff 0 2024-01-14 22:31 Users/foo/.config/resticprofile/status/
-rw-r--r-- foo/staff 2461 2024-01-20 14:03 Users/foo/.config/resticprofile/status/foo_mac.json
drwx------ foo/staff 0 2024-01-07 16:41 Users/foo/Library/
drwx------ foo/staff 0 2024-01-14 22:30 Users/foo/Library/LaunchAgents/
-rw-r--r-- foo/staff 1005 2024-01-14 22:30 Users/foo/Library/LaunchAgents/com.manselmi.resticprofile.foo_mac.backup.plist
drwxr-xr-x root/wheel 0 2023-12-15 09:43 usr/
drwxr-xr-x root/wheel 0 2024-01-06 03:57 usr/local/
drwxr-xr-x root/wheel 0 2024-01-14 22:05 usr/local/bin/
-rwxr-xr-x root/wheel 397 2024-01-05 22:53 usr/local/bin/exec-rclone
-rwxr-xr-x root/wheel 533 2024-01-05 22:53 usr/local/bin/exec-resticprofile
-rwxr-xr-x root/wheel 73065456 2024-01-08 06:19 usr/local/bin/rclone
-rwxr-xr-x root/wheel 27146176 2024-01-14 17:43 usr/local/bin/restic
-rwxr-xr-x root/wheel 16102320 2023-10-24 11:54 usr/local/bin/resticprofile
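The exact command behind the listing above is not shown here; it is presumably tar's list mode with verbose output, along the lines of tar -tvf "${TAR_ORIGINAL}". A minimal sketch on a throwaway archive (the scratch directory and its contents are made up for illustration):

```shell
# Build a tiny scratch archive so the listing commands have something to show.
scratch="$(mktemp -d)"
mkdir -p "${scratch}/root/usr/local/bin"
printf '#!/bin/sh\n' > "${scratch}/root/usr/local/bin/exec-rclone"
tar -cf "${scratch}/example.tar" -C "${scratch}/root" usr

# Member names only.
tar -tf "${scratch}/example.tar"

# Verbose listing: adds permissions, owner/group, size and modification time.
tar -tvf "${scratch}/example.tar"
```

Adding --numeric-owner to the verbose form shows user and group IDs instead of names, which is what the table in the next step relies on.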
Before extracting the archive, print a deduplicated table of member user IDs, group IDs, user names and group names. We'll need this later.
paste \
  <(
    tar -tf "${TAR_ORIGINAL}" --quoting-style=escape --verbose --numeric-owner \
      | tr -s '[[:blank:]]' '\t' \
      | cut -f 2
  ) \
  <(
    tar -tf "${TAR_ORIGINAL}" --quoting-style=escape --verbose \
      | tr -s '[[:blank:]]' '\t' \
      | cut -f 2
  ) \
  | awk -F '[[:blank:]/]' -- '{ print $1, $2, $3, $4 }' \
  | {
    printf '%s\t%s\t%s\t%s\n' UID GID UNAME GNAME
    sort -u -k 1,1n -k 2,2n
  } \
  | column -t
Create this Zsh script, saved as tar-extract.sh (the name the later commands use). See comments for details.
#!/usr/bin/env -S -- zsh -f
# vim: set ft=zsh :
# Stop at any error, treat unset vars as errors and make pipelines exit with a non-zero exit code if
# any command in the pipeline exits with a non-zero exit code.
set -o ERR_EXIT
set -o NO_UNSET
set -o PIPE_FAIL
# If macOS, define the following shell functions as aliases of the following programs (available via
# Homebrew):
#
# tar → gtar (GNU tar)
#
# https://zsh.sourceforge.io/Doc/Release/Shell-Builtin-Commands.html#index-whence
if [[ "${OSTYPE}" == darwin* ]]; then
  tar() { "$(whence -p gtar)" --format=posix "${@}" ; }
fi
# Accept TAR from stdin and extract relative to ${TAR_DIRECTORY}, preserving ownership and
# permissions.
tar \
  -xf - \
  --directory="${TAR_DIRECTORY}" \
  --same-owner \
  --same-permissions
Create the directory ${SSH_ALIAS} and extract the archive into it, preserving member ownership and permissions.
sudo -- rm -fr -- "${SSH_ALIAS}"
mkdir -- "${SSH_ALIAS}"
< "${TAR_ORIGINAL}" sudo -- env -- TAR_DIRECTORY="${SSH_ALIAS}" ./tar-extract.sh
We're now ready to create or modify files as needed. sudo may be required to view or modify files not owned by our user.
sudo -u \#501 -g \#20 -- mkdir -- "${SSH_ALIAS}/Users/foo/.config/foo"
sudo -u \#501 -g \#20 -- touch -- "${SSH_ALIAS}/Users/foo/.config/foo/bar"
sudo -u \#501 -g \#20 -- touch -- "${SSH_ALIAS}/Users/foo/.config/foobar"
pushd -q -- "${SSH_ALIAS}/Users/foo/.config"
sudo -u \#501 -g \#20 -- ln -s foo baz
popd -q
Now that we've modified the necessary files, let's prepare to archive them. First, make a copy of the original manifest and append any new files so that they are included in the archive. Pathnames must not have a trailing slash.
cp -- "${MANIFEST_ORIGINAL}" "${MANIFEST_MODIFIED}"
cat >> "${MANIFEST_MODIFIED}" << 'EOF'
/Users/foo/.config/baz
/Users/foo/.config/foo
/Users/foo/.config/foobar
EOF
# vim: set ft=cfg :
/Library/Preferences/com.soma-zone.LaunchControl.fdautil.plist
/Users/foo/.config/rclone
/Users/foo/.config/resticprofile
/Users/foo/Library/LaunchAgents/com.manselmi.resticprofile.foo_mac.backup.plist
/usr/local/bin/exec-rclone
/usr/local/bin/exec-resticprofile
/usr/local/bin/rclone
/usr/local/bin/restic
/usr/local/bin/resticprofile
/Users/foo/.config/baz
/Users/foo/.config/foo
/Users/foo/.config/foobar
Create files that map local user names/IDs to remote user names and user IDs, and local group names/IDs to remote group names and group IDs, respectively.
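The map files themselves are not shown here. As a sketch, GNU tar's --owner-map and --group-map options read one mapping per line in the form OLD NEW[:NEWID], where OLD may be a numeric ID prefixed with +. The IDs and names below are assumptions taken from this example (UID 501 and GID 20 appear in the commands above, 0 is root/wheel, and 80 is assumed to be the admin group); in practice they should come from the UID/GID table printed earlier.

```shell
# Fall back to the example's file names if the variables are not already set.
SSH_ALIAS="${SSH_ALIAS:-foo-mac}"
USER_MAP="${USER_MAP:-${SSH_ALIAS}.user.map}"
GROUP_MAP="${GROUP_MAP:-${SSH_ALIAS}.group.map}"

# Local UID -> user name and UID to record in the TAR (assumed values).
cat > "${USER_MAP}" << 'EOF'
+0 root:0
+501 foo:501
EOF

# Local GID -> group name and GID to record in the TAR (assumed values).
cat > "${GROUP_MAP}" << 'EOF'
+0 wheel:0
+20 staff:20
+80 admin:80
EOF
```

Mapping by numeric ID rather than by name matters here because the local account that owns UID 501 likely has a different name than the remote user foo.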
Ensure no executable regular file has the com.apple.quarantine extended attribute.
sudo -- gfind -- "${SSH_ALIAS}" \
  -type f -perm /u=x,g=x,o=x -exec xattr -d com.apple.quarantine -- {} +
Create the archive.
< "${MANIFEST_MODIFIED}" \
  sed -E -- '/^[[:blank:]]*(#|$)/d' \
  | tr '\n' '\0' \
  | sudo -- env -- \
    TAR_DIRECTORY="${SSH_ALIAS}" \
    TAR_USER_MAP="${USER_MAP}" \
    TAR_GROUP_MAP="${GROUP_MAP}" \
    ./tar-create.sh \
  > "${TAR_MODIFIED}"
Diff the original and modified archives as a sanity check. For example, are ownership and permissions correct?
tar-list() {
  tar -tf "${1}" --quoting-style=escape --verbose \
    | sed -E -- ':a; /^([^\t]*\t){5,}/ b; s/ +/\t/; ta' \
    | cut -f 1-2,6-
}
diff -u --color=always <(tar-list "${TAR_ORIGINAL}") <(tar-list "${TAR_MODIFIED}") | less -RS
--- /dev/fd/11 2024-01-20 21:36:07.050059294 -0500
+++ /dev/fd/12 2024-01-20 21:36:07.050322332 -0500
@@ -1,9 +1,8 @@
-drwxr-xr-x root/wheel Library/
-drwxr-xr-x root/wheel Library/Preferences/
 -rw-r--r-- root/wheel Library/Preferences/com.soma-zone.LaunchControl.fdautil.plist
-drwxr-xr-x root/admin Users/
-drwxr-x--- foo/staff Users/foo/
-drwxr-xr-x foo/staff Users/foo/.config/
+lrwxr-xr-x foo/staff Users/foo/.config/baz -> foo
+drwxr-xr-x foo/staff Users/foo/.config/foo/
+-rw-r--r-- foo/staff Users/foo/.config/foo/bar
+-rw-r--r-- foo/staff Users/foo/.config/foobar
 drwxr-xr-x foo/staff Users/foo/.config/rclone/
 -rw------- foo/staff Users/foo/.config/rclone/rclone.conf
 drwxr-xr-x foo/staff Users/foo/.config/resticprofile/
@@ -18,12 +17,7 @@
 -rw-r--r-- foo/staff Users/foo/.config/resticprofile/profiles.toml
 drwxr-xr-x foo/staff Users/foo/.config/resticprofile/status/
 -rw-r--r-- foo/staff Users/foo/.config/resticprofile/status/foo_mac.json
-drwx------ foo/staff Users/foo/Library/
-drwx------ foo/staff Users/foo/Library/LaunchAgents/
 -rw-r--r-- foo/staff Users/foo/Library/LaunchAgents/com.manselmi.resticprofile.foo_mac.backup.plist
-drwxr-xr-x root/wheel usr/
-drwxr-xr-x root/wheel usr/local/
-drwxr-xr-x root/wheel usr/local/bin/
 -rwxr-xr-x root/wheel usr/local/bin/exec-rclone
 -rwxr-xr-x root/wheel usr/local/bin/exec-resticprofile
 -rwxr-xr-x root/wheel usr/local/bin/rclone
Warning
Observe that unlike when we created the original archive on the remote machine, here we choose not to archive the parent directories of items in our manifest. This is because we assume that for every TAR member the corresponding parent directories already exist on the remote system. If this turns out not to be the case, add the required parent directories to the manifest.
For example, if you add a new regular file Users/foo/a/b/c.conf, then append all of these lines to the manifest:
/Users/foo/a
/Users/foo/a/b
/Users/foo/a/b/c.conf
/Users and /Users/foo need not be added because we know those directories already exist on the remote system.
Everything looks good, so delete the extracted files, for example with the same sudo -- rm -fr -- "${SSH_ALIAS}" command used before extraction.
Extract the TAR remotely over SSH.
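The remote extraction command is not shown here. A plausible sketch, assuming it mirrors the creation step (same SSH options and remote PATH, with tar-extract.sh in place of tar-create.sh), is wrapped below in a hypothetical function, tar_extract_remote, so that nothing touches the remote Mac until the function is called explicitly.

```shell
# Hypothetical wrapper; defining it has no side effects.
#   $1: SSH host alias
#   $2: archive to send to the remote machine
# Needs Zsh or Bash locally, because printf '%q' is used to quote the script
# so it survives the remote shell's re-parsing of the ssh command line.
tar_extract_remote() {
  < "$2" ssh -o RequestTTY=no -- "$1" sudo -- env -- \
    PATH='/var/manselmi/.prefix/bin:/var/manselmi/.prefix/sw/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin' \
    TAR_DIRECTORY=/ \
    zsh -fc "$(printf '%q' "$(cat tar-extract.sh)")"
}
```

It would be invoked as tar_extract_remote "${SSH_ALIAS}" "${TAR_MODIFIED}". Since extraction runs as root relative to /, it is worth re-checking the diff of the two archives before calling it.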