Rsync File Matching

2018-06-19

Rsync is a very effective tool: it transfers files efficiently and can mirror or sync file trees in all sorts of ways. Getting the list of files to transfer right can be difficult, though. This usually comes up when backing up file trees where you want to exclude some paths.

Basics

When transferring files, you usually select a remote directory and specify a local directory to copy its contents into. The command below copies the contents of /home/me/ on server into /backup/server_me_home on the local filesystem. server_me_home will be created if it does not already exist.

rsync -a server:/home/me/ /backup/server_me_home

Filter Rules

Filter rule matching is a big topic in rsync. The man page covers it in full, but it is very long and can be hard to read, so this is a discussion of the most common uses. Rules are made up of patterns that match file paths (file or directory names).

Include/Exclude

By default, with the -a flag, the whole tree of files and subdirectories under the source path is included. To exclude a path, use an --exclude= argument with a pattern.

An --include= argument can override an exclude. To do this, it must precede the --exclude= on the command line, and all parent directories of the path named in the rule must be included (i.e. not excluded). See the example below.

* and ** pattern operators

* and ** can be used in patterns as wildcards. * matches any part of a path but stops at slashes. ** also matches slashes.

Transfer root

If a pattern starts with a /, it is matched against the transfer root; otherwise it is matched against the end of the path. The example below is matched against the transfer root.

Example

For example, let's say you have a directory /home/me/code with a subdirectory work that you want to select, but you want to exclude all other files and directories in /home/me/code.

rsync -a --include='/code/work**' --exclude='/code/**' server:/home/me/ /backup/server_me_home

The important parts:

The include comes first.

The /code directory itself is included. If the exclude were --exclude=/code** instead, the directory /code would be excluded. The include rule only matches the specified subdirectory, not /code itself, and all parents must be included for it to take effect.

The root of the patterns is from the perspective of the transfer. Since the source server:/home/me/ has a trailing slash, the root is the contents of that directory.

Note

Note that in this example another directory with "work" as a prefix (for example /code/workshop) would be matched too. This could be fixed by using --include=/code/work --include=/code/work/** instead.

Trying it out

The easiest way to try this out is to use the -v and -n options, which add verbosity (listing the files) and do a dry run (no actual transfer) respectively. Create a small directory tree locally and try syncing it, including and excluding as you like.