Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
Rsync finds files that need to be transferred using a “quick check” algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file’s data does not need to be updated.
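The idea behind the quick check can be sketched in a few lines of shell. This is only an illustration of the size and modification-time comparison described above, not rsync’s actual implementation. The function name needs_transfer and the temporary files are hypothetical, and the sketch assumes GNU stat and touch.

```shell
#!/usr/bin/env bash
# Illustrative size + mtime comparison; NOT rsync's real code.
# Assumes GNU coreutils (stat -c, touch -d).

# needs_transfer SRC DST: succeeds (exit 0) if DST looks out of date.
needs_transfer() {
    local src="$1" dst="$2"
    # A destination file that does not exist always needs a transfer.
    [ -e "$dst" ] || return 0
    # Compare "size mtime" pairs; any difference triggers a transfer.
    [ "$(stat -c '%s %Y' "$src")" != "$(stat -c '%s %Y' "$dst")" ]
}

workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/a"
cp -p "$workdir/a" "$workdir/b"              # same size, same mtime

if needs_transfer "$workdir/a" "$workdir/b"; then first=transfer; else first=skip; fi

touch -d '2001-01-01 00:00:00' "$workdir/b"  # same size, different mtime
if needs_transfer "$workdir/a" "$workdir/b"; then second=transfer; else second=skip; fi

echo "identical copy:     $first"
echo "after mtime change: $second"
```

Note that a file whose content changes while its size and modification time stay the same would be skipped by a check like this one.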
Some of the additional features of rsync are:
- support for copying links, devices, owners, groups, and permissions
- exclude and exclude-from options similar to GNU tar
- a CVS exclude mode for ignoring the same files that CVS would ignore
- can use any transparent remote shell, including ssh or rsh
- does not require super-user privileges
- pipelining of file transfers to minimize latency costs
- support for anonymous or authenticated rsync daemons (ideal for mirroring)
General Info
Rsync copies files either to or from a remote host, or locally on the current host (it does not support copying files between two remote hosts).
There are two different ways for rsync to contact a remote system: using a remote-shell program as the transport (such as ssh or rsh) or contacting an rsync daemon directly via TCP. The remote-shell transport is used whenever the source or destination path contains a single colon (:) separator after a host specification. Contacting an rsync daemon directly happens when the source or destination path contains a double colon (::) separator after a host specification, or when an rsync:// URL is specified.
As a special case, if a single source arg is specified without a destination, the files are listed in an output format similar to “ls -l”.
As expected, if neither the source nor the destination path specifies a remote host, the copy occurs locally.
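The path rules above can be summarized as a small shell function. This is a simplified sketch rather than rsync’s real argument parser, which handles more cases. The helper name classify_path and the host and module names in the sample paths are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Simplified illustration of the path rules above; rsync's own parser
# handles more cases. classify_path is a hypothetical helper name.
classify_path() {
    local path="$1"
    # An rsync:// URL always means a daemon connection.
    case $path in
        rsync://*) echo "daemon"; return ;;
    esac
    # A host specification can only appear before the first slash.
    local head="${path%%/*}"
    case $head in
        *::*) echo "daemon" ;;        # host::module
        *:*)  echo "remote-shell" ;;  # host:path
        *)    echo "local" ;;         # no host part
    esac
}

for p in /home/user remoteuser@192.168.1.90:/backup \
         host::module/dir rsync://host/module/dir; do
    printf '%-40s %s\n' "$p" "$(classify_path "$p")"
done
```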
Rsync refers to the local side as the “client” and the remote side as the “server”. Please don’t confuse “server” with an rsync daemon — a daemon is always a server, but a server can be either a daemon or a remote-shell spawned process.
Setup
To use rsync, first make sure it is installed on your Linux system. On a Debian-based distribution, for example, you can install it by running:
# apt install rsync
Once installed, you can use rsync to any machine that you can access via a remote shell (as well as some that you can access using the rsync daemon mode protocol). For remote transfers, a modern rsync uses ssh for its communications, but it may have been configured to use a different remote shell by default, such as rsh or remsh.
You can also specify any remote shell you like, either by using the -e command line option, or by setting the RSYNC_RSH environment variable. Note that rsync must be installed on both the source and destination machines.
Examples of Usage
# rsync -r /home/user /backup
This command, run as root, copies the user’s entire home directory to a local backup directory. The -r option tells rsync to recurse, so it copies not only the top-level directory but also every sub-directory and file beneath it. If the /backup directory does not exist yet, rsync creates it, provided that its parent directory already exists.
This simple command has a drawback when used to back up the user’s home directory. If we inspect the contents of /backup afterwards, we see that all of the files in /home/user and its sub-directories were copied successfully, but every file is now owned by root rather than by the original user. Much of the other metadata associated with those files, such as their timestamps, has changed as well. This problem is easy to avoid during the copy, or to correct afterwards.
To correct the metadata issue in the above example, we can run the backup command again, this time adding the -a option. The -a option stands for archive, and it tells rsync to retain as much of the metadata as possible from the source when copying to the destination.
With the -a option, the command becomes:
# rsync -a /home/user /backup
This creates a copy of the source’s directory structure and contents in the destination, including timestamps, permissions, and so on. The -a option is actually shorthand for seven other options.
To summarize, the -a option (or archive option) is a replacement for the following options with the associated functionality:
-r (copies data recursively)
-l (copies symbolic links as symbolic links)
-p (preserves permissions)
-g (preserves group ownership)
-t (preserves modification times)
-o (preserves the owner, which requires super-user privileges)
-D (preserves device files and special files)
Therefore, the command:
# rsync -rlpgtoD /home/user /backup
can be replaced by the simpler:
# rsync -a /home/user /backup
If you want to watch rsync’s progress as it runs, add the -v option, which stands for verbose. Used in conjunction with the archive option, the command would look like:
# rsync -av /home/user /backup
As I mentioned earlier, rsync can be used remotely as well. That is to say, one can copy files from a local source to a remote destination or vice versa. A common protocol for accessing remote Linux systems is SSH. The commands and options for copying files to a remote system over SSH are the same as for a local copy, so all one needs to do is point rsync at the remote server rather than at a local path. An example of this is shown below:
# rsync -av /home/user remoteuser@192.168.1.90:/backup
In the above example, we recursively copy all files in the /home/user directory and its sub-directories in archive mode, with verbose output, from the local system to the /backup directory on the remote server at IP address 192.168.1.90, logging in under the user account remoteuser.
Let’s assume for a moment that after copying the files to the remote server, we delete files from the source location and rerun the same command. Rsync copies whatever is still in the source location, but since it does not delete files by default, the contents of /home/user and /backup now differ: the first run copied files into /backup that no longer exist in /home/user. So how do we synchronize these two locations using rsync? We can use another option that I’ll introduce here, known as --delete. Using it would look like:
# rsync -av --delete /home/user remoteuser@192.168.1.90:/backup
This option deletes any files it finds in the remote location that are no longer present in the source at the time of a subsequent run, which keeps the two locations synchronized. The colon (:) that follows the IP address and precedes /backup, with no intervening spaces, separates the host from the path and tells rsync where on the remote server to place the copy of the /home/user directory.
Another useful option when performing backups in Linux, whether to a local directory or to a remote server, is the -b option. This option stands for backup and would be used as follows:
# rsync -avb --delete /src /target
where /src represents the source directory and /target represents the target directory. What exactly does the -b option do here? Before rsync overwrites a file at /target whose content has changed at /src, it renames the existing copy (by default, by appending a ~ to its name), so the previous version is preserved rather than overwritten. To avoid cluttering /target with renamed files, we can use another option called --backup-dir=, which moves the files that would normally be overwritten to another directory instead of leaving them in the /target directory. An example of this is shown below:
# rsync -avb --delete --backup-dir=/backup/incremental /src /target
This command tells rsync to copy recursively in archive mode and to show its progress, but rather than renaming the replaced files and keeping them in the /target directory, to move the previous versions to the /backup/incremental directory instead. When the --backup-dir= option is used in conjunction with the -b option, the replaced files are not renamed; they are simply moved to the backup directory (/backup/incremental).
And, finally, to put all of this to practical use, we can write a few lines of bash that run these rsync commands for us to make incremental backups of our /home/user directory, or of any other directory for that matter. The following is an example of a series of bash commands that will do just that:
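# Bash assignments must not have spaces around the = sign.
CURRENTDATE=$(date +%m-%d-%Y)
export CURRENTDATE
# Quote the path so the variable expands safely.
sudo rsync -avb --delete --backup-dir="/backup/incremental/$CURRENTDATE" /src /target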
Hence, in the above code, we assign the current date to a variable named CURRENTDATE, then export that variable so that it is also available to any child processes. Next, we run the rsync command that backs up /src to /target. Any files that have been modified and would normally be overwritten are first moved to a sub-directory of /backup/incremental named after the date on which the incremental backup ran.