Rsync Guide: Syncing Data for Experienced Users

February 11, 2014

Unlocking Rsync's Full Potential for Comprehensive Data Management

While the rsync protocol is often considered straightforward for routine backup and synchronization tasks, its capabilities extend far beyond basic usage. Many advanced features within rsync can be unexpectedly powerful.

Beyond Basic Backups: A Unified Data Redundancy Solution

This article demonstrates how even users managing substantial data volumes can leverage rsync as a comprehensive solution for all data redundancy requirements.

For those with extensive data collections and a strong commitment to backups, rsync offers a versatile and efficient approach.

Exploring Advanced Rsync Features

The inherent flexibility of rsync allows it to be adapted to a wide range of data management scenarios.

It’s possible to consolidate multiple backup strategies into a single, streamlined rsync-based system.

Benefits of a Single Rsync Solution

  • Simplified Management: Reduce complexity by centralizing data redundancy efforts.
  • Increased Efficiency: Optimize backup processes and minimize resource consumption.
  • Enhanced Reliability: Benefit from rsync’s proven track record for data integrity.

By mastering these advanced techniques, users can transform rsync from a simple utility into a robust and scalable data protection platform.

Ultimately, rsync provides a powerful toolkit for anyone serious about safeguarding their valuable data.

Rsync: A Deep Dive for Experienced Users

Individuals unfamiliar with the core concepts of rsync, or those who utilize it solely for elementary operations, might benefit from reviewing our introductory material. A prior article details the fundamentals of data backup using rsync on Linux systems.

This earlier resource provides installation instructions and demonstrates the tool’s simpler capabilities. Establishing a solid understanding of rsync’s basic functionality, alongside comfort navigating a Linux terminal, is a prerequisite for engaging with this advanced guide.

Prerequisites and Assumptions

This article assumes a working knowledge of the Linux command line. It also presumes you've already successfully installed and used rsync for basic file synchronization tasks.

We will be exploring more complex scenarios and options, building upon the foundation established in our introductory documentation.

Beyond Basic Synchronization

While rsync excels at simple backups, its true power lies in its advanced features. These capabilities allow for highly customized and efficient data transfer and synchronization.

  • Delta Transfer: Rsync's delta-transfer algorithm sends only the differences between the source and destination copies of a file, minimizing bandwidth usage.
  • Secure Transfers: Utilizing SSH, rsync ensures data is transferred securely across networks.
  • Preservation of Attributes: File permissions, timestamps, and ownership can be meticulously preserved during synchronization.

Advanced Options and Techniques

Several command-line options unlock rsync’s full potential. Understanding these options is crucial for tailoring the tool to specific needs.

For example, the --delete option enables the removal of files from the destination that no longer exist in the source, ensuring perfect mirroring. Conversely, the --backup and --backup-dir options allow for the creation of backups of files before they are modified or deleted.

Real-World Applications

The versatility of rsync makes it suitable for a wide range of applications.

  • System Backups: Creating comprehensive backups of entire systems.
  • Mirroring Websites: Maintaining identical copies of websites across multiple servers.
  • Data Migration: Efficiently transferring large datasets between servers.

By mastering these advanced techniques, users can leverage rsync to streamline data management and ensure data integrity.

Utilizing rsync on Windows Systems

Before going further, Windows users need a way to run rsync. While rsync is natively designed for Unix-based operating systems, its functionality can be readily accessed within a Windows environment.

Cygwin provides a POSIX compatibility layer and a collection of Unix tools that make it possible to run rsync on Windows. Therefore, visiting the Cygwin website and downloading the appropriate version (either 32-bit or 64-bit, based on your system architecture) is the initial step.

The installation process is generally uncomplicated. You can proceed with the default settings throughout most of the setup.

However, special attention is required when reaching the "Select Packages" screen.

Package Selection for rsync

It is necessary to also install Vim and SSH alongside rsync. The package names may differ slightly during selection, so visual aids are provided below.

Here's how the Vim installation selection appears:

[Image: Vim package selection in the Cygwin installer]

And this is what the SSH installation selection looks like:

[Image: SSH package selection in the Cygwin installer]

After selecting these three essential packages, continue clicking "Next" to complete the installation procedure.

Once finished, you can launch Cygwin by clicking the desktop icon created during installation.

rsync Commands: From Basic to Advanced

Having established a common understanding, let's examine a straightforward rsync command and see how advanced options build on it.

Imagine you need to back up a collection of files – a common requirement in today’s digital world. You connect your portable hard drive to initiate the backup process and execute the following command:

rsync -a /home/geek/files/ /mnt/usb/files/

Alternatively, on a Windows system utilizing Cygwin, the command would appear as:

rsync -a /cygdrive/c/files/ /cygdrive/e/files/

This is a relatively simple operation, and in this scenario, rsync might not be necessary, as files could be copied and pasted manually. However, if the destination drive already contains some files and only updated versions and newly created files need to be transferred, this command proves valuable. It efficiently transmits only the new data, which is particularly beneficial when dealing with large files or transferring data over a network.
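Both commands above end the source path with a slash, which matters: with the slash, rsync copies the contents of the directory; without it, rsync copies the directory itself into the destination. A small sketch using temporary directories (all names here are made up for the demonstration):

```shell
# Demo directories are temporary and removed when the shell exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src"
echo "hello" > "$work/src/a.txt"

# Trailing slash on the source: the contents of src land directly in the target.
rsync -a "$work/src/" "$work/with_slash/"

# No trailing slash: the src directory itself is created inside the target.
rsync -a "$work/src" "$work/without_slash/"

find "$work/with_slash" "$work/without_slash" -type f
```

The first target contains a.txt directly, while the second contains src/a.txt.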

Storing backups on an external hard drive in the same physical location as your computer presents a significant risk. Therefore, let’s explore the steps required to transmit your files over the internet to a remote computer – perhaps a rented server or a family member’s machine.

rsync -av --delete -e 'ssh -p 12345' /home/geek/files/ geek2@10.1.1.1:/home/geek2/files/

This command sends your files to a computer with the IP address 10.1.1.1. It removes any extraneous files from the destination that are no longer present in the source directory, displays the names of the files being transferred for monitoring progress, and establishes a secure connection through SSH on port 12345.

The -a, -v, -e, and --delete options represent some of the most fundamental and frequently used switches. If you are reading this guide, you likely already possess a solid understanding of their functions. Now, let's delve into other switches that are often overlooked but can be incredibly useful:

--progress - This option enables the display of transfer progress for each file. It is especially helpful when transferring large files over the internet, although it can generate excessive output when transferring small files across a fast network.

An rsync command utilizing the --progress switch during a backup operation is shown below:

[Image: rsync output with the --progress switch during a backup]

--partial - This switch is particularly advantageous when transferring large files over the internet. If the rsync process is interrupted during a file transfer, the partially transferred file is preserved in the destination directory. The transfer resumes from the point of interruption when the rsync command is executed again. When transferring substantial files over an internet connection, a brief outage, system crash, or user error can disrupt the process, making the ability to resume transfers invaluable.

-P - This switch combines the functionality of --progress and --partial, offering a more concise command by integrating both options.

-z or --compress - This switch instructs rsync to compress file data during transfer, reducing the amount of data transmitted. While commonly used, it isn't always essential, primarily benefiting transfers over slower connections. By default, rsync does not try to compress files whose suffixes indicate already-compressed data, such as: 7z, avi, bz2, deb, gz, iso, jpeg, jpg, mov, mp3, mp4, ogg, rpm, tbz, tgz, z, zip.

-h or --human-readable - If you are utilizing the --progress switch, this option is highly recommended. Unless you prefer manually converting bytes to megabytes, the -h switch converts all numerical output to a human-readable format, making it easier to understand the amount of data being transferred.

-n or --dry-run - This switch is crucial when initially developing and testing your rsync script. It performs a trial run without making any actual changes, displaying the intended modifications. This allows you to review the output and ensure accuracy before deploying the script to a production environment.

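-R or --relative - This switch tells rsync to use the full path names given on the command line rather than only the last part of each name, so the source's directory structure is recreated beneath the destination, including any intermediate directories that do not yet exist there. We will employ this option later in this guide when writing into directories on the target machine with timestamps incorporated into the folder names.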

--exclude-from - This switch links to an exclude list containing directory paths that you wish to exclude from the backup process. The list should be a plain text file with one directory or file path per line.

--include-from - Similar to --exclude-from, this switch links to a file containing directories and file paths that you want to include in the backup.
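To make the exclude list concrete, here is a small sketch with a temporary source tree and a two-line exclude file (the directory and file names are invented for the demonstration). A leading slash anchors a pattern at the top of the transfer, and a trailing slash restricts it to directories.

```shell
# Demo directories are temporary and removed when the shell exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/docs" "$work/src/cache"
echo "report" > "$work/src/docs/report.txt"
echo "junk"   > "$work/src/cache/tmp.bin"
echo "notes"  > "$work/src/scratch.tmp"

# One pattern per line, matched against paths inside the source directory.
cat > "$work/exclude.txt" <<'EOF'
/cache/
*.tmp
EOF

rsync -a --exclude-from="$work/exclude.txt" "$work/src/" "$work/dst/"
find "$work/dst" -type f
```

Only docs/report.txt reaches the destination; the cache directory and the .tmp file are skipped.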

--stats - While not a critical switch, it can be useful for system administrators to monitor detailed statistics for each backup, such as network traffic.

--log-file - This option directs the rsync output to a log file. We strongly recommend this for automated backups where you cannot directly observe the output. Regularly reviewing log files ensures proper operation. It is also an essential switch for system administrators to troubleshoot backup failures.

Now, let's revisit our rsync command with the additional switches we've discussed:

rsync -avzhP --delete --stats --log-file=/home/geek/rsynclogs/backup.log --exclude-from '/home/geek/exclude.txt' -e 'ssh -p 12345' /home/geek/files/ geek2@10.1.1.1:/home/geek2/files/

The command remains relatively straightforward, but we haven't yet established a robust backup solution. Although our files are now stored in two separate physical locations, this backup does not protect against a primary cause of data loss: human error.

Snapshot Backups

Data loss can occur due to accidental file deletion, virus corruption, or undesirable alterations to your files. If an rsync backup script is then executed, the compromised data overwrites the backup, rendering the backup solution ineffective.

Recognizing this potential issue, the developer of rsync introduced the --backup and --backup-dir arguments, enabling differential backups. The initial example on rsync’s official website demonstrates a strategy of performing a full backup weekly, followed by daily backups of only the changed files. However, restoring files using this approach requires multiple recovery steps.

Challenges with Traditional Differential Backups

Furthermore, frequent backups – even several times daily – can quickly accumulate a large number of backup directories. This not only complicates file recovery but also makes browsing backed-up data extremely time-consuming, as identifying the most recent version of a file necessitates knowing its last modification time. Running incremental backups infrequently, such as weekly, is also inefficient.

Snapshot backups offer a more streamlined solution! They are essentially incremental backups that leverage hardlinks to preserve the original file structure. This concept might initially seem complex, so let’s illustrate with an example.

Imagine a backup script that automatically backs up data every two hours. Each backup is named using a format like: Backup-month-day-year-time.

At the end of a typical day, the destination directory would contain a list of folders resembling this:

[Image: list of dated backup folders in the destination directory]

Within each of these directories, you’d find every file from the source directory as it existed at that specific time. Crucially, a file that did not change between two backups is not stored twice: each directory holds a hard link to the same data on disk, so only new or changed files consume additional space. Rsync achieves this efficiency through the --link-dest=DIR argument.

Implementing Snapshot Backups with Rsync

To benefit from these neatly dated directory names, some modifications to your rsync script are necessary. Let's examine a script designed for this purpose, followed by a detailed explanation:

#!/bin/bash
#copy old time.txt to time2.txt
yes | cp ~/backup/time.txt ~/backup/time2.txt
#overwrite old time.txt file with new time
echo `date +"%F-%I%p"` > ~/backup/time.txt
#make the log file
echo "" > ~/backup/rsync-`date +"%F-%I%p"`.log
#rsync command ($HOME replaces ~ in option values: the shell does not expand ~ after = or inside quotes)
rsync -avzhPR --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r --delete --stats --log-file=$HOME/backup/rsync-`date +"%F-%I%p"`.log --exclude-from "$HOME/exclude.txt" --link-dest=/home/geek2/files/`cat ~/backup/time2.txt` -e 'ssh -p 12345' /home/geek/files/ geek2@10.1.1.1:/home/geek2/files/`date +"%F-%I%p"`/
#don't forget to scp the log file and put it with the backup
scp -P 12345 ~/backup/rsync-`cat ~/backup/time.txt`.log geek2@10.1.1.1:/home/geek2/files/`cat ~/backup/time.txt`/rsync-`cat ~/backup/time.txt`.log

This script represents a typical snapshot rsync implementation. Let's break down its functionality step-by-step.

Script Breakdown

Initially, the script copies the contents of time.txt to time2.txt, confirming the overwrite with the 'yes' command. Subsequently, the current date and time are written into time.txt. These files serve a crucial purpose later in the process.

The next step involves creating the rsync log file, named rsync-date.log, where 'date' reflects the actual date and time.
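To see what names the date format produces, the sketch below renders a fixed moment instead of the current time. It assumes GNU date (for the -d option) and forces the C locale so the AM/PM marker is predictable.

```shell
# %F is the ISO date (YYYY-MM-DD), %I the hour on a 12-hour clock, %p AM or PM.
stamp=$(LC_ALL=C date -d '2014-02-11 16:00' +"%F-%I%p")
echo "$stamp"             # prints 2014-02-11-04PM
echo "rsync-$stamp.log"   # prints rsync-2014-02-11-04PM.log
```

Because the format has hourly resolution, two runs within the same hour produce the same name. A 12-hour clock also means the names do not sort chronologically within a day (04PM sorts before 10AM).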

Now, we arrive at the core rsync command, which incorporates several key arguments:

-avzhPR, -e, --delete, --stats, --log-file, --exclude-from, --link-dest - These switches have been previously discussed; refer back to earlier sections if a refresher is needed.

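--chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r - These settings define the permissions applied at the destination: directories (D) get rwx for the owner and rx for group and others (755), while files (F) get rw for the owner and r for group and others (644). As the directory is created within the script, specifying permissions ensures the user has write access.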

Understanding the Date and Cat Commands

We will now examine each instance of the date and cat commands within the rsync command, in the order they appear. It’s important to note that alternative methods exist, particularly through variable declarations, but this guide utilizes this approach for clarity.

The log file is designated as:

~/backup/rsync-`date +"%F-%I%p"`.log

Alternatively, it could be specified as:

~/backup/rsync-`cat ~/backup/time.txt`.log

Regardless, the --log-file option should locate the previously created dated log file and write to it.

The link destination is defined as:

--link-dest=/home/geek2/files/`cat ~/backup/time2.txt`

This instructs the --link-dest option to use the directory from the previous backup. If backups run every two hours and the script executes at 4:00 PM, rsync compares against the directory created at 2:00 PM: unchanged files are hard-linked to it, and only changed data (if any) is transferred.

As previously mentioned, time.txt is copied to time2.txt at the script's beginning to allow the --link-dest option to reference that time.

The destination directory is specified as:

geek2@10.1.1.1:/home/geek2/files/`date +"%F-%I%p"`

This command places the source files into a directory named with the current date and time.

Finally, a copy of the log file is placed within the backup:

scp -P 12345 ~/backup/rsync-`cat ~/backup/time.txt`.log geek2@10.1.1.1:/home/geek2/files/`cat ~/backup/time.txt`/rsync-`cat ~/backup/time.txt`.log

Secure copy (scp) on port 12345 transfers the rsync log to the appropriate directory. The time.txt file is referenced via the cat command to ensure the correct log file is selected and placed in the right location. Using cat time.txt instead of the date command accounts for potential time elapsed during the rsync command’s execution, guaranteeing the accurate time is used.
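The variable-based alternative mentioned earlier can be sketched as follows. This is an illustration under stated assumptions, not the article's script: the function name snapshot, its two parameters, and the idea of passing echo as the runner to preview the arguments are all hypothetical. The timestamp is captured once and reused, and --link-dest is only added when a previous timestamp is on record. The scp step for the log file is left out for brevity.

```shell
# snapshot RUNNER STATE_DIR  (hypothetical helper, not part of rsync)
#   RUNNER     "rsync" to transfer for real, "echo" to only print the arguments
#   STATE_DIR  local directory holding time.txt and the logs (~/backup above)
snapshot() {
    runner=$1
    state_dir=$2
    remote="geek2@10.1.1.1"
    remote_root="/home/geek2/files"
    now=$(date +"%F-%I%p")                          # captured once, reused below
    prev=$(cat "$state_dir/time.txt" 2>/dev/null)   # empty on the first run

    # Build the argument list in the function's positional parameters.
    set -- -avzhPR --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r --delete --stats \
        --log-file="$state_dir/rsync-$now.log" \
        --exclude-from="$HOME/exclude.txt" \
        -e "ssh -p 12345"
    if [ -n "$prev" ]; then
        set -- "$@" --link-dest="$remote_root/$prev"
    fi
    set -- "$@" /home/geek/files/ "$remote:$remote_root/$now/"

    # Record the new timestamp only if the runner succeeded.
    "$runner" "$@" && echo "$now" > "$state_dir/time.txt"
}

# Preview against an empty scratch directory; nothing is transferred.
demo_state=$(mktemp -d)
snapshot echo "$demo_state"
```

Calling snapshot rsync "$HOME/backup" would perform the transfer. As with the original script, two runs within the same hour would point --link-dest at the directory being written, so the schedule should not be tighter than the timestamp's resolution.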

Automated Synchronization

To schedule regular backups or synchronizations, leverage Cron on Linux systems or Task Scheduler on Windows to automate your rsync script. A crucial consideration is ensuring that any existing rsync processes are terminated before initiating a new synchronization.

On Windows, Task Scheduler's per-task settings control what happens when a task is still running at its next start time, whereas Linux requires a more proactive approach to process management.
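For the two-hour schedule used in the snapshot example, a crontab entry might look like the one built below. The script path is a hypothetical location; adjust it to wherever the backup script is saved. The snippet only prints the line, so running it changes nothing.

```shell
# Fields: minute hour day-of-month month day-of-week command.
# "0 */2 * * *" means minute 0 of every second hour.
cron_line='0 */2 * * * /home/geek/backup/snapshot.sh'   # hypothetical script path
printf '%s\n' "$cron_line"

# One common idiom to append it to the current user's crontab:
#   (crontab -l 2>/dev/null; printf '%s\n' "$cron_line") | crontab -
```

Running crontab -e and pasting the line by hand achieves the same result.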

Utilizing pkill for Process Termination

The pkill command is a readily available tool on the majority of Linux distributions. Integrating this command into your rsync script ensures a clean start for each scheduled run.

To implement this, include the following line at the very beginning of your script:

pkill -9 rsync

This command forcefully terminates any running rsync processes, preventing potential conflicts and ensuring the script operates as intended during automated scheduling.
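If killing a transfer in progress is undesirable, one alternative (not the approach used in this guide) is to have a new run skip itself while an earlier one is still active. The sketch below uses a lock directory, relying on mkdir failing when the directory already exists. The lock name is an assumption, and the demo places it in a temporary directory; real use would need a fixed path shared by every run.

```shell
# Demo only: a fresh temp directory holds the lock and is removed on exit.
work=$(mktemp -d)
lock_dir="$work/rsync-snapshot.lock"   # hypothetical lock name
trap 'rm -rf "$work"' EXIT

# mkdir either creates the directory or fails, so only one caller can win.
acquire_lock() { mkdir "$lock_dir" 2>/dev/null; }

if acquire_lock; then
    echo "lock acquired; the rsync command would run here"
else
    echo "previous run still active; skipping this one" >&2
fi

# A second attempt while the lock is held is refused.
if acquire_lock; then second="acquired"; else second="refused"; fi
echo "second attempt: $second"
```

The trade-off is that a crash can leave a stale lock behind, which then has to be removed by hand, whereas pkill always gives the new run a clean start.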

Data Encryption for Enhanced Security

Even after establishing a robust and cost-effective backup strategy, the security of your files remains a critical concern. While offsite backups offer protection against localized disasters, they are still vulnerable to potential theft or unauthorized access. It’s important to remember that physical security is just as vital as digital security.

When utilizing tools like rsync, data is encrypted during transmission via SSH, safeguarding it while it travels to its destination. However, this encryption only applies during the transfer process. Once the files arrive, they exist in an unencrypted state, potentially exposing them to risk.

A notable characteristic of rsync is its ability to transfer only the modified portions of files. Should encrypted files undergo even minor alterations, the entire file must be re-transmitted due to the encryption process randomizing the data with each change. This can impact efficiency.

Therefore, implementing disk encryption is highly recommended. Solutions like BitLocker for Windows and dm-crypt for Linux provide comprehensive data protection, even in the event of physical theft. This allows for efficient rsync transfers without compromising security. Alternatives like Duplicity exist, but often lack the full feature set offered by rsync.

Having successfully configured offsite snapshot backups and secured both your source and destination drives with encryption, you’ve created a remarkably resilient data backup solution. This combination provides a high level of protection against data loss and unauthorized access.
