Usage tips
Since yarsync has a command interface similar to git's,
one can synchronize several repositories simultaneously using
myrepos.
Development
Community contributions are very important for free software projects.
If you find this tool useful and want to fix a known or a new issue,
clone the repository and install the Python packages required for tests:
pip install -r requirements.txt
yarsync was tested on ext4 and SimFS on Arch Linux and CentOS.
Tests on other systems would be useful.
Hard links
The file system must support hard links if you plan to use commits.
Multiple hard links are supported by POSIX-compliant and partially POSIX-compliant operating systems,
such as Linux, Android, macOS, and also Windows NT4 and later Windows NT operating systems
[Wikipedia].
Notable file systems that support hard links include [hard links and comparison of file systems from Wikipedia]:
- EncFS (an Encrypted Filesystem using FUSE). Note that it doesn't support hard links when External IV Chaining is enabled (this is enabled by default in paranoia mode, and disabled by default in standard mode).
- ext2-ext4. Standard on Linux. Ext4 has a limit of 65000 hard links on a file.
- HFS+. Standard on Mac OS.
- NTFS. The only Windows file system to support hard links. It has a limit of 1024 hard links on a file.
- SquashFS, a compressed read-only file system for Linux.
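Because commits rely on hard links, the per-file limits above may matter in practice. Assuming that each commit adds one more hard link to every unchanged file, a repository on NTFS could run out of links after roughly a thousand commits. A minimal sketch for checking how close a tree is to such a limit; `max_link_count`, `links_left` and `LINK_LIMITS` are hypothetical names, not part of yarsync:

```python
from __future__ import annotations

import os
import stat

# Per-file hard link limits quoted in the list above.
# Assumption: each commit adds one hard link to every unchanged file.
LINK_LIMITS = {"ext4": 65000, "ntfs": 1024}


def max_link_count(root: str) -> tuple[int, str | None]:
    """Return the largest st_nlink among regular files under root, and one such path."""
    best, best_path = 0, None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symbolic links
            if stat.S_ISREG(st.st_mode) and st.st_nlink > best:
                best, best_path = st.st_nlink, path
    return best, best_path


def links_left(root: str, fs_type: str) -> int:
    """How many more hard links the most-linked file under root could receive."""
    count, _path = max_link_count(root)
    return LINK_LIMITS[fs_type] - count
```

For example, `links_left("/path/to/repo", "ntfs")` would report the remaining headroom for a repository stored on NTFS.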
Hard links are not supported on:
- FAT, exFAT. These are used on many flash drives.
- Joliet ("CDFS"), ISO 9660. File systems on CDs.
The majority of modern file systems support hard links.
A full list of file system capabilities can be found on Wikipedia.
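Whether a particular mount point supports hard links can be checked simply by trying to create one. The sketch below (the name `supports_hard_links` is hypothetical) does this in a scratch directory that is removed afterwards; it needs write access to the directory being tested:

```python
import os
import tempfile


def supports_hard_links(directory: str) -> bool:
    """Return True if a hard link can be created inside `directory`."""
    # Work in a temporary subdirectory so nothing is left behind.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        source = os.path.join(scratch, "source")
        target = os.path.join(scratch, "target")
        with open(source, "w") as f:
            f.write("probe")
        try:
            os.link(source, target)
        except OSError:
            # link(2) fails, e.g. with EPERM, on file systems without hard links.
            return False
        # Both names must now refer to the same file.
        return os.path.samefile(source, target)
```

Called on the mount point of a FAT or exFAT flash drive, it is expected to return False.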
One can copy data to file systems without hard links, but this reduces the functionality of yarsync,
and one should take care not to use up too much disk space when files are accidentally copied instead of hard linked.
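To estimate how much space hard links save (or how much copying them as separate files would consume), one can compare the apparent size of a tree, where every path is counted, with the total size of its distinct inodes. A sketch under the assumption that file sizes (`st_size`) are a good enough proxy for allocated space; `disk_usage` is a hypothetical helper:

```python
from __future__ import annotations

import os
import stat


def disk_usage(root: str) -> tuple[int, int]:
    """Return (apparent_bytes, unique_bytes) for regular files under root.

    apparent_bytes counts every path, as if each hard link were a separate copy;
    unique_bytes counts each (device, inode) pair once.
    """
    apparent = unique = 0
    seen: set[tuple[int, int]] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symbolic links, FIFOs, device files
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique
```

The difference between the two numbers is roughly the extra space that copying instead of hard linking would take.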
rsync limitations
- rsync syncs millions of files very slowly.
- rsync freezes when encountering too many hard links. Users report problems with repositories of 200 GB or 90 GB containing many hard links. For the author's repository, with 30 thousand files (160 thousand with commits) and 3 GB of data, rsync works fine. If you have a large repository and want to copy it with all hard links, it is recommended to create a separate partition (e.g. with LVM) and copy the file system as a whole. You can also remove some older backups.
- rsync may create separate files instead of hard linking them. This can be fixed quickly with the hardlink utility.
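The idea behind re-linking such copies can be sketched in Python: group files by size, confirm identical content with SHA-256, and replace each duplicate with a hard link to the first copy. This only illustrates the technique and is not the hardlink utility itself; it ignores ownership, permissions and timestamps, so a replaced path takes on the metadata of the kept file. `relink_duplicates` is a hypothetical name, and it defaults to a dry run:

```python
from __future__ import annotations

import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def relink_duplicates(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Replace regular files with identical content by hard links; return (kept, replaced) pairs."""
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):
                by_size[st.st_size].append(path)

    pairs = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash: dict[str, str] = {}
        for path in sorted(paths):
            kept = by_hash.setdefault(file_digest(path), path)
            if kept == path or os.path.samefile(kept, path):
                continue  # first of its kind, or already a hard link to it
            pairs.append((kept, path))
            if not dry_run:
                tmp = path + ".relink-tmp"  # hypothetical temporary name
                os.link(kept, tmp)     # create the new link first,
                os.replace(tmp, path)  # then atomically swap it in
    return pairs
```

Running it with the default `dry_run=True` first shows which files would be replaced without touching anything.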
Alternatives
Free software that uses rsync includes:
- Back In Time. See previous snapshots using a GUI.
- Grsync, graphical interface for rsync.
- LuckyBackup. It is written in C++ and is mostly used from a graphical shell.
- rsnapshot, a filesystem snapshot utility. rsnapshot makes it easy to make periodic snapshots of local machines, and remote machines over ssh. Files can be restored by the users who own them, without the root user getting involved.
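Snapshot tools such as rsnapshot save space by hard-linking files that did not change since the previous snapshot, the same idea as rsync's `--link-dest` option. A minimal sketch of that idea, assuming rsync's default quick check (equal size and modification time) is enough to call a file unchanged; `snapshot` is a hypothetical function that handles only regular files and directories and does not preserve symbolic links:

```python
from __future__ import annotations

import os
import shutil


def unchanged(src: str, prev: str) -> bool:
    """Quick check: same size and same modification time (in whole seconds)."""
    a, b = os.stat(src), os.stat(prev)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot(source: str, previous: str | None, dest: str) -> tuple[int, int]:
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`.

    Returns (linked, copied).
    """
    linked = copied = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            out = os.path.join(dest, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and unchanged(src, old):
                os.link(old, out)  # share the data with the previous snapshot
                linked += 1
            else:
                shutil.copy2(src, out)  # copy2 also preserves the modification time
                copied += 1
    return linked, copied
```

Each snapshot then looks like a full copy, while unchanged files occupy disk space only once.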
Other synchronization / backup / archiving software:
- casync is a combination of the rsync algorithm and content-addressable storage. It is an efficient way to deliver and update directory trees and large images over the Internet in an HTTP and CDN friendly way. Other systems that use similar algorithms include bup.
- Duplicity backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Duplicity uses librsync and is space efficient. It supports many cloud providers. As of 2021, duplicity supports deleted files, full Unix permissions, directories, symbolic links, FIFOs and device files, but not hard links. It runs on Linux, macOS and Windows (under Cygwin).
- Git-annex manages distributed copies of files using git. This is a very powerful tool written in Haskell. For each file it can track the number of backups that contain it and their names, and it lets one plan which files to download to local storage. This is its author's use case: "I have a ton of drives. I have a lot of servers. I live in a cabin on dialup and often have 1 hour on broadband in a week to get everything I need". I tried to learn git-annex, found it hard to use, and eventually discovered that it doesn't preserve timestamps (because git doesn't) or permissions. If that suits you, there is also a list of specialized related software. git-annex can use many cloud services as special remotes, including all rclone remotes.
- Rclone focuses on cloud and other high latency storage. It supports more than 50 different providers. As of 2021, it doesn't preserve permissions and attributes.
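The "rsync algorithm" mentioned for casync rests on a weak rolling checksum, described in the rsync technical report by Andrew Tridgell and Paul Mackerras: with M = 2^16, the checksum of a block can be updated in constant time when the window slides by one byte, so blocks of an old file can be searched for at every offset of a new one. A sketch of that checksum alone (real rsync confirms weak matches with a strong hash, which is omitted here); the function names are illustrative:

```python
from __future__ import annotations

M = 1 << 16  # modulus from the rsync technical report


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return the (a, b) components of the weak checksum of `block`."""
    a = sum(block) % M
    # The byte at offset i is weighted by its distance from the end of the block.
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def combined(a: int, b: int) -> int:
    """Pack the two 16-bit components into one value, s = a + 2^16 * b."""
    return a + (b << 16)


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window by one byte in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M  # uses the already updated a
    return a, b


def rolling_checksums(data: bytes, n: int) -> list[int]:
    """Weak checksums of every window data[k:k + n], in time linear in len(data)."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [combined(a, b)]
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        sums.append(combined(a, b))
    return sums
```

Each rolled value equals the checksum recomputed from scratch for that window, which is what makes scanning every byte offset affordable.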
Continuous synchronization software:
- gut-sync offers a real-time bi-directional folder synchronization.
- Syncthing. A very powerful and mature tool that works on Linux, macOS, Windows and Android. It is mostly used through a GUI (the admin panel is a web interface), but it also has a command line interface.
- Unison is a file-synchronization tool for OSX, Unix, and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other (much as other synchronization tools work).
- Dropbox, Google Drive, Yandex Disk and many other closed-source tools fall into this category.
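Bidirectional tools such as Unison detect changes by comparing each replica with the state recorded at the last synchronization. A minimal sketch of that three-way decision, assuming each replica is summarized as a mapping from path to content hash; `reconcile` is a hypothetical function, not Unison's algorithm, and it leaves conflict resolution, directories and metadata aside:

```python
from __future__ import annotations


def reconcile(base: dict[str, str], left: dict[str, str],
              right: dict[str, str]) -> dict[str, str]:
    """Decide, for each path, what a bidirectional sync should do.

    Each dict maps path -> content hash; `base` is the state after the last sync.
    """
    actions = {}
    for path in sorted(base.keys() | left.keys() | right.keys()):
        b, l, r = base.get(path), left.get(path), right.get(path)
        if l == r:
            actions[path] = "in sync"  # identical on both sides, or deleted on both
        elif l == b:  # only the right replica changed
            actions[path] = "delete on left" if r is None else "copy right->left"
        elif r == b:  # only the left replica changed
            actions[path] = "delete on right" if l is None else "copy left->right"
        else:
            actions[path] = "conflict"  # both replicas changed differently
    return actions
```

Without the recorded base state, a file present on one side only would be ambiguous: it could be new there or deleted on the other side.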
ArchWiki includes several useful scripts for rsync and a list of its
graphical front-ends.
It also has a list of cloud synchronization clients
and a list of synchronization and backup programs.
Wikipedia offers a comparison of file synchronization software and a comparison of backup software.
Git-annex has a list of git-related tools.