Using Cygwin, Rsync, SSH and the internet to back up my XP computer

update 11-9-11: consider using DeltaCopy instead of my home-brewed method below. I haven’t used it yet, but it looks nice!

I want to back up my Windows XP computer to another Windows XP computer using Rsync and SSH. Since Rsync only runs under *nix, I’m running it under Cygwin.

There are a couple of hurdles to doing this. The one I got stuck on was getting SSH to work without a password.

It’s so freaking simple to do. You just have to know which instructions do not help accomplish your goal.

First, ignore these instructions from pigtail.net. They don’t work. Ignore these instructions from Berkeley.edu. They don’t work. Follow these instructions from Freebsd.org. They work. Thank you Mike!!

1) Generate your keys on your local machine:
– ssh-keygen -t rsa
2) When prompted for a passphrase, do not enter one (just press Enter). This will generate a passphrase-less private key called id_rsa, and a public key id_rsa.pub
3) Copy the id_rsa.pub key over to the machine you want to ssh to. NOTE: change the filename first or you may overwrite your existing RSA key for your remote host. Like this:
– local: cp id_rsa.pub local.key
– local: scp local.key remote:
– remote: cat local.key >> .ssh/authorized_keys

At this point, your remote machine should accept a passwordless login from that “local” machine.

Mike
Michael K. Smith NoaNet
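
The last step (appending the key on the remote side) is easy to run twice by accident. Here is a minimal sketch of a safer version, assuming a POSIX shell on the remote machine. The function name install_pubkey and its optional second argument are made up for illustration and are not part of Mike's instructions. It skips the append if the key is already there and tightens the permissions, which sshd is commonly strict about.

```shell
#!/bin/sh
# Hypothetical helper: append a public key file to authorized_keys,
# skipping it if the exact same line is already present.
install_pubkey() {
    pubkey_file=$1                       # e.g. local.key copied over with scp
    ssh_dir=${2:-$HOME/.ssh}             # assumption: default to the usual location
    auth_file=$ssh_dir/authorized_keys

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    # -x whole-line match, -F fixed string, -q quiet, -f read patterns from file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}
```

On the remote machine you would then run something like: install_pubkey local.key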


Here’s the rest of it….

Let’s say you want to back up your computer… The most protected thing you could do would be to make encrypted offsite backups on a regular or even constant basis. But toting around offsite backups is a real bother.

The best (inexpensive and easy to maintain) system I’ve come up with is thus:

Two WinXP computers over high speed internet. Both computers have Cygwin (installed: ssh, sshd, rsync) and Truecrypt. Every so often, I rsync a backup to the Truecrypted file system. A full 50 gigabyte backup takes about 2 weeks of continuous file transfer (at 50KB/sec) but subsequent backups take a few hours overnight. It’s better if the initial backup is done locally but what the hey, it’s only bandwidth.
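
The "about 2 weeks" figure can be sanity-checked with a little shell arithmetic. This is only a rough sketch: transfer_days is a made-up helper name, it assumes 1 GB = 1024 × 1024 KB, and it ignores stalls, restarts and protocol overhead.

```shell
#!/bin/sh
# Estimate the whole days needed to move size_gb gigabytes at rate_kbs KB/sec.
# Integer arithmetic only, so the result is rounded down.
transfer_days() {
    size_gb=$1
    rate_kbs=$2
    seconds=$(( size_gb * 1024 * 1024 / rate_kbs ))
    echo $(( seconds / 86400 ))
}

transfer_days 50 50    # 50 GB at 50 KB/sec
```

That prints 12, so roughly 12 days of uninterrupted transfer, which lands near two weeks once interruptions are allowed for.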

I’ve only recently gotten the system going so I’ll keep you informed as to how it’s working for me.

To protect the backup machine, I use Truecrypt. I also use “Windows Tweakui | My Computer | Drives” to hide the Truecrypted partition on the backup machine. The best I could figure for using Truecrypt is to VNC into the backup machine and decrypt the partition before running a backup process. I’ve discussed this topic previously in my blog. Go searching.

update 9-22-05
I had to restart the rsync session several times because the following error would stall things

…Lullaby.mp3
Received disconnect from [IP removed]: 2: Corrupted MAC on input.

I googled a little and found some folks on Redhat.com or someplace like that saying how they don’t have a fix for it but believe it to be a bug in a Linksys router. My Netgear router has the problem as well so phooey.

4.4 gig moved in 20 hours. Yup. It looks like the initial backup will be a 2 week affair. The 75 KB/sec transfer isn’t hindering my normal system usage which is a good thing. When I move files over my local network with rsync at about 1.5 MB/sec, both computers are virtually unusable during the transfer. (update, there’s a fix for that. Check the comments)

Stopping and restarting the rsync is solid proof that I’m on the right track. It takes 30-45 seconds for the computers to sift through the first 4.4 gigabytes for changed files and then resume the transfer.
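
Since restarting is cheap, the manual restarts could be wrapped in a small retry loop. This is a sketch under assumptions, not what I actually ran: retry is a hypothetical helper, and it only helps when the failed command exits with an error rather than hanging.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, at most max_tries
# times, sleeping delay seconds between attempts.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$(( attempt + 1 ))
        sleep "$delay"
    done
    return 0
}
```

Usage would look something like "retry 20 60 ./backup.txt", meaning up to 20 attempts with a minute between them. The script name there is just a placeholder.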

update 9-23-05 I had some trouble trying to install cron (so I could restart failed rsync processes) so I reinstalled Cygwin from scratch. I observe 2 things: 1) Reinstalling this entire pseudo operating system was a SNAP. 2) I haven’t had the failure in several hours even though I didn’t do anything.

update 9-24-05 I installed cron and I run this every hour to restart everything:

#!/bin/bash

# kill all of the rsync processes (actually, shoot all the tcsh processes first
# since they are running the rsyncs, then kill the rsync and ssh processes)

ps | grep tcsh | awk '{print $1}' | xargs kill -9
ps | grep rsync | awk '{print $1}' | xargs kill -9
ps | grep ssh | awk '{print $1}' | xargs kill -9

# start a backup or two
/home/lee/backup.txt >> /home/lee/backup-log.txt &
#/home/lee/backup2.txt >> /home/lee/backup2-log.txt &

But grrr. After a restart, the file it was working on is restarted from the beginning. That means if it takes more than 1 hour to move an individual file (at my speed, about a 0.25 gig file), my cron job will spin around in circles forever… …

Rsync Options to the rescue! “--partial-dir=.rsync-partial” fixes that problem.
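
A different way to avoid the spin-in-circles problem would be to not kill a transfer that is still alive. The following is only a sketch of that idea and not what my cron job does: backup_running and start_if_idle are made-up names, and the PID file path is whatever you choose. It will not rescue a transfer that is hung but still running, so it is no replacement for the kill-everything approach in that case.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the process recorded in a PID file is alive.
backup_running() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only checks that the process exists
    kill -0 "$pid" 2>/dev/null
}

# Start the given command in the background unless a previous run is still alive.
start_if_idle() {
    pidfile=$1
    shift
    if backup_running "$pidfile"; then
        echo "backup already running, leaving it alone"
        return 0
    fi
    "$@" &
    echo $! > "$pidfile"
}
```

From cron, that might be called as "start_if_idle /home/lee/backup.pid /home/lee/backup.txt", where the .pid path is an assumption.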

Here is the script file I run (I’ve stripped out some of the boring folders that I backup):

#!/bin/csh

echo "------------ Begin Rsync ------------ "
echo -n "start on: " ; date

# notes:
# If you want to push the output to a file, you might do something like this:
# ./backup-booty.txt > booty-log.txt 2>&1
#
# This is being run with Cygwin on WinXP on both sides. Several folders initially
# (for some unknown reason) had default permissions of 000.
# This caused the first rsync to work and subsequent transfers to crash due to file permission errors.
# To fix this, I used Cygwin to set file permissions on the client-side to 700.
#
# If the file transfer gets mucked up and the files on the server won’t delete,
# run a 'chmod -R 777 *' on them
#
# It might be useful to run it with the output to a file for analysis later.
# IE: backup-booty.txt > bootylog.txt

setenv Rsync "nice +1 rsync"
setenv OPTS "--verbose --partial-dir=.rsync-partial --compress --recursive --times --delete --rsh=/usr/bin/ssh"

# useful options:
# --bwlimit=50: limit bandwidth to 50KB/sec
# --progress: good for interactive sessions, bad for logged sessions
# --verbose --verbose --verbose: shows exactly what’s going on. You might want to pipe the output to a file.
# --compress: good for slow connections, bad for high speed
# --whole-file: good for fast connections. It won’t use the rsync algorithm so it won’t bog down the CPU.

# I don’t run with "-a" because it could be good to strip all that permission stuff when restoring the backup

setenv BackupDestination "Owner@mydomain.com:/cygdrive/f/"
#setenv BackupDestination "/cygdrive/f/"
#setenv BackupDestination "cg:/cygdrive/d/"

setenv BackupFolder "current-backup"

echo "rsyncing to: $BackupDestination$BackupFolder"
echo "Options specified: $Rsync, $OPTS"

# -----------------------------------------------------------------------

echo "Set up backup folder"
date
# … by dropping this very program into the folder. We do this because rsync can’t create sub-sub-folders blindly
$Rsync $OPTS $0 \
$BackupDestination$BackupFolder/

#echo "Outlook"
date
$Rsync $OPTS /cygdrive/c/Documents\ and\ Settings/Lee/Local\ Settings/Application\ Data/Microsoft/Outlook* \
$BackupDestination$BackupFolder

echo "My Documents sans My Music and My Videos"
date
$Rsync $OPTS --exclude="My Music" --exclude="My Videos" /cygdrive/c/Documents\ and\ Settings/Me/My\ Documents* \
$BackupDestination$BackupFolder

echo "Firefox bookmarks"
date
$Rsync $OPTS /cygdrive/c/Documents\ and\ Settings/Lee/Application\ Data/Mozilla/Firefox/Profiles/41e9243n.default/bookmarks.html \
$BackupDestination$BackupFolder/Firefox-Bookmarks/

echo "Firefox extensions"
date
$Rsync $OPTS /cygdrive/c/Documents\ and\ Settings/Lee/Application\ Data/Mozilla/Firefox/Profiles/41e9243n.default/extensions* \
$BackupDestination$BackupFolder/Firefox-Extentions/

echo "Start Menu"
date
$Rsync $OPTS /cygdrive/c/Documents\ and\ Settings/Lee/Start\ Menu* \
$BackupDestination$BackupFolder

echo -n "--- finished rsync on " ; date
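
The option notes in the script boil down to two profiles: compress and throttle over the slow internet link, skip the delta algorithm on the fast local network. Here is a small sketch of that choice. The function name rsync_opts and the "wan"/"lan" labels are invented for this example, and the function only prints an option string; it does not run rsync.

```shell
#!/bin/sh
# Hypothetical helper: print rsync options for a link type.
#   wan = slow internet connection, lan = fast local network
rsync_opts() {
    base="--verbose --partial-dir=.rsync-partial --recursive --times --delete --rsh=/usr/bin/ssh"
    case "$1" in
        wan) echo "$base --compress --bwlimit=50" ;;   # compress and cap at 50KB/sec
        lan) echo "$base --whole-file" ;;              # skip the rsync delta algorithm
        *)   echo "usage: rsync_opts wan|lan" >&2; return 1 ;;
    esac
}
```

In a Bourne-style script that could be used as: rsync $(rsync_opts lan) SOURCE DESTINATION, with SOURCE and DESTINATION standing in for real paths.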

update 9-25-05 8pm 19 gig moved. I get about 5 GB/day when it runs continuously… my estimates were correct :-)

update 9-26-05 9am 22 gig moved. If I run 2 rsync sessions at once, I get 6 GB/day. The computer lags a bit though. 3 sessions doesn’t improve performance. It takes about 60 seconds for Rsync to power-up so I changed the cron job to restart the download every 2 hours instead of every hour. “0 */2 * * * /home/lee/restart-backup-cron.txt” (The minute field has to be 0; with “*” there, cron would fire every minute of every second hour.)

update 9-29-05 12:26am, enabled --compress. 33.4 GB on the remote machine. . . . . 8:30pm 9-29-05, 37.9 GB… that’s 5.4 GB per day. A slight improvement over the previous 5GB/day. The gain is small because I keep most of my files already compressed as jpg, mp3, avi, zip, etc…
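
For anyone checking my math, the per-day figure comes from the two size readings and the time between them. A quick sketch, with gb_per_day as a made-up helper name and the 20.07 hours being my rounding of 12:26am to 8:30pm:

```shell
#!/bin/sh
# GB/day from two size readings (in GB) and the hours between them.
# Uses awk because shell arithmetic is integer-only.
gb_per_day() {
    awk -v a="$1" -v b="$2" -v h="$3" 'BEGIN { printf "%.1f\n", (b - a) / h * 24 }'
}

gb_per_day 33.4 37.9 20.07   # 12:26am to 8:30pm is about 20.07 hours
```

That prints 5.4, matching the number above.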

I think it might be that the first time a folder is checked, it bogs down the CPU, subsequent checks are 100 times faster and don’t bog down the machine.

update 10-3-05 12:00am 53.7 GB.. actually, I’ve been PCAnywhere-ing into the machine and getting that reading from Windows Explorer. When I SSH in, ‘df’ says 56.6 GB. Bigger numbers are better numbers so there. I set the cron to restart every hour instead of every 2.

update 10-7-05: To make cron run as a service and start automatically when I start my computer (so I can run the cron jobs that start the rsyncs that swallow the flies) I opened a Cygwin shell prompt and typed this

cygrunsrv -I cron -p /usr/sbin/cron -a -D

It makes it an automatically starting Service. :-( (Thanks)

10-8-05 update after 15 days and 77 GB, I have a backup in New Jersey. When doing updates, if there are no changes, Rsync verifies this fact in less than 4 minutes. That is astoundingly phenomenally wonderfully fast. All hail the Rsync. And good night!

10 Comments

  1. Tim says:

    >When I move files over my local network with rsync at about 1.5 MB/sec,
    >both computers are virtually unusable during the transfer

    Cool :)

    If it helps, look at the -whole-files option when transferring a directory tree within your lan. (It’s probably not spelled correctly) – sounds like your boxen are using the rsync algorithms which are more suited to internet transfers than lan transfers, for reasons that should be clear now… they spend more time crunching bytes than they do transmitting.

  2. Lee says:

    Hurray! By using “--whole-file”, I get about 2.3 MB/sec over the local network instead of 1.5 (and there is a 802.11G connection in the mix!)

    And the computers are usable during the transfer! Thank you! I was stymied by this partially because Windows Task Manager wasn’t indicating that the CPU was being stressed at all… Cygwin or rsync tricked the Task Manager.

    Thanks!

  3. Lee says:

    I don’t use this backup method any more. I just signed up for a really good off-site internet backup service for Windows and Mac called Mozy. I’ll give you $10 if you use my referral code when you sign up. Go here to find out more about Mozy and the discount.

  4. mangoo says:

    Did you use the whole Cygwin installation, or did you use “stripped down” versions of sshd and rsync (windows executables and needed binaries only)?

  5. lee says:

    To do the Cygwin backup I used the full installation. It might work with stripped down versions but I can’t tell.

  6. ugh. says:

    Mozy is terrible. It crashes on me a lot, and the computer is basically unusable when it is working. I have a windows task manager entry just to start and stop it so that it doesn’t get in the way, and I’m only backing up a small portion of the files I want to, because it’s so intrusive otherwise. I’m hoping to get this rsync solution working instead.

  7. lee says:

    Ugh, I agree. I’ve since switched to Crashplan.

  8. Red Five says:

    DeltaCopy is a Windows version of rsync that integrates with the Task Scheduler and gives you a GUI to configure everything on both ends. It does use the cygwin dll and a couple other related items, but wraps it in a Windows GUI. Of course, it’s harder to incorporate the SSH tunnel in DC, but it can be done using a Hamachi, sorry, LogMeIn Hamachi VPN.

  9. lee says:

    Red Five, thanks very much for pointing out DeltaCopy. That looks like a great replacement for the (slightly?) tortured machinations I describe in this post. I’ve switched to using Crashplan for my backups.

  10. Tim says:

    I used Deltacopy for backups from an old production server.

    It worked (sorta), and will suit some needs, but for industrial strength stuff I’d still highly recommend the extra effort of configuring cwrsync properly.
