Tutorial: Use rsync instead of mv or scp when it really matters

I’ve been running a lot of simulations on the ‘updraft’ parallel computing cluster at the University of Utah.  My input files often have to wait in the queue for quite a while (a few days sometimes) before they can be run.  The simulations generate large data sets which I then need for post-processing.  The directory where these files are created on the cluster is regularly wiped by the administrators to keep space free for other users, so you don’t want to leave important data sitting around on that file system.  I’d been moving the data back to my home directory on the cluster using ‘mv’, and eventually transferring it to my workstation using ‘scp’.  This was kind of a pain and took FOREVER!

I also discovered something that caused me to completely abandon ‘mv’ for any data that is even somewhat important.  I was using ‘mv’ to transfer the data to my home directory when I lost my internet connection.  Big deal, right?  I logged back in only to find out that the data files had been corrupted by the interrupted ‘mv’ command.  Now I had to run the simulation all over again to generate a new data file.  Bummer.  I did a little research about ‘mv’ and found that when it moves data between file systems it has to copy the files and then delete the originals, so if it is interrupted partway through it can leave you with incomplete data.  Not good.

Enter rsync.  rsync is a tool which makes a copy of files and directories.  If it gets interrupted, you can simply restart it and it will essentially continue where it left off, skipping the files that already arrived intact.  Why not just use cp or scp?  Two reasons.  First, if cp or scp is interrupted and then issued again, it simply starts over from the beginning.  This is really a problem when the transfer takes an hour and you need the data NOW.  Which brings me to the second reason:  speed.  If you call rsync with the -z flag, it compresses the data as it is transferred.  On remote file transfers of compressible data (plain-text simulation output, for example) this can result in a HUGE speedup.  Of course with rsync, once the files are transferred you need to manually delete the unwanted copy.
You can use ‘diff -r’ (or a dry run of rsync with the --checksum flag) to verify that the two copies are in fact identical before deleting the unwanted files.  Did I mention that rsync is also great for backups?
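
The copy, verify, delete routine described above can be sketched as a small shell function.  This is only a minimal sketch, not a definitive recipe: the function name `safe_move` is made up for illustration, it assumes `rsync` and `diff` are installed, and it works on local directories.  The remote host and paths in the comments are placeholders.

```shell
#!/bin/sh
# safe_move SRC DEST
# Hypothetical helper (not a standard command): copy the directory SRC
# into DEST with rsync, verify the copy, and only then delete SRC.
safe_move() {
    src=$1
    dest=$2

    # Refuse to continue unless the source is an existing directory.
    [ -d "$src" ] || return 1

    # -a         recurse and preserve permissions, times, and symlinks
    # --partial  keep partially transferred files instead of deleting
    #            them, so that rerunning the command can reuse them
    # The trailing slashes copy the contents of SRC into DEST.
    rsync -a --partial "$src"/ "$dest"/ || return 1

    # diff -r exits non-zero if any file differs or exists on one side
    # only, in which case the source is left untouched.
    diff -r "$src" "$dest" >/dev/null || return 1

    # The copy is verified, so the original can go.
    rm -rf -- "$src"
}

# For a remote transfer, -z adds compression. Host and paths below are
# placeholders, not real locations:
#   rsync -az --partial user@cluster.example.edu:/scratch/run42/ ./run42/
```

If the connection drops in the middle of the rsync step, nothing has been deleted yet, so running `safe_move` again with the same arguments picks up the remaining files.  Note that the `diff -r` check also fails if the destination already held extra files, which errs on the side of keeping the source.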
