Using Amazon S3 via s3sync

I have a VPS at Slicehost that I use to play around with Linux. Using their remote service is much easier and more convenient than dealing with the security, noise, and heat of running my own machine at home. However, their VPS doesn’t come with much disk space: the base 256MB plan allows only 10GB. So I needed a way to offload my files so that I can shut down my server when I don’t need it. Slicehost pro-rates their charges, so keeping my VPS active only when I’m using it saves me money. I did a lot of research before settling on s3sync to save my files. I tried Jets3t, but it requires Java, which was slow, and it is GUI-based, which meant I had to run VNC to access it. s3sync runs on Ruby and works from the command line, which makes it perfect for running over SSH.

s3sync consists of two utilities, s3sync and s3cmd. s3sync keeps folders synchronized between Amazon S3 and your hard drive. It is perfect for syncing your personal directories with Amazon for backups, or for keeping your development and production sites in sync. s3cmd is billed as a counterpart to s3sync for managing your Amazon S3 account; however, it is also great as a standalone tool for uploading, downloading, deleting, and managing single files. I mostly use s3cmd on my VPS, and that is what I’ll focus on here.

Assuming you are starting from a fresh installation, first install Ruby. From Slicehost’s Ubuntu Ruby-on-Rails guide:

apt-get install ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8 libreadline-ruby1.8 libruby1.8 libopenssl-ruby

Then, create the symlinks:

sudo ln -s /usr/bin/ruby1.8 /usr/bin/ruby
sudo ln -s /usr/bin/ri1.8 /usr/bin/ri
sudo ln -s /usr/bin/rdoc1.8 /usr/bin/rdoc
sudo ln -s /usr/bin/irb1.8 /usr/bin/irb

Now, download, unzip, and untar s3sync:

wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
gunzip s3sync.tar.gz
tar -xvvf s3sync.tar

You should now have a directory called s3sync containing the s3sync files. Create a directory called .s3conf and copy the s3config.yml.example file to it as s3config.yml. This is where s3sync will look for its configuration options. The README file lists additional locations where you can put the configuration options.

mkdir .s3conf
cp ./s3sync/s3config.yml.example ./.s3conf/s3config.yml

Finally, open .s3conf/s3config.yml and edit the aws_access_key_id and aws_secret_access_key. Set them to your S3 access key and S3 secret access key respectively.
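
If you would rather create the file from a script than edit it by hand, something like the following sketch could work. It assumes the two keys named above are the only settings you need. write_s3config is a hypothetical helper name, not part of s3sync. If the example file contains other settings you want to keep, edit the copy by hand instead.

```shell
#!/bin/sh
# Sketch only: write_s3config is a hypothetical helper, not part of s3sync.
# It writes a minimal s3config.yml containing just the two keys named in
# this post, replacing any existing file at that location.
write_s3config() {
    conf_dir=$1      # directory for the config, e.g. "$HOME/.s3conf"
    access_key=$2    # your S3 access key
    secret_key=$3    # your S3 secret access key

    mkdir -p "$conf_dir" || return 1

    cat > "$conf_dir/s3config.yml" <<EOF
aws_access_key_id: $access_key
aws_secret_access_key: $secret_key
EOF

    # Restrict the file to its owner, since it holds the secret key.
    chmod 600 "$conf_dir/s3config.yml"
}
```

For example, write_s3config "$HOME/.s3conf" YOUR_ACCESS_KEY YOUR_SECRET_KEY would write the file into your home directory, with the two placeholders replaced by your own keys.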

You are now set to use s3cmd and s3sync. Run either utility without any arguments to get a listing of its options and commands. I have found the -v (verbose) and --progress (progress bar) options to be great for keeping me updated on what is going on during uploads and downloads. Some examples:

s3cmd.rb listbuckets --> Lists all your buckets
s3cmd.rb createbucket/deletebucket --> Creates and deletes buckets
s3cmd.rb list <bucket> --> Lists all keys in the bucket
s3cmd.rb get/put <bucket>:key filename --> Copies filename from/to S3
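
To make the put syntax concrete, here is a small sketch of a wrapper that uploads a file under its own base name. put_file is a hypothetical helper, and the path to s3cmd.rb is passed in as an argument because it depends on where you unpacked s3sync.

```shell
#!/bin/sh
# Sketch only: put_file is a hypothetical helper, not part of s3sync.
# It uploads one file with "s3cmd.rb put <bucket>:key filename",
# using the file's base name as the key.
put_file() {
    s3cmd=$1     # path to s3cmd.rb, e.g. ./s3sync/s3cmd.rb
    bucket=$2    # name of an existing bucket
    file=$3      # local file to upload

    # Refuse to call s3cmd at all if the local file is missing.
    if [ ! -f "$file" ]; then
        echo "put_file: no such file: $file" >&2
        return 1
    fi

    key=$(basename "$file")
    "$s3cmd" put "$bucket:$key" "$file"
}
```

For example, put_file ./s3sync/s3cmd.rb mybucket /var/backups/site.tar.gz would store the file under the key site.tar.gz. The bucket name and path here are placeholders.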

s3sync can be used to synchronize local folders with buckets:

./s3sync.rb <bucket>:token local_dir --> Syncs your bucket with the local dir. The --make-dirs option will create local directories as needed for first-time downloads.

You will see that s3sync and s3cmd are very capable tools for interacting with S3. Besides interactive use on the command line, they are scriptable, so you can set up regular syncs, uploads, and downloads via cron. They have given me access to virtually unlimited cheap disk space and made managing my files much simpler.
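
As a rough sketch of the cron idea, a wrapper along these lines could run a sync and keep a log of what happened. run_sync and all of the paths are hypothetical names chosen for illustration. The only s3sync options used are -v and --make-dirs, with the same argument order as the s3sync example above.

```shell
#!/bin/sh
# Sketch only: run_sync is a hypothetical wrapper, not part of s3sync.
# It runs "s3sync.rb -v --make-dirs <bucket>:token local_dir" and appends
# the output and exit status to a log file, so it can be called from cron.
run_sync() {
    s3sync=$1    # path to s3sync.rb
    src=$2       # bucket and token, e.g. mybucket:backups
    dest=$3      # local directory to sync into
    log=$4       # log file to append to

    echo "sync started: $(date)" >> "$log"
    "$s3sync" -v --make-dirs "$src" "$dest" >> "$log" 2>&1
    status=$?
    echo "sync finished with status $status: $(date)" >> "$log"

    # Pass the exit status of s3sync back to the caller.
    return "$status"
}
```

A script that defines this function and ends with a call such as run_sync "$HOME/s3sync/s3sync.rb" mybucket:backups "$HOME/backups" "$HOME/s3sync.log" could then be scheduled with a crontab entry like 0 3 * * * /home/me/bin/nightly-sync.sh, where the script path and schedule are again placeholders.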

Written by M Kapoor

October 27, 2008 at 4:17 am
