Archive for October 2008
I have a VPS at Slicehost that I use to play around with Linux; it is much easier and more convenient to use their remote service than to deal with the security, noise, and heat of running my own machine at home. However, their VPS doesn’t come with much disk space: the base 256MB plan includes only 10GB. So I needed a way to offload my files so that I can shut down my server when I don’t need it. Slicehost prorates its charges, so I keep my VPS active only while I’m using it, which saves me money. I did a lot of research before settling on s3sync to store my files. I tried JetS3t, but it requires Java, which was slow, and it is GUI-based, which meant I had to run VNC to access it. s3sync is written in Ruby and runs from the command line, which makes it perfect for use over SSH.
s3sync consists of two utilities, s3sync and s3cmd. s3sync keeps folders synchronized between Amazon S3 and your hard drive, which makes it perfect for backing up your personal directories to Amazon or for keeping your development and production sites in sync. s3cmd is billed as a companion to s3sync for managing your Amazon S3 account; however, it is also a great standalone tool for uploading, downloading, deleting, and managing individual files. I mostly use s3cmd on my VPS, and that is what I’ll focus on here.
Assuming you are starting from a fresh installation, first install Ruby. From Slicehost’s Ubuntu Ruby-on-Rails guide:
sudo apt-get install ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8 libreadline-ruby1.8 libruby1.8 libopenssl-ruby
Then create the symlinks:
sudo ln -s /usr/bin/ruby1.8 /usr/bin/ruby
sudo ln -s /usr/bin/ri1.8 /usr/bin/ri
sudo ln -s /usr/bin/rdoc1.8 /usr/bin/rdoc
sudo ln -s /usr/bin/irb1.8 /usr/bin/irb
Next, download the s3sync archive, unzip it, and untar it:
tar -xvvf s3sync.tar
You should now have a directory called s3sync containing the s3sync files. From your home directory, create a directory called .s3conf and copy the s3config.yml.example file into it as s3config.yml. This is where s3sync looks for its configuration options; the README lists other locations where you can put the configuration file.
mkdir ./.s3conf
cp ./s3sync/s3config.yml.example ./.s3conf/s3config.yml
Finally, open .s3conf/s3config.yml and set aws_access_key_id and aws_secret_access_key to your S3 access key ID and secret access key, respectively.
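The manual steps above (creating .s3conf and filling in the two keys) can also be scripted, which is handy when rebuilding a fresh slice. The following is a minimal Python sketch, not part of s3sync: it assumes the file needs only the two keys named above, and `write_s3config` and the placeholder credentials are hypothetical. The example file shipped with s3sync may contain other settings, so compare against it before replacing it.

```python
import os
from pathlib import Path


def write_s3config(home: Path, access_key_id: str, secret_access_key: str) -> Path:
    """Create <home>/.s3conf/s3config.yml holding the two AWS credential keys.

    Hypothetical helper: it writes only aws_access_key_id and
    aws_secret_access_key, the two settings this post says to edit.
    """
    conf_dir = home / ".s3conf"
    # Create the directory readable only by you (no-op if it already exists).
    conf_dir.mkdir(mode=0o700, exist_ok=True)
    conf_file = conf_dir / "s3config.yml"
    conf_file.write_text(
        f"aws_access_key_id: {access_key_id}\n"
        f"aws_secret_access_key: {secret_access_key}\n"
    )
    # The secret key should be readable only by its owner.
    os.chmod(conf_file, 0o600)
    return conf_file
```

For example, `write_s3config(Path.home(), "YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY")` would create ~/.s3conf/s3config.yml with permissions restricted to your user.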
You are now set to use s3cmd and s3sync. Run either utility without arguments to get a list of its options and commands. I have found the -v (verbose) and --progress (progress bar) options great for keeping me updated on what is going on during uploads and downloads. Some examples:
s3cmd.rb listbuckets --> Lists all your buckets
s3cmd.rb createbucket/deletebucket <bucket> --> Creates or deletes a bucket
s3cmd.rb list <bucket> --> Lists all keys in the bucket
s3cmd.rb get/put <bucket>:key filename --> Downloads the key to filename (get) or uploads filename to the key (put)
s3sync can be used to synchronize local folders with buckets:
./s3sync.rb <bucket>:prefix local_dir --> Syncs the bucket’s contents under prefix down to the local directory (swap the two arguments to sync the other way). The --make-dirs option creates local directories as needed on a first-time download.
As you can see, s3sync and s3cmd are very capable tools for working with S3. Besides being usable interactively, they are scriptable, so you can schedule regular syncs, uploads, and downloads with cron. They have given me access to virtually unlimited, cheap disk space and have simplified managing my files.
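Because both tools are ordinary command-line programs, a script run by cron only has to assemble the right arguments. The Python sketch below is one way to do that, under stated assumptions: s3cmd.rb and s3sync.rb sit in a hypothetical ~/s3sync directory and are executable, s3cmd.rb takes options before the command name, and only the commands and options mentioned in this post (put, -v, --make-dirs) are used. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical location: wherever you untarred s3sync (assumed to be ~/s3sync).
S3SYNC_DIR = Path.home() / "s3sync"


def s3cmd_put(bucket: str, key: str, filename: str, verbose: bool = False) -> list[str]:
    """Arguments for `s3cmd.rb put <bucket>:key filename` (upload one file)."""
    args = [str(S3SYNC_DIR / "s3cmd.rb")]
    if verbose:
        args.append("-v")  # assumed: options come before the command
    return args + ["put", f"{bucket}:{key}", filename]


def s3sync_down(bucket: str, prefix: str, local_dir: str, verbose: bool = False) -> list[str]:
    """Arguments for `s3sync.rb --make-dirs <bucket>:prefix local_dir` (S3 to local)."""
    args = [str(S3SYNC_DIR / "s3sync.rb"), "--make-dirs"]
    if verbose:
        args.append("-v")
    return args + [f"{bucket}:{prefix}", local_dir]


def run(args: list[str], dry_run: bool = True) -> None:
    """Echo the command line and, unless dry_run is True, execute it."""
    print(" ".join(args))
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed transfer also makes this script fail.
        subprocess.run(args, check=True)
```

A nightly cron job could then call, for example, `run(s3cmd_put("mybucket", "backups/site.tar", "/tmp/site.tar"), dry_run=False)` (bucket and paths hypothetical). --progress is left out on purpose: under cron nobody is watching the progress bar.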
This is the first of several articles I’m going to write about Amazon S3. I’ve done a lot of research, and cheap, secure, reliable, publicly hostable disk space is hard to come by. Sure, companies such as DreamHost and Media Temple offer enormous amounts of disk space at a low price, but they often restrict the file sizes, locations, and types of files you can store. Other companies even give storage away for free, but I wonder how they plan to survive that way. Ads may cover some of their costs initially, but I doubt they will pay enough as those companies scale up. I have a feeling that many of them will simply disappear, along with the data entrusted to them. I invite you to come back to this post in a year and see how many of these hosts are still around.
My search for a cheap, reliable, and secure place to back up my data ended when I found out about Amazon S3 (Simple Storage Service). S3 is touted as a data storage service, and that is all it is. There are no limits on the types or quantity of files you can store. Amazon charges on a pay-as-you-go basis, with no upload or download limits; you simply pay for what you use. Amazon’s own website is a testament to its experience in building massively scalable services. These are the rates at the time of writing:
$0.15 per GB-month of storage used
$0.10 per GB of data transferred in
$0.17 per GB of data transferred out (first 10 TB/month)
$0.01 per 1,000 PUT, POST, or LIST requests
$0.01 per 10,000 GET and all other requests
Considering what other hosts charge, Amazon’s rates are very reasonable. Its storage and transfer rates have been coming down, and I think this trend will continue.
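To see what the rates above mean for a small setup, here is a minimal Python estimate. It is a sketch under stated assumptions: it uses only the rates listed above, models only the first 10 TB transfer-out tier, and ignores any other charges; `monthly_cost` is a hypothetical helper.

```python
# S3 rates quoted in this post, in US dollars.
STORAGE_PER_GB_MONTH = 0.15
TRANSFER_IN_PER_GB = 0.10
TRANSFER_OUT_PER_GB = 0.17      # first 10 TB out per month only
PER_1000_PUT_POST_LIST = 0.01
PER_10000_GET_OTHER = 0.01


def monthly_cost(stored_gb: float, in_gb: float, out_gb: float,
                 put_requests: int = 0, get_requests: int = 0) -> float:
    """Estimated monthly S3 bill in dollars.

    put_requests counts PUT, POST, and LIST requests; get_requests counts
    GET and all other requests. Only the first transfer-out tier is modeled,
    so out_gb should stay within the month's first 10 TB.
    """
    return (stored_gb * STORAGE_PER_GB_MONTH
            + in_gb * TRANSFER_IN_PER_GB
            + out_gb * TRANSFER_OUT_PER_GB
            + put_requests / 1_000 * PER_1000_PUT_POST_LIST
            + get_requests / 10_000 * PER_10000_GET_OTHER)
```

For a hypothetical month that stores 10GB (the size of the base Slicehost disk), uploads those 10GB, downloads 1GB, and makes 1,000 PUT and 10,000 GET requests, `monthly_cost(10, 10, 1, 1_000, 10_000)` works out to $1.50 + $1.00 + $0.17 + $0.01 + $0.01 = $2.69. Later months without the initial upload would drop the $1.00 transfer-in charge.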
Amazon has built a very simple API with a high level of abstraction for S3. Data is stored in ‘buckets’, and each bucket can contain an unlimited number of objects, each up to 5GB in size. Objects can be made public or private, which makes S3 suitable both for hosting large files and for backing up your data. In addition, you can choose to store each bucket in either Europe or the United States, which lets you keep copies on two continents for geo-redundancy and deliver files to users around the globe with lower latency.
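The bucket/object model described above is simple enough to sketch in a few lines. The Python below is a hypothetical in-memory toy, not Amazon’s API: `Bucket`, `S3Object`, and their methods are made-up names, and reading "5GB" as 5 × 1024³ bytes is an assumption. It shows three points from the paragraph: objects live under keys in a bucket (a "folder" is just a shared key prefix), each object has a size cap, and private objects are not readable anonymously.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: the 5GB per-object limit read as 5 * 1024**3 bytes.
MAX_OBJECT_BYTES = 5 * 1024 ** 3


@dataclass
class S3Object:
    data: bytes
    public: bool = False  # objects are private unless made public


@dataclass
class Bucket:
    name: str
    objects: dict[str, S3Object] = field(default_factory=dict)

    def put(self, key: str, data: bytes, public: bool = False) -> None:
        """Store data under key, enforcing the per-object size cap."""
        if len(data) > MAX_OBJECT_BYTES:
            raise ValueError(f"{key}: object larger than 5GB")
        self.objects[key] = S3Object(data, public)

    def list_keys(self, prefix: str = "") -> list[str]:
        """Keys are flat strings; a 'folder' is just a common prefix."""
        return sorted(k for k in self.objects if k.startswith(prefix))

    def get(self, key: str, anonymous: bool = False) -> bytes:
        """Return the object's data; anonymous readers see only public objects."""
        obj = self.objects[key]
        if anonymous and not obj.public:
            raise PermissionError(f"{key} is private")
        return obj.data
```

After putting "photos/2008/a.jpg" and "docs/taxes.pdf" into a bucket, `list_keys("photos/")` returns `["photos/2008/a.jpg"]`, even though no "photos" folder was ever created.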
As a result of this simple API, Amazon S3 is very versatile and has a diverse ecosystem built up around it. A simple search for Amazon S3 tools turns up several tools that abstract away the S3 API and simplify the user experience. Of these, s3sync and JetS3t are my favorites. I use JetS3t to back up important files on my computer to S3, and s3cmd from the s3sync package to upload files I want to share.
In future posts, I’ll write more about these tools and how I use them to simplify my life.