Anyone have experience working with Amazon S3?

blundar · February 20, 2013, 3:47am

I’m seriously considering using Amazon’s S3 cloud for my personal offsite backup. Previously, I’ve been running two servers with RAID, with the backup server rsyncing the primary. Power is getting too expensive, and that approach still doesn’t net me an OFFSITE backup.

The data I’m primarily concerned with backing up is a subversion (svn) repo. Figure about 80% binary and 20% source. Current size is about 1.25 GB. Volatility is not that high (figure a couple megs/month, TOPS), but I believe it is stored as a Berkeley DB file structure.

The data currently resides on a FreeNAS / FreeBSD ZFS store exported via NFS to an Ubuntu 12.04 LTS box running Apache / WebDAV, which is the primary portal.

I’d like the data on the S3 cloud to be encrypted if possible. I figure S3 will get owned sooner or later and all my personal + business

The options I’ve considered so far:
#1: API call / s3tools cron job to sync files monthly

#2: s3backer to implement a virtual block device, run filesystem over the top, backup files to s3backer filesystem

#3: spin up a SMALL AWS EC2 server, run WebDAV on it and use S3 as a storage backbone

#4: switch to git, spin up a SMALL EC2 server, run gitosis and use S3 as a storage backbone
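For what it's worth, option #1 combined with the encryption requirement could look roughly like the sketch below. It only builds the command lines (it does not run them), and it assumes `svnadmin`, `gzip`, `gpg` and `s3cmd` are installed. The repo path, bucket name, revision numbers and passphrase file are all made up for illustration. Dumping only the new revisions with `svnadmin dump --incremental` is one way to avoid re-sending the whole repo each time; newer GnuPG versions may also need `--pinentry-mode loopback` for the passphrase file to be honored.

```python
import shlex


def backup_commands(repo, bucket, first_rev, last_rev, passfile):
    """Build (pipeline, upload) shell command strings for one encrypted
    incremental svn backup. Nothing is executed here.

    repo, bucket and passfile are hypothetical values supplied by the caller.
    """
    name = "svn-r{}-r{}.dump.gz.gpg".format(first_rev, last_rev)

    # Dump only revisions first_rev..last_rev as deltas against the previous state.
    dump = ["svnadmin", "dump", repo, "--incremental",
            "-r", "{}:{}".format(first_rev, last_rev)]
    # Compress before encrypting (encrypted data does not compress).
    compress = ["gzip", "-c"]
    # Symmetric encryption so nothing readable ever leaves the local box.
    encrypt = ["gpg", "--batch", "--symmetric", "--cipher-algo", "AES256",
               "--passphrase-file", passfile, "--output", name]
    # Upload the encrypted file with s3cmd (part of s3tools).
    upload = ["s3cmd", "put", name, "s3://{}/{}".format(bucket, name)]

    def render(cmd):
        return " ".join(shlex.quote(arg) for arg in cmd)

    pipeline = " | ".join(render(c) for c in (dump, compress, encrypt))
    return pipeline, render(upload)


if __name__ == "__main__":
    # Example values only; substitute your own paths and bucket.
    for line in backup_commands("/srv/svn/repo", "my-backups", 120, 125,
                                "/root/.backup-pass"):
        print(line)
```

A monthly cron job would record the last revision it uploaded and start the next dump from the following one. Restoring means decrypting each file and feeding the dumps to `svnadmin load` in order.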

My analysis so far:
#1 is simple, easy. However, S3 requires any changed files to be uploaded again in full. I’m less concerned with the cost of bandwidth than with the time involved in transferring over my slow(ish) connection - 2Mbit upstream.

#2 is fancy. Handles differential changes easily. Compression and encryption built in. BUT serious data integrity issues present themselves due to a combination of write caching and the nature of S3. Also much more overhead in terms of transfers and transactions.

#3 might cost marginally more because I’m spinning up a virtual server, but also might be able to leverage resources more effectively by dealing with subversion properly instead of as a blob of binary data.

#4 might make sense because of the ease with which a git clone can be performed. git can do what I need as a version control system, but I started using subversion when I started using these things about 8-9 years ago and I’ve stuck with it. git seems like it would work better than svn for maintaining a local repo for high speed access and periodically backing up just the changed portions elsewhere.
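To put a rough number on the transfer-time worry in #1, here is a back-of-the-envelope sketch. It assumes decimal units (1.25 GB = 1.25e9 bytes, 2 Mbit/s = 2e6 bits/s), a fully used upstream link and no protocol overhead, so real uploads will be somewhat slower.

```python
def upload_seconds(size_bytes, upstream_bits_per_sec):
    """Idealized upload time: payload bits divided by link speed."""
    return size_bytes * 8 / upstream_bits_per_sec


# Full repo, as in the initial upload or a re-upload of everything.
full_repo = upload_seconds(1.25e9, 2e6)   # 5000 seconds, about 83 minutes

# A couple of megabytes of monthly changes, if only the changes are sent.
monthly_delta = upload_seconds(2e6, 2e6)  # 8 seconds

if __name__ == "__main__":
    print("full repo: {:.0f} min".format(full_repo / 60))
    print("monthly delta: {:.0f} s".format(monthly_delta))
```

So the worst case (re-sending everything once a month) is on the order of an hour and a half, while sending only changed revisions is a matter of seconds.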

Any and all thoughts are welcome. Cost isn’t really a HUGE concern here as this is for my business and my storage and transfer needs are so small that this isn’t going to cost more than a few dollars a month, regardless of which option I choose. If you think of something else that would work better (dropbox?) I’m open to other ideas.

thanks,
-Dave B.

Chris · February 23, 2013, 7:31pm

When I priced it out, my conclusion was that S3 would be great for stuff that is somewhat small in size (like a few gigs) and of vital importance, as in stuff that’s otherwise irreplaceable.

I considered S3 as a backup store for my NAS, but since the per-GB price for transfer in is higher than the month-to-month storage fee, pushing my 6TB up to S3 would cost like double to triple the price of a new box. I couldn’t justify those startup costs for a data store that is mostly warez.