Using Amazon S3 To Backup My WordPress Blog

A couple of days ago I wrote a post about implementing a backup strategy for archiving the website files and database behind this blog. Although I was happy with the results and the backups work well, I was still a bit concerned: the files were still on the server, which meant I had to retrieve them manually to be sure of having a suitable backup should something happen to the server.

I decided to try out Amazon S3, which allows you to store files in “buckets” located in America and Ireland. Authentication is used to secure your data, and it offers standards-based REST and SOAP interfaces so that developers can hook on to the service. They operate a “Pay For What You Use” pricing model of $0.15 a month per GB hosted, which is about £0.09 a month. Currently all data transfer is free until 30th June 2010, and even after then it is only $0.10 per GB transferred, which is about £0.06. The prices change once more than 50TB of data is being hosted, but that is unlikely at present. If this trial is successful then I intend to roll it out across my other websites.
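As a rough illustration of what those prices mean for a small blog, here is a minimal sketch that estimates a monthly bill. It uses only the two rates quoted above ($0.15 per GB stored and $0.10 per GB transferred), ignores the free transfer period and the 50TB tier, and `estimate_monthly_cost` is a made-up helper name rather than anything Amazon provides.

```shell
#!/bin/bash

# Hypothetical helper: estimate the monthly cost in US dollars from the
# rates quoted in this post ($0.15 per GB stored, $0.10 per GB transferred).
# Usage: estimate_monthly_cost STORED_GB TRANSFERRED_GB
estimate_monthly_cost() {
    local stored_gb="$1" transferred_gb="$2"
    # LC_ALL=C keeps the decimal point consistent across locales.
    LC_ALL=C awk -v s="${stored_gb}" -v t="${transferred_gb}" \
        'BEGIN { printf "%.2f\n", s * 0.15 + t * 0.10 }'
}

# 2 GB stored and 5 GB transferred in a month.
estimate_monthly_cost 2 5   # prints 0.80
```

Even with generous numbers for a blog of this size, the bill stays well under a dollar a month at these rates.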

After I had set my account up I installed Cockpit, which is a GUI written in Java for viewing the contents of an Amazon S3 account. It is part of the JetS3t toolkit and is free, so it is ideal for what I need. I used it to create a bucket called blog.ianroke.co.uk which will contain all the archived files for my website.

Next I converted the command line commands I was using in the earlier post into scripts so that I could continue to tweak them while they were being scheduled. I created blog_database_backup.sh and blog_website_backup.sh, plus a third script called s3_backup_refresh.sh that refreshes the synchronisation between my server and the Amazon S3 server. I wanted to run the database refresh every 15 minutes and the website refresh every hour, so that at 10 minutes past every hour I could run the synchronisation and be sure that a full and complete backup of the database and website had been done.
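The schedule described above could be expressed as crontab entries along these lines. This is only a sketch: the script names come from this post, but the `SCRIPT_DIR` location and the `print_backup_crontab` helper are assumptions made for the example, so adjust them to wherever the scripts actually live.

```shell
#!/bin/bash

# Assumed location of the three backup scripts; change to suit your server.
SCRIPT_DIR="${SCRIPT_DIR:-$HOME/blog_backups}"

# Hypothetical helper that prints crontab entries matching the schedule
# described in the post.
print_backup_crontab() {
    cat <<EOF
# Database backup every 15 minutes.
*/15 * * * * ${SCRIPT_DIR}/blog_database_backup.sh
# Website backup on the hour, every hour.
0 * * * * ${SCRIPT_DIR}/blog_website_backup.sh
# Synchronise with Amazon S3 at 10 minutes past every hour.
10 * * * * ${SCRIPT_DIR}/s3_backup_refresh.sh
EOF
}

print_backup_crontab
```

The printed lines can be pasted into `crontab -e`. Running the sync at 10 minutes past leaves time for the hourly website backup and the latest database backup to finish first.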

I made a few tweaks to the scripts while I was doing this too. I created an archive for the latest refreshed file, compressed the output from the database refresh (which reduced the file size by 90%), and archived all the archives into one large archive for each day. This kept the number of files down both on my server and on the Amazon S3 server, because once the daily archive was created I deleted the files that had gone into it from my server, as they were duplicates.

Here is an example of the sort of script I used. Hopefully the comments are self-explanatory.

#!/bin/bash

# Set up some variables. The timestamp is taken once so that the date and
# time cannot disagree if the script happens to run across midnight.
NOW=$(date +"%Y%m%d%H%M%S")
DATE=${NOW:0:8}
TIME=${NOW:8:6}
DIR=/path/to/database_schema
FILE=database_schema_${DATE}

# Stop if the backup directory is missing rather than dumping elsewhere.
cd "${DIR}" || exit 1

# Create a dump of the blog database schema.
# Stop if the dump fails so that a broken file is never archived.
mysqldump -uUSER -pPASS DB > "${FILE}${TIME}.sql" || exit 1

# Create an archive of the database schema.
tar cf database_latest.tar "${FILE}${TIME}.sql"

# Remove the compressed archive of the latest database schema.
# This avoids any file overwrite errors. The -f flag stops rm from
# complaining on the first run, when there is nothing to remove yet.
rm -f database_latest.tar.gz

# Compress the latest database schema.
gzip database_latest.tar

# Checks if a compressed archive has been created for the current day.
if [ -f "${FILE}.tar.gz" ]; then
    # tar cannot append to a compressed archive, so uncompress it,
    # add the latest dump, then compress it again.
    gunzip "${FILE}.tar.gz"
    tar rf "${FILE}.tar" "${FILE}${TIME}.sql"
    gzip "${FILE}.tar"
else
    # Create a new archive and compress it.
    tar cf "${FILE}.tar" "${FILE}${TIME}.sql"
    gzip "${FILE}.tar"
fi

# Remove the created database schema.
rm "${FILE}${TIME}.sql"
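Since the daily archive is repeatedly uncompressed, appended to and recompressed, it can be worth checking that the result is still readable before it is synchronised. The following is a minimal sketch of such a check, not part of the script above; `verify_archive` is a hypothetical helper name, and it assumes the archive should contain at least one .sql file.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if the given .tar.gz file passes the
# gzip integrity test and lists at least one .sql file inside it.
verify_archive() {
    local archive="$1"
    # gzip -t tests the compressed data without extracting anything.
    gzip -t "${archive}" 2>/dev/null || return 1
    # tar tzf lists the contents; grep checks that a dump is present.
    tar tzf "${archive}" 2>/dev/null | grep -q '\.sql$'
}

# Example use (the file name follows the pattern used in the script above):
# verify_archive "database_schema_$(date +"%Y%m%d").tar.gz" || echo "Archive is damaged" >&2
```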

I then downloaded S3Sync, a script written in Ruby that lets me communicate with my Amazon S3 account from the shell command line on my server. I can then create scripts and schedule them to run as and when I require them.

# Sample syntax to synchronise data with Amazon S3
ruby s3sync.rb -r /SOMEFILES BUCKETNAME:PREFIX

I had a problem getting s3sync.rb to read the settings I had saved in s3config.yml. I had done everything correctly, but in the end I only got it to work by making the following change, which adds the current directory to the list of places searched.

#confpath = ["#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"]
confpath = ["./", "#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"]
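To make the effect of that change clearer, here is a rough shell sketch of the lookup order it produces. It assumes that s3sync checks each directory in confpath in turn for a file called s3config.yml, which is a reading of the line above rather than something confirmed from the S3Sync documentation. `find_s3config` is a hypothetical helper written for this example and is not part of S3Sync.

```shell
#!/bin/bash

# Hypothetical helper: mimic the patched confpath order and print the first
# s3config.yml found. Returns non-zero if none of the directories has one.
find_s3config() {
    local dir
    for dir in "./" "${S3CONF:-}" "${HOME}/.s3conf" "/etc/s3conf"; do
        # Skip empty entries, for example when S3CONF is not set.
        [ -n "${dir}" ] || continue
        if [ -f "${dir%/}/s3config.yml" ]; then
            printf '%s\n' "${dir%/}/s3config.yml"
            return 0
        fi
    done
    return 1
}

find_s3config || echo "No s3config.yml found in any of the searched directories" >&2
```

If the lookup does work this way, the current directory only helps when the script is run from the directory that holds s3config.yml, which is worth remembering when the script is started by a scheduler.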

I then needed to write a very simple script to synchronise the directories between my server and the Amazon S3 server. I scheduled it to run at 10 minutes past the hour, every hour, so the most data I could potentially lose is an hour's worth, which I can live with in a worst-case situation. The script I used to sync is below. Note that I used --delete to force S3Sync to compare the directories with the Amazon S3 version and delete files that are no longer present on my server. I did this on the assumption that if I have removed files from my server then I have backed them up elsewhere, so they are not required on the Amazon S3 server either.

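#!/bin/bash

# Run from the S3Sync directory so that s3config.yml is found in "./".
# Stop if the directory is missing rather than running from elsewhere.
cd ~/blog_backups/s3sync || exit 1

# Synchronise the database archives under the "d" prefix.
ruby s3sync.rb --delete -r /path/to/database_schema blog.ianroke.co.uk:d

# Synchronise the website archives under the "w" prefix.
ruby s3sync.rb --delete -r /path/to/website_files blog.ianroke.co.uk:w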
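One thing to be careful of with --delete is that a missing or empty local directory would look like "everything has been removed" and the copies on Amazon S3 would be deleted to match. The following sketch adds a guard against that. It is an optional extra rather than something from the script above, and `has_files` and `sync_if_populated` are hypothetical helper names.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if the directory exists and contains
# at least one regular file somewhere beneath it.
has_files() {
    local dir="$1"
    [ -d "${dir}" ] && [ -n "$(find "${dir}" -type f | head -n 1)" ]
}

# Hypothetical helper: only run the sync with --delete when there is
# something local to compare against.
sync_if_populated() {
    local source_dir="$1" target="$2"
    if has_files "${source_dir}"; then
        ruby s3sync.rb --delete -r "${source_dir}" "${target}"
    else
        echo "Skipping ${source_dir}: missing or empty, not syncing with --delete" >&2
        return 1
    fi
}

# Example use, with the same paths and bucket as the script above:
# sync_if_populated /path/to/database_schema blog.ianroke.co.uk:d
# sync_if_populated /path/to/website_files blog.ianroke.co.uk:w
```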

I hope this helps anybody who wants to do this sort of thing on their server. If you do use my examples then please let me know either by email or in the comments below.
