Amazon EC2: First Impressions Mounting S3
We’ve got a small internal project at Cantina that aims to make use of the Groovy on Grails framework and some of our Grails plugins, and we plan on using S3 as the persistent storage for the project. We decided to test out a small EC2 instance at Amazon to use as an integration point for the project for a couple of reasons:
- EC2 instances are very quick to set up
- They are not charged for data transfer to S3
- They are presumably the closest you can be to your S3 storage in terms of network latency
Setting up our initial EC2 instance was incredibly easy. There’s a wealth of pre-built Amazon Machine Images, or AMIs, out there with various configurations for different application servers, including Apache, MySQL, JBoss, Tomcat, and, our new favourite, Red5. I was able to get one up and running fairly quickly with the packages I needed (I love yum).
Actually, I was floored by how fast I had a brand new server up and running, considering that requests I’ve submitted to fairly large enterprise-grade hosting companies for such intensely complicated things as, say, opening a new port in the firewall have taken over a week. I guess the landscape is changing, but I digress…
In order to use Amazon S3 as the backing store for our new EC2 instance, we seemed to have a few options:
- Code our application to manage the transfer and synchronization of files to S3, perhaps via our Amazon S3 Grails Plugin
- Utilize an S3-aware file synchronization tool such as the jets3t Synchronize tool, or the Ruby s3sync tool
- Mount S3 as a filesystem in the EC2 instance
Since we’ve already been doing the first option, and the second isn’t exactly real-time, we decided to give the third a go. To do so, we enlisted the help of a couple of tools:
- JungleDisk: A multi-platform tool that provides a local WebDAV interface to S3, suitable for mounting as a filesystem from the Mac OS X Finder, or from Linux using…
- davfs2: Linux filesystem driver that will mount a WebDAV URL to a mount point in the local filesystem
JungleDisk is a great tool in general for interacting with an S3 account, and I’ve been using it for a little while now for my own backup purposes on my Mac development laptop. The Linux version provides a standalone command line program (in addition to the GUI that comes on all platforms) which can be run as a daemon and scripted to start up on boot.
The setup was surprisingly simple. To get JungleDisk running from the command line client, you need to provide a configuration file, commonly called jungledisk-settings.ini. The documentation says that you should run the GUI first to generate the file before running the command line version, but I was able to copy over the file from my Mac laptop and update the values for the EC2 instance. Here’s an example of the configuration file:
LoginUsername=
LoginPassword=PROTECTED:
AccessKeyID=XXXXXXXXXXXXXXXXXXXXX
SecretKey=PROTECTED:XXXXXXXXXXXXXXXXXXXXX
Bucket=default
CacheDirectory=/var/cache/jungledisk
ListenPort=2667
CacheCheckInterval=120
AsyncOperations=1
Encrypt=0
ProxyServer=
EncryptionKey=PROTECTED:
DecryptionKeys=PROTECTED:
MaxCacheSize=1000
MapDrive=
UseSSL=0
RetryCount=3
FastCopy=1
WebAccess=0
LogDuration=30
ArchiveFlag=5
ArchiveDuration=60
PasswordPrompt=0
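Since I was editing a file copied from another machine by hand, a quick sanity check before starting the daemon seemed worthwhile. The Python sketch below is only an illustration and not part of JungleDisk: it reads the KEY=VALUE lines shown above and flags a few obvious problems. The function names parse_settings and check_settings are made up for this example, and the list of keys it treats as required is my own assumption, not something taken from the JungleDisk documentation.

```python
def parse_settings(text):
    """Parse KEY=VALUE lines (no section headers) into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_settings(settings):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Assumption: these are the keys a working setup cannot leave empty.
    for key in ("AccessKeyID", "SecretKey", "Bucket", "ListenPort"):
        if not settings.get(key):
            problems.append("missing value for " + key)
    port = settings.get("ListenPort", "")
    if port:
        try:
            valid = 0 < int(port) < 65536
        except ValueError:
            valid = False
        if not valid:
            problems.append("ListenPort is not a valid TCP port: " + port)
    return problems
```

To use it, read the settings file into a string, pass it to parse_settings, and print whatever check_settings returns. Values such as PROTECTED:… are kept as opaque strings; the sketch makes no attempt to interpret them.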
I set up JungleDisk to start on boot using a really handy /etc/init.d script posted on the JungleDisk forums.
Once JungleDisk was configured and running, I had a local WebDAV server running at http://localhost:2667. This could be used in KDE to browse the files, but since I’d like my web application to be able to access it via the filesystem, the next step was to get davfs2 running.
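Before moving on to davfs2, it can be handy to confirm that something is actually answering on that port. The Python sketch below is an illustration rather than part of our setup: it sends an HTTP OPTIONS request and looks for the DAV response header that WebDAV servers normally advertise. The function names probe_webdav and looks_like_webdav are made up for this example, and I am assuming JungleDisk’s local server answers OPTIONS the way a typical WebDAV server would.

```python
import http.client


def probe_webdav(host="localhost", port=2667, timeout=5.0):
    """Send OPTIONS / and return (HTTP status, value of the DAV header or None)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("OPTIONS", "/")
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status, response.getheader("DAV")
    finally:
        conn.close()


def looks_like_webdav(status, dav_header):
    """Heuristic: a non-error status plus a DAV header suggests a WebDAV server."""
    return status < 400 and dav_header is not None
```

Calling probe_webdav() with its defaults targets the localhost:2667 address from the configuration file, and passing the result to looks_like_webdav gives a simple yes or no. A connection error here would suggest the daemon is not running or is listening on a different port.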
The EC2 instance did not have davfs2 installed by default (at least not the AMI I chose, which was based on the default AMI), so I simply downloaded the source distribution and compiled it locally on the EC2 instance. The base AMI I was using did not have gcc or the neon development libraries (neon is a WebDAV library that davfs2 uses for communicating with WebDAV servers). Luckily, yum was installed on the EC2 instance, so getting these dependencies was pretty straightforward.
davfs2 can be configured to run by non-root users, or by root. The configuration of davfs2 depends on your choice here, as some configuration options are intended only for a system-wide davfs2 configuration started from root. I opted for the root approach and set up my /usr/local/etc/davfs2.conf file with the following configuration options:
dav_user mydavuser
dav_group mydavgroup
kernel_fs fuse
ask_auth 0
Since JungleDisk provides WebDAV access to localhost only, and does not require authentication, setting ask_auth to 0 is useful to prevent davfs2 from prompting for a password when mounting the WebDAV URL. Last but not least, I added an entry to /etc/fstab to configure the mount:
http://localhost:2667 /mountpoint davfs nolocks,noaskauth,rw
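Once the entry is in place and the filesystem is mounted, a quick smoke test helps confirm that the mount point is really backed by the mount and accepts writes. The Python sketch below is an illustration under a few assumptions: it looks for the mount point in /proc/mounts (so it is Linux-only), the function names is_mounted and roundtrip are made up for this example, and /mountpoint is just the placeholder path from the fstab line above.

```python
import os
import uuid


def is_mounted(mount_point, mounts_file="/proc/mounts"):
    """Return True if mount_point appears as a mount target in mounts_file."""
    target = os.path.normpath(mount_point)
    with open(mounts_file) as handle:
        for line in handle:
            fields = line.split()
            # Each line is: device, mount point, filesystem type, options, ...
            if len(fields) >= 2 and os.path.normpath(fields[1]) == target:
                return True
    return False


def roundtrip(directory):
    """Write a small file under directory, read it back, then delete it."""
    path = os.path.join(directory, "mount-check-" + uuid.uuid4().hex + ".txt")
    payload = "hello from EC2\n"
    with open(path, "w") as handle:
        handle.write(payload)
    try:
        with open(path) as handle:
            return handle.read() == payload
    finally:
        os.remove(path)
```

Running is_mounted("/mountpoint") followed by roundtrip("/mountpoint") covers the basics. Keep in mind that the JungleDisk settings above enable a local cache and AsyncOperations, so a successful read-back presumably shows only that the mount accepts writes, not that the data has already reached S3.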
Voila! Now my S3 bucket is mounted on the local Linux filesystem of my EC2 instance. I have not done any performance testing or cache tuning, but this article over at RightScale looks promising. Once we have our application up and running, I’ll post back with some more information on how this configuration is working.
