Blog post:
PaaS bakeoff: Comparing Stackato, OpenShift, Dotcloud and Heroku for Django hosting and deployment
If you’ve been following this blog, you’ll know that I’m a big fan of PaaS providers – heck, I even built one, which gave me even greater respect for all the work that goes into making a platform that is flexible, scalable, reliable and easy to use.
During the last few weeks I’ve been kicking the tires on these PaaS solutions, both publicly hosted ones like Heroku and Dotcloud as well as open source ones like OpenShift and CloudFoundry.
Last night I gave a talk, “Django deployment revisited”, at the Django Boston meetup group and discussed four different PaaS providers: Stackato, Dotcloud, OpenShift and Heroku. As an example, I showed how to deploy Mezzanine, a Django-based blogging and CMS platform, to each provider.
Here are the slides from the presentation (sorry, no audio):
Show me the code!
All the code used in the examples is available in this paasbakeoff Github repo – with a different branch for each PaaS provider.
One criterion for a PaaS is how many files I need to add or modify in order to get my Django project deployed. What became apparent as I was giving the talk is that all of the providers function quite similarly with regard to how you get your Django project working with them. It really boils down to these things:
DATABASES
All of the providers will provision a PostgreSQL or MySQL database for you (Heroku offers only PostgreSQL) without you needing to do anything except issue one command.
The actual database creation happens automatically, except on Dotcloud, where you specify the name of the database in your settings.py and have complete control over how it’s created in a createdb.py script. You can either see this as an advantage (complete flexibility) or a disadvantage (one more thing to have to manage). It’s the classic tradeoff of control vs. ease of use, a recurring theme when adopting a PaaS solution.
The way you tell Django to use this provisioned database is to modify your settings.py file (or use a separate production_settings.py) to override the DATABASES setting. All of the providers expose environment variables that contain the connection details:
Stackato
DATABASE_URL
You can also use VCAP_SERVICES to retain CloudFoundry Core compatibility.
OpenShift
OPENSHIFT_MYSQL_DB_URL
OPENSHIFT_POSTGRESQL_DB_URL
DotCloud
DOTCLOUD_DB_SQL_LOGIN
DOTCLOUD_DB_SQL_PASSWORD
DOTCLOUD_DB_SQL_HOST
DOTCLOUD_DB_SQL_PORT
Heroku
DATABASE_URL
Also see the convenient dj-database-url package by Kenneth Reitz for handling the parsing of the DATABASE_URL string with one line of code.
Heroku lets you attach multiple PostgreSQL databases (master/slave, or staging/production), and each database gets its own color-coded database URL (e.g. HEROKU_POSTGRESQL_GREEN, HEROKU_POSTGRESQL_RED). Most Django projects are only going to use one database, so Heroku provides a pg:promote command that lets you promote a database to be the canonical DATABASE_URL.
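To make the DATABASES override concrete, here is a minimal sketch of turning a DATABASE_URL-style string into a Django DATABASES entry using only the standard library. It is roughly the kind of parsing dj-database-url handles for you, not that package's actual code; the helper name `database_from_url` and the engine mapping are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumed mapping from URL scheme to Django backend. Older Django
# releases name the PostgreSQL backend "postgresql_psycopg2" instead.
ENGINES = {
    "postgres": "django.db.backends.postgresql",
    "postgresql": "django.db.backends.postgresql",
    "mysql": "django.db.backends.mysql",
}


def database_from_url(url):
    """Convert scheme://user:password@host:port/name into a DATABASES entry."""
    parts = urlparse(url)
    return {
        "ENGINE": ENGINES[parts.scheme],
        "NAME": parts.path.lstrip("/"),
        "USER": parts.username or "",
        "PASSWORD": parts.password or "",
        "HOST": parts.hostname or "",
        "PORT": str(parts.port) if parts.port else "",
    }
```

In settings.py this could be used as `DATABASES = {'default': database_from_url(os.environ['DATABASE_URL'])}`. On Dotcloud, where the pieces arrive as separate DOTCLOUD_DB_SQL_* variables, the same dictionary can be filled in directly from os.environ without any parsing.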
STATIC_ROOT
While it’s possible to have Django serve up static assets (images, CSS, JavaScript), it’s advised that all static assets be served by an HTTP server like Apache or Nginx for performance reasons. All of the PaaS providers have a built-in way to do this except for Heroku, which requires that you serve them from Amazon S3.
Stackato
Stackato strangely uses uWSGI to serve the static assets. In the stackato.yml file:
processes:
  web: $STACKATO_UWSGI --static-map /static=$HOME/mywebsite/static
OpenShift
In the settings.py file:
STATIC_ROOT = os.path.join(os.environ.get('OPENSHIFT_REPO_DIR'), 'wsgi', 'static')
In /wsgi/static/.htaccess:
RewriteEngine On
RewriteRule ^application/static/(.+)$ /static/$1 [L]
Dotcloud
In settings.py:
STATIC_ROOT = '/home/dotcloud/volatile/static/'
In nginx.conf:
location /static/ { root /home/dotcloud/volatile ; }
Heroku
In settings.py:
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
See complete example of S3FileStorage
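The one-line STATICFILES_STORAGE setting also needs credentials and a bucket. Here is a minimal sketch, assuming the django-storages package and its AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_STORAGE_BUCKET_NAME settings. The helper name `s3_static_settings` and the choice to read the values from identically named environment variables are assumptions, not something the linked example prescribes.

```python
def s3_static_settings(env):
    """Build the static-file settings for S3 from a mapping of env vars."""
    bucket = env["AWS_STORAGE_BUCKET_NAME"]
    return {
        "STATICFILES_STORAGE": "storages.backends.s3boto.S3BotoStorage",
        "AWS_ACCESS_KEY_ID": env["AWS_ACCESS_KEY_ID"],
        "AWS_SECRET_ACCESS_KEY": env["AWS_SECRET_ACCESS_KEY"],
        "AWS_STORAGE_BUCKET_NAME": bucket,
        # Static URLs in templates then point at the bucket.
        "STATIC_URL": "https://%s.s3.amazonaws.com/" % bucket,
    }
```

In settings.py, `globals().update(s3_static_settings(os.environ))` would apply these values. Keeping the secrets in config variables rather than in the file keeps them out of the repo.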
MEDIA_ROOT
Similar to STATIC_ROOT, the files in MEDIA_ROOT not only need to be served up by Apache, Nginx or uWSGI, but also need to be persisted across deploys. By default, files that are uploaded through your Django application are stored in the application container, which is thrown away on every deploy. So we need to tell Django to store these files in a data directory that won’t be discarded.
Stackato
You first need to create a ‘filesystem’ service by adding it to your stackato.yml file:
services:
  postgresql-mywebsite: postgresql
  filesystem-mywebsite: filesystem
Then in your settings.py:
MEDIA_ROOT = os.environ['STACKATO_FILESYSTEM']
OpenShift
OpenShift provides a persisted data dir that can be referenced with the environment variable OPENSHIFT_DATA_DIR:
MEDIA_ROOT = os.path.join(os.environ.get('OPENSHIFT_DATA_DIR'), 'media')
You then need to symlink this directory to the static directory that is being served up by Apache (see above in STATIC_ROOT).
In .openshift/action_hooks/build:
#!/bin/bash
if [ ! -d $OPENSHIFT_DATA_DIR/media ]; then
    mkdir $OPENSHIFT_DATA_DIR/media
fi
ln -sf $OPENSHIFT_DATA_DIR/media $OPENSHIFT_REPO_DIR/wsgi/static/media
Dotcloud
Add the following to your settings.py:
MEDIA_ROOT = '/home/dotcloud/data/media/'
Add another line to your nginx.conf:
location /static/ { root /home/dotcloud/volatile; }
location /media/ { root /home/dotcloud/data; }
Heroku
In settings.py:
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
See complete example of S3FileStorage
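If one settings file has to work on several of these platforms, the per-provider MEDIA_ROOT values above can be folded into a single lookup. This is a sketch under stated assumptions: the helper name `persistent_media_root` is hypothetical, and detecting Dotcloud by the presence of any DOTCLOUD_-prefixed variable is a guess based on the variable names listed earlier, not a documented mechanism.

```python
import os


def persistent_media_root(env, default="media"):
    """Pick a MEDIA_ROOT that survives deploys, based on provider env vars."""
    if "STACKATO_FILESYSTEM" in env:
        # Stackato: the bound filesystem service is the media directory.
        return env["STACKATO_FILESYSTEM"]
    if "OPENSHIFT_DATA_DIR" in env:
        # OpenShift: a media/ folder inside the persisted data dir.
        return os.path.join(env["OPENSHIFT_DATA_DIR"], "media")
    if any(key.startswith("DOTCLOUD_") for key in env):
        # Dotcloud (assumed detection): fixed path under the data directory.
        return "/home/dotcloud/data/media/"
    # Local development; on Heroku, uploads go to S3 storage instead.
    return default
```

In settings.py this would read `MEDIA_ROOT = persistent_media_root(os.environ)`.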
WSGI
The PaaS providers use mod_wsgi, uWSGI or Gunicorn to serve the Django application.
Stackato
Stackato uses uWSGI by default, but you can use gunicorn instead if you prefer. You simply add a wsgi.py file that references your Django settings file.
OpenShift
OpenShift uses mod_wsgi and expects to find a file /wsgi/application that looks something like this.
Dotcloud
Like Stackato, Dotcloud expects a wsgi.py file in the root of the project directory.
Heroku
Heroku recommends using gunicorn. Simply add gunicorn to your requirements.txt and INSTALLED_APPS, and create a file called Procfile in the root of your repo that contains the following:
web: gunicorn hellodjango.wsgi -b 0.0.0.0:$PORT
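Whichever server is used, it ends up importing a module-level WSGI callable, conventionally named application. The sketch below is deliberately framework-free so that it runs on its own. In a real project the callable would come from Django (for example `django.core.wsgi.get_wsgi_application()` in Django 1.4 and later) rather than being written by hand.

```python
def application(environ, start_response):
    """Smallest possible WSGI app: the interface the servers above call."""
    body = b"Hello from a WSGI callable\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # WSGI responses are iterables of byte strings.
    return [body]
```

To try it locally without any of the providers, the standard library's `wsgiref.simple_server.make_server('127.0.0.1', 8000, application).serve_forever()` will serve it.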
Requirements
All of the providers expect a requirements.txt file in the root of the project directory, except for OpenShift, which uses the more Pythonic way of defining dependencies in a setup.py file. You can still reference your requirements.txt file using this trick.
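One common way to reuse requirements.txt from setup.py is to read the file and pass its entries to install_requires. The sketch below is an assumption about how such a helper could look, not a copy of the linked trick; the function name `read_requirements` is hypothetical.

```python
def read_requirements(path="requirements.txt"):
    """Return the requirement specifiers listed in a pip requirements file."""
    requirements = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blanks, comment lines, and pip options such as
            # "-r other.txt" or "-e ...", which setup.py cannot use.
            if not line or line.startswith("#") or line.startswith("-"):
                continue
            requirements.append(line)
    return requirements
```

In setup.py, the result would be passed along as `setup(..., install_requires=read_requirements())`, so the dependency list lives in one place.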
Configuration
Stackato (example stackato.yml) and Dotcloud (example dotcloud.yml) both use a YAML file to define configuration information about your app (e.g. which database to create and bind).
OpenShift doesn’t seem to have a configuration file, so you have to add the cartridges (database) with a separate command.
Heroku uses a Procfile but most of the configuration is done using the config and addons commands.
Management commands
When it comes time to provide instructions for what should be done when you do a deploy, each provider has a slightly different way of handling these management commands (syncdb, collectstatic, migrate, etc.).
Stackato uses post-staging hooks in the stackato.yml file.
Dotcloud uses a simple postinstall bash script.
OpenShift uses a deploy bash script in the .openshift/action_hooks directory.
Heroku does most of these things for you automatically and you can disable them by adding a collectstatic_disabled marker to the .heroku directory.
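Whatever the hook mechanism, the work it performs is the same short sequence of manage.py calls. As a rough sketch, written in Python rather than bash to match the other examples here, and with the hypothetical helper names `deploy_commands` and `run_deploy`:

```python
import subprocess
import sys


def deploy_commands(python=sys.executable, run_migrations=True):
    """Return the manage.py invocations to run after a deploy, in order."""
    # Note: syncdb was removed in Django 1.9; on newer versions
    # migrate alone does this job.
    commands = [[python, "manage.py", "syncdb", "--noinput"]]
    if run_migrations:
        commands.append([python, "manage.py", "migrate", "--noinput"])
    commands.append([python, "manage.py", "collectstatic", "--noinput"])
    return commands


def run_deploy(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.check_call(command)
```

The --noinput flag matters in every case, because a deploy hook has no terminal to answer prompts from.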
Background processes with Celery
Many advanced Django applications require the use of background job processing using Celery, a distributed task queue. Which PaaS providers support Celery?
Stackato supposedly had Celery support at one time, as evidenced by this thread, but the latest commit on the celery-demo app says that it no longer works.
OpenShift supposedly has Celery support according to this thread and this closed bug, but I don’t see any definitive documentation about how to set it up on OpenShift.
Dotcloud has a complete documentation page on how to use Django and Celery on Dotcloud.
Heroku lets you run Celery as just another worker.
So who took the 1st prize trophy home?
All of the PaaS providers are winners in my book, because they’re making our jobs as developers easier! But there are clearly pros/cons for each one:
Stackato
Pros:
- runs anywhere (EC2, VirtualBox, VMWare, HPCloud, etc.)
- recent versions of MySQL and PostgreSQL and support for most other services
- you can apt-get Ubuntu packages
Cons:
- long deploy times due to rebuilding the virtualenv on every deploy
- no hosted offering, so if you want to use it you need to deploy it yourself to EC2 or HP Cloud
OpenShift
Pros:
- Open source and backed by a company (Red Hat) known for open source community building
- Zero downtime deploys with Jenkins builds and hot_deploys
Cons:
- Older versions of Python (2.6) and PostgreSQL (8.4)
- Somewhat clunky handling of git repos: you add your app’s source as a remote, rather than adding the OpenShift git repo as a remote
- Missing built-in services that other PaaS providers have (Redis, Memcached, RabbitMQ)
Dotcloud
Pros:
- Great documentation
- Very flexible platform – e.g. you can use a script to tell Dotcloud exactly how to create your database, and control serving with your own nginx.conf
- Built-in support for just about any service you’d want
- Zero-downtime and hot deploys with Granite
Cons:
- Flexibility adds some complexity
Heroku
Pros:
- Good documentation (including e-book Heroku Hacker’s Guide)
- Large community of developers using Heroku (more likely you’ll be able to get your question answered)
- Large ecosystem of 3rd party add-ons
- Easiest deployment – Heroku auto-detects Django app and sets most things up automagically (syncdb, collectstatic, etc.)
Cons:
- No serving of static assets, so you have to use Amazon S3
- No persisted storage, so you have to use something like Amazon S3 for storing uploaded media
- No built-in MySQL database service (have to use Amazon RDS)
- Expensive once you exceed the resources provided in the free tier and need a production database (starts at $50/mo)
Feature comparison matrix
This is by no means an exhaustive list, but just the things I could think of off the top of my head. If you have suggestions for other things to be included, let me know in the comments below.
| | Stackato | OpenShift | Dotcloud | Heroku |
|---|---|---|---|---|
| Python | 2.7, 3.2 | 2.6 (2.7) | 2.6.5, 2.7.2, 3.1.2, 3.2.2 | 2.7.2 |
| PostgreSQL | 9.1 | 8.4 | 9.0 | 9.1.6 |
| MySQL | 5.5 | 5.1 | 5.1 | (Yes, via RDS) |
| Persisted FS | Yes | Yes | Yes | (Yes, via S3) |
| Redis | Yes, 2.4 | No | Yes, 2.4.11 | (Yes, via addon) |
| MongoDB | Yes, 2.0 | Yes, 2.2 | Yes, 2.2.1 | (Yes, via addon) |
| Memcached | Yes, 1.4 | No | Yes | (Yes, via addon) |
| RabbitMQ | Yes, 2.4 | No | Yes, 2.8.5 | (Yes, via addon) |
| Solr | No | No | Yes, 3.4.0 | (Yes, via Websolr) |
| Cron | Yes | Yes | Yes | Yes |
| Extensible | Yes, apt-get install | Yes, DIY cartridge | Yes, custom service | Yes, buildpacks |
| WebSockets | Yes | Yes | Yes | Yes, via Pusher add-on |
| Hot deploys | No | Yes, w/ hot_deploy | Yes, with Granite | Yes, with preboot |
If it ain’t broke, don’t fix it
There were a lot of questions at the end about reliability, portability and extensibility, which I think sum up the reasons that people are still not jumping on these PaaS platforms. When you’ve got something that works (a Fabric file that pushes to AWS), why change it?
Several people contacted me afterwards and said that after my talk, they are now reconsidering their opinion of PaaS providers and might dump the Linode, Rackspace, AWS servers that they’re babysitting in favor of a PaaS deployment solution.
The Future of PaaS
PaaS is still in its infancy and it will be interesting to see over the next few years what happens in the developer ecosystem as these platforms mature. There will no doubt be more consolidation, and hopefully some standardization around common formats.
Imagine being able to define a generic deploy.yml file in your code repo that is consumed by each PaaS provider and translated into their specific way of doing things.
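To give a feel for what such a translation layer might involve, here is a purely hypothetical sketch. The generic config is shown as a Python dictionary (standing in for a parsed deploy.yml), and the only provider details used are the ones that appear earlier in this post: the services and processes keys of stackato.yml and the web line of a Heroku Procfile. The function names and the generic schema are invented for illustration.

```python
def to_stackato(config):
    """Translate the generic config into a stackato.yml-shaped dictionary."""
    name = config["name"]
    return {
        # Stackato binds services named "<type>-<app>", as shown earlier.
        "services": {
            "%s-%s" % (service, name): service
            for service in config["services"]
        },
        "processes": {"web": config["web"]},
    }


def to_procfile(config):
    """Translate the generic config into the text of a Heroku Procfile."""
    return "web: %s\n" % config["web"]


# Hypothetical generic description of one app.
GENERIC = {
    "name": "mywebsite",
    "services": ["postgresql", "filesystem"],
    "web": "gunicorn mywebsite.wsgi -b 0.0.0.0:$PORT",
}
```

Each provider would get its own small translator like these, while the app's repo carries only the one generic description.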
At the last DjangoCon 2012 sprint, we started working on a project called django-deployer, to attempt to make a PaaS-agnostic deployment tool for Django. We added support for Stackato and Dotcloud and then the sprint was over, and I haven’t had time to revisit it. But if anyone is interested in working on this, let me know!
What’s next
I only had time in this presentation to cover four PaaS providers, but there are others that have Python/Django support including Amazon Elastic Beanstalk, Google App Engine, CloudFoundry, AppFog and even Microsoft Azure!
What would you like the next blog post to be? Leave a comment below to express your preference!
- Additional PaaS providers compared like we already did with these four
- Pricing comparison showing for an average Django application what the costs are on each provider
- Deployment time durations – statistics about deployment times (how long is the first push, subsequent deploys)
- Scaling your app on a PaaS
- something else?