CKAN development & deployment using Docker, Fig & Vagrant
When I discovered CKAN over a year ago, it was at version 1.7 or 1.8, and the implementation I studied used Elasticsearch for indexing… I was really confused by the complexity of the setup, and it took me a few attempts to get my “play box” right.
By the time I started working on a project based on CKAN 2.2, the documentation had improved tremendously, and projects such as Data.gov.uk To Go and the CKAN Packaging Scripts had really improved the way you can package & deploy the many components of a typical CKAN portal.
A few months ago I gave a short talk on CKAN at a JBug Scotland event, and I was really amazed by Ian Lawson’s presentation on OpenShift. A few months after that I discovered Docker, and also found out that OpenShift was going to support Docker.
I thought this was really good news, because it meant that if I “containerised” my CKAN install, I could use the same containers in every environment I work in: Dev, Test, Staging, Prod, Cloud!… and I wasn’t the only one thinking that way. In May, Nick Stenning made a great pull request with the first containers for CKAN.
There were a few issues, though. The datastore was missing, ckanext-spatial could not be set up because PostGIS was not installed, and editing the config was complex and not very flexible. The three containers also used different bases, which meant you were pulling three base images instead of caching one.
Standing on the shoulders of giants
I picked up from there and refactored the containers one by one, starting with Postgres, then Solr & CKAN. When that was done, I created another Dockerfile that extends the main CKAN Dockerfile to allow custom configurations based on the core project.
- All containers extend the same base image, phusion/baseimage (updated to 0.9.13), which means you only pull & cache it once. The first few steps are also identical, so they rely on the Docker cache as much as possible.
- The Postgres container installs PostGIS, configures the database, the datastore & PostGIS on the CKAN database. The default names & passwords can easily be overridden with environment variables.
- The Solr container has been updated to 4.10.1
- The CKAN Core container has been updated to configure the datapusher. It includes all the dependencies required to use the spatial extension, as well as supervisor to manage tasks.
- The custom config shows how to extend the Core container to enable common extensions such as ckanext-viewhelpers, ckanext-archiver, ckanext-spatial, ckanext-harvest, and how you can extract services such as redis from the CKAN container and let that service be handled by a separate container.
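Here is a minimal sketch of extending the Core container. It is a shell snippet that writes a custom Dockerfile. Several names are assumptions, not the names used in the repo:

- the core image tag ckan/ckan
- the docker/custom directory
- the /usr/lib/ckan/default virtualenv path

Installing each extension’s own requirements files and enabling the plugins in the CKAN config are deliberately left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

mkdir -p docker/custom
cat > docker/custom/Dockerfile <<'EOF'
# Extend the core CKAN image (hypothetical tag) instead of repeating its steps
FROM ckan/ckan

# Assumed location of the CKAN virtualenv inside the core image
ENV CKAN_HOME /usr/lib/ckan/default

# Install two of the extensions listed above straight from GitHub
RUN $CKAN_HOME/bin/pip install -e "git+https://github.com/ckan/ckanext-spatial.git#egg=ckanext-spatial" \
 && $CKAN_HOME/bin/pip install -e "git+https://github.com/ckan/ckanext-harvest.git#egg=ckanext-harvest"

# The plugins still have to be added to ckan.plugins in the CKAN config file
EOF
```

Because the custom image starts FROM the core one, rebuilding it only re-runs the extension steps. The core layers come straight from the cache.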
Building containers is easy, and caching is powerful, but you sometimes need to cheat, especially with the ADD command. In the Solr container, for instance, I quickly realised that an ADD instruction fetching the Solr tarball from a remote URL is not cached. Downloading the same tarball with wget in a RUN instruction, by contrast, is cached. And since the Solr tarball is over 100 MB, installing wget & cheating is really worth it! In cases like that, RUN is more appropriate than ADD, but it really depends on the use case.
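Here is a minimal sketch of that cheat, written as a shell snippet that generates the Solr Dockerfile. The Apache archive URL, the docker/solr directory and the /opt install path are assumptions. The remote ADD is kept only as a comment for comparison. The download happens in a RUN line whose instruction text stays the same from one build to the next, so Docker can reuse that layer.

```shell
#!/usr/bin/env bash
set -euo pipefail

mkdir -p docker/solr
cat > docker/solr/Dockerfile <<'EOF'
FROM phusion/baseimage:0.9.13

# Remote ADD: this variant was not reused from the cache (comparison only)
# ADD https://archive.apache.org/dist/lucene/solr/4.10.1/solr-4.10.1.tgz /opt/

# The cheat: install wget, then download in a RUN step. Docker reuses a RUN
# layer while its instruction text (and every layer before it) is unchanged,
# so the large download only happens on the first build.
RUN apt-get update && apt-get install -y wget
RUN wget -q https://archive.apache.org/dist/lucene/solr/4.10.1/solr-4.10.1.tgz -O /opt/solr-4.10.1.tgz \
 && tar -xzf /opt/solr-4.10.1.tgz -C /opt \
 && rm /opt/solr-4.10.1.tgz
EOF
```

The flip side of caching on the instruction text is that Docker will not notice if the remote file changes. Pinning the version in the URL is what makes that acceptable, and bumping the version forces a fresh download.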
Managing containers can be tedious, especially when you’re developing them, and there are a lot of tools to help. I’ve not tried Shipyard yet, but I will soon. In the meantime, docker-cleanup is pretty useful, and the usual pair of commands works great to stop and remove all containers:
docker stop $(docker ps -aq)
docker rm $(docker ps -aq)
But when I’m working with a custom Docker container, I have to type (or copy & paste) 4 commands to build them, 4 commands to run them… and just as many to stop them.
This is a bit tedious, and that’s why I looked at Fig.
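For context, here is roughly what the manual routine looks like when wrapped in shell functions. The image tags, the docker/* directories and the container names are hypothetical. The split assumes the following:

- The four builds are Postgres, Solr, the core CKAN image and the custom image.
- The four runs are Postgres, Solr, the official redis image and the custom CKAN container.

```shell
#!/usr/bin/env bash
# Hypothetical tags and directories: adjust them to match your own checkout.
# The custom image is assumed to be built FROM the core "ckan/ckan" tag.

build_all() {
  docker build -t ckan/postgresql docker/postgresql
  docker build -t ckan/solr docker/solr
  docker build -t ckan/ckan .
  docker build -t ckan/custom docker/custom
}

run_all() {
  # Start dependencies first so the --link targets already exist
  docker run -d --name db ckan/postgresql
  docker run -d --name solr ckan/solr
  docker run -d --name redis redis
  docker run -d --name ckan -p 80:80 \
    --link db:db --link solr:solr --link redis:redis ckan/custom
}

stop_all() {
  # Stop in reverse order, then remove the named containers
  docker stop ckan redis solr db
  docker rm ckan redis solr db
}
```

Source the file, then call build_all, run_all and stop_all as needed. The ordering inside run_all is the part that is easy to get wrong by hand.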
Fig lets you define all of the above in a single YAML file, and then:
- start, stop and rebuild services
- view the status of running services
- tail running services’ log output
- run a one-off command on a service
So the 8+ commands above are reduced to one, fig up, thanks to a fig.yml file that defines each service and how they are linked.
Fig also simplifies the rest of the Docker commands you want to run, such as viewing logs with fig logs.
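Here is a minimal fig.yml along those lines, written from the shell so it can be pasted straight into a terminal. The service names, build directories and port mapping are assumptions. The custom image is also assumed to build FROM a core ckan/ckan image that already exists locally, since Fig only builds the services it lists.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: one build directory per service under docker/
cat > fig.yml <<'EOF'
db:
  build: docker/postgresql
solr:
  build: docker/solr
redis:
  image: redis
ckan:
  build: docker/custom
  links:
    - db
    - solr
    - redis
  ports:
    - "80:80"
EOF

# Then, from the same directory:
#   fig up -d        build what is missing and start every service
#   fig ps           show the status of the services
#   fig logs ckan    show the CKAN container's log output
```

The links entries are what let Fig work out the start order. db, solr and redis are started before ckan, which replaces the careful ordering of the manual docker run commands.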
Now you may wonder why you would need or want Vagrant. The whole point of Docker is that containers are not VMs, and Fig has reduced the complexity of managing containers, so why would you want to bring virtualisation back into the picture?
Well, the answer is simple: portability. I have a personal Mac, a work PC, and Linux servers… Docker works on all those operating systems. It runs natively on Linux, and on OS X & Windows through a proxy VM, Boot2docker. I love this project: it’s fast, lightweight & simple to use. However, it doesn’t support volumes & shared folders on Windows yet (Boot2docker 1.3 offers partial support on Mac OS X), and it doesn’t really represent your production host.
That’s why I think Vagrant is useful, and I was really excited to see Docker support added in Vagrant 1.6.
My goal was to make sure that any development environment would represent production and behave exactly the same. This also helps the portability of the environment, because a single command, vagrant up --provider=docker --no-parallel, does three things:
- creates Linux hosts running Docker if required (OS X & Windows)
- builds & boots all the containers in order
- mounts the source directory on your machine as a volume inside the container
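Here is a trimmed-down Vagrantfile for the docker provider, again written from the shell. The build directories, host and container paths and port are assumptions. Setting has_ssh = true also assumes the image runs sshd and accepts the key Vagrant connects with, which phusion/baseimage can be configured to do. The CKAN container is defined last on purpose. With --no-parallel, the machines come up one at a time, so db and solr should already exist when ckan links to them.

```shell
#!/usr/bin/env bash
set -euo pipefail

cat > Vagrantfile <<'EOF'
# Docker-provider sketch: directories, paths, names and ports are assumptions.
Vagrant.configure("2") do |config|
  config.vm.define "db" do |db|
    db.vm.provider "docker" do |d|
      d.build_dir = "docker/postgresql"
      d.name = "db"
    end
  end

  config.vm.define "solr" do |solr|
    solr.vm.provider "docker" do |d|
      d.build_dir = "docker/solr"
      d.name = "solr"
    end
  end

  # Defined last so db and solr are already up when it is linked to them
  config.vm.define "ckan" do |ckan|
    # Host CKAN checkout, mounted into the container as a volume
    ckan.vm.synced_folder "../ckan", "/usr/lib/ckan/default/src/ckan"
    ckan.vm.provider "docker" do |d|
      d.build_dir = "docker/dev"
      d.name = "ckan"
      d.link("db:db")
      d.link("solr:solr")
      d.ports = ["5000:5000"]
      d.has_ssh = true
    end
  end
end
EOF
```

On Linux the containers run directly on the machine. On OS X and Windows, Vagrant first boots a Linux host VM and runs the same containers inside it, which is what keeps the setup identical across machines.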
The development Dockerfile is slightly different & designed to be lightweight: Apache & Nginx are not installed, since paster serve does just what you want on a dev box.
vagrant ssh also works a treat with Phusion baseimage, so you can ssh directly into the container.
That was a great personal journey into containerisation & virtualisation to build consistent & portable development environments. There’s still work to be done on the core Dockerfile: Nginx should be extracted from the main container and the official Nginx container linked instead. The example Vagrantfile is really just a template to show what’s possible. At the moment it only maps the CKAN source directory, so you would have to add new synced folders to build your custom extensions. It’s just one step further, and hopefully it’s just a start.
Check this out on my GitHub repo.
Some really good reading
- Vagrant with Docker: How to set up Postgres, Elasticsearch and Redis on Mac OS X – maori.geek
- Vagrant 1.6 Feature Preview: Docker-Based Development Environments – Vagrant
- Building a Development Environment with Docker – Terse Systems
- A Rails Development Environment with Docker and Vagrant
- VirtualBox guest-specific operations error · Issue #81 · tmatilai/vagrant-proxyconf
- Setting up a development environment using Docker and Vagrant – Zenika
- vagrant-cachier :: viewdocs.io
- Docker in OSX via boot2docker or Vagrant: getting over the hump
- Rails Development Using Docker and Vagrant – Abe Voelker
- Vagrant Synced Folders Permissions – jeremykendall.net
- Get Started with Docker Containers in RHEL 7 – Red Hat Customer Portal
- Docker – OpenStack
- Docker Images / Demo CKAN
- Allow customised CKAN Docker images (fixes #1904) by cygri · Pull Request #1929 · ckan/ckan · GitHub
- Quickly SSH into a Docker container
- How to Use Docker on OS X: The Missing Guide
- Use Docker to Build a LEMP Stack (Buildfile)