Data Mangling

Gavin Heavyside's infrequent tech blog

ACCU 2011 - Non-Relational Databases

These are the slides from my talk last Saturday at the ACCU2011 Conference in Oxford.

I presented a 90-minute whirlwind tour of non-relational database families (trying my best to avoid saying ‘NoSQL’), and went into a little more detail for one in each of the column, graph, document and key-value families.

These slides were chosen by Slideshare to be showcased on the front page of the Technology category.

Running a Jekyll Blog on Heroku

This blog used to be hosted on the Jekyll-powered Github Pages, and I was using the Custom Domains feature to have it appear under the DataMangling domain name. I’ve switched to hosting it with Heroku so that I can save myself $84 per year.

Github only lets you use Custom Domains if you’ve got a paid account. I had a micro ($7/month) plan, initially because I wanted to keep a couple of my repositories private, but I don’t need to do that anymore. I’m on a bit of an economy drive at the moment, so the monthly payment to Github has been a casualty of my belt-tightening.

The only tangible consequence is I can no longer alias my own domain name to the blog. This is a bit annoying after I’d gone to the trouble of finding a dotcom domain that had the word data in it. The solution is to host the blog on Heroku, as the free tier of service is more than sufficient, and I can use my own domain name.

The only question was how to get Heroku to serve the Jekyll blog properly.

Enter Rack-Jekyll. By depending on the rack-jekyll gem and adding a config.ru for Rack awareness, I found the transition smooth and painless. The only steps involved were:

  • Add rack-jekyll to the .gemfile
  • Add a config.ru as described in the rack-jekyll page
  • Make sure the _site directory is checked into git
  • Create a new Heroku app and add my custom domain name
  • Update DNS to point to Heroku
  • Push to Heroku
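
For reference, a minimal config.ru along the lines of the rack-jekyll README looks something like the sketch below. Treat it as an illustration of the second step above, not necessarily the exact file used for this blog; it assumes the rack-jekyll gem is installed and that the generated _site directory is present.

```ruby
# config.ru
# Minimal Rack entry point: rack-jekyll serves the static files
# that Jekyll has already generated into _site.
require "rack/jekyll"

run Rack::Jekyll.new
```

Heroku picks up config.ru automatically for Rack apps, so no further wiring should be needed beyond the steps listed.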

Cascalog Lightning Talk at HUGUK

The 4th Hadoop Users Group UK meetup took place last night at the Skills Matter Exchange. Aaron Kimball of Cloudera talked about Sqoop, and Tim Sell of Last.fm talked about how they use Hive. Ben “Shevek” Mankin of Karmasphere gave a lightning talk introducing Karmasphere Studio, which looks great, and then I gave a quick overview of Cascalog. I think the combination of two unfamiliar technologies (Clojure and Datalog) was probably a bit much for most people, but it was good to be able to talk about something new.

GeoTools Quickstart Example in Clojure

As part of my Clojure experiments I’m taking a look at GeoTools and the Java Topology Suite for working with geospatial data.

Clojure has been described as “A better Java than Java”. I’m not a Java programmer, but having access to Java libraries in Clojure is very useful, and Clojure has made the interop remarkably painless so far.

The quickstart example on the GeoTools documentation pages starts with a horrid Maven incantation, which I converted to a Leiningen project file. The source itself, once converted to Clojure, looks like this:

And the Leiningen project.clj:

Running either of the functions pops up a dialog box for you to choose a shapefile, which is then loaded and displayed.

The Clojure version is shorter than the Java version, can be tinkered with at the REPL, and doesn’t use any Maven.

It’s Good to Be Lazy

On 14th May 2010 the inaugural meeting of the Software Craftsmanship UK user group was held at the offices of Eden Development in Winchester.

The meeting kicked off with introductions, and then Enrique called Doug Bradbury and Micah Martin of 8th Light on Skype, and they talked to us about the history of the Software Craftsmanship movement.

After lightning talks, we moved on to a randori-style coding dojo in which the task was to write an algorithm to determine how many Lychrel numbers there are in the starting range 1-10000.

We used Ruby, which was known by the majority (but not all) of the attendees, and after a few false moves a recursive algorithm took shape and an answer was found. The program could have used a bit of refactoring, but it satisfied the task.
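
The dojo code isn’t reproduced here, but a recursive solution in Ruby might look roughly like the sketch below. The method names and the 50-iteration cut-off are assumptions for illustration, not what was actually written on the night.

```ruby
# Assumed cut-off: the limit used in the dojo is not recorded here.
MAX_ITERATIONS = 50

# True when the decimal digits of n read the same in both directions.
def palindrome?(n)
  s = n.to_s
  s == s.reverse
end

# One reverse-and-add step, e.g. 47 -> 47 + 74 = 121.
def reverse_and_add(n)
  n + n.to_s.reverse.to_i
end

# Recursively apply reverse-and-add. A number is treated as a Lychrel
# candidate if no palindrome turns up within MAX_ITERATIONS steps.
def lychrel?(n, iterations = 0)
  return true if iterations >= MAX_ITERATIONS

  candidate = reverse_and_add(n)
  return false if palindrome?(candidate)

  lychrel?(candidate, iterations + 1)
end

# Count the Lychrel candidates among the starting numbers in range.
def count_lychrel(range)
  range.count { |n| lychrel?(n) }
end
```

Calling count_lychrel(1..10_000) covers the range set in the task. The check is applied only after the first step, so a starting number that is already a palindrome can still be counted.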

I’m currently very enthusiastic about Clojure, partly because I’ve been meaning to learn a functional language for ages and this looks like a good one, and partly because it runs on the JVM and has some interesting libraries that I want to use with Hadoop for processing and analysing big data sets at Journey Dynamics.

The next day I knocked up a Clojure solution to the problem we’d seen in the dojo. It was concise, tested, and I was fairly pleased with it. I tweeted about it, and was about to forget about it until @t_crayford said that he could do better. Before long he’d posted an improved version of the main function:

At first I was baffled, but that was mostly down to trying to read it on my iPhone after a couple of glasses of wine. With a clear head and a big screen the next day it became obvious how he had replaced my naive recursive algorithm with a much more idiomatic lazy sequence version that has better performance.

He defines a function that calculates the next number in the sequence, and creates an (infinite) lazy sequence of them using iterate. It takes 50 numbers from this sequence using rest and take and checks to see if any of them are palindromic using some. I was initially confused by the ->> macro, but it is explained here.
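
For anyone more at home in Ruby, the same shape can be approximated with a lazy enumerator. This is only a rough analogue of the approach described above, not the Clojure code itself; iterate here is a hypothetical helper standing in for Clojure’s function of the same name, and the other method names are likewise made up for the sketch.

```ruby
# One reverse-and-add step: the "next number in the sequence".
def reverse_and_add(n)
  n + n.to_s.reverse.to_i
end

# True when the decimal digits of n read the same in both directions.
def palindrome?(n)
  s = n.to_s
  s == s.reverse
end

# Hypothetical helper: an infinite lazy sequence start, f(start),
# f(f(start)), ... Values are only computed when something asks for them.
def iterate(start, &step)
  Enumerator.new do |yielder|
    current = start
    loop do
      yielder << current
      current = step.call(current)
    end
  end.lazy
end

# Skip the starting number (the job rest does), look at the next 50
# values (take), and stop at the first palindrome found (some).
def lychrel?(n)
  iterate(n) { |x| reverse_and_add(x) }
    .drop(1)
    .take(50)
    .none? { |x| palindrome?(x) }
end
```

Because the enumerator is lazy, the pipeline stops as soon as a palindrome appears, so most numbers need only a handful of steps.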

It’s going to take me a while to think naturally in Clojure idioms, but I think it will be worth it. Paul Graham argues that the truly serious hacker should consider learning Lisp, and I think he is absolutely right. The advantage of being able to think about solutions to problems in a different way from the dominant procedural and OO mindset can be really valuable, and I agree that even if you don’t subsequently use Lisp, having learned it will make you a better programmer.