These are the slides from my talk last Saturday at the
ACCU2011 Conference in
Oxford.
I presented a 90-minute whirlwind tour of non-relational database families (trying
my best to avoid saying ‘NoSQL’), and went into a little more detail for one in
each of the column, graph, document and key-value families.
These slides were chosen by Slideshare to be showcased on the front page of the
Technology category.
This blog used to be hosted on the
Jekyll-powered
Github Pages, and I was using the Custom Domains
feature to have it appear under the
DataMangling domain name. I’ve switched to
hosting it with Heroku so that I can save myself
$84 per year.
Github only lets you use Custom Domains if you’ve got a paid account. I
had a micro ($7/month) plan, initially because I wanted to keep a
couple of my repositories private, but I don’t need to do that
anymore. I’m on a bit of an economy drive at the moment, so the
monthly payment to Github has been a casualty of my belt-tightening.
The only tangible consequence is I can no longer alias my own domain
name to the blog. This is a bit annoying after I’d gone to the trouble
of finding a dotcom domain that had the word data in it. The solution
is to host the blog on Heroku, as the free tier
of service is more than sufficient, and I can use my own domain name.
The only question was how to get Heroku to serve the Jekyll blog
properly.
Enter Rack-Jekyll. By
depending on the rack-jekyll gem and adding a config.ru for Rack
awareness, the transition was smooth and painless. The only steps
involved were:
The 4th Hadoop Users Group UK meetup took place
last night at the Skillsmatter Exchange.
Aaron Kimball of
Cloudera talked about
Sqoop, and Tim
Sell of Last.fm talked about how they use Hive. Ben
“Shevek” Mankin of Karmasphere gave a lightning talk introducing
Karmasphere Studio, which looks
great, and then I gave a quick overview of
Cascalog. I think
the combination of two unfamiliar technologies (Clojure and Datalog)
was probably a bit much for most people, but it was good to be able to
talk about something new.
At the
ACCU2010 Conference
in April I presented a short session introducing MapReduce and Apache
Hadoop, into which I crammed HDFS, MapReduce, Hadoop Streaming, Pig
and Hive. Slides are available on slideshare:
Clojure has been described as “A better Java than Java”. I’m not a
Java programmer, but having access to Java libraries in Clojure is
very useful, and Clojure has made the interop remarkably painless so
far.
The meeting kicked off with introductions, and then
Enrique called Doug Bradbury and
Micah Martin of 8th Light on Skype, and they
talked to us about the history of the Software Craftsmanship
movement.
After lightning talks, we moved on to a randori-style
coding dojo in which the
task was to write an algorithm to determine how many
Lychrel numbers there are in the
starting range 1-10000.
We used Ruby, which was known by the majority (but not all) of
the attendees, and after a few false moves a recursive algorithm took
shape and an answer was found. The program could have used a bit of
refactoring, but it satisfied the task.
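A minimal sketch of what a recursive solution along these lines might look like in Ruby. This is not the code written at the dojo: the method names are illustrative, and the 50-step cut-off is an assumption (a number is treated as Lychrel if no palindrome turns up within that many reverse-and-add steps).

```ruby
# Assumed cut-off: give up after this many reverse-and-add steps.
MAX_STEPS = 50

# Reverse the decimal digits of n and add the result to n.
def reverse_and_add(n)
  n + n.to_s.reverse.to_i
end

# True if n reads the same forwards and backwards in base 10.
def palindrome?(n)
  digits = n.to_s
  digits == digits.reverse
end

# Recursively apply reverse-and-add. The starting number itself is never
# tested, so a palindrome such as 4994 can still count as Lychrel.
def lychrel?(n, steps_left = MAX_STEPS)
  return true if steps_left.zero?      # ran out of steps: treat as Lychrel

  candidate = reverse_and_add(n)
  return false if palindrome?(candidate)

  lychrel?(candidate, steps_left - 1)
end

# Count the numbers in a range that look like Lychrel numbers.
def count_lychrel(range)
  range.count { |n| lychrel?(n) }
end
```

Calling count_lychrel(1..10_000) then gives the count for the range used in the dojo.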
I’m currently very enthusiastic about Clojure,
partly because I’ve been meaning to learn a functional language for
ages and this looks like a good one, and partly because it runs on the
JVM and has some interesting libraries that I want to use with
Hadoop for processing and analysing big data sets at
Journey Dynamics.
The next day I knocked up a
Clojure solution
to the problem we’d seen in the dojo. It was concise, tested, and I
was fairly pleased with it. I tweeted about it, and was about to
forget about it until @t_crayford
said that he could do better. Before long he’d posted an improved
version of the main function:
At first I was baffled, but that was mostly down to trying to read it
on my iPhone after a couple of glasses of wine. With a clear head and
a big screen the next day it became obvious how he had replaced my
naive recursive algorithm with a much more idiomatic lazy sequence
version that has better performance.
He defines a function that calculates the next number in the
sequence, and creates an (infinite) lazy sequence of them using
iterate. His version then takes 50 numbers from this sequence using rest and
take, and checks whether any of them are palindromic using some.
I was initially confused by the ->> macro, but it is explained
here.
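For comparison, here is a rough sketch of the same shape in Ruby, the language we used at the dojo. It is not a translation of the Clojure function itself: the iterate helper and the lazy_lychrel? name are hypothetical, and I am assuming the starting number is skipped before the next 50 values are checked.

```ruby
# Hypothetical helper standing in for Clojure's iterate: a lazy, infinite
# sequence seed, step(seed), step(step(seed)), ...
def iterate(seed, &step)
  Enumerator.new do |out|
    value = seed
    loop do
      out << value
      value = step.call(value)
    end
  end.lazy
end

# A number is treated as Lychrel if none of the first `limit` values after
# the starting number is a palindrome.
def lazy_lychrel?(n, limit = 50)
  found_palindrome =
    iterate(n) { |x| x + x.to_s.reverse.to_i } # reverse-and-add step
      .drop(1)                                 # skip n itself (rest)
      .take(limit)                             # only look at `limit` values (take)
      .any? { |x| x.to_s == x.to_s.reverse }   # stop at the first palindrome (some)
  !found_palindrome
end
```

Because the sequence is lazy, any? stops generating values as soon as it sees a palindrome, so most numbers are rejected after only a few steps.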
It’s going to take me a while to think naturally in Clojure idioms,
but I think it will be worth it. Paul Graham argues that
the truly serious hacker should consider learning Lisp,
and I think he is
absolutely right. The advantage of being able to think about solutions
to problems in a different way from the dominant procedural and OO mindset
can be really valuable, and I agree that even if you don’t
subsequently use Lisp, having learned it will make you a better
programmer.