Distilling data
Quite a few years back I explored opening a moonshine distillery in St Andrews. I’m not writing about that today - although I do want to go on record as saying that moonshine is the most iconic drink that no-one has ever tasted - and thus a massive opportunity.
The most important thing in early-stage development work, which is taking an idea to a prototype, is to work out what information you are dealing with and to structure a database around it.
Every website is really just an interface to a database. Facebook is a database of your friends, photos and messages. Google is a database of web pages. Working out the structure of this database and the relationships between things is something that has to be done very early.
Facebook is a great example of this. When Facebook started in 2004, it was almost defined by a single database feature: friendship was mutual. You were friends with people and they were friends with you - you couldn’t be friends with someone without them being friends with you. Believe it or not, this was actually quite unusual at the time.
Your friendships form what is known as a “has and belongs to many” relationship. You have many friends, and you also belong to many friends (i.e. you are a friend of theirs). To represent this you would make a list of People and a list of Friendships.
People
Bart |
Ralph |
Nelson |
Lisa |
Friendships
Person 1 | Person 2 |
---|---|
Bart | Ralph |
Lisa | Bart |
Ralph | Lisa |
In a real database we’d give every person an ID number because: 1. Numbers are much faster to deal with than text. 2. ID numbers don’t change - if Bart changed his name to Fartman then we’d need to update his name everywhere in our friendships list.
It’s important to note that to find who Lisa is friends with, we have to look at both columns.
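To make that concrete, here is a minimal sketch of the same two lists using Python's built-in sqlite3 module. The table names, column names and the friends_of helper are my own choices for illustration, not taken from any real system. It shows people getting ID numbers, the friendships list storing only those IDs, and a lookup that checks both columns.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Each row is one mutual friendship, stored once.
    CREATE TABLE friendships (
        person_1_id INTEGER NOT NULL REFERENCES people(id),
        person_2_id INTEGER NOT NULL REFERENCES people(id)
    );
    INSERT INTO people (id, name) VALUES
        (1, 'Bart'), (2, 'Ralph'), (3, 'Nelson'), (4, 'Lisa');
    -- Bart-Ralph, Lisa-Bart, Ralph-Lisa, as in the Friendships list.
    INSERT INTO friendships (person_1_id, person_2_id) VALUES
        (1, 2), (4, 1), (2, 4);
""")


def friends_of(name):
    """Return the sorted names of everyone who is friends with `name`."""
    rows = conn.execute(
        """
        -- The person appears in column 1: the friend is in column 2.
        SELECT p.name FROM friendships f
        JOIN people p ON p.id = f.person_2_id
        WHERE f.person_1_id = (SELECT id FROM people WHERE name = ?)
        UNION
        -- The person appears in column 2: the friend is in column 1.
        SELECT p.name FROM friendships f
        JOIN people p ON p.id = f.person_1_id
        WHERE f.person_2_id = (SELECT id FROM people WHERE name = ?)
        ORDER BY name
        """,
        (name, name),
    )
    return [row[0] for row in rows]


print(friends_of("Lisa"))    # ['Bart', 'Ralph']
print(friends_of("Nelson"))  # []

# Renaming Bart touches one row in people; friendships is untouched.
conn.execute("UPDATE people SET name = 'Fartman' WHERE id = 1")
print(friends_of("Lisa"))    # ['Fartman', 'Ralph']
```

Lisa sits in the first column of one row and the second column of another, which is why the query needs both halves. The rename at the end is the reason for the ID numbers: the friendships list never stored the name, so there is nothing else to update.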
I’ve gone off track into basic database theory here, so let’s get back to what I was writing about!
When I look at developing a system, I have to look at the business and basically work out how to fit the whole business into a spreadsheet like Excel.
To take an example, let’s say we’re building a database for a kindergarten. We’ll have sheets (tables, in database language) called Children, Classes, Teachers and Parents.
Like real life, the tricky bit comes in relationships.
Can a child be in more than one class? Does a class have more than one teacher? Do teachers teach more than one class?
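The answers change the shape of the tables. As a rough sketch, suppose a child is in exactly one class, while a class can have several teachers and a teacher can teach several classes. Those answers are assumptions I've picked for the example, as are all the names and sample rows below, and Parents is left out to keep it short. In Python's sqlite3 that might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce REFERENCES
conn.executescript("""
    CREATE TABLE classes  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE teachers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Assumption: one class per child, so a single column on the child is enough.
    CREATE TABLE children (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        class_id INTEGER NOT NULL REFERENCES classes(id)
    );

    -- Assumption: classes and teachers are many-to-many, so the pairs
    -- get their own table, like the Friendships list.
    CREATE TABLE class_teachers (
        class_id   INTEGER NOT NULL REFERENCES classes(id),
        teacher_id INTEGER NOT NULL REFERENCES teachers(id),
        PRIMARY KEY (class_id, teacher_id)
    );

    -- Hypothetical sample data.
    INSERT INTO classes  (id, name) VALUES (1, 'Ducklings'), (2, 'Owls');
    INSERT INTO teachers (id, name) VALUES (1, 'Ms Hoover'), (2, 'Ms Krabappel');
    INSERT INTO children (id, name, class_id) VALUES (1, 'Maggie', 1), (2, 'Ling', 2);
    INSERT INTO class_teachers (class_id, teacher_id) VALUES (1, 1), (1, 2), (2, 2);
""")


def teachers_of_child(child_name):
    """Sorted names of the teachers of the class the child is in."""
    rows = conn.execute(
        """
        SELECT t.name FROM children c
        JOIN class_teachers ct ON ct.class_id = c.class_id
        JOIN teachers t ON t.id = ct.teacher_id
        WHERE c.name = ?
        ORDER BY t.name
        """,
        (child_name,),
    )
    return [row[0] for row in rows]


def classes_of_teacher(teacher_name):
    """Sorted names of the classes a teacher teaches."""
    rows = conn.execute(
        """
        SELECT cl.name FROM teachers t
        JOIN class_teachers ct ON ct.teacher_id = t.id
        JOIN classes cl ON cl.id = ct.class_id
        WHERE t.name = ?
        ORDER BY cl.name
        """,
        (teacher_name,),
    )
    return [row[0] for row in rows]


print(teachers_of_child("Maggie"))         # ['Ms Hoover', 'Ms Krabappel']
print(classes_of_teacher("Ms Krabappel"))  # ['Ducklings', 'Owls']
```

If the answer to the first question turned out to be "yes, a child can be in more than one class", the class_id column on children would have to go and a second pairing table would take its place. That is the kind of change that is cheap on day one and painful later.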
It’s pretty interesting stuff when you dive into it, but to really be successful you need to be something of an insider in the business or industry you are approaching. It’s just not possible to nail down a database structure as an outsider, and I think this is why so many successful startups come from people making something to solve a problem they’ve encountered in their own industry.
Here’s a list of tables from Boxmove, a logistics company I co-founded.
It’s constantly changing but that structure represents basically our entire operational business. Other non-core elements of the business like payroll and finance are managed in other databases that I’m thankful I don’t have to maintain.