Wednesday, October 8, 2008

The Database Interview Question No One Gets

The Question

Every now and then I end up interviewing potential employees for database related positions. To me, this includes ETL developers, DBA's, and web developers. I lump web developers in there because even though a deep knowledge of ASP.NET/JSPs/PHP/Rails is important for web work, it's likely that almost all of your application state is going to be persisted in some database. As a result, knowing how those database things work isn't just for DBAs anymore. Anyway's I digress, back to the question.

How do I know if a table is in 3NF?


This question really speaks to what 3NF is and how to apply it to a table. Before we get to the answer, let me spend a sentence or two talking about why I think the above roles should know the answer.

When we talk about creating a schema we often talk about normalization. When I ask people why we normalize I sometimes get pretty foggy answers. People will say things like "to add clarity to the data model...", which is usually prolog for me asking them to be more specific. Enter the question above which quickly seems to separate the wheat from the chaff.

I think it's fair to assume that if you declare that you know what normalization is...then you should be able to identify one of the most common normal forms and its rules.

Good developers IT Professionals are exceptional learners, they rarely stop with the short answer. If you've built more than a couple schemas you should probably know what 3NF is and why you would (and would not) normalize to 3NF.

As you become proficient in a topic it's inevitable that you begin to learn some surrounding knowledge about the subject. Would you ever hire a C developer who claimed to be an expert, but had never heard of Dennis Ritchie. How about a PERL developer who had never heard of regular expressions? People who are experts with a tool know exactly what it's good for, and more importantly what it's bad at.


So how about it, is the following table in 3NF (answers at bottom)?

Monarch Name Monarch Number Royal House
Edward II Devonshire
Edward III Plantagenet
Richard V Plantagenet
Henry IV Lancaster

How about this one?

Name Year Winner Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977

To figure out if a table is in 3NF lets have to look at an abridged version of the definition. 3NF is all about identifying functional dependencies a table. Another way to say this is; for all rows in this table, this column implies the value of some other column.

Consider the following employee table.

Employee Name Pay Grade Store No. Pay
Chuck Taylor 5 66 $7.90
Daryl Weep 6 75 $8.50
Josh Wellington 5 33 $7.90
Horris Lowbrow 7 11 $9.25

Notice how whenever the Pay Grade is 5, the Pay is $7.90. If this behavior repeats for all rows (and values) in the table then it is said to be a functional dependency. For the above table Pay Grade implies Pay.

For a table to be in 3NF it can't have any functional dependencies involving columns that aren't part of the primary key or are a superkey (a super key is a potential key for the table).

The following table has many functional dependencies (StudentId->All Columns, SSN->All Columns) but each is either part of the primary key or part of a superkey. As a result it's still in 3NF.

StudentID SSN Name Faculty
1 544718392 Jerry Engineering
3 651273928 75 Law
9 341841892 33 Medicine
15 483938928 11 Engineering

I guess in a nutshell I'm looking for an interviewee to finally say:

[Tyler]: How do I tell if a table is in 3NF?

[Great Dev]: Well first we look for functional dependencies between table columns. If we find any column that implies another, isn't part of the primary key, and isn't a superkey...then the table isn't in 3NF.

If we wanted to make the table 3NF we'd move the functional dependency out to another table.

Perfect answer! You can tell this candidate is at least well versed when it comes to normalization and 3NF.

Answer Key

If you're mildly curious the first table IS in 3NF. There are no functional dependencies to be found.

The 2nd table is NOT in 3NF since:

  • The Winner column implies the Winner Date of Birth column.
  • The Winner column is not part of the primary key.
  • The Winner column is not part of a superkey.

If we wanted to rectify this we'd break it up into two tables.

Tournament Year Winner
Indiana Invitational 1998 Al Fredrickson
Cleveland Open 1999 Bob Albertson
Des Moines Masters 1999 Al Fredrickson
Indiana Invitational 1999 Chip Masterson

Player Date of Birth
Chip Masterson 14 March 1977
Al Fredrickson 21 July 1975
Bob Albertson 28 September 1968

It's important to know that there's a tonne of times where you shouldn't normalize, but that's for another. It's also good to know that there are very common breaches of 3NF that are widely accepted. Consider a User table with an Address fields. Street Address always implies Zip Code, but address information is seldom broken up. That being said, knowing is half the battle.


Maybe it's not a fair question (I'm sure I'll hear feedback either way), but I often get frustrated with Senior Engineers that have yet to get deep in their tools or topics. If you heard a term repeatedly that you didn't understand, how long would it take you to go explore it? I'm also not saying that people should run around with academic definitions memorized in their head. All I'm getting at is you can't ever stop learning, and getting deep in the tools that immediately surround your day job is really just going after the low hanging fruit. Either way I hope this post made 3NF a little more simple for someone out there.



Anonymous said...

your entries in the table don't reflect what you say - the pay for pay grade 5 differ....

Tyler Holmes said...

Thanks for the heads up. It was a little late when I posted this last night :-|. I've made the edit.