Every now and then I end up interviewing potential employees for database related positions. To me, this includes ETL developers, DBA's, and web developers. I lump web developers in there because even though a deep knowledge of ASP.NET/JSPs/PHP/Rails is important for web work, it's likely that almost all of your application state is going to be persisted in some database. As a result, knowing how those database things work isn't just for DBAs anymore. Anyway's I digress, back to the question.
How do I know if a table is in 3NF?
This question really speaks to what 3NF is and how to apply it to a table. Before we get to the answer, let me spend a sentence or two talking about why I think the above roles should know the answer.
When we talk about creating a schema we often talk about normalization. When I ask people why we normalize I sometimes get pretty foggy answers. People will say things like "to add clarity to the data model...", which is usually prolog for me asking them to be more specific. Enter the question above which quickly seems to separate the wheat from the chaff.
I think it's fair to assume that if you declare that you know what normalization is...then you should be able to identify one of the most common normal forms and its rules.
developers IT Professionals are exceptional learners, they rarely stop with the short answer. If you've built more than a couple schemas you should probably know what 3NF is and why you would (and would not) normalize to 3NF.
As you become proficient in a topic it's inevitable that you begin to learn some surrounding knowledge about the subject. Would you ever hire a C developer who claimed to be an expert, but had never heard of Dennis Ritchie. How about a PERL developer who had never heard of regular expressions? People who are experts with a tool know exactly what it's good for, and more importantly what it's bad at.
So how about it, is the following table in 3NF (answers at bottom)?
|Monarch Name||Monarch Number||Royal House|
How about this one?
|Name||Year||Winner||Winner Date of Birth|
|Indiana Invitational||1998||Al Fredrickson||21 July 1975|
|Cleveland Open||1999||Bob Albertson||28 September 1968|
|Des Moines Masters||1999||Al Fredrickson||21 July 1975|
|Indiana Invitational||1999||Chip Masterson||14 March 1977|
To figure out if a table is in 3NF lets have to look at an abridged version of the definition. 3NF is all about identifying functional dependencies a table. Another way to say this is; for all rows in this table, this column implies the value of some other column.
Consider the following employee table.
|Employee Name||Pay Grade||Store No.||Pay|
Notice how whenever the Pay Grade is 5, the Pay is $7.90. If this behavior repeats for all rows (and values) in the table then it is said to be a functional dependency. For the above table Pay Grade implies Pay.
For a table to be in 3NF it can't have any functional dependencies involving columns that aren't part of the primary key or are a superkey (a super key is a potential key for the table).
The following table has many functional dependencies (StudentId->All Columns, SSN->All Columns) but each is either part of the primary key or part of a superkey. As a result it's still in 3NF.
I guess in a nutshell I'm looking for an interviewee to finally say:
[Tyler]: How do I tell if a table is in 3NF?
[Great Dev]: Well first we look for functional dependencies between table columns. If we find any column that implies another, isn't part of the primary key, and isn't a superkey...then the table isn't in 3NF.
If we wanted to make the table 3NF we'd move the functional dependency out to another table.
Perfect answer! You can tell this candidate is at least well versed when it comes to normalization and 3NF.
If you're mildly curious the first table IS in 3NF. There are no functional dependencies to be found.
The 2nd table is NOT in 3NF since:
- The Winner column implies the Winner Date of Birth column.
- The Winner column is not part of the primary key.
- The Winner column is not part of a superkey.
If we wanted to rectify this we'd break it up into two tables.
|Indiana Invitational||1998||Al Fredrickson|
|Cleveland Open||1999||Bob Albertson|
|Des Moines Masters||1999||Al Fredrickson|
|Indiana Invitational||1999||Chip Masterson|
|Player||Date of Birth|
|Chip Masterson||14 March 1977|
|Al Fredrickson||21 July 1975|
|Bob Albertson||28 September 1968|
It's important to know that there's a tonne of times where you shouldn't normalize, but that's for another. It's also good to know that there are very common breaches of 3NF that are widely accepted. Consider a User table with an Address fields. Street Address always implies Zip Code, but address information is seldom broken up. That being said, knowing is half the battle.
Maybe it's not a fair question (I'm sure I'll hear feedback either way), but I often get frustrated with Senior Engineers that have yet to get deep in their tools or topics. If you heard a term repeatedly that you didn't understand, how long would it take you to go explore it? I'm also not saying that people should run around with academic definitions memorized in their head. All I'm getting at is you can't ever stop learning, and getting deep in the tools that immediately surround your day job is really just going after the low hanging fruit. Either way I hope this post made 3NF a little more simple for someone out there.