What is Normalization?
Normalzation is a database design technique that reduces the data redundancy and dependency.
Let's start to see how we do normalization in sample data of football player information.
We will see how we procced from Un-normalized Form to First, Second, Third Normal Form step-by step.
Initially, our data is in un-normalized form. It contains composite or multiple values in an attribute (Clubs).
1st Normal Form
Each column should contain a single value and each record have to be unique.
2nd Normal Form
Now, we are in 1NF. To make it 2NF, we need to define primary key.
3rd Normal Form
There are data anomalies and inconsistencies in current design, for example:
+ If we deleted the "Ronaldo" and "Nani", we would inevitably delete the agent "Mandez" completely from the database.
+ We cannot add a new agent to the database unless we also add a player.
+ If "Mandez" retired from agent job, we would have to change it in all records in which he appears.
So, we will remove this transitive functional dependency in 3NF. Agents will be moved to another table and its id will be used as foreign key in Player table.
Now, we have "Players", "Agents", "Players_Clubs" tables.
We can optimized by moving Clubs into "Clubs" table and linking its id as foreign key in "Players_Clubs" table.
Normalzation is a database design technique that reduces the data redundancy and dependency.
Let's start to see how we do normalization in sample data of football player information.
We will see how we procced from Un-normalized Form to First, Second, Third Normal Form step-by step.
Initially, our data is in un-normalized form. It contains composite or multiple values in an attribute (Clubs).
Player Name | Clubs | Agent | Position |
---|---|---|---|
Ronaldo | Juventus, Real Madrid, Manchester United, Sporting Lisbon | Mandez | Forward |
Rooney | DC United, Manchester United, Everton | David | Forward |
Nani | Manchester United, Sporting Lisbon | Mandez | Winger |
1st Normal Form
Each column should contain a single value and each record have to be unique.
Player Name | Club | Agent | Position |
---|---|---|---|
Ronaldo | Juventus | Mandez | Forward |
Ronaldo | Real Madrid | Mandez | Forward |
Ronaldo | Manchester United | Mandez | Forward |
Ronaldo | Sporting Lisbon | Mandez | Forward |
Rooney | DC United | David | Forward |
Rooney | Manchester United | David | Forward |
Rooney | Everton | David | Forward |
Nani | Manchester United | Mandez | Winger |
Nani | Sporting Lisbon | Mandez | Winger |
2nd Normal Form
Now, we are in 1NF. To make it 2NF, we need to define primary key.
Player Id | Player Name | Agent | Position |
---|---|---|---|
1 | Ronaldo | Mandez | Forward |
2 | Rooney | David | Forward |
3 | Nani | Mandez | Winger |
Player Id | Club |
---|---|
1 | Juventus |
1 | Real Madrid |
1 | Manchester United |
1 | Sporting Lisbon |
2 | DC United |
2 | Manchester United |
2 | Everton |
3 | Manchester United |
3 | Sporting Lisbon |
3rd Normal Form
There are data anomalies and inconsistencies in current design, for example:
+ If we deleted the "Ronaldo" and "Nani", we would inevitably delete the agent "Mandez" completely from the database.
+ We cannot add a new agent to the database unless we also add a player.
+ If "Mandez" retired from agent job, we would have to change it in all records in which he appears.
So, we will remove this transitive functional dependency in 3NF. Agents will be moved to another table and its id will be used as foreign key in Player table.
Player Id | Player Name | Agent Id | Position |
---|---|---|---|
1 | Ronaldo | 1 | Forward |
2 | Rooney | 2 | Forward |
3 | Nani | 1 | Winger |
Agent Id | Name |
---|---|
1 | Mandez |
2 | David |
We can optimized by moving Clubs into "Clubs" table and linking its id as foreign key in "Players_Clubs" table.