Introduction to Database Normalization

Having consistent, non-redundant, and relatable data in the database is very beneficial. For this purpose, the database normalization process is carried out. We are very well aware that the database is a system that organizes any data in a structured manner. Hence, the database normalization process enables you to manage databases’ data in a well-structured and precise fashion. 

What is Database Normalization?

The database normalization process eliminates data redundancy and enhances data integrity. Data redundancy implies repetitive data. It occupies a larger space in the disk and results in wastage of disk memory. If the data in one single place is modified, it has to be changed in all locations wherever it is present. Hence, having redundant and repetitive data in the database tables is not feasible.

Another term, data integrity, means completeness and consistency of data. The database data must be accurate and meaningful. So, to ensure data integrity and non-repetitiveness, database normalization is employed. 

Along with eliminating data redundancy and enhancing data integrity, the database normalization process eliminates various database anomalies, like insertion, deletion, and update as well. An anomaly in the database weakens data integrity. 

The normalization’s primary objective is to divide larger tables or relations in the database into smaller ones and introduce the relationship between them. Each table or relation should have single attribute values, called atomic values. The database table should have only the relatable data and not any other irrelevant data.

Objectives of Database Normalization

When you normalize the database, make sure you attain the following objectives:

  • The data in the database should be stored logically. The larger tables are grouped into smaller ones. Every smaller group must reflect the part of the larger group. 
  • Data redundancy should be removed as it consumes a more massive amount of database space. 
  • Make sure your database is consistent. Any modifications in the database like insertion, deletion, and an update should not harm data integrity. 
  • If the same data is present at distinct locations, you need to modify it in all areas. While normalizing the database, ensure that you need to update the data only at a single location. 

Anomalies in Database Normalization

In a database management system (DBMS), there are three different kinds of anomalies, namely, insertion, deletion, and update. These three anomalies affect the data integrity principle. 

Let us discuss each of the three anomalies. But, let us first consider a relation or table Employee. It has four attributes, Emp_ID, Emp_Name, Emp_Add, and Ep_Dept, that store employees’ data. 

Employee

Emp_IDEmp_NameEmp_AddEmp_Dept
301JohnCaliforniaD01
301JohnCaliforniaD022
323SammyWashington DCD08
366NickChicagoD10
366NickChicagoD14

The above table stores Employee data. But, it is not in a normalized form. When the table is not in a normalized way, you may face several problems while interesting new data, deleting any existing data, or updating any current data.  

  1. Insert:

The insert anomaly does not allow users to add any new data into the database. This is due to the absence of some specific data. Consider the above Employee relation. If any new employee is joining the company, and no department is assigned to him or her, that particular employee’s data cannot be inserted into the table. This is because we cannot insert NULL into the Emp_Dept. 

  1. Delete:

Another anomaly is the deletion anomaly. In this circumstance, deleting data would not be feasible as it may result in data loss. Consider the above Employee table. Suppose the department D08 closes, the data related to Sammy would be deleted from the Employee table. Hence, the company will lose Sammy’s data as he works in only one department.

  1. Update:

An update anomaly is another element that produces data inconsistency. We take an example of the Employee table to understand an update anomaly. Consider the employee John, who works in D01 and D022 departments. If you wish to modify John’s address, you need to update both rows of D01 and D022; it can harm the data integrity principle. If the address is changed in one row and the other row remains the same as the previous one, it ends in an update anomaly. 

The database normalization is useful in eliminating all the above three anomalies that result in a weak database table. There are several rules or forms of normalization. Before diving deep into the normalization forms, we shall first learn distinct kinds of keys in the database system. You must know the types of keys before you learn the database normalization.

READ  Top 10 Best Free Firewall Software for Windows

Keys in Database

The keys in the database play a vital role in identifying the data present in it. Using keys, you can find any data from the table quickly and easily. The large tables are divided into smaller ones, and keys are used to connect the smaller tables. 

database normalization

Let us see the different types of keys used in the DBMS.

  1. Primary Key:

The primary key is an attribute of the table that identifies any row or tuple uniquely. You must choose the primary key that will uniquely find any data from the table. 

Consider a relation Employee. It has attributes like Emp_ID, Emp_Name, Emp_Add, Passport_Number, and License_Number. The primary key of the Employee relation will be the Emp_ID, as it will uniquely identify every employee’s data. Additionally, the Passport_Number and License_Number can also serve as primary keys as they are unique for every employee. 

  1. Candidate Key:

The candidate key of any table is a set of minimal attributes and can identify any row uniquely in the relation. There can be single or more candidate keys for a single relation. 

Consider the above relation of Employee. We saw that the primary key is the Emp_ID, which is unique and non-repetitive for every employee. The other two attributes, Passport_Number and License_Number, are also non-repetitive. So, they both can serve as candidate keys.

  1. Super Key:

As the primary key and candidate key identifies every tuple uniquely, the super key also helps find the table’s unique tuple. The candidate key is the subset of the super key. There can be one or many super keys. 

We shall use the same Employee relation to have a clear idea about the super key. The Emp_ID attribute can uniquely find out any employee’s data. The Emp_Name attribute cannot be used as the primary key, as two employees can have the same name. But, the combination of the Emp_ID and Emp_Name can find employee’s data uniquely. So, (Epm_ID, Emp_Name) serves as super keys for the Employee relation. 

The Passprt_Number and License_Number are also super keys of the Employee relation. 

  1. Foreign Key:

The foreign key is quite different from the above three keys. It is used to establish a connection between two relations. Consider two relations A and B. Suppose any attribute in the relation A is the primary key of the relation B; that attribute is referred to as the foreign key.  

We shall look at the simple example to understand the foreign key concept. Let us take the example of employees in the company. Every employee is assigned to different departments. Hence, we use two relations, Employee and Department. 

We define Employee relations as Employee (Emp_ID, Emp_Name, Passport_Number, License_Number, Dept_ID), and Department relation as Department (Dept_ID, Dept_Name). 

In the Employee relation, Emp_ID is the primary key, whereas Dept_ID is the department relation’s primary key. The attribute Dept_Id is one attribute in the Employee relation, which is the primary key in the Department relation. Hence, the Dept_ID serves as the foreign key. 

  1. Composite Key:

A composite key is a group of attributes that uniquely finds very employee’s data from the relation. The composite key is the combination of two and more than two attributes.  

From the above Employee relation Emp (Emp_ID, Emp_Name, Passport_Number, License_Number), the composite key is (Emp_name, Emp_ID). 

Before the data normalization forms, another concept you must learn is the functional dependencies along with the key types. Let us know functional dependencies in detail. 

Functional Dependencies in the Database

E.F. Codd developed the concept of functional dependency to avoid or eliminate redundant data. The functional dependency is the relationship between any two attributes or columns of the same relation. In other words, one attribute uniquely finds another attribute. Both attributes belong to the same relation. 

For example, consider two columns, A and B, of the same table. The functional dependency exists between A and B only if column A uniquely finds column B. It is represented as A -> B and is read out as B is functionally dependent on A. You can refer to A as a determinant and B as a dependent. 

There are different types of functional dependencies. Let us see some of these functional dependencies here.

  1. Trivial Functional Dependency:

Consider two attributes, A and B, of the same relation. The trivial functional dependency holds only if B is the subset of A. 

A -> B, if B is the subset of A. 

Consider an Employee relation. Take two attributes, Emp_ID and Emp_Name. The attribute Emp_Id is functionally dependent on {Emp_ID, Emp_Name}. 

{Emp_ID, Emp_Name} -> Emp_ID

Here, Emp_ID is the subset of the {Emp_ID, Emp_Name}. Hence, {Emp_ID, Emp_Name} -> Emp_ID is a trivial functional dependency. 

  1. Non-Trivial Functional Dependency:

The non-trivial functional dependency is the opposite of the trivial functional dependency. Take two columns, P and Q, of the same relation. The non-trivial functional dependency holds if Q is not the subset of P. 

P -> Q, if Q is not the subset of P

If P intersection Q is NULL, then it is complete non-trivial functional dependency. 

Consider the three columns of the Employee relation, Emp_ID, Emp_Name, and Emp_Add. 

READ  15 Best iPhone VPNs for 2020

Emp_ID -> Emp_Name 

The above dependency is a non-trivial functional dependency, as Emp_Name is not the subset of Emp_ID. 

  1. Transitive Functional Dependency:

A transitive functional dependency involves two distinct functional dependencies. It is formed indirectly involving two dependencies. Consider three attributes, A, B, and C, of the same table. The trivial dependency A -> C  holds, if

  • A -> B
  • B does not hold A
  • B -> C

Consider the Students table. Take three columns, Stud_ID, Stud_Name, and Stud_Age. The transitive functional dependency Stud_ID -> Stud_Age hold, if

  • Stud_Id -> Stud_Name
  • Stud_Name does not hold Stud_Id.
  • Stud_Name -> Stud_Age

Hence, if we know the student’s ID, we can find out his or her age. 

  1. Multi-Valued Dependency:

For a single determinant in any functional dependency, there are two or more dependents. Consider the functional dependency X -> Y. In the multi-valued dependency, for every X, there are multiple values of Y. 

To satisfy the multi-valued dependency, there should be a minimum of three attributes in the relation. For example, R(X, Y, Z). If there is a multivalued functional dependency between X and Y, then Y and Z attributes should be independent of each other. 

We have seen all the essential elements required in understanding the database normalization. Now, let us move towards the core topic, the database normalization forms. 

  1. Join Dependency:

Consider a relation R having A, B, C, and D attributes. The relation R is decomposed into the other two relations, R1 and R2, where R1 has A, B, and C attributes, and R2 has C and D attributes. If we join R1 and R2 using the common attribute, C, and the resulting relation is the same as that of the R, then the join dependency exists. 

The join dependency is said to be lossless if the relation’s attributes resulting after the join are the same as the attributes in R. 

Different Forms of Database Normalization

Several forms of database normalization are developed to eliminate data redundancy and improve data integrity. The following are the different database normalization forms, along with their instances. 

database normalization
  1. First Normal Form (1NF):

The first form of database normalization is the first normal form. The database relation is in the first normal form only when all its attributes have a single or atomic value. No attribute of the table should contain multiple values. Let us see an example of how the table complies with the first normal form. 

Consider an Employee table, having Emp_ID, Emp_Name, Emp_Add, and Emp_Mobile_Number as attributes. 

Employee

Emp_IDEmp_NameEmp_AddEmp_Mobile_Number
E01JohnNew Delhi8389097676
E02MichelleMumbai7878689878
E03SamRanchi98765432197656463686
E04OliverKolkata9087654547

The above table is not in the first normal form (1NF); as an employee, Sam has two mobile numbers. Thus, the above table does not comply with the first normal form. The first normal form states that every attribute should hold an atomic value. Therefore, we need to make the above table in the first normal form. 

Employee1

Emp_IDEmp_NameEmp_AddEmp_Mobile_Number
E01JohnNew Delhi8389097676
E02MichelleMumbai7878689878
E03SamRanchi9876543219
E03SamRanchi7656463686
E04OliverKolkata9087654547

The above Employee1 relation is in the first normal form, as every attribute has a single value. 

Before we start the second normal form, you need to know about the non-prime and prime attributes. The prime attribute is the one that is present in the candidate key. And the non-prime attribute is the one that is not present in the candidate key. 

  1. Second Normal Form (2NF):

Another database normalization form is the second normal form. Any database relation or table complies with the second normal form if it satisfies the below conditions:

  • The table must comply with the first normal form.
  • Any non-prime attribute of the table should not be dependent on the proper subset of the candidate key. 

You can have a clear idea about the second normal form after looking at the example below. 

Consider a Teacher relation having Teacher_Id, Subject, and Teacher_Age as attributes. The table is as follows:

Teacher

Teacher_IDSubjectTeacher_Age
T01Java35
T01Data Structures35
T02Python35
T03Data Structures40
T03DBMS40

Here, the candidate keys are {Teacher_ID, Subject}, which uniquely finds teachers’ data. As the attribute Teacher_Age is not in the candidate key, it servers as the non-prime attribute. 

After looking at the above relation, we can conclude that the table complies with the first normal form of database normalization. Every attribute is single-valued. But, it is not in the second normal form. We can identify any teacher’s age only through the Teacher_ID attribute. As Teacher_Age is a non-prime attribute and Teacher_ID is the proper subset of the candidate key, it breaks the 2NF rule. 

To make the above relation in the second normal form, we need to break it into two tables as follows:

Teacher_Age

Teacher_IDTeacher_Age
T0135
T0235
T0340

Subject

Teacher_IDSubject
T01Java
T01Data Structures
T02Python 
T03Data Structures
T03DBMS

The above two relations are in the second normal form. 

  1. Third Normal Form:
READ  Top 25 Salesforce Interview Questions And Answers You Must Prepare In 2020

The third normal form is the nest database normalization form. Any relation is said to be in the third normal is it satisfies the below conditions:

  • The relation should comply with the second normal form (2NF). 
  • Any non-prime attribute should not have a transitive functional dependency on the super key. 

You can also define the third normal form of database normalization as the table should be in the 2NF and the functional dependency X -> Y should follow any one of the following conditions:

  • X should be the superkey of the relation.
  • Y should be the prime-attribute of the relation. 

We shall see how the table complies with the third normal form. Consider an Employee relation having Emp_ID, Emp_Name, Emp_Zip, Emp_State, Emp_City, and Emp_District. This relation is represented as follows:

Employee

Emp_IDEmp_NameEmp_ZipEmp_StateEmp_CityEmp_District
E01John267778MaharashtraKalyanMumbai
E02Sam234567Tamil NaduChennaiM-City
E06Johnny278967UttarakhandPauriBhagwan
E07Rose209876Tamil NaduChennaiUrrapakkam
E08Steve215647Madhya PradeshGwaliorRatan

The super keys of the Employee relation are {Emp_ID}, {Emp_ID, Emp_Name}, {Emp_ID, Emp_Name, Emp_Zip}, and many others. The candidate key is {Emp_ID}. Hence, the EMP_ID is the prime attribute and all other are non-prime attributes. 

You can see that the three attributes, Emp_State, Emp_City, and Emp_District, are functionally dependent on the Emp_Zip attribute. We can find the zip number using the Emp_ID attribute. Hence, Emp_Zip is dependent on Emp_ID. 

The three attributes, Emp_State, Emp_City, and Emp_District, are non-prime attributes. They are indirectly dependent on the Emp_ID attribute, which breaks 3NF rules. To make the Employee relation comply with the third normal form (3NF), we need to break the table into smaller tables as below:

Employee_ID

Emp_IDEmp_NameEmp_Zip
E01John267778
E02Sam234567
E06Johnny278967
E07Rose 209876
E08Steve215647

Employee_Zip

Emp_ZipEmp_StateEmp_CityEmp_District
267778MaharashtraKalyanMumbai
234567Tamil NaduChennaiM-City
278967UttarakhandPauriBhagwan
209876Tamil NaduChennaiUrrapakkam
215647Madhya PradeshGwaliorRatan

The above two tables are in the third normal form. 

  1. Boyce Codd Normal Form (BCNF):

The Boyce Codd Normal form of the database normalization is an extended version of the 3NF. It is referred to as the 3.5NF. Any table or relation complies with the BCNF if it holds the following conditions:

  • The relation must be in the third normal form (3NF). 
  • For any functional dependency A -> B in the relation, A should be the superkey.

You can clearly understand the BCNF concept through an example of employees working in multiple companies’ departments. Consider an Employee relation having Emp_ID, Emp_Nationality, Emp_Department, Dept_Type, No_of_Employees as attributes. The relation holds the following values:

Employee

Emp_IDEmp_NationalityEmp_DepartmentDept_TypeNo_Of_Employees
E01IndianProductionD01250
E01IndianManagementD02300
E02AmericanTechnical SupportD03400
E02AmericanPurchaseD04450

The candidate for the above relation is {Emp_ID, Emp_Dept}. Neither the single Emp_ID attribute can provide the department information, nor the Emp_Dept can determine the employee information. So, the above relation does not comply with the BCNF. To make the above table in BCNF, divide it into three tables as follows:

Employee_Nationality

Emp_IDEmp_Nationality
E01Indian
E02American

Here, Emp_ID is the candidate key. The functional dependency is Emp_ID -> Emp_Nationality. Hence, it is in the BCNF. 

Employee_Department

Emp_DepartmentDept_TypeNo_Of_Employees
ProductionD01250
ManagementD02300
Technical SupportD03400
PurchaseD04450

Here, Emp_Dept is the candidate key, and the functional dependency is {Emp_Dept -> Dept_Type, No_of_Employees}. Therefore, the above relation also complies with BCNF. 

Employee_ID_Dept

Emp_IDEmp_Dept
E01Production
E01Management
E02Technical Support
E02Purchase

For this relation, Emp_ID and Emp_Dept, both are the candidate keys. 

  1. Fourth Normal Form (4NF):

We have seen the multi-valued dependency in the above section. The table is said to be in the fourth normal form if it holds all the below conditions:

  • The relation should comply with the Boyce Codd Normal Form.
  • There should be no multi-valued dependencies between the table’s attributes. 

We shall talk about the fourth normal form using Students relation. The Students relation has three attributes. Stud_ID, Stud_Course, and Stud_Hobby. The table values are as below:

Students

Stud_IDStud_CourseStud_Hobby
S01MathematicsHockey
S01PhysicsTennis
S02ProgrammingHockey
S02ChemistryTennis

The above relation is not in the fourth normal form (4NF), as it has multi-valued dependencies in it. The attributes Stud_Course and Stud_Hobby are dependent on the Stud_ID attribute, which ends in multi-valued dependency. Therefore, to make the above relation in 4NF, we need to break the relation into two different relations as follows:

Students_Course

Stud_IDStud_Course
S01Mathematics
S01physics
S02Programming
S02Chemistry

Students_Hobby

Stud_IDStud_Hobby
S01Hockey
S01Tennis
S02Hockey
S02Tennis

The above two relations are in the fourth normal form. 

  1. Fifth Normal Form:

Another database normalization form is the fifth normal form. It is also referred to as the Project-Join Normal Form (PJ/NF). Any relation complies with the fifth normal form if it satisfies the following conditions:

  • The table is in the fourth normal form (4NF). 
  • There should be no join dependencies, and the joining of relations should be lossless. 

To understand the fifth normal form concept, we shall see an example of the Faculty relation. 

Faculty

Fac_SubjectFac_NameFac_Sem
Computer ScienceJohnSem 1
Computer ScienceOliverSem 1
Electronics and TelecommunicationOliverSem 1
Electronics and TelecommunicationSteveSem 2
MechanicalStephenSem 1

The Faculty relation does not comply with the fifth normal form (5NF). To make the Faculty relation in the fifth normal form, decompose it into three different relations, as shown below:

Faculty1

Fac_SemFac_Subject
Sem 1Computer Science
Sem 1Electronics and Telecommunication
Sem 1Mechanical
Sem 2Electronics and Telecommunication

Faculty2

Fac_SemFac_Name
Sem 1John
Sem 1Oliver
Sem 1Oliver
Sem 2Steve
Sem 1Stephen

Faculty3

Fac_SubjectFac_Name
Computer ScienceJohn
Computer ScienceOliver
Electronics and TelecommunicationOliver
Electronics and TelecommunicationSteve
MechanicalStephen

All the above three relations are in the fifth normal form. 

Conclusion

The database normalization process is extremely useful in eliminating repetitive data and improving the data integrity principle. Additionally, it saves disk space by removing redundant data from the database. When you go through this post, you will understand different database normalization forms and functional dependency types. 

Keys are the primary elements in the database. They allow users to retrieve data quickly and efficiently. We covered types of keys, primary key, candidate key, super key, foreign key, and composite key. 

The content of this article will provide you a brief idea about the distinct functional dependencies types. We have covered six different kinds of database normalization forms, 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF and their respective examples. We hope you will find all the essential information required to study the database normalization forms. 

Recommended Articles