Archives January 2023

Describe normalization and how it is used – Identify considerations for relational data on Azure

Describe normalization and how it is used

When you’re designing a relational database, one of the key approaches you’ll use is normal-ization. This process is all about organizing your data in a way that reduces redundancy and helps avoid any inconsistencies that could lead to confusion or errors.

You’ll achieve normalization by dividing your data into multiple related tables, each one focused on specific data. These divisions are guided by rules known as normal forms. There are several normal forms, each with a specific set of rules. The main ones are first normal form, second normal form, third normal form, Boyce–Codd normal form, fourth normal form, and fifth normal form. Each successive normal form comes with stricter rules.

These are the benefits of normalization:

  • Reduces data redundancy: By breaking down tables and eliminating duplicate data, normalization helps minimize redundancy.
  • Improves data consistency and integrity: Normalization enforces rules that enhance data accuracy and reliability, reducing the likelihood of anomalies.
  • Simplifies query handling: Well-organized data simplifies the process of querying, updating, and managing data.
  • Efficiently uses storage: By reducing redundancy, normalization ensures efficient use of storage.

Normalization is conducted through a series of steps, or what we call normal forms:

  1. First normal form (1NF): Ensures all columns in a table are atomic and each cell can contain only a single value. No repeating groups are allowed.

Example: A Students table with a Subjects column storing comma-separated values

violates 1NF. It should be divided into two tables: Students and Subjects.

  • Second normal form (2NF): Achieved when the table is in 1NF and all nonkey attri-butes are fully functionally dependent on the primary key.

Example: A table with the columns StudentID, Subject, and Teacher, where Teacher depends on Subject, not on StudentID, violates 2NF. To achieve 2NF, divide the table to separate subjects and their teachers.

32 CHAPTER 2   Identify considerations for relational data on Azure

  • Third normal form (3NF): Achieved when the table is in 2NF and all the attributes are functionally dependent only on the primary key.

Example: If a table contains StudentID, StudentAge, and StudentCity, and if StudentAge can be derived from StudentID, it should be moved to a separate table to satisfy 3NF.

  • Boyce-Codd normal form (BCNF): A stronger version of 3NF. A table is in BCNF if it is in 3NF, and for every functional dependency X → Y, X should be a superkey.

Example: In a table with the columns StudentID, CourseID, and Instructor, if Instructor is dependent on CourseID and not on StudentID, the table should be split to satisfy BCNF.

  • Fourth normal form (4NF): Achieved when a table is in BCNF and does not have multi-valued dependencies.

Example: A table with StudentID, Courses, and Hobbies, where Courses and Hobbies are independent multivalued facts about a student, violates 4NF. This should be divided into separate tables.

  • Fifth normal form (5NF): A table is in 5NF if it is in 4NF and there are no join depen-dencies that do not follow from the key constraints.

Example: A complex scenario where a table involves a three-way relationship and break-ing it down into smaller tables and then reconstructing it does not lose information.

Let’s take a look at an example of normalizing the data in Table 2-3.

TABLE 2-3  Unnormalized table

After normalization, this data could be divided into three tables: Student, Course, and Enrollment (see Tables 2-4, 2-5, and 2-6).

TABLE 2-4  Normalized Student table

TABLE 2-5  Normalized Course table

Skill 2.1: Describe relational concepts     CHAPTER 2        33

TABLE 2-6 Normalized Enrollment table

In this example, the Enrollment table establishes the relationship between the Student and Course tables. Now, if the course name changes, you need to update it in only one place, enhancing data consistency.