NORMALIZATION

NORMALIZATION:

153.There are three types of database anomalies:

Update anomaly.

 

The same information can be expressed on multiple rows; therefore updates to the relation may result in logical inconsistencies. For example, each record in an “Employees’ Skills” relation might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee may need to be applied to multiple records (one for each skill). If the update is only partially successful – the employee’s address is updated on some records but not others – then the relation is left in an inconsistent state. Specifically, the relation provides conflicting answers to the question of what this particular employee’s address is. This phenomenon is known as an update anomaly.

Insertion anomaly

There are circumstances in which certain facts cannot be recorded at all. For example, each record in a “Faculty and Their Courses” relation might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code. Therefore, we can record the details of any faculty member who teaches at least one course, but we cannot record a newly hired faculty member who has not yet been assigned to teach any courses, except by setting the Course Code to null. This phenomenon is known as an insertion anomaly.

Deletion anomaly

 Under certain circumstances, deletion of data representing certain facts necessitates deletion of data representing completely different facts. The “Faculty and Their Courses” relation described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member, unless we set the Course Code to null. This phenomenon is known as a deletion anomaly.

154.What is database normalization?

Database Normalization is nothing but technique of designing the database in structured way to reduce redundancy and improve data integrity.

Database Normalization is used for following Purpose:

  • To Eliminate the redundant or useless data
  • To Reduce the complexity of the data
  • To Ensure the relationship between tables as well as data in the tables
  • To Ensure data dependencies and data is logically stored.

155.What are different check points for normalizing data?

There are following checkpoints to normalize the data :

  • Arrangement of data into logical groups.
  • Minimize the Duplicate data.
  • Organize  the data in such way that when modification needed then there should be only one place modification required.
  • User can access and manipulate data quickly and efficiently
156.Why Denormalization is promoted in Universe Designing?
  •  In a relational data model, for normalization purposes, some lookup tables are not merged as a single table.
  • In a dimensional data modeling (star schema), these tables would be merged as a single table called DIMENSION table for performance and slicing data.
  • Due to this merging of tables into one large Dimension table, it comes out of complex intermediate joins.
  • Dimension tables are directly joined to Fact tables.
  • Though, redundancy of data occurs in DIMENSION table, size of DIMENSION table is 15% only when compared to FACT table.
  • So only Denormalization is promoted in Universe Designing.
  • De-Normalization is a process of adding redundancy to the data.
  • This helps to quickly retrieve the information from the database
157.What is Normalization, First Normal Form, Second Normal Form ,Third Normal Form?
 Normalization is process for assigning attributes to entities
  • Reduces data redundancies
  • Helps eliminate data anomalies
  • Produces controlled redundancies to link tables
Normalization is the analysis of functional dependency between attributes /data items of user views,It reduces a complex user view to a set of small and stable subgroups of fields / relations 
INF:
Repeating groups must be eliminated,
Dependencies can be identified,
All key attributes defined,
No repeating groups in table 
2NF:
The Table is already in INF
Includes no partial dependencies–
No attribute dependent on a portion of primary key,
Still possible to exhibit transitive dependency,Attributes may be functionally dependent on non-key attributes
3NF:
The Table is already in 2NF,
Contains no transitive dependencies.
normalization

images

database-normalization-56-638

mca-iidbmsuivstructured-query-language-24-638Modification+AnomalyNormalizationnormalization_steps1slide_2slide_6 (1)slide_6

 

158.Is this statement TRUE or FALSE? all databases must be in third normal form?

In general all organization data bases are normalised to 3nf in order to remove redundancy and efficient access.A database can also be created without normalisation.Hence it is not a mandatory that a database should be in 3nf.

159.When should you consider denormalization?

Denormalization is used when there is a lot of tables involved in retrieving data. Denormalization is done in dimensional modelling used to construct a data warehouse. This is not usually done for databases of transactional systems.

 

  • Denormalization is used when there are a lot of tables involved in retrieving data.
  • Denormalization is done in dimensional data modelling used to construct a data ware house.
  • This is not usually done for data bases of transactional systems.

What is first normal form ? Explain with example.

The first normal form is the normal form of database where data must not contain repeating groups.The database is in First normal form If,

1.It contains only automic values.

Atomic values:- The Single cell have only single value

2.Each Record needs to be unique and there are no repeating groups.

Repeating Groups:- Repeating group means a table contains 2 or more values of columns that are closely related.

Example:

Consider following table which is not normalized:

Employee Table:

Employee No Employee Name Department
1 Amit OBIEE,ETL
2 Divya COGNOS
3 Rama Administrator

To bring it in to first normal form We need to split table into 2 tables.

First table:Employee Table

Employee No Employee Name
1 Amit
2 Divya
3 Rama

Second Table: Department table

Employee No Department
1 OBIEE
1 ETL
2 COGNOS
3 Administrator

We have divided the table into two different tables and the column of each table is holding the automic values and duplicates also removed.

 

 

What is second normal form?

An entity is in the second normal form if all of its attributes depend on the whole (primary) key. In relational terms, every column in a table must be functionally dependent on the whole primary key of that table. Functional dependency indicates that a link exists between the values in two different columns.

If the value of an attribute depends on a column, the value of the attribute must change if the value in the column changes. The attribute is a function of the column. The following explanations make this more specific:

  • If the table has a one-column primary key, the attribute must depend on that key.
  • If the table has a composite primary key, the attribute must depend on the values in all its columns taken as a whole, not on one or some of them.
  • If the attribute also depends on other columns, they must be columns of a candidate key; that is, columns that are unique in every row.
  • If you do not convert your model to the second normal form, you risk data redundancy and difficulty in changing data. To convert first-normal-form tables to second-normal-form tables, remove columns that are not dependent on the primary key.

 

 Explain Second Normal Form with example

The data is said to be in second normalized form If,

1.It is in First normal form

2.There should not be any partial dependency of any column on primary key.Means the table have concatenated primary key and each attribute in table depends on that concatenated primary key.

3.All Non-key attributes are fully functionally dependent on primary key.If primary is is not composite key then all non key attributes are fully functionally dependent on primary key.

Example:

Let us consider following table which is in first normal form:

Employee No Department No Employee Name Department
1 101 Amit OBIEE
2 102 Divya COGNOS
3 101 Rama OBIEE

In above example we can see that department .Here We will see that there is composit key as{ Employee No,Department No}.Employee No is dependent on Employee Name and Department is dependent on Department No.We can split the above table into 2 different tables:

Table 1:Employee_NO table

Employee No Department No Employee Name
1 101 Amit
2 102 Divya
3 101 Rama

Table 2:Department table

Department No Department
101 OBIEE
102 COGNOS

Now we have simplified the table in to second normal form where each entity of table is functionally dependent on primary key.

What is third normal form?

An entity is in the third normal form if it is in the second normal form and all of its attributes are not transitively dependent on the primary key. Transitive dependence means that descriptor key attributes depend not only on the whole primary key, but also on other descriptor key attributes that, in turn, depend on the primary key. In SQL terms, the third normal form means that no column within a table is dependent on a descriptor column that, in turn, depends on the primary key.

For 3NF, first, the table must be in 2NF, plus, we want to make sure that the non-key fields are dependent upon ONLY the PK, and not other non-key fields for its existence. This is very similar to to 2NF, except that now you are comparing the non-key fields to OTHER non-key fields. After all, we know that the relationship to the PK is good, because we established that in 2NF.

 Explain Third Normal Form with example.

The database is in Third normal form if it satisfies following conditions:

1.It is in Second normal form

2.There is no transitive functional dependency

Transitive Dependency:

When table 1 is Functionally dependent on table 2. and table 2 is functionally dependent on table 3 then.table 3 is transitively dependent on table 1 via table 2.

Example:

Consider following table:

Employee No Salary Slip No Employee Name Salary
1 0001 Amit 50000
2 0002 Divya 40000
3 0003 Rama 57000

In above table Employee No determines the Salary Slip No.And Salary Slip no Determines Employee name.Therefore Employee No determines Employee Name.We have transitive functional dependency so that this structure not satisfying Third Normal Form.

For That we will Split tables into following 2 tables:

Employee table:

Employee No Salary Slip No Employee Name
1 0001 Amit
2 0002 Divya
3 0003 Rama

Salary Table:

Salary Slip No Salary
0001 50000
0002 40000
0003 57000

Following are 2 Advantages of 3rd normal form:

1.Amount of data duplication is removed because transitive dependency is removed in third normal form.

2.Achieved Data integrity.

Explain Boyce Code Normal Form with example.

BCNF Normal form is higher version of third normal form.This form is used to handle analomies which are not handled in third normal form. BCNF does not allow dependencies between attributes that belongs to candidate keys.It drops restriction of the non key attributes from third normal form.

Third normal form and BCNF are not same if following conditions are true:

1.The table has 2 or more candidate keys

2.At least two of candidate keys are composed of more than 1 attribute

3.The keys are not disjoint.

Example:

Address-> {City,Street,Zip}

Key 1-> {City,Zip}

Key 2->{City,Street}

No non key attribute hence this example is of 3 NF.

{City,Street}->{zip}

{Zip}->{City}

There is dependency between attributes belonging to key.Hence this is BCNF.