Tutorials‎ > ‎

Basics Of Database Table Design - Pt3

posted 26 Sep 2012, 02:02 by Alistair Hamilton   [ updated 21 Oct 2012, 04:48 ]
Basics of database table design
Last time, we started looking at some example data with a view to explaining the issues that frequently arise when database table design is not considered fully. We Introduced the concept of data 'normalisation' as a mechanism to maintain data intergrity, reduce errors, minimise data storage requirements and provide an element of 'future proofing' should the database design need to expand at a later date.

An example table was introduced (reproduced below for convenience) which demonstrated a number of issues that would cause problems. What we are going to do now, is discuss how these problems are resolved by making sure our tables adhere to the 'normalisation' rules introduced in Basics of Database Table Design - Pt2.

First up is the First Normal Form but before that, please refresh your memory of our example data in the table below.


Employees

Project 1

Project 2

Project 3

John Smith

CD Records Database

Internet Plug-ins


Jill Jones

Traffic Control Software

CD Records Database

Case Manangement

Thomas Harding

Traffic Control Software



Lee Walsh

Internet Plug-ins

Case Management

Satellite Navigation

Joe Jordan

Satellite Navigation

Traffic Control Software


Neil Watson

MP3 Development

Multi-media Training


Louise Harrower

Spreadsheet Utilities

Office Integration




To recap, the table above suffers from the following problems:
  1. Querying this table will be difficult. For example, how do we go about calculating the number of people working on a particular project?
  2. In the various 'Project' columns, the title of various projects are being repeated. This can lead to data input errors through misspelling and so on.
  3. Jill Jones and Lee Walsh are currently involved in three projects. What happens if they are assigned a forth? How is that information recorded? The only way would be to add an additional field to the table.
  4. As some people do not work on all current projects, the corresponding 'cell' will be empty. However, in database table design, this lack of information still takes up storage space. The bigger the table, the more storage space is wasted.
The solution to this is to ensure that the table is in First Normal Form by eliminating repeating groups and putting each into a separate table and connecting them with a one-to-many relationship.

In our example, the project information is in repeating groups, i.e. the Project 1, Project 2 and Project 3 columns. We need to pull that information out so that we end up with two tables: one holding the Employees data and the other, listing all the projects. Like so....

Employees


Project

John Smith


Internet Plug-ins

Jill Jones


CD Records Database

Thomas Harding


Case Management

Lee Walsh


Traffic Control Software

Joe Jordan


Multi-media Training

Neil Watson


Satellite Navigation



MP3 Development



Office Integration



Spreadsheet Utilities



We now have two tables that are related. (As a side note: the contents of the tables solely define what the table holds which implies that these simple example tables are in 2nd Normal Form – that's the topic for Pt 4).

The important point to note is that when Jill Jones is assigned another project all that needs to happen is a new record is placed in the Projects table.  There is NO NEED to redesign the Project table each time – simply add another record instead.

The relationship between the Employees and the Projects tables handles the link between them. In essence, for each employee in the example table, there may or may not be any number of corresponding projects in the Project table. In other words, John Smith may work on three projects, yet Lee Walsh may not be assigned any. Irrespective of the combination, it is handled by the table design and the relationship between the tables concerned. I’ll be discussing the relationship between tables in more detail later in the series.

Similarly, if a new employee joins the firm, their details are simply added to the Employee table. Their inclusion doesn't affect the Project table which will not have any 'empty storage'.

Hopefully, you'll see from this simple example, that just getting your tables into First Normal Form, provides a major improvement, automatically eliminating numerous potential issues with your database design.

Just to recap:

First Normal Form involves eliminating repeating groups (columns that hold the same or similar type of data) of information and placing all that data into its own table. This eliminates repeating data; removes empty cells thereby reducing data storage requirements; reduces data input errors by having specific data stored in one place and one place only; and, perhaps equally important, makes it much easier to incorporate new data without having to redesign the table each time.

In the next installment, I'll be moving on to the Second Normal Form where the fields within each record of a table must uniquely define the record.