Tutorials‎ > ‎

Basics Of Database Table Design - Pt2

posted 12 Sep 2012, 03:05 by Alistair Hamilton   [ updated 21 Oct 2012, 04:44 ]
database table design
Hopefully you have managed to take time out from the little exercise I left you with at the end of Basics Of Database Table Design - Pt1. It is a technique that you will have to employ whenever you are designing a database, so practise it often.

As mentioned before, I am deliberately aiming this series of articles at those users of desktop database applications such as MS Access and who may not have, or want to have, the technical knowledge required but still need to be able to build databases. Consequently, I want to avoid being too technical and swamping you with jargon. However, sometimes it is necessary, so let me get it out of the way up front. Feel free to skim over the next section if you wish.

Data Integrity

Fundamental to the design of the database tables is the care that must be taken to ensure the integrity of the data. To give a few examples:
  1. Misspelling of data by data input staff resulting in data being misrepresented by query results.
  2. The creation of orphaned records. For example a table of CDs and related Artists table. If an Artist is deleted from the database, her CDs will still be listed.
  3. Records with many empty fields resulting in increased storage space and a reduction in the efficiency of queries and the like.
  4. Repeating fields – fields that contain similar data. This causes a design problem if the database needs expanding.
  5. Difficulty in expanding the database when new items or tables are required.

Data integrity can be assured if the following rules of normalisation – known as Normal Forms - are adhered to. Normalisation is the process in which database tables are redesigned to ensure data integrity. By incorporating these rules as part of step two in the design process, not only is the data integrity assured but also you will save yourself a lot of work.

Firstly, let me define the five Normal Forms. During the rest of these articles, I'll be looking at examples that will hopefully give a clearer indication how they relate to the process of designing your database tables.
  1. First Normal Form – eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship.
  2. Second Normal Form -  the fields within each record of a table must uniquely define the record. In other words, every field within the table must contain information that is pertinent to that object and specifically the key field and not to something else. This eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key.
  3. Third Normal Form – eliminates functional dependencies on non-key fields by putting them into a separate table. At this stage, all non-key fields are dependent on the key, the whole key and nothing but the key.
  4. Forth Normal Form – separates independent multi-valued facts stored in one table into separate tables.
  5. Fifth Normal Form – breaks out data redundancy that is not covered by any of the previous normal forms.

The Forth and Fifth Normal Forms are somewhat esoteric and are of more interest to academia. As such I will not be considering them here. The vast majority of databases that you design (and me as well) will be well served by getting them into Third Normal Form. Indeed, a relational database is often said to be 'normalised' when the underlying tables are in Third Normal Form.

If you are interested in learning more of the technical/theoretical aspects, then Google is your friend. There are a host of articles that will go into much more detail than this series of articles will. Search for terms such as 'Normalisation' or 'Relational Database Design'. You might like to start off with this Wikipedia entry: http://en.wikipedia.org/wiki/Database_normalization.

Now that that is out of the way, let me finish this particular episode by describing an example of the first of these.

First Normal Form

Take a look at the following table which represents the software projects worked on by members of staff and see if you can picture any potential problems:


Employees

Project 1

Project 2

Project 3

John Smith

CD Records Database

Internet Plug-ins


Jill Jones

Traffic Control Software

CD Records Database

Case Manangement

Thomas Harding

Traffic Control Software



Lee Walsh

Internet Plug-ins

Case Management

Satellite Navigation

Joe Jordan

Satellite Navigation

Traffic Control Software


Neil Watson

MP3 Development

Multi-media Training


Louise Harrower

Spreadsheet Utilities

Office Integration



In this case, it is clear that any one particular person can work on one or more projects at any given time. However, the way the table is constructed leaves us with a number of issues:

  1. Querying this table will be difficult. For example, how do we go about calculating the number of people working on a particular project?
  2. In the various 'Project' columns, the title of various projects are being repeated. This can lead to data input errors through misspelling and so on.
  3. Jill Jones and Lee Walsh are currently involved in three projects. What happens if they are assigned a forth? How is that information recorded? The only way would be to add an additional field to the table.
In the next instalment, we'll discuss these issues and how to correct them before moving on to the Second Normal Form.