Development Resource Project
Enforce Coding Standards with PHP_CodeSniffer and Eclipse IDE on Ubuntu Linux
ENUMs, User Preferences, and the MySQL SET Datatype
Using Multi-Byte Character Sets in PHP (Unicode, UTF-8, etc)
Installing Xdebug for use with Eclipse or Netbeans on Linux
JQuery Venetian Blinds Transition Effect

Optimising MySQL

Wednesday, 25 June 08, 12:44 pm

Natural vs Surrogate Primary Keys

The primary key (PK) of a table is a column or set of columns which uniquely identifies each row of the table. A natural primary key uses actual properties (ie columns) of an item. For instance, if you have a table of cars, you might choose to use the registration number as primary key. This would be a natural primary key.

A surrogate primary key on the other hand is one which has been created specifically for the purpose. Generally this takes the form of an auto-incrementing integer column.

Choosing between a Natural and Surrogate Primary Key

There are reasons for and against each type of primary key. Here are some factors to consider:
  • Key length - ideally you want a short value for your PK: numeric fields are best; integers best of all.
  • Record clustering - some database engines, such as Oracle and InnoDB, cluster records on disk according to the primary key, making look-ups on the PK potentially very fast. If you use a natural primary key that is often used in lookups, this can provide real performance benefits. Other engines, such as MyISAM, do not use record clustering, and would not see these benefits.
  • Foreign key lookups - if you use a natural primary key, then other tables will use this as a foreign key (FK). This can simplify some queries, by avoiding a join. Say for instance you used a car's registration number as PK in your cars table, and you have other tables such as repairs that link to this table. It would be possible to retrieve all rows from repairs for a particular registration number, without having to join to the cars table.
  • Long-term uniqueness - changing a primary key is no simple matter. The update has to cascade down through other tables that have it as an FK, and all those other tables' indexes must be rebuilt. It's something you really want to avoid as much as possible. Using a car registration number appears to be a great choice, but in fact it does not uniquely identify a single vehicle 100% reliably as it is possible for an owner to change their number plate. The same is true of a lot of properties that appear at first glance to be good candidates for a natural PK.
This guy is quite strongly in favour of surrogate primary keys, and I have to say that nine times out of ten their simplicity of use outweighs the small performance gains that natural keys might bring.

Composite Primary Keys

This chap gives a good defence of composite primary keys.

Here's an interesting analysis of the overhead of using character-based columns in joins.

Resolving Character Set Issues

First thing to do is to make sure you're using the same character set consistently. The following four main places can all potentially be using different character sets: the client, the connection, the server, the database. MySQL will try to convert characters as needed, so if these are inconsistent, it will magnify the charset confusion ten-fold.

Add these lines to /etc/mysql/my.cnf to ensure the MySQL client and MySQL server use UTF-8 by default:
[client] .... default-character-set=utf8
[mysqld] .... default-character-set=utf8 collation-server=utf8_general_ci character-set-server=utf8
The [client] section there only affects the MySQL command line client. In web development, the client is normally the web browser. If you're using UTF-8, you'll need to tell the browser that with a content-type tag in the head section of your site's pages:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
MySQL manual on EXPLAIN syntax
Leave Comment

/xkcd/ Self-Driving Issues

About This Page