Primary Keys vs GUIDs

Just thought I would post some code having read Savvy Duck’s comments about the above. Although the project I refer to below was greenfield and didn’t require any kind of database migration, I never encountered any of the performance problems mentioned in some of the related articles.

We were using an Oracle 10g database however, and I’ll report an update if I see any issues with my latest project which is on MySQL and seems to have a bit of a history in this department. But it really was never an issue in any way that we recognized.

The likelihood of a primary key clash was calculated as 1 in 2.27 x 10exp32 for every one million records.

Anyway, the code. We were mapping to Java using Hibernate (avoiding the primary key problem mentioned here) like so:

public abstract class BasePersistentObject implements Serializable
{
    // will be overrriden with data read from database
    @Id
    private String id = UUID.createUUID();
}

I can provide the above UUID code if anyone wants to have a play with this implementation.

Instead of using the standard auto-increment Sequence then, the primary key was implemented with the following function:

/**
 * Returns a UID string.
 *
 * This is string based on a randomly generated bit sequence just like in the app server.
 */
CREATE OR REPLACE FUNCTION f_get_uid RETURN VARCHAR2 IS
BEGIN
	RETURN UTL_RAW.CAST_TO_VARCHAR2(
	           UTL_RAW.TRANSLATE( DBMS_CRYPTO.RANDOMBYTES(24),
	                              uid_util.uid_from_set,
	                              uid_util.uid_to_set ) );
END;
/

-- defines two constants used by f_get_uid() to generate UID strings;
CREATE OR REPLACE PACKAGE uid_util
IS
	uid_from_set CONSTANT RAW(256) := UTL_RAW.XRANGE(hextoraw('00'), hextoraw('FF'));
	uid_to_set   CONSTANT RAW(256) := UTL_RAW.CONCAT(
		UTL_RAW.CONCAT(
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('A'), UTL_RAW.CAST_TO_RAW('Z')),
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('a'), UTL_RAW.CAST_TO_RAW('z')),
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('0'), UTL_RAW.CAST_TO_RAW('9')),
			UTL_RAW.CAST_TO_RAW('-'),
			UTL_RAW.CAST_TO_RAW('_')
		),
		UTL_RAW.CONCAT(
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('A'), UTL_RAW.CAST_TO_RAW('Z')),
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('a'), UTL_RAW.CAST_TO_RAW('z')),
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('0'), UTL_RAW.CAST_TO_RAW('9')),
			UTL_RAW.CAST_TO_RAW('-'),
			UTL_RAW.CAST_TO_RAW('_')
		),
		UTL_RAW.CONCAT(
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('A'), UTL_RAW.CAST_TO_RAW('Z')),
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('a'), UTL_RAW.CAST_TO_RAW('z')),
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('0'), UTL_RAW.CAST_TO_RAW('9')),
			UTL_RAW.CAST_TO_RAW('-'),
			UTL_RAW.CAST_TO_RAW('_')
		),
		UTL_RAW.CONCAT(
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('A'), UTL_RAW.CAST_TO_RAW('Z')),
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('a'), UTL_RAW.CAST_TO_RAW('z')),
			UTL_RAW.XRANGE(UTL_RAW.CAST_TO_RAW('0'), UTL_RAW.CAST_TO_RAW('9')),
			UTL_RAW.CAST_TO_RAW('-'),
			UTL_RAW.CAST_TO_RAW('_')
		)
	);
END;
/
Sunday, March 1st, 2009 Database

1 Comment to Primary Keys vs GUIDs

  1. I’ll second the vote for UUIDs as primary keys.

    Several years ago, I had a customer with to copies of a database – one “current”, and one where they ran another copy of the application to try to catch up on data entry for past years.

    Of course, once they finished the “past years” data entry, they wanted to merge the two databases. Of course, the two databases both used sequences to generate primary keys, and both started the sequences at the same starting values.

    Also, of course, I had to advise the customer that it wasn’t worth in terms of cost to do the merge, or the danger to the integrity of the data.

    Ever since, I’ve used UUIDs whenever I can.
    I’ve never seen a performance hit for doing this, but then, I’ve never worked on a VLDB

  2. greybeardedgeek on March 25th, 2009

Leave a comment

You must be logged in to post a comment.