Skip links

Selectivity in MySQL

April 27, 2011 at 7:57 PM - by Freek Lijten - 0 comments

Tags: ,

I heard a talk concerning MySQL by Joshua Thijssen some weeks ago where he spoke on Selectivity and MySQL for some minutes. I found this especially interesting and researched some more. This small blogpost is the result.

Selectivity defined

The Selectivity of a column is defined as the total amount of records devided by the cardinality (amount of unique records in a column) of the column. In "MySQL-speak" this means:

( count(column) / count(distinct column) ) * 100

This will net you a percentage, it is not uncommon to not multiply by 100 and describe the cardinality as a number between 0 and 1. Lets assume a user table with 63290 records. The cardinality of the (autoincremented, unique) id column will be 63290. The selectivity is 63290 / 63290 or 1. If however in the same user table there is a user_status column with only 3 different values the fomula would be totally different: 3 / 63290 or 0.000047401.

So why should I care?

Columns with a low selectivity are often columns which are used in queries alot (select user_name from users where user_status = ....) and it therefore may sound logical to add a key to this column. The opposite is true. I couldn't find a definitive answer but the internet agrees that a key on a column with a selectivity lower then 25-30 % is useless, or even worse slowing the queries. For a column with a selectivity below those percentages a full table scan would be faster and more efficient than using keyed columns. This sounded so contra-intuitive I had to try.

Expirementing

For this experiment I used a table with users on one of our live implementations. It held 65400 at the moment of querying and contains a column with a type_id which had only 4 different possible options. The type_id has a key on it as well. A simple select count(type_id) from users took 0.06 seconds. When I performed the same query with ignore key(type_id) added at the end it was twice at fast, only 0.03 seconds this time. So when I explicitly instruct MySQL to ignore the key, the query is twice as fast!

The above seems to insinuate MySQL's query optimizer does not take selectivity into account when calculating the fastes query. The internet however seems to think otherwise. I don't know what is really happening, I just know I will think twice before adding a key to a column the next time the problem arises.

Share this post!

Comments

Leave a comment!

Italic and bold

*This is italic*, and _so is this_.
**This is bold**, and __so is this__.

Links

This is a link to [Procurios](http://www.procurios.nl).

Lists

A bulleted list can be made with:
- Minus-signs,
+ Add-signs,
* Or an asterisk.

A numbered list can be made with:
1. List item number 1.
2. List item number 2.

Quote

The text below creates a quote:
> This is the first line.
> This is the second line.

Code

A text block with code can be created. Prefix a line with four spaces and a code-block will be made.