More Case Insensitive Data

Published: 05 July 2016
on channel: PG Casts by Hashrocket

471

In this episode, we'll be looking at a unique index we can use when working with case-insensitive data. This episode is brought to you by Hashrocket, a consultancy with expertise in PostgreSQL - visit us at https://hashrocket.com and www.pgcasts.com

Transcript:

In a previous episode, we explored some considerations when querying data
that is considered to be case-insensitive.

We started with a `users` table that looked like this:

```sql
\d old.users
```

We have a unique index on the email field, but because we expect most
queries to use the `lower()` function on email, we decided to add another
index. We can see that on our current table:

```sql
\d users
```

With this table, we can do indexed case-insensitive queries.

Both of these tables have a another major issue though. They will both allow
us to insert records with duplicate emails by using different casing for those emails.

To see what this looks like, let's look at the first record in both tables:

```sql
select * from old.users, users limit 1;
```

Now let's insert some records that should violate the uniqueness that we'd
like our table to be enforcing:

```sql
insert into old.users (email) values ('[email protected]');
```

```sql
insert into users (email) values ('[email protected]');
```

Yikes. We are not sufficiently enforcing uniqueness here. We need a better
index. Let's focus on the `users` table ignoring the `old.users` table.

If we take another look at the description of the table, we may notice the
problem.

```sql
\d users
```

The `users_lower_email_idx` is not a unique index. If we update it to
enforce uniqueness on the lowercase version of the email column, then we can
be sure we won't end up with any duplicate records.

So, what if we just try adding another index with this additional
constraint?

```sql
create unique index users_unique_lower_email_idx on users (lower(email));
-- fails
```

Oh right, we need to clear out any duplicates first which means we need to
remove that record we inserted earlier.

Also, we should replace the existing index to reduce the overhead on our
table. To do this, we'll want to start a transaction to ensure data
consistency as the indexes are being swapped.

```
begin;

delete from users where email = '[email protected]';

drop index users_lower_email_idx;

create unique index users_unique_lower_email_idx on users (lower(email));

commit;
```

We can see our new index in action by trying what is now an invalid insert
statement:

```sql
insert into users (email) values ('[email protected]');
```

Our case-insensitive email column is now even more robust.

Until next time, thanks for watching.

Watch video More Case Insensitive Data online without registration, duration hours minute second in high quality. This video was added by user PG Casts by Hashrocket 05 July 2016, don't forget to share it with your friends and acquaintances, it has been viewed on our site 471 once and liked it 6 people.

24,230

937