T-SQL Pattern Matching

We were recently migrating some user records over to a new system and found that the older system had allowed a number of rogue characters into the username field (auto-generated based on firstname/surnname). So I needed to query the old SQL Server 2000 database to find any usernames that wouldn’t be accepted by the new system. At first it seemed that a huge list of LIKE tests would be required, but ideally what I needed was a way to use a simple regular expression to match any username that didn’t contain acceptable characters. The T-SQL PATINDEX allows you to do simple expression matching so I was able to pull out the records quite easily.

SELECT * FROM [usertable]
WHERE PATINDEX('%[^- a-zA-Z0-9]%', username) > 0

Bascially the pattern starts and ends with a wildcard % and then contains a range of acceptable characters, which is negated with the ^ operator, thus ensuring it only returns usernames that do not contain the acceptable characters. Since PATINDEX returns the starting position of the first match, the statement is looking for results where PATINDEX returns > 0.

This statement could also be written using basic LIKE syntax but the negation makes it confusing to read.

SELECT * FROM [usertable]
WHERE username LIKE ('%[^- a-zA-Z0-9]%')

See the PATINDEX documentation over on MSDN

Leave a Reply

Your email address will not be published. Required fields are marked *