MySQL Regex: Complete Guide from Basics to Performance

目次

1. Introduction

Using Regular Expressions in MySQL

MySQL is a powerful database management system for searching and manipulating data, and it allows the use of regular expressions (REGEXP) to enable more flexible searches. By leveraging regular expressions, you can efficiently extract data that matches specific patterns, as well as format and validate data.

Benefits of Using Regular Expressions in MySQL

Using MySQL’s regular expressions provides the following benefits:
  • Advanced searching capabilities: Easily extract data containing specific strings or patterns.
  • Check data integrity: Useful for validating input data (e.g., email address format checks).
  • More powerful than the LIKE operator: Allows flexible definition of complex search conditions compared to LIKE with wildcards (% or _).

Purpose and Structure of This Article

This article covers everything from the basics of using regular expressions in MySQL to advanced applications, performance optimization techniques, and the new regular expression functions introduced in MySQL 8.0 and later. Practical examples are included to make the content accessible even to beginners. Structure of this article:
  1. Fundamentals and Syntax of MySQL Regular Expressions
  2. Regular Expression Functions Added in MySQL 8.0 and Later
  3. Practical Use Cases (Validation, Data Extraction, Format Conversion)
  4. Considerations When Using Regular Expressions (Performance, Security)
  5. FAQ (Frequently Asked Questions)
  6. Conclusion

2. Basics and Syntax of Regular Expressions in MySQL

How to Use Regular Expressions in MySQL

In MySQL, you use the REGEXP operator (or RLIKE) to work with regular expressions. Like the LIKE operator, it is used to determine whether a column’s value matches a given regular expression pattern. Basic Syntax
SELECT * FROM table_name WHERE column_name REGEXP 'regex';
or
SELECT * FROM table_name WHERE column_name RLIKE 'regex';
RLIKE is an alias for REGEXP, and both behave the same. Example For instance, to search for data where the name contains the character “mountain”:
SELECT * FROM users WHERE name REGEXP 'mountain';

Basic Regular Expression Patterns

In MySQL regular expressions, the following patterns can be used.
SymbolMeaningExampleResult
.Any single charactera.cMatches “abc”, “aac”, “adc”
^Start of string^abcMatches “abcde” but not “dabc”
$End of stringxyz$Matches “axyz” but not “xyzb”
[]Any one of the specified characters[abc]Matches any of “a”, “b”, or “c”
[^]Any single character not in the specified set[^abc]Matches any character except “a”, “b”, or “c”
*Zero or more repetitions of the preceding charactera*Matches “a”, “aa”, “aaa”, etc.
+One or more repetitions of the preceding charactera+Matches “a”, “aa”, “aaa”, etc. (does not match an empty string)
{n}Exactly n repetitions of the preceding charactera{3}Matches “aaa”
{n,}At least n repetitions of the preceding charactera{2,}Matches “aa”, “aaa”, “aaaa”, etc.
{n,m}Between n and m repetitions (inclusive) of the preceding charactera{2,4}Matches “aa”, “aaa”, “aaaa”

Differences from the LIKE Operator

MySQL also has the LIKE operator, which is commonly used for simple pattern matching, but REGEXP is more powerful than LIKE and allows searching with more complex conditions. Example 1: Using the LIKE Operator
SELECT * FROM users WHERE name LIKE '%mountain%';
  • With LIKE, you can search for data containing “mountain”, but only simple wildcards (% and _) can be used.
Example 2: Using the REGEXP Operator
SELECT * FROM users WHERE name REGEXP '^mountain';
  • Using REGEXP allows you to extract only data with names that start with “mountain”.

How to Specify Multiple Patterns

Using | lets you specify multiple patterns with an OR condition. Example: Retrieve data with the surname “Sato” or “Tanaka”
SELECT * FROM users WHERE name REGEXP 'Sato|Tanaka';

Negative Searches

Using ^ inside square brackets allows you to retrieve data that does not contain the specified characters. Example: Data containing characters other than “mountain”
SELECT * FROM users WHERE name REGEXP '^[^mountain]';
In this case, it searches for names that do not start with “mountain”.

Summary

  • In MySQL, using REGEXP enables more powerful pattern searching than the regular LIKE.
  • By understanding the basic syntax ( ., ^, $, [], *, +, etc.), flexible searches become possible.
  • Understanding the differences from LIKE and using them appropriately is important.
  • Using | enables searching for multiple patterns.
  • By leveraging the negative pattern [ ], you can search for data that does not contain specific characters.

3. Regular Expression Functions Added in MySQL 8.0 and Later

In MySQL 8.0, in addition to the traditional REGEXP operator, four new functions that enable more flexible regular‑expression searches were added. This allows you to retrieve match positions, extract or replace substrings, and greatly expands the range of regular‑expression use cases. This section explains each of the newly added regular‑expression functions in detail and provides concrete usage examples.

3.1 REGEXP_LIKE()

Overview

REGEXP_LIKE() determines whether the value of a specified column matches a particular regular‑expression pattern, just like the REGEXP operator.

Syntax

REGEXP_LIKE(column_name, 'regex_pattern' [, flags])
  • column_name: the column to search
  • 'regex_pattern': the regular expression to use as the search condition
  • flags (optional): set case‑sensitivity (specify i for case‑insensitive matching)

Example

“Retrieve users whose email ends with gmail.com or yahoo.co.jp
SELECT * FROM users WHERE REGEXP_LIKE(email, '(@gmail.com|@yahoo.co.jp)$');
“Search for usernames that contain ‘admin’ regardless of case”
SELECT * FROM users WHERE REGEXP_LIKE(username, 'admin', 'i');
REGEXP_LIKE() serves the same purpose as the REGEXP operator but offers a syntax that aligns better with standard SQL.

3.2 REGEXP_INSTR()

Overview

REGEXP_INSTR() returns the position (starting index) of the first match of a regular expression within a given string.

Syntax

REGEXP_INSTR(column_name, 'regex_pattern' [, start_position, occurrence, return_option, flags])
  • start_position (optional): where to begin the search (default is 1)
  • occurrence (optional): which match to return (default is 1)
  • return_option (optional): 0 (return start position) or 1 (return position after the matched substring)
  • flags (optional): specify i for case‑insensitive matching

Example

“Get the start position of phone numbers that begin with 090 or 080
SELECT phone, REGEXP_INSTR(phone, '^(090|080)') AS match_pos FROM users;
“Find the start position of the domain part in an email address”
SELECT email, REGEXP_INSTR(email, '@') AS domain_start FROM users;
  • The result is the position where @ is found (e.g., 5 for user@example.com).
✅ Using REGEXP_INSTR() lets you obtain positional information for specific patterns, expanding your string‑of‑text processing capabilities.

3.3 REGEXP_SUBSTR()

Overview

REGEXP_SUBSTR() extracts the substring that matches a regular‑expression pattern from a larger string.

Syntax

REGEXP_SUBSTR(column_name, 'regex_pattern' [, start_position, occurrence, flags])
  • occurrence (optional): which match to return (default is 1)
  • flags (optional): case‑sensitivity (ignore with i)

Example

“Extract only the domain part of an email address”
SELECT email, REGEXP_SUBSTR(email, '@[a-zA-Z0-9.-]+') AS domain FROM users;
“Get the first numeric sequence in a text string”
SELECT message, REGEXP_SUBSTR(message, '[0-1-9]+') AS first_number FROM logs;
REGEXP_SUBSTR() enables you to pull out specific patterns for data cleaning, transformation, or analysis.

3.4 REGEXP_REPLACE()

Overview

REGEXP_REPLACE() replaces parts of a string that match a regular expression with another string.

Syntax

REGEXP_REPLACE(column_name, 'regex_pattern', 'replacement' [, occurrence, flags])
  • replacement: the string to substitute for the matched portion
  • occurrence (optional): replace only the nth match (default replaces all matches)
  • flags (optional): specify i for case‑insensitive matching

Example

“Remove hyphens from phone numbers”
SELECT phone, REGEXP_REPLACE(phone, '-', '') AS clean_phone FROM users;
“Strip HTML tags from a comment field”
SELECT comment, REGEXP_REPLACE(comment, '<[^>]+>', '') AS clean_text FROM reviews;
✅ With REGEXP_REPLACE(), format conversion and data cleaning become straightforward.

3.5 Summary

FunctionDescriptionExample
REGEXP_LIKE()Determines whether a value matches a regular‑expression patternSELECT * FROM users WHERE REGEXP_LIKE(email, '@gmail.com$');
REGEXP_INSTR()Returns the start position of the matched substringSELECT REGEXP_INSTR(email, '@') FROM users;
REGEXP_SUBSTR()Extracts the matched substringSELECT REGEXP_SUBSTR(email, '@[a-zA-Z0-9.-]+') FROM users;
REGEXP_REPLACE()Replaces the matched substringSELECT REGEXP_REPLACE(phone, '-', '') FROM users;

4. Practical Applications of MySQL Regular Expressions

By leveraging MySQL regular expressions, you can streamline data processing in business operations, such as data validation, specific data extraction, and format conversion. This section presents real-world use cases, providing the SQL code and explanations for each example.

4.1 Email Address Validation

Overview

Use regular expressions to check whether email addresses stored in the database are in a proper format.

SQL Used

SELECT email FROM users 
WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$';

Explanation

  • ^[a-zA-Z0-9._%+-]+ → One or more alphanumeric characters, dots, underscores, or plus signs before the @
  • @[a-zA-Z0-9.-]+ → Domain name after the @
  • .[a-zA-Z]{2,}$ → Ends with a top-level domain (TLD) of at least two characters (e.g., .com, .jp, .net, etc.)
Using this query, you can exclude invalid email addresses (e.g., user@@example.com, user@.com).

4.2 Japanese Phone Number Format Check

Overview

Determine whether typical Japanese phone numbers (e.g., 090-1234-5678, 03-1234-5678) are in the correct format.

SQL Used

SELECT phone FROM users 
WHERE phone REGEXP '^(0[789]0-[0-9]{4}-[0-9]{4}|0[1-9]-[0-9]{4}-[0-9]{4})$';

Explanation

  • 0[789]0-[0-9]{4}-[0-9]{4} → Mobile numbers (e.g., 090-xxxx-xxxx, 080-xxxx-xxxx, etc.)
  • 0[1-9]-[0-9]{4}-[0-9]{4} → Landline numbers (e.g., 03-xxxx-xxxx, 06-xxxx-xxxx, etc.)
Thus, regular expressions are useful in scenarios where format standardization is required.

4.3 Credit Card Number Format Check

Overview

Validate the format of credit card numbers (Visa, Mastercard, American Express, etc.).

SQL Used

SELECT card_number FROM payments 
WHERE card_number REGEXP '^(4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$';

Explanation

  • ^4[0-9]{12}(?:[0-9]{3})?$ → Visa (16 or 13 digits)
  • ^5[1-5][0-9]{14}$ → Mastercard (16 digits)
  • ^3[47][0-9]{13}$ → American Express (15 digits)
Using this query, you can eliminate invalid card numbers (insufficient digits or incorrect leading numbers) from the database.

4.4 HTML Tag Removal

Overview

If user-submitted data contains HTML tags, they are removed and converted to plain text.

SQL Used

SELECT REGEXP_REPLACE(comment, '<[^>]+>', '') AS clean_comment FROM reviews;

Explanation

  • '<[^>]+>' → Removes all HTML tags between < and >
Example
Input DataConverted Data
<b>Hello</b> World!Hello World!
<p>This is a sample</p>This is a sample
Effective for removing HTML tags from user comments and blog posts.

4.5 Postal Code Format Check (Japan)

Overview

Validates whether a Japanese postal code (e.g., 123-4567) is in the correct format.

SQL Used

SELECT postal_code FROM addresses 
WHERE postal_code REGEXP '^[0-9]{3}-[0-9]{4}$';

Explanation

  • ^[0-9]{3}-[0-9]{4}$Postal codes with a “3-digit‑4-digit” format
By using this regular expression, you can maintain data consistency and prevent incorrect formats.

4.6 Username Filtering (Prohibited Word Detection)

Overview

Checks during registration that the username does not contain specific prohibited words (e.g., admin, root, system).

SQL Used

Summary

  • Email address, phone number, and credit card number validation can be performed using regular expressions.
  • Removing HTML tags and converting data formats can also be applied.
  • Filtering prohibited words and checking postal codes, etc., are effective for improving database security and consistency.

5. Precautions When Using Regular Expressions (Performance & Security)

Using regular expressions in MySQL enables powerful data searching and data cleansing. However, improper use can lead to performance degradation and security risks. This section explains the precautions for using MySQL regular expressions safely and efficiently.

5.1 Regular Expression Performance Optimization

MySQL’s regular expression searches are convenient, but they have the drawback that indexes are not applied, which can cause slower processing speeds.

Measures to Improve Performance

  1. Combine LIKE and FULLTEXT indexes
   SELECT * FROM users WHERE email LIKE '%gmail.com';
  1. Use simple regular expressions
   SELECT * FROM users WHERE name REGEXP '^admin|admin$';
  1. Filter target data beforehand
   SELECT * FROM users WHERE email LIKE 'a%' AND email REGEXP '^[a-zA-Z0-9._%+-]+@gmail.com$';

5.2 Security Risks and Countermeasures

1. Regular Expression DoS Attacks (ReDoS)

  • Avoid unlimited matches such as .*.
  • Use ^ and $ to limit the search range.
  • Do not pass user input directly to REGEXP.

2. SQL Injection and Regular Expressions

Dangerous query
SELECT * FROM users WHERE username REGEXP '$input';
Safe query (using placeholders)
$stmt = $pdo->prepare("SELECT * FROM users WHERE username REGEXP ?");
$stmt->execute([$sanitized_input]);

5.3 Compatibility Across Versions

VersionMain Features
MySQL 5.xREGEXP operator only available
MySQL 8.xAdded REGEXP_LIKE(), REGEXP_INSTR(), REGEXP_SUBSTR(), REGEXP_REPLACE()

Summary

  • REGEXP does not use indexes, so performance optimization is important.
  • To prevent ReDoS (regular expression DoS attacks), avoid excessive use of wildcards.
  • To prevent SQL injection, do not apply user input directly to REGEXP.

6. FAQ (Frequently Asked Questions)

We’ve compiled common questions when using MySQL regular expressions (REGEXP). Resolving these doubts will help you use regular expressions more effectively.

6.1 What regular expression patterns can be used in MySQL?

SymbolDescriptionExampleResult
.Any single charactera.cMatches "abc", "aac", "adc"
^Start of string^abcMatches "abcde" (does not match "dabc")
$End of stringxyz$Matches "axyz" (does not match "xyzb")
[]Any one of the specified characters[abc]Matches "a", "b", "c"
[^]Any character except the specified ones[^abc]Matches any character except "a", "b", "c"
*Zero or more repetitions of the preceding charactera*Matches "a", "aa", "aaa"
+One or more repetitions of the preceding charactera+Matches "a", "aa", "aaa" (does not match "")

6.2 What is the difference between LIKE and REGEXP?

Comparison ItemLIKEREGEXP
FunctionalitySimple pattern matchingSearch using complex regular expressions
Wildcards% (any string) _ (any single character).* (any string) ^ $ [a-z] and other diverse expressions are possible
Search speedFast because indexes can be appliedOften results in full table scans, which can be slower
Use casesSimple searches (partial match, prefix match, etc.)Complex searches based on specific patterns
Because LIKE is faster, prioritize LIKE for simple searches, and use REGEXP when complex searches are needed.

6.3 What are the differences in regular expressions between MySQL 5.x and MySQL 8.x?

VersionKey Features
MySQL 5.xREGEXP operator only available
MySQL 8.xAdded REGEXP_LIKE(), REGEXP_INSTR(), REGEXP_SUBSTR(), REGEXP_REPLACE()

6.4 How to troubleshoot when regular expressions do not work as expected?

Checklist
  1. Is the escaping done correctly?
   SELECT * FROM users WHERE email REGEXP '.com$';
  1. Try the i flag with REGEXP_LIKE()
   SELECT * FROM users WHERE REGEXP_LIKE(username, 'admin', 'i');
  1. Check the data encoding
   SHOW VARIABLES LIKE 'character_set_database';

6.5 How to improve the performance of REGEXP functions?

  1. Pre-filter using LIKE
   SELECT * FROM users 
   WHERE email LIKE '%gmail.com' 
   AND email REGEXP '^[a-zA-Z0-9._%+-]+@gmail.com$';
  1. Leverage indexes
   ALTER TABLE users ADD FULLTEXT(email);

Summary

  • REGEXP is more powerful than LIKE, but you need to be careful about performance.
  • In MySQL 8.0, new regular expression functions (such as REGEXP_LIKE()) have been added, enabling more flexible processing.
  • Improving performance relies on combining with LIKE, leveraging indexes, and designing simple regular expressions.

7. Summary

MySQL’s regular expressions (REGEXP) are a very useful tool for data searching, validation, format conversion, and more. This article provides a detailed explanation of everything from the basics of using regular expressions in MySQL to the new features added in MySQL 8.0 and later, practical use cases, cautions, and FAQs.

7.1 Review of Key Points

A quick recap of the article’s important points.

1. Basics of MySQL Regular Expressions

  • REGEXP operator enables more flexible pattern matching than the standard LIKE.
  • Common regular expression patterns
  • ^ (start), $ (end), . (any single character), [] (character class), + (one or more repetitions), etc.

2. New Regular Expression Functions in MySQL 8.0 and Later

MySQL 8.0 adds the following four functions, making data processing more flexible:
  • REGEXP_LIKE(): alternative to the REGEXP operator
  • REGEXP_INSTR(): returns the start position of a matching string
  • REGEXP_SUBSTR(): extracts the matching string
  • REGEXP_REPLACE(): replaces strings using regular expressions

3. Practical Use Cases

  • Email address validation
  • Phone number and credit card number checks
  • Removing HTML tags
  • Postal code format verification
  • Detecting prohibited words
  • Data cleansing (removing whitespace, commas, etc.)

4. Cautions When Using Regular Expressions

  • Performance optimization
  • REGEXP does not use indexes and results in a full table scan, so be careful.
  • You can improve this by combining LIKE or FULLTEXT indexes.
  • Security measures
  • To prevent ReDoS (regular expression DoS attacks), avoid excessive use of .*.
  • Use placeholder to prevent SQL injection.

5. FAQ (Frequently Asked Questions)

  • Patterns usable with MySQL regular expressions
  • Differences between LIKE and REGEXP
  • Feature differences between MySQL 5.x and 8.x
  • Troubleshooting when regular expressions don’t work as expected
  • Ways to improve performance

7.2 Best Practices for Using Regular Expressions

  1. Keep regular expressions simple
  • Example: avoid overusing .* and leverage ^ (start) and $ (end).
  1. Combine with LIKE or FULLTEXT
  • Pre-filter with LIKE to narrow down the target before applying REGEXP, reducing processing load.
  1. Use REGEXP_REPLACE() for data cleansing
  • Example: remove HTML tags, clean up unnecessary spaces.
  1. Leverage MySQL 8.0 and later
  • Using new functions like REGEXP_LIKE() lets you write more readable SQL.
  1. Enforce security measures
  • Do not apply user input directly to REGEXP (SQL injection protection).
  • Use placeholder for dynamic queries.

7.3 Resources for Further Learning

Here are some reference materials for deeper learning of MySQL regular expressions.

Official Documentation

7.4 Conclusion

MySQL regular expressions can be used in many scenarios such as data searching, cleaning, and validation. However, it is important to use them properly with an understanding of performance and security considerations. We hope this article serves as a useful reference for leveraging MySQL regular expressions. Feel free to apply them in your real projects!