MySQL is a powerful database management system for searching and manipulating data, and it allows the use of regular expressions (REGEXP) to enable more flexible searches. By leveraging regular expressions, you can efficiently extract data that matches specific patterns, as well as format and validate data.
Benefits of Using Regular Expressions in MySQL
Using MySQL’s regular expressions provides the following benefits:
Advanced searching capabilities: Easily extract data containing specific strings or patterns.
Check data integrity: Useful for validating input data (e.g., email address format checks).
More powerful than the LIKE operator: Allows flexible definition of complex search conditions compared to LIKE with wildcards (% or _).
Purpose and Structure of This Article
This article covers everything from the basics of using regular expressions in MySQL to advanced applications, performance optimization techniques, and the new regular expression functions introduced in MySQL 8.0 and later. Practical examples are included to make the content accessible even to beginners. Structure of this article:
Fundamentals and Syntax of MySQL Regular Expressions
Regular Expression Functions Added in MySQL 8.0 and Later
Practical Use Cases (Validation, Data Extraction, Format Conversion)
Considerations When Using Regular Expressions (Performance, Security)
FAQ (Frequently Asked Questions)
Conclusion
2. Basics and Syntax of Regular Expressions in MySQL
How to Use Regular Expressions in MySQL
In MySQL, you use the REGEXP operator (or RLIKE) to work with regular expressions. Like the LIKE operator, it is used to determine whether a column’s value matches a given regular expression pattern. Basic Syntax
SELECT * FROM table_name WHERE column_name REGEXP 'regex';
or
SELECT * FROM table_name WHERE column_name RLIKE 'regex';
RLIKE is an alias for REGEXP, and both behave the same. Example For instance, to search for data where the name contains the character “mountain”:
SELECT * FROM users WHERE name REGEXP 'mountain';
Basic Regular Expression Patterns
In MySQL regular expressions, the following patterns can be used.
Symbol
Meaning
Example
Result
.
Any single character
a.c
Matches “abc”, “aac”, “adc”
^
Start of string
^abc
Matches “abcde” but not “dabc”
$
End of string
xyz$
Matches “axyz” but not “xyzb”
[]
Any one of the specified characters
[abc]
Matches any of “a”, “b”, or “c”
[^]
Any single character not in the specified set
[^abc]
Matches any character except “a”, “b”, or “c”
*
Zero or more repetitions of the preceding character
a*
Matches “a”, “aa”, “aaa”, etc.
+
One or more repetitions of the preceding character
a+
Matches “a”, “aa”, “aaa”, etc. (does not match an empty string)
{n}
Exactly n repetitions of the preceding character
a{3}
Matches “aaa”
{n,}
At least n repetitions of the preceding character
a{2,}
Matches “aa”, “aaa”, “aaaa”, etc.
{n,m}
Between n and m repetitions (inclusive) of the preceding character
a{2,4}
Matches “aa”, “aaa”, “aaaa”
Differences from the LIKE Operator
MySQL also has the LIKE operator, which is commonly used for simple pattern matching, but REGEXP is more powerful than LIKE and allows searching with more complex conditions. Example 1: Using the LIKE Operator
SELECT * FROM users WHERE name LIKE '%mountain%';
With LIKE, you can search for data containing “mountain”, but only simple wildcards (% and _) can be used.
Example 2: Using the REGEXP Operator
SELECT * FROM users WHERE name REGEXP '^mountain';
Using REGEXP allows you to extract only data with names that start with “mountain”.
How to Specify Multiple Patterns
Using | lets you specify multiple patterns with an OR condition. Example: Retrieve data with the surname “Sato” or “Tanaka”
SELECT * FROM users WHERE name REGEXP 'Sato|Tanaka';
Negative Searches
Using ^ inside square brackets allows you to retrieve data that does not contain the specified characters. Example: Data containing characters other than “mountain”
SELECT * FROM users WHERE name REGEXP '^[^mountain]';
In this case, it searches for names that do not start with “mountain”.
Summary
In MySQL, using REGEXP enables more powerful pattern searching than the regular LIKE.
By understanding the basic syntax ( ., ^, $, [], *, +, etc.), flexible searches become possible.
Understanding the differences from LIKE and using them appropriately is important.
Using | enables searching for multiple patterns.
By leveraging the negative pattern [ ], you can search for data that does not contain specific characters.
3. Regular Expression Functions Added in MySQL 8.0 and Later
In MySQL 8.0, in addition to the traditional REGEXP operator, four new functions that enable more flexible regular‑expression searches were added. This allows you to retrieve match positions, extract or replace substrings, and greatly expands the range of regular‑expression use cases. This section explains each of the newly added regular‑expression functions in detail and provides concrete usage examples.
3.1 REGEXP_LIKE()
Overview
REGEXP_LIKE() determines whether the value of a specified column matches a particular regular‑expression pattern, just like the REGEXP operator.
replacement: the string to substitute for the matched portion
occurrence (optional): replace only the nth match (default replaces all matches)
flags (optional): specify i for case‑insensitive matching
Example
“Remove hyphens from phone numbers”
SELECT phone, REGEXP_REPLACE(phone, '-', '') AS clean_phone FROM users;
“Strip HTML tags from a comment field”
SELECT comment, REGEXP_REPLACE(comment, '<[^>]+>', '') AS clean_text FROM reviews;
✅ With REGEXP_REPLACE(), format conversion and data cleaning become straightforward.
3.5 Summary
Function
Description
Example
REGEXP_LIKE()
Determines whether a value matches a regular‑expression pattern
SELECT * FROM users WHERE REGEXP_LIKE(email, '@gmail.com$');
REGEXP_INSTR()
Returns the start position of the matched substring
SELECT REGEXP_INSTR(email, '@') FROM users;
REGEXP_SUBSTR()
Extracts the matched substring
SELECT REGEXP_SUBSTR(email, '@[a-zA-Z0-9.-]+') FROM users;
REGEXP_REPLACE()
Replaces the matched substring
SELECT REGEXP_REPLACE(phone, '-', '') FROM users;
4. Practical Applications of MySQL Regular Expressions
By leveraging MySQL regular expressions, you can streamline data processing in business operations, such as data validation, specific data extraction, and format conversion. This section presents real-world use cases, providing the SQL code and explanations for each example.
4.1 Email Address Validation
Overview
Use regular expressions to check whether email addresses stored in the database are in a proper format.
SQL Used
SELECT email FROM users
WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$';
Explanation
^[a-zA-Z0-9._%+-]+ → One or more alphanumeric characters, dots, underscores, or plus signs before the @
@[a-zA-Z0-9.-]+ → Domain name after the @
.[a-zA-Z]{2,}$ → Ends with a top-level domain (TLD) of at least two characters (e.g., .com, .jp, .net, etc.)
Using this query, you can exclude invalid email addresses (e.g., user@@example.com, user@.com).
4.2 Japanese Phone Number Format Check
Overview
Determine whether typical Japanese phone numbers (e.g., 090-1234-5678, 03-1234-5678) are in the correct format.
SQL Used
SELECT phone FROM users
WHERE phone REGEXP '^(0[789]0-[0-9]{4}-[0-9]{4}|0[1-9]-[0-9]{4}-[0-9]{4})$';
Explanation
0[789]0-[0-9]{4}-[0-9]{4} → Mobile numbers (e.g., 090-xxxx-xxxx, 080-xxxx-xxxx, etc.)
Thus, regular expressions are useful in scenarios where format standardization is required.
4.3 Credit Card Number Format Check
Overview
Validate the format of credit card numbers (Visa, Mastercard, American Express, etc.).
SQL Used
SELECT card_number FROM payments
WHERE card_number REGEXP '^(4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$';
Explanation
^4[0-9]{12}(?:[0-9]{3})?$ → Visa (16 or 13 digits)
^5[1-5][0-9]{14}$ → Mastercard (16 digits)
^3[47][0-9]{13}$ → American Express (15 digits)
Using this query, you can eliminate invalid card numbers (insufficient digits or incorrect leading numbers) from the database.
4.4 HTML Tag Removal
Overview
If user-submitted data contains HTML tags, they are removed and converted to plain text.
SQL Used
SELECT REGEXP_REPLACE(comment, '<[^>]+>', '') AS clean_comment FROM reviews;
Explanation
'<[^>]+>' → Removes all HTML tags between < and >
Example
Input Data
Converted Data
<b>Hello</b> World!
Hello World!
<p>This is a sample</p>
This is a sample
Effective for removing HTML tags from user comments and blog posts.
4.5 Postal Code Format Check (Japan)
Overview
Validates whether a Japanese postal code (e.g., 123-4567) is in the correct format.
SQL Used
SELECT postal_code FROM addresses
WHERE postal_code REGEXP '^[0-9]{3}-[0-9]{4}$';
Explanation
^[0-9]{3}-[0-9]{4}$ → Postal codes with a “3-digit‑4-digit” format
By using this regular expression, you can maintain data consistency and prevent incorrect formats.
4.6 Username Filtering (Prohibited Word Detection)
Overview
Checks during registration that the username does not contain specific prohibited words (e.g., admin, root, system).
SQL Used
Summary
Email address, phone number, and credit card number validation can be performed using regular expressions.
Removing HTML tags and converting data formats can also be applied.
Filtering prohibited words and checking postal codes, etc., are effective for improving database security and consistency.
5. Precautions When Using Regular Expressions (Performance & Security)
Using regular expressions in MySQL enables powerful data searching and data cleansing. However, improper use can lead to performance degradation and security risks. This section explains the precautions for using MySQL regular expressions safely and efficiently.
5.1 Regular Expression Performance Optimization
MySQL’s regular expression searches are convenient, but they have the drawback that indexes are not applied, which can cause slower processing speeds.
Measures to Improve Performance
Combine LIKE and FULLTEXT indexes
SELECT * FROM users WHERE email LIKE '%gmail.com';
Use simple regular expressions
SELECT * FROM users WHERE name REGEXP '^admin|admin$';
Filter target data beforehand
SELECT * FROM users WHERE email LIKE 'a%' AND email REGEXP '^[a-zA-Z0-9._%+-]+@gmail.com$';
5.2 Security Risks and Countermeasures
1. Regular Expression DoS Attacks (ReDoS)
Avoid unlimited matches such as .*.
Use ^ and $ to limit the search range.
Do not pass user input directly to REGEXP.
2. SQL Injection and Regular Expressions
❌ Dangerous query
SELECT * FROM users WHERE username REGEXP '$input';
✅ Safe query (using placeholders)
$stmt = $pdo->prepare("SELECT * FROM users WHERE username REGEXP ?");
$stmt->execute([$sanitized_input]);
REGEXP does not use indexes, so performance optimization is important.
To prevent ReDoS (regular expression DoS attacks), avoid excessive use of wildcards.
To prevent SQL injection, do not apply user input directly to REGEXP.
6. FAQ (Frequently Asked Questions)
We’ve compiled common questions when using MySQL regular expressions (REGEXP). Resolving these doubts will help you use regular expressions more effectively.
6.1 What regular expression patterns can be used in MySQL?
Symbol
Description
Example
Result
.
Any single character
a.c
Matches "abc", "aac", "adc"
^
Start of string
^abc
Matches "abcde" (does not match "dabc")
$
End of string
xyz$
Matches "axyz" (does not match "xyzb")
[]
Any one of the specified characters
[abc]
Matches "a", "b", "c"
[^]
Any character except the specified ones
[^abc]
Matches any character except "a", "b", "c"
*
Zero or more repetitions of the preceding character
a*
Matches "a", "aa", "aaa"
+
One or more repetitions of the preceding character
a+
Matches "a", "aa", "aaa" (does not match "")
6.2 What is the difference between LIKE and REGEXP?
Comparison Item
LIKE
REGEXP
Functionality
Simple pattern matching
Search using complex regular expressions
Wildcards
% (any string) _ (any single character)
.* (any string) ^$[a-z] and other diverse expressions are possible
Search speed
Fast because indexes can be applied
Often results in full table scans, which can be slower
6.4 How to troubleshoot when regular expressions do not work as expected?
✅ Checklist
Is the escaping done correctly?
SELECT * FROM users WHERE email REGEXP '.com$';
Try the i flag with REGEXP_LIKE()
SELECT * FROM users WHERE REGEXP_LIKE(username, 'admin', 'i');
Check the data encoding
SHOW VARIABLES LIKE 'character_set_database';
6.5 How to improve the performance of REGEXP functions?
Pre-filter using LIKE
SELECT * FROM users
WHERE email LIKE '%gmail.com'
AND email REGEXP '^[a-zA-Z0-9._%+-]+@gmail.com$';
Leverage indexes
ALTER TABLE users ADD FULLTEXT(email);
Summary
REGEXP is more powerful than LIKE, but you need to be careful about performance.
In MySQL 8.0, new regular expression functions (such as REGEXP_LIKE()) have been added, enabling more flexible processing.
Improving performance relies on combining with LIKE, leveraging indexes, and designing simple regular expressions.
7. Summary
MySQL’s regular expressions (REGEXP) are a very useful tool for data searching, validation, format conversion, and more. This article provides a detailed explanation of everything from the basics of using regular expressions in MySQL to the new features added in MySQL 8.0 and later, practical use cases, cautions, and FAQs.
7.1 Review of Key Points
A quick recap of the article’s important points.
1. Basics of MySQL Regular Expressions
REGEXP operator enables more flexible pattern matching than the standard LIKE.
Common regular expression patterns
^ (start), $ (end), . (any single character), [] (character class), + (one or more repetitions), etc.
2. New Regular Expression Functions in MySQL 8.0 and Later
MySQL 8.0 adds the following four functions, making data processing more flexible:
REGEXP_LIKE(): alternative to the REGEXP operator
REGEXP_INSTR(): returns the start position of a matching string
REGEXP_SUBSTR(): extracts the matching string
REGEXP_REPLACE(): replaces strings using regular expressions
3. Practical Use Cases
Email address validation
Phone number and credit card number checks
Removing HTML tags
Postal code format verification
Detecting prohibited words
Data cleansing (removing whitespace, commas, etc.)
4. Cautions When Using Regular Expressions
Performance optimization
REGEXP does not use indexes and results in a full table scan, so be careful.
You can improve this by combining LIKE or FULLTEXT indexes.
Security measures
To prevent ReDoS (regular expression DoS attacks), avoid excessive use of .*.
Use placeholder to prevent SQL injection.
5. FAQ (Frequently Asked Questions)
Patterns usable with MySQL regular expressions
Differences between LIKE and REGEXP
Feature differences between MySQL 5.x and 8.x
Troubleshooting when regular expressions don’t work as expected
Ways to improve performance
7.2 Best Practices for Using Regular Expressions
Keep regular expressions simple
Example: avoid overusing .* and leverage ^ (start) and $ (end).
Combine with LIKE or FULLTEXT
Pre-filter with LIKE to narrow down the target before applying REGEXP, reducing processing load.
Use REGEXP_REPLACE() for data cleansing
Example: remove HTML tags, clean up unnecessary spaces.
Leverage MySQL 8.0 and later
Using new functions like REGEXP_LIKE() lets you write more readable SQL.
Enforce security measures
Do not apply user input directly to REGEXP (SQL injection protection).
Use placeholder for dynamic queries.
7.3 Resources for Further Learning
Here are some reference materials for deeper learning of MySQL regular expressions.
MySQL regular expressions can be used in many scenarios such as data searching, cleaning, and validation. However, it is important to use them properly with an understanding of performance and security considerations. We hope this article serves as a useful reference for leveraging MySQL regular expressions. Feel free to apply them in your real projects!