MySQL REGEXP Guide: Basics, Advanced Uses & Functions

目次

1. Introduction

What is MySQL regular expression (REGEXP)?

MySQL regular expressions are a powerful tool for flexibly searching and manipulating strings within a database. They enable pattern matching that is difficult with ordinary string searches, allowing you to extract data that matches specific formats or conditions. For example, you can easily extract names that start with a particular character or codes that contain only numbers. This feature is especially useful in data cleansing and scenarios that involve complex search criteria.

Benefits of using regular expressions in MySQL

  1. Handling complex search conditions
  • You can specify complex string patterns that the standard LIKE operator cannot handle.
  1. Bulk data replacement and extraction possible
  • For example you can extract only data that follows a specific format or replace parts of a string.
  1. Feature enhancements in MySQL 8.0 and later
  • New functions (such as REGEXP_LIKE, REGEXP_SUBSTR, etc.) have been added, enabling more flexible operations.

Purpose of this article

This article provides a detailed explanation of MySQL regular expressions (REGEXP), covering basic usage, advanced examples, and cautions. It offers practical information useful for everyone from beginners to semi‑professionals, so please read through to the end. The next section will explain the fundamentals of regular expressions in MySQL in detail.

2. Basics of Regular Expressions in MySQL

What is the REGEXP operator?

In MySQL, the REGEXP operator is used to work with regular expressions. This operator is used to determine whether a given pattern matches. Also, RLIKE functions as an alias for REGEXP. The following example is a query that checks whether a string matches the pattern “abc”.
SELECT * FROM users WHERE name REGEXP 'abc';

Basic Syntax of the REGEXP Operator

The basic syntax for searches using regular expressions is as follows.
SELECT * FROM table_name WHERE column_name REGEXP 'pattern';

Common REGEXP Patterns

SymbolDescriptionExample
^Matches start of line^abc → strings that start with “abc”
$Matches end of lineabc$ → strings that end with “abc”
.Matches any single charactera.c → matches “abc”, “adc”, etc.
|OR (matches either)abc|xyz → matches “abc” or “xyz”
[]Matches any one of the specified characters[abc] → matches “a”, “b”, or “c”
*Matches zero or more repetitionsab*c → matches “ac”, “abc”, “abbc”, etc.

Differences Between REGEXP and LIKE

FeatureLIKEREGEXP
FlexibilityOnly wildcards (% and _)Supports advanced pattern matching
PerformanceFastMay be slightly slower when patterns are complex

Practical Example: Searches Using REGEXP

Example 1: Search for email address format
SELECT * FROM users WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$';
Example 2: Search for fields containing only numbers
SELECT * FROM orders WHERE order_id REGEXP '^[0-9]+$';

Summary

This section explained the basic usage and patterns of the REGEXP operator in MySQL. With this knowledge, you can perform a wide range of data operations, from simple searches to complex pattern matching.

3. Regular Expression Functions Added in MySQL 8.0

REGEXP_LIKE() – Regular Expression Match Test

REGEXP_LIKE(string, pattern [, flags])
Example:
SELECT REGEXP_LIKE('abcdef', 'abc');
Result: 1 (matches)

REGEXP_INSTR() – Locate Position of Match

REGEXP_INSTR(string, pattern [, start_position, occurrence, flags, return_type])
Example:
SELECT REGEXP_INSTR('abcdef', 'cd');
Result: 3

REGEXP_SUBSTR() – Extract Matching Substring

REGEXP_SUBSTR(string, pattern [, start_position, occurrence, flags])
Example:
SELECT REGEXP_SUBSTR('abc123def', '[0-9]+');
Result: 123

REGEXP_REPLACE() – Replacement Using Regular Expressions

REGEXP_REPLACE(string, pattern, replacement_string [, start_position, occurrence, flags])
Example:
SELECT REGEXP_REPLACE('Item123Price456', '[0-9]', '#');
Result: Item###Price###

Summary

The regular expression functions added in MySQL 8.0 enable detailed and flexible string manipulation. By leveraging them, efficient data extraction and processing become possible.

4. Practical Examples of Regular Expressions

Searching for Data Matching Specific Patterns

Example 1: Detecting Email Address Format
SELECT * FROM users WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$';
Example 2: Detecting Phone Number Format
SELECT * FROM contacts WHERE phone REGEXP '^[0-9]{3}-[0-9]{4}-[0-9]{4}$';

Extracting Substrings

Example 1: Extracting Numeric Parts
SELECT REGEXP_SUBSTR('abc123def456', '[0-9]+');
Result: 123

Data Replacement

Example 1: Replacing Digits with “#”
SELECT REGEXP_REPLACE('Item123Price456', '[0-9]', '#');
Result: Item###Price###

Data Validation and Cleaning

Example 1: Detecting Invalid Email Addresses
SELECT * FROM users WHERE email NOT REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$';

Summary

Using these examples, you can efficiently perform various tasks such as searching, extracting, replacing, and validating data.

5. Considerations and Best Practices

Handling Multibyte (Full-width) Characters

MySQL’s regular expressions are evaluated byte-by-byte by default, so you need to be careful when dealing with multibyte characters such as Japanese. Solution:
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Impact on Performance

Issue: Regular expressions involve complex processing, so searching large datasets can cause performance slowdowns. Solution:
SELECT * FROM users WHERE email LIKE '%@example.com' AND email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$';

Mitigating ReDoS (Regular Expression Denial of Service) Attacks

Issue: Malicious patterns can impose excessive load. Solution:
  1. Use simple patterns.
  2. Strengthen input validation.
  3. Monitor query execution time.

Checking Version Compatibility

New functions are unavailable in MySQL versions below 8.0. You need to verify the version according to your environment.

Verification in a Test Environment

Test the query behavior and performance in advance, including handling of edge cases.

Summary

Keep in mind the key points for using regular expressions safely and efficiently while considering performance and security.

6. Summary

Recap of Article Points

  1. By learning basic operations and how to use regular expression patterns, you can handle everything from simple searches to complex extractions.
  2. The regular expression functions added in MySQL 8.0 enable even more flexible operations.
  3. Leveraging practical examples streamlines concrete data manipulation.
  4. By covering cautions and best practices, you can achieve safe and high‑performance queries.

Benefits of Using MySQL Regular Expressions

  • Support for advanced search criteria: Conditions that are difficult with simple string searches can be set easily.
  • Streamlined data processing: Extraction, replacement, and validation can be done entirely within SQL.
  • Broad applicability: Suitable for everything from data cleansing to log analysis.

Future Learning and Application

  1. Deepen understanding by testing queries with real data.
  2. Proactively use the latest version features to optimize performance.
  3. Regularly review queries to maintain safety and speed.

Finally

Leverage your knowledge of MySQL regular expressions to improve operational efficiency and enhance data analysis capabilities.

7. Frequently Asked Questions (FAQ)

Q1. What is the difference between MySQL’s REGEXP and LIKE?

A. REGEXP allows advanced pattern matching, while LIKE is for partial match searches.
SELECT * FROM users WHERE email LIKE '%example.com';
SELECT * FROM users WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$';

Q2. How can performance be improved?

A.
  1. Apply filter conditions in advance.
  2. Leverage indexes.
  3. Simplify the query.

Q3. How to handle multibyte characters?

A. Set up UTF-8 support.
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Q4. Example of replacement using regular expressions?

A. Replace digits with “#”.
SELECT REGEXP_REPLACE('Item123Price456', '[0-9]', '#');

Q5. Query to convert date format?

A. Change “YYYY/MM/DD” to “YYYY-MM-DD”.
SELECT REGEXP_REPLACE('2023/12/20', '/', '-');

Q6. How to specify multiple conditions with REGEXP?

A. Use the pipe (|) character.
SELECT * FROM products WHERE name REGEXP 'phone|tablet';

Summary

The FAQ section addresses common questions and provides practical query examples.