In MySQL version 8.0, many new features have been added, and among them, the support for window functions stands out. This article focuses on the frequently used ROW_NUMBER() function. ROW_NUMBER() function provides powerful capabilities for data analysis and reporting, making it easy to sort and rank data based on specific criteria. This article explains the basic usage of this function, advanced examples, and alternative methods for earlier versions.
Target Audience
Beginners to intermediate users with basic SQL knowledge
Engineers and data analysts who use MySQL for data processing and analysis
Those considering migrating to the latest MySQL version
Benefits of the ROW_NUMBER() Function
This function can assign a unique number to rows based on specific conditions. This enables tasks such as “creating a ranking of highest sales” or “extracting and organizing duplicate data” to be expressed concisely. In earlier versions, complex queries using user variables were required, but using the ROW_NUMBER() function improves code simplicity and readability. In this article, we provide concrete query examples and explanations that are easy for beginners to understand. In the next section, we will examine the function’s basic syntax and behavior in detail.
2. What is the ROW_NUMBER function
The ROW_NUMBER() function, newly added in MySQL 8.0, is a type of window function that assigns a sequential number to each row of data. It provides the ability to number rows according to a specific order or group, which is extremely useful for data analysis and report generation. Here, we will explain it in detail, including its basic syntax and concrete examples.
Basic Syntax of the ROW_NUMBER Function
First, the basic format of the ROW_NUMBER() function is as follows.
SELECT
column_name,
ROW_NUMBER() OVER (PARTITION BY group_column ORDER BY order_column) AS row_number
FROM
table_name;
Meaning of Each Element
ROW_NUMBER(): A function that assigns a sequential number to each row.
OVER: The keyword used to define a window function.
PARTITION BY: Groups the data by the specified column. It is optional; if omitted, the sequential numbers are applied to all rows.
ORDER BY: Specifies the order in which numbers are assigned. This sets the sorting criteria.
Basic Example
For example, suppose there is a table named “sales” with the following data.
employee
department
sale
A
Sales
500
B
Sales
800
C
Development
600
D
Development
700
To assign sequential numbers within each department based on descending sales amounts, use the following query.
SELECT
employee,
department,
sale,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS row_num
FROM
sales;
Result
employee
department
sale
row_num
B
Sales
800
1
A
Sales
500
2
D
Development
700
1
C
Development
600
2
From this result, we can see that the ranking by sales for each department is displayed.
How to Use PARTITION BY
In the example above, the data is grouped by the “department” column, which results in separate sequential numbers for each department.
If you omit PARTITION BY, a single sequence is assigned across all rows.
SELECT
employee,
sale,
ROW_NUMBER() OVER (ORDER BY sale DESC) AS row_num
FROM
sales;
Result
employee
sale
row_num
B
800
1
D
700
2
C
600
3
A
500
4
Features and Considerations of the ROW_NUMBER() Function
Unique numbering: Even with identical values, the sequential numbers remain unique.
Handling of NULL values: When NULLs are present in ORDER BY, they appear first in ascending order and last in descending order.
Performance impact: In large datasets, ORDER BY can be costly, so setting appropriate indexes is important.
3. Practical Use Cases
Here we introduce concrete scenarios that use MySQL’s ROW_NUMBER() function. This function can be applied in many practical cases such as data ranking and handling duplicate data.
3-1. Ranking Within Groups
For example, consider a case where you want to assign ranks to sales data in descending order of revenue for each department. Below is an example dataset.
employee
department
sale
A
Sales
500
B
Sales
800
C
Development
600
D
Development
700
Query Example: Sales Ranking by Department
SELECT
employee,
department,
sale,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
sales;
Result:
employee
department
sale
rank
B
Sales
800
1
A
Sales
500
2
D
Development
700
1
C
Development
600
2
In this way, sequential numbers are assigned in sales order for each department, making ranking easy.
3-2. Extracting Top N Records
Next, let’s look at a case where you want to extract the top 3 sales employees for each department. Query Example: Query to Extract Top N Records
WITH RankedSales AS (
SELECT
employee,
department,
sale,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
sales
)
SELECT
employee,
department,
sale
FROM
RankedSales
WHERE
rank <= 3;
Result:
employee
department
sale
B
Sales
800
A
Sales
500
D
Development
700
C
Development
600
In this example, only the top 3 sales data for each department are retrieved. Thus, the ROW_NUMBER() function is suitable not only for ranking but also for filtering top records.
3-3. Extracting and Deleting Duplicate Data
Databases can contain duplicate data. In such cases, you can also handle them easily using the ROW_NUMBER() function. Query Example: Extract Duplicate Data
SELECT *
FROM (
SELECT
employee,
sale,
ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sale DESC) AS rank
FROM
sales
) tmp
WHERE rank > 1;
This query detects duplicates when multiple records exist for the same employee name. Query Example: Delete Duplicate Data
DELETE FROM sales
WHERE id IN (
SELECT id
FROM (
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sale DESC) AS rank
FROM
sales
) tmp
WHERE rank > 1
);
Summary
ROW_NUMBER() function is,
Ranking within groups
Extracting top N records
Detecting and deleting duplicate data
It works in various scenarios such as these. This enables complex data processing and analysis to be performed easily and efficiently.
4. Comparison with Other Window Functions
MySQL 8.0 provides, in addition to ROW_NUMBER(), window functions such as RANK() and DENSE_RANK() for ranking and position calculations. While these functions serve similar purposes, they differ in behavior and results. Here we compare each function and explain appropriate use cases.
4-1. RANK() Function
RANK() function assigns rankings, but its characteristic is that it gives the same rank to identical values and skips the next rank.
Basic Syntax
SELECT
column_name,
RANK() OVER (PARTITION BY group_column ORDER BY sort_column) AS rank
FROM
table_name;
Example
We use the following data to calculate sales rankings.
employee
department
sale
A
Sales
800
B
Sales
800
C
Sales
600
D
Sales
500
Query example: Using RANK()
SELECT
employee,
sale,
RANK() OVER (ORDER BY sale DESC) AS rank
FROM
sales;
Result:
employee
sale
rank
A
800
1
B
800
1
C
600
3
D
500
4
Features:
A and B, which have the same sales amount (800), are treated as rank 1.
The next rank, 2, is skipped, so C becomes rank 3.
4-2. DENSE_RANK() Function
DENSE_RANK() function also assigns the same rank to identical values like RANK(), but it does not skip the next rank.
Basic Syntax
SELECT
column_name,
DENSE_RANK() OVER (PARTITION BY group_column ORDER BY sort_column) AS rank
FROM
table_name;
Example
We use the same data as before to try the DENSE_RANK() function. Query example: Using DENSE_RANK()
SELECT
employee,
sale,
DENSE_RANK() OVER (ORDER BY sale DESC) AS dense_rank
FROM
sales;
Result:
employee
sale
dense_rank
A
800
1
B
800
1
C
600
2
D
500
3
Features:
A and B, with the same sales amount (800), are treated as rank 1.
Unlike RANK(), the next rank starts at 2, preserving rank continuity.
4-3. Difference from ROW_NUMBER() Function
ROW_NUMBER() function differs from the other two functions in that it assigns a unique number even when values are the same.
Example
SELECT
employee,
sale,
ROW_NUMBER() OVER (ORDER BY sale DESC) AS row_num
FROM
sales;
Result:
employee
sale
row_num
A
800
1
B
800
2
C
600
3
D
500
4
Features:
Even when values are identical, a unique number is assigned, so there are no duplicate ranks.
It is suitable when you need strict control over data order or require row-level uniqueness.
4-4. Summary of Use Cases for Each Function
Function
Ranking Behavior
Use Case
ROW_NUMBER()
Assigns a unique number
When adding sequential numbers to data or when a unique identifier is needed
RANK()
Assigns the same number to ties and skips the next number
When you want to display ranking positions as they are with ties
DENSE_RANK()
Assigns the same number to ties without skipping numbers
When continuity of ranking is important
Summary
ROW_NUMBER(), RANK(), DENSE_RANK() each need to be used in different scenarios.
ROW_NUMBER() is suitable when a unique number is needed.
RANK() is useful when you want to assign the same rank to identical values while emphasizing gaps in ranking.
DENSE_RANK() is appropriate for scenarios that require continuous ranking without gaps.
5. Alternatives for MySQL versions below 8.0
MySQL versions prior to 8.0 do not support the ROW_NUMBER() function or other window functions. However, by leveraging user variables, you can achieve similar functionality. This section explains concrete alternatives for MySQL versions below 8.0.
5-1. Sequential numbering using user variables
In environments with MySQL 5.7 or earlier, you can assign a sequential number to each row using user variables. See the example below.
Example: Sales ranking by department
Sample data:
employee
department
sale
A
Sales
500
B
Sales
800
C
Development
600
D
Development
700
Query:
SET @row_num = 0;
SET @dept = '';
SELECT
employee,
department,
sale,
@row_num := IF(@dept = department, @row_num + 1, 1) AS rank,
@dept := department
FROM
(SELECT * FROM sales ORDER BY department, sale DESC) AS sorted_sales;
Result:
employee
department
sale
rank
B
Sales
800
1
A
Sales
500
2
D
Development
700
1
C
Development
600
2
5-2. Extracting the top N rows
To retrieve the top N rows, you can similarly use user variables. Query:
SET @row_num = 0;
SET @dept = '';
SELECT *
FROM (
SELECT
employee,
department,
sale,
@row_num := IF(@dept = department, @row_num + 1, 1) AS rank,
@dept := department
FROM
(SELECT * FROM sales ORDER BY department, sale DESC) AS sorted_sales
) AS ranked_sales
WHERE rank <= 3;
Result:
employee
department
sale
rank
B
Sales
800
1
A
Sales
500
2
D
Development
700
1
C
Development
600
2
This query assigns a rank per department and then extracts only the rows within the top three.
5-3. Detecting and removing duplicate rows
Duplicate data handling can also be addressed using user variables. Query example: Detect duplicate rows
SET @row_num = 0;
SET @id_check = '';
SELECT *
FROM (
SELECT
id,
name,
@row_num := IF(@id_check = name, @row_num + 1, 1) AS rank,
@id_check := name
FROM
(SELECT * FROM customers ORDER BY name, id) AS sorted_customers
) AS tmp
WHERE rank > 1;
Query example: Delete duplicate rows
DELETE FROM customers
WHERE id IN (
SELECT id
FROM (
SELECT
id,
@row_num := IF(@id_check = name, @row_num + 1, 1) AS rank,
@id_check := name
FROM
(SELECT * FROM customers ORDER BY name, id) AS sorted_customers
) AS tmp
WHERE rank > 1
);
5-4. Cautions when using user variables
Session dependency
User variables are only valid within the session. They cannot be used in other queries or sessions.
Dependency on execution order
User variables depend on the query execution order, so setting the ORDER BY clause correctly is crucial.
Readability and maintainability of SQL
Because the code can become complex, using window functions is recommended for MySQL 8.0 and later.
Summary
In MySQL versions prior to 8.0, you can use user variables to achieve sequential numbering and ranking since window functions are unavailable. However, because queries can become complex, migrating to a newer version is advisable when possible.
6. Points to Note and Best Practices
MySQL’s ROW_NUMBER() function and alternative methods that use user variables are very handy, but there are points to watch to operate them accurately and efficiently. Here we explain in detail the cautions when using them and best practices for performance optimization.
6-1. Performance Considerations
1. ORDER BY Overhead
ROW_NUMBER() function must always be used together with ORDER BY. This operation sorts the data, so processing time increases with large data sets. Countermeasure:
Use indexes: Set indexes on columns used in ORDER BY to speed up sorting.
Use LIMIT: Retrieve only the needed number of rows to reduce the amount of data processed.
Example:
SELECT
employee,
sale,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
sales
LIMIT 1000;
2. Increased Memory Usage and Disk I/O
Window functions use temporary tables and memory, so as data volume grows, memory consumption and disk I/O increase. Countermeasure:
Split queries: Break the work into smaller queries and extract data step by step to reduce load.
Use temporary tables: Store extracted data in a temporary table and perform aggregation from there to distribute the load.
6-2. Query Tuning Points
1. Checking the Execution Plan
In MySQL you can use the EXPLAIN statement to view a query’s execution plan. This lets you verify whether indexes are being used correctly. Example:
EXPLAIN
SELECT
employee,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
sales;
Sample Output:
id
select_type
table
type
possible_keys
key
key_len
ref
rows
Extra
1
SIMPLE
sales
index
NULL
sale
4
NULL
500
Using index
Thus, if Using index is shown, it indicates that the index is being used appropriately.
2. Index Optimization
Make sure to set indexes on columns used in ORDER BY and WHERE clauses. Pay particular attention to the following points.
Single-column index: Applied for simple sort conditions
Composite index: Effective when multiple columns are involved in conditions
Example:
CREATE INDEX idx_department_sale ON sales(department, sale DESC);
3. Leveraging Batch Processing
Instead of processing a huge amount of data at once, you can distribute load by using batch processing to handle data in smaller chunks. Example:
SELECT * FROM sales WHERE department = 'Sales Department' LIMIT 1000 OFFSET 0;
SELECT * FROM sales WHERE department = 'Sales Department' LIMIT 1000 OFFSET 1000;
6-3. Maintaining Data Consistency
1. Data Updates and Recalculation
Adding or deleting data can cause numbering to become out of sync. Therefore, set up a mechanism to recalculate the numbered data as needed. Example:
CREATE VIEW ranked_sales AS
SELECT
employee,
sale,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
sales;
By using a view, you can always maintain a ranking based on the latest data.
6-4. Query Example Best Practices
Below is an example of best practices that consider performance and maintainability. Example: Extract Top N Rows
WITH RankedSales AS (
SELECT
employee,
department,
sale,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
sales
)
SELECT *
FROM RankedSales
WHERE rank <= 3;
This structure uses a Common Table Expression (CTE) to improve code readability and reusability.
Summary
When using the ROW_NUMBER() function or its alternatives, it is important to keep the following points in mind.
Optimize indexes to improve processing speed.
Check the execution plan to identify performance bottlenecks.
Address data updates and implement mechanisms to continuously maintain consistency.
Leverage batch processing and CTEs to distribute load.
By applying these best practices, you can efficiently perform large-scale data analysis and reporting.
7. Summary
In the previous articles, we focused on MySQL’s ROW_NUMBER() function, covering its basic usage, advanced examples, alternative methods for older versions, and cautions and best practices in detail. In this section, we review the key points of the article and summarize practical usage tips.
7-1. Convenience of the ROW_NUMBER() Function
ROW_NUMBER() function is extremely useful in data analysis and report generation for the following reasons.
Sequential numbering per group: Easily achieve sales rankings by department or rankings by category.
Extracting top N rows: Efficiently retrieve and filter data based on specific criteria.
Detecting and removing duplicate data: Helpful for data organization and cleaning tasks.
These capabilities allow you to write complex queries more simply, greatly improving SQL readability and maintainability.
7-2. Comparison with Other Window Functions
Compared with window functions such as RANK() and DENSE_RANK(), ROW_NUMBER() differs in that it assigns a unique number to each row, even when values are the same.
Function
Feature
Use case
ROW_NUMBER()
Assigns a unique sequential number to each row
Ideal when you need to identify data or rank without allowing duplicates
RANK()
Assigns the same rank to ties and skips subsequent ranks
Useful when ranking with ties and gaps between ranks matter
DENSE_RANK()
Assigns the same rank to ties but does not skip ranks
Ranking that considers ties while preserving rank continuity
Choosing the appropriate function: Selecting the right function for your use case enables efficient data processing.
7-3. Strategies for Older Versions
In environments below MySQL 8.0, we also showed how you can use user-defined variables to achieve similar functionality. However, these approaches come with the following considerations.
Reduced readability due to increased code complexity
Optimization of query execution can be difficult
Additional processing is required to maintain data integrity
If possible, we strongly recommend upgrading to MySQL 8.0 or later and using window functions.
7-4. Performance Optimization Tips
Leverage indexes: Create indexes on columns used in ORDER BY to improve processing speed.
Review execution plans: Use the EXPLAIN statement to pre‑validate performance.
Adopt batch processing: Break large datasets into smaller chunks to distribute load.
Utilize views and CTEs: Increase code reusability and simplify complex queries.
Applying these optimization techniques enables efficient and stable data processing.
7-5. Conclusion
ROW_NUMBER() function is a powerful tool that dramatically improves data analysis efficiency.
This article covered everything from basic syntax to advanced examples, cautions, and alternative methods.
We encourage readers to try out the queries in practice as you follow along. By improving your SQL skills, you’ll gain confidence tackling more complex data analyses and report generation.