MySQL ROW_NUMBER Function: Guide from Basics to Alternatives


1. Introduction

MySQL 8.0 added many new features, and support for window functions stands out among them. This article focuses on one of the most frequently used window functions, ROW_NUMBER(). It numbers rows in a specified order, which makes ranking and reporting queries much easier to write. The article explains the basic usage of this function, practical examples, and alternative methods for earlier versions.

Target Audience

  • Beginners to intermediate users with basic SQL knowledge
  • Engineers and data analysts who use MySQL for data processing and analysis
  • Those considering migrating to the latest MySQL version

Benefits of the ROW_NUMBER() Function

This function assigns a unique sequential number to each row in a specified order, optionally restarting the count for each group. This enables tasks such as “creating a ranking of highest sales” or “extracting and organizing duplicate data” to be expressed concisely. Earlier versions required complex queries using user variables; ROW_NUMBER() makes the same logic shorter and easier to read. This article provides concrete query examples with explanations aimed at beginners. The next section examines the function’s basic syntax and behavior in detail.

2. What is the ROW_NUMBER function

The ROW_NUMBER() function, newly added in MySQL 8.0, is a type of window function that assigns a sequential number to each row of data. It provides the ability to number rows according to a specific order or group, which is extremely useful for data analysis and report generation. Here, we will explain it in detail, including its basic syntax and concrete examples.

Basic Syntax of the ROW_NUMBER Function

First, the basic format of the ROW_NUMBER() function is as follows.
SELECT
    column_name,
    -- ROW_NUMBER is a reserved word in MySQL 8.0, so the alias must not be row_number
    ROW_NUMBER() OVER (PARTITION BY group_column ORDER BY order_column) AS row_num
FROM
    table_name;

Meaning of Each Element

  • ROW_NUMBER(): A function that assigns a sequential number to each row.
  • OVER: The keyword used to define a window function.
  • PARTITION BY: Groups the data by the specified column, and numbering restarts at 1 in each group. It is optional; if omitted, a single sequence is applied to all rows.
  • ORDER BY: Specifies the order in which numbers are assigned within each group. It is syntactically optional, but without it the order of the numbers is not defined.

Basic Example

For example, suppose there is a table named “sales” with the following data.
employee | department  | sale
---------|-------------|-----
A        | Sales       | 500
B        | Sales       | 800
C        | Development | 600
D        | Development | 700
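To follow along, the sample table can be created with a script like the one below. This is only a sketch: the article shows the column names and values, while the data types and the id primary key are assumptions added for illustration.

```sql
-- Hypothetical schema: only employee, department, and sale appear in the article.
-- The id column and all data types are assumptions.
CREATE TABLE sales (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    employee   VARCHAR(50) NOT NULL,
    department VARCHAR(50) NOT NULL,
    sale       INT
);

-- Sample rows taken from the table above.
INSERT INTO sales (employee, department, sale) VALUES
    ('A', 'Sales', 500),
    ('B', 'Sales', 800),
    ('C', 'Development', 600),
    ('D', 'Development', 700);
```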
To assign sequential numbers within each department based on descending sales amounts, use the following query.
SELECT
    employee,
    department,
    sale,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS row_num
FROM
    sales;

Result

employee | department  | sale | row_num
---------|-------------|------|--------
B        | Sales       | 800  | 1
A        | Sales       | 500  | 2
D        | Development | 700  | 1
C        | Development | 600  | 2
From this result, we can see that the ranking by sales for each department is displayed.

How to Use PARTITION BY

In the example above, the data is grouped by the “department” column, which results in separate sequential numbers for each department. If you omit PARTITION BY, a single sequence is assigned across all rows.
SELECT
    employee,
    sale,
    ROW_NUMBER() OVER (ORDER BY sale DESC) AS row_num
FROM
    sales;

Result

employee | sale | row_num
---------|------|--------
B        | 800  | 1
D        | 700  | 2
C        | 600  | 3
A        | 500  | 4

Features and Considerations of the ROW_NUMBER() Function

  • Unique numbering: Even with identical values, the sequential numbers remain unique. Which of the tied rows receives the lower number is not guaranteed unless ORDER BY includes a tie-breaker column.
  • Handling of NULL values: When NULLs are present in ORDER BY, they appear first in ascending order and last in descending order.
  • Performance impact: In large datasets, ORDER BY can be costly, so setting appropriate indexes is important.
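The first two points can be illustrated with the sales table. The sketch below adds employee as a tie-breaker so that rows with equal sales are always numbered in the same order, and then shows one common MySQL idiom for placing NULL values last in an ascending sort. Using employee as the tie-breaker assumes it is unique; a primary key would be a safer choice if one exists.

```sql
-- Deterministic numbering: ties on sale are broken by employee.
SELECT
    employee,
    sale,
    ROW_NUMBER() OVER (ORDER BY sale DESC, employee ASC) AS row_num
FROM
    sales;

-- Ascending order with NULLs last: (sale IS NULL) is 0 for non-NULL values
-- and 1 for NULL, so non-NULL rows are numbered first.
SELECT
    employee,
    sale,
    ROW_NUMBER() OVER (ORDER BY sale IS NULL, sale ASC, employee ASC) AS row_num
FROM
    sales;
```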

3. Practical Use Cases

Here we introduce concrete scenarios that use MySQL’s ROW_NUMBER() function. This function can be applied in many practical cases such as data ranking and handling duplicate data.

3-1. Ranking Within Groups

For example, consider a case where you want to assign ranks to sales data in descending order of revenue for each department. Below is an example dataset.
employee | department  | sale
---------|-------------|-----
A        | Sales       | 500
B        | Sales       | 800
C        | Development | 600
D        | Development | 700
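Query Example: Sales Ranking by Department
SELECT
    employee,
    department,
    sale,
    -- RANK is a reserved word in MySQL 8.0, so the alias is sales_rank rather than rank
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS sales_rank
FROM
    sales;
Result:
employee | department  | sale | sales_rank
---------|-------------|------|-----------
B        | Sales       | 800  | 1
A        | Sales       | 500  | 2
D        | Development | 700  | 1
C        | Development | 600  | 2
In this way, sequential numbers are assigned in sales order for each department, making ranking easy.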

3-2. Extracting Top N Records

Next, let’s look at a case where you want to extract the top 3 employees by sales for each department.
Query Example: Extract Top N Records
WITH RankedSales AS (
    SELECT
        employee,
        department,
        sale,
        -- RANK is a reserved word in MySQL 8.0, so the alias is sales_rank rather than rank
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS sales_rank
    FROM
        sales
)
SELECT
    employee,
    department,
    sale
FROM
    RankedSales
WHERE
    sales_rank <= 3;
Result:
employee | department  | sale
---------|-------------|-----
B        | Sales       | 800
A        | Sales       | 500
D        | Development | 700
C        | Development | 600
The query keeps at most three rows per department. Because the sample data has only two employees in each department, all four rows are returned here. Thus, the ROW_NUMBER() function is suitable not only for ranking but also for filtering top records.
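ROW_NUMBER() cuts off at exactly N rows per group, so when several employees share the same sales amount, some of them may be dropped arbitrarily. If tied rows should all be kept, one option is to swap in DENSE_RANK(), which is covered in section 4. The following is a sketch of that variation, not part of the original example; the alias sale_rank is a name chosen here.

```sql
-- Top 3 distinct sales amounts per department, keeping every tied employee.
WITH RankedSales AS (
    SELECT
        employee,
        department,
        sale,
        DENSE_RANK() OVER (PARTITION BY department ORDER BY sale DESC) AS sale_rank
    FROM
        sales
)
SELECT
    employee,
    department,
    sale
FROM
    RankedSales
WHERE
    sale_rank <= 3;
```

With this version a department can return more than three rows when ties exist.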

3-3. Extracting and Deleting Duplicate Data

Databases can contain duplicate data, and the ROW_NUMBER() function makes such rows easy to find.
Query Example: Extract Duplicate Data
SELECT *
FROM (
    SELECT
        employee,
        sale,
        -- RANK is a reserved word in MySQL 8.0, so the alias is row_num rather than rank
        ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sale DESC) AS row_num
    FROM
        sales
) tmp
WHERE row_num > 1;
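This query treats rows that share the same employee name as duplicates. Within each name, the row with the highest sale is numbered 1, and every row numbered 2 or higher is reported.
Query Example: Delete Duplicate Data
-- Assumes sales has a unique id column (not shown in the sample data above).
-- For each employee, the row with the highest sale is kept.
DELETE FROM sales
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sale DESC) AS row_num
        FROM
            sales
    ) tmp
    WHERE row_num > 1
);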
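Because DELETE is destructive, it can help to run it inside a transaction and check the affected row count before committing. The following is a sketch of that idea using a multi-table DELETE joined to a derived table. It assumes the sales table uses a transactional engine such as InnoDB and has the unique id column that the query above relies on; id is also added as a tie-breaker so the surviving row is predictable.

```sql
START TRANSACTION;

-- Delete every row except the first one per employee.
DELETE s
FROM sales AS s
JOIN (
    SELECT
        id,
        ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sale DESC, id) AS row_num
    FROM
        sales
) AS d ON d.id = s.id
WHERE d.row_num > 1;

-- Number of rows removed by the DELETE above.
SELECT ROW_COUNT() AS deleted_rows;

-- Inspect the result, then run exactly one of the following:
-- COMMIT;
-- ROLLBACK;
```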

Summary

The ROW_NUMBER() function is useful in scenarios such as the following:
  1. Ranking within groups
  2. Extracting top N records
  3. Detecting and deleting duplicate data
In each case it lets complex data processing and analysis be written as a short, readable query.

4. Comparison with Other Window Functions

MySQL 8.0 provides, in addition to ROW_NUMBER(), window functions such as RANK() and DENSE_RANK() for ranking and position calculations. While these functions serve similar purposes, they differ in behavior and results. Here we compare each function and explain appropriate use cases.

4-1. RANK() Function

The RANK() function assigns rankings. It gives the same rank to identical values and then skips the ranks that those ties would have occupied.

Basic Syntax

SELECT
    column_name,
    -- RANK is a reserved word in MySQL 8.0, so the alias must not be rank
    RANK() OVER (PARTITION BY group_column ORDER BY sort_column) AS sales_rank
FROM
    table_name;

Example

We use the following data to calculate sales rankings.
employee | department | sale
---------|------------|-----
A        | Sales      | 800
B        | Sales      | 800
C        | Sales      | 600
D        | Sales      | 500
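Query example: Using RANK()
SELECT
    employee,
    sale,
    -- RANK is a reserved word in MySQL 8.0, so the alias is sales_rank rather than rank
    RANK() OVER (ORDER BY sale DESC) AS sales_rank
FROM
    sales;
Result:
employee | sale | sales_rank
---------|------|-----------
A        | 800  | 1
B        | 800  | 1
C        | 600  | 3
D        | 500  | 4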
Features:
  • A and B, which have the same sales amount (800), are treated as rank 1.
  • The next rank, 2, is skipped, so C becomes rank 3.

4-2. DENSE_RANK() Function

Like RANK(), the DENSE_RANK() function assigns the same rank to identical values, but it does not skip the next rank.

Basic Syntax

SELECT
    column_name,
    -- RANK and DENSE_RANK are reserved words in MySQL 8.0, so use a different alias
    DENSE_RANK() OVER (PARTITION BY group_column ORDER BY sort_column) AS dense_rnk
FROM
    table_name;

Example

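We use the same data as before to try the DENSE_RANK() function.
Query example: Using DENSE_RANK()
SELECT
    employee,
    sale,
    -- DENSE_RANK is a reserved word in MySQL 8.0, so the alias is dense_rnk
    DENSE_RANK() OVER (ORDER BY sale DESC) AS dense_rnk
FROM
    sales;
Result:
employee | sale | dense_rnk
---------|------|----------
A        | 800  | 1
B        | 800  | 1
C        | 600  | 2
D        | 500  | 3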
Features:
  • A and B, with the same sales amount (800), are treated as rank 1.
  • Unlike RANK(), the next rank starts at 2, preserving rank continuity.

4-3. Difference from ROW_NUMBER() Function

The ROW_NUMBER() function differs from the other two functions in that it assigns a unique number even when values are the same.

Example

SELECT
    employee,
    sale,
    ROW_NUMBER() OVER (ORDER BY sale DESC) AS row_num
FROM
    sales;
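Result:
employee | sale | row_num
---------|------|--------
A        | 800  | 1
B        | 800  | 2
C        | 600  | 3
D        | 500  | 4
Features:
  • Even when values are identical, a unique number is assigned, so there are no duplicate ranks.
  • Which of the tied rows (A or B here) receives the lower number is not guaranteed unless ORDER BY includes a tie-breaker column.
  • It is suitable when you need strict control over data order or require row-level uniqueness.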

4-4. Summary of Use Cases for Each Function

Function | Ranking Behavior | Use Case
---------|------------------|---------
ROW_NUMBER() | Assigns a unique number | When adding sequential numbers to data or when a unique identifier is needed
RANK() | Assigns the same number to ties and skips the next number | When you want to display ranking positions as they are with ties
DENSE_RANK() | Assigns the same number to ties without skipping numbers | When continuity of ranking is important

Summary

ROW_NUMBER(), RANK(), and DENSE_RANK() each suit different scenarios.
  1. ROW_NUMBER() is suitable when a unique number is needed.
  2. RANK() is useful when tied values should share a rank and the gaps after ties should be preserved.
  3. DENSE_RANK() is appropriate for scenarios that require continuous ranking without gaps.
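To see the three behaviors side by side, the functions can be computed in a single query over the same window. The sketch below uses a named window (the WINDOW clause) so the ordering is written only once; the window name w and the column aliases are names chosen here for illustration.

```sql
SELECT
    employee,
    sale,
    ROW_NUMBER() OVER w AS row_num,        -- always unique
    RANK()       OVER w AS rank_with_gaps, -- ties share a rank, later ranks are skipped
    DENSE_RANK() OVER w AS rank_no_gaps    -- ties share a rank, no ranks are skipped
FROM
    sales
WINDOW w AS (ORDER BY sale DESC);
```

With the tied sample data from section 4-1 (800, 800, 600, 500), this combines the three result tables shown above into one: row_num runs 1 to 4, rank_with_gaps is 1, 1, 3, 4, and rank_no_gaps is 1, 1, 2, 3.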

5. Alternatives for MySQL versions below 8.0

MySQL versions prior to 8.0 do not support the ROW_NUMBER() function or other window functions. However, by leveraging user variables, you can achieve similar functionality. This section explains concrete alternatives for MySQL versions below 8.0.

5-1. Sequential numbering using user variables

In environments with MySQL 5.7 or earlier, you can assign a sequential number to each row using user variables. See the example below.

Example: Sales ranking by department

Sample data:
employee | department  | sale
---------|-------------|-----
A        | Sales       | 500
B        | Sales       | 800
C        | Development | 600
D        | Development | 700
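Query:
SET @row_num = 0;
SET @dept = '';

SELECT
    employee,
    department,
    sale,
    -- rank is not reserved in MySQL 5.7, but sales_rank also works after an upgrade to 8.0
    @row_num := IF(@dept = department, @row_num + 1, 1) AS sales_rank,
    -- helper column that remembers the department of the current row
    @dept := department AS current_dept
FROM
    (SELECT * FROM sales ORDER BY department, sale DESC) AS sorted_sales;
Result (the helper column current_dept is omitted; rows follow the ORDER BY of the derived table):
employee | department  | sale | sales_rank
---------|-------------|------|-----------
D        | Development | 700  | 1
C        | Development | 600  | 2
B        | Sales       | 800  | 1
A        | Sales       | 500  | 2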

5-2. Extracting the top N rows

To retrieve the top N rows, you can similarly use user variables.
Query:
SET @row_num = 0;
SET @dept = '';

-- List the columns explicitly so the helper column current_dept is not returned
SELECT employee, department, sale, sales_rank
FROM (
    SELECT
        employee,
        department,
        sale,
        @row_num := IF(@dept = department, @row_num + 1, 1) AS sales_rank,
        @dept := department AS current_dept
    FROM
        (SELECT * FROM sales ORDER BY department, sale DESC) AS sorted_sales
) AS ranked_sales
WHERE sales_rank <= 3;
Result:
employee | department  | sale | sales_rank
---------|-------------|------|-----------
D        | Development | 700  | 1
C        | Development | 600  | 2
B        | Sales       | 800  | 1
A        | Sales       | 500  | 2
This query assigns a rank per department and then extracts only the rows within the top three.

5-3. Detecting and removing duplicate rows

Duplicate data handling can also be addressed using user variables. The examples below use a customers table with id and name columns and treat rows with the same name as duplicates.
Query example: Detect duplicate rows
SET @row_num = 0;
SET @id_check = '';

SELECT id, name, dup_num
FROM (
    SELECT
        id,
        name,
        @row_num := IF(@id_check = name, @row_num + 1, 1) AS dup_num,
        @id_check := name AS current_name
    FROM
        (SELECT * FROM customers ORDER BY name, id) AS sorted_customers
) AS tmp
WHERE dup_num > 1;
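Query example: Delete duplicate rows
-- Reset the variables first; values left over from a previous query would skew the numbering
SET @row_num = 0;
SET @id_check = '';

DELETE FROM customers
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            @row_num := IF(@id_check = name, @row_num + 1, 1) AS dup_num,
            @id_check := name AS current_name
        FROM
            (SELECT * FROM customers ORDER BY name, id) AS sorted_customers
    ) AS tmp
    WHERE dup_num > 1
);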

5-4. Cautions when using user variables

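  1. Session scope
  • User variables keep their values for the whole session and are not visible to other sessions. A value left over from an earlier query can affect the next one, so reset the variables with SET before each run.
  2. Dependency on execution order
  • The result depends on the order in which rows are processed and in which the expressions in the SELECT list are evaluated, and MySQL does not guarantee either. Setting the ORDER BY clause correctly is crucial, and the output should always be verified.
  3. Readability and maintainability of SQL
  • Because the code can become complex, using window functions is recommended for MySQL 8.0 and later. Assigning user variables inside expressions is also deprecated in MySQL 8.0.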
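Where the execution-order dependency is a concern, one alternative that avoids user variables entirely is a correlated subquery that counts how many rows in the same department come earlier in the ordering. This is a sketch of a different technique, not the method used above. It assumes sale is never NULL and that employee is unique within a department (it serves as the tie-breaker); each row triggers its own subquery, so it is best suited to small tables.

```sql
-- Per-department row number for MySQL 5.7 and earlier, without user variables.
SELECT
    s.employee,
    s.department,
    s.sale,
    (
        -- Rows in the same department that sort before the current row, plus one.
        SELECT COUNT(*) + 1
        FROM sales AS s2
        WHERE s2.department = s.department
          AND (s2.sale > s.sale
               OR (s2.sale = s.sale AND s2.employee < s.employee))
    ) AS row_num
FROM
    sales AS s
ORDER BY
    s.department,
    row_num;
```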

Summary

In MySQL versions prior to 8.0, you can use user variables to achieve sequential numbering and ranking since window functions are unavailable. However, because queries can become complex, migrating to a newer version is advisable when possible.

6. Points to Note and Best Practices

MySQL’s ROW_NUMBER() function and alternative methods that use user variables are very handy, but there are points to watch to operate them accurately and efficiently. Here we explain in detail the cautions when using them and best practices for performance optimization.

6-1. Performance Considerations

1. ORDER BY Overhead

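ORDER BY inside the OVER clause is syntactically optional, but without it the order in which numbers are assigned is undefined, so in practice ROW_NUMBER() is almost always used with it. This operation sorts the data, so processing time increases with large data sets.
Countermeasure:
  • Use indexes: Set indexes on columns used in ORDER BY to help speed up sorting.
  • Limit the rows processed: Filter with WHERE before numbering where possible. LIMIT is applied after the window function has been evaluated, so it reduces the rows returned rather than the rows sorted.
Example:
SELECT
    employee,
    department,
    sale,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS sales_rank
FROM
    sales
-- Without an outer ORDER BY, LIMIT would return an arbitrary 1000 rows
ORDER BY
    department,
    sales_rank
LIMIT 1000;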
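Because window functions are evaluated after WHERE but before LIMIT, one way to reduce the sorting work itself is to filter first. A minimal sketch, assuming only the departments and the threshold below are of interest (both are hypothetical filter values):

```sql
-- Only rows that pass the WHERE clause are partitioned, sorted, and numbered.
SELECT
    employee,
    department,
    sale,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS row_num
FROM
    sales
WHERE
    department IN ('Sales', 'Development')
    AND sale >= 500;
```

Note that the numbers are then relative to the filtered rows, not to the whole table.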

2. Increased Memory Usage and Disk I/O

Window functions use temporary tables and memory, so as data volume grows, memory consumption and disk I/O increase. Countermeasure:
  • Split queries: Break the work into smaller queries and extract data step by step to reduce load.
  • Use temporary tables: Store extracted data in a temporary table and perform aggregation from there to distribute the load.
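The temporary-table approach might look like the following sketch. The table name tmp_department_sales and the filter are hypothetical; the idea is to copy only the needed rows and columns once, then run the window function against the smaller set.

```sql
-- Step 1: stage only the rows and columns that are needed.
CREATE TEMPORARY TABLE tmp_department_sales AS
SELECT
    employee,
    sale
FROM
    sales
WHERE
    department = 'Sales';

-- Step 2: number the staged rows.
SELECT
    employee,
    sale,
    ROW_NUMBER() OVER (ORDER BY sale DESC) AS row_num
FROM
    tmp_department_sales;

-- Step 3: clean up (temporary tables are also dropped when the session ends).
DROP TEMPORARY TABLE tmp_department_sales;
```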

6-2. Query Tuning Points

1. Checking the Execution Plan

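In MySQL you can use the EXPLAIN statement to view a query’s execution plan. This lets you verify whether indexes are being used correctly.
Example:
EXPLAIN
SELECT
    employee,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS sales_rank
FROM
    sales;
Sample Output (illustrative; the actual columns and values depend on the MySQL version, schema, and data):
id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra
---|-------------|-------|-------|---------------|------|---------|------|------|------------
1  | SIMPLE      | sales | index | NULL          | sale | 4       | NULL | 500  | Using index
Using index in the Extra column means the rows are read from the index alone, without accessing the table data. For window queries, also check whether Extra shows Using filesort or Using temporary, which indicate an additional sort or a temporary table.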
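Recent MySQL 8.0 releases offer two further views of the plan that can be easier to read for window queries: EXPLAIN FORMAT=TREE (available from 8.0.16) and EXPLAIN ANALYZE (available from 8.0.18), which actually runs the query and reports measured timings. The sketch below applies both to the same query; the exact output depends on the server version and data, so none is shown here.

```sql
-- Plan as a tree of operations; sort and window steps appear as their own nodes.
EXPLAIN FORMAT=TREE
SELECT
    employee,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS row_num
FROM
    sales;

-- Executes the query and adds actual row counts and timings to each node.
EXPLAIN ANALYZE
SELECT
    employee,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS row_num
FROM
    sales;
```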

2. Index Optimization

Make sure to set indexes on columns used in ORDER BY and WHERE clauses. Pay particular attention to the following points.
  • Single-column index: Applied for simple sort conditions
  • Composite index: Effective when multiple columns are involved in conditions
Example:
CREATE INDEX idx_department_sale ON sales(department, sale DESC);

3. Leveraging Batch Processing

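Instead of processing a huge amount of data at once, you can distribute load by using batch processing to handle data in smaller chunks. Add an ORDER BY on a unique column so that each chunk is well defined; without it, rows can be repeated or skipped between chunks.
Example:
-- Assumes sales has a unique id column
SELECT * FROM sales WHERE department = 'Sales' ORDER BY id LIMIT 1000 OFFSET 0;
SELECT * FROM sales WHERE department = 'Sales' ORDER BY id LIMIT 1000 OFFSET 1000;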
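With large offsets, the server still has to read and discard all the skipped rows. A common alternative is keyset pagination, which remembers the last key seen and continues from there. The sketch below assumes a unique, indexed id column on sales and uses a hypothetical session variable @last_id that the application updates after each batch.

```sql
-- First batch: start before the smallest id.
SET @last_id = 0;

SELECT *
FROM sales
WHERE department = 'Sales'
  AND id > @last_id
ORDER BY id
LIMIT 1000;

-- After processing a batch, set @last_id to the largest id it contained
-- (the value 1000 here is only a placeholder) and run the same SELECT again.
SET @last_id = 1000;
```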

6-3. Maintaining Data Consistency

1. Data Updates and Recalculation

Adding or deleting data can cause stored numbering to become out of sync. Therefore, set up a mechanism to recalculate the numbered data as needed.
Example:
CREATE VIEW ranked_sales AS
SELECT
    employee,
    department,
    sale,
    -- RANK is a reserved word in MySQL 8.0, so the alias is sales_rank rather than rank
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS sales_rank
FROM
    sales;
Because a view is evaluated each time it is queried, the ranking always reflects the latest data.

6-4. Query Example Best Practices

Below is an example of best practices that consider performance and maintainability.
Example: Extract Top N Rows
WITH RankedSales AS (
    SELECT
        employee,
        department,
        sale,
        -- RANK is a reserved word in MySQL 8.0, so the alias is sales_rank rather than rank
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS sales_rank
    FROM
        sales
)
SELECT *
FROM RankedSales
WHERE sales_rank <= 3;
This structure uses a Common Table Expression (CTE) to improve code readability and reusability.

Summary

When using the ROW_NUMBER() function or its alternatives, it is important to keep the following points in mind.
  1. Optimize indexes to improve processing speed.
  2. Check the execution plan to identify performance bottlenecks.
  3. Address data updates and implement mechanisms to continuously maintain consistency.
  4. Leverage batch processing to distribute load, and CTEs to keep queries readable.
By applying these best practices, you can efficiently perform large-scale data analysis and reporting.

7. Summary

In this article, we focused on MySQL’s ROW_NUMBER() function, covering its basic usage, practical examples, alternative methods for older versions, and cautions and best practices in detail. In this section, we review the key points of the article and summarize practical usage tips.

7-1. Convenience of the ROW_NUMBER() Function

The ROW_NUMBER() function is extremely useful in data analysis and report generation for the following reasons.
  1. Sequential numbering per group: Easily achieve sales rankings by department or rankings by category.
  2. Extracting top N rows: Efficiently retrieve and filter data based on specific criteria.
  3. Detecting and removing duplicate data: Helpful for data organization and cleaning tasks.
These capabilities allow you to write complex queries more simply, greatly improving SQL readability and maintainability.

7-2. Comparison with Other Window Functions

Compared with window functions such as RANK() and DENSE_RANK(), ROW_NUMBER() differs in that it assigns a unique number to each row, even when values are the same.
Function | Feature | Use case
---------|---------|---------
ROW_NUMBER() | Assigns a unique sequential number to each row | Ideal when you need to identify data or rank without allowing duplicates
RANK() | Assigns the same rank to ties and skips subsequent ranks | Useful when ranking with ties and gaps between ranks matter
DENSE_RANK() | Assigns the same rank to ties but does not skip ranks | Ranking that considers ties while preserving rank continuity
Choosing the appropriate function: Selecting the right function for your use case enables efficient data processing.

7-3. Strategies for Older Versions

In environments below MySQL 8.0, we also showed how you can use user-defined variables to achieve similar functionality. However, these approaches come with the following considerations.
  • Reduced readability due to increased code complexity
  • Optimization of query execution can be difficult
  • Additional processing is required to maintain data integrity
If possible, we strongly recommend upgrading to MySQL 8.0 or later and using window functions.

7-4. Performance Optimization Tips

  1. Leverage indexes: Create indexes on columns used in ORDER BY to improve processing speed.
  2. Review execution plans: Use the EXPLAIN statement to pre‑validate performance.
  3. Adopt batch processing: Break large datasets into smaller chunks to distribute load.
  4. Utilize views and CTEs: Increase code reusability and simplify complex queries.
Applying these optimization techniques enables efficient and stable data processing.

7-5. Conclusion

The ROW_NUMBER() function is a powerful tool that can greatly improve the efficiency of data analysis. This article covered everything from basic syntax to practical examples, cautions, and alternative methods. We encourage you to try out the queries yourself as you follow along. By improving your SQL skills, you’ll gain confidence in tackling more complex data analysis and report generation.

Appendix: Reference Resources