How to delete all duplicate rows except the one in MySQL

Delete all duplicate rows in MySQL

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

In this post, I am sharing a demonstration on how to remove all rows except for one in MySQL.

Before a few days ago, I was working in one of our production reports and found that there are a lot of duplicate records.

This table has more than 2,00,00,000 records and I removed duplicate based upon two varchar column and required to store one record for each group.

I prepared a demo with a different solution because I have also to take care of performance.

The Solutions are:

Using Self – Join to check every next row with current row and create a delete flag for duplicate rows.
Swapping of the table – copy unique records in the temp table and restore back to the original table after deleting of duplicate records.

But this didn’t work for me because this solution requires a massive amount of DML operation which will degrade performance.

After some research, I found that MySQL has a feature called ALTER IGNORE.
Using ALTER IGNORE, you can apply unique key on those columns which required for removing duplicates. Once you execute this DDL command, it creates a unique key constraint and removes all duplicate records from the database.

This is an excellent option in MySQL and also worked for me.

Let me demonstrate this:

First, create demo table and data for varchar datatype:

CREATE DATABASE Employee;

CREATE TABLE Employee.tbl_DuplicateEmployee_VarcharDatatype

(

EmpID INTEGER

,EmpName VARCHAR(250)

);

INSERT INTO Employee.tbl_DuplicateEmployee_VarcharDatatype

VALUES

(1,'ABC'),(2,'XYZ'),(3,'PQR')

,(1,'ABC'),(2,'XYZ'),(3,'PQR')

,(1,'ABC'),(2,'XYZ'),(3,'PQR');

SET old ALTER table behavior:

1	SET SESSION old_alter_table=1;

Execute ALTER IGNORE:

ALTER IGNORE

TABLE Employee.tbl_DuplicateEmployee_VarcharDatatype

ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_VarcharDatatype

(EmpID,EmpName);

Find result:

1 2	SELECT *FROM Employee.tbl_DuplicateEmployee_VarcharDatatype;

As you can see in the above result, all duplicate records removed.

If your column has a BLOB data type, you cannot directly insert any unique key constraint on it.
If you have a BLOB data type column like TEXT, so temporary, you have to create one MD5 column and use this MD5 column in ALTER IGNORE.

Below is a demonstration for deleting BLOB duplicate records.

First, create a demo table and data for TEXT datatype:

CREATE TABLE Employee.tbl_DuplicateEmployee_BLOBDataType

(

EmpID INTEGER

,EmpName TEXT

);

INSERT INTO Employee.tbl_DuplicateEmployee_BLOBDataType

VALUES

(1,'ABC'),(2,'XYZ'),(3,'PQR')

,(1,'ABC'),(2,'XYZ'),(3,'PQR')

,(1,'ABC'),(2,'XYZ'),(3,'PQR');

Create MD5 column for EmpID(Integer) + EmpName(Text):

ALTER TABLE Employee.tbl_DuplicateEmployee_BLOBDataType ADD MD5Column VARCHAR(50);

UPDATE Employee.tbl_DuplicateEmployee_BLOBDataType

SET MD5Column = MD5(CONCAT(EmpID,EmpName));

Execute ALTER IGNORE:

1	ALTER IGNORE TABLE Employee.tbl_DuplicateEmployee_BLOBDataType ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_BLOBDataType(MD5Column);

Now check the result:

1 2	SELECT *FROM Employee.tbl_DuplicateEmployee_BLOBDataType;

Remove MD5 column:

1	ALTER TABLE Employee.tbl_DuplicateEmployee_BLOBDataType DROP COLUMN MD5Column;

Sep 19, 2015Anvesh Patel

Comments: 4

Hareesh

December 27, 2016 at 4:17 pm

Hi Anvesh,

Please explain a bit more about below queries.

what does this query :
SET SESSION old_alter_table=1;

ALTER IGNORE
TABLE Employee.tbl_DuplicateEmployee_VarcharDatatype
ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_VarcharDatatype
(EmpID,EmpName);

Thanks

Anvesh Patel
December 29, 2016 at 6:33 pm
When this option is given, the server does not use the optimized method of processing an ALTER TABLE operation. It reverts to using a temporary table, copying over the data, and then renaming the temporary table to the original.

Sudheer

September 23, 2019 at 6:25 am

Hiii Anvesh,

How to delete the duplicate data in mysql workbench.There are 6 lakhs above data in the table. I have used below query for delete but it shows Lost connection to Mysql Server during query. can you please send the query or explain how to solve problem.

SET SQL_SAFE_UPDATES=0;
DELETE t1 FROM PumpErrorLog t1
INNER JOIN PumpErrorLog t2
WHERE
t1.id < t2.id AND
t1.device_id = t2.device_id AND
t1.side = t2.side AND
t1.event_date = t2.event_date AND
t1.data_source = t2.data_source AND
t1.code= t2.code AND
t1.level= t2.level AND
t1.description= t2.description AND
t1.pump_number= t2.pump_number AND
t1.TimeStamp= t2.TimeStamp AND
t1.report_time= t2.report_time AND
t1.s3logfile= t2.s3logfile AND
t1.board_timestamp= t2.board_timestamp ;

Shaik

February 18, 2020 at 10:17 am

I am getting below error while trying this

ALTER IGNORE
TABLE tbl_DuplicateEmployee_VarcharDatatype
ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_VarcharDatatype
(EmpID,EmpName);

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘IGNORE
TABLE tbl_DuplicateEmployee_VarcharDatatype
ADD UNIQUE INDEX idx_tbl_Dup’ at line 1

Delete all duplicate rows in MySQL

Anvesh Patel

About Me!

About DBRND !

Recent Comments !

Follow Me !