MetaDeveloper: Remove Duplicate Lines In Python

Sunday, July 29, 2007

Remove Duplicate Lines In Python

{

I had posted about Python's built-in set type with some questions. That changed today when I wrote a little script to remove duplicate lines from a file. Calling set() on a list (or any other iterable) automatically gets rid of duplicate items, although it does not keep their original order. Very useful for situations like this:


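#!/usr/bin/env python

# Read every line, drop duplicates with set(), and write the result.
# Note: a set is unordered, so the output line order is arbitrary.
with open("c:\\temp\\Original.txt") as f:
    # splitlines() avoids the empty trailing item that split("\n") produces
    uniquelines = set(f.read().splitlines())
with open("c:\\temp\\Unique.txt", "w") as f2:
    f2.write("".join([line + "\n" for line in uniquelines]))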
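
The script above does not keep the lines in their original order, because a set is unordered. If order matters, one possible approach is to walk the lines once and remember which ones have already been seen. The following is only a minimal sketch of that idea; the names unique_lines and dedupe_file are made up for illustration and are not part of the original script.

```python
def unique_lines(lines):
    """Yield each line the first time it appears, keeping input order."""
    seen = set()  # lines already emitted; set membership tests are fast
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, skipping repeated lines (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # Strip the newline so a last line without one still compares equal
        for line in unique_lines(raw.rstrip("\n") for raw in src):
            dst.write(line + "\n")
```

For example, list(unique_lines(["b", "a", "b", "c", "a"])) gives ['b', 'a', 'c'], and dedupe_file("c:\\temp\\Original.txt", "c:\\temp\\Unique.txt") would process the same files as the script above. Because the file is read line by line, only the distinct lines are held in memory.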

}

4 comments:

Unknown said...: Superb!

I had a database in FileMaker with 250,000 records and had written a script to delete duplicates. I was looking at around 24 hours to run that in FileMaker.

I exported it as a CSV and ran your code in Python. It took less than 3 seconds and spat out a CSV that I imported back into FileMaker.

Thank you!; 6:05 AM
G.T. Rajpurohit said...: It is great,
but it alters the order of the lines in the file; 3:56 AM
Covert Assassin said...: Hi.. I just used this one to eliminate duplicates in my file. I would like to know more about this set function. I'm seriously left wondering how so few LOC did that perfectly?! Any thoughts?; 6:46 PM
Ramen said...: But the problem is that the set operator automatically sorts after removing the duplicates... What if we don't want to change the order?; 10:04 PM
