How to remove duplicate items from a list in Python

Hi everybody,

This is a simple question: how do you remove duplicate items from a list in Python? There are quite a few methods you can use. Some programmers like easy and simple ways to remove items from a list, while others want a more sophisticated and reliable method. It all depends on the requirements and, to some extent, on the personal choice of the programmer.

Unfortunately, the Python programming language doesn’t have a single built-in method or function dedicated to this task. We have to do some manual work and find the best solution for our purpose.

Searching the web gave me many results, and most of them are confusing or don’t fulfill the requirement. Let’s discuss them. If you want to read a full tutorial with examples and explanations, you can read this article: 5 Best Ways to Remove Duplicate Elements from list in Python

There are some points to remember before you remove duplicate items from a list in Python. They should be considered before you choose a method.

Considerations: How to remove duplicate items from a list in Python?

  1. Are the objects in the list hashable?
  2. Do they support element/item comparison?
  3. Do you need to preserve the order of the list after removing the items?
  4. Are you dealing with a large list object and need a fast method?

1. Are the objects in the list hashable?

If all objects in the list are hashable, like integers, strings, and floats (keep in mind that objects like sets are not hashable), and you don’t need to preserve the order of the list, then one of the simplest methods is to convert the list into a set and then back into a list, something like list(set(list_obj)). This is a simple yet powerful method, and it is quite fast too.
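Here is a minimal sketch of the set round trip. The function name dedupe_unordered and the sample data are made up for illustration.

```python
def dedupe_unordered(items):
    """Return the distinct items; the original order is NOT guaranteed."""
    return list(set(items))


list_obj = [3, 1, 3, 2, 1, "a", "a", 2.5]  # made-up sample data
unique = dedupe_unordered(list_obj)
print(len(unique))  # 5 distinct values: 3, 1, 2, "a", 2.5

# Unhashable elements (lists, sets, dicts) make set() raise TypeError.
try:
    dedupe_unordered([{1, 2}, {1, 2}])
except TypeError as exc:
    print("cannot use set():", exc)
```

Because a set has no defined order, don’t rely on the position of the items in the result.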

2. Do they support element/item comparison?

If the objects in the list are not primitive objects like integers, floats, and strings, there may not be a simple way to find identical objects in the list, so the Python interpreter will not be able to group and remove them straight away. A special comparison algorithm is required for this task. Fortunately, Python has an elegant way to handle it: the comparison methods of the class are the key to success. In Python 2 this was __cmp__; Python 3 removed __cmp__, so there you define __eq__, plus __hash__ if the objects should work in a set or dict.
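As a rough sketch for Python 3, where __eq__ and __hash__ take over the role of __cmp__, custom comparison could look like this. The Point class and the helper dedupe_by_equality are hypothetical names used only for illustration.

```python
class Point:
    """Hypothetical example class; points are equal if their coordinates match."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Must be consistent with __eq__: equal objects need equal hashes.
        return hash((self.x, self.y))


def dedupe_by_equality(items):
    """Fallback for unhashable items: relies on == only (slow, keeps order)."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


points = [Point(1, 2), Point(1, 2), Point(3, 4)]
print(len(set(points)))                      # 2
print(dedupe_by_equality([[1], [1], [2]]))   # [[1], [2]]
```

The fallback compares every item with every item already kept, so it is only reasonable for small lists.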

3. Do you need to preserve the order of the list after removing the items?

In most situations, the order doesn’t matter after removing items from a list in Python. Sometimes it is required, and that makes life tedious for programmers. Python doesn’t have a dedicated built-in method for this task, so we have to do some homework and find a solution. The built-in dict.setdefault() method (sets have no setdefault() method) can be used with a list comprehension to achieve this goal.

Here is a simple code snippet you can use in your Python program or modify as you require.
seen = {}  # dict that remembers every item already kept
unique = [seen.setdefault(x, x) for x in alist if x not in seen]
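A shorter alternative, assuming Python 3.7 or newer (where regular dicts keep insertion order) and hashable items, is dict.fromkeys(). The function name dedupe_keep_order is made up for this sketch.

```python
def dedupe_keep_order(items):
    """Drop duplicates, keeping the first occurrence of each item."""
    # dict keys are unique, and dicts keep insertion order on Python 3.7+.
    return list(dict.fromkeys(items))


alist = ["b", "a", "b", "c", "a"]  # made-up sample data
print(dedupe_keep_order(alist))  # ['b', 'a', 'c']
```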

4. Are you dealing with a large Python list object and need a fast method?

This is where real expertise comes in. Small Python lists don’t need any special consideration. Even moderately sized lists don’t need much consideration if they are not used often. If you have large statistical data that makes heavy use of Python lists and you need to remove duplicate items from them, you must pay very close attention to this task. Sometimes it becomes a real headache and requires special expertise.
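To see why the method matters for large lists, you can time the candidates on your own data. The sketch below uses the standard timeit module; the function names and the random sample data are made up, and the numbers will differ from machine to machine.

```python
import random
import timeit


def dedupe_loop(items):
    """Quadratic approach: scans the result list for every item."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def dedupe_dict(items):
    """Linear approach for hashable items; keeps first occurrences."""
    return list(dict.fromkeys(items))


random.seed(0)  # made-up sample data, only for illustration
data = [random.randrange(500) for _ in range(2000)]

# Both functions must agree before their speed is worth comparing.
assert dedupe_loop(data) == dedupe_dict(data)

for func in (dedupe_loop, dedupe_dict):
    seconds = timeit.timeit(lambda: func(data), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```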

Fortunately, Python has the OrderedDict class in the collections module. It has been implemented in C since Python 3.5. If this class fulfills your requirements, then you are in heaven. Note that OrderedDict doesn’t support non-hashable objects, such as sets, as list elements.
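A minimal sketch of the OrderedDict approach could look like this. The function name dedupe_ordered and the sample lists are made up for illustration.

```python
from collections import OrderedDict


def dedupe_ordered(items):
    """Drop duplicates while keeping first-seen order (hashable items only)."""
    return list(OrderedDict.fromkeys(items))


print(dedupe_ordered([3, 1, 3, 2, 1]))  # [3, 1, 2]

# Unhashable elements such as sets cannot be dict keys.
try:
    dedupe_ordered([{1}, {1}])
except TypeError as exc:
    print("unhashable element:", exc)
```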

I have also found an external Python library, iteration_utilities. Here is a link to this iteration_utilities library home page. The author of this library claims that it is the fastest library. This library is also implemented in C, which gives it an edge, since speed is a tradition of the C language. You should give it a try.
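If you want to try it, the library provides a unique_everseen function. The sketch below assumes the package is installed (pip install iteration_utilities). If it is not, it falls back to a small pure-Python stand-in that I wrote for illustration; the stand-in is not the library’s code and only handles hashable items.

```python
try:
    # Third-party package; must be installed separately.
    from iteration_utilities import unique_everseen
except ImportError:
    # Pure-Python stand-in (an assumption for illustration, hashable items only).
    def unique_everseen(iterable):
        seen = set()
        for item in iterable:
            if item not in seen:
                seen.add(item)
                yield item


# unique_everseen yields items lazily, so wrap it in list() to get a list.
print(list(unique_everseen([3, 1, 3, 2, 1])))  # [3, 1, 2]
```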

I have discussed the major issues involved in this task. I have found some resources where a clear solution is provided. You can follow these links and find the right solution for yourself. Whichever method you choose, you should test it extensively on your own data before you use it in a production environment. Some useful links you may follow are learnandlearn and code academy.

Thank you for reading. Don’t forget to comment and post your solution.

Best of luck.
