Remove duplicate dictionary values in a list - similar to Excel

Tony.Nguyen · September 11, 2020, 1:27pm

Hi,

I have a list of thousands of dictionaries similar to the one below. Is there a good way to go through the list and remove entries with duplicated name and date_of_birth values, with the condition that status is correct?

In Excel terms, the list is the table, the dictionaries are the rows and the keys are the column names.
It is very easy to remove duplicate values in Excel: https://support.microsoft.com/en-us/office/find-and-remove-duplicates-00e35bea-b46a-4d5d-b28e-66a552dc138d

But I am struggling with Python due to my limited knowledge of it. There is surely a way, I just don’t know how to do it.

Any help is greatly appreciated.

[
  {
    "name": "Tony",
    "date_of_birth": "1/1/1991",
    "status": "correct",
    "score": "10",
    "checked_by": "Su"
  },
  {
    "name": "Tony",
    "date_of_birth": "1/1/1991",
    "status": "incorrect",
    "score": "1",
    "checked_by": "Jenny"
  }
]

p.colbert · September 11, 2020, 1:45pm

So when a duplicate pair is found, and its row is marked “correct”, it is to be removed?

Re the list above: is that the “before” or the “after”?

Would it be easier – or clearer – to specify constraints on the resulting table?

stefano.menci · September 11, 2020, 1:57pm

One way could be to convert the list to a dictionary. This creates a dictionary that keeps only one item per key (the last one in the list, since later items overwrite earlier ones):

no_duplicates = {dic['name']: dic for dic in full_list}

Then you can iterate over no_duplicates.values().
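
A minimal sketch of the same dictionary idea, keyed on both name and date_of_birth rather than name alone. It assumes (the thread does not confirm this) that when duplicates exist, the row whose status is "correct" is the one to keep. The function name dedupe_prefer_correct and the third sample row are made up for illustration.

```python
def dedupe_prefer_correct(rows):
    """Keep one row per (name, date_of_birth), preferring status == "correct"."""
    kept = {}
    for row in rows:
        key = (row["name"], row["date_of_birth"])
        current = kept.get(key)
        # Keep the first row seen for a key, but let a "correct" row
        # replace a previously stored row that is not "correct".
        if current is None or (
            current["status"] != "correct" and row["status"] == "correct"
        ):
            kept[key] = row
    return list(kept.values())


# Sample data: the first two rows mirror the question; the third is hypothetical.
full_list = [
    {"name": "Tony", "date_of_birth": "1/1/1991", "status": "incorrect",
     "score": "1", "checked_by": "Jenny"},
    {"name": "Tony", "date_of_birth": "1/1/1991", "status": "correct",
     "score": "10", "checked_by": "Su"},
    {"name": "Tony", "date_of_birth": "2/2/1992", "status": "incorrect",
     "score": "5", "checked_by": "Su"},
]

no_duplicates = dedupe_prefer_correct(full_list)
```

If the intent is the opposite (drop the "correct" rows, or keep only rows that are "correct"), only the condition inside the loop would need to change.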

stucork · September 11, 2020, 1:59pm

I’m assuming here that the unique keys are name/dob, since presumably name alone is not sufficient…

If this is server-side then pandas is a nice option…

import pandas as pd
orig_list = [dict1, dict2, ...]
df = pd.DataFrame(orig_list)
df = df.drop_duplicates(['name', 'date_of_birth'])
new_list = df.to_dict(orient='records')
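
By default drop_duplicates keeps the first row of each duplicate group, whatever its status. A possible variation, assuming the "correct" row is the one to keep (the thread does not settle this), is to sort those rows to the front first. The helper column _is_correct and the third sample row are made up for illustration.

```python
import pandas as pd

# Sample data: the first two rows mirror the question; the third is hypothetical.
orig_list = [
    {"name": "Tony", "date_of_birth": "1/1/1991", "status": "incorrect",
     "score": "1", "checked_by": "Jenny"},
    {"name": "Tony", "date_of_birth": "1/1/1991", "status": "correct",
     "score": "10", "checked_by": "Su"},
    {"name": "Tony", "date_of_birth": "2/2/1992", "status": "incorrect",
     "score": "5", "checked_by": "Su"},
]

df = pd.DataFrame(orig_list)

# Flag "correct" rows, then sort them to the front. mergesort is stable,
# so rows with the same flag keep their original relative order.
df["_is_correct"] = df["status"].eq("correct")
df = df.sort_values("_is_correct", ascending=False, kind="mergesort")

# keep="first" now retains the "correct" row of each name/dob group if one exists.
df = df.drop_duplicates(["name", "date_of_birth"], keep="first")

new_list = df.drop(columns="_is_correct").to_dict(orient="records")
```

Note that the sort moves "correct" rows ahead of the others, so new_list is not in the original input order.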

Tony.Nguyen · September 12, 2020, 4:41am

Thanks @p.colbert, @stefano.menci and @stucork. I will try all of them and let you know the result.