Hi,
I have a list of thousands of dictionaries similar to the one below. Is there a good way to go through the list and remove entries with duplicated name and date_of_birth values, with the condition that status is correct?
In terms of Excel and Python, the list is the table, the dictionaries are the rows, and the keys are the column names.
It is very easy to remove duplicate values in Excel: https://support.microsoft.com/en-us/office/find-and-remove-duplicates-00e35bea-b46a-4d5d-b28e-66a552dc138d
But I am struggling with Python due to my limited knowledge of it. There is surely a way, I just don’t know how to do it.
Any help is greatly appreciated.
[
{
"name": "Tony",
"date_of_birth": "1/1/1991",
"status": "correct",
"score": "10",
"checked_by": "Su"
},
{
"name": "Tony",
"date_of_birth": "1/1/1991",
"status": "incorrect",
"score": "1",
"checked_by": "Jenny"
}
]
So when a duplicate pair is found, and its row is marked “correct”, it is to be removed?
Re the list above: is that the “before” or the “after”?
Would it be easier, or clearer, to specify constraints on the resulting table?
One way could be to convert the list to a dictionary keyed on the value that should be unique. Items with identical keys overwrite each other, so only the last item for each key is kept:
no_duplicates = {dic['name']: dic for dic in full_list}
Then you can iterate over no_duplicates.values().
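The dictionary idea can be extended to key on both name and date_of_birth and to take status into account. The sketch below assumes the goal is to keep one row per (name, date_of_birth) pair, preferring a row whose status is "correct"; that reading of the question is an assumption, and the function name dedupe_prefer_correct and the sample rows are made up for illustration.

```python
def dedupe_prefer_correct(rows):
    """Keep one row per (name, date_of_birth), preferring status == 'correct'."""
    best = {}
    for row in rows:
        key = (row["name"], row["date_of_birth"])
        kept = best.get(key)
        # Keep the first row seen for a key, unless a later row is
        # "correct" and the row kept so far is not.
        if kept is None or (kept["status"] != "correct" and row["status"] == "correct"):
            best[key] = row
    return list(best.values())


# Hypothetical sample data in the same shape as the list in the question.
full_list = [
    {"name": "Tony", "date_of_birth": "1/1/1991", "status": "incorrect", "score": "1", "checked_by": "Jenny"},
    {"name": "Tony", "date_of_birth": "1/1/1991", "status": "correct", "score": "10", "checked_by": "Su"},
    {"name": "Tony", "date_of_birth": "2/2/1992", "status": "incorrect", "score": "5", "checked_by": "Su"},
]

no_duplicates = dedupe_prefer_correct(full_list)
```

If the rule is different (for example, drop every duplicated row that is marked "correct"), only the condition inside the loop needs to change.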
stucork
September 11, 2020, 1:59pm
4
I’m assuming here that the unique keys are name/dob, since presumably name is not sufficient…
If this is serverside then pandas is a nice option…
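import pandas as pd
# orig_list is the original list of dictionaries
orig_list = [dict1, dict2, ...]
df = pd.DataFrame(orig_list)
# Keep the first row for each (name, date_of_birth) pair; status is not considered here
df = df.drop_duplicates(['name', 'date_of_birth'])
new_list = df.to_dict(orient='records')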
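To bring the status condition into the pandas version, one option is to sort so that "correct" rows come first within each duplicate group before dropping duplicates. This is only a sketch: it assumes the row to keep is the "correct" one, and the helper column _is_correct and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the list in the question.
orig_list = [
    {"name": "Tony", "date_of_birth": "1/1/1991", "status": "incorrect", "score": "1", "checked_by": "Jenny"},
    {"name": "Tony", "date_of_birth": "1/1/1991", "status": "correct", "score": "10", "checked_by": "Su"},
]

df = pd.DataFrame(orig_list)

# Temporary flag so that "correct" rows can be sorted to the front.
df["_is_correct"] = df["status"].eq("correct")

deduped = (
    df.sort_values("_is_correct", ascending=False, kind="stable")
      .drop_duplicates(subset=["name", "date_of_birth"], keep="first")
      .sort_index()                  # restore the original row order
      .drop(columns="_is_correct")   # remove the temporary flag
)

new_list = deduped.to_dict(orient="records")
```

The stable sort keeps the original order among rows with the same flag, so when no row in a group is "correct" the first one in the list is the one that survives.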
Thanks @p.colbert , @stefano.menci and @stucork . I will try all of them and let you know the result.