Removing duplicates from list of dictionaries with multiple key/value pairs by comparing on some of the values [on hold]
I'm having some trouble getting this done. I would prefer a functional approach over nested for loops, which easily get out of hand (O(n^2)). I have a list of dictionaries (think phonebook, even if it isn't one), with each dict having keys and values like this:
{'mat':'name_of_the_material', 'tex': [list_of_textures], 'geo':[list_of_geo]}
The goal is to reduce the list of dictionaries by matching on the first two keys ('mat', 'tex') and combining the third list when de-duplicating. So 'mat' and 'tex' wouldn't repeat, and the 'geo' list would be a merge of the N matching items from the original list of dicts.
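For illustration, a small sketch of the kind of input and expected output (the material, texture, and geo names are made up):

# Hypothetical input (names made up for illustration)
materials = [
    {'mat': 'wood',  'tex': ['oak', 'grain'], 'geo': ['chair']},
    {'mat': 'wood',  'tex': ['oak', 'grain'], 'geo': ['table', 'shelf']},
    {'mat': 'metal', 'tex': ['brushed'],      'geo': ['lamp']},
]

# Desired output: one entry per ('mat', 'tex') pair, with the 'geo' lists merged
# [{'mat': 'wood',  'tex': ['oak', 'grain'], 'geo': ['chair', 'table', 'shelf']},
#  {'mat': 'metal', 'tex': ['brushed'],      'geo': ['lamp']}]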
give your input and expected output – sachin dubey, Jun 29 at 8:43
Can you give more examples? – Vikas Damodar, Jun 29 at 8:44
It's not clear what you want to do with tex if they aren't identical. Does that need merging too? – doctorlove, Jun 29 at 8:49
3 Answers
The most efficient way is to build a new dictionary to act as an index, and then run through the list, updating the dictionary as you go. This makes a single pass over the list. The code will modify the original dictionaries, but you can easily avoid that by copying on insert if you wish. It's not fully functional in the functional-programming sense.
def make_key(item):
    # Use sorted() here: list.sort() sorts in place and returns None
    tex = sorted(item['tex'])
    return (item['mat'], ','.join(tex))

index = {}
for item in list_of_dict:
    key = make_key(item)
    item2 = index.get(key)
    if item2:
        # Same ('mat', 'tex') seen before: merge the 'geo' lists
        item2['geo'] += item['geo']
    else:
        index[key] = item
This code assumes all the fields are always present.
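A short usage sketch (the sample list_of_dict below is made up for illustration); after running the loop above, the de-duplicated entries are simply the values of index:

list_of_dict = [
    {'mat': 'm1', 'tex': ['t1', 't2'], 'geo': ['g1']},
    {'mat': 'm1', 'tex': ['t2', 't1'], 'geo': ['g2']},
]

# ... run the loop above ...

deduplicated = list(index.values())
# [{'mat': 'm1', 'tex': ['t1', 't2'], 'geo': ['g1', 'g2']}]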
You should just use ('mat', 'tex') as keys in a dictionary and aggregate the values like so:
from collections import defaultdict

l = [
    {'mat': '0', 'tex': ['a', 'b'], 'geo': ['A', 'B', 'C']},
    {'mat': '0', 'tex': ['a', 'b'], 'geo': ['C', 'D']},
    {'mat': '1', 'tex': ['b', 'c'], 'geo': ['A', 'C']}
]

d = defaultdict(list)
for item in l:
    mat = item['mat']
    tex = tuple(item['tex'])  # make it a tuple because a list is not a hashable type
    geo = item['geo']
    d[(mat, tex)] += geo

d = dict(d)
print(d)
>>> {('0', ('a', 'b')): ['A', 'B', 'C', 'C', 'D'], ('1', ('b', 'c')): ['A', 'C']}
final_list = []
for key, value in d.items():
    final_list.append({
        'mat': key[0],
        'tex': list(key[1]),
        'geo': value
    })

print(final_list)
>>> [{'mat': '0', 'tex': ['a', 'b'], 'geo': ['A', 'B', 'C', 'C', 'D']}, {'mat': '1', 'tex': ['b', 'c'], 'geo': ['A', 'C']}]
Also, if the order of the values in 'tex' doesn't matter, you can sort the list before making it a tuple.
You can also use sets instead of lists for geo if you don't want values to be duplicated.
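A minimal sketch combining both suggestions (order-insensitive tex key, de-duplicated geo); it reuses the sample list l and the defaultdict import from above:

d = defaultdict(set)
for item in l:
    key = (item['mat'], tuple(sorted(item['tex'])))  # sorted so ['a', 'b'] and ['b', 'a'] collapse to one key
    d[key].update(item['geo'])                        # set.update() drops duplicate geo entries

print(dict(d))
# {('0', ('a', 'b')): {'A', 'B', 'C', 'D'}, ('1', ('b', 'c')): {'A', 'C'}}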
I'd use a dictionary with dict.setdefault:
L = [{'mat': 'Mat1', 'tex': [1, 2], 'geo': [3, 4]},
{'mat': 'Mat2', 'tex': [1, 2], 'geo': [5, 6]},
{'mat': 'Mat1', 'tex': [1, 2], 'geo': [7, 8]},
{'mat': 'Mat2', 'tex': [3, 4], 'geo': [9, 10]},
{'mat': 'Mat2', 'tex': [1, 2], 'geo': [11, 12]}]
reduced = {}
for entry in L:
    mat = reduced.setdefault(entry['mat'], {})
    geo = mat.setdefault(tuple(entry['tex']), [])  # start a new list for an unseen ('mat', tex) pair
    geo.extend(entry['geo'])

from pprint import pprint
pprint(reduced)
Output:
{'Mat1': {(1, 2): [3, 4, 7, 8]},
'Mat2': {(1, 2): [5, 6, 11, 12], (3, 4): [9, 10]}}
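If you then need the result back in the original list-of-dicts shape, a small sketch built on the reduced mapping above could look like this:

merged = [
    {'mat': mat, 'tex': list(tex), 'geo': geo}
    for mat, by_tex in reduced.items()
    for tex, geo in by_tex.items()
]
# [{'mat': 'Mat1', 'tex': [1, 2], 'geo': [3, 4, 7, 8]},
#  {'mat': 'Mat2', 'tex': [1, 2], 'geo': [5, 6, 11, 12]},
#  {'mat': 'Mat2', 'tex': [3, 4], 'geo': [9, 10]}]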
And you ultimately want to end up with one dictionary? – doctorlove, Jun 29 at 8:43