How to find the distinct values in the array in all the indexes using elasticsearch-dsl?
I am using elasticsearch-dsl in Django. I have a DocType document defined, with a Keyword field that holds a list of values.
Here is my code:
from elasticsearch_dsl import DocType, Text, Keyword

class ProductIndex(DocType):
    """
    Index for products
    """
    id = Keyword()
    slug = Keyword()
    name = Text()
    filter_list = Keyword()
filter_list is the array field here, and it holds multiple values. I also have a list of distinct values, say sample_filter_list, and some of its elements may be present in some product's filter_list. Given this sample_filter_list, I want all the unique elements of filter_list across all the products whose filter_list has a non-empty intersection with sample_filter_list.
For example, I have 5 products whose filter_list values are:
1) ['a', 'b', 'c']
2) ['d', 'e', 'f']
3) ['g', 'h', 'i']
4) ['j', 'k', 'l']
5) ['m', 'n', 'o']
and if my sample_filter_list is ['a', 'd', 'g', 'j', 'm'],
then Elasticsearch should return an array containing the distinct elements,
i.e. ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o']
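To pin down the expected result independently of Elasticsearch, the requirement above can be stated as a few lines of plain Python. This is only a reference sketch of the semantics; the function name distinct_filters is made up for illustration, and the products are represented as bare lists rather than indexed documents.

```python
def distinct_filters(products, sample_filter_list):
    """Union of every filter_list that shares at least one value with the sample."""
    sample = set(sample_filter_list)
    result = set()
    for filter_list in products:
        # Keep a product only if its filter_list intersects the sample.
        if sample.intersection(filter_list):
            result.update(filter_list)
    return sorted(result)


products = [
    ['a', 'b', 'c'],
    ['d', 'e', 'f'],
    ['g', 'h', 'i'],
    ['j', 'k', 'l'],
    ['m', 'n', 'o'],
]
sample_filter_list = ['a', 'd', 'g', 'j', 'm']
print(distinct_filters(products, sample_filter_list))
```

With a smaller sample such as ['a'], only the first product matches, so only its values come back.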
2 Answers
This answer is not specific to Django; it applies to Elasticsearch in general.
Suppose you have an ES index some_index2 with the mapping
PUT some_index2
{
  "mappings": {
    "some_type": {
      "dynamic_templates": [
        {
          "strings": {
            "mapping": {
              "type": "string"
            },
            "match_mapping_type": "string"
          }
        }
      ],
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        }
      }
    }
  }
}
and that you have inserted the documents
{
  "field1": "id1",
  "field2": ["a", "b", "c", "d"]
}
{
  "field1": "id2",
  "field2": ["e", "f", "g"]
}
{
  "field1": "id3",
  "field2": ["e", "l", "k"]
}
As you stated, you want all the distinct values of field2 (filter_list in your case). You can get them with an Elasticsearch terms aggregation:
GET some_index2/_search
{
  "aggs": {
    "some_name": {
      "terms": {
        "field": "field2",
        "size": 10000
      }
    }
  },
  "size": 0
}
which gives a result like:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "some_name": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "e",
          "doc_count": 2
        },
        {
          "key": "a",
          "doc_count": 1
        },
        {
          "key": "b",
          "doc_count": 1
        },
        {
          "key": "c",
          "doc_count": 1
        },
        {
          "key": "d",
          "doc_count": 1
        },
        {
          "key": "f",
          "doc_count": 1
        },
        {
          "key": "g",
          "doc_count": 1
        },
        {
          "key": "k",
          "doc_count": 1
        },
        {
          "key": "l",
          "doc_count": 1
        }
      ]
    }
  }
}
where buckets contains the list of all the distinct values.
You can iterate through the buckets and read each value under key.
I hope this is what you need.
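Iterating over the buckets can be sketched in Python as follows, assuming the search response has already been parsed into a plain dict shaped like the result above. The helper name bucket_keys and the shortened sample_response are illustrative only.

```python
def bucket_keys(response, agg_name):
    """Return the distinct values reported by a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Shortened copy of the response above (only two buckets kept).
sample_response = {
    "hits": {"total": 3, "max_score": 0, "hits": []},
    "aggregations": {
        "some_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "e", "doc_count": 2},
                {"key": "a", "doc_count": 1},
            ],
        }
    },
}
print(bucket_keys(sample_response, "some_name"))
```

The keys come back in the order Elasticsearch returned the buckets, which for a terms aggregation is by descending doc_count by default.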
I am a little confused about what exactly you want. To just query the products whose filter_list intersects with sample_filter_list, run a terms query:
ProductIndex.search().filter('terms', filter_list=sample_filter_list)
Hope this helps!
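Since the question asks for the distinct values across only the matching products, the terms filter and a terms aggregation can probably be combined in one request. Below is a minimal sketch that builds such a request body as a plain dict; the function name build_distinct_filters_body, the aggregation name distinct_filters and the 10000 cap are assumptions made for illustration, and the elasticsearch-dsl lines in the trailing comments are an untested outline rather than a verified implementation.

```python
def build_distinct_filters_body(sample_filter_list, field="filter_list", max_values=10000):
    """Request body: keep docs whose `field` intersects the sample, then
    aggregate the distinct values of `field` over those docs."""
    return {
        # No hits are needed, only the aggregation result.
        "size": 0,
        "query": {
            "bool": {
                "filter": [{"terms": {field: list(sample_filter_list)}}]
            }
        },
        "aggs": {
            "distinct_filters": {
                "terms": {"field": field, "size": max_values}
            }
        },
    }


body = build_distinct_filters_body(['a', 'd', 'g', 'j', 'm'])
print(body)

# Assumed elasticsearch-dsl equivalent (not run here):
#   s = ProductIndex.search().filter('terms', filter_list=sample_filter_list)
#   s.aggs.bucket('distinct_filters', 'terms', field='filter_list', size=10000)
#   response = s.extra(size=0).execute()
#   values = [b.key for b in response.aggregations.distinct_filters.buckets]
```

The size inside the aggregation caps how many distinct values are returned, so it should be at least as large as the number of distinct values expected.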
I have edited the question; please check.
– Nipun Garg
Jun 27 at 19:30
I just want the single array that contains distinct elements from the filter_list of all the products.
– Nipun Garg
Jun 27 at 19:18