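V. A Brief Look at Elasticsearch Internals and Grouped Aggregation Queries

Contents

  • I. How Elasticsearch computes the document _score
    • 1. Boolean model
    • 2. Relevance score algorithm
    • 2. Analyzing how the _score of a document is computed
  • II. How an analyzer works
    • 1. Character filter, tokenizer, token filter
    • 2. A brief introduction to the built-in analyzers
    • 3. Customizing analyzers
      • 3.1 The default analyzer: standard
      • 3.2 Changing analyzer settings
      • 3.3 Building your own analyzer
      • 3.4 The IK analyzer in detail
  • III. Highlighting
    • 1. Highlighting overview
    • 2. Common highlighters
    • 3. fast vector highlighter
    • 4. Configuring highlight fragments
  • IV. Aggregation search in depth
    • 1. Bucket and metric
    • 2. Aggregation examples
      • 2.1 histogram: interval statistics
      • 2.2 date_histogram: grouping by date interval
      • 2.3 _global bucket
      • 2.4 aggs + order (aggregation + sorting)
      • 2.5 search + aggs (query + aggregation)
      • 2.6 filter + aggs (filter + aggregation)
      • 2.7 Using filter inside an aggregation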

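Cluster node roles

Node roles are set in the ES configuration directory (elasticsearch.yml). From version 7.9 onward the same roles are expressed through node.roles.

Master-eligible node: node.master: true
Data node: node.data: true
  1. Client node
      When both the master and data settings are false, the node only routes requests, handles searches, dispatches indexing operations, and so on. In essence a client node acts as a smart load balancer. Dedicated client nodes are very useful in larger clusters: they coordinate between master and data nodes, and because a client node that has joined the cluster knows the cluster state, it can route requests directly.

  2. Data node
      Data nodes store the index data. They mainly perform create, read, update and delete operations on documents, as well as aggregations. Data nodes are demanding on CPU, memory and I/O. When tuning, monitor the state of the data nodes and add new nodes to the cluster when resources run short.

  3. Master node
      A master-eligible node is mainly responsible for cluster-level operations, such as creating or deleting indices, tracking which nodes belong to the cluster, and deciding which shards are allocated to which nodes. A stable master node is very important for cluster health. By default any node in a cluster may be elected master. Because indexing and searching consume a lot of CPU, memory and I/O, separating master nodes from data nodes is a good way to keep a cluster stable.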
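
The mapping from the two settings to the roles described above can be written down as a small lookup. This is only an illustrative sketch: node_role is a hypothetical helper, not an Elasticsearch API, and the role labels are informal names taken from the text.

```python
def node_role(master: bool, data: bool) -> str:
    """Return the informal role name for a node.master / node.data combination."""
    if master and data:
        # Default configuration: the node can be elected master and holds data.
        return "master-eligible data node"
    if master:
        return "dedicated master-eligible node"
    if data:
        return "data node"
    # Both flags false: the node only routes and coordinates requests.
    return "client node"


for flags in [(True, True), (True, False), (False, True), (False, False)]:
    print(flags, "->", node_role(*flags))
```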

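I. How Elasticsearch computes the document _score

1. Boolean model

Step 1: based on the user's query, first filter out the docs (documents) that contain the specified terms (keywords).
For example, for the query "hello world":

query "hello world"  split into terms -->  hello / world / hello & world

Step 2: filter according to your conditions.

bool --> must / must_not / should conditions --> filter --> must contain / must not contain / may contain

No scoring has happened at this point.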
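
The two filtering steps above can be sketched in a few lines of Python. This is a simplified illustration, not how Lucene implements it: bool_filter and tokens are hypothetical helpers, the tokenization is a plain whitespace split, and the third sample document is made up for the demonstration.

```python
def tokens(text):
    """Very rough tokenization: lowercase, drop commas, split on whitespace."""
    return set(text.lower().replace(",", " ").split())


def bool_filter(docs, must=(), should=(), must_not=()):
    """Return the ids of docs that pass the boolean conditions. No scoring yet."""
    hits = []
    for doc_id, text in docs.items():
        terms = tokens(text)
        if any(t not in terms for t in must):
            continue  # a required term is missing
        if any(t in terms for t in must_not):
            continue  # an excluded term is present
        # Without a must clause, at least one should clause has to match.
        if not must and should and not any(t in terms for t in should):
            continue
        hits.append(doc_id)
    return hits


docs = {
    1: "hello you, and world is very good",
    2: "hello, how are you",
    3: "hi world, how are you",
}
print(bool_filter(docs, should=["hello", "world"]))
print(bool_filter(docs, must=["hello"], must_not=["world"]))
```

The first call keeps all three documents because each contains hello or world. The second keeps only document 2.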

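2. Relevance score algorithm

The algorithm computes how closely the text in an index matches the search text.
Elasticsearch's relevance scoring is based on term frequency/inverse document frequency, TF/IDF for short. The slash is only a separator, not a division: the score rises with both TF and IDF. Since version 5.0 the default similarity is BM25, which builds on the same TF and IDF ideas.
Step 3: compute the score.

  1. Term frequency (TF): how many times each term of the search text appears in the field text. The more occurrences, the more relevant.
    For example
    Search request: hello world
    It is split into hello and world, and each document is checked for how often these terms occur. The more occurrences, the higher the score.
doc1:hello you, and world is very good

doc2:hello, how are you
    doc1 contains both terms, so it is more relevant than doc2.
  2. Inverse document frequency (IDF): how many documents in the whole index contain each term of the search text. The more documents contain it, the less relevant the term.
    (Think of searching for extremely common words such as 的 or 是, roughly "of" and "is": they occur almost everywhere in the index, so they say little about relevance. IDF accounts for this.)
    For example
    Search request: hello world
doc1:hello, july is good

doc2:hi world, how are you
    If hello occurs in many more documents of the index than world, the world match in doc2 carries more weight than the hello match in doc1.

Besides TF and IDF, one more factor is involved:
  3. Field-length norm: the length of the field. The longer the field, the weaker the relevance.

For example
Search request: hello world

doc1:{ "title": "hello july", "content": "...... 1000 words" }
doc2:{ "title": "my baby", "content": "...... 1000 words, hi world" }

hello and world occur equally often in the whole index, but doc1 is more relevant because its match is in the title field, which is much shorter.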
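
The three factors can be made concrete with the classic Lucene TF/IDF formulas as they are commonly documented (square root of the frequency, a logarithmic IDF, and an inverse square root length norm). Treat this as a sketch of the idea only: current Elasticsearch versions score with BM25 by default, and the document counts used below are made-up numbers.

```python
import math


def tf(freq):
    """Term frequency factor: grows with the number of occurrences in the field."""
    return math.sqrt(freq)


def idf(num_docs, doc_freq):
    """Inverse document frequency: shrinks as more documents contain the term."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Field-length norm: shrinks as the field gets longer."""
    return 1 / math.sqrt(num_terms)


# A term that occurs 4 times counts twice as much as one that occurs once.
print(tf(1), tf(4))
# Hypothetical index of 1000 docs: a rare term outweighs a common one.
print(idf(1000, 9), idf(1000, 99))
# A 2-term title versus a 1000-term content field.
print(field_norm(2), field_norm(1000))
```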

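2. Analyzing how the _score of a document is computed

A simple example using the _explain API.

GET /test_index08/_explain/3
{"query":{"match":{"f":"hello"}}}

Result
The response breaks the score down into the idf, tf and related factors described above. This is only a first look: the math behind ES scoring is fairly involved and is not covered in detail here.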


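II. How an analyzer works

1. Character filter, tokenizer, token filter

  • Tokenization and normalization

Using the configured analyzer, the data to be stored in ES is split up: a sentence is broken into individual words, and each word is normalized (tense conversion, singular/plural conversion, and so on).

The workflow can be divided into three steps.
Step 1: character filter. Before a piece of text is tokenized it is preprocessed. The most common tasks are stripping content (for example HTML tags) and converting special symbols (for example & --> and).

Step 2: tokenizer. Splits the text into tokens: hello you and me --> hello, you, and, me

Step 3: token filter. lowercase, stop word, synonym (for example case conversion, stop word removal, synonym handling).

Only the fully processed result is used to build the inverted index.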
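
The three stages can be mimicked with plain Python functions to show how they chain together. This is a minimal sketch under simplifying assumptions: the regular expressions only approximate what the real html_strip char filter and standard tokenizer do, and the stop word list is a hypothetical two-word list.

```python
import re


def char_filter(text):
    """Stage 1: strip HTML tags and map '&' to 'and' before tokenizing."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", "and")


def tokenize(text):
    """Stage 2: split the text into word tokens."""
    return re.findall(r"\w+", text)


def token_filter(tokens, stopwords=frozenset({"the", "a"})):
    """Stage 3: lowercase every token, then drop stop words."""
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in stopwords]


def analyze(text):
    """Run the three stages in order; the result would feed the inverted index."""
    return token_filter(tokenize(char_filter(text)))


print(analyze("<p>The Dog & a Cat</p>"))
```

For this input the pipeline yields dog, and, cat: the tags are stripped, & becomes and, and the stop words the and a are dropped after lowercasing.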

2. A brief introduction to the built-in analyzers

Test text: Set the shape to semi-transparent by calling set_trans(5)

  • standard analyzer
    Result: set, the, shape, to, semi, transparent, by, calling, set_trans, 5 (standard is the default analyzer)
  • simple analyzer
    Result: set, the, shape, to, semi, transparent, by, calling, set, trans
  • whitespace analyzer
    Result: Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
  • stop analyzer
    Result: removes stop words such as a, the, it

Example

POST _analyze
{
  "analyzer": "standard",
  "text": "Set the shape to semi-transparent by calling set_trans(5)"
}

Detailed result

{
  "tokens" : [
    {
      "token" : "set",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "the",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "shape",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "to",
      "start_offset" : 14,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "semi",
      "start_offset" : 17,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "transparent",
      "start_offset" : 22,
      "end_offset" : 33,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "by",
      "start_offset" : 34,
      "end_offset" : 36,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "calling",
      "start_offset" : 37,
      "end_offset" : 44,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "set_trans",
      "start_offset" : 45,
      "end_offset" : 54,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "5",
      "start_offset" : 55,
      "end_offset" : 56,
      "type" : "<NUM>",
      "position" : 9
    }
  ]
}
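
The whitespace and simple analyzer results listed earlier can be approximated in Python for this ASCII test sentence. These two functions are rough stand-ins, not the real analyzers: the simple analyzer is modeled as "lowercase, then keep runs of letters", which is an assumption that only holds for plain English text.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are left untouched."""
    return text.split()


def simple_analyzer(text):
    """Lowercase, then split on every character that is not a letter."""
    return re.findall(r"[a-z]+", text.lower())


print(whitespace_analyzer(TEXT))
print(simple_analyzer(TEXT))
```

The simple analyzer drops the digit 5 and splits set_trans at the underscore, which is why its output differs from the standard analyzer output shown above.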

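3. Customizing analyzers

3.1 The default analyzer: standard

standard tokenizer: splits on word boundaries

standard token filter: does nothing (it was removed in Elasticsearch 7.0)

lowercase token filter: converts all letters to lowercase

stop token filter (disabled by default): removes stop words such as a, the, it

3.2 Changing analyzer settings

Enable stop words for English text.
For example
Create an index named my_index, where es_std is the name of the custom analyzer and stopwords enables the English stop word list.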

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

Analyze with the default analyzer

GET /my_index/_analyze
{
  "analyzer": "standard", 
  "text": "a dog is in the house"
}

Result

{
  "tokens" : [
    {
      "token" : "a",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "dog",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "is",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "in",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "the",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "house",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}

Test the output of the custom analyzer

GET /my_index/_analyze
{
  "analyzer": "es_std", 
  "text": "a dog is in the house"
}

Result

{
  "tokens" : [
    {
      "token" : "dog",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "house",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}
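
Note that dog and house keep their original positions 1 and 5 even though the stop words in between were removed. The sketch below imitates that behavior. stop_filter is a hypothetical helper, and the stop word set is only a subset of the English list, chosen to cover this sentence.

```python
# Subset of the English stop word list, enough for the example sentence.
ENGLISH_STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "the", "to"}


def stop_filter(tokens):
    """Drop stop words but keep each surviving token's original position."""
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in ENGLISH_STOPWORDS]


print(stop_filter("a dog is in the house".split()))
```

Keeping the gaps in the position sequence is what lets phrase queries still know how far apart dog and house were in the original text.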

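3.3 Building your own analyzer

Create an index my_index2 with the following analysis settings:

  • &Toand (a user-chosen name) is a char filter of type mapping. It converts & to and; multiple mappings are separated by commas.
  • my_stopwords (a user-chosen name) is a token filter of type stop. It removes the stop words the and a from the text.
  • my_analyzer is the name of the custom analyzer, of type custom.
  • html_strip is built into ES and strips HTML tags automatically.
  • lowercase converts uppercase to lowercase.
  • "tokenizer": "standard" means the analyzer is built on top of the standard tokenizer.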

PUT /my_index2
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&Toand": {
          "type": "mapping",
          "mappings": [
            "&=> and",
            "!=> not"
          ]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [
            "the",
            "a"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip",
            "&Toand"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stopwords"
          ]
        }
      }
    }
  }
}

Test it

GET /my_index2/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}

Result

{
  "tokens" : [
    {
      "token" : "tomandjerry",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "are",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "friend",
      "start_offset" : 16,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "in",
      "start_offset" : 23,
      "end_offset" : 25,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "house",
      "start_offset" : 30,
      "end_offset" : 35,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "hahanotnot",
      "start_offset" : 42,
      "end_offset" : 48,
      "type" : "<ALPHANUM>",
      "position" : 7
    }
  ]
}

3.4 The IK analyzer in detail

IK configuration files are located under the config directory.
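What the main files do:

  1. IKAnalyzer.cfg.xml: configures custom dictionaries
  2. main.dic: IK's built-in Chinese dictionary with more than 270,000 entries. Text is segmented according to the words in it, and any word listed here is kept together as one token
  3. quantifier.dic: unit and measure words
  4. suffix.dic: suffixes
  5. surname.dic: Chinese surnames
  6. stopword.dic: English stop words

How do you add a custom dictionary to the IK analyzer?
Method 1:
Create the custom dictionary file and register its path in the configuration file.
For example, I created a folder named custom under the config directory, containing a file custom.dic.
Then edit IKAnalyzer.cfg.xml (this must be done on every node):

<entry key="ext_dict">custom/custom.dic</entry>

This method only takes effect after ES is restarted.

Method 2 (IK hot reload):
Host the custom.dic file at a fixed address, for example 192.168.5.5:8888/custom.dic, and point the ES configuration on every node at that address. To update the dictionary, just edit the file at that address. ES does not need to be restarted.

Method 3 (modify the source):
Download the source code and modify it so that the dictionary is read from MySQL.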


III. Highlighting

1. Highlighting overview

Highlighting marks the matched query terms in the results, similar to Baidu search results.
Highlighting demo
First create an index and add one document.
Specify the analyzer for individual fields.

PUT /test_highlight
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "content": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

Alternatively, set the default analyzer for the index

PUT /test_highlight
{
    "settings" : {
        "index" : {
            "analysis.analyzer.default.type": "ik_max_word"
        }
    }
}

Insert data

PUT /test_highlight/_doc/1
{
  "title": "这是july写的第一篇文章",
  "content": "大家好,这是我写的第一篇文章,特别喜欢这个文章"
}

Query with highlighting

GET /test_highlight/_search
{
  "query": {
    "match": {
      "title": "文章"
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

Result

{
  "took" : 416,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "test_highlight",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "这是july写的第一篇文章",
          "content" : "大家好,这是我写的第一篇文章,特别喜欢这个文章"
        },
        "highlight" : {
          "title" : [
            "这是july写的第一篇<em>文章</em>"
          ]
        }
      }
    ]
  }
}

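Matched terms are wrapped in <em></em> tags, which the page can then style, for example in red. So whenever a field listed under highlight contains the search term, the term is marked up in that field's text.

Note: only the title field is highlighted here because it is the only field in the query. To highlight content as well, the content field must also appear in the query. Listing it under highlight alone has no effect (with the default require_field_match setting of true). See the following example.

GET /test_highlight/_search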
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "文章"
          }
        },
        {
          "match": {
            "content": "文章"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    }
  }
}

Result

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.68324494,
    "hits" : [
      {
        "_index" : "test_highlight",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.68324494,
        "_source" : {
          "title" : "这是july写的第一篇文章",
          "content" : "大家好,这是我写的第一篇文章,特别喜欢这个文章"
        },
        "highlight" : {
          "title" : [
            "这是july写的第一篇<em>文章</em>"
          ],
          "content" : [
            "大家好,这是我写的第一篇<em>文章</em>,特别喜欢这个<em>文章</em>"
          ]
        }
      }
    ]
  }
}
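
As a side note on scoring, the two _score values in the responses above (0.2876821 and 0.68324494) are consistent with BM25 on a single-document index. The sketch below recomputes them under stated assumptions: default k1 = 1.2 and b = 0.75, a boost of k1 + 1 as used by the 7.x-era scoring these responses appear to come from, and a field length equal to the average field length (always true when the index holds one document). The bm25 function is illustrative, not an Elasticsearch API.

```python
import math

K1, B = 1.2, 0.75  # BM25 default parameters


def bm25(freq, doc_count, doc_freq, field_len, avg_field_len, boost=K1 + 1):
    """BM25 score of one term in one field (boost * idf * tf)."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf = freq / (freq + K1 * (1 - B + B * field_len / avg_field_len))
    return boost * idf * tf


# One document in the index, and it contains the term: doc_count = doc_freq = 1.
# Only the ratio field_len / avg_field_len matters, so 1 and 1 stand in for it.
title = bm25(freq=1, doc_count=1, doc_freq=1, field_len=1, avg_field_len=1)
content = bm25(freq=2, doc_count=1, doc_freq=1, field_len=1, avg_field_len=1)

print(title)            # score of the title-only query
print(title + content)  # bool/should query: the two clause scores are summed
```

The term occurs once in title and twice in content, which matches the number of <em> tags in each highlighted field.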

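2. Common highlighters

  • plain highlighter: the Lucene highlighter, described here as the default

  • postings highlighter: requires index_options=offsets

In Elasticsearch 6.0 and later the default is the unified highlighter and the separate postings highlighter no longer exists. The unified highlighter uses the postings offsets when a field is mapped with index_options=offsets.

The postings approach performs better than plain because the text to be highlighted does not need to be re-analyzed. It also uses less disk space.

How to use postings-based highlighting
Specify the mapping as follows when creating the index.
For example: to highlight the content field, set "index_options": "offsets".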

PUT /test_highlight
{
  "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "index_options": "offsets"
        }
      }
  }
}

The query is written the same way as for default highlighting

GET /test_highlight/_search
{
  "query": {
    "match": {
      "content": "文章"
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

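3. fast vector highlighter

If index-time term vectors are enabled in the mapping, the fast vector highlighter is used.
For large fields (larger than 1 MB) it performs better.
How to use it
For example: to highlight the content field, set "term_vector": "with_positions_offsets".

PUT /test_highlight
{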
  "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "term_vector" : "with_positions_offsets"
        }
      }
  }
}

The query is again written the same way.
How to force a specific highlighter type in a query

GET /test_highlight/_search
{
  "query": {
    "match": {
      "content": "文章"
    }
  },
  "highlight": {
    "fields": {
      "content": {
        "type": "plain"
      }
    }
  }
}

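4. Configuring highlight fragments

Scenario: you want to highlight 'java', and the field holds more than 10,000 characters. You probably do not need the whole content back, only a small part of it, and you do not need every match shown at once, only the first few highlighted ones.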

GET /test_highlight/_search
{
    "query" : {
        "match": { "content": "文章" }
    },
    "highlight" : {
        "fields" : {
            "content" : {"fragment_size" : 5, "number_of_fragments" : 3 }
        }
    }
}

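fragment_size: the length of each returned fragment, in characters. The default is 100.
number_of_fragments: the highlighted text may produce several fragments; this sets how many of them are returned.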

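IV. Aggregation search in depth

1. Bucket and metric

In Elasticsearch, bucket and metric are two important kinds of aggregation. They are used to group, filter and compute over search results.
Bucket: an aggregation that divides documents into segments, or buckets. A bucket aggregation can be seen as a classification step: it groups the search results by some rule into several distinct buckets.

Common bucket types:

  • Terms Bucket: groups by the values of a field, similar to GROUP BY in SQL.
  • Date Histogram Bucket: groups documents by time interval, for example per day, per week or per month.
  • Range Bucket: groups by numeric range, for example by price band.

Metric: an aggregation that computes a value over the documents in a bucket. Metrics are usually applied to data that has already been grouped, to produce summary figures.
Common metric types:

  • Sum Metric: sums the values of a field.
  • Avg Metric: averages the values of a field.
  • Max Metric: takes the maximum value of a field.
  • Min Metric: takes the minimum value of a field.
  • Cardinality Metric: counts the distinct values of a field (the count is approximate).

For example, given an index of product sales records with a field "category" for the product type, we can use a Terms Bucket to group by product type and then apply a metric such as Sum to compute the total sales of each type.
This can be done with the following Elasticsearch query: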

{
    "aggs": {
        "sales_by_category": {
            "terms": { "field": "category" },
            "aggs": {
                "total_sales": { "sum": { "field": "price" } }
            }
        }
    }
}

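The query above first uses a Terms Bucket to group all products by product type, then applies a Sum Metric to the price of the documents in each group, giving the total sales per product type. sales_by_category is a user-chosen name for the grouping.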
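
The bucket-then-metric idea can be mirrored in plain Python to show what the aggregation computes. This is a sketch of the logic only, not of how Elasticsearch executes it: the sales records are hypothetical sample data, and sales_by_category is just a function named after the aggregation.

```python
from collections import defaultdict

# Hypothetical sales records, one dict per document.
sales = [
    {"category": "phone", "price": 3000},
    {"category": "laptop", "price": 8000},
    {"category": "phone", "price": 2000},
]


def sales_by_category(records):
    """Terms bucket on 'category', then a sum metric on 'price' per bucket."""
    buckets = defaultdict(lambda: {"doc_count": 0, "total_sales": 0})
    for record in records:
        bucket = buckets[record["category"]]  # bucket step: pick the group
        bucket["doc_count"] += 1
        bucket["total_sales"] += record["price"]  # metric step: accumulate
    return dict(buckets)


print(sales_by_category(sales))
```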

2. Aggregation examples

Create an index and insert data.

PUT /cars
{
  "mappings": {
    "properties": {
      "price": {
        "type": "long"
      },
      "color": {
        "type": "keyword"
      },
      "brand": {
        "type": "keyword"
      },
      "model": {
        "type": "keyword"
      },
      "sold_date": {
        "type": "date"
      },
      "remark": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

Add data

POST /cars/_bulk
{"index":{}}
{"price":258000,"color":"金色","brand":"大众","model":"大众迈腾","sold_date":"2021-10-28","remark":"大众中档车"}
{"index":{}}
{"price":123000,"color":"金色","brand":"大众","model":"大众速腾","sold_date":"2021-11-05","remark":"大众神车"}
{"index":{}}
{"price":239800,"color":"白色","brand":"标志","model":"标志508","sold_date":"2021-05-18","remark":"标志品牌全球上市车型"}
{"index":{}}
{"price":148800,"color":"白色","brand":"标志","model":"标志408","sold_date":"2021-07-02","remark":"比较大的紧凑型车"}
{"index":{}}
{"price":1998000,"color":"黑色","brand":"大众","model":"大众辉腾","sold_date":"2021-08-19","remark":"大众最让人肝疼的车"}
{"index":{}}
{"price":218000,"color":"红色","brand":"奥迪","model":"奥迪A4","sold_date":"2021-11-05","remark":"小资车型"}
{"index":{}}
{"price":489000,"color":"黑色","brand":"奥迪","model":"奥迪A6","sold_date":"2022-01-01","remark":"政府专用?"}
{"index":{}}
{"price":1899000,"color":"黑色","brand":"奥迪","model":"奥迪A 8","sold_date":"2022-02-12","remark":"很贵的大A6"}

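① Count the number of cars sold per color
This only groups the documents, without any further statistics. The most basic aggregation in ES is terms, which corresponds to GROUP BY with COUNT in SQL.
By default ES sorts the buckets by doc_count in descending order. You can sort by the _key metadata to order buckets by the grouped field value, or by the _count metadata to order them by the number of documents in each bucket.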

GET /cars/_search
{
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color",
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

Result: hits contains the matched documents, and aggregations contains the aggregation output.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "VIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1899000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A 8",
          "sold_date" : "2022-02-12",
          "remark" : "很贵的大A6。。。"
        }
      }
    ]
  },
  "aggregations" : {
    "group_by_color" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "黑色",
          "doc_count" : 3
        },
        {
          "key" : "白色",
          "doc_count" : 2
        },
        {
          "key" : "金色",
          "doc_count" : 2
        },
        {
          "key" : "红色",
          "doc_count" : 1
        }
      ]
    }
  }
}

If you do not want the documents returned, set size to 0.

GET /cars/_search
{
  "size": 0, 
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color",
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
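
The terms aggregation with _count descending can be reproduced in Python on the same eight cars. This is a sketch of the grouping logic only. CARS repeats the price, color and brand values from the bulk request, and breaking ties by key is an assumption that happens to reproduce the bucket order shown in the response.

```python
from collections import Counter

# (price, color, brand) for the eight cars in the bulk request.
CARS = [
    (258000, "金色", "大众"),
    (123000, "金色", "大众"),
    (239800, "白色", "标志"),
    (148800, "白色", "标志"),
    (1998000, "黑色", "大众"),
    (218000, "红色", "奥迪"),
    (489000, "黑色", "奥迪"),
    (1899000, "黑色", "奥迪"),
]


def group_by_color(cars):
    """Count cars per color, sorted by count descending, ties broken by key."""
    counts = Counter(color for _, color, _ in cars)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


for key, doc_count in group_by_color(CARS):
    print(key, doc_count)
```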

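② Average price of cars per color (drill-down analysis, aggs nested inside aggs)
This example first groups by color and then computes a statistic over the documents in each group. That per-group statistic is the metric. Sorting is also possible: because the statistic has been given the name avg_by_price, the buckets can be ordered by that name.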

GET /cars/_search
{
  "size": 0, 
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color",
        "order": {
          "avg_by_price": "asc"
        }
      },
      "aggs": {
        "avg_by_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_color" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "金色",
          "doc_count" : 2,
          "avg_by_price" : {
            "value" : 190500.0
          }
        },
        {
          "key" : "白色",
          "doc_count" : 2,
          "avg_by_price" : {
            "value" : 194300.0
          }
        },
        {
          "key" : "红色",
          "doc_count" : 1,
          "avg_by_price" : {
            "value" : 218000.0
          }
        },
        {
          "key" : "黑色",
          "doc_count" : 3,
          "avg_by_price" : {
            "value" : 1462000.0
          }
        }
      ]
    }
  }
}

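size can be set to 0, meaning the response contains only the aggregation results and no documents, which speeds up the query. If you do need the documents, set size according to your actual requirements.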
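
The averages in the response can be checked by hand with a few lines of Python. This sketch only mirrors the arithmetic of the nested avg metric and the ascending order on it. CARS repeats the values from the bulk request, and avg_price_by_color is a hypothetical helper.

```python
from collections import defaultdict

# (price, color, brand) for the eight cars in the bulk request.
CARS = [
    (258000, "金色", "大众"),
    (123000, "金色", "大众"),
    (239800, "白色", "标志"),
    (148800, "白色", "标志"),
    (1998000, "黑色", "大众"),
    (218000, "红色", "奥迪"),
    (489000, "黑色", "奥迪"),
    (1899000, "黑色", "奥迪"),
]


def avg_price_by_color(cars):
    """Bucket by color, average the price per bucket, sort by average ascending."""
    prices = defaultdict(list)
    for price, color, _ in cars:
        prices[color].append(price)
    averages = {color: sum(p) / len(p) for color, p in prices.items()}
    return sorted(averages.items(), key=lambda kv: kv[1])


for color, avg in avg_price_by_color(CARS):
    print(color, avg)
```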

③ Average price of cars per color and per brand

Query

GET /cars/_search
{
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color",
        "order": {
          "avg_by_price_color": "asc"
        }
      },
      "aggs": {
        "avg_by_price_color": {
          "avg": {
            "field": "price"
          }
        },
        "group_by_brand": {
          "terms": {
            "field": "brand",
            "order": {
              "avg_by_price_brand": "desc"
            }
          },
          "aggs": {
            "avg_by_price_brand": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

Result

{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "VIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1899000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A 8",
          "sold_date" : "2022-02-12",
          "remark" : "很贵的大A6。。。"
        }
      }
    ]
  },
  "aggregations" : {
    "group_by_color" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "金色",
          "doc_count" : 2,
          "avg_by_price_color" : {
            "value" : 190500.0
          },
          "group_by_brand" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "大众",
                "doc_count" : 2,
                "avg_by_price_brand" : {
                  "value" : 190500.0
                }
              }
            ]
          }
        },
        {
          "key" : "白色",
          "doc_count" : 2,
          "avg_by_price_color" : {
            "value" : 194300.0
          },
          "group_by_brand" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "标志",
                "doc_count" : 2,
                "avg_by_price_brand" : {
                  "value" : 194300.0
                }
              }
            ]
          }
        },
        {
          "key" : "红色",
          "doc_count" : 1,
          "avg_by_price_color" : {
            "value" : 218000.0
          },
          "group_by_brand" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "奥迪",
                "doc_count" : 1,
                "avg_by_price_brand" : {
                  "value" : 218000.0
                }
              }
            ]
          }
        },
        {
          "key" : "黑色",
          "doc_count" : 3,
          "avg_by_price_color" : {
            "value" : 1462000.0
          },
          "group_by_brand" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "大众",
                "doc_count" : 1,
                "avg_by_price_brand" : {
                  "value" : 1998000.0
                }
              },
              {
                "key" : "奥迪",
                "doc_count" : 2,
                "avg_by_price_brand" : {
                  "value" : 1194000.0
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Grouping by color first and then grouping by brand inside each color bucket is called drill-down analysis (that is, nested aggregation definitions).
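
The two-level drill-down can be imitated with nested dictionaries. This sketch mirrors the structure of the aggregation, not its execution inside Elasticsearch. CARS repeats the values from the bulk request, and drill_down is a hypothetical helper that orders colors by average price ascending and brands by average price descending, as in the query.

```python
from collections import defaultdict

# (price, color, brand) for the eight cars in the bulk request.
CARS = [
    (258000, "金色", "大众"),
    (123000, "金色", "大众"),
    (239800, "白色", "标志"),
    (148800, "白色", "标志"),
    (1998000, "黑色", "大众"),
    (218000, "红色", "奥迪"),
    (489000, "黑色", "奥迪"),
    (1899000, "黑色", "奥迪"),
]


def average(values):
    return sum(values) / len(values)


def drill_down(cars):
    """Bucket by color, then by brand inside each color, averaging price at both levels."""
    by_color = defaultdict(lambda: defaultdict(list))
    for price, color, brand in cars:
        by_color[color][brand].append(price)

    rows = []
    for color, brands in by_color.items():
        color_prices = [p for prices in brands.values() for p in prices]
        # Inner buckets: brands ordered by average price, highest first.
        brand_rows = sorted(
            ((brand, average(prices)) for brand, prices in brands.items()),
            key=lambda kv: -kv[1],
        )
        rows.append((color, average(color_prices), brand_rows))
    # Outer buckets: colors ordered by average price, lowest first.
    return sorted(rows, key=lambda row: row[1])


for color, color_avg, brand_rows in drill_down(CARS):
    print(color, color_avg, brand_rows)
```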
aggs can also be defined side by side at the same level, in the following format.

GET /index_name/_search
{
  "aggs": {
    "aggregation_name_1": {},
    "aggregation_name_2": {}
  }
}

Example:

GET /cars/_search
{
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      }
    },
    "avg_by_price_color": {
      "avg": {
        "field": "price"
      }
    }
  }

}

Result

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "VIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1899000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A 8",
          "sold_date" : "2022-02-12",
          "remark" : "很贵的大A6。。。"
        }
      }
    ]
  },
  "aggregations" : {
    "avg_by_price_color" : {
      "value" : 671700.0
    },
    "group_by_color" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "黑色",
          "doc_count" : 3
        },
        {
          "key" : "白色",
          "doc_count" : 2
        },
        {
          "key" : "金色",
          "doc_count" : 2
        },
        {
          "key" : "红色",
          "doc_count" : 1
        }
      ]
    }
  }
}
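
With side-by-side aggregations, the avg is computed over all matched documents rather than per color bucket, which is why the response holds a single value of 671700.0. The sketch below reproduces both sibling results independently. CARS repeats the values from the bulk request, and sibling_aggs is a hypothetical helper.

```python
from collections import Counter

# (price, color, brand) for the eight cars in the bulk request.
CARS = [
    (258000, "金色", "大众"),
    (123000, "金色", "大众"),
    (239800, "白色", "标志"),
    (148800, "白色", "标志"),
    (1998000, "黑色", "大众"),
    (218000, "红色", "奥迪"),
    (489000, "黑色", "奥迪"),
    (1899000, "黑色", "奥迪"),
]


def sibling_aggs(cars):
    """Two independent aggregations over the same document set."""
    counts = Counter(color for _, color, _ in cars)
    prices = [price for price, _, _ in cars]
    return {
        # Terms bucket on color; no metric is nested inside it.
        "group_by_color": dict(counts),
        # Avg metric over every document, regardless of color.
        "avg_by_price_color": sum(prices) / len(prices),
    }


print(sibling_aggs(CARS))
```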

④ Maximum price, minimum price and total price per color
Query

GET /cars/_search
{
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "max_price": {
          "max": {
            "field": "price"
          }
        },
        "min_price": {
          "min": {
            "field": "price"
          }
        },
        "sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "VIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1899000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A 8",
          "sold_date" : "2022-02-12",
          "remark" : "很贵的大A6。。。"
        }
      }
    ]
  },
  "aggregations" : {
    "group_by_color" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "黑色",
          "doc_count" : 3,
          "max_price" : {
            "value" : 1998000.0
          },
          "min_price" : {
            "value" : 489000.0
          },
          "sum_price" : {
            "value" : 4386000.0
          }
        },
        {
          "key" : "白色",
          "doc_count" : 2,
          "max_price" : {
            "value" : 239800.0
          },
          "min_price" : {
            "value" : 148800.0
          },
          "sum_price" : {
            "value" : 388600.0
          }
        },
        {
          "key" : "金色",
          "doc_count" : 2,
          "max_price" : {
            "value" : 258000.0
          },
          "min_price" : {
            "value" : 123000.0
          },
          "sum_price" : {
            "value" : 381000.0
          }
        },
        {
          "key" : "红色",
          "doc_count" : 1,
          "max_price" : {
            "value" : 218000.0
          },
          "min_price" : {
            "value" : 218000.0
          },
          "sum_price" : {
            "value" : 218000.0
          }
        }
      ]
    }
  }
}

⑤ Find the highest-priced model within each brand
Query

GET cars/_search
{
  "size": 0,
  "aggs": {
    "group_by_brand": {
      "terms": {
        "field": "brand"
      },
      "aggs": {
        "top_car": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "price": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [
                "model",
                "price"
              ]
            }
          }
        }
      }
    }
  }
}

Result

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_brand" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "大众",
          "doc_count" : 3,
          "top_car" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "cars",
                  "_type" : "_doc",
                  "_id" : "UYR_-4cBUF6rBrkiDpRJ",
                  "_score" : null,
                  "_source" : {
                    "price" : 1998000,
                    "model" : "大众辉腾"
                  },
                  "sort" : [
                    1998000
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "奥迪",
          "doc_count" : 3,
          "top_car" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "cars",
                  "_type" : "_doc",
                  "_id" : "VIR_-4cBUF6rBrkiDpRJ",
                  "_score" : null,
                  "_source" : {
                    "price" : 1899000,
                    "model" : "奥迪A 8"
                  },
                  "sort" : [
                    1899000
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "标志",
          "doc_count" : 2,
          "top_car" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "cars",
                  "_type" : "_doc",
                  "_id" : "T4R_-4cBUF6rBrkiDpRJ",
                  "_score" : null,
                  "_source" : {
                    "price" : 239800,
                    "model" : "标志508"
                  },
                  "sort" : [
                    239800
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

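2.1 Aggregation example: histogram interval statistics

histogram, like terms, is a bucket aggregation; it groups documents into fixed-width intervals of a numeric field.
For example, to count the cars sold and compute their average price for every 1,000,000 of price, set the histogram's field to price and its interval to 1,000,000 (interval: 1000000). ES then divides price into the ranges [0, 1000000), [1000000, 2000000), [2000000, 3000000), and so on. As with terms, each bucket reports how many documents it holds (doc_count), and nested aggs can run further aggregations on the documents inside each bucket.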
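The bucketing rule described above can be sketched in plain Python. This is only an illustration run on the eight sample prices from the cars index, not Elasticsearch's implementation; the function name `histogram` and the result layout are assumptions made for this sketch. Each value lands in the bucket whose key is the interval's lower bound, floor(value / interval) * interval.

```python
import math
from collections import defaultdict

# Prices of the eight sample documents in the cars index.
PRICES = [258000, 123000, 239800, 148800, 1998000, 218000, 489000, 1899000]

def histogram(values, interval):
    """Group values into [key, key + interval) buckets and average each bucket."""
    buckets = defaultdict(list)
    for v in values:
        key = math.floor(v / interval) * interval  # lower bound of the bucket
        buckets[key].append(v)
    return {
        key: {"doc_count": len(vs), "avg": sum(vs) / len(vs)}
        for key, vs in sorted(buckets.items())
    }

result = histogram(PRICES, 1_000_000)
print(result)
```

Six prices fall in [0, 1000000) and two in [1000000, 2000000), which matches the doc_count values ES reports for this data.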

Query

GET /cars/_search
{
  "aggs": {
    "histogram_by_price": {
      "histogram": {
        "field": "price",
        "interval": 1000000
      },
      "aggs": {
        "avg_by_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "VIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1899000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A 8",
          "sold_date" : "2022-02-12",
          "remark" : "很贵的大A6。。。"
        }
      }
    ]
  },
  "aggregations" : {
    "histogram_by_price" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 6,
          "avg_by_price" : {
            "value" : 246100.0
          }
        },
        {
          "key" : 1000000.0,
          "doc_count" : 2,
          "avg_by_price" : {
            "value" : 1948500.0
          }
        }
      ]
    }
  }
}

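2.2 date_histogram interval grouping

date_histogram runs interval grouping on a date field, for example sales per month or per year.
For example, to count the cars sold and total their sales amount for each month, use date_histogram with these parameters: field names the field to group on; interval sets the bucket width (year, quarter, month, week, day, hour, minute, or second); format sets the date format of the bucket keys; min_doc_count sets the minimum number of documents a bucket needs to be returned (it defaults to 0, so buckets for intervals that contain no documents are returned as well); extended_bounds sets the start and end of the range (if omitted, the range runs from the bucket containing the earliest date to the bucket containing the latest date). Because extended_bounds only adds empty buckets, it has a visible effect only when min_doc_count is 0.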
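To make the monthly grouping concrete, here is a small Python sketch over the sample sold_date and price values. It is an illustration only, not how Elasticsearch computes buckets; the helper names `month_key`, `monthly_totals`, and `epoch_millis` are made up for this example. It returns only non-empty months, which corresponds to min_doc_count: 1, and it assumes UTC when converting a bucket's first day to the epoch-millisecond key.

```python
from collections import defaultdict
from datetime import date, datetime, timezone

# (sold_date, price) of the eight sample documents in the cars index.
SALES = [
    ("2021-10-28", 258000), ("2021-11-05", 123000),
    ("2021-05-18", 239800), ("2021-07-02", 148800),
    ("2021-08-19", 1998000), ("2021-11-05", 218000),
    ("2022-01-01", 489000), ("2022-02-12", 1899000),
]

def month_key(day_str):
    """Round a yyyy-MM-dd date down to the first day of its month."""
    return date.fromisoformat(day_str).replace(day=1)

def monthly_totals(sales):
    """Count documents and sum prices per calendar month (non-empty months only)."""
    buckets = defaultdict(lambda: {"doc_count": 0, "sum": 0})
    for day_str, price in sales:
        bucket = buckets[month_key(day_str)]
        bucket["doc_count"] += 1
        bucket["sum"] += price
    return dict(sorted(buckets.items()))

def epoch_millis(d):
    """Milliseconds since the Unix epoch for midnight UTC on day d."""
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp() * 1000)

totals = monthly_totals(SALES)
for first_day, bucket in totals.items():
    print(first_day.isoformat(), epoch_millis(first_day), bucket)
```

November 2021 is the only month with two sales (123000 + 218000 = 341000); every other non-empty month holds a single document.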

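Example: total the sales amount per month across 2021 and 2022.
Syntax for versions before ES 7.x (still accepted by the 7.x instance used here, but it triggers the deprecation warning shown in the result)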

GET /cars/_search
{
  "aggs": {
    "histogram_by_date": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 1,
        "extended_bounds": {
          "min": "2021-01-01",
          "max": "2022-12-31"
        }
      },
      "aggs": {
        "sum_by_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result

#! Deprecation: [interval] on [date_histogram] is deprecated, use [fixed_interval] or [calendar_interval] in the future.
{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "VIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1899000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A 8",
          "sold_date" : "2022-02-12",
          "remark" : "很贵的大A6。。。"
        }
      }
    ]
  },
  "aggregations" : {
    "histogram_by_date" : {
      "buckets" : [
        {
          "key_as_string" : "2021-05-01",
          "key" : 1619827200000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 239800.0
          }
        },
        {
          "key_as_string" : "2021-07-01",
          "key" : 1625097600000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 148800.0
          }
        },
        {
          "key_as_string" : "2021-08-01",
          "key" : 1627776000000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 1998000.0
          }
        },
        {
          "key_as_string" : "2021-10-01",
          "key" : 1633046400000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 258000.0
          }
        },
        {
          "key_as_string" : "2021-11-01",
          "key" : 1635724800000,
          "doc_count" : 2,
          "sum_by_price" : {
            "value" : 341000.0
          }
        },
        {
          "key_as_string" : "2022-01-01",
          "key" : 1640995200000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 489000.0
          }
        },
        {
          "key_as_string" : "2022-02-01",
          "key" : 1643673600000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 1899000.0
          }
        }
      ]
    }
  }
}

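Syntax for ES 7.x and later
Query
Replace the interval keyword with calendar_interval (the deprecation warning above also names fixed_interval as the option for fixed-length spans).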

GET /cars/_search
{
  "aggs": {
    "histogram_by_date": {
      "date_histogram": {
        "field": "sold_date",
        "calendar_interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 1,
        "extended_bounds": {
          "min": "2021-01-01",
          "max": "2022-12-31"
        }
      },
      "aggs": {
        "sum_by_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "VIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1899000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A 8",
          "sold_date" : "2022-02-12",
          "remark" : "很贵的大A6。。。"
        }
      }
    ]
  },
  "aggregations" : {
    "histogram_by_date" : {
      "buckets" : [
        {
          "key_as_string" : "2021-05-01",
          "key" : 1619827200000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 239800.0
          }
        },
        {
          "key_as_string" : "2021-07-01",
          "key" : 1625097600000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 148800.0
          }
        },
        {
          "key_as_string" : "2021-08-01",
          "key" : 1627776000000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 1998000.0
          }
        },
        {
          "key_as_string" : "2021-10-01",
          "key" : 1633046400000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 258000.0
          }
        },
        {
          "key_as_string" : "2021-11-01",
          "key" : 1635724800000,
          "doc_count" : 2,
          "sum_by_price" : {
            "value" : 341000.0
          }
        },
        {
          "key_as_string" : "2022-01-01",
          "key" : 1640995200000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 489000.0
          }
        },
        {
          "key_as_string" : "2022-02-01",
          "key" : 1643673600000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 1899000.0
          }
        }
      ]
    }
  }
}

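2.3 global bucket

When aggregating, you sometimes need to compare a subset of the data with the whole data set.
For example:
Compare the average price of one brand's cars with the average price of all cars. global defines a global bucket: it ignores the query conditions and runs its sub-aggregations over every document in the index.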
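The difference between a query-scoped aggregation and a global bucket can be sketched in Python on the sample data. This is only a local illustration; the variable names are chosen for this sketch. The first average sees only the documents that match the query (brand 大众), while the second one ignores the filter and sees all eight documents.

```python
# (brand, price) of the eight sample documents in the cars index.
CARS = [
    ("大众", 258000), ("大众", 123000), ("标志", 239800), ("标志", 148800),
    ("大众", 1998000), ("奥迪", 218000), ("奥迪", 489000), ("奥迪", 1899000),
]

def avg(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Query scope: only documents whose brand is 大众.
query_hits = [price for brand, price in CARS if brand == "大众"]
volkswagen_avg = avg(query_hits)

# Global scope: every document, regardless of the query.
all_prices = [price for _, price in CARS]
all_avg = avg(all_prices)

print(len(query_hits), volkswagen_avg)
print(len(all_prices), all_avg)
```

The query scope contains 3 documents and the global scope contains 8, which is why hits.total and the global bucket's doc_count differ in the ES response.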
Query

GET /cars/_search
{
  "size": 0,
  "query": {
    "match": {
      "brand": "大众"
    }
  },
  "aggs": {
    "volkswagen_of_avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "all_avg_price": {
      "global": {},
      "aggs": {
        "all_of_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "all_avg_price" : {
      "doc_count" : 8,
      "all_of_price" : {
        "value" : 671700.0
      }
    },
    "volkswagen_of_avg_price" : {
      "value" : 793000.0
    }
  }
}

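2.4 aggs + order (aggregation + sorting)

Aggregation buckets can be sorted by an aggregated value.
For example:
Compute the number of cars sold and the total sales amount for each brand, sorted by total sales amount in descending order.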
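As a rough Python equivalent of grouping by brand and ordering the buckets by a sub-aggregation, the sketch below sums prices per brand and sorts the groups by that sum. The function name `brand_totals` is an assumption for this illustration, and the key name sum_of_price simply mirrors the sub-aggregation name used in this section.

```python
from collections import defaultdict

# (brand, price) of the eight sample documents in the cars index.
CARS = [
    ("大众", 258000), ("大众", 123000), ("标志", 239800), ("标志", 148800),
    ("大众", 1998000), ("奥迪", 218000), ("奥迪", 489000), ("奥迪", 1899000),
]

def brand_totals(cars):
    """Group by brand, then order the groups by total price, highest first."""
    totals = defaultdict(lambda: {"doc_count": 0, "sum_of_price": 0})
    for brand, price in cars:
        totals[brand]["doc_count"] += 1
        totals[brand]["sum_of_price"] += price
    return sorted(
        totals.items(),
        key=lambda item: item[1]["sum_of_price"],
        reverse=True,  # "desc"
    )

ranking = brand_totals(CARS)
for brand, stats in ranking:
    print(brand, stats)
```

奥迪 and 大众 both have three documents, so ordering by doc_count alone could not separate them; ordering by the summed price does.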
Query

GET /cars/_search
{
  "aggs": {
    "group_of_brand": {
      "terms": {
        "field": "brand",
        "order": {
          "sum_of_price": "desc"
        }
      },
      "aggs": {
        "sum_of_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "VIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1899000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A 8",
          "sold_date" : "2022-02-12",
          "remark" : "很贵的大A6。。。"
        }
      }
    ]
  },
  "aggregations" : {
    "group_of_brand" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "奥迪",
          "doc_count" : 3,
          "sum_of_price" : {
            "value" : 2606000.0
          }
        },
        {
          "key" : "大众",
          "doc_count" : 3,
          "sum_of_price" : {
            "value" : 2379000.0
          }
        },
        {
          "key" : "标志",
          "doc_count" : 2,
          "sum_of_price" : {
            "value" : 388600.0
          }
        }
      ]
    }
  }
}

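With multiple levels of aggs (drill-down aggregation), a terms aggregation can also be ordered by a value computed in the aggs nested inside it: its order clause refers to that sub-aggregation by name.
For example:
Compute the total sales amount for each color within each brand, sorted by total sales amount in descending order. As with sorting inside groups in SQL, the order applies only within each parent bucket; it cannot sort across buckets.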
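The "sort within each group only" behaviour can be sketched in Python. This is an illustration on the sample data, not Elasticsearch code, and `drill_down` is a name invented here. Inner color buckets are ordered by their own price sum; the outer brand buckets keep the terms default order, which this sketch assumes to be doc_count descending with ties broken by key ascending.

```python
from collections import defaultdict

# (brand, color, price) of the eight sample documents in the cars index.
CARS = [
    ("大众", "金色", 258000), ("大众", "金色", 123000),
    ("标志", "白色", 239800), ("标志", "白色", 148800),
    ("大众", "黑色", 1998000), ("奥迪", "红色", 218000),
    ("奥迪", "黑色", 489000), ("奥迪", "黑色", 1899000),
]

def drill_down(cars):
    """Brand buckets, each holding color buckets sorted by total price (desc)."""
    sums = defaultdict(lambda: defaultdict(int))  # brand -> color -> total price
    counts = defaultdict(int)                     # brand -> doc_count
    for brand, color, price in cars:
        sums[brand][color] += price
        counts[brand] += 1
    # Outer order (assumed default): doc_count descending, key ascending on ties.
    brands = sorted(sums, key=lambda b: (-counts[b], b))
    # Inner order: each brand's colors by their own sum, highest first.
    return [
        (b, sorted(sums[b].items(), key=lambda kv: kv[1], reverse=True))
        for b in brands
    ]

result = drill_down(CARS)
for brand, colors in result:
    print(brand, colors)
```

Note that 奥迪's 黑色 bucket (2388000) is larger than 大众's 黑色 bucket (1998000), yet 大众 is still listed first: the inner sort never reorders the parent buckets.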

Query

GET /cars/_search
{
  "aggs": {
    "group_by_brand": {
      "terms": {
        "field": "brand"
      },
      "aggs": {
        "group_by_color": {
          "terms": {
            "field": "color",
            "order": {
              "sum_of_price": "desc"
            }
          },
          "aggs": {
            "sum_of_price": {
              "sum": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

Result

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "VIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 1899000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A 8",
          "sold_date" : "2022-02-12",
          "remark" : "很贵的大A6。。。"
        }
      }
    ]
  },
  "aggregations" : {
    "group_by_brand" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "大众",
          "doc_count" : 3,
          "group_by_color" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "黑色",
                "doc_count" : 1,
                "sum_of_price" : {
                  "value" : 1998000.0
                }
              },
              {
                "key" : "金色",
                "doc_count" : 2,
                "sum_of_price" : {
                  "value" : 381000.0
                }
              }
            ]
          }
        },
        {
          "key" : "奥迪",
          "doc_count" : 3,
          "group_by_color" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "黑色",
                "doc_count" : 2,
                "sum_of_price" : {
                  "value" : 2388000.0
                }
              },
              {
                "key" : "红色",
                "doc_count" : 1,
                "sum_of_price" : {
                  "value" : 218000.0
                }
              }
            ]
          }
        },
        {
          "key" : "标志",
          "doc_count" : 2,
          "group_by_color" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "白色",
                "doc_count" : 2,
                "sum_of_price" : {
                  "value" : 388600.0
                }
              }
            ]
          }
        }
      ]
    }
  }
}

2.5 search + aggs (query + aggregation)

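Aggregations play the role of the GROUP BY clause in SQL, and search plays the role of the WHERE clause. Elasticsearch lets you combine a search with aggregations in a single request, which makes more complex search statistics possible.
Example:
For one brand, compute the number of cars sold and the sales total in each quarter.
Query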

GET /cars/_search
{
  "query": {
    "match": {
      "brand": "大众"
    }
  },
  "aggs": {
    "histogram_by_date": {
      "date_histogram": {
        "field": "sold_date",
        "calendar_interval": "quarter",
        "min_doc_count": 1
      },
      "aggs": {
        "sum_by_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.9444616,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 0.9444616,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 0.9444616,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 0.9444616,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      }
    ]
  },
  "aggregations" : {
    "histogram_by_date" : {
      "buckets" : [
        {
          "key_as_string" : "2021-07-01T00:00:00.000Z",
          "key" : 1625097600000,
          "doc_count" : 1,
          "sum_by_price" : {
            "value" : 1998000.0
          }
        },
        {
          "key_as_string" : "2021-10-01T00:00:00.000Z",
          "key" : 1633046400000,
          "doc_count" : 2,
          "sum_by_price" : {
            "value" : 381000.0
          }
        }
      ]
    }
  }
}
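
As a cross-check of the two buckets above, here is a minimal Python sketch that groups the three matched documents by calendar quarter and sums their prices. It is only a local illustration of what `date_histogram` with `calendar_interval: quarter` plus a `sum` sub-aggregation computes, not how Elasticsearch implements it; the helper names `quarter_start` and `quarterly_sales` are made up for this example.

```python
from collections import defaultdict
from datetime import date

# The three documents matched by the query above: (sold_date, price).
matched = [
    (date(2021, 10, 28), 258000),
    (date(2021, 11, 5), 123000),
    (date(2021, 8, 19), 1998000),
]

def quarter_start(d: date) -> date:
    """First day of the calendar quarter that contains d."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)

def quarterly_sales(docs):
    """Return {quarter_start: (doc_count, sum_of_price)}.

    Quarters without documents never get a bucket, which matches
    the effect of "min_doc_count": 1 in the request.
    """
    buckets = defaultdict(lambda: [0, 0])
    for sold_date, price in docs:
        bucket = buckets[quarter_start(sold_date)]
        bucket[0] += 1
        bucket[1] += price
    return {start: tuple(values) for start, values in sorted(buckets.items())}

for start, (count, total) in quarterly_sales(matched).items():
    print(start.isoformat(), count, total)
# 2021-07-01 1 1998000
# 2021-10-01 2 381000
```

The bucket keys correspond to the `key_as_string` values in the response (the first day of Q3 and Q4 2021).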

2.6 filter + aggs (filter + aggregation)

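A filter can also be combined with aggs to aggregate over a filtered set of documents.
Example:
Compute the average price of cars priced between 100,000 and 500,000 (both bounds inclusive).
Query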

GET /cars/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 100000,
            "lte": 500000
          }
        }
      }
    }
  },
  "aggs": {
    "avg_by_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

Result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "T4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 239800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志508",
          "sold_date" : "2021-05-18",
          "remark" : "标志品牌全球上市车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UIR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 148800,
          "color" : "白色",
          "brand" : "标志",
          "model" : "标志408",
          "sold_date" : "2021-07-02",
          "remark" : "比较大的紧凑型车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UoR_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 218000,
          "color" : "红色",
          "brand" : "奥迪",
          "model" : "奥迪A4",
          "sold_date" : "2021-11-05",
          "remark" : "小资车型"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "U4R_-4cBUF6rBrkiDpRJ",
        "_score" : 1.0,
        "_source" : {
          "price" : 489000,
          "color" : "黑色",
          "brand" : "奥迪",
          "model" : "奥迪A6",
          "sold_date" : "2022-01-01",
          "remark" : "政府专用?"
        }
      }
    ]
  },
  "aggregations" : {
    "avg_by_price" : {
      "value" : 246100.0
    }
  }
}
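
The average above can be checked by hand. The sketch below assumes the index holds only the eight documents whose prices appear in the responses in this section; `avg_in_range` is a hypothetical helper that mirrors the inclusive `gte`/`lte` range filter followed by `avg`.

```python
# Prices of the eight sample documents (assumed to be the whole index).
prices = [258000, 123000, 1998000, 239800, 148800, 218000, 489000, 1899000]

def avg_in_range(values, gte, lte):
    """Mean of the values with gte <= v <= lte, or None if nothing matches."""
    selected = [v for v in values if gte <= v <= lte]
    if not selected:
        return None
    return sum(selected) / len(selected)

print(avg_in_range(prices, 100000, 500000))  # 246100.0
```

Six of the eight prices fall inside the range, which agrees with the hit total of 6 in the response.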

2.7 Using filter inside an aggregation

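A filter can also be used inside the aggs clause; where the filter is placed determines what it applies to.
Example: compute one brand's total sales over the last year. A filter placed inside aggs applies only to that aggregation: it narrows the documents already matched by the query, and the returned hits are not affected. A filter placed outside aggs, in the query, restricts the hits and every aggregation alike.

Date math units used in the range filter:
① M means month, so now-12M is 12 months ago.
② y means year, so now-1y is 1 year ago.
③ d means day.
A trailing /M, /y or /d rounds the date down to the start of that month, year or day.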

Query

GET /cars/_search
{
  "query": {
    "match": {
      "brand": "大众"
    }
  },
  "aggs": {
    "count_last_year": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-12M"
          }
        }
      },
      "aggs": {
        "sum_of_price_last_year": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

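Result (the filter bucket is empty: doc_count is 0 because all three matched documents were sold in 2021, more than 12 months before this query was run)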

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.9444616,
    "hits" : [
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "TYR_-4cBUF6rBrkiDpRJ",
        "_score" : 0.9444616,
        "_source" : {
          "price" : 258000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众迈腾",
          "sold_date" : "2021-10-28",
          "remark" : "大众中档车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "ToR_-4cBUF6rBrkiDpRJ",
        "_score" : 0.9444616,
        "_source" : {
          "price" : 123000,
          "color" : "金色",
          "brand" : "大众",
          "model" : "大众速腾",
          "sold_date" : "2021-11-05",
          "remark" : "大众神车"
        }
      },
      {
        "_index" : "cars",
        "_type" : "_doc",
        "_id" : "UYR_-4cBUF6rBrkiDpRJ",
        "_score" : 0.9444616,
        "_source" : {
          "price" : 1998000,
          "color" : "黑色",
          "brand" : "大众",
          "model" : "大众辉腾",
          "sold_date" : "2021-08-19",
          "remark" : "大众最让人肝疼的车"
        }
      }
    ]
  },
  "aggregations" : {
    "count_last_year" : {
      "meta" : { },
      "doc_count" : 0,
      "sum_of_price_last_year" : {
        "value" : 0.0
      }
    }
  }
}
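
To see why the bucket is empty while the hits are not, here is a rough Python sketch of the same logic. It approximates `now-12M` with plain calendar arithmetic and takes "today" as a parameter; the two dates passed in below are assumptions chosen for illustration, since the article does not state when the query was run. `minus_months` and `last_year_bucket` are hypothetical helper names.

```python
import calendar
from datetime import date

# The three documents matched by the query: (sold_date, price).
matched = [
    (date(2021, 10, 28), 258000),
    (date(2021, 11, 5), 123000),
    (date(2021, 8, 19), 1998000),
]

def minus_months(d, months):
    """Subtract calendar months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def last_year_bucket(docs, today):
    """Mimic the filter aggregation: (doc_count, sum of price)
    over documents with sold_date >= today minus 12 months."""
    cutoff = minus_months(today, 12)
    kept = [price for sold_date, price in docs if sold_date >= cutoff]
    return len(kept), sum(kept)

# The aggregation-level filter never removes hits: all three stay.
print(len(matched))                                  # 3
# Assumed run date in mid-2022: every sale is within 12 months.
print(last_year_bucket(matched, date(2022, 6, 1)))   # (3, 2379000)
# Assumed run date in mid-2023: no sale is within 12 months.
print(last_year_bucket(matched, date(2023, 6, 1)))   # (0, 0)
```

Moving the same range condition into the query would instead drop those documents from the hits as well.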
