ElasticSearch 零基础入门（3）: 查询语法和聚合语法

说明

ES 的查询语句相对复杂，要从搜索引擎的立场理解 ES 查询语句设计思路。

准备数据

参照 Quick Start（这个文档里有些查询语法过时了）中的例子准备查询数据。

提交数据，自动创建 index 和 mapping：

POST demo/_doc
{
  "@timestamp": "2011-05-06T16:21:15.000Z",
  "author": {
    "name": "朱自清",
    "gender": "男"
  },
  "title": "春天来了",
  "content": "盼望着，盼望着，东风来了，春天的脚步近了。一切都像刚睡醒的样子，欣欣然张开了眼。山朗润起来了，水涨起来了，太阳的脸红起来了。",
  "click_count":100,
  "workplace": ["中国","北京","浙江","上海"]
}

查看 mapping，es 会自动推断 field 类型：

GET demo/_mapping
{
  "demo" : {
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "author" : {
          "properties" : {
            "gender" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "click_count" : {
          "type" : "long"
        },
...省略...

批量写入数据：

# 注意 ：create 下面的要插入的数据不能分行
PUT demo/_bulk
{"create": { } }
{ "@timestamp": "2011-05-06T16:21:15.000Z", "author": {"name": "朱自清","gender": "男"},"title": "背影","content": "<p>  我与父亲不相见已二年余了，我最不能忘记的是他的背影。</p><p>  那年冬天，祖母死了，父亲的差使也交卸了，正是祸不单行的日子。我从北京到徐州，打算跟着父亲奔丧回家。到徐州见着父亲，看见满院狼藉的东西，又想起祖母，不禁簌簌地流下眼泪。父亲说：“事已如此，不必难过，好在天无绝人之路！”</p>","click_count":200,"workplace": ["中国","北京","浙江","上海"]}
{ "create": { } }
{"@timestamp": "2011-05-06T16:21:15.000Z","author": {"name": "鲁迅","gender": "男"},"title": "从百草园到三味书屋原文","content": "<p>我家的后面有一个很大的园，相传叫作百草园。现在是早已并屋子一起卖给朱文公的子孙了，连那最末次的相见也已经隔了七八年，其中似乎确凿只有一些野草；但那时却是我的乐园。 </p>","click_count":50,"workplace": ["中国","北京","重庆","厦门"]}
{ "create": { } }
{"@timestamp": "2011-05-06T16:21:15.000Z","author": {"name": "舒婷","gender": "女"},"title": "致橡树","content": "我如果爱你—— 绝不像攀援的凌霄花，借你的高枝炫耀自己；我如果爱你——绝不学痴情的鸟儿，为绿荫重复单调的歌曲；也不止像泉源，常年送来清凉的慰藉；也不止像险峰，增加你的高度，衬托你的威仪。","click_count":600, "workplace": ["中国","北京","福建","厦门"]}

查看所有数据：

GET demo/_search
{
  "query": {
    "match_all": {}
  }
}

数据不再使用后，用下面的命令删除：

DELETE demo

Search 接口

Search API 的 uri 是 “/index名称/_search”，方法为 GET，支持的参数见 Search API 。

可以用 kibana 提供的 Dev Tools 通过 es 的 http 接口提交查询语句，ES query 语法见 ES Query DSL。

这里只举一个例子，返回所有文档的 event field 的数值，按照 @timestamp 倒排序：

GET demo/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["title"],   <-- 指定返回的 field
  "sort": [
    {
      "@timestamp": {     <-- 按 timestamp 排序
        "order": "desc"
      }
    }
  ]
}

ES 根据查询语句对所有 Document 进行相关性评分，查询结果按照评分从高到低排列。_score 是每个文档的得分，最高分是 1.0。

这里只是示例，和前面的查询语句不对应
{
  "took" : 254,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,    <-- 结果中的最高分
    "hits" : [
      {
        "_index" : "demo",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,   <-- 当前文档的相关性评分 
        "_source" : {
          "age" : 11,
          "email" : "[email protected]",
          "name" : "小王"
        }
      },
      {
        "_index" : "demo",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,   <-- 当前文档的相关性评分 
        "_source" : {
          "age" : 11,
          "email" : "[email protected]",
          "name" : "小明"
        }
      }
    ]
  }
}

Query 语句结构

查询语句分为叶子语句（Leaf query clauses）和组合语句（Compound query clauses）。

叶子语句：针对单个 field 的查询，样式为某个 field 的数值满足 xx 条件，譬如 match 语句。

GET /banke/_search
{
  "query": { "match": { "address": "mill lane" } }
}

组合语句：是叶子语句或组合语句的组合，一次请求中给出了多个数值要求，譬如 bool 语句。

POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user.id" : "kimchy" }
      },
      "filter": {
        "term" : { "tags" : "production" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tags" : "env1" } },
        { "term" : { "tags" : "deployed" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}

Query 执行上下文

ES 查询语句有 query context 和 filter context 两个执行上下文，所在上下文不同，语句用途不同。

query 中的查询语句处于 query context：

GET /banke/_search
{
  "query": { "match": { "address": "mill lane" } }    <-- 位于 query context
}

query 的 filter 中的查询语句处于 filter context，例如 bool 语句中的 filter ：

GET /_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "title":   "Search"        }},    <--  位于 query context
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [   
        { "term":  { "status": "published" }},         <--  位于 filter context
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}

query context：按照语句要求对 es 中的文档进行相关性评分，选出相关性高的文档。

filter context：按照语句要求对查询结果进行过滤。

即 es 的查询操作和过滤操作，使用了同一套语法。

Query 查询功能

全部查询：match_all / match_none

match_all 最简单的查询语句，没有查询条件，把 es 中所有文档都查出来：

GET /_search
{
    "query": {
        "match_all": {}
    }
}

match_all 默认所有文档的相关性得分都是默认值 1.0，如果要修改默认值用 boost 设置：

GET /_search
{
  "query": {
    "match_all": { "boost" : 1.2 }  <--  所有文档的相关性得分都设置为 1.2
  }
}

match_none 是 match_all 的取反，返回空结果：

GET /_search
{
  "query": {
    "match_none": {}
  }
}

精确查询：Term-level Queries

term-level-queries 用于对结构化数据进行精确查询，主要有以下语句：

exists
ids
prefix
range
regexp
wildcard
term
terms
terms_set
fuzzy
type

语句：exists

exists 查找包含指定列的文档:

## 是否存在
GET demo/_search
{
  "query": {
    "exists": {
      "field": "content"
    }
  }
}

语句：ids

ids 指定目标文档的ID：

## 按文档 id 查询
GET demo/_search
{
  "query": {
    "ids": {
      "values": ["OqdvxnwB7Uq2vygbUxnh","PadbxnwB7Uq2vygbxRcp"]
    }
  }
}

语句：prefix

prefix 前缀匹配：

## 前缀匹配
GET demo/_search
{
  "query":{
    "prefix": {
      "title": {
        "value": "背"
      }
    }
  }
}

GET demo/_search
{
  "query":{
    "prefix": {
      "title.keyword": {
        "value": "背"
      }
    }
  }
}

语句：range

范围内使用 range，譬如查找指定时间段的数据：

## 范围查找
GET demo/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2011-01-01",
        "lte": "2013-01-01"
      }
    }
  }
}

语句：regexp

regexp 正则匹配：

## 正则匹配
GET demo/_search
{
  "query":{
    "regexp": {
      "author.name": "朱.*"
    }
  }
}

语句：wildcard

wildcard 支持通配符，? 匹配一个字符，* 匹配多个字符：

## 通配符
GET demo/_search
{
  "query": {
    "wildcard": {
      "author.name": {
        "value": "朱*"
      }
    }
  }
}

语句：term

term 用于 field 字段数值的精确匹配：

## field 值精确匹配
GET demo/_search
{
  "query":{
    "term":{
      "author.name.keyword": {
        "value": "朱自清"
      }
    }
  }
}

GET demo/_search
{
  "query":{
    "term":{
      "click_count": {
        "value":100
      }
    }
  }
}

语句：terms

terms 可指定多个匹配数值：

## field 值匹配
GET demo/_search
{
  "query":{
    "terms":{
      "author.name.keyword": ["鲁迅","舒婷"]
    }
  }
}

语句：terms_set

terms-set，查找和目标 terms 有 N 个匹配的文档：

GET demo/_search
{
  "query":{
    "terms_set":{
      "workplace.keyword":{
        "terms":["北京","福建","厦门"],
        "minimum_should_match_script":{
          "source":"params['atLeastNum']",
          "params":{
            "atLeastNum": 2
          }
        }
      }
    }
  }
}

语句：type

从 7.0 开始 Index 不再支持多 type，type 全部为为 _doc，type 没有使用意义了。

## 类型查询
GET demo/_search
{
  "query": {
    "type":{
      "value": "_doc"
    }
  }
}

语句：fuzzy

fuzzy 和输入字符串位于特定编辑距离内：

## 近似值
GET demo/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "影背",
        "fuzziness": 1
      }
    }
  }
}

全文检索：full text queries

针对 text 类型的 field，使用 full text queries ，支持以下语句：

match
match_bool_prefix
match_phrase
match_phrase_prefix
multi_match
intervals
combined_fields
query_string
simple_query_string

语句：match

match 可以用于多种类型，text/numbers/date/boolean。

GET demo/_search
{
  "query":{
    "match": {
      "content": "我与父亲"
    }
  }
}

组合查询：Compound queries

查询条件不止一个时，用 Compound queries 进行组合，支持以下语句：

bool
boosting
constant_score
dis_max
function_score

语句：bool

组合多个子查询语句，子查询语句的组合条件可以是 must、filter、should、must_not。

ES Boolean query：

GET demo/_search
{
  "query":{
    "bool":{
      "must": [
        {"term":{"author.name.keyword":"朱自清"}},
        {"match":{"content":"春天"}}
      ],
      "filter": [
        {"term":{"click_count": 200}}
      ]
    }
  }
}

语句：boosting

降低命中特定条件的文档的评分，ES boosting query：

GET /_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "text": "apple"
        }
      },
      "negative": {
        "term": {
          "text": "pie tart fruit crumble tree"
        }
      },
      "negative_boost": 0.5
    }
  }
}

语句：constant_score

为匹配的文档设置固定的评分，ES constant score：

GET /_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "user.id": "kimchy" }
      },
      "boost": 1.2
    }
  }
}

语句：dis_max

ES Disjunction max query

语句：function_score

ES Function score query

连接查询：Join

es 的 Joining queries 语句提供了一定的连接查询能力：

nested
has child
has parent
parent id

地理查询：Geo

es 支持地理数据查询，Geo queries：

geo-bounding box
geo-distance
geo-polygon
geoshape

几何查询：Shape

es 支持存放二维的几何图形，Shape queries：

shape

无法归类的特殊查询

还有一些特殊查询无法归类， Specialized queries：

distance feature
more like this
percolate
rank feature
script
script score
wrapper
pinned query

Aggregation 聚合语法

aggs 语句用于将结果按照指定 field 的数值进行分组/聚合/数值运算。

GET demo/_search
{
  "size":0,     <-- 只返回聚合数据，不返回命中的文档（hits） 
  "aggs": {
    "avg_click_count": {
      "avg": {
        "field": "click_count"
      }
    }
  }
}

GET demo/_search
{
  "size":0,
  "aggs": {
    "avg_click_count": {
      "avg": {
        "field": "click_count"
      }
    },
    "author_doc_count":{
      "terms": {
        "field": "author.name.keyword",
        "size": 10
      }
    }
  }
}

GET demo/_search
{
  "size":0,
  "aggs": {
    "author_doc_count":{
      "terms": {
        "field": "author.name.keyword",
        "size": 10,
        "order":{
          "avg_click_count":"desc"
        }
      },
      "aggs":{
        "avg_click_count":{
          "avg":{
            "field": "click_count"
          }
        }
      }
    }
  }
}

SQL 查询

SQL查询

参考

编程基础系统基础关注跟踪

Kubernetes Prometheus 超级账本

ElasticSearch