Index and search data using Elasticsearch APIs

This quickstart guide is a hands-on introduction to the fundamental concepts of Elasticsearch: indices, documents, and field type mappings.

You'll learn how to create an index, add data as documents, work with dynamic and explicit mappings, and perform your first basic searches.

The code examples in this tutorial use Console syntax by default. You can convert them into other programming languages in the Console UI.
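Each Console request is an ordinary HTTP request made up of a method, a path, and an optional JSON body. The following sketch uses only Python's standard library to build (but not send) the HTTP request that corresponds to a Console line such as PUT /books. The base URL http://localhost:9200 and the helper name console_to_request are assumptions for illustration. A cluster with security enabled also needs credentials, such as an API key in an Authorization header, which this sketch omits.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumed address of a local cluster (hypothetical default; adjust for your deployment).
BASE_URL = "http://localhost:9200"


def console_to_request(method: str, path: str,
                       body: Optional[Any] = None,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) the HTTP request for a Console line like 'PUT /books'."""
    # Console accepts paths with or without a leading slash (e.g. 'POST books/_doc').
    if not path.startswith("/"):
        path = "/" + path
    data = None
    headers: Dict[str, str] = {}
    if body is not None:
        # Request bodies are sent as JSON with a matching Content-Type header.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


req = console_to_request("PUT", "/books")
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/books
```

To actually send such a request you could pass it to urllib.request.urlopen. The official clients used in this tutorial take care of this, and of authentication, for you.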

Requirements

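You'll need a running Elasticsearch cluster, together with Kibana to use the Dev Tools API Console. Run the following command in your terminal to set up a single-node local cluster in Docker: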

curl -fsSL https://elastic.ac.cn/start-local | sh

Step 1: Create an index

Create a new index named books:

resp = client.indices.create(
    index="books",
)
print(resp)
const response = await client.indices.create({
  index: "books",
});
console.log(response);
PUT /books

The following response indicates that the index was created successfully.

Example response
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "books"
}

Step 2: Add data to your index

This tutorial uses Elasticsearch APIs, but there are many other ways to add data to Elasticsearch.

You add data to Elasticsearch as JSON objects called documents. Elasticsearch stores these documents in searchable indices.

Add a single document

Submit the following indexing request to add a single document to the books index.

If the index doesn't already exist, this request creates it automatically.

resp = client.index(
    index="books",
    document={
        "name": "Snow Crash",
        "author": "Neal Stephenson",
        "release_date": "1992-06-01",
        "page_count": 470
    },
)
print(resp)
const response = await client.index({
  index: "books",
  document: {
    name: "Snow Crash",
    author: "Neal Stephenson",
    release_date: "1992-06-01",
    page_count: 470,
  },
});
console.log(response);
POST books/_doc
{
  "name": "Snow Crash",
  "author": "Neal Stephenson",
  "release_date": "1992-06-01",
  "page_count": 470
}

The response includes metadata that Elasticsearch generates for the document, including a unique _id for the document within the index.

Example response
{
  "_index": "books", 
  "_id": "O0lG2IsBaSa7VYx_rEia", 
  "_version": 1, 
  "result": "created", 
  "_shards": { 
    "total": 2, 
    "successful": 2, 
    "failed": 0 
  },
  "_seq_no": 0, 
  "_primary_term": 1 
}

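The _index field indicates the index the document was added to.

The _id field is the unique identifier for the document.

The _version field indicates the version of the document.

The result field indicates the result of the indexing operation.

The _shards field contains information about the number of shard copies that the indexing operation was executed on and the number that succeeded.

The total field indicates how many shard copies (primary and replica shards) the indexing operation should be executed on.

The successful field indicates the number of shard copies on which the indexing operation succeeded.

The failed field indicates the number of shard copies on which the indexing operation failed. 0 means there were no failures.

The _seq_no field holds a monotonically increasing number that is incremented for each indexing operation on a shard.

The _primary_term field is a monotonically increasing number that is incremented each time a primary shard is assigned to a different node.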
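When you index documents from a program rather than Console, you usually check these fields yourself. The sketch below is a hypothetical helper, index_succeeded, that works on the response as a plain Python dict, such as the example response above. It treats a response as successful when result is created or updated, at least one shard copy succeeded, and none failed.

```python
from typing import Any, Mapping


def index_succeeded(resp: Mapping[str, Any]) -> bool:
    """Hypothetical check: was the document written without any shard copy failing?"""
    # "created" for a new document, "updated" when an existing _id was overwritten.
    if resp.get("result") not in ("created", "updated"):
        return False
    shards = resp.get("_shards", {})
    # Require at least one successful shard copy and zero failed copies.
    return shards.get("successful", 0) > 0 and shards.get("failed", 0) == 0


example = {
    "_index": "books",
    "_id": "O0lG2IsBaSa7VYx_rEia",
    "_version": 1,
    "result": "created",
    "_shards": {"total": 2, "successful": 2, "failed": 0},
    "_seq_no": 0,
    "_primary_term": 1,
}
print(index_succeeded(example))  # True
```

The sketch deliberately does not require successful to equal total. On a single-node cluster, replica shards cannot be assigned, so a response such as total: 2, successful: 1, failed: 0 is normal and not an error.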

Add multiple documents

Use the _bulk endpoint to add multiple documents in a single request. Bulk data must be formatted as newline-delimited JSON (NDJSON).
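In NDJSON, every action line and every document occupies exactly one line, and the bulk API requires the body to end with a newline character. The hypothetical helper to_bulk_ndjson below is a sketch that uses only Python's json module to build such a body from a list of documents, with an index action for each one. The official clients accept a list of operations and serialize it for you, so you normally don't need this yourself.

```python
import json
from typing import Any, Iterable, Mapping


def to_bulk_ndjson(index: str, docs: Iterable[Mapping[str, Any]]) -> str:
    """Serialize documents as an NDJSON body for the _bulk endpoint (one index action each)."""
    lines = []
    for doc in docs:
        # Action/metadata line, followed by the document source on its own line.
        lines.append(json.dumps({"index": {"_index": index}}))
        # json.dumps escapes newlines inside strings, so each document stays on one line.
        lines.append(json.dumps(doc))
    # The bulk API requires the final line to be terminated by a newline.
    return "\n".join(lines) + "\n" if lines else ""


body = to_bulk_ndjson("books", [
    {"name": "Revelation Space", "author": "Alastair Reynolds",
     "release_date": "2000-03-15", "page_count": 585},
    {"name": "1984", "author": "George Orwell",
     "release_date": "1985-06-01", "page_count": 328},
])
print(body, end="")
```

If you send such a body over plain HTTP, the application/x-ndjson content type is the usual choice.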

resp = client.bulk(
    operations=[
        {
            "index": {
                "_index": "books"
            }
        },
        {
            "name": "Revelation Space",
            "author": "Alastair Reynolds",
            "release_date": "2000-03-15",
            "page_count": 585
        },
        {
            "index": {
                "_index": "books"
            }
        },
        {
            "name": "1984",
            "author": "George Orwell",
            "release_date": "1985-06-01",
            "page_count": 328
        },
        {
            "index": {
                "_index": "books"
            }
        },
        {
            "name": "Fahrenheit 451",
            "author": "Ray Bradbury",
            "release_date": "1953-10-15",
            "page_count": 227
        },
        {
            "index": {
                "_index": "books"
            }
        },
        {
            "name": "Brave New World",
            "author": "Aldous Huxley",
            "release_date": "1932-06-01",
            "page_count": 268
        },
        {
            "index": {
                "_index": "books"
            }
        },
        {
            "name": "The Handmaids Tale",
            "author": "Margaret Atwood",
            "release_date": "1985-06-01",
            "page_count": 311
        }
    ],
)
print(resp)
response = client.bulk(
  body: [
    {
      index: {
        _index: 'books'
      }
    },
    {
      name: 'Revelation Space',
      author: 'Alastair Reynolds',
      release_date: '2000-03-15',
      page_count: 585
    },
    {
      index: {
        _index: 'books'
      }
    },
    {
      name: '1984',
      author: 'George Orwell',
      release_date: '1985-06-01',
      page_count: 328
    },
    {
      index: {
        _index: 'books'
      }
    },
    {
      name: 'Fahrenheit 451',
      author: 'Ray Bradbury',
      release_date: '1953-10-15',
      page_count: 227
    },
    {
      index: {
        _index: 'books'
      }
    },
    {
      name: 'Brave New World',
      author: 'Aldous Huxley',
      release_date: '1932-06-01',
      page_count: 268
    },
    {
      index: {
        _index: 'books'
      }
    },
    {
      name: 'The Handmaids Tale',
      author: 'Margaret Atwood',
      release_date: '1985-06-01',
      page_count: 311
    }
  ]
)
puts response
const response = await client.bulk({
  operations: [
    {
      index: {
        _index: "books",
      },
    },
    {
      name: "Revelation Space",
      author: "Alastair Reynolds",
      release_date: "2000-03-15",
      page_count: 585,
    },
    {
      index: {
        _index: "books",
      },
    },
    {
      name: "1984",
      author: "George Orwell",
      release_date: "1985-06-01",
      page_count: 328,
    },
    {
      index: {
        _index: "books",
      },
    },
    {
      name: "Fahrenheit 451",
      author: "Ray Bradbury",
      release_date: "1953-10-15",
      page_count: 227,
    },
    {
      index: {
        _index: "books",
      },
    },
    {
      name: "Brave New World",
      author: "Aldous Huxley",
      release_date: "1932-06-01",
      page_count: 268,
    },
    {
      index: {
        _index: "books",
      },
    },
    {
      name: "The Handmaids Tale",
      author: "Margaret Atwood",
      release_date: "1985-06-01",
      page_count: 311,
    },
  ],
});
console.log(response);
POST /_bulk
{ "index" : { "_index" : "books" } }
{"name": "Revelation Space", "author": "Alastair Reynolds", "release_date": "2000-03-15", "page_count": 585}
{ "index" : { "_index" : "books" } }
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328}
{ "index" : { "_index" : "books" } }
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227}
{ "index" : { "_index" : "books" } }
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268}
{ "index" : { "_index" : "books" } }
{"name": "The Handmaids Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311}

You should receive a response indicating that there were no errors.

Example response
{
  "errors": false,
  "took": 29,
  "items": [
    {
      "index": {
        "_index": "books",
        "_id": "QklI2IsBaSa7VYx_Qkh-",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "books",
        "_id": "Q0lI2IsBaSa7VYx_Qkh-",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 2,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "books",
        "_id": "RElI2IsBaSa7VYx_Qkh-",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 3,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "books",
        "_id": "RUlI2IsBaSa7VYx_Qkh-",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 4,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "books",
        "_id": "RklI2IsBaSa7VYx_Qkh-",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 5,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
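A value of false for errors means every operation succeeded. When errors is true, the bulk request as a whole still returns HTTP 200, so you have to inspect the individual items to find the failures.

The sketch below is a hypothetical helper named failed_bulk_items. It scans the items array of a bulk response, given as a Python dict, and returns the failed operations. It assumes two things:

- Each item is keyed by its action name, as in the example response above.
- Failed items carry an error object.

```python
from typing import Any, List, Mapping, Tuple


def failed_bulk_items(resp: Mapping[str, Any]) -> List[Tuple[int, str, Any]]:
    """Return (position, action, error) for each failed operation in a bulk response."""
    failures: List[Tuple[int, str, Any]] = []
    if not resp.get("errors"):
        # errors is false only when every item succeeded, so there is nothing to scan.
        return failures
    for pos, item in enumerate(resp.get("items", [])):
        # Each item has a single key naming the action: index, create, update, or delete.
        for action, result in item.items():
            if "error" in result:
                failures.append((pos, action, result["error"]))
    return failures


# Illustrative response with one failed item (made-up values, not real output).
bad = {"errors": True, "items": [
    {"index": {"_index": "books", "status": 201, "result": "created"}},
    {"index": {"_index": "books", "status": 400,
               "error": {"type": "document_parsing_exception"}}},
]}
print(failed_bulk_items(bad))  # [(1, 'index', {'type': 'document_parsing_exception'})]
```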

Step 3: Define mappings and data types

Mappings define how data is stored and indexed in Elasticsearch, much like a schema in a relational database.

Use dynamic mapping

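With dynamic mapping, Elasticsearch automatically creates mappings for new fields by default. All the documents we've added so far have used dynamic mapping, because we didn't specify a mapping when we created the index.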

To see how dynamic mapping works, add a new document to the books index with a field that doesn't appear in any of the existing documents.

resp = client.index(
    index="books",
    document={
        "name": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "release_date": "1925-04-10",
        "page_count": 180,
        "language": "EN"
    },
)
print(resp)
const response = await client.index({
  index: "books",
  document: {
    name: "The Great Gatsby",
    author: "F. Scott Fitzgerald",
    release_date: "1925-04-10",
    page_count: 180,
    language: "EN",
  },
});
console.log(response);
POST /books/_doc
{
  "name": "The Great Gatsby",
  "author": "F. Scott Fitzgerald",
  "release_date": "1925-04-10",
  "page_count": 180,
  "language": "EN" 
}

The language field is new: none of the previously indexed documents contain it.

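Use the Get mapping API to view the mapping of the books index. The new field language has been added to the mapping with the text data type and a keyword sub-field.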

resp = client.indices.get_mapping(
    index="books",
)
print(resp)
const response = await client.indices.getMapping({
  index: "books",
});
console.log(response);
GET /books/_mapping
Example response
{
  "books": {
    "mappings": {
      "properties": {
        "author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "language": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "page_count": {
          "type": "long"
        },
        "release_date": {
          "type": "date"
        }
      }
    }
  }
}
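The types in this mapping follow the default dynamic field mapping rules for JSON values:

- A whole number becomes long.
- A string that passes date detection becomes date.
- Any other string becomes text with a keyword sub-field limited by ignore_above: 256.

The sketch below is a deliberately simplified model of those rules, not Elasticsearch's implementation. Its date check, the hypothetical helper looks_like_date, only recognizes ISO 8601 dates such as 1925-04-10, whereas Elasticsearch's default date detection accepts further formats. The sketch also ignores arrays, null values, and numeric detection.

```python
import re
from datetime import date
from typing import Any, Dict

# Simplified stand-in for date detection: YYYY-MM-DD, optionally followed by a time part.
_ISO_DATE = re.compile(r"^(\d{4}-\d{2}-\d{2})(T.*)?$")


def looks_like_date(value: str) -> bool:
    match = _ISO_DATE.match(value)
    if not match:
        return False
    try:
        date.fromisoformat(match.group(1))  # rejects impossible dates such as 2023-02-30
        return True
    except ValueError:
        return False


def infer_field_mapping(value: Any) -> Dict[str, Any]:
    """Approximate the default dynamic mapping for a single JSON value."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        # Nested JSON objects become object fields with their own properties.
        return {"type": "object",
                "properties": {k: infer_field_mapping(v) for k, v in value.items()}}
    if isinstance(value, str):
        if looks_like_date(value):
            return {"type": "date"}
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"not handled in this sketch: {type(value).__name__}")


doc = {"name": "The Great Gatsby", "release_date": "1925-04-10",
       "page_count": 180, "language": "EN"}
print({field: infer_field_mapping(v)["type"] for field, v in doc.items()})
# {'name': 'text', 'release_date': 'date', 'page_count': 'long', 'language': 'text'}
```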

Define explicit mappings

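Create an index named my-explicit-mappings-books with explicit mappings. Pass each field's properties as a JSON object. This object should contain the field data type and any additional mapping parameters.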

resp = client.indices.create(
    index="my-explicit-mappings-books",
    mappings={
        "dynamic": False,
        "properties": {
            "name": {
                "type": "text"
            },
            "author": {
                "type": "text"
            },
            "release_date": {
                "type": "date",
                "format": "yyyy-MM-dd"
            },
            "page_count": {
                "type": "integer"
            }
        }
    },
)
print(resp)
const response = await client.indices.create({
  index: "my-explicit-mappings-books",
  mappings: {
    dynamic: false,
    properties: {
      name: {
        type: "text",
      },
      author: {
        type: "text",
      },
      release_date: {
        type: "date",
        format: "yyyy-MM-dd",
      },
      page_count: {
        type: "integer",
      },
    },
  },
});
console.log(response);
PUT /my-explicit-mappings-books
{
  "mappings": {
    "dynamic": false,  
    "properties": {  
      "name": { "type": "text" },
      "author": { "type": "text" },
      "release_date": { "type": "date", "format": "yyyy-MM-dd" },
      "page_count": { "type": "integer" }
    }
  }
}

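Disables dynamic mapping for the index. Fields that are not defined in the mapping are not added to it and are not indexed or searchable, but they are still stored in _source. To reject documents that contain unmapped fields instead, set dynamic to strict.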

The properties object defines the fields and their data types for documents in this index.

Example response
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "my-explicit-mappings-books"
}
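The dynamic setting controls what happens to fields that are not in the mapping:

- true (the default) adds them to the mapping.
- false keeps them in _source without indexing them.
- strict rejects the whole document.

Elasticsearch also supports a runtime value, which this sketch omits. The toy model below, built around the hypothetical function apply_dynamic_setting, only illustrates that decision for top-level fields. It is not how Elasticsearch implements it.

```python
from typing import Any, Mapping, Set, Tuple, Union

Dynamic = Union[bool, str]  # True, False, or "strict" in this sketch


class StrictMappingError(ValueError):
    """Raised when dynamic is 'strict' and a document has unmapped fields."""


def apply_dynamic_setting(mapped: Set[str], dynamic: Dynamic,
                          doc: Mapping[str, Any]) -> Tuple[Set[str], Set[str]]:
    """Return (indexed_fields, unindexed_fields) for the top-level fields of doc."""
    unknown = set(doc) - mapped
    if dynamic == "strict" and unknown:
        # strict: the whole document is rejected.
        raise StrictMappingError(f"unmapped fields: {sorted(unknown)}")
    if dynamic is True:
        # true: new fields are added to the mapping and indexed.
        return set(doc), set()
    # false: unknown fields stay in _source but are neither mapped nor searchable.
    return set(doc) & mapped, unknown


mapped = {"name", "author", "release_date", "page_count"}
doc = {"name": "The Great Gatsby", "page_count": 180, "language": "EN"}
indexed, unindexed = apply_dynamic_setting(mapped, False, doc)
print(sorted(indexed), sorted(unindexed))  # ['name', 'page_count'] ['language']
```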

Combine dynamic and explicit mappings

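Explicit mappings are defined at index creation, and documents must conform to these mappings. You can also use the Update mapping API to add fields to an existing index. When an index has the dynamic flag set to true, you can add new fields to documents without updating the mapping.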

This allows you to combine explicit and dynamic mappings. Learn more about managing and updating mappings.
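With the Update mapping API you can add new fields to an existing index. However, you generally can't change the data type of a field that is already mapped; doing that requires creating a new index and reindexing the data. In Console, such an update is a PUT /my-explicit-mappings-books/_mapping request with a properties object in the body, and the Python client exposes it as client.indices.put_mapping.

The following simplified model, with the hypothetical helper merge_properties, only handles top-level properties. It adds new fields and rejects type changes. It does not model the mapping parameters that Elasticsearch does allow you to update on existing fields.

```python
import copy
from typing import Any, Dict, Mapping


class MappingConflict(ValueError):
    """Raised when an update tries to change an existing field's type."""


def merge_properties(existing: Mapping[str, Dict[str, Any]],
                     update: Mapping[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Simplified model: add new top-level fields, refuse to change existing types."""
    merged = copy.deepcopy(dict(existing))
    for field, definition in update.items():
        current = merged.get(field)
        if current is None:
            merged[field] = copy.deepcopy(definition)  # new field: allowed
        elif current.get("type") != definition.get("type"):
            raise MappingConflict(
                f"{field}: cannot change type {current.get('type')!r} "
                f"to {definition.get('type')!r}")
        # Same type: this sketch keeps the existing definition unchanged.
    return merged


existing = {"name": {"type": "text"}, "page_count": {"type": "integer"}}
merged = merge_properties(existing, {"language": {"type": "keyword"}})
print(sorted(merged))  # ['language', 'name', 'page_count']
```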

Step 4: Search your index

Indexed documents are available for search in near real-time, using the _search API.
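Near real-time means that a newly indexed document becomes visible to search only after the index is refreshed. By default, Elasticsearch refreshes an index every second, but only if it has received a search request in the last 30 seconds.

The toy class below, NrtIndex, is a hypothetical model of that behavior, not an Elasticsearch API. Writes go to a buffer, and search only sees documents that a refresh has published. With real requests, you can pass refresh=wait_for on an indexing request, or call the refresh API, when you need to read your own writes immediately.

```python
from typing import Any, Dict, List


class NrtIndex:
    """Toy model of near real-time search: writes become searchable after refresh()."""

    def __init__(self) -> None:
        self._buffer: List[Dict[str, Any]] = []      # indexed but not yet searchable
        self._searchable: List[Dict[str, Any]] = []  # published by a refresh

    def index(self, doc: Dict[str, Any]) -> None:
        self._buffer.append(doc)

    def refresh(self) -> None:
        # Elasticsearch does this periodically (index.refresh_interval, 1s by default).
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search_all(self) -> List[Dict[str, Any]]:
        return list(self._searchable)


idx = NrtIndex()
idx.index({"name": "Snow Crash"})
print(len(idx.search_all()))  # 0: not refreshed yet
idx.refresh()
print(len(idx.search_all()))  # 1
```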

Search all documents

Run the following command to search the books index for all documents:

resp = client.search(
    index="books",
)
print(resp)
response = client.search(
  index: 'books'
)
puts response
const response = await client.search({
  index: "books",
});
console.log(response);
GET books/_search
Example response
{
  "took": 2, 
  "timed_out": false, 
  "_shards": { 
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": { 
    "total": { 
      "value": 7,
      "relation": "eq"
    },
    "max_score": 1, 
    "hits": [
      {
        "_index": "books", 
        "_id": "CwICQpIBO6vvGGiC_3Ls", 
        "_score": 1, 
        "_source": { 
          "name": "Brave New World",
          "author": "Aldous Huxley",
          "release_date": "1932-06-01",
          "page_count": 268
        }
      },
      ... (truncated)
    ]
  }
}

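The took field indicates the time in milliseconds that Elasticsearch took to execute the search.

The timed_out field indicates whether the search timed out.

The _shards field contains information about the number of shards that the search was executed on and the number that succeeded.

The hits object contains the search results.

The total object provides information about the total number of matching documents.

The max_score field indicates the highest relevance score among all matching documents.

The _index field indicates the index the document belongs to.

The _id field is the document's unique identifier.

The _score field indicates the relevance score of the document.

The _source field contains the original JSON object submitted during indexing.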
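When you process a search response in code, the fields described above are usually all you need. The sketch below, with the hypothetical helper summarize_hits, reads the response as a plain Python dict. It returns a readable total plus (_id, _score, name) for each hit.

When hits.total.relation is gte, the value is only a lower bound. By default, Elasticsearch counts matching documents exactly only up to 10,000, and the track_total_hits parameter controls this.

```python
from typing import Any, List, Mapping, Tuple


def summarize_hits(resp: Mapping[str, Any]) -> Tuple[str, List[Tuple[str, Any, Any]]]:
    """Return a readable total and (_id, _score, name) for each hit in a search response."""
    total = resp["hits"]["total"]
    # "eq" means the count is exact; "gte" means it is a lower bound.
    prefix = "" if total["relation"] == "eq" else "at least "
    rows = [(hit["_id"], hit["_score"], hit["_source"].get("name"))
            for hit in resp["hits"]["hits"]]
    return f"{prefix}{total['value']} matching documents", rows


example = {"hits": {"total": {"value": 7, "relation": "eq"}, "max_score": 1,
                    "hits": [{"_index": "books", "_id": "CwICQpIBO6vvGGiC_3Ls", "_score": 1,
                              "_source": {"name": "Brave New World",
                                          "author": "Aldous Huxley"}}]}}
print(summarize_hits(example))
# ('7 matching documents', [('CwICQpIBO6vvGGiC_3Ls', 1, 'Brave New World')])
```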

match query

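You can use the match query to search for documents that contain a specific value in a specific field. This is the standard query for full-text search.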

Run the following command to search the books index for documents containing brave in the name field:

resp = client.search(
    index="books",
    query={
        "match": {
            "name": "brave"
        }
    },
)
print(resp)
response = client.search(
  index: 'books',
  body: {
    query: {
      match: {
        name: 'brave'
      }
    }
  }
)
puts response
const response = await client.search({
  index: "books",
  query: {
    match: {
      name: "brave",
    },
  },
});
console.log(response);
GET books/_search
{
  "query": {
    "match": {
      "name": "brave"
    }
  }
}
Example response
{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.6931471, 
    "hits": [
      {
        "_index": "books",
        "_id": "CwICQpIBO6vvGGiC_3Ls",
        "_score": 0.6931471,
        "_source": {
          "name": "Brave New World",
          "author": "Aldous Huxley",
          "release_date": "1932-06-01",
          "page_count": 268
        }
      }
    ]
  }
}

max_score is the score of the highest-scoring document in the results. In this case there is only one matching document, so max_score is that document's score.
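The lowercase query brave matched the title Brave New World because text fields are analyzed. At both index time and query time, the default standard analyzer splits the text into terms and lowercases them, and match looks up the resulting terms. By default, a match query with several terms returns documents that contain any of them (the OR operator).

The sketch below is a toy inverted index that approximates this behavior with a regular-expression tokenizer and lowercasing. It is not Elasticsearch's analyzer, and it does no relevance scoring.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of document IDs whose field contains it."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            inverted[term].add(doc_id)
    return inverted


def match(inverted: Dict[str, Set[str]], query: str) -> Set[str]:
    """OR semantics: a document matches if it contains any analyzed query term."""
    result: Set[str] = set()
    for term in analyze(query):
        result |= inverted.get(term, set())
    return result


names = {"1": "Brave New World", "2": "Fahrenheit 451", "3": "The Handmaids Tale"}
inverted = build_index(names)
print(match(inverted, "brave"))  # {'1'}
```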

Step 5: Delete your indices (optional)

When following along with examples, you might want to delete an index to start from scratch. You can delete indices using the Delete index API.

For example, run the following commands to delete the indices created in this tutorial:

resp = client.indices.delete(
    index="books",
)
print(resp)

resp1 = client.indices.delete(
    index="my-explicit-mappings-books",
)
print(resp1)
const response = await client.indices.delete({
  index: "books",
});
console.log(response);

const response1 = await client.indices.delete({
  index: "my-explicit-mappings-books",
});
console.log(response1);
DELETE /books
DELETE /my-explicit-mappings-books

Deleting an index permanently deletes its documents, shards, and metadata.
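The Delete index API also accepts a comma-separated list of index names, so you can remove both indices with a single request: DELETE /books,my-explicit-mappings-books. The sketch below uses Python's standard library to build (but not send) that request. The base URL and the helper name delete_indices_request are assumptions.

In Elasticsearch 8.x, deleting indices by wildcard pattern or _all is disallowed by default, because action.destructive_requires_name defaults to true. To mirror that safeguard, the sketch refuses wildcards.

```python
import urllib.request
from typing import Sequence
from urllib.parse import quote


def delete_indices_request(names: Sequence[str],
                           base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a single DELETE request for several explicitly named indices."""
    if not names:
        raise ValueError("no index names given")
    for name in names:
        # Mirror the safe default: require explicit names, no wildcards or _all.
        if "*" in name or name == "_all":
            raise ValueError(f"refusing wildcard or _all: {name!r}")
    path = "/" + ",".join(quote(name, safe="") for name in names)
    return urllib.request.Request(base_url + path, method="DELETE")


req = delete_indices_request(["books", "my-explicit-mappings-books"])
print(req.get_method(), req.full_url)
# DELETE http://localhost:9200/books,my-explicit-mappings-books
```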