
Extract information

Elastic Stack Serverless

These NLP tasks enable you to extract information from unstructured text.

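The named entity recognition (NER) task identifies and classifies specific entities in unstructured text, typically proper nouns. Named entities usually refer to real-world objects such as people, locations, organizations, and other miscellaneous entities that are consistently referenced by a proper name.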

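NER is a useful tool for identifying key information, adding structure, and gaining insight into your content. It is particularly useful when processing and exploring large collections of text such as news articles, Wikipedia pages, or websites. It makes it easier to understand the subject of a text and to group similar pieces of content together.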

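In the following example, a short text is analyzed for named entities. The model extracts not only individual words but also multi-word phrases that make up a single entity.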

{
    "docs": [{"text_field": "Elastic is headquartered in Mountain View, California."}]
}
...

The task returns the following result:

{
  "inference_results": [{
    ...
      "entities": [
        {
          "entity": "Elastic",
          "class": "organization"
        },
        {
          "entity": "Mountain View",
          "class": "location"
        },
        {
          "entity": "California",
          "class": "location"
        }
      ]
    }
  ]
}
...
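
As a rough sketch of how a client might consume a response shaped like the one above, the following Python groups the extracted entities by class. The function name `group_entities_by_class` and the sample dictionary are illustrative assumptions: the sample contains only the fields shown above, and a real response carries additional fields that are elided here.

```python
from collections import defaultdict


def group_entities_by_class(response):
    """Group entity strings by their class label (hypothetical helper)."""
    grouped = defaultdict(list)
    for result in response.get("inference_results", []):
        for entity in result.get("entities", []):
            grouped[entity["class"]].append(entity["entity"])
    return dict(grouped)


# Sample built only from the fields shown in the result above.
sample_response = {
    "inference_results": [
        {
            "entities": [
                {"entity": "Elastic", "class": "organization"},
                {"entity": "Mountain View", "class": "location"},
                {"entity": "California", "class": "location"},
            ]
        }
    ]
}

grouped = group_entities_by_class(sample_response)
for label, names in grouped.items():
    print(label, names)
```

Note that "Mountain View" stays together as a single entry, which reflects the multi-word phrase extraction described earlier.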

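The objective of the fill-mask task is to predict a missing word from a text sequence. The model uses the context of the masked word to predict the most likely word to complete the text.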

The fill-mask task can be used to quickly and easily test your model.

In the following example, the special word "[MASK]" is used as a placeholder to tell the model which word to predict.

{
    "docs": [{"text_field": "The capital city of France is [MASK]."}]
}
...

The task returns the following result:

...
{
  "predicted_value": "Paris"
  ...
}
...
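
The placeholder mechanics can be sketched in a few lines of Python. The helper names `build_fill_mask_body` and `fill_in` are hypothetical, and the check for exactly one placeholder per input is an assumption made for this sketch rather than something stated above. The code only builds the request body and substitutes a predicted word; it does not call any service.

```python
import json

MASK_TOKEN = "[MASK]"  # placeholder used in the example above


def build_fill_mask_body(text, mask_token=MASK_TOKEN):
    """Build a request body like the one shown above (hypothetical helper)."""
    # Assumption for this sketch: each input holds exactly one placeholder.
    if text.count(mask_token) != 1:
        raise ValueError("expected exactly one mask token in the input text")
    return {"docs": [{"text_field": text}]}


def fill_in(text, predicted_value, mask_token=MASK_TOKEN):
    """Substitute the predicted word for the placeholder."""
    return text.replace(mask_token, predicted_value, 1)


text = "The capital city of France is [MASK]."
body = build_fill_mask_body(text)
completed = fill_in(text, "Paris")

print(json.dumps(body))
print(completed)
```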

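The question answering (or extractive question answering) task makes it possible to get answers to certain questions by extracting information from the provided text.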

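The model tokenizes the string of (usually long) unstructured text, then attempts to pull an answer for your question from the text. The following example illustrates this logic: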

{
    "docs": [{"text_field": "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."}],
    "inference_config": {"question_answering": {"question": "Which name is also used to describe the Amazon rainforest in English?"}}
}
...

The answer is shown by the object below:

...
{
  "predicted_value": "Amazonia"
  ...
}
...
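
Because the task is extractive, the predicted answer should appear verbatim in the supplied text. The Python sketch below builds a request body in the shape shown above and locates the answer span in the passage. The helper names `build_question_answering_body` and `locate_answer` are hypothetical, the passage is a shortened excerpt of the example text, and the plain substring search is only an illustration, not how the model itself finds the answer.

```python
def build_question_answering_body(text, question):
    """Build a request body in the shape shown above (hypothetical helper)."""
    return {
        "docs": [{"text_field": text}],
        "inference_config": {"question_answering": {"question": question}},
    }


def locate_answer(text, answer):
    """Return (start, end) offsets of the answer in the text, or None."""
    start = text.find(answer)
    if start == -1:
        return None
    return start, start + len(answer)


# Shortened excerpt of the passage used in the example above.
passage = (
    "The Amazon rainforest, also known in English as Amazonia or the "
    "Amazon Jungle, is a moist broadleaf forest that covers most of the "
    "Amazon basin of South America."
)
question = "Which name is also used to describe the Amazon rainforest in English?"

body = build_question_answering_body(passage, question)
span = locate_answer(passage, "Amazonia")
print(span, passage[span[0]:span[1]])
```

Offsets like these make it possible to highlight the answer in its original context.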
© . This site is unofficial and not affiliated with Elasticsearch BV.