Extract information

These NLP tasks enable you to extract information from your unstructured text.

Named entity recognition

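The named entity recognition (NER) task identifies and categorizes certain entities, typically proper nouns, in your unstructured text. Named entities usually refer to objects in the real world, such as persons, locations, organizations, and other miscellaneous entities that are consistently referenced by a proper name.

NER is a useful tool for identifying key information, adding structure, and gaining insight into your content. It is particularly useful when processing and exploring large collections of text, such as news articles, wiki pages, or websites. It makes it easier to understand the subject of a text and to group similar pieces of content together.

In the following example, the short text is analyzed for any named entity. The model extracts not only the individual words that make up the entities, but also phrases that consist of multiple words.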

{
    "docs": [{"text_field": "Elastic is headquartered in Mountain View, California."}]
}
...

The task returns the following result:

{
  "inference_results": [{
    ...
      "entities": [
        {
          "entity": "Elastic",
          "class": "organization"
        },
        {
          "entity": "Mountain View",
          "class": "location"
        },
        {
          "entity": "California",
          "class": "location"
        }
      ]
    }
  ]
}
...
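
As a minimal sketch of how such a result might be consumed, the following Python groups the extracted entities by their class. The function name `group_entities_by_class` is hypothetical, and the sample response is reduced to the fields shown above; the elided parts of a real response are not modeled here.

```python
from collections import defaultdict


def group_entities_by_class(response):
    """Map each entity class to the list of entity strings found for it."""
    grouped = defaultdict(list)
    for result in response.get("inference_results", []):
        for item in result.get("entities", []):
            grouped[item["class"]].append(item["entity"])
    return dict(grouped)


# Hypothetical response, limited to the fields shown in the result above.
sample_response = {
    "inference_results": [
        {
            "entities": [
                {"entity": "Elastic", "class": "organization"},
                {"entity": "Mountain View", "class": "location"},
                {"entity": "California", "class": "location"},
            ]
        }
    ]
}

grouped = group_entities_by_class(sample_response)
print(grouped)
```

Grouping by class keeps multi-word entities such as "Mountain View" intact, because each entity arrives as a single string rather than as separate tokens.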

Fill-mask

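The objective of the fill-mask task is to predict a missing word from a text sequence. The model uses the context of the masked word to predict the most likely word to complete the text.

The fill-mask task can be used to quickly and easily test your model.

In the following example, the special word "[MASK]" is used as a placeholder to tell the model which word to predict.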

{
    "docs": [{"text_field": "The capital city of France is [MASK]."}]
}
...

The task returns the following result:

...
{
  "predicted_value": "Paris"
  ...
}
...
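
The request body above could be assembled programmatically, for example as in the following Python sketch. The helper `build_fill_mask_request` is hypothetical. It assumes exactly one "[MASK]" placeholder per input, on the grounds that the task predicts a single missing word, and it assumes the input field is named `text_field` as in the example.

```python
import json

MASK_TOKEN = "[MASK]"


def build_fill_mask_request(text, field="text_field"):
    """Build a request body shaped like the fill-mask example above."""
    # Assumption: the input carries exactly one placeholder to predict.
    count = text.count(MASK_TOKEN)
    if count != 1:
        raise ValueError(
            f"expected exactly one {MASK_TOKEN} placeholder, found {count}"
        )
    return {"docs": [{field: text}]}


body = build_fill_mask_request("The capital city of France is [MASK].")
print(json.dumps(body, indent=2))
```

Validating the placeholder before sending the request catches malformed input early, which is useful when the task is used for quick model tests.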

Question answering

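The question answering (or extractive question answering) task makes it possible to get answers to certain questions by extracting information from the provided text.

The model tokenizes the string of (usually long) unstructured text, then attempts to pull an answer to your question from the text. The logic is shown by the following example: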

{
    "docs": [{"text_field": "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."}],
    "inference_config": {"question_answering": {"question": "Which name is also used to describe the Amazon rainforest in English?"}}
}
...

The answer is shown by the object below:

...
{
  "predicted_value": "Amazonia"
  ...
}
...
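
Long passages often contain characters, such as double quotes, that must be escaped in a JSON request body. The following Python sketch builds a body shaped like the question answering example and serializes it with the standard `json` module, which handles the escaping. The helpers `build_qa_request` and `answer_is_extracted` are hypothetical, and the passage is a short excerpt of the text used above. The substring check rests on the assumption that an extractive answer appears verbatim in the passage.

```python
import json


def build_qa_request(passage, question, field="text_field"):
    """Build a request body shaped like the question answering example."""
    return {
        "docs": [{field: passage}],
        "inference_config": {"question_answering": {"question": question}},
    }


def answer_is_extracted(passage, answer):
    """Check that the answer occurs verbatim in the passage (an assumption)."""
    return answer in passage


# Short excerpt of the passage used in the example above.
passage = (
    "The Amazon rainforest, also known in English as Amazonia or the "
    "Amazon Jungle, is a moist broadleaf forest. States or departments "
    'in four nations contain "Amazonas" in their names.'
)
question = "Which name is also used to describe the Amazon rainforest in English?"

# json.dumps escapes the embedded double quotes around Amazonas.
serialized = json.dumps(build_qa_request(passage, question))
print(serialized)
print(answer_is_extracted(passage, "Amazonia"))
```

Serializing with a JSON library rather than concatenating strings avoids invalid request bodies when the passage itself contains quotation marks.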