Extract information
These NLP tasks enable you to extract information from unstructured text:
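Named entity recognition
The named entity recognition (NER) task identifies and categorizes specific entities, typically proper nouns, in unstructured text. Named entities usually refer to real-world objects such as people, locations, organizations, and other miscellaneous entities that are consistently referred to by a proper name.
NER is a useful tool for identifying key information, adding structure, and gaining insight into your content. It is particularly useful when processing and exploring large collections of text such as news articles, wiki pages, or websites, because it makes it easier to understand the subject of a text and to group similar pieces of content together.
In the following example, a short text is analyzed for named entities. The model extracts not only individual words but also phrases that consist of multiple words.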
{ "docs": [{"text_field": "Elastic is headquartered in Mountain View, California."}] } ...
The task returns the following result:
{ "inference_results": [{ ... "entities": [ { "entity": "Elastic", "class": "organization" }, { "entity": "Mountain View", "class": "location" }, { "entity": "California", "class": "location" } ] } ] } ...
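To illustrate how the extracted entities add structure to a text, the sketch below groups them by class. It is a minimal example, not part of any client library: `sample_response` is a hand-written dictionary that mirrors only the fields shown in the result above (the elided parts are left out), and `group_entities_by_class` is a hypothetical helper name.

```python
from collections import defaultdict

# Hand-written sample mirroring the fields shown above.
# The elided parts ("...") of the real response are intentionally left out.
sample_response = {
    "inference_results": [
        {
            "entities": [
                {"entity": "Elastic", "class": "organization"},
                {"entity": "Mountain View", "class": "location"},
                {"entity": "California", "class": "location"},
            ]
        }
    ]
}


def group_entities_by_class(response):
    """Return a dict that maps each entity class to the entity strings found."""
    grouped = defaultdict(list)
    for result in response.get("inference_results", []):
        for item in result.get("entities", []):
            grouped[item["class"]].append(item["entity"])
    return dict(grouped)


grouped = group_entities_by_class(sample_response)
print(grouped)
# {'organization': ['Elastic'], 'location': ['Mountain View', 'California']}
```

Grouping by class is one simple way to turn the flat entity list into something that can be used for faceting or for clustering similar documents.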
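Fill-mask
The objective of the fill-mask task is to predict a missing word from a text sequence. The model uses the context of the masked word to predict the most likely word to complete the text.
The fill-mask task can be used to quickly and easily test your model.
In the following example, the special word "[MASK]" is used as a placeholder to tell the model which word to predict.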
{ "docs": [{"text_field": "The capital city of France is [MASK]."}] } ...
The task returns the following result:
... { "predicted_value": "Paris" ... } ...
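The request body for a fill-mask test can be assembled programmatically, as in the rough sketch below. The helper name `build_fill_mask_body` is hypothetical, and the check for exactly one "[MASK]" placeholder is an assumption based on the example above rather than a documented rule. The `text_field` key is taken from the example; your model may be configured with a different input field.

```python
import json

# Placeholder word used in the example above.
MASK_TOKEN = "[MASK]"


def build_fill_mask_body(text, field="text_field"):
    """Build a request body like the one shown above.

    Assumption: the input should contain exactly one placeholder.
    """
    count = text.count(MASK_TOKEN)
    if count != 1:
        raise ValueError(f"expected exactly one {MASK_TOKEN}, found {count}")
    return {"docs": [{field: text}]}


body = build_fill_mask_body("The capital city of France is [MASK].")
print(json.dumps(body))
```

Validating the input before sending it makes a missing or duplicated placeholder fail early, on the client side, with a clear message.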
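Question answering
The question answering (or extractive question answering) task makes it possible to get answers to certain questions by extracting information from the provided text.
The model tokenizes the (usually long) string of unstructured text, then attempts to extract the answer to your question from that text. The following example shows the logic: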
{ "docs": [{"text_field": "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."}], "inference_config": {"question_answering": {"question": "Which name is also used to describe the Amazon rainforest in English?"}} } ...
The answer is shown by the object below:
... { "predicted_value": "Amazonia" ... } ...
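Because the passage contains double quotes (around "Amazonas"), building the request body in code and letting a JSON serializer handle the escaping is less error-prone than writing it by hand. The sketch below shows this with Python's standard `json` module. The helper names are hypothetical, and `context` is an abridged excerpt of the passage above, shortened for readability.

```python
import json

# Abridged excerpt of the passage above; note the embedded double quotes.
context = (
    "The Amazon rainforest, also known in English as Amazonia or the "
    "Amazon Jungle, is a moist broadleaf forest. States or departments "
    'in four nations contain "Amazonas" in their names.'
)
question = "Which name is also used to describe the Amazon rainforest in English?"


def build_question_answering_body(text, question, field="text_field"):
    """Build a request body with the same shape as the example above."""
    return {
        "docs": [{field: text}],
        "inference_config": {"question_answering": {"question": question}},
    }


def is_extractive(answer, text):
    """An extractive answer is a span copied from the text, so it must occur in it."""
    return answer in text


body = build_question_answering_body(context, question)
payload = json.dumps(body)  # embedded quotes are escaped as \" automatically

print(payload)
print(is_extractive("Amazonia", context))
```

The `is_extractive` check reflects the defining property of this task: the answer is pulled out of the supplied text rather than generated freely.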