Extract information
Elastic Stack Serverless
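These NLP tasks enable you to extract information from your unstructured text.
Named entity recognition (NER) tasks identify and categorize certain entities, typically proper nouns, in your unstructured text. Named entities usually refer to objects in the real world, such as persons, locations, organizations, and other miscellaneous entities that are consistently referenced by a proper name.
NER is a useful tool for identifying key information, adding structure, and gaining insight into your content. It is particularly useful when processing and exploring large collections of text such as news articles, wiki pages, or websites. It makes it easier to understand the subject of a text and to group similar pieces of content together.
In the following example, the short text is analyzed for any named entities, and the model extracts not only the individual words that make up the entities, but also phrases that consist of multiple words.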
{
"docs": [{"text_field": "Elastic is headquartered in Mountain View, California."}]
}
...
The task returns the following result:
{
"inference_results": [{
...
"entities": [
{
"entity": "Elastic",
"class": "organization"
},
{
"entity": "Mountain View",
"class": "location"
},
{
"entity": "California",
"class": "location"
}
]
}
]
}
...
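Once a response of this shape is available, the entities can be grouped by class to add structure to the text. The following is a minimal Python sketch, not part of the Elastic API: the `response` dictionary reproduces only the fields shown in the example above (the elided fields are left out), and `group_entities_by_class` is a hypothetical helper name.

```python
from collections import defaultdict

# Response shape copied from the example above.
# Fields elided there ("...") are intentionally left out.
response = {
    "inference_results": [
        {
            "entities": [
                {"entity": "Elastic", "class": "organization"},
                {"entity": "Mountain View", "class": "location"},
                {"entity": "California", "class": "location"},
            ]
        }
    ]
}


def group_entities_by_class(response):
    """Hypothetical helper: map each entity class to the entities found."""
    grouped = defaultdict(list)
    for result in response.get("inference_results", []):
        for item in result.get("entities", []):
            grouped[item["class"]].append(item["entity"])
    return dict(grouped)


entities_by_class = group_entities_by_class(response)
print(entities_by_class)
```

Grouping by class is one simple way to turn the flat entity list into a structure that can be used to compare or cluster documents.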
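The objective of the fill-mask task is to predict a missing word from a text sequence. The model uses the context of the masked word to predict the most likely word to complete the text.
The fill-mask task can be used to quickly and easily test your model.
In the following example, the special word "[MASK]" is used as a placeholder to tell the model which word to predict.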
{
"docs": [{"text_field": "The capital city of France is [MASK]."}]
}
...
The task returns the following result:
...
{
"predicted_value": "Paris"
...
}
...
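A fill-mask request can be generated from a complete sentence by hiding one known word, which is a convenient way to check whether the model recovers it. The sketch below is illustrative only: `build_fill_mask_request` is a hypothetical helper, and it assumes the model uses "[MASK]" as its placeholder, as in the example above. Other models may expect a different mask token.

```python
import json

# Assumption: the model's mask token is "[MASK]", as in the example above.
MASK_TOKEN = "[MASK]"


def build_fill_mask_request(sentence, word_to_hide):
    """Hypothetical helper: hide the first occurrence of a word behind the mask token."""
    if word_to_hide not in sentence:
        raise ValueError(f"{word_to_hide!r} does not occur in the sentence")
    masked = sentence.replace(word_to_hide, MASK_TOKEN, 1)
    return {"docs": [{"text_field": masked}]}


request_body = build_fill_mask_request(
    "The capital city of France is Paris.", "Paris"
)
print(json.dumps(request_body))
```

Comparing the hidden word with the returned `predicted_value` gives a quick sanity check of the model.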
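The question answering (or extractive question answering) task makes it possible to get answers to certain questions by extracting information from the provided text.
The model tokenizes the string of (usually long) unstructured text, then attempts to pull an answer to your question from the text. The logic is shown by the following example: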
{
"docs": [{"text_field": "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."}],
"inference_config": {"question_answering": {"question": "Which name is also used to describe the Amazon rainforest in English?"}}
}
...
The answer is shown by the object below:
...
{
"predicted_value": "Amazonia"
...
}
...
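Because the passage is free text, it can contain characters such as double quotes that must be escaped inside a JSON string. One way to avoid hand-escaping is to build the request body as a data structure and serialize it. The sketch below assumes the request shape shown above; `build_question_answering_request` is a hypothetical helper, and the passage is a shortened excerpt of the example text.

```python
import json


def build_question_answering_request(context, question):
    """Hypothetical helper: assemble a request body in the shape shown above."""
    return {
        "docs": [{"text_field": context}],
        "inference_config": {"question_answering": {"question": question}},
    }


# Shortened excerpt of the example passage; note the embedded double quotes.
context = (
    "The Amazon rainforest, also known in English as Amazonia or the Amazon "
    "Jungle, is a moist broadleaf forest. States or departments in four "
    'nations contain "Amazonas" in their names.'
)
question = "Which name is also used to describe the Amazon rainforest in English?"

request_body = build_question_answering_request(context, question)

# json.dumps escapes the embedded quotes, so the payload is valid JSON.
payload = json.dumps(request_body)
print(payload)
```

Serializing with `json.dumps` keeps the payload valid even when the source text contains quotes or backslashes.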