Incorporating static relevance signals into the score
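Many domains have static signals that are known to be correlated with relevance. For instance, PageRank and URL length are two features commonly used in web search to tune the score of web pages independently of the query.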
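There are two main queries that let you combine static score contributions with textual relevance (for example, relevance computed with BM25): the script_score query and the rank_feature query.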
For instance, imagine that you have a pagerank field that you wish to combine with the BM25 score so that the final score is equal to score = bm25_score + pagerank / (10 + pagerank).
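As a quick sanity check of this formula, the sketch below evaluates the saturation term pagerank / (10 + pagerank) for a few pagerank values and adds it to a BM25 score. The BM25 value of 3.2 is an arbitrary placeholder, not a score produced by Elasticsearch, and the function names are illustrative only.

```python
def saturation(value: float, pivot: float) -> float:
    """Saturation function: value / (pivot + value).

    Equals 0.5 when value == pivot and approaches 1 as value grows,
    so the static signal adds at most 1 to the text score.
    """
    return value / (pivot + value)


def combined_score(bm25_score: float, pagerank: float, pivot: float = 10.0) -> float:
    # score = bm25_score + pagerank / (10 + pagerank)
    return bm25_score + saturation(pagerank, pivot)


# Placeholder BM25 score for a single document; only pagerank varies.
bm25 = 3.2
for pr in (0.0, 10.0, 90.0, 990.0):
    print(pr, round(combined_score(bm25, pr), 3))
```

The bonus is 0 for a pagerank of 0, exactly 0.5 at the pivot (10), and stays below 1 no matter how large pagerank gets, which keeps text relevance the dominant factor.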
Using a script_score query, with pagerank mapped as a numeric field, the query would look like this:
resp = client.search(
    index="index",
    query={
        "script_score": {
            "query": {
                "match": {
                    "body": "elasticsearch"
                }
            },
            "script": {
                "source": "_score + saturation(doc['pagerank'].value, 10)"
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'index',
  body: {
    query: {
      script_score: {
        query: {
          match: {
            body: 'elasticsearch'
          }
        },
        script: {
          source: "_score + saturation(doc['pagerank'].value, 10)"
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "index",
  query: {
    script_score: {
      query: {
        match: {
          body: "elasticsearch",
        },
      },
      script: {
        source: "_score + saturation(doc['pagerank'].value, 10)",
      },
    },
  },
});
console.log(response);

GET index/_search
{
  "query": {
    "script_score": {
      "query": {
        "match": { "body": "elasticsearch" }
      },
      "script": {
        "source": "_score + saturation(doc['pagerank'].value, 10)"
      }
    }
  }
}
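If you build request bodies programmatically, a small helper such as the hypothetical script_score_query below (it is not part of any Elasticsearch client) produces a script_score query that adds the saturated static signal to _score, following score = bm25_score + pagerank / (10 + pagerank). It inlines the pivot into the Painless source; passing it through script params would be another option.

```python
import json


def script_score_query(field: str, text: str, feature: str, pivot: float) -> dict:
    """Hypothetical helper: build the value of the top-level "query" key.

    The inner match query supplies the BM25 score (_score); the Painless
    script adds saturation(feature, pivot) read from the field's doc values.
    """
    return {
        "script_score": {
            "query": {"match": {field: text}},
            "script": {
                "source": f"_score + saturation(doc['{feature}'].value, {pivot})"
            },
        }
    }


body = {"query": script_score_query("body", "elasticsearch", "pagerank", 10)}
print(json.dumps(body, indent=2))
```

With the Python client, the returned dict can be passed as the query argument, for example client.search(index="index", query=script_score_query("body", "elasticsearch", "pagerank", 10)).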
With a rank_feature query, where pagerank is mapped as a rank_feature field, it would look like this:
resp = client.search(
    query={
        "bool": {
            "must": {
                "match": {
                    "body": "elasticsearch"
                }
            },
            "should": {
                "rank_feature": {
                    "field": "pagerank",
                    "saturation": {
                        "pivot": 10
                    }
                }
            }
        }
    },
)
print(resp)

response = client.search(
  body: {
    query: {
      bool: {
        must: {
          match: {
            body: 'elasticsearch'
          }
        },
        should: {
          rank_feature: {
            field: 'pagerank',
            saturation: {
              pivot: 10
            }
          }
        }
      }
    }
  }
)
puts response

const response = await client.search({
  query: {
    bool: {
      must: {
        match: {
          body: "elasticsearch",
        },
      },
      should: {
        rank_feature: {
          field: "pagerank",
          saturation: {
            pivot: 10,
          },
        },
      },
    },
  },
});
console.log(response);

GET _search
{
  "query": {
    "bool": {
      "must": {
        "match": { "body": "elasticsearch" }
      },
      "should": {
        "rank_feature": {
          "field": "pagerank",
          "saturation": {
            "pivot": 10
          }
        }
      }
    }
  }
}
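The rank_feature query only works on fields mapped as rank_feature or rank_features. The sketch below builds, as plain Python dicts, a mapping that declares pagerank as a rank_feature field, plus a bool query that requires the text match and adds the static signal through a should clause. The url_length field is a hypothetical addition: it uses positive_score_impact set to false on the assumption that longer URLs indicate lower relevance. The helper names are illustrative, not client APIs.

```python
def rank_feature_mapping() -> dict:
    """Index mapping with static signals stored as rank_feature fields."""
    return {
        "mappings": {
            "properties": {
                "body": {"type": "text"},
                "pagerank": {"type": "rank_feature"},
                # Hypothetical field: a higher value should lower the score.
                "url_length": {"type": "rank_feature", "positive_score_impact": False},
            }
        }
    }


def rank_feature_query(field: str, text: str, feature: str, pivot: float) -> dict:
    """Bool query: BM25 text relevance plus a saturated static signal."""
    return {
        "bool": {
            # The text match is required and provides the BM25 score.
            "must": {"match": {field: text}},
            # The should clause only adds to the score; documents without
            # the feature still match, they just get no bonus.
            "should": {
                "rank_feature": {"field": feature, "saturation": {"pivot": pivot}}
            },
        }
    }


print(rank_feature_mapping())
print(rank_feature_query("body", "elasticsearch", "pagerank", 10))
```

With the Python client, the mapping could be applied with client.indices.create(index="index", mappings=rank_feature_mapping()["mappings"]) before indexing documents.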
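While both options return similar scores, there are trade-offs. script_score provides a lot of flexibility, letting you combine the text relevance score with static signals however you prefer. The rank_feature query, on the other hand, only exposes a couple of ways to incorporate static signals into the score. However, it relies on the rank_feature and rank_features fields, which index values in a special way that lets the rank_feature query skip over non-competitive documents and get the top matches of a query faster.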
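To illustrate why a bounded static contribution can let a search engine skip documents, here is a simplified, hypothetical sketch of top-k retrieval with per-block score upper bounds, in the spirit of block-max WAND/MaxScore pruning. It is not Lucene's actual implementation. It assumes each document's text score is already known and that the static term is the saturation function, which is monotonic, so max_text_score + saturation(max_pagerank) bounds every score in a block. A real index would precompute these maxima at index time; they are computed on the fly here for brevity.

```python
import heapq
from typing import List, NamedTuple, Tuple


class Doc(NamedTuple):
    doc_id: int
    text_score: float  # stand-in for a precomputed BM25 score
    pagerank: float


def saturation(value: float, pivot: float) -> float:
    return value / (pivot + value)


def top_k_with_pruning(
    blocks: List[List[Doc]], k: int, pivot: float = 10.0
) -> Tuple[List[Tuple[float, int]], int]:
    """Return the best k (score, doc_id) pairs and how many docs were scored."""
    heap: List[Tuple[float, int]] = []  # min-heap holding the best k so far
    scored = 0
    for block in blocks:
        # Upper bound on any score in this block (saturation is monotonic).
        bound = max(d.text_score for d in block) + saturation(
            max(d.pagerank for d in block), pivot
        )
        if len(heap) == k and bound <= heap[0][0]:
            continue  # non-competitive block: skip it without scoring its docs
        for d in block:
            scored += 1
            score = d.text_score + saturation(d.pagerank, pivot)
            if len(heap) < k:
                heapq.heappush(heap, (score, d.doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, d.doc_id))
    return sorted(heap, reverse=True), scored


# Made-up toy data: three blocks of two documents each.
blocks = [
    [Doc(0, 5.0, 90.0), Doc(1, 4.5, 10.0)],
    [Doc(2, 1.0, 5.0), Doc(3, 0.8, 2.0)],
    [Doc(4, 4.8, 40.0), Doc(5, 1.2, 0.0)],
]
top, scored = top_k_with_pruning(blocks, k=2)
print(top, scored)
```

In this toy run the second block's upper bound falls below the current second-best score, so its two documents are never scored. An arbitrary script offers no such bound, which is one reason a script_score query generally has to score every matching document.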