Incorporating static relevance signals into the score

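Many domains have static signals that are known to correlate with relevance. For instance, PageRank and URL length are two features commonly used in web search to adjust the score of web pages independently of the query.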

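Two main queries allow combining static score contributions with textual relevance (for example, as computed with BM25): the script_score query and the rank_feature query.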

For instance, say you have a pagerank field that you wish to combine with the BM25 score so that the final score is equal to score = bm25_score + pagerank / (10 + pagerank).
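To make the target formula concrete, here is a minimal Python sketch that evaluates it outside Elasticsearch. The function names `saturation` and `combined_score` and the sample BM25 score of 2.0 are illustrative assumptions, not part of any Elasticsearch API.

```python
def saturation(value: float, pivot: float) -> float:
    """Saturation curve: value / (pivot + value), in [0, 1) for value >= 0 and pivot > 0."""
    return value / (pivot + value)


def combined_score(bm25_score: float, pagerank: float, pivot: float = 10.0) -> float:
    """Target formula from the text: bm25_score + pagerank / (pivot + pagerank)."""
    return bm25_score + saturation(pagerank, pivot)


# Hypothetical BM25 score of 2.0 combined with a few pagerank values.
for pagerank in (0.0, 10.0, 90.0):
    print(pagerank, combined_score(2.0, pagerank))
```

The static contribution is 0 when pagerank is 0, exactly 0.5 when pagerank equals the pivot (10 here), and approaches 1 as pagerank grows, so it can never add more than 1 to the BM25 score.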

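With the script_score query, the query would look like this (the same request is shown for the Python, Ruby, and JavaScript clients, followed by the Console form):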

resp = client.search(
    index="index",
    query={
        "script_score": {
            "query": {
                "match": {
                    "body": "elasticsearch"
                }
            },
            "script": {
                "source": "_score + saturation(doc['pagerank'].value, 10)"
            }
        }
    },
)
print(resp)
response = client.search(
  index: 'index',
  body: {
    query: {
      script_score: {
        query: {
          match: {
            body: 'elasticsearch'
          }
        },
        script: {
          source: "_score + saturation(doc['pagerank'].value, 10)"
        }
      }
    }
  }
)
puts response
const response = await client.search({
  index: "index",
  query: {
    script_score: {
      query: {
        match: {
          body: "elasticsearch",
        },
      },
      script: {
        source: "_score + saturation(doc['pagerank'].value, 10)",
      },
    },
  },
});
console.log(response);
GET index/_search
{
  "query": {
    "script_score": {
      "query": {
        "match": { "body": "elasticsearch" }
      },
      "script": {
        "source": "_score + saturation(doc['pagerank'].value, 10)"
      }
    }
  }
}

For the script_score query, pagerank must be mapped as a numeric field.

With the rank_feature query, it would look like this:

resp = client.search(
    query={
        "bool": {
            "must": {
                "match": {
                    "body": "elasticsearch"
                }
            },
            "should": {
                "rank_feature": {
                    "field": "pagerank",
                    "saturation": {
                        "pivot": 10
                    }
                }
            }
        }
    },
)
print(resp)
response = client.search(
  body: {
    query: {
      bool: {
        must: {
          match: {
            body: 'elasticsearch'
          }
        },
        should: {
          rank_feature: {
            field: 'pagerank',
            saturation: {
              pivot: 10
            }
          }
        }
      }
    }
  }
)
puts response
const response = await client.search({
  query: {
    bool: {
      must: {
        match: {
          body: "elasticsearch",
        },
      },
      should: {
        rank_feature: {
          field: "pagerank",
          saturation: {
            pivot: 10,
          },
        },
      },
    },
  },
});
console.log(response);
GET _search
{
  "query": {
    "bool": {
      "must": {
        "match": { "body": "elasticsearch" }
      },
      "should": {
        "rank_feature": {
          "field": "pagerank", 
          "saturation": {
            "pivot": 10
          }
        }
      }
    }
  }
}

For the rank_feature query, pagerank must be mapped as a rank_feature field.
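The two queries therefore need different mappings for pagerank. The sketch below builds both as plain Python dicts. The index name `index`, the `body` text field, and the choice of `float` as the numeric type are assumptions made for illustration; any numeric type would serve the script_score variant.

```python
import json

# Mapping for the script_score variant: pagerank is an ordinary numeric field,
# so the script can read it through doc['pagerank'].value.
# "float" is an assumed choice of numeric type.
script_score_mappings = {
    "properties": {
        "body": {"type": "text"},
        "pagerank": {"type": "float"},
    }
}

# Mapping for the rank_feature variant: pagerank is a rank_feature field,
# which is what the rank_feature query requires.
rank_feature_mappings = {
    "properties": {
        "body": {"type": "text"},
        "pagerank": {"type": "rank_feature"},
    }
}

print(json.dumps(script_score_mappings, indent=2))
print(json.dumps(rank_feature_mappings, indent=2))

# With the Python client, one of these dicts could be passed at index creation, e.g.:
# client.indices.create(index="index", mappings=rank_feature_mappings)
```

A single field cannot carry both mappings at once, so choosing between the two queries is effectively a decision made at index-design time.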

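While both options return similar scores, there are trade-offs. script_score provides a lot of flexibility, letting you combine the text relevance score with static signals however you prefer. The rank_feature query, on the other hand, exposes only a few ways to incorporate static signals into the score. However, it relies on the rank_feature and rank_features fields, which index values in a special way that allows the rank_feature query to skip non-competitive documents and return the top matches of a query faster.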
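To give a sense of how narrow the rank_feature options are, the sketch below restates in Python the four scoring functions that the rank_feature query documents (saturation, log, sigmoid, and linear). The Python function and parameter names are illustrative assumptions; the rank_feature query reference remains the authoritative description.

```python
import math


def saturation(value: float, pivot: float) -> float:
    """value / (value + pivot): 0.5 at the pivot, always below 1."""
    return value / (value + pivot)


def log_fn(value: float, scaling_factor: float) -> float:
    """log(scaling_factor + value): unbounded, grows slowly."""
    return math.log(scaling_factor + value)


def sigmoid(value: float, pivot: float, exponent: float) -> float:
    """value^exp / (value^exp + pivot^exp): 0.5 at the pivot, steeper for larger exponents."""
    return value**exponent / (value**exponent + pivot**exponent)


def linear(value: float) -> float:
    """The indexed value itself."""
    return value


# Compare the static contribution of each function for a sample feature value.
sample = 20.0
print("saturation:", saturation(sample, pivot=10))
print("log:       ", log_fn(sample, scaling_factor=1))
print("sigmoid:   ", sigmoid(sample, pivot=10, exponent=2))
print("linear:    ", linear(sample))
```

Anything outside these shapes, such as multiplying the text score by the static signal, would need script_score instead, at the cost of the skipping optimization described above.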