Rank feature query

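Boosts the relevance score of documents based on the numeric value of a rank_feature or rank_features field.

The rank_feature query is typically used in the should clause of a bool query, so that its relevance scores are added to the other scores from the bool query.

If positive_score_impact is set to false for a rank_feature or rank_features field, we recommend that every document that participates in a query has a value for this field. Otherwise, if a rank_feature query is used in the should clause, it adds nothing to the score of a document with a missing value, but adds some boost to a document that contains the feature. This is the opposite of what we want: because we consider these features negative, documents that contain them should rank lower than documents that lack them.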
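
To make this effect concrete, here is a minimal numeric sketch. It assumes the saturation form for negative-impact features given in the Saturation section, pivot / (S + pivot), together with a hypothetical pivot of 40 and a made-up base score of 1.0; none of these numbers come from a real index.

```python
def negative_saturation(value: float, pivot: float) -> float:
    """Contribution of a negative-impact feature: pivot / (S + pivot)."""
    return pivot / (value + pivot)


def should_contribution(url_length, pivot=40.0):
    """Score added by a rank_feature should clause; 0.0 if the field is missing."""
    if url_length is None:
        return 0.0  # a should clause that does not match adds nothing
    return negative_saturation(url_length, pivot)


base_score = 1.0  # hypothetical score from the rest of the bool query
with_feature = base_score + should_contribution(42)
without_feature = base_score + should_contribution(None)

# The document that carries the "negative" feature still ends up ranked higher.
print(with_feature > without_feature)
```

Giving every document a value for the field avoids this inversion, because every document then receives a contribution from the same decreasing function.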

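Unlike the function_score query or other ways of changing relevance scores, the rank_feature query efficiently skips non-competitive hits when the track_total_hits parameter is not true. This can dramatically improve query speed.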
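
As a hedged sketch, a request body that keeps this optimization available might look like the following. The helper name build_search_body is hypothetical, and the content field is borrowed from the example index used on this page. Setting track_total_hits to false is only one way of keeping it from being true.

```python
import json


def build_search_body(text: str, feature_field: str) -> dict:
    """Build a search body that combines a match query with a rank_feature clause."""
    return {
        # Any setting other than true leaves room for skipping non-competitive hits.
        "track_total_hits": False,
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "should": [{"rank_feature": {"field": feature_field}}],
            }
        },
    }


body = build_search_body("2016", "pagerank")
print(json.dumps(body, indent=2))
```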

Rank feature functions

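To calculate relevance scores based on rank feature fields, the rank_feature query supports the following mathematical functions:

  • Saturation
  • Logarithm
  • Sigmoid
  • Linear

If you don't know where to start, we recommend using the saturation function. If no function is provided, the rank_feature query uses the saturation function by default.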

Example request

Index setup

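To use the rank_feature query, your index must include a rank_feature or rank_features field mapping. To see how to set up an index for the rank_feature query, try the following example.

Create a test index with the following field mappings:

  • pagerank, a rank_feature field that measures the importance of a website.
  • url_length, a rank_feature field that contains the length of the website's URL. In this example, a long URL correlates negatively with relevance, which is indicated by a positive_score_impact value of false.
  • topics, a rank_features field that contains a list of topics and a measure of how strongly each document is associated with each topic.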
resp = client.indices.create(
    index="test",
    mappings={
        "properties": {
            "pagerank": {
                "type": "rank_feature"
            },
            "url_length": {
                "type": "rank_feature",
                "positive_score_impact": False
            },
            "topics": {
                "type": "rank_features"
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'test',
  body: {
    mappings: {
      properties: {
        pagerank: {
          type: 'rank_feature'
        },
        url_length: {
          type: 'rank_feature',
          positive_score_impact: false
        },
        topics: {
          type: 'rank_features'
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "test",
  mappings: {
    properties: {
      pagerank: {
        type: "rank_feature",
      },
      url_length: {
        type: "rank_feature",
        positive_score_impact: false,
      },
      topics: {
        type: "rank_features",
      },
    },
  },
});
console.log(response);
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}

Index several documents to the test index.

resp = client.index(
    index="test",
    id="1",
    refresh=True,
    document={
        "url": "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
        "content": "Rio 2016",
        "pagerank": 50.3,
        "url_length": 42,
        "topics": {
            "sports": 50,
            "brazil": 30
        }
    },
)
print(resp)

resp1 = client.index(
    index="test",
    id="2",
    refresh=True,
    document={
        "url": "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
        "content": "Formula One motor race held on 13 November 2016",
        "pagerank": 50.3,
        "url_length": 47,
        "topics": {
            "sports": 35,
            "formula one": 65,
            "brazil": 20
        }
    },
)
print(resp1)

resp2 = client.index(
    index="test",
    id="3",
    refresh=True,
    document={
        "url": "https://en.wikipedia.org/wiki/Deadpool_(film)",
        "content": "Deadpool is a 2016 American superhero film",
        "pagerank": 50.3,
        "url_length": 37,
        "topics": {
            "movies": 60,
            "super hero": 65
        }
    },
)
print(resp2)
response = client.index(
  index: 'test',
  id: 1,
  refresh: true,
  body: {
    url: 'https://en.wikipedia.org/wiki/2016_Summer_Olympics',
    content: 'Rio 2016',
    pagerank: 50.3,
    url_length: 42,
    topics: {
      sports: 50,
      brazil: 30
    }
  }
)
puts response

response = client.index(
  index: 'test',
  id: 2,
  refresh: true,
  body: {
    url: 'https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix',
    content: 'Formula One motor race held on 13 November 2016',
    pagerank: 50.3,
    url_length: 47,
    topics: {
      sports: 35,
      "formula one": 65,
      brazil: 20
    }
  }
)
puts response

response = client.index(
  index: 'test',
  id: 3,
  refresh: true,
  body: {
    url: 'https://en.wikipedia.org/wiki/Deadpool_(film)',
    content: 'Deadpool is a 2016 American superhero film',
    pagerank: 50.3,
    url_length: 37,
    topics: {
      movies: 60,
      "super hero": 65
    }
  }
)
puts response
const response = await client.index({
  index: "test",
  id: 1,
  refresh: "true",
  document: {
    url: "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
    content: "Rio 2016",
    pagerank: 50.3,
    url_length: 42,
    topics: {
      sports: 50,
      brazil: 30,
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "test",
  id: 2,
  refresh: "true",
  document: {
    url: "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
    content: "Formula One motor race held on 13 November 2016",
    pagerank: 50.3,
    url_length: 47,
    topics: {
      sports: 35,
      "formula one": 65,
      brazil: 20,
    },
  },
});
console.log(response1);

const response2 = await client.index({
  index: "test",
  id: 3,
  refresh: "true",
  document: {
    url: "https://en.wikipedia.org/wiki/Deadpool_(film)",
    content: "Deadpool is a 2016 American superhero film",
    pagerank: 50.3,
    url_length: 37,
    topics: {
      movies: 60,
      "super hero": 65,
    },
  },
});
console.log(response2);
PUT /test/_doc/1?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "https://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}

Example query

The following query searches for 2016 and boosts relevance scores based on pagerank, url_length, and the sports topic.

resp = client.search(
    index="test",
    query={
        "bool": {
            "must": [
                {
                    "match": {
                        "content": "2016"
                    }
                }
            ],
            "should": [
                {
                    "rank_feature": {
                        "field": "pagerank"
                    }
                },
                {
                    "rank_feature": {
                        "field": "url_length",
                        "boost": 0.1
                    }
                },
                {
                    "rank_feature": {
                        "field": "topics.sports",
                        "boost": 0.4
                    }
                }
            ]
        }
    },
)
print(resp)
const response = await client.search({
  index: "test",
  query: {
    bool: {
      must: [
        {
          match: {
            content: "2016",
          },
        },
      ],
      should: [
        {
          rank_feature: {
            field: "pagerank",
          },
        },
        {
          rank_feature: {
            field: "url_length",
            boost: 0.1,
          },
        },
        {
          rank_feature: {
            field: "topics.sports",
            boost: 0.4,
          },
        },
      ],
    },
  },
});
console.log(response);
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}

Top-level parameters for rank_feature

field
(Required, string) rank_feature or rank_features field used to boost relevance scores.
boost

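(Optional, float) Floating point number used to decrease or increase relevance scores. Defaults to 1.0.

Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.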

saturation

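(Optional, function object) Saturation function used to boost relevance scores based on the value of the rank feature field. If no function is provided, the rank_feature query defaults to the saturation function. See Saturation for more information.

Only one of the functions saturation, log, sigmoid, or linear can be provided.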

log

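(Optional, function object) Logarithmic function used to boost relevance scores based on the value of the rank feature field. See Logarithm for more information.

Only one of the functions saturation, log, sigmoid, or linear can be provided.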

sigmoid

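(Optional, function object) Sigmoid function used to boost relevance scores based on the value of the rank feature field. See Sigmoid for more information.

Only one of the functions saturation, log, sigmoid, or linear can be provided.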

linear

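(Optional, function object) Linear function used to boost relevance scores based on the value of the rank feature field. See Linear for more information.

Only one of the functions saturation, log, sigmoid, or linear can be provided.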
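
The parameter rules above can be checked on the client side before a request is sent. The following is a minimal sketch only: rank_feature_clause is a hypothetical helper, not part of any Elasticsearch client, and it enforces just the "at most one function" rule described here.

```python
FUNCTIONS = ("saturation", "log", "sigmoid", "linear")


def rank_feature_clause(field, boost=1.0, **functions):
    """Return a rank_feature clause, rejecting unknown or multiple functions."""
    unknown = set(functions) - set(FUNCTIONS)
    if unknown:
        raise ValueError(f"unknown function(s): {sorted(unknown)}")
    if len(functions) > 1:
        raise ValueError("only one of saturation, log, sigmoid, or linear is allowed")
    clause = {"field": field, "boost": boost}
    # With no function given, the query falls back to saturation by default.
    clause.update(functions)
    return {"rank_feature": clause}


print(rank_feature_clause("pagerank", boost=0.4, sigmoid={"pivot": 7, "exponent": 0.6}))
```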

Notes

Saturation

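The saturation function gives a score equal to S / (S + pivot), where S is the value of the rank feature field and pivot is a configurable pivot value. The result is less than 0.5 if S is less than pivot, and greater than 0.5 otherwise. Scores are always in (0, 1).

If the rank feature has a negative score impact, the function is computed as pivot / (S + pivot), which decreases as S increases.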

resp = client.search(
    index="test",
    query={
        "rank_feature": {
            "field": "pagerank",
            "saturation": {
                "pivot": 8
            }
        }
    },
)
print(resp)
const response = await client.search({
  index: "test",
  query: {
    rank_feature: {
      field: "pagerank",
      saturation: {
        pivot: 8,
      },
    },
  },
});
console.log(response);
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
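
The two saturation formulas above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not Elasticsearch code. The function name and the positive_score_impact argument are chosen here for readability, and the sketch assumes strictly positive feature values.

```python
def saturation(value: float, pivot: float, positive_score_impact: bool = True) -> float:
    """Saturation score: S / (S + pivot), or pivot / (S + pivot) for negative impact."""
    if value <= 0 or pivot <= 0:
        raise ValueError("this sketch assumes strictly positive value and pivot")
    if positive_score_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


# pagerank of 50.3 with pivot 8, as in the request above
print(saturation(50.3, 8))
# A value equal to the pivot scores exactly 0.5.
print(saturation(8, 8))
# With a negative score impact, a longer URL yields a lower score.
print(saturation(47, 8, positive_score_impact=False) < saturation(42, 8, positive_score_impact=False))
```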

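If a pivot value is not provided, Elasticsearch computes a default value equal to the approximate geometric mean of all rank feature values in the index. We recommend using this default value if you haven't had the opportunity to train a good pivot value.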

resp = client.search(
    index="test",
    query={
        "rank_feature": {
            "field": "pagerank",
            "saturation": {}
        }
    },
)
print(resp)
const response = await client.search({
  index: "test",
  query: {
    rank_feature: {
      field: "pagerank",
      saturation: {},
    },
  },
});
console.log(response);
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
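
For intuition about the default pivot, the sketch below computes an exact geometric mean over the url_length values of the three example documents. This is an assumption-laden illustration: Elasticsearch only promises an approximation computed from index statistics, so the real default may differ from what this function returns.

```python
import math


def geometric_mean(values):
    """Exact geometric mean of strictly positive numbers, via the mean of logs."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


url_lengths = [42, 47, 37]  # url_length values from the example documents
pivot = geometric_mean(url_lengths)
print(pivot)
```

Because the geometric mean lies between the smallest and largest value, a pivot chosen this way puts typical documents near a saturation score of 0.5.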

Logarithm

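The log function gives a score equal to log(scaling_factor + S), where S is the value of the rank feature field and scaling_factor is a configurable scaling factor. Scores are unbounded.

This function only supports rank features that have a positive score impact.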

resp = client.search(
    index="test",
    query={
        "rank_feature": {
            "field": "pagerank",
            "log": {
                "scaling_factor": 4
            }
        }
    },
)
print(resp)
const response = await client.search({
  index: "test",
  query: {
    rank_feature: {
      field: "pagerank",
      log: {
        scaling_factor: 4,
      },
    },
  },
});
console.log(response);
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
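
The log formula above can be sketched as follows. The text does not state the base of the logarithm; this sketch assumes the natural logarithm, and the function name log_score is chosen here for illustration.

```python
import math


def log_score(value: float, scaling_factor: float) -> float:
    """Log score: log(scaling_factor + S), assuming the natural logarithm."""
    if value <= 0 or scaling_factor <= 0:
        raise ValueError("this sketch assumes strictly positive inputs")
    return math.log(scaling_factor + value)


# pagerank of 50.3 with scaling_factor 4, as in the request above
print(log_score(50.3, 4))
# Unlike saturation, the score keeps growing as the feature value grows.
print(log_score(5000.0, 4) > log_score(50.3, 4))
```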

Sigmoid

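The sigmoid function is an extension of saturation that adds a configurable exponent. Scores are computed as S^exp / (S^exp + pivot^exp). As with the saturation function, pivot is the value of S that gives a score of 0.5, and scores are in (0, 1).

The exponent must be positive and is typically in [0.5, 1]. A good value should be computed via training. If you don't have the opportunity to do so, we recommend that you use the saturation function instead.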

resp = client.search(
    index="test",
    query={
        "rank_feature": {
            "field": "pagerank",
            "sigmoid": {
                "pivot": 7,
                "exponent": 0.6
            }
        }
    },
)
print(resp)
const response = await client.search({
  index: "test",
  query: {
    rank_feature: {
      field: "pagerank",
      sigmoid: {
        pivot: 7,
        exponent: 0.6,
      },
    },
  },
});
console.log(response);
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
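
The sigmoid formula above can be sketched as follows. This illustrates the arithmetic only, using a function name chosen here, and it assumes strictly positive inputs.

```python
def sigmoid(value: float, pivot: float, exponent: float) -> float:
    """Sigmoid score: S^exp / (S^exp + pivot^exp)."""
    if value <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("this sketch assumes strictly positive inputs")
    v = value ** exponent
    p = pivot ** exponent
    return v / (v + p)


# pagerank of 50.3 with pivot 7 and exponent 0.6, as in the request above
print(sigmoid(50.3, 7, 0.6))
# A value equal to the pivot scores 0.5 whatever the exponent.
print(sigmoid(7, 7, 0.6))
```

With an exponent of 1, the expression reduces to the saturation formula S / (S + pivot).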

Linear

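The linear function is the simplest function. It gives a score equal to the indexed value of S, where S is the value of the rank feature field. If a rank feature field is indexed with "positive_score_impact": true, its indexed value is equal to S, rounded to preserve only 9 significant bits of precision. If a rank feature field is indexed with "positive_score_impact": false, its indexed value is equal to 1/S, rounded to preserve only 9 significant bits of precision.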

resp = client.search(
    index="test",
    query={
        "rank_feature": {
            "field": "pagerank",
            "linear": {}
        }
    },
)
print(resp)
const response = await client.search({
  index: "test",
  query: {
    rank_feature: {
      field: "pagerank",
      linear: {},
    },
  },
});
console.log(response);
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "linear": {}
    }
  }
}
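
To give a feel for what 9 significant bits of precision means, the sketch below reduces a 32-bit float to its sign, exponent, and top 8 mantissa bits (the implicit leading bit makes 9). This is an assumption about the encoding, made for illustration: the text above only says the value is "rounded", and the sketch truncates instead, so real indexed values may differ in the last retained bit.

```python
import struct


def keep_9_significant_bits(value: float) -> float:
    """Truncate a float32 to sign, exponent, and the top 8 mantissa bits."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    truncated = (bits >> 15) << 15  # clear the 15 low-order mantissa bits
    return struct.unpack(">f", struct.pack(">I", truncated))[0]


def linear_score(value: float, positive_score_impact: bool = True) -> float:
    """Linear score: the reduced-precision S, or 1/S for negative score impact."""
    indexed = value if positive_score_impact else 1.0 / value
    return keep_9_significant_bits(indexed)


print(linear_score(50.3))  # 50.25 under the truncation assumption
print(linear_score(42, positive_score_impact=False))  # close to 1/42
```

Under this assumption the relative error stays below 2^-8, which is why scores for nearly identical feature values can coincide.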