Script score query
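Uses a script to provide a custom score for returned documents.
The script_score query is useful if, for example, a scoring function is expensive and you only need to calculate the score of a filtered set of documents.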
Example request
The following script_score query sets the score of each returned document to the value of the my-int field divided by 10.
response = client.search( body: { query: { script_score: { query: { match: { message: 'elasticsearch' } }, script: { source: "doc['my-int'].value / 10 " } } } } ) puts response
GET /_search { "query": { "script_score": { "query": { "match": { "message": "elasticsearch" } }, "script": { "source": "doc['my-int'].value / 10 " } } } }
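One detail worth checking in a script like this is the division itself. Painless follows Java's arithmetic rules, so if my-int is mapped as an integer type, doc['my-int'].value / 10 is an integer division and the fractional part is dropped. The Ruby sketch below only mirrors that arithmetic locally. The sample value 25 is an assumption for illustration, and nothing here talks to Elasticsearch.

```ruby
# Local illustration of the score arithmetic; 25 is a hypothetical my-int value.
my_int = 25

# Integer / integer drops the fraction. For non-negative values Ruby's result
# matches what Java (and therefore Painless) produces.
integer_score = my_int / 10

# Dividing by a float keeps the fraction, as "doc['my-int'].value / 10.0" would.
float_score = my_int / 10.0

puts "integer division: #{integer_score}"
puts "float division:   #{float_score}"
```

If fractional scores matter for ranking, dividing by 10.0 in the script is one way to keep them.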
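Top-level parameters for script_score
- query: (Required, query object) Query used to return documents.
- script: (Required, script object) Script used to compute the score of documents returned by the query. Final relevance scores from the script_score query cannot be negative. To support certain search optimizations, Lucene requires scores to be positive or 0.
- min_score: (Optional, float) Documents with a score lower than this floating point number are excluded from the search results.
- boost: (Optional, float) Scores produced by script are multiplied by boost to produce the final document scores. Defaults to 1.0.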
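To make the placement of these four parameters concrete, here is a minimal sketch that builds a request body as a Ruby hash and prints it as JSON. The field names come from the example request; the min_score and boost values are arbitrary placeholders, not recommendations.

```ruby
require 'json'

# Request body using all four top-level script_score parameters.
# min_score and boost sit next to query and script, not inside script.
body = {
  query: {
    script_score: {
      query: { match: { message: 'elasticsearch' } },
      script: { source: "doc['my-int'].value / 10" },
      min_score: 1.0, # placeholder threshold
      boost: 2.0      # placeholder multiplier
    }
  }
}

puts JSON.pretty_generate(body)
```

The resulting hash can be passed as the body argument of client.search, in the same way as the example request.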
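Notes
Predefined functions
You can use any of the available Painless functions in your script. You can also use the following predefined functions to customize scoring.
We suggest using these predefined functions instead of writing your own. These functions take advantage of efficiencies from Elasticsearch's internals.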
Saturation
saturation(value,k) = value/(k + value)
"script" : { "source" : "saturation(doc['my-int'].value, 1)" }
Sigmoid
sigmoid(value, k, a) = value^a/ (k^a + value^a)
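To get a feel for how the saturation and sigmoid formulas behave, you can evaluate them outside Elasticsearch. The Ruby functions below are a plain transcription of the two formulas as written in this section, not the Elasticsearch implementation, and the sample inputs are arbitrary.

```ruby
# saturation(value, k) = value / (k + value)
def saturation(value, k)
  value.to_f / (k + value)
end

# sigmoid(value, k, a) = value^a / (k^a + value^a)
def sigmoid(value, k, a)
  value.to_f**a / (k.to_f**a + value.to_f**a)
end

# Both functions return 0.5 when value equals k and approach 1 as value grows.
[1, 2, 10, 100].each do |value|
  puts "value=#{value} saturation=#{saturation(value, 2)} sigmoid=#{sigmoid(value, 2, 1)}"
end
```

With a = 1 the sigmoid formula reduces to the saturation formula; larger exponents make the transition around k steeper.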
"script" : { "source" : "sigmoid(doc['my-int'].value, 2, 1)" }
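Random score function
The randomScore function generates scores that are uniformly distributed from 0 up to but not including 1.
It has the following syntax: randomScore(<seed>, <fieldName>). The seed parameter is required and must be an integer. The fieldName parameter is optional and must be a string.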
"script" : { "source" : "randomScore(100, '_seq_no')" }
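If the fieldName parameter is omitted, the internal Lucene document IDs are used as the source of randomness. This is very efficient, but unfortunately not reproducible, because documents might be renumbered by merges.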
"script" : { "source" : "randomScore(100)" }
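Note that documents that are within the same shard and have the same value for the field get the same score, so it is usually desirable to use a field that has a unique value for every document in a shard. A good default choice might be the _seq_no field, whose only drawback is that scores change if the document is updated, because update operations also update the value of the _seq_no field.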
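The effect of the field choice can be illustrated with a stand-in function. The sketch below is explicitly not Elasticsearch's algorithm: random_score_sketch is a hypothetical helper that derives a value in [0, 1) from a seed and a field value, which is enough to show why two documents with the same field value end up with the same score.

```ruby
# NOT the Elasticsearch implementation: a hypothetical stand-in that maps
# (seed, field value) to a pseudo-random score in [0, 1).
def random_score_sketch(seed, field_value)
  Random.new(seed * 31 + field_value).rand
end

# Hypothetical documents and their values for the field used as randomness source.
field_values = { 'doc-a' => 7, 'doc-b' => 7, 'doc-c' => 8 }

scores = field_values.transform_values { |value| random_score_sketch(100, value) }
scores.each { |id, score| puts "#{id}: #{score}" }
```

Here doc-a and doc-b share a field value and therefore share a score, which is the situation a per-shard unique field such as _seq_no avoids.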
Decay functions for numeric fields
You can read more about decay functions in the function score query documentation.
- double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)
- double decayNumericExp(double origin, double scale, double offset, double decay, double docValue)
- double decayNumericGauss(double origin, double scale, double offset, double decay, double docValue)
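As a rough guide to what the three numeric decay functions compute, the sketch below transcribes the linear, exponential, and Gaussian decay formulas described for the function score query. It assumes the script functions follow those same formulas; the Ruby method names are illustrative and the sample values are arbitrary.

```ruby
# Distance from origin after subtracting the offset, never below zero.
def decay_distance(origin, offset, doc_value)
  [0.0, (doc_value - origin).abs - offset].max
end

# Linear: falls from 1 at the origin to `decay` at distance `scale`, then on to 0.
def decay_numeric_linear(origin, scale, offset, decay, doc_value)
  s = scale / (1.0 - decay)
  [(s - decay_distance(origin, offset, doc_value)) / s, 0.0].max
end

# Exponential: exp(lambda * distance) with lambda = ln(decay) / scale.
def decay_numeric_exp(origin, scale, offset, decay, doc_value)
  rate = Math.log(decay) / scale
  Math.exp(rate * decay_distance(origin, offset, doc_value))
end

# Gaussian: exp(-distance^2 / (2 * sigma^2)) with sigma^2 = -scale^2 / (2 * ln(decay)).
def decay_numeric_gauss(origin, scale, offset, decay, doc_value)
  sigma_sq = -(scale**2) / (2.0 * Math.log(decay))
  Math.exp(-(decay_distance(origin, offset, doc_value)**2) / (2.0 * sigma_sq))
end

# Sample parameters: origin 20, scale 10, offset 0, decay 0.5.
[20, 25, 30, 45].each do |value|
  puts "value=#{value} " \
       "linear=#{decay_numeric_linear(20, 10, 0, 0.5, value).round(4)} " \
       "exp=#{decay_numeric_exp(20, 10, 0, 0.5, value).round(4)} " \
       "gauss=#{decay_numeric_gauss(20, 10, 0, 0.5, value).round(4)}"
end
```

All three functions return 1 at the origin and the decay value at a distance of scale plus offset; they differ in how quickly the score falls off in between and beyond.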
Decay functions for geo fields
- double decayGeoLinear(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
- double decayGeoExp(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
- double decayGeoGauss(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
"script" : { "source" : "decayGeoExp(params.origin, params.scale, params.offset, params.decay, doc['location'].value)", "params": { "origin": "40, -70.12", "scale": "200km", "offset": "0km", "decay" : 0.2 } }
Decay functions for date fields
- double decayDateLinear(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
- double decayDateExp(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
- double decayDateGauss(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
"script" : { "source" : "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['date'].value)", "params": { "origin": "2008-01-01T01:00:00Z", "scale": "1h", "offset" : "0", "decay" : 0.5 } }
Decay functions on dates are limited to dates in the default format and default time zone. Calculations with now are also not supported.
Allow expensive queries
Script score queries are not executed if search.allow_expensive_queries is set to false.
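Faster alternatives
The script_score query calculates the score for every matching document, or hit. There are faster alternative query types that can efficiently skip non-competitive hits:
- If you want to boost documents on some static fields, use the rank_feature query.
- If you want to boost documents closer to a date or geographic point, use the distance_feature query.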
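Transition from the function score query
We recommend using the script_score query instead of the function_score query because the script_score query is simpler.
You can implement the following functions of the function_score query using the script_score query: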
script_score
What you used in script_score of the function score query can be copied into the script score query. No changes are needed here.
weight
The weight function can be implemented in the script score query through the following script:
"script" : { "source" : "params.weight * _score", "params": { "weight": 2 } }
field_value_factor
The field_value_factor function can be easily implemented through a script:
"script" : { "source" : "Math.log10(doc['field'].value * params.factor)", "params" : { "factor" : 5 } }
To check whether a document is missing a value, you can use doc['field'].size() == 0. For example, this script uses the value 1 if a document does not have the field field:
"script" : { "source" : "Math.log10((doc['field'].size() == 0 ? 1 : doc['field'].value) * params.factor)", "params" : { "factor" : 5 } }
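This table lists how field_value_factor modifiers can be implemented through a script:
Modifier | Implementation in script score |
---|---|
none | - |
log | Math.log10(doc['f'].value) |
log1p | Math.log10(doc['f'].value + 1) |
log2p | Math.log10(doc['f'].value + 2) |
ln | Math.log(doc['f'].value) |
ln1p | Math.log(doc['f'].value + 1) |
ln2p | Math.log(doc['f'].value + 2) |
square | Math.pow(doc['f'].value, 2) |
sqrt | Math.sqrt(doc['f'].value) |
reciprocal | 1.0 / doc['f'].value |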
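The pieces of a field_value_factor replacement (the factor, the fallback for a missing value, and the modifier) can be combined as in the following local sketch. It assumes the modifier is applied to the field value multiplied by the factor, mirroring the script shown earlier in this section. The method name, keyword arguments, and MODIFIERS constant are illustrative and are not part of any Elasticsearch API.

```ruby
# Ruby equivalents of the field_value_factor modifiers.
MODIFIERS = {
  'none'       => ->(v) { v },
  'log'        => ->(v) { Math.log10(v) },
  'log1p'      => ->(v) { Math.log10(v + 1) },
  'log2p'      => ->(v) { Math.log10(v + 2) },
  'ln'         => ->(v) { Math.log(v) },
  'ln1p'       => ->(v) { Math.log(v + 1) },
  'ln2p'       => ->(v) { Math.log(v + 2) },
  'square'     => ->(v) { v**2 },
  'sqrt'       => ->(v) { Math.sqrt(v) },
  'reciprocal' => ->(v) { 1.0 / v }
}.freeze

# field_value is nil when the document has no value for the field.
def field_value_factor(field_value, factor:, modifier: 'none', missing: 1)
  value = field_value.nil? ? missing : field_value
  MODIFIERS.fetch(modifier).call(value * factor)
end

puts field_value_factor(20, factor: 5, modifier: 'log')
puts field_value_factor(nil, factor: 5, modifier: 'log')
```

Note that log and ln are undefined for 0 and negative for inputs between 0 and 1, which would produce scores that the script_score query rejects; log1p or log2p are the usual way around that.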
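Functions for vector fields
During vector function calculation, all matched documents are linearly scanned, so expect the query time to grow linearly with the number of matched documents. For this reason, we recommend limiting the number of matched documents with the query parameter.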
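This is the list of available vector functions and vector access methods:
- cosineSimilarity: calculates cosine similarity
- dotProduct: calculates dot product
- l1norm: calculates L1 distance
- l2norm: calculates L2 distance
- doc[<field>].vectorValue: returns a vector's value as an array of floats
- doc[<field>].magnitude: returns a vector's magnitude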
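The recommended way to access dense vectors is through the cosineSimilarity, dotProduct, l1norm, or l2norm functions. Note, however, that you should call these functions only once per script. For example, do not use these functions in a loop to calculate the similarity between a document vector and multiple other vectors. If you need that functionality, reimplement these functions yourself by accessing vector values directly.
Let's create an index with a dense_vector mapping and index a couple of documents into it.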
response = client.indices.create( index: 'my-index-000001', body: { mappings: { properties: { my_dense_vector: { type: 'dense_vector', dims: 3 }, status: { type: 'keyword' } } } } ) puts response response = client.index( index: 'my-index-000001', id: 1, body: { my_dense_vector: [ 0.5, 10, 6 ], status: 'published' } ) puts response response = client.index( index: 'my-index-000001', id: 2, body: { my_dense_vector: [ -0.5, 10, 10 ], status: 'published' } ) puts response response = client.indices.refresh( index: 'my-index-000001' ) puts response
PUT my-index-000001 { "mappings": { "properties": { "my_dense_vector": { "type": "dense_vector", "dims": 3 }, "status" : { "type" : "keyword" } } } } PUT my-index-000001/_doc/1 { "my_dense_vector": [0.5, 10, 6], "status" : "published" } PUT my-index-000001/_doc/2 { "my_dense_vector": [-0.5, 10, 10], "status" : "published" } POST my-index-000001/_refresh
Cosine similarity
The cosineSimilarity function calculates the cosine similarity between a given query vector and document vectors.
response = client.search( index: 'my-index-000001', body: { query: { script_score: { query: { bool: { filter: { term: { status: 'published' } } } }, script: { source: "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", params: { query_vector: [ 4, 3.4, -0.2 ] } } } } } ) puts response
GET my-index-000001/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", "params": { "query_vector": [4, 3.4, -0.2] } } } } }
- To restrict the number of documents on which script score calculation is applied, provide a filter.
- The script adds 1.0 to the cosine similarity to prevent the score from being negative.
- To take advantage of the script optimizations, provide a query vector as a script parameter.
If a document’s dense vector field has a number of dimensions different from the query’s vector, an error will be thrown.
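For a sense of the scores this request produces, the cosine similarity can be worked out locally for the two indexed documents. The sketch below is a plain Ruby transcription of the textbook formula, not the Elasticsearch implementation. Elasticsearch stores dense vectors as 32-bit floats, so actual scores may differ slightly in the last digits.

```ruby
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def magnitude(v)
  Math.sqrt(dot(v, v))
end

# cos(q, d) = (q . d) / (|q| * |d|); rejects vectors of different length,
# mirroring the dimension check described above.
def cosine_similarity(query, doc)
  raise ArgumentError, 'dimension mismatch' unless query.length == doc.length

  dot(query, doc) / (magnitude(query) * magnitude(doc))
end

query_vector = [4, 3.4, -0.2]
documents = { 1 => [0.5, 10, 6], 2 => [-0.5, 10, 10] }

documents.each do |id, vector|
  # Adding 1.0 shifts the range from [-1, 1] to [0, 2], so the score is never negative.
  score = cosine_similarity(query_vector, vector) + 1.0
  puts "doc #{id}: #{score.round(4)}"
end
```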
Dot product
The dotProduct function calculates the dot product between a given query vector and document vectors.
response = client.search( index: 'my-index-000001', body: { query: { script_score: { query: { bool: { filter: { term: { status: 'published' } } } }, script: { source: "\n double value = dotProduct(params.query_vector, 'my_dense_vector');\n return sigmoid(1, Math.E, -value); \n ", params: { query_vector: [ 4, 3.4, -0.2 ] } } } } } ) puts response
GET my-index-000001/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": """ double value = dotProduct(params.query_vector, 'my_dense_vector'); return sigmoid(1, Math.E, -value); """, "params": { "query_vector": [4, 3.4, -0.2] } } } } }
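The call sigmoid(1, Math.E, -value) in the script above may look unusual. Plugging those arguments into the sigmoid formula from the predefined functions section gives 1 / (e^(-value) + 1), which is the standard logistic function, so the score always lies between 0 and 1 even when the dot product is negative. The following Ruby sketch checks that identity numerically; both functions are local transcriptions of the formulas, not Elasticsearch code.

```ruby
# sigmoid(value, k, a) = value^a / (k^a + value^a), as defined for script_score.
def sigmoid(value, k, a)
  value.to_f**a / (k.to_f**a + value.to_f**a)
end

# Standard logistic function.
def logistic(x)
  1.0 / (1.0 + Math.exp(-x))
end

[-5.0, -0.5, 0.0, 2.0, 34.8].each do |dot_product|
  via_sigmoid = sigmoid(1, Math::E, -dot_product)
  puts "dot=#{dot_product} sigmoid=#{via_sigmoid} logistic=#{logistic(dot_product)}"
end
```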
L1 distance (Manhattan distance)
The l1norm function calculates the L1 distance (Manhattan distance) between a given query vector and document vectors.
response = client.search( index: 'my-index-000001', body: { query: { script_score: { query: { bool: { filter: { term: { status: 'published' } } } }, script: { source: "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", params: { "queryVector": [ 4, 3.4, -0.2 ] } } } } } ) puts response
GET my-index-000001/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", "params": { "queryVector": [4, 3.4, -0.2] } } } } }
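Unlike cosineSimilarity, which measures similarity, l1norm and l2norm measure distance: the more similar the vectors, the lower the value. The scripts therefore invert the result so that closer vectors score higher, and add 1 to the denominator to avoid division by zero when a document vector matches the query exactly.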
L2 distance (Euclidean distance)
The l2norm function calculates the L2 distance (Euclidean distance) between a given query vector and document vectors.
response = client.search( index: 'my-index-000001', body: { query: { script_score: { query: { bool: { filter: { term: { status: 'published' } } } }, script: { source: "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))", params: { "queryVector": [ 4, 3.4, -0.2 ] } } } } } ) puts response
GET my-index-000001/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))", "params": { "queryVector": [4, 3.4, -0.2] } } } } }
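Both distance-based scripts use the same 1 / (1 + distance) shape, which turns a distance of 0 into the maximum score of 1 and larger distances into smaller scores. The Ruby sketch below reproduces that calculation locally for the two indexed documents. The l1norm and l2norm methods here are plain transcriptions of the Manhattan and Euclidean distance formulas, not the Elasticsearch functions.

```ruby
# Manhattan distance: sum of absolute component differences.
def l1norm(query, doc)
  query.zip(doc).sum { |q, d| (q - d).abs }
end

# Euclidean distance: square root of the sum of squared component differences.
def l2norm(query, doc)
  Math.sqrt(query.zip(doc).sum { |q, d| (q - d)**2 })
end

query_vector = [4, 3.4, -0.2]
documents = { 1 => [0.5, 10, 6], 2 => [-0.5, 10, 10] }

documents.each do |id, vector|
  l1_score = 1 / (1 + l1norm(query_vector, vector))
  l2_score = 1 / (1 + l2norm(query_vector, vector))
  puts "doc #{id}: l1 score=#{l1_score.round(4)} l2 score=#{l2_score.round(4)}"
end
```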
Checking for missing values
If a document doesn't have a value for a vector field on which a vector function is executed, an error will be thrown.
You can check whether a document is missing a value for the field my_vector with doc['my_vector'].size() == 0. Your overall script can look like this:
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
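Accessing vectors directly
You can access vector values directly through the following functions:
- doc[<field>].vectorValue: returns a vector's value as an array of floats
- doc[<field>].magnitude: returns a vector's magnitude as a float (for vectors created prior to version 7.5 the magnitude is not stored, so this function recalculates it every time it is called)
For example, the script below implements a cosine similarity using these two functions: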
response = client.search( index: 'my-index-000001', body: { query: { script_score: { query: { bool: { filter: { term: { status: 'published' } } } }, script: { source: "\n float[] v = doc['my_dense_vector'].vectorValue;\n float vm = doc['my_dense_vector'].magnitude;\n float dotProduct = 0;\n for (int i = 0; i < v.length; i++) {\n dotProduct += v[i] * params.queryVector[i];\n }\n return dotProduct / (vm * (float) params.queryVectorMag);\n ", params: { "queryVector": [ 4, 3.4, -0.2 ], "queryVectorMag": 5.25357 } } } } } ) puts response
GET my-index-000001/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": """ float[] v = doc['my_dense_vector'].vectorValue; float vm = doc['my_dense_vector'].magnitude; float dotProduct = 0; for (int i = 0; i < v.length; i++) { dotProduct += v[i] * params.queryVector[i]; } return dotProduct / (vm * (float) params.queryVectorMag); """, "params": { "queryVector": [4, 3.4, -0.2], "queryVectorMag": 5.25357 } } } } }
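The queryVectorMag parameter in this request is the magnitude (Euclidean length) of queryVector, computed on the client so that the script does not have to recompute it for every document. A minimal Ruby sketch of that precomputation, using the query vector from the example:

```ruby
query_vector = [4, 3.4, -0.2]

# Magnitude = square root of the sum of squared components.
query_vector_mag = Math.sqrt(query_vector.sum { |x| x * x })

# Parameters in the shape the script expects.
params = { 'queryVector' => query_vector, 'queryVectorMag' => query_vector_mag.round(5) }
puts params
```

Rounding to five decimal places reproduces the value used in the request; passing the unrounded value works equally well.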
Explain request
Using an explain request provides an explanation of how the parts of a score were computed. The script_score query can add its own explanation by setting the explanation parameter:
response = client.explain( index: 'my-index-000001', id: 0, body: { query: { script_score: { query: { match: { message: 'elasticsearch' } }, script: { source: "\n long count = doc['count'].value;\n double normalizedCount = count / 10;\n if (explanation != null) {\n explanation.set('normalized count = count / 10 = ' + count + ' / 10 = ' + normalizedCount);\n }\n return normalizedCount;\n " } } } } ) puts response
GET /my-index-000001/_explain/0 { "query": { "script_score": { "query": { "match": { "message": "elasticsearch" } }, "script": { "source": """ long count = doc['count'].value; double normalizedCount = count / 10; if (explanation != null) { explanation.set('normalized count = count / 10 = ' + count + ' / 10 = ' + normalizedCount); } return normalizedCount; """ } } } }
Note that explanation will be null when the script is used in a normal _search request, so having a conditional guard is best practice.