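Script score query

Uses a script to provide a custom score for returned documents.

The script_score query is useful if, for example, a scoring function is expensive and you only need to calculate the score of a filtered set of documents.

Example request

The following script_score query assigns each returned document a score equal to the my-int field value divided by 10.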

response = client.search(
  body: {
    query: {
      script_score: {
        query: {
          match: {
            message: 'elasticsearch'
          }
        },
        script: {
          source: "doc['my-int'].value / 10 "
        }
      }
    }
  }
)
puts response

GET /_search
{
  "query": {
    "script_score": {
      "query": {
        "match": { "message": "elasticsearch" }
      },
      "script": {
        "source": "doc['my-int'].value / 10 "
      }
    }
  }
}

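Top-level parameters for script_score

query
(Required, query object) Query used to return documents.
script
(Required, script object) Script used to compute the score of documents returned by the query.

Final relevance scores from the script_score query cannot be negative. To support certain search optimizations, Lucene requires scores to be positive or 0.

min_score
(Optional, float) Documents with a score lower than this floating point number are excluded from the search results.
boost
(Optional, float) Documents' scores produced by script are multiplied by boost to produce the final documents' scores. Defaults to 1.0.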
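As a rough illustration of how the optional parameters fit together, the sketch below builds a script_score request body that sets both min_score and boost. It only constructs and prints the JSON and does not contact a cluster. The values 1.5 and 2.0 are arbitrary assumptions; the message and my-int fields are taken from the example request.

```ruby
require 'json'

# Hypothetical threshold and multiplier, chosen only for illustration.
body = {
  query: {
    script_score: {
      query: { match: { message: 'elasticsearch' } },
      script: { source: "doc['my-int'].value / 10" },
      min_score: 1.5, # documents scoring below this value are excluded
      boost: 2.0      # the script's result is multiplied by this value
    }
  }
}

puts JSON.pretty_generate(body)
```

The resulting hash could be passed as the body argument of client.search, in the same way as the example request.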

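Notes

Use relevance scores in a script

Within a script, you can access the _score variable, which represents the current relevance score of a document.

Predefined functions

You can use any of the available Painless functions in your script. You can also use the following predefined functions to customize scoring.

We suggest using these predefined functions instead of writing your own. These functions take advantage of efficiencies in Elasticsearch's internals.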

Saturation

saturation(value,k) = value/(k + value)

"script" : {
    "source" : "saturation(doc['my-int'].value, 1)"
}
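
To get a feel for the shape of this function, the following sketch re-implements the formula above in plain Ruby. It is only a mirror of the documented formula, not the Elasticsearch implementation, and the sample values are arbitrary.

```ruby
# Plain-Ruby mirror of the documented formula: value / (k + value).
def saturation(value, k)
  value.to_f / (k + value)
end

# k acts as the pivot: when value == k the result is 0.5,
# and the result approaches 1 as value grows.
[0, 1, 3, 9].each do |v|
  puts "saturation(#{v}, 1) = #{saturation(v, 1)}"
end
```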

Sigmoid

sigmoid(value, k, a) = value^a/ (k^a + value^a)

"script" : {
    "source" : "sigmoid(doc['my-int'].value, 2, 1)"
}
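
The sketch below evaluates the sigmoid formula above in plain Ruby for a few arbitrary field values, using the same k = 2 and a = 1 as the script. It mirrors the documented formula only and is not the Elasticsearch implementation.

```ruby
# Plain-Ruby mirror of the documented formula: value^a / (k^a + value^a).
def sigmoid(value, k, a)
  value.to_f**a / (k.to_f**a + value.to_f**a)
end

# With k = 2 and a = 1, a field value equal to k lands on the midpoint 0.5.
[1, 2, 6].each do |v|
  puts "sigmoid(#{v}, 2, 1) = #{sigmoid(v, 2, 1)}"
end
```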
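
Random score function

The random_score function generates scores that are uniformly distributed from 0 up to, but not including, 1.

The randomScore function has the following syntax: randomScore(<seed>, <fieldName>). It has one required parameter, seed, an integer value, and one optional parameter, fieldName, a string value.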

"script" : {
    "source" : "randomScore(100, '_seq_no')"
}

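If the fieldName parameter is omitted, the internal Lucene document IDs are used as the source of randomness. This is very efficient, but unfortunately not reproducible, since documents might be renumbered by merges.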

"script" : {
    "source" : "randomScore(100)"
}

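Note that documents that are within the same shard and have the same value for the field get the same score, so it is usually desirable to use a field that has unique values for all documents across a shard. A good default choice might be the _seq_no field, whose only drawback is that scores change if the document is updated, since update operations also update the value of the _seq_no field.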

Decay functions for numeric fields

You can read more about decay functions in the function_score query documentation.

  • double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)
  • double decayNumericExp(double origin, double scale, double offset, double decay, double docValue)
  • double decayNumericGauss(double origin, double scale, double offset, double decay, double docValue)

"script" : {
    "source" : "decayNumericLinear(params.origin, params.scale, params.offset, params.decay, doc['dval'].value)",
    "params": { 
        "origin": 20,
        "scale": 10,
        "decay" : 0.5,
        "offset" : 0
    }
}

Using params allows the script to be compiled only once, even if the parameter values change.
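
The following plain-Ruby sketch shows how the three decay shapes behave for the parameters used in the script above. It assumes the script functions follow the formulas documented for the function_score decay functions (linear, exp, gauss); the helper names are hypothetical and this is not the Elasticsearch implementation.

```ruby
# Distance from origin, ignoring anything inside the offset.
def decay_distance(origin, offset, value)
  [0.0, (value - origin).abs - offset].max
end

# Linear: falls in a straight line and bottoms out at 0.
def decay_linear(origin, scale, offset, decay, value)
  s = scale / (1.0 - decay)
  [(s - decay_distance(origin, offset, value)) / s, 0.0].max
end

# Exponential: exp(rate * distance), with rate = ln(decay) / scale.
def decay_exp(origin, scale, offset, decay, value)
  rate = Math.log(decay) / scale
  Math.exp(rate * decay_distance(origin, offset, value))
end

# Gaussian: exp(-distance^2 / (2 * sigma^2)), with sigma^2 = -scale^2 / (2 * ln(decay)).
def decay_gauss(origin, scale, offset, decay, value)
  sigma_sq = -(scale**2) / (2.0 * Math.log(decay))
  Math.exp(-(decay_distance(origin, offset, value)**2) / (2.0 * sigma_sq))
end

# Same parameters as the script above: origin 20, scale 10, offset 0, decay 0.5.
[20, 25, 30, 45].each do |v|
  puts format('value=%-3d linear=%.3f exp=%.3f gauss=%.3f', v,
              decay_linear(20, 10, 0, 0.5, v),
              decay_exp(20, 10, 0, 0.5, v),
              decay_gauss(20, 10, 0, 0.5, v))
end
```

Under these formulas, a value exactly one scale away from the origin (30 here) yields the decay value 0.5 for all three shapes, which is how the scale and decay parameters are meant to be read.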

Decay functions for geo fields

  • double decayGeoLinear(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
  • double decayGeoExp(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
  • double decayGeoGauss(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)

"script" : {
    "source" : "decayGeoExp(params.origin, params.scale, params.offset, params.decay, doc['location'].value)",
    "params": {
        "origin": "40, -70.12",
        "scale": "200km",
        "offset": "0km",
        "decay" : 0.2
    }
}

Decay functions for date fields

  • double decayDateLinear(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
  • double decayDateExp(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
  • double decayDateGauss(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)

"script" : {
    "source" : "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['date'].value)",
    "params": {
        "origin": "2008-01-01T01:00:00Z",
        "scale": "1h",
        "offset" : "0",
        "decay" : 0.5
    }
}

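Decay functions on dates are limited to dates in the default format and default time zone. Also, calculations with now are not supported.

Functions for vector fields

Functions for vector fields are accessible through the script_score query.

Allow expensive queries

Script score queries will not be executed if search.allow_expensive_queries is set to false.

Faster alternatives

The script_score query calculates the score for every matching document, or hit. There are faster alternative query types that can efficiently skip non-competitive hits:

  • If you want to boost documents on some static fields, use the rank_feature query.
  • If you want to boost documents closer to a date or geographic point, use the distance_feature query.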
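
As a hedged sketch of what these alternatives could look like, the code below builds one request body for each query type and prints it as JSON. The field names pagerank and production_date are hypothetical: pagerank is assumed to be mapped as a rank_feature field and production_date as a date field. The pivot of 7d is an arbitrary choice.

```ruby
require 'json'

# Boost on a static signal stored in a (hypothetical) rank_feature field.
static_boost = {
  query: {
    bool: {
      must: { match: { message: 'elasticsearch' } },
      should: { rank_feature: { field: 'pagerank' } }
    }
  }
}

# Boost documents whose (hypothetical) date field is close to now.
recency_boost = {
  query: {
    bool: {
      must: { match: { message: 'elasticsearch' } },
      should: {
        distance_feature: { field: 'production_date', pivot: '7d', origin: 'now' }
      }
    }
  }
}

puts JSON.pretty_generate(static_boost)
puts JSON.pretty_generate(recency_boost)
```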

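Transition from function score query

We recommend using the script_score query instead of the function_score query because the script_score query is simpler.

You can implement the following functions of the function_score query using the script_score query:

script_score

What you used in the script_score of the function score query can be copied into the script score query. No changes are needed here.

weight

The weight function can be implemented in the script score query with the following script: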

"script" : {
    "source" : "params.weight * _score",
    "params": {
        "weight": 2
    }
}

random_score

Use the randomScore function as described in Random score function.

field_value_factor

The field_value_factor function can be easily implemented through a script:

"script" : {
    "source" : "Math.log10(doc['field'].value * params.factor)",
    "params" : {
        "factor" : 5
    }
}

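To check whether a document is missing a value, you can use doc['field'].size() == 0. For example, this script uses a value of 1 if a document doesn't have a value for the field named field:

"script" : {
    "source" : "Math.log10((doc['field'].size() == 0 ? 1 : doc['field'].value) * params.factor)",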
    "params" : {
        "factor" : 5
    }
}
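
The arithmetic of this script can be checked with a small plain-Ruby mirror. In this sketch a missing field is represented by nil, which is an assumption of the example rather than how Elasticsearch represents it; the function name is hypothetical.

```ruby
# Plain-Ruby mirror of the script: a missing value falls back to 1
# before the factor is applied and the base-10 logarithm is taken.
def field_value_factor_log(field_value, factor)
  value = field_value.nil? ? 1 : field_value
  Math.log10(value * factor)
end

puts field_value_factor_log(20, 5)  # log10(20 * 5)
puts field_value_factor_log(nil, 5) # log10(1 * 5), the fallback case
```

The fallback matters because a logarithm of 0 or of a missing value would not yield a usable score.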

This table lists how field_value_factor modifiers can be implemented through a script:

Modifier      Implementation in script score
none          -
log           Math.log10(doc['f'].value)
log1p         Math.log10(doc['f'].value + 1)
log2p         Math.log10(doc['f'].value + 2)
ln            Math.log(doc['f'].value)
ln1p          Math.log(doc['f'].value + 1)
ln2p          Math.log(doc['f'].value + 2)
square        Math.pow(doc['f'].value, 2)
sqrt          Math.sqrt(doc['f'].value)
reciprocal    1.0 / doc['f'].value
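
To compare how the modifiers transform the same input, the sketch below mirrors each entry of the modifier table as a Ruby lambda, with v standing in for doc['f'].value. The constant name MODIFIERS and the sample value 9 are assumptions made for illustration.

```ruby
# Each lambda mirrors one modifier; v stands in for doc['f'].value.
MODIFIERS = {
  'none'       => ->(v) { v.to_f },
  'log'        => ->(v) { Math.log10(v) },
  'log1p'      => ->(v) { Math.log10(v + 1) },
  'log2p'      => ->(v) { Math.log10(v + 2) },
  'ln'         => ->(v) { Math.log(v) },
  'ln1p'       => ->(v) { Math.log(v + 1) },
  'ln2p'       => ->(v) { Math.log(v + 2) },
  'square'     => ->(v) { v.to_f**2 },
  'sqrt'       => ->(v) { Math.sqrt(v) },
  'reciprocal' => ->(v) { 1.0 / v }
}.freeze

# Print every modifier applied to the same sample value.
MODIFIERS.each do |name, fn|
  puts format('%-10s %.4f', name, fn.call(9))
end
```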

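decay functions

The script_score query has equivalent decay functions that can be used in scripts.

Functions for vector fields

During vector function calculation, all matched documents are linearly scanned. Thus, expect the query time to grow linearly with the number of matched documents. For this reason, we recommend limiting the number of matched documents with the query parameter.

This is the list of available vector functions and vector access methods:

  1. cosineSimilarity – calculates cosine similarity
  2. dotProduct – calculates dot product
  3. l1norm – calculates L1 distance
  4. l2norm – calculates L2 distance
  5. doc[<field>].vectorValue – returns a vector's value as an array of floats
  6. doc[<field>].magnitude – returns a vector's magnitude

The recommended way to access dense vectors is through the cosineSimilarity, dotProduct, l1norm, or l2norm functions. Note, however, that you should call these functions only once per script. For example, don't use these functions in a loop to calculate the similarity between a document vector and multiple other vectors. If you need that functionality, reimplement these functions yourself by accessing vector values directly.

Let's create an index with a dense_vector mapping and index a couple of documents into it.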

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        my_dense_vector: {
          type: 'dense_vector',
          dims: 3
        },
        status: {
          type: 'keyword'
        }
      }
    }
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 1,
  body: {
    my_dense_vector: [
      0.5,
      10,
      6
    ],
    status: 'published'
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 2,
  body: {
    my_dense_vector: [
      -0.5,
      10,
      10
    ],
    status: 'published'
  }
)
puts response

response = client.indices.refresh(
  index: 'my-index-000001'
)
puts response

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "status" : "published"
}

PUT my-index-000001/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "status" : "published"
}

POST my-index-000001/_refresh

Cosine similarity

The cosineSimilarity function calculates the measure of cosine similarity between a given query vector and document vectors.

response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      script_score: {
        query: {
          bool: {
            filter: {
              term: {
                status: 'published'
              }
            }
          }
        },
        script: {
          source: "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
          params: {
            query_vector: [
              4,
              3.4,
              -0.2
            ]
          }
        }
      }
    }
  }
)
puts response

GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" 
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", 
        "params": {
          "query_vector": [4, 3.4, -0.2]  
        }
      }
    }
  }
}

To restrict the number of documents on which script score calculation is applied, provide a filter.

The script adds 1.0 to the cosine similarity to prevent the score from being negative.

To take advantage of the script optimizations, provide a query vector as a script parameter.

If a document’s dense vector field has a number of dimensions different from the query’s vector, an error will be thrown.
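
To make the scoring concrete, the sketch below recomputes cosine similarity plus 1.0 in plain Ruby for the two documents indexed earlier. The helper cosine_similarity is a hypothetical client-side re-implementation for illustration, not the function Elasticsearch runs, and it raises on mismatched dimensions to mirror the error described above.

```ruby
# Cosine similarity: dot(a, b) / (|a| * |b|).
def cosine_similarity(a, b)
  raise ArgumentError, 'dimension mismatch' unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }.to_f
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

query_vector = [4, 3.4, -0.2]
docs = { 1 => [0.5, 10, 6], 2 => [-0.5, 10, 10] }

# Adding 1.0 shifts the range from [-1, 1] to [0, 2], so scores are never negative.
scores = docs.transform_values { |v| cosine_similarity(query_vector, v) + 1.0 }
scores.each { |id, score| puts format('doc %d: %.4f', id, score) }
```

Under this calculation, document 1 ranks above document 2 because its vector points in a direction closer to the query vector.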

Dot product

The dotProduct function calculates the measure of dot product between a given query vector and document vectors.

response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      script_score: {
        query: {
          bool: {
            filter: {
              term: {
                status: 'published'
              }
            }
          }
        },
        script: {
          source: "\n          double value = dotProduct(params.query_vector, 'my_dense_vector');\n          return sigmoid(1, Math.E, -value); \n        ",
          params: {
            query_vector: [
              4,
              3.4,
              -0.2
            ]
          }
        }
      }
    }
  }
)
puts response

GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, 'my_dense_vector');
          return sigmoid(1, Math.E, -value); 
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}

Using the standard sigmoid function prevents scores from being negative.
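
Why sigmoid(1, Math.E, -value) keeps scores non-negative can be seen by substituting into the sigmoid formula given earlier: 1^(-value) / (e^(-value) + 1^(-value)) = 1 / (1 + e^(-value)), which is the standard logistic function. The plain-Ruby sketch below checks this numerically; both helpers are illustrative re-implementations, and the sample inputs are arbitrary.

```ruby
# Predefined sigmoid, as documented: value^a / (k^a + value^a).
def sigmoid(value, k, a)
  value.to_f**a / (k.to_f**a + value.to_f**a)
end

# Standard logistic function, for comparison.
def logistic(x)
  1.0 / (1.0 + Math.exp(-x))
end

# Both columns should agree and stay between 0 and 1,
# even when the dot product x is negative.
[-30.0, -1.0, 0.0, 34.8].each do |x|
  puts format('x=%6.1f sigmoid=%.6f logistic=%.6f', x, sigmoid(1, Math::E, -x), logistic(x))
end
```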

L1 distance (Manhattan distance)

The l1norm function calculates L1 distance (Manhattan distance) between a given query vector and document vectors.

response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      script_score: {
        query: {
          bool: {
            filter: {
              term: {
                status: 'published'
              }
            }
          }
        },
        script: {
          source: "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))",
          params: {
            "queryVector": [
              4,
              3.4,
              -0.2
            ]
          }
        }
      }
    }
  }
)
puts response

GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", 
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}

Unlike cosineSimilarity, which represents similarity, l1norm and l2norm (shown below) represent distances or differences. This means that the more similar the vectors are, the lower the values produced by the l1norm and l2norm functions. Because more similar vectors should score higher, the script inverts the output of l1norm and l2norm. Also, to avoid division by 0 when a document vector matches the query exactly, the script adds 1 to the denominator.
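
The inversion can be illustrated with a plain-Ruby sketch that computes both distances for the two documents indexed earlier and converts them to scores with 1 / (1 + distance). The Ruby methods l1norm and l2norm here are hypothetical client-side re-implementations that only share a name with the script functions.

```ruby
# L1 (Manhattan) distance: sum of absolute component differences.
def l1norm(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

# L2 (Euclidean) distance: square root of the sum of squared differences.
def l2norm(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

query_vector = [4, 3.4, -0.2]
docs = { 1 => [0.5, 10, 6], 2 => [-0.5, 10, 10] }

# A smaller distance gives a larger score; identical vectors score 1.0.
docs.each do |id, v|
  l1 = l1norm(query_vector, v)
  l2 = l2norm(query_vector, v)
  puts format('doc %d: l1=%.2f score=%.4f | l2=%.3f score=%.4f',
              id, l1, 1 / (1 + l1), l2, 1 / (1 + l2))
end
```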

L2 distance (Euclidean distance)

The l2norm function calculates L2 distance (Euclidean distance) between a given query vector and document vectors.

response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      script_score: {
        query: {
          bool: {
            filter: {
              term: {
                status: 'published'
              }
            }
          }
        },
        script: {
          source: "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
          params: {
            "queryVector": [
              4,
              3.4,
              -0.2
            ]
          }
        }
      }
    }
  }
)
puts response

GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}

Checking for missing values

If a document doesn’t have a value for a vector field on which a vector function is executed, an error will be thrown.

You can check whether a document is missing a value for the field my_vector with doc['my_vector'].size() == 0. Your overall script can look like this:

"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
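
Accessing vectors directly

You can access vector values directly through the following functions:

  • doc[<field>].vectorValue – returns a vector's value as an array of floats
  • doc[<field>].magnitude – returns a vector's magnitude (for vectors created prior to version 7.5, the magnitude is not stored, so this function recalculates it every time it is called).

For example, the script below implements cosine similarity using these two functions: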

response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      script_score: {
        query: {
          bool: {
            filter: {
              term: {
                status: 'published'
              }
            }
          }
        },
        script: {
          source: "\n          float[] v = doc['my_dense_vector'].vectorValue;\n          float vm = doc['my_dense_vector'].magnitude;\n          float dotProduct = 0;\n          for (int i = 0; i < v.length; i++) {\n            dotProduct += v[i] * params.queryVector[i];\n          }\n          return dotProduct / (vm * (float) params.queryVectorMag);\n        ",
          params: {
            "queryVector": [
              4,
              3.4,
              -0.2
            ],
            "queryVectorMag": 5.25357
          }
        }
      }
    }
  }
)
puts response

GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          float[] v = doc['my_dense_vector'].vectorValue;
          float vm = doc['my_dense_vector'].magnitude;
          float dotProduct = 0;
          for (int i = 0; i < v.length; i++) {
            dotProduct += v[i] * params.queryVector[i];
          }
          return dotProduct / (vm * (float) params.queryVectorMag);
        """,
        "params": {
          "queryVector": [4, 3.4, -0.2],
          "queryVectorMag": 5.25357
        }
      }
    }
  }
}
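
The queryVectorMag parameter in the request above appears to be the Euclidean norm of queryVector, precomputed on the client so the script does not have to derive it for every document. As a minimal sketch under that assumption, it could be calculated like this before building the params:

```ruby
# Euclidean norm of the query vector, computed once on the client side.
query_vector = [4, 3.4, -0.2]
query_vector_mag = Math.sqrt(query_vector.sum { |x| x * x })

# Rounded to five decimals to match the precision used in the request.
params = { queryVector: query_vector, queryVectorMag: query_vector_mag.round(5) }
puts params
```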

Explain request

Using an explain request provides an explanation of how the parts of a score were computed. The script_score query can add its own explanation by setting the explanation parameter:

response = client.explain(
  index: 'my-index-000001',
  id: 0,
  body: {
    query: {
      script_score: {
        query: {
          match: {
            message: 'elasticsearch'
          }
        },
        script: {
          source: "\n          long count = doc['count'].value;\n          double normalizedCount = count / 10;\n          if (explanation != null) {\n            explanation.set('normalized count = count / 10 = ' + count + ' / 10 = ' + normalizedCount);\n          }\n          return normalizedCount;\n        "
        }
      }
    }
  }
)
puts response

GET /my-index-000001/_explain/0
{
  "query": {
    "script_score": {
      "query": {
        "match": { "message": "elasticsearch" }
      },
      "script": {
        "source": """
          long count = doc['count'].value;
          double normalizedCount = count / 10;
          if (explanation != null) {
            explanation.set('normalized count = count / 10 = ' + count + ' / 10 = ' + normalizedCount);
          }
          return normalizedCount;
        """
      }
    }
  }
}

Note that explanation will be null when the script is used in a normal _search request, so guarding it with a conditional is best practice.