Inference bucket aggregation
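A parent pipeline aggregation that loads a pre-trained model and performs inference on the collated result fields of the parent bucket aggregation.
To use the inference bucket aggregation, you need the same security privileges that are required to use the get trained models API.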
Syntax
A standalone inference aggregation looks like this:
{
  "inference": {
    "model_id": "a_model_for_inference",
    "inference_config": {
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg",
      "max_cost": "max_agg"
    }
  }
}
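Table 63. inference parameters
Parameter name | Description | Required | Default value |
---|---|---|---|
model_id | The ID or alias of the trained model. | Required | - |
inference_config | Contains the inference type and its options. There are two types: regression and classification. | Optional | - |
buckets_path | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. For more details, see the buckets_path syntax. | Required | - |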
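To make the buckets_path mapping concrete, the following Python sketch shows how the results in one parent bucket could be turned into the flat set of fields a model receives. It is an illustration only, not how Elasticsearch implements the aggregation. The helper name resolve_buckets_path and the sample bucket are hypothetical, and the sketch assumes the common response shapes in which a single-value metric aggregation exposes value and a bucket aggregation exposes doc_count (addressed as _count in a path). Nested paths are not handled.

```python
def resolve_buckets_path(bucket, buckets_path):
    """Map aggregation results in one bucket to the field names the model expects.

    Hypothetical helper for illustration; handles only "agg" and "agg.metric" paths.
    """
    features = {}
    for model_field, path in buckets_path.items():
        agg_name, _, metric = path.partition(".")
        agg_result = bucket[agg_name]
        if metric == "_count":
            # "_count" refers to the document count of a bucket aggregation.
            features[model_field] = agg_result["doc_count"]
        elif metric:
            # Any other suffix selects a named metric of the aggregation.
            features[model_field] = agg_result[metric]
        else:
            # No suffix: assume a single-value metric aggregation.
            features[model_field] = agg_result["value"]
    return features


# Sample bucket with made-up numbers, shaped like one bucket of a search response.
bucket = {
    "avg_agg": {"value": 12.5},
    "max_agg": {"value": 40.0},
    "success": {"doc_count": 7},
}
buckets_path = {
    "avg_cost": "avg_agg",
    "max_cost": "max_agg",
    "success": "success._count",
}

features = resolve_buckets_path(bucket, buckets_path)
print(features)
```

The keys of buckets_path become the model's input field names, and the values pick out which sibling aggregation result feeds each field.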
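Configuration options for inference models
The inference_config setting is optional and usually not needed, because pre-trained models come with sensible defaults. In the context of aggregations, some options can be overridden for each of the two model types.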
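Configuration options for regression models
num_top_feature_importance_values
    (Optional, integer) Specifies the maximum number of feature importance values per document. Defaults to 0, which means no feature importance calculation occurs.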
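Configuration options for classification models
num_top_classes
    (Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
num_top_feature_importance_values
    (Optional, integer) Specifies the maximum number of feature importance values per document. Defaults to 0, which means no feature importance calculation occurs.
prediction_field_type
    (Optional, string) Specifies the type of the predicted field to write. Valid values are: string, number, boolean. When boolean is provided, 1.0 is converted to true and 0.0 is converted to false.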
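As a rough illustration of the classification options above, the Python sketch below assembles an options dictionary and mimics the documented boolean conversion. Both function names are hypothetical and nothing here is part of a client library. The returned dictionary is meant to be placed under the classification entry of inference_config. How values other than 1.0 and 0.0 are treated is not stated in the text, so the sketch simply rejects them.

```python
VALID_PREDICTION_FIELD_TYPES = {"string", "number", "boolean"}


def classification_options(num_top_classes=0,
                           num_top_feature_importance_values=0,
                           prediction_field_type=None):
    """Build the classification options, using the documented defaults of 0."""
    if (prediction_field_type is not None
            and prediction_field_type not in VALID_PREDICTION_FIELD_TYPES):
        raise ValueError(f"invalid prediction_field_type: {prediction_field_type!r}")
    options = {
        "num_top_classes": num_top_classes,
        "num_top_feature_importance_values": num_top_feature_importance_values,
    }
    # Only send prediction_field_type when the caller overrides it.
    if prediction_field_type is not None:
        options["prediction_field_type"] = prediction_field_type
    return options


def as_boolean_prediction(value):
    """Mimic the documented conversion: 1.0 becomes true, 0.0 becomes false."""
    if value == 1.0:
        return True
    if value == 0.0:
        return False
    # Assumption: other values are rejected; the text does not cover them.
    raise ValueError(f"cannot convert {value!r} to boolean")


options = classification_options(num_top_classes=2, prediction_field_type="boolean")
print(options)
print(as_boolean_prediction(1.0), as_boolean_prediction(0.0))
```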
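Example
The following snippet aggregates a web log by client_ip and extracts a number of features through metric and bucket sub-aggregations. These features are the input to the inference aggregation, which is configured with a model trained to identify suspicious client IPs: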
# Python client. Assumes `client` is an already configured Elasticsearch client instance.
resp = client.search(
    index="kibana_sample_data_logs",
    size=0,
    aggs={
        "client_ip": {
            # One bucket per distinct client IP.
            "composite": {
                "sources": [
                    {"client_ip": {"terms": {"field": "clientip"}}}
                ]
            },
            "aggs": {
                # Per-IP features computed by metric and bucket sub-aggregations.
                "url_dc": {"cardinality": {"field": "url.keyword"}},
                "bytes_sum": {"sum": {"field": "bytes"}},
                "geo_src_dc": {"cardinality": {"field": "geo.src"}},
                "geo_dest_dc": {"cardinality": {"field": "geo.dest"}},
                "responses_total": {"value_count": {"field": "timestamp"}},
                "success": {"filter": {"term": {"response": "200"}}},
                "error404": {"filter": {"term": {"response": "404"}}},
                "error503": {"filter": {"term": {"response": "503"}}},
                # Feed the features above to the trained model.
                "malicious_client_ip": {
                    "inference": {
                        "model_id": "malicious_clients_model",
                        "buckets_path": {
                            "response_count": "responses_total",
                            "url_dc": "url_dc",
                            "bytes_sum": "bytes_sum",
                            "geo_src_dc": "geo_src_dc",
                            "geo_dest_dc": "geo_dest_dc",
                            "success": "success._count",
                            "error404": "error404._count",
                            "error503": "error503._count"
                        }
                    }
                }
            }
        }
    },
)
print(resp)
# Ruby client. Assumes `client` is an already configured Elasticsearch client instance.
response = client.search(
  index: 'kibana_sample_data_logs',
  body: {
    size: 0,
    aggregations: {
      client_ip: {
        composite: {
          sources: [
            { client_ip: { terms: { field: 'clientip' } } }
          ]
        },
        aggregations: {
          url_dc: { cardinality: { field: 'url.keyword' } },
          bytes_sum: { sum: { field: 'bytes' } },
          geo_src_dc: { cardinality: { field: 'geo.src' } },
          geo_dest_dc: { cardinality: { field: 'geo.dest' } },
          responses_total: { value_count: { field: 'timestamp' } },
          success: { filter: { term: { response: '200' } } },
          error404: { filter: { term: { response: '404' } } },
          error503: { filter: { term: { response: '503' } } },
          malicious_client_ip: {
            inference: {
              model_id: 'malicious_clients_model',
              buckets_path: {
                response_count: 'responses_total',
                url_dc: 'url_dc',
                bytes_sum: 'bytes_sum',
                geo_src_dc: 'geo_src_dc',
                geo_dest_dc: 'geo_dest_dc',
                success: 'success._count',
                error404: 'error404._count',
                error503: 'error503._count'
              }
            }
          }
        }
      }
    }
  }
)
puts response
// JavaScript client. Assumes `client` is an already configured Elasticsearch client instance.
const response = await client.search({
  index: "kibana_sample_data_logs",
  size: 0,
  aggs: {
    client_ip: {
      composite: {
        sources: [
          { client_ip: { terms: { field: "clientip" } } },
        ],
      },
      aggs: {
        url_dc: { cardinality: { field: "url.keyword" } },
        bytes_sum: { sum: { field: "bytes" } },
        geo_src_dc: { cardinality: { field: "geo.src" } },
        geo_dest_dc: { cardinality: { field: "geo.dest" } },
        responses_total: { value_count: { field: "timestamp" } },
        success: { filter: { term: { response: "200" } } },
        error404: { filter: { term: { response: "404" } } },
        error503: { filter: { term: { response: "503" } } },
        malicious_client_ip: {
          inference: {
            model_id: "malicious_clients_model",
            buckets_path: {
              response_count: "responses_total",
              url_dc: "url_dc",
              bytes_sum: "bytes_sum",
              geo_src_dc: "geo_src_dc",
              geo_dest_dc: "geo_dest_dc",
              success: "success._count",
              error404: "error404._count",
              error503: "error503._count",
            },
          },
        },
      },
    },
  },
});
console.log(response);
# Console request
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": {
      "composite": {
        "sources": [
          { "client_ip": { "terms": { "field": "clientip" } } }
        ]
      },
      "aggs": {
        "url_dc": { "cardinality": { "field": "url.keyword" } },
        "bytes_sum": { "sum": { "field": "bytes" } },
        "geo_src_dc": { "cardinality": { "field": "geo.src" } },
        "geo_dest_dc": { "cardinality": { "field": "geo.dest" } },
        "responses_total": { "value_count": { "field": "timestamp" } },
        "success": { "filter": { "term": { "response": "200" } } },
        "error404": { "filter": { "term": { "response": "404" } } },
        "error503": { "filter": { "term": { "response": "503" } } },
        "malicious_client_ip": {
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
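In the request above, each buckets_path entry names a sub-aggregation defined next to the inference aggregation, so the two have to stay in sync. One possible way to keep them aligned is to generate both from a single definition. The Python sketch below does this for the three response-code filters only. The helper name response_code_features is hypothetical and is not part of any client library.

```python
def response_code_features(codes):
    """Return (sub_aggs, buckets_path) for per-response-code document counts.

    `codes` maps a feature name to the response code it counts.
    Hypothetical helper for illustration.
    """
    sub_aggs = {}
    buckets_path = {}
    for name, code in codes.items():
        # A filter aggregation yields one bucket whose doc count is the feature.
        sub_aggs[name] = {"filter": {"term": {"response": code}}}
        # "_count" selects that bucket's document count as the model input.
        buckets_path[name] = f"{name}._count"
    return sub_aggs, buckets_path


sub_aggs, paths = response_code_features(
    {"success": "200", "error404": "404", "error503": "503"}
)
print(sub_aggs["error404"])
print(paths)
```

The two dictionaries can then be merged into the sub-aggregations and the buckets_path of the request, alongside the metric aggregations that are written by hand.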