Inference bucket aggregation


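A parent pipeline aggregation that loads a pre-trained model and performs inference on the collated result fields from the parent bucket aggregation.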

To use the inference bucket aggregation, you need the same security privileges that are required for using the get trained models API.

Syntax


An inference aggregation looks like this in isolation:

{
  "inference": {
    "model_id": "a_model_for_inference", 
    "inference_config": { 
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", 
      "max_cost": "max_agg"
    }
  }
}

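model_id: the unique identifier or alias of the trained model.

inference_config: an optional inference configuration that overrides the model's default settings.

buckets_path: maps the value of avg_agg to the model's input field avg_cost.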

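Table 63. inference parameters

| Parameter name | Description | Required | Default value |
| --- | --- | --- | --- |
| model_id | The ID or alias of the trained model. | Required | - |
| inference_config | Contains the inference type and its options. There are two types: regression and classification. | Optional | - |
| buckets_path | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See the buckets_path syntax for more details. | Required | - |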
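
The required and optional parameters above can be illustrated with a small builder. This is a minimal sketch, not part of any client library: `build_inference_agg` is a hypothetical helper name, and it only assembles the request body as a plain dictionary, reusing the values from the syntax example.

```python
from typing import Any, Dict, Optional


def build_inference_agg(
    model_id: str,
    buckets_path: Dict[str, str],
    inference_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the body of an `inference` aggregation (hypothetical helper)."""
    # model_id and buckets_path are the two required parameters.
    if not model_id:
        raise ValueError("model_id is required")
    if not buckets_path:
        raise ValueError("buckets_path is required")

    body: Dict[str, Any] = {
        "model_id": model_id,
        # Keys are the model's input fields, values are aggregation paths.
        "buckets_path": dict(buckets_path),
    }
    # inference_config is optional; omit it to keep the model's defaults.
    if inference_config is not None:
        body["inference_config"] = inference_config
    return {"inference": body}


# Same values as the syntax example.
agg = build_inference_agg(
    "a_model_for_inference",
    {"avg_cost": "avg_agg", "max_cost": "max_agg"},
    {"regression_config": {"num_top_feature_importance_values": 2}},
)
```

Leaving out `inference_config` produces a body with only `model_id` and `buckets_path`, which matches the case where the model's defaults are sufficient.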

Configuration options for inference models


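The inference_config setting is optional and usually not required, because pre-trained models come with sensible defaults. In the context of aggregations, some options can be overridden for each of the two model types.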

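Configuration options for regression models
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. Defaults to 0, in which case no feature importance calculation occurs.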
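Configuration options for classification models
num_top_classes
(Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. Defaults to 0, in which case no feature importance calculation occurs.
prediction_field_type
(Optional, string) Specifies the type of the predicted field to write. Valid values are: string, number, boolean. When boolean is provided, 1.0 is transformed to true and 0.0 to false.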
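
As an illustration of the classification options listed above, the following sketch validates them before they are placed in a request. The helper name `classification_options` is hypothetical, and the function returns only the inner options dictionary; the key that wraps these options inside inference_config for classification models is not shown in this section, so it is deliberately left out.

```python
from typing import Any, Dict, Optional

# Valid values for prediction_field_type, as listed above.
VALID_PREDICTION_FIELD_TYPES = {"string", "number", "boolean"}


def classification_options(
    num_top_classes: int = 0,
    num_top_feature_importance_values: int = 0,
    prediction_field_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate and collect classification options (hypothetical helper)."""
    for name, value in (
        ("num_top_classes", num_top_classes),
        ("num_top_feature_importance_values", num_top_feature_importance_values),
    ):
        # Assumption: negative counts are treated as invalid in this sketch.
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")

    options: Dict[str, Any] = {
        "num_top_classes": num_top_classes,
        "num_top_feature_importance_values": num_top_feature_importance_values,
    }
    if prediction_field_type is not None:
        if prediction_field_type not in VALID_PREDICTION_FIELD_TYPES:
            raise ValueError(
                "prediction_field_type must be one of "
                + ", ".join(sorted(VALID_PREDICTION_FIELD_TYPES))
            )
        options["prediction_field_type"] = prediction_field_type
    return options


options = classification_options(num_top_classes=2, prediction_field_type="boolean")
```

Checking the values on the client side like this is a design choice for earlier feedback only; the server remains the authority on which values are accepted.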

Example


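The following snippet aggregates a web log by client_ip and extracts a number of features through metric and bucket sub-aggregations. These features are the input to the inference aggregation, which is configured with a model trained to identify suspicious client IPs: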

resp = client.search(
    index="kibana_sample_data_logs",
    size=0,
    aggs={
        "client_ip": {
            "composite": {
                "sources": [
                    {
                        "client_ip": {
                            "terms": {
                                "field": "clientip"
                            }
                        }
                    }
                ]
            },
            "aggs": {
                "url_dc": {
                    "cardinality": {
                        "field": "url.keyword"
                    }
                },
                "bytes_sum": {
                    "sum": {
                        "field": "bytes"
                    }
                },
                "geo_src_dc": {
                    "cardinality": {
                        "field": "geo.src"
                    }
                },
                "geo_dest_dc": {
                    "cardinality": {
                        "field": "geo.dest"
                    }
                },
                "responses_total": {
                    "value_count": {
                        "field": "timestamp"
                    }
                },
                "success": {
                    "filter": {
                        "term": {
                            "response": "200"
                        }
                    }
                },
                "error404": {
                    "filter": {
                        "term": {
                            "response": "404"
                        }
                    }
                },
                "error503": {
                    "filter": {
                        "term": {
                            "response": "503"
                        }
                    }
                },
                "malicious_client_ip": {
                    "inference": {
                        "model_id": "malicious_clients_model",
                        "buckets_path": {
                            "response_count": "responses_total",
                            "url_dc": "url_dc",
                            "bytes_sum": "bytes_sum",
                            "geo_src_dc": "geo_src_dc",
                            "geo_dest_dc": "geo_dest_dc",
                            "success": "success._count",
                            "error404": "error404._count",
                            "error503": "error503._count"
                        }
                    }
                }
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'kibana_sample_data_logs',
  body: {
    size: 0,
    aggregations: {
      client_ip: {
        composite: {
          sources: [
            {
              client_ip: {
                terms: {
                  field: 'clientip'
                }
              }
            }
          ]
        },
        aggregations: {
          url_dc: {
            cardinality: {
              field: 'url.keyword'
            }
          },
          bytes_sum: {
            sum: {
              field: 'bytes'
            }
          },
          geo_src_dc: {
            cardinality: {
              field: 'geo.src'
            }
          },
          geo_dest_dc: {
            cardinality: {
              field: 'geo.dest'
            }
          },
          responses_total: {
            value_count: {
              field: 'timestamp'
            }
          },
          success: {
            filter: {
              term: {
                response: '200'
              }
            }
          },
          "error404": {
            filter: {
              term: {
                response: '404'
              }
            }
          },
          "error503": {
            filter: {
              term: {
                response: '503'
              }
            }
          },
          malicious_client_ip: {
            inference: {
              model_id: 'malicious_clients_model',
              buckets_path: {
                response_count: 'responses_total',
                url_dc: 'url_dc',
                bytes_sum: 'bytes_sum',
                geo_src_dc: 'geo_src_dc',
                geo_dest_dc: 'geo_dest_dc',
                success: 'success._count',
                "error404": 'error404._count',
                "error503": 'error503._count'
              }
            }
          }
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "kibana_sample_data_logs",
  size: 0,
  aggs: {
    client_ip: {
      composite: {
        sources: [
          {
            client_ip: {
              terms: {
                field: "clientip",
              },
            },
          },
        ],
      },
      aggs: {
        url_dc: {
          cardinality: {
            field: "url.keyword",
          },
        },
        bytes_sum: {
          sum: {
            field: "bytes",
          },
        },
        geo_src_dc: {
          cardinality: {
            field: "geo.src",
          },
        },
        geo_dest_dc: {
          cardinality: {
            field: "geo.dest",
          },
        },
        responses_total: {
          value_count: {
            field: "timestamp",
          },
        },
        success: {
          filter: {
            term: {
              response: "200",
            },
          },
        },
        error404: {
          filter: {
            term: {
              response: "404",
            },
          },
        },
        error503: {
          filter: {
            term: {
              response: "503",
            },
          },
        },
        malicious_client_ip: {
          inference: {
            model_id: "malicious_clients_model",
            buckets_path: {
              response_count: "responses_total",
              url_dc: "url_dc",
              bytes_sum: "bytes_sum",
              geo_src_dc: "geo_src_dc",
              geo_dest_dc: "geo_dest_dc",
              success: "success._count",
              error404: "error404._count",
              error503: "error503._count",
            },
          },
        },
      },
    },
  },
});
console.log(response);

GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { 
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { 
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { 
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}

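client_ip: a composite bucket aggregation that aggregates the data by client_ip.

aggs: a series of metric and bucket sub-aggregations.

malicious_client_ip: the inference bucket aggregation, which specifies the trained model and maps the aggregation names to the model's input fields.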
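
In the example, metric aggregations are referenced in buckets_path by name, while the filter aggregations are referenced through their document count with the `._count` suffix. The sketch below shows one way to assemble that mapping; `buckets_path_for` is a hypothetical helper, not a client API, and the field and aggregation names are taken from the example request.

```python
from typing import Dict


def buckets_path_for(
    metric_aggs: Dict[str, str],
    filter_aggs: Dict[str, str],
) -> Dict[str, str]:
    """Map model input fields to aggregation paths (hypothetical helper).

    Both arguments map a model input field name to a sibling aggregation name.
    """
    # Metric aggregations are referenced directly by name.
    path = dict(metric_aggs)
    # Filter aggregations are single-bucket, so the example reads their
    # document count through the `._count` path suffix.
    for field, agg_name in filter_aggs.items():
        path[field] = f"{agg_name}._count"
    return path


buckets_path = buckets_path_for(
    metric_aggs={
        "response_count": "responses_total",
        "url_dc": "url_dc",
        "bytes_sum": "bytes_sum",
        "geo_src_dc": "geo_src_dc",
        "geo_dest_dc": "geo_dest_dc",
    },
    filter_aggs={
        "success": "success",
        "error404": "error404",
        "error503": "error503",
    },
)
```

The resulting dictionary has the same content as the buckets_path of the malicious_client_ip aggregation in the request above. Note that response_count is the only entry whose model field name differs from its aggregation name.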