Evaluate data frame analytics API

Evaluates the data frame analytics for an annotated index.

Request

POST _ml/data_frame/_evaluate

Prerequisites

Requires the following privileges:

cluster: monitor_ml (the machine_learning_user built-in role grants this privilege)
destination index: read

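Description

The API packages together commonly used evaluation metrics for various types of machine learning features. It is designed to be used on indices created by data frame analytics. Evaluation requires both a ground truth field and an analytics result field to be present.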

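Request body

evaluation

(Required, object) Defines the type of evaluation you want to perform. See Data frame analytics evaluation resources.

Available evaluation types:

outlier_detection
regression
classification

index

(Required, object) Defines the index in which the evaluation will be performed.

query

(Optional, object) A query clause that retrieves a subset of data from the source index. See Query DSL.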
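The three request body properties can be assembled programmatically before being sent. The following is a minimal sketch in plain Python, not part of any client library: the helper name `build_evaluate_body` is hypothetical, and the check that every evaluation key is one of the three listed types is an illustrative assumption about how one might validate input on the client side.

```python
import json
from typing import Any, Dict, Optional

# Evaluation types listed in the request body description.
ALLOWED_TYPES = {"outlier_detection", "regression", "classification"}


def build_evaluate_body(
    index: str,
    evaluation: Dict[str, Any],
    query: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a body for POST _ml/data_frame/_evaluate (hypothetical helper)."""
    if not evaluation or not set(evaluation) <= ALLOWED_TYPES:
        raise ValueError(f"evaluation keys must be among {sorted(ALLOWED_TYPES)}")
    body: Dict[str, Any] = {"index": index, "evaluation": evaluation}
    if query is not None:
        # query is optional; omit it to evaluate every document in the index.
        body["query"] = query
    return body


body = build_evaluate_body(
    index="my_analytics_dest_index",
    evaluation={
        "outlier_detection": {
            "actual_field": "is_outlier",
            "predicted_probability_field": "ml.outlier_score",
        }
    },
)
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same shape as the JSON bodies used in the examples on this page.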

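Data frame analytics evaluation resources

Outlier detection evaluation objects

Outlier detection evaluates the results of an outlier detection analysis, which outputs the probability that each document is an outlier.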

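actual_field

(Required, string) The field of the index which contains the ground truth. The data type of this field can be boolean or integer. If the data type is integer, the value has to be either 0 (false) or 1 (true).

predicted_probability_field

(Required, string) The field of the index that defines the probability of whether the item belongs to the class in question or not. It is the field that contains the results of the analysis.

metrics

(Optional, object) Specifies the metrics that are used for the evaluation. If no metrics are specified, the following are returned by default:

auc_roc (include_curve: false),
precision (at: [0.25, 0.5, 0.75]),
recall (at: [0.25, 0.5, 0.75]),
confusion_matrix (at: [0.25, 0.5, 0.75]).

auc_roc

(Optional, object) The AUC ROC (area under the curve of the receiver operating characteristic) score and optionally the curve. Default value is {"include_curve": false}.

confusion_matrix

(Optional, object) Sets the different thresholds of the outlier score at which the metrics (tp - true positive, fp - false positive, tn - true negative, fn - false negative) are calculated. Default value is {"at": [0.25, 0.50, 0.75]}.

precision

(Optional, object) Sets the different thresholds of the outlier score at which the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.

recall

(Optional, object) Sets the different thresholds of the outlier score at which the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.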
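To make the threshold-based metrics concrete, the sketch below counts tp, fp, tn, and fn at each threshold for a handful of made-up (ground truth, outlier score) pairs and derives precision and recall from those counts. It is an illustration only: the sample data is invented, the function names are hypothetical, and treating a score greater than or equal to the threshold as "flagged" is an assumption about the comparison rather than something this page states.

```python
from typing import Dict, List, Tuple


def confusion_at(
    samples: List[Tuple[bool, float]], thresholds: List[float]
) -> Dict[str, Dict[str, int]]:
    """Count tp/fp/tn/fn for each outlier-score threshold."""
    result: Dict[str, Dict[str, int]] = {}
    for threshold in thresholds:
        counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
        for is_outlier, score in samples:
            flagged = score >= threshold  # assumption: inclusive comparison
            if flagged:
                counts["tp" if is_outlier else "fp"] += 1
            else:
                counts["fn" if is_outlier else "tn"] += 1
        result[str(threshold)] = counts
    return result


def precision(counts: Dict[str, int]) -> float:
    flagged = counts["tp"] + counts["fp"]
    return counts["tp"] / flagged if flagged else 0.0


def recall(counts: Dict[str, int]) -> float:
    outliers = counts["tp"] + counts["fn"]
    return counts["tp"] / outliers if outliers else 0.0


# Invented (is_outlier, outlier_score) pairs for illustration.
samples = [(True, 0.9), (True, 0.6), (True, 0.3),
           (False, 0.8), (False, 0.4), (False, 0.1)]
matrix = confusion_at(samples, [0.25, 0.5, 0.75])
for threshold, counts in matrix.items():
    print(threshold, counts, precision(counts), recall(counts))
```

Raising the threshold flags fewer documents, so recall can only stay the same or fall, while precision may move in either direction.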

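Regression evaluation objects

Regression evaluation evaluates the results of a regression analysis, which outputs a prediction of values.

actual_field

(Required, string) The field of the index which contains the ground truth. The data type of this field must be numerical.

predicted_field

(Required, string) The field in the index that contains the predicted value, in other words the results of the regression analysis.

metrics

(Optional, object) Specifies the metrics that are used for the evaluation. For more information on mse, msle, and huber, consult the Jupyter notebook on regression loss functions. If no metrics are specified, the following are returned by default:

mse,
r_squared,
huber (delta: 1.0).

mse

(Optional, object) Average squared difference between the predicted values and the actual (ground truth) value. For more information, see the wiki article on mean squared error.

msle

(Optional, object) Average squared difference between the logarithm of the predicted values and the logarithm of the actual (ground truth) value.

offset

(Optional, double) Defines the transition point at which you switch from minimizing quadratic error to minimizing quadratic log error. Defaults to 1.

huber

(Optional, object) Pseudo Huber loss function. For more information, see the wiki article on the Huber loss.

delta

(Optional, double) Approximates 1/2 (prediction - actual)² for values much less than delta and approximates a straight line with slope delta for values much more than delta. Defaults to 1. Delta needs to be greater than 0.

r_squared

(Optional, object) Proportion of the variance in the dependent variable that is predictable from the independent variables. For more information, see the wiki article on the coefficient of determination.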
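The four regression metrics can be written out in a few lines of Python to show what each one measures. This is a sketch under stated assumptions rather than the server-side implementation: mse and r_squared follow their textbook definitions, msle is assumed to add `offset` to both values before taking the logarithm, and huber uses the standard pseudo-Huber form delta² (sqrt(1 + (error / delta)²) - 1), which matches the small-error and large-error behavior described for `delta`. The sample values are invented.

```python
import math
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the squared differences between predicted and actual values."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def msle(actual: Sequence[float], predicted: Sequence[float],
         offset: float = 1.0) -> float:
    """Mean squared log error; adding `offset` inside the log is an assumption."""
    return sum(
        (math.log(p + offset) - math.log(a + offset)) ** 2
        for a, p in zip(actual, predicted)
    ) / len(actual)


def huber(actual: Sequence[float], predicted: Sequence[float],
          delta: float = 1.0) -> float:
    """Mean pseudo-Huber loss (standard formulation, assumed here)."""
    if delta <= 0:
        raise ValueError("delta must be greater than 0")
    return sum(
        delta ** 2 * (math.sqrt(1 + ((p - a) / delta) ** 2) - 1)
        for a, p in zip(actual, predicted)
    ) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - residual sum of squares / total sum of squares.

    Assumes the actual values are not all identical (non-zero variance).
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented ground truth and predictions for illustration.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print("mse:", mse(actual, predicted))
print("msle:", msle(actual, predicted, offset=1.0))
print("huber:", huber(actual, predicted, delta=1.0))
print("r_squared:", r_squared(actual, predicted))
```

With these values the squared errors are 1, 0, and 4, so mse is 5/3, and r_squared is 1 - 5/8 = 0.375 because the total sum of squares around the mean of 5 is 8.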

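Classification evaluation objects

Classification evaluation evaluates the results of a classification analysis, which outputs a prediction that identifies to which of the classes each document belongs.

actual_field

(Required, string) The field of the index which contains the ground truth. The data type of this field must be categorical.

predicted_field

(Optional, string) The field in the index which contains the predicted value, in other words the results of the classification analysis.

top_classes_field

(Optional, string) The field of the index which is an array of documents of the form { "class_name": XXX, "class_probability": YYY }. This field must be defined as nested in the mappings.

metrics

(Optional, object) Specifies the metrics that are used for the evaluation. If no metrics are specified, the following are returned by default:

accuracy,
multiclass_confusion_matrix,
precision,
recall.

accuracy

(Optional, object) Accuracy of predictions (per-class and overall).

auc_roc

(Optional, object) The AUC ROC (area under the curve of the receiver operating characteristic) score and optionally the curve. It is calculated for a specific class (provided as "class_name") treated as positive.

class_name

(Required, string) Name of the only class that is treated as positive during AUC ROC calculation. Other classes are treated as negative ("one-vs-all" strategy). All the evaluated documents must have class_name in the list of their top classes.

include_curve

(Optional, Boolean) Whether or not the curve should be returned in addition to the score. Default value is false.

multiclass_confusion_matrix

(Optional, object) Multiclass confusion matrix.

size

(Optional, double) Specifies the size of the multiclass confusion matrix. Defaults to 10, which results in a confusion matrix of size 10x10.

precision

(Optional, object) Precision of predictions (per-class and average).

recall

(Optional, object) Recall of predictions (per-class and average).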
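The "one-vs-all" strategy behind `auc_roc` can be illustrated with a small, self-contained sketch. It treats documents whose ground truth equals `class_name` as positives and everything else as negatives, then computes AUC as the fraction of positive/negative pairs in which the positive document has the higher probability for that class (ties count as half). This exact pairwise definition is used for clarity and is not necessarily how the API computes the score; the document layout (`actual`, `top_classes`) and the helper names are hypothetical.

```python
from typing import Any, Dict, List


def class_probability(doc: Dict[str, Any], class_name: str) -> float:
    """Look up the probability of `class_name` in a document's top classes."""
    for entry in doc["top_classes"]:
        if entry["class_name"] == class_name:
            return entry["class_probability"]
    # Every evaluated document must list class_name among its top classes.
    raise ValueError(f"{class_name!r} missing from top classes")


def auc_roc_one_vs_rest(docs: List[Dict[str, Any]], class_name: str) -> float:
    """Pairwise AUC with `class_name` as the positive class."""
    positives = [class_probability(d, class_name)
                 for d in docs if d["actual"] == class_name]
    negatives = [class_probability(d, class_name)
                 for d in docs if d["actual"] != class_name]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative document")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def make_doc(actual: str, dog_probability: float) -> Dict[str, Any]:
    """Build an invented two-class document for the example."""
    return {
        "actual": actual,
        "top_classes": [
            {"class_name": "dog", "class_probability": dog_probability},
            {"class_name": "cat", "class_probability": 1.0 - dog_probability},
        ],
    }


docs = [make_doc("dog", 0.9), make_doc("dog", 0.8), make_doc("dog", 0.4),
        make_doc("cat", 0.7), make_doc("cat", 0.3), make_doc("cat", 0.2)]
score = auc_roc_one_vs_rest(docs, "dog")
print(score)
```

In this invented data set, eight of the nine dog/cat pairs are ordered correctly, so the score is 8/9.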

Examples

Outlier detection

resp = client.ml.evaluate_data_frame(
    index="my_analytics_dest_index",
    evaluation={
        "outlier_detection": {
            "actual_field": "is_outlier",
            "predicted_probability_field": "ml.outlier_score"
        }
    },
)
print(resp)

response = client.ml.evaluate_data_frame(
  body: {
    index: 'my_analytics_dest_index',
    evaluation: {
      outlier_detection: {
        actual_field: 'is_outlier',
        predicted_probability_field: 'ml.outlier_score'
      }
    }
  }
)
puts response

const response = await client.ml.evaluateDataFrame({
  index: "my_analytics_dest_index",
  evaluation: {
    outlier_detection: {
      actual_field: "is_outlier",
      predicted_probability_field: "ml.outlier_score",
    },
  },
});
console.log(response);

POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}

The API returns the following results:

{
  "outlier_detection": {
    "auc_roc": {
      "value": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
          "tp": 5,
          "fp": 9,
          "tn": 204,
          "fn": 5
      },
      "0.5": {
          "tp": 1,
          "fp": 5,
          "tn": 208,
          "fn": 9
      },
      "0.75": {
          "tp": 0,
          "fp": 4,
          "tn": 209,
          "fn": 10
      }
    },
    "precision": {
        "0.25": 0.35714285714285715,
        "0.5": 0.16666666666666666,
        "0.75": 0
    },
    "recall": {
        "0.25": 0.5,
        "0.5": 0.1,
        "0.75": 0
    }
  }
}
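The precision and recall values in this response follow directly from the confusion matrix counts: precision is tp / (tp + fp) and recall is tp / (tp + fn). The short check below recomputes them from the counts shown above; returning 0.0 when a denominator is zero is an assumption made for the sketch, since that case does not occur in this response.

```python
from typing import Dict

# Confusion matrix counts copied from the response above.
confusion_matrix: Dict[str, Dict[str, int]] = {
    "0.25": {"tp": 5, "fp": 9, "tn": 204, "fn": 5},
    "0.5": {"tp": 1, "fp": 5, "tn": 208, "fn": 9},
    "0.75": {"tp": 0, "fp": 4, "tn": 209, "fn": 10},
}


def ratio(numerator: int, denominator: int) -> float:
    """Divide, returning 0.0 for an empty denominator (assumption)."""
    return numerator / denominator if denominator else 0.0


precision = {t: ratio(c["tp"], c["tp"] + c["fp"])
             for t, c in confusion_matrix.items()}
recall = {t: ratio(c["tp"], c["tp"] + c["fn"])
          for t, c in confusion_matrix.items()}

print(precision)
print(recall)
```

At threshold 0.25, for example, 5 of the 14 flagged documents are true outliers (precision 5/14) and 5 of the 10 true outliers are flagged (recall 0.5).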

Regression

resp = client.ml.evaluate_data_frame(
    index="house_price_predictions",
    query={
        "bool": {
            "filter": [
                {
                    "term": {
                        "ml.is_training": False
                    }
                }
            ]
        }
    },
    evaluation={
        "regression": {
            "actual_field": "price",
            "predicted_field": "ml.price_prediction",
            "metrics": {
                "r_squared": {},
                "mse": {},
                "msle": {
                    "offset": 10
                },
                "huber": {
                    "delta": 1.5
                }
            }
        }
    },
)
print(resp)

response = client.ml.evaluate_data_frame(
  body: {
    index: 'house_price_predictions',
    query: {
      bool: {
        filter: [
          {
            term: {
              'ml.is_training' => false
            }
          }
        ]
      }
    },
    evaluation: {
      regression: {
        actual_field: 'price',
        predicted_field: 'ml.price_prediction',
        metrics: {
          r_squared: {},
          mse: {},
          msle: {
            offset: 10
          },
          huber: {
            delta: 1.5
          }
        }
      }
    }
  }
)
puts response

const response = await client.ml.evaluateDataFrame({
  index: "house_price_predictions",
  query: {
    bool: {
      filter: [
        {
          term: {
            "ml.is_training": false,
          },
        },
      ],
    },
  },
  evaluation: {
    regression: {
      actual_field: "price",
      predicted_field: "ml.price_prediction",
      metrics: {
        r_squared: {},
        mse: {},
        msle: {
          offset: 10,
        },
        huber: {
          delta: 1.5,
        },
      },
    },
  },
});
console.log(response);

POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", 
  "query": {
      "bool": {
        "filter": [
          { "term":  { "ml.is_training": false } } 
        ]
      }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", 
      "predicted_field": "ml.price_prediction", 
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {"offset": 10},
        "huber": {"delta": 1.5}
      }
    }
  }
}

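	`index`: The output destination index from a data frame analytics regression analysis.
	`query`: In this example, a test/train split (`training_percent`) was defined for the regression analysis. This query limits evaluation to be performed on the test split only.
	`actual_field`: The ground truth value for the actual house price. This is required in order to evaluate results.
	`predicted_field`: The predicted value for house price calculated by the regression analysis.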

The following example calculates the training error:

resp = client.ml.evaluate_data_frame(
    index="student_performance_mathematics_reg",
    query={
        "term": {
            "ml.is_training": {
                "value": True
            }
        }
    },
    evaluation={
        "regression": {
            "actual_field": "G3",
            "predicted_field": "ml.G3_prediction",
            "metrics": {
                "r_squared": {},
                "mse": {},
                "msle": {},
                "huber": {}
            }
        }
    },
)
print(resp)

response = client.ml.evaluate_data_frame(
  body: {
    index: 'student_performance_mathematics_reg',
    query: {
      term: {
        'ml.is_training' => {
          value: true
        }
      }
    },
    evaluation: {
      regression: {
        actual_field: 'G3',
        predicted_field: 'ml.G3_prediction',
        metrics: {
          r_squared: {},
          mse: {},
          msle: {},
          huber: {}
        }
      }
    }
  }
)
puts response

const response = await client.ml.evaluateDataFrame({
  index: "student_performance_mathematics_reg",
  query: {
    term: {
      "ml.is_training": {
        value: true,
      },
    },
  },
  evaluation: {
    regression: {
      actual_field: "G3",
      predicted_field: "ml.G3_prediction",
      metrics: {
        r_squared: {},
        mse: {},
        msle: {},
        huber: {},
      },
    },
  },
});
console.log(response);

POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true 
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", 
      "predicted_field": "ml.G3_prediction", 
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}

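	`query`: In this example, a test/train split (`training_percent`) was defined for the regression analysis. This query limits evaluation to be performed on the train split only, which means that the training error is calculated.
	`actual_field`: The field that contains the ground truth value for the actual student performance. This is required in order to evaluate results.
	`predicted_field`: The field that contains the predicted value for student performance calculated by the regression analysis.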

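The next example calculates the testing error. The only difference compared with the previous example is that ml.is_training is set to false this time, so the query excludes the train split from the evaluation.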

resp = client.ml.evaluate_data_frame(
    index="student_performance_mathematics_reg",
    query={
        "term": {
            "ml.is_training": {
                "value": False
            }
        }
    },
    evaluation={
        "regression": {
            "actual_field": "G3",
            "predicted_field": "ml.G3_prediction",
            "metrics": {
                "r_squared": {},
                "mse": {},
                "msle": {},
                "huber": {}
            }
        }
    },
)
print(resp)

response = client.ml.evaluate_data_frame(
  body: {
    index: 'student_performance_mathematics_reg',
    query: {
      term: {
        'ml.is_training' => {
          value: false
        }
      }
    },
    evaluation: {
      regression: {
        actual_field: 'G3',
        predicted_field: 'ml.G3_prediction',
        metrics: {
          r_squared: {},
          mse: {},
          msle: {},
          huber: {}
        }
      }
    }
  }
)
puts response

const response = await client.ml.evaluateDataFrame({
  index: "student_performance_mathematics_reg",
  query: {
    term: {
      "ml.is_training": {
        value: false,
      },
    },
  },
  evaluation: {
    regression: {
      actual_field: "G3",
      predicted_field: "ml.G3_prediction",
      metrics: {
        r_squared: {},
        mse: {},
        msle: {},
        huber: {},
      },
    },
  },
});
console.log(response);

POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false 
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", 
      "predicted_field": "ml.G3_prediction", 
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}

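	`query`: In this example, a test/train split (`training_percent`) was defined for the regression analysis. This query limits evaluation to be performed on the test split only, which means that the testing error is calculated.
	`actual_field`: The field that contains the ground truth value for the actual student performance. This is required in order to evaluate results.
	`predicted_field`: The field that contains the predicted value for student performance calculated by the regression analysis.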

Classification

resp = client.ml.evaluate_data_frame(
    index="animal_classification",
    evaluation={
        "classification": {
            "actual_field": "animal_class",
            "predicted_field": "ml.animal_class_prediction",
            "metrics": {
                "multiclass_confusion_matrix": {}
            }
        }
    },
)
print(resp)

response = client.ml.evaluate_data_frame(
  body: {
    index: 'animal_classification',
    evaluation: {
      classification: {
        actual_field: 'animal_class',
        predicted_field: 'ml.animal_class_prediction',
        metrics: {
          multiclass_confusion_matrix: {}
        }
      }
    }
  }
)
puts response

const response = await client.ml.evaluateDataFrame({
  index: "animal_classification",
  evaluation: {
    classification: {
      actual_field: "animal_class",
      predicted_field: "ml.animal_class_prediction",
      metrics: {
        multiclass_confusion_matrix: {},
      },
    },
  },
});
console.log(response);

POST _ml/data_frame/_evaluate
{
   "index": "animal_classification",
   "evaluation": {
      "classification": { 
         "actual_field": "animal_class", 
         "predicted_field": "ml.animal_class_prediction", 
         "metrics": {
           "multiclass_confusion_matrix" : {} 
         }
      }
   }
}

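	`classification`: The evaluation type.
	`actual_field`: The field that contains the ground truth value for the actual animal classification. This is required in order to evaluate results.
	`predicted_field`: The field that contains the predicted value for animal classification by the classification analysis.
	`multiclass_confusion_matrix`: Specifies the metric for the evaluation.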

The API returns the following result:

{
   "classification" : {
      "multiclass_confusion_matrix" : {
         "confusion_matrix" : [
         {
            "actual_class" : "cat", 
            "actual_class_doc_count" : 12, 
            "predicted_classes" : [ 
              {
                "predicted_class" : "cat",
                "count" : 12 
              },
              {
                "predicted_class" : "dog",
                "count" : 0 
              }
            ],
            "other_predicted_class_doc_count" : 0 
          },
          {
            "actual_class" : "dog",
            "actual_class_doc_count" : 11,
            "predicted_classes" : [
              {
                "predicted_class" : "dog",
                "count" : 7
              },
              {
                "predicted_class" : "cat",
                "count" : 4
              }
            ],
            "other_predicted_class_doc_count" : 0
          }
        ],
        "other_actual_class_count" : 0
      }
    }
  }

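	`actual_class`: The name of the actual class that the analysis tried to predict.
	`actual_class_doc_count`: The number of documents in the index that belong to the `actual_class`.
	`predicted_classes`: This object contains the list of the predicted classes and the number of predictions associated with the class.
	`count` (12): The number of cats in the dataset that are correctly identified as cats.
	`count` (0): The number of cats in the dataset that are incorrectly classified as dogs.
	`other_predicted_class_doc_count`: The number of documents that are classified as a class that is not listed as a `predicted_class`.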
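Summary figures can be derived from a multiclass confusion matrix like the one above. The sketch below copies the matrix from the response and computes the overall fraction of correct predictions together with per-class precision and recall using their usual definitions. It is only an illustration: it is valid here because both `other_predicted_class_doc_count` values and `other_actual_class_count` are 0, so the matrix covers every document, and the variable names are arbitrary.

```python
from typing import Dict

# Matrix copied from the response above.
confusion_matrix = [
    {
        "actual_class": "cat",
        "actual_class_doc_count": 12,
        "predicted_classes": [
            {"predicted_class": "cat", "count": 12},
            {"predicted_class": "dog", "count": 0},
        ],
        "other_predicted_class_doc_count": 0,
    },
    {
        "actual_class": "dog",
        "actual_class_doc_count": 11,
        "predicted_classes": [
            {"predicted_class": "dog", "count": 7},
            {"predicted_class": "cat", "count": 4},
        ],
        "other_predicted_class_doc_count": 0,
    },
]

correct: Dict[str, int] = {}           # diagonal cells: actual == predicted
predicted_totals: Dict[str, int] = {}  # column sums: documents predicted per class
for row in confusion_matrix:
    for cell in row["predicted_classes"]:
        name = cell["predicted_class"]
        predicted_totals[name] = predicted_totals.get(name, 0) + cell["count"]
        if name == row["actual_class"]:
            correct[name] = cell["count"]

total_docs = sum(row["actual_class_doc_count"] for row in confusion_matrix)
overall_accuracy = sum(correct.values()) / total_docs
recall = {row["actual_class"]: correct[row["actual_class"]] / row["actual_class_doc_count"]
          for row in confusion_matrix}
precision = {name: correct[name] / predicted_totals[name] for name in correct}

print(overall_accuracy, precision, recall)
```

All 12 cats are recognized, but 4 of the 11 dogs are labeled as cats, so recall is 1.0 for cat and 7/11 for dog, while precision is 12/16 for cat and 1.0 for dog.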

resp = client.ml.evaluate_data_frame(
    index="animal_classification",
    evaluation={
        "classification": {
            "actual_field": "animal_class",
            "metrics": {
                "auc_roc": {
                    "class_name": "dog"
                }
            }
        }
    },
)
print(resp)

response = client.ml.evaluate_data_frame(
  body: {
    index: 'animal_classification',
    evaluation: {
      classification: {
        actual_field: 'animal_class',
        metrics: {
          auc_roc: {
            class_name: 'dog'
          }
        }
      }
    }
  }
)
puts response

const response = await client.ml.evaluateDataFrame({
  index: "animal_classification",
  evaluation: {
    classification: {
      actual_field: "animal_class",
      metrics: {
        auc_roc: {
          class_name: "dog",
        },
      },
    },
  },
});
console.log(response);

POST _ml/data_frame/_evaluate
{
   "index": "animal_classification",
   "evaluation": {
      "classification": { 
         "actual_field": "animal_class", 
         "metrics": {
            "auc_roc" : { 
              "class_name": "dog" 
            }
         }
      }
   }
}

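	`classification`: The evaluation type.
	`actual_field`: The field that contains the ground truth value for the actual animal classification. This is required in order to evaluate results.
	`auc_roc`: Specifies the metric for the evaluation.
	`class_name`: Specifies the class name that is treated as positive during the evaluation. All the other classes are treated as negative.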

The API returns the following result:

{
  "classification" : {
    "auc_roc" : {
      "value" : 0.8941788639536681
    }
  }
}
