corrected_resampled_ttest

mlproject.postprocess.t_test.corrected_resampled_ttest(scores_model_a, scores_model_b, n_train_list, n_test_list, alpha=0.05, alternative='two-sided')

Nadeau & Bengio corrected resampled paired t-test with varying fold sizes.

This is the same idea as the standard corrected resampled t-test, but instead of a single n_test/n_train ratio, it uses the average ratio across splits:

r_bar = (1/m) * sum_i (n_test_i / n_train_i)

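The variance of the mean score difference d_bar over the m splits is then approximated as follows, where s^2 is the sample variance of the per-split differences: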

Var(d_bar) ≈ (1/m + r_bar) * s^2
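The two formulas above can be worked through by hand with the standard library. The following is a minimal sketch, not the implementation in mlproject: the helper name corrected_t_statistic, the sign convention (model A minus model B), the sample variance with an m - 1 denominator, and the example numbers are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def corrected_t_statistic(scores_a, scores_b, n_train_list, n_test_list):
    """Hypothetical helper: return (t_stat, r_bar) for paired per-split scores."""
    m = len(scores_a)  # number of splits
    # Assumed sign convention: difference = model A score - model B score.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    d_bar = mean(diffs)
    s2 = variance(diffs)  # sample variance, m - 1 denominator (assumption)
    # Average test/train ratio across splits.
    r_bar = mean([n_te / n_tr for n_te, n_tr in zip(n_test_list, n_train_list)])
    # Corrected variance of the mean difference.
    var_d_bar = (1 / m + r_bar) * s2
    return d_bar / sqrt(var_d_bar), r_bar


# Made-up MAE values for four 80/20 splits, for illustration only.
t_stat, r_bar = corrected_t_statistic(
    [0.50, 0.52, 0.48, 0.51],
    [0.45, 0.49, 0.47, 0.46],
    [80, 80, 80, 80],
    [20, 20, 20, 20],
)
print(t_stat, r_bar)
```

With equal split sizes, r_bar reduces to the single n_test/n_train ratio of the standard corrected test, so the two variants agree in that case.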

Parameters:
  • scores_model_a (list[float]) – List of test errors (e.g., MAE) for model A across splits.

  • scores_model_b (list[float]) – List of test errors (e.g., MAE) for model B across splits.

  • n_train_list (list[int]) – List of training set sizes for each split.

  • n_test_list (list[int]) – List of test set sizes for each split.

  • alpha (float, optional) – Significance level for the test (default is 0.05).

  • alternative (str, optional) – The alternative hypothesis to test. Options are “two-sided”, “greater”, or “less” (default is “two-sided”).

Returns:

A dict containing the t-statistic, degrees of freedom, critical value, p-value, and average test/train ratio across splits, with the following keys:

  • t_stat: The t-statistic value for the test

  • df: degrees of freedom

  • critical_value: Critical value for the given alpha and alternative hypothesis

  • p_value: p-value as per the specified alternative hypothesis

  • r_bar: average test/train ratio across splits

Return type:

dict
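One way to act on the returned dict is to compare p_value against the chosen significance level. The sketch below assumes only the documented keys; the helper rejects_null is hypothetical, and example_result is a hand-written placeholder rather than real output from corrected_resampled_ttest.

```python
def rejects_null(result, alpha=0.05):
    """Hypothetical helper: True when the reported p-value is below alpha."""
    return result["p_value"] < alpha


# Placeholder dict with the documented keys; the values are illustrative only.
example_result = {
    "t_stat": 2.58,
    "df": 3,
    "critical_value": 3.18,
    "p_value": 0.08,
    "r_bar": 0.25,
}

decision = rejects_null(example_result, alpha=0.05)
print("reject null hypothesis:", decision)
```

The alpha passed to such a helper should match the alpha given to the test, since critical_value is computed for that level.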

References

Nadeau, C., Bengio, Y. Inference for the Generalization Error. Machine Learning 52, 239–281 (2003). https://doi.org/10.1023/A:1024068626366