corrected_resampled_ttest
- mlproject.postprocess.t_test.corrected_resampled_ttest(scores_model_a, scores_model_b, n_train_list, n_test_list, alpha=0.05, alternative='two-sided')
Nadeau & Bengio corrected resampled paired t-test with varying fold sizes.
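This follows the standard corrected resampled t-test, but instead of a single n_test/n_train ratio it uses the ratio averaged over the m splits:
r_bar = (1/m) * sum_i (n_test_i / n_train_i)
The variance of the mean score difference d_bar is then approximated as:
Var(d_bar) ≈ (1/m + r_bar) * s^2
where s^2 is the sample variance of the per-split score differences.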
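The two formulas above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `corrected_t_sketch`, the difference direction d_i = a_i - b_i, the use of the unbiased (n - 1) sample variance for s^2, and the example numbers are all hypothetical.

```python
import math
import statistics


def corrected_t_sketch(scores_a, scores_b, n_train_list, n_test_list):
    """Illustrative only: t-statistic built from the corrected variance."""
    m = len(scores_a)
    # Per-split score differences (the direction a - b is an assumption).
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    d_bar = statistics.mean(diffs)
    # Unbiased sample variance s^2 (assumed; divides by m - 1).
    s2 = statistics.variance(diffs)
    # Average test/train ratio across splits.
    r_bar = sum(n_te / n_tr for n_te, n_tr in zip(n_test_list, n_train_list)) / m
    # Corrected variance of the mean difference.
    var_d_bar = (1.0 / m + r_bar) * s2
    t_stat = d_bar / math.sqrt(var_d_bar)
    return {"t_stat": t_stat, "r_bar": r_bar, "var_d_bar": var_d_bar}


# Made-up scores and split sizes, for illustration only.
example = corrected_t_sketch(
    scores_a=[0.50, 0.55, 0.48, 0.52],
    scores_b=[0.45, 0.47, 0.46, 0.44],
    n_train_list=[80, 75, 90, 80],
    n_test_list=[20, 25, 10, 20],
)
```

Because r_bar is added to 1/m, the corrected variance is always larger than the naive s^2 / m, so the resulting t-statistic is smaller in magnitude than that of an uncorrected paired t-test on the same differences.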
- Parameters:
scores_model_a (list[float]) – List of test errors (e.g., MAE) for model A across splits.
scores_model_b (list[float]) – List of test errors (e.g., MAE) for model B across splits.
n_train_list (list[int]) – List of training set sizes for each split.
n_test_list (list[int]) – List of test set sizes for each split.
alpha (float, optional) – Significance level for the test (default is 0.05).
alternative (str, optional) – The alternative hypothesis to test. Options are “two-sided”, “greater”, or “less” (default is “two-sided”).
- Returns:
A dict containing the t-statistic, degrees of freedom, critical value, p-value, and average test/train ratio across splits, with the following keys:
t_stat: The t-statistic value for the test
df: Degrees of freedom
critical_value: Critical value for the given alpha and alternative hypothesis
p_value: p-value as per the specified alternative hypothesis
r_bar: Average test/train ratio across splits
- Return type:
dict
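As a rough sketch of how the returned dict might be consumed, the snippet below applies the usual decision rule of comparing `p_value` with the significance level. The helper `is_significant` is hypothetical, and the dict is filled with placeholder values rather than real output of `corrected_resampled_ttest`.

```python
def is_significant(result, alpha=0.05):
    """Return True when the p-value falls below the significance level."""
    return result["p_value"] < alpha


# Placeholder values for illustration only, not real output of the function.
result = {
    "t_stat": 2.87,
    "df": 3,
    "critical_value": 3.18,
    "p_value": 0.064,
    "r_bar": 0.236,
}

verdict = "reject" if is_significant(result, alpha=0.05) else "fail to reject"
print(f"t = {result['t_stat']:.2f}, p = {result['p_value']:.3f}: {verdict} H0")
```

The `alpha` passed to the helper should match the one given to the test, since `critical_value` is computed for that same level.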
References
Nadeau, C., Bengio, Y. Inference for the Generalization Error. Machine Learning 52, 239–281 (2003). https://doi.org/10.1023/A:1024068626366