corrected_resampled_ttest

mlproject.postprocess.t_test.corrected_resampled_ttest(scores_model_a, scores_model_b, n_train_list, n_test_list, alpha=0.05, alternative='two-sided')

Nadeau & Bengio corrected resampled paired t-test with varying fold sizes.

This follows the standard corrected resampled t-test, but instead of a single n_test/n_train ratio it uses the average ratio across the m splits:

r_bar = (1/m) * sum_i (n_test_i / n_train_i)

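Then, with d_bar the mean of the per-split score differences between the two models and s^2 the sample variance of those differences:

Var(d_bar) ≈ (1/m + r_bar) * s^2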
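The two formulas above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the library's implementation: the helper name corrected_t_statistic, the sign convention d_i = a_i - b_i, and the choice df = m - 1 are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance


def corrected_t_statistic(scores_a, scores_b, n_train_list, n_test_list):
    """Hypothetical helper: corrected resampled t-statistic with varying fold sizes."""
    m = len(scores_a)
    # Per-split differences (assumed convention: model A minus model B).
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    d_bar = mean(diffs)
    s2 = variance(diffs)  # sample variance, divides by m - 1

    # Average test/train ratio across splits.
    r_bar = mean([n_te / n_tr for n_te, n_tr in zip(n_test_list, n_train_list)])

    # Corrected variance estimate of the mean difference.
    var_d_bar = (1.0 / m + r_bar) * s2
    t_stat = d_bar / sqrt(var_d_bar)
    return {"t_stat": t_stat, "df": m - 1, "r_bar": r_bar}


result = corrected_t_statistic(
    scores_a=[0.50, 0.52, 0.48, 0.51, 0.49],
    scores_b=[0.45, 0.50, 0.47, 0.46, 0.48],
    n_train_list=[80, 80, 80, 80, 80],
    n_test_list=[20, 20, 20, 20, 20],
)
print(result)
```

When every split has the same sizes, r_bar reduces to the single n_test/n_train ratio of the standard corrected test.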

Parameters:

scores_model_a (list[float]) – List of test errors (e.g., MAE) for model A across splits.
scores_model_b (list[float]) – List of test errors (e.g., MAE) for model B across splits.
n_train_list (list[int]) – List of training set sizes for each split.
n_test_list (list[int]) – List of test set sizes for each split.
alpha (float, optional) – Significance level for the test (default is 0.05).
alternative (str, optional) – The alternative hypothesis to test. Options are “two-sided”, “greater”, or “less”.

Returns:

A dict containing the t-statistic, degrees of freedom, critical value, p-value, and average test/train ratio across splits, with the following keys:

t_stat: The t-statistic value for the test

df: degrees of freedom

critical_value: Critical value for the given alpha and alternative hypothesis

p_value: p-value as per the specified alternative hypothesis

r_bar: average test/train ratio across splits

Return type:

dict
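As a rough illustration of how p_value and critical_value relate to t_stat, df, alpha, and alternative, the sketch below uses SciPy's Student t distribution. The helper name p_value_and_critical is hypothetical, and the mapping from each alternative to a tail is the conventional one; the library may define the critical value differently.

```python
from scipy import stats


def p_value_and_critical(t_stat, df, alpha=0.05, alternative="two-sided"):
    """Hypothetical helper: p-value and critical value from a Student t distribution."""
    if alternative == "two-sided":
        p_value = 2.0 * stats.t.sf(abs(t_stat), df)
        critical_value = stats.t.ppf(1.0 - alpha / 2.0, df)
    elif alternative == "greater":
        # Upper tail: evidence that the mean difference is positive.
        p_value = stats.t.sf(t_stat, df)
        critical_value = stats.t.ppf(1.0 - alpha, df)
    elif alternative == "less":
        # Lower tail: evidence that the mean difference is negative.
        p_value = stats.t.cdf(t_stat, df)
        critical_value = stats.t.ppf(alpha, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return float(p_value), float(critical_value)


p_value, critical_value = p_value_and_critical(2.04, df=4)
print(p_value, critical_value)
```

Under this convention, a two-sided test rejects the null hypothesis when the absolute value of t_stat exceeds critical_value, which is equivalent to p_value falling below alpha.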

References

Nadeau, C., Bengio, Y. Inference for the Generalization Error. Machine Learning 52, 239–281 (2003). https://doi.org/10.1023/A:1024068626366
