paired_ftest_5x2cv_feat_sets
- mlproject.training.f_test.paired_ftest_5x2cv_feat_sets(model_type, X, y, num_jobs, target_name, random_seed=None, grootcv_n_iter=50, **est_kwargs)
Runs an adapted 5x2cv paired F-test to compare the performance of two feature sets on a model.
- Parameters:
model_type (str) – Type of model to train (“modnet”, “rf”)
X (pd.DataFrame) – Combined feature set
y (pd.DataFrame) – Target variable
num_jobs (int) – Number of parallel jobs.
target_name (str) – Name of the target variable.
random_seed (int or None (default: None)) – Random seed for creating the test/train splits.
grootcv_n_iter (int (default: 50))
**est_kwargs – Additional keyword arguments for the specific model training function.
- Returns:
A dict containing the F-statistic, the p-value, the paired MAE differences, and the list of MAEs for each feature set, with the following keys:
f_stat: The F-statistic
p_value: Two-tailed p-value
diffs: Paired MAE differences from each iteration
results_mae: dict of MAEs with keys baseline and extended
baseline == Matminer feature set
extended == Matminer+Lobster feature set
- Return type:
dict
References
The code is adapted from the combined_ftest_5x2cv implementation in mlxtend.
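For orientation, the statistic computed by the referenced mlxtend routine can be sketched as follows. This is a minimal illustration, not the code of this function: the helper name `combined_f_statistic` is hypothetical, and the assumption that the differences are available as five `(fold 1, fold 2)` pairs may not match the layout of the `diffs` entry returned here.

```python
def combined_f_statistic(diffs):
    """Combined 5x2cv F-statistic from paired score differences.

    diffs: five (d1, d2) pairs, one per repetition, where d1 and d2 are the
    score differences (e.g. MAE of baseline minus MAE of extended) on the two
    folds of that repetition. This input layout is an assumption made for
    illustration only.
    """
    sum_squared = 0.0   # sum of all ten squared differences
    sum_variance = 0.0  # sum of the five per-repetition variance estimates
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        sum_variance += (d1 - mean) ** 2 + (d2 - mean) ** 2
        sum_squared += d1 ** 2 + d2 ** 2
    return sum_squared / (2 * sum_variance)


# Toy differences, not real results
example_diffs = [(1.0, 3.0)] * 5
f_stat = combined_f_statistic(example_diffs)  # 2.5
```

In mlxtend the p-value is then taken from the F distribution with 10 and 5 degrees of freedom, i.e. `scipy.stats.f.sf(f_stat, 10, 5)`. The sketch fails with a division by zero when the two differences are identical in every repetition, since all variance estimates are then zero.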