paired_ftest_5x2cv_feat_sets

mlproject.training.f_test.paired_ftest_5x2cv_feat_sets(model_type, X, y, num_jobs, target_name, random_seed=None, grootcv_n_iter=50, **est_kwargs)

Runs an adapted 5x2cv paired F-test to compare the performance of two feature sets on the same model type.

Parameters:
  • model_type (str) – Type of model to train (“modnet”, “rf”)

  • X (pd.DataFrame) – Combined feature set

  • y (pd.DataFrame) – Target variable

  • num_jobs (int) – Number of parallel jobs.

  • target_name (str) – Name of the target variable.

  • random_seed (int or None (default: None)) – Random seed for creating the test/train splits.

  • grootcv_n_iter (int (default: 50))

  • **est_kwargs – Additional keyword arguments for the specific model training function.
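As a rough illustration of what "5x2cv" and random_seed refer to, the sketch below builds five repetitions of a 2-fold split, giving ten train/test pairs. It is not taken from the mlproject source: the helper name five_by_two_splits and the use of Python's random module are assumptions made for this example, and the actual function may create its splits differently.

```python
import random
from typing import List, Optional, Tuple


def five_by_two_splits(
    n_samples: int, random_seed: Optional[int] = None
) -> List[Tuple[List[int], List[int]]]:
    """Return 10 (train_idx, test_idx) pairs: 5 repetitions of a 2-fold split.

    Hypothetical helper for illustration; not part of mlproject.
    """
    rng = random.Random(random_seed)  # a fixed seed makes the splits repeatable
    splits = []
    for _ in range(5):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        half = n_samples // 2
        first, second = indices[:half], indices[half:]
        # Each half is used once for training and once for testing.
        splits.append((first, second))
        splits.append((second, first))
    return splits


splits = five_by_two_splits(10, random_seed=0)
```

In a paired comparison, both feature sets would be evaluated on exactly these same ten splits, so that each difference in MAE is attributable to the features rather than to the split.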

Returns:

A dict containing the F-statistic, the p-value, the paired MAE differences, and the list of MAEs for each feature set, with the following keys:

  • f_stat: The F-statistic

  • p_value: Two-tailed p-value

  • diffs: paired MAE differences from each iteration

  • results_mae: dict of MAE lists with keys baseline and extended
    • baseline == Matminer feature set

    • extended == Matminer+Lobster feature set

Return type:

dict
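A minimal sketch of how the returned dict might be consumed is shown below. The helper summarize_ftest_result, the significance level alpha, and all numbers in example are made up for illustration. The sketch also assumes that results_mae["baseline"] and results_mae["extended"] are flat lists of floats, which the documentation above does not state explicitly.

```python
from statistics import mean
from typing import Any, Dict


def summarize_ftest_result(result: Dict[str, Any], alpha: float = 0.05) -> Dict[str, Any]:
    """Condense a result dict with the documented keys into a short summary.

    Hypothetical helper for illustration; not part of mlproject.
    """
    maes = result["results_mae"]
    return {
        "mean_mae_baseline": mean(maes["baseline"]),
        "mean_mae_extended": mean(maes["extended"]),
        # True when the difference is significant at the chosen level.
        "significant": result["p_value"] < alpha,
    }


# Placeholder values only; these are not real results.
example = {
    "f_stat": 6.1,
    "p_value": 0.03,
    "diffs": [0.05, 0.05],
    "results_mae": {"baseline": [0.50, 0.52], "extended": [0.45, 0.47]},
}
summary = summarize_ftest_result(example)
```

A lower mean MAE for extended combined with a small p_value would suggest that the additional features help, but the threshold alpha is a choice left to the user.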

References

Code has been adapted using the mlxtend combined_ftest_5x2cv implementation as a reference.
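For orientation, the statistic of the standard combined 5x2cv F-test, which the reference above implements, can be sketched as follows. This is the textbook form and not the mlproject code itself: the function names are hypothetical, the input is assumed to be one pair of score differences (fold 1, fold 2) per repetition, and the adapted function documented here may deviate in its details. The p-value helper assumes SciPy is installed and takes the upper-tail probability of an F distribution with 10 and 5 degrees of freedom.

```python
from typing import Sequence, Tuple


def combined_ftest_5x2cv_stat(diffs: Sequence[Tuple[float, float]]) -> float:
    """F-statistic from per-fold score differences, one (fold_1, fold_2) pair per repetition.

    Hypothetical helper for illustration; not part of mlproject.
    Raises ZeroDivisionError if both differences are equal in every repetition.
    """
    numerator = 0.0
    variance_sum = 0.0
    for d1, d2 in diffs:
        rep_mean = (d1 + d2) / 2.0
        # Variance estimate of the differences within this repetition.
        variance_sum += (d1 - rep_mean) ** 2 + (d2 - rep_mean) ** 2
        # Sum of squared differences over both folds.
        numerator += d1 ** 2 + d2 ** 2
    return numerator / (2.0 * variance_sum)


def combined_ftest_5x2cv_pvalue(f_stat: float) -> float:
    """Upper-tail probability under F(10, 5); requires SciPy."""
    from scipy import stats  # imported lazily so the statistic works without SciPy

    return float(stats.f.sf(f_stat, 10, 5))


# Five repetitions with made-up differences of 1.0 and 3.0 in the two folds.
f_value = combined_ftest_5x2cv_stat([(1.0, 3.0)] * 5)
```

With these made-up differences the numerator is 50 and the summed variance is 10, so the statistic is 50 / (2 × 10) = 2.5.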