evaluate_feature_set_relationships


mlproject.corr_analysis.dependency_graph.evaluate_feature_set_relationships(X_lob, X_matminer, y, model=None, n_splits=5, random_state=42, n_jobs=-1, scoring=None)

Computes cross-validated regression metrics (mean ± std across folds) for the predictive relationships between two feature sets and a target.

Evaluates four relationships:
  1. X_lob → y

  2. X_matminer → y

  3. X_lob → X_matminer

  4. X_matminer → X_lob
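Relationships 3 and 4 predict one feature matrix from another, so they are multi-output regressions rather than single-target ones. The sketch below shows one way the four input/target pairs could be evaluated. It assumes shuffled K-fold splitting and a regressor that natively supports 2-D targets (as scikit-learn's `RandomForestRegressor` does). The helper `build_relationships`, the reduced `n_estimators=50`, and the synthetic arrays are hypothetical illustrations, not part of `mlproject`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def build_relationships(X_lob, X_matminer, y):
    """Hypothetical helper: map each relationship name to an (inputs, targets) pair."""
    return {
        "X_lob → y": (X_lob, y),
        "X_matminer → y": (X_matminer, y),
        "X_lob → X_matminer": (X_lob, X_matminer),  # multi-output target
        "X_matminer → X_lob": (X_matminer, X_lob),  # multi-output target
    }


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X_lob = rng.normal(size=(60, 4))
X_matminer = X_lob @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(60, 3))
y = X_lob[:, 0] + 0.5 * X_matminer[:, 1]

# Assumed CV scheme: shuffled K-fold seeded with random_state.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestRegressor(n_estimators=50, random_state=42)

rels = build_relationships(X_lob, X_matminer, y)
r2_by_relationship = {}
for name, (inputs, targets) in rels.items():
    # For 2-D targets, the "r2" scorer averages R² uniformly over output columns.
    scores = cross_val_score(model, inputs, targets, cv=cv, scoring="r2")
    r2_by_relationship[name] = (scores.mean(), scores.std())
    print(f"{name}: R² = {scores.mean():.3f} ± {scores.std():.3f}")
```

Because `cross_val_score` clones the estimator for every fold, the same `model` object can be reused across all four relationships without leaking fitted state.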

Parameters:
  • X_lob (pd.DataFrame or np.ndarray) – Feature matrix for the first feature group (e.g. Lobster).

  • X_matminer (pd.DataFrame or np.ndarray) – Feature matrix for the second feature group (e.g. Matminer).

  • y (pd.Series or np.ndarray) – Target variable.

  • model (estimator, optional) – Base regressor (default: RandomForestRegressor).

  • n_splits (int, optional) – Number of CV splits (default: 5).

  • random_state (int, optional) – Random seed (default: 42).

  • n_jobs (int, optional) – Number of parallel jobs (default: -1).

  • scoring (dict, optional) – Dict of scoring functions (name → scorer). Default: R², MAE, RMSE, MAPE.
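The `scoring` parameter is documented only as a name → scorer dict. A sketch of an equivalent to the documented defaults (R², MAE, RMSE, MAPE) built with `sklearn.metrics.make_scorer`; that the function accepts scikit-learn scorer objects in exactly this form, and the key names used, are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (
    make_scorer,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)


def rmse(y_true, y_pred):
    """Root mean squared error, derived from MSE to work across scikit-learn versions."""
    return np.sqrt(mean_squared_error(y_true, y_pred))


# Error metrics use greater_is_better=False, so scikit-learn reports them negated.
scoring = {
    "r2": make_scorer(r2_score),
    "mae": make_scorer(mean_absolute_error, greater_is_better=False),
    "rmse": make_scorer(rmse, greater_is_better=False),
    "mape": make_scorer(mean_absolute_percentage_error, greater_is_better=False),
}

# Sanity check on a perfectly linear toy problem (no zeros in y, so MAPE is defined).
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
est = LinearRegression().fit(X, y)
for name, scorer in scoring.items():
    print(name, round(scorer(est, X, y), 6))
```

The negation keeps every scorer "higher is better", which is what scikit-learn's model-selection utilities expect; flip the sign back when reporting MAE, RMSE, or MAPE.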

Returns:

Summary DataFrame with the mean and standard deviation of each metric for each relationship.

Return type:

pd.DataFrame
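The exact layout of the returned DataFrame is not specified above. A sketch of one plausible layout (one row per relationship/metric pair, with `mean` and `std` columns), aggregated from `sklearn.model_selection.cross_validate` output. The row/column layout, the single relationship shown, the synthetic data, and the sign flip for negated error scorers are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic data for illustration only.
rng = np.random.default_rng(42)
X_lob = rng.normal(size=(80, 5))
y = X_lob[:, 0] - 2.0 * X_lob[:, 1] + 0.1 * rng.normal(size=80)

scoring = {"r2": "r2", "mae": "neg_mean_absolute_error", "rmse": "neg_root_mean_squared_error"}
cv = KFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestRegressor(n_estimators=50, random_state=42)

# cross_validate returns per-fold arrays under keys "test_<metric name>".
results = {"X_lob → y": cross_validate(model, X_lob, y, cv=cv, scoring=scoring)}

records = []
for relationship, cv_out in results.items():
    for metric in scoring:
        fold_scores = cv_out[f"test_{metric}"]
        # scikit-learn negates error metrics; flip them back so MAE/RMSE are positive.
        if metric != "r2":
            fold_scores = -fold_scores
        records.append({
            "relationship": relationship,
            "metric": metric,
            "mean": fold_scores.mean(),
            "std": fold_scores.std(),
        })

summary = pd.DataFrame.from_records(records).set_index(["relationship", "metric"])
print(summary.round(3))
```

A long-form index like this makes it easy to compare one metric across relationships, e.g. `summary.xs("r2", level="metric")`.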