get_dataset


mlproject.data.preprocessing.get_dataset(data_parent_dir, target_name='last_phdos_peak', feat_type='matminer', rename_features=True)

Load target and feature datasets for a given target and feature type.

Parameters:
  • data_parent_dir (str | Path) – Parent directory containing targets and features subdirectories.

  • target_name (str, default="last_phdos_peak") – Name of the target dataset to load.

  • feat_type ({"matminer", "matminer_lob"}, default="matminer") – Feature type used to select feature data.

  • rename_features (bool, default=True) – If True, clean feature column names and append physical units.

Returns:

  • target_df (pd.DataFrame) – DataFrame containing target values.

  • feature_df (pd.DataFrame) – DataFrame containing feature values.

Raises:

ValueError – If feat_type is not one of the allowed values.

Return type:

tuple[DataFrame, DataFrame]
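
Example:

The following is a minimal usage sketch, not taken from the package itself. It assumes mlproject is installed and that the directory passed in has the layout described under Parameters. The names validate_feat_type, load_example, and ALLOWED_FEAT_TYPES are hypothetical helpers introduced for illustration only; the pre-check simply mirrors the ValueError condition documented under Raises and is not necessarily how get_dataset validates its input internally.

```python
from pathlib import Path

# Allowed values as listed for feat_type in the parameter documentation.
ALLOWED_FEAT_TYPES = ("matminer", "matminer_lob")


def validate_feat_type(feat_type):
    """Hypothetical pre-check mirroring the documented ValueError condition."""
    if feat_type not in ALLOWED_FEAT_TYPES:
        raise ValueError(
            f"feat_type must be one of {ALLOWED_FEAT_TYPES}, got {feat_type!r}"
        )
    return feat_type


def load_example(data_parent_dir, feat_type="matminer"):
    """Hypothetical wrapper: call get_dataset with the documented arguments."""
    # Imported lazily so this sketch can be imported without mlproject installed.
    from mlproject.data.preprocessing import get_dataset

    # get_dataset is documented to return a (target_df, feature_df) tuple.
    target_df, feature_df = get_dataset(
        data_parent_dir=Path(data_parent_dir),
        target_name="last_phdos_peak",
        feat_type=validate_feat_type(feat_type),
        rename_features=True,
    )
    return target_df, feature_df
```

Checking feat_type before the call lets a typo fail early with a clear message, before any files are read. Calling load_example requires a real data directory, so no output is shown here.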