get_dataset
- mlproject.data.preprocessing.get_dataset(data_parent_dir, target_name='last_phdos_peak', feat_type='matminer', rename_features=True)
Load target and feature datasets for a given target and feature type.
- Parameters:
data_parent_dir (str | Path) – Parent directory containing targets and features subdirectories.
target_name (str, default="last_phdos_peak") – Name of the target dataset to load.
feat_type ({"matminer", "matminer_lob"}, default="matminer") – Feature type used to select feature data.
rename_features (bool, default=True) – If True, clean feature column names and append physical units.
- Returns:
target_df (pd.DataFrame) – DataFrame containing target values.
feature_df (pd.DataFrame) – DataFrame containing feature values.
- Raises:
ValueError – If feat_type is not one of the allowed values.
- Return type:
tuple[DataFrame, DataFrame]
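
The Raises entry above implies that feat_type is checked against the two documented values before any data is loaded. The following is a minimal sketch of such a check and of a caller-side fallback. It is not the library's implementation: `ALLOWED_FEAT_TYPES`, `check_feat_type`, and `safe_feat_type` are hypothetical names introduced only for illustration, and the error message wording is an assumption.

```python
# Hypothetical stand-in for the documented feat_type check; NOT mlproject's code.
# The two allowed values are taken from the parameter description above.
ALLOWED_FEAT_TYPES = ("matminer", "matminer_lob")


def check_feat_type(feat_type: str) -> str:
    """Return feat_type unchanged, or raise ValueError if it is not allowed."""
    if feat_type not in ALLOWED_FEAT_TYPES:
        # The message text is an assumption; the docs only state that
        # ValueError is raised for values outside the allowed set.
        raise ValueError(
            f"feat_type must be one of {ALLOWED_FEAT_TYPES}, got {feat_type!r}"
        )
    return feat_type


def safe_feat_type(feat_type: str, fallback: str = "matminer") -> str:
    """Caller-side pattern: use the documented default when the input is invalid."""
    try:
        return check_feat_type(feat_type)
    except ValueError:
        return fallback
```

With a value validated this way, a call such as `get_dataset(data_parent_dir, feat_type="matminer_lob")` should, per the signature and return type above, yield a `(target_df, feature_df)` tuple that can be unpacked directly.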