
split_split

CLI module for splitting YAML configuration files into unique files for each split.

This module provides functionality to split a single YAML configuration file into multiple YAML files, each containing a unique split. The resulting YAML files can be used as input configurations for the stimulus package.

Functions:

  • split_split

    Reads a YAML config file and generates a file per unique split.

split_split

split_split(config_yaml: str, out_dir_path: str) -> None

Reads a YAML config file and generates a file per unique split.

This script reads a YAML file with a defined structure and creates all the YAML files that are ready to be passed to the stimulus package.

The structure of the YAML file is documented separately (TODO: add link to documentation). This YAML file and its structure specify how to generate the unique splits and all the transformations associated with each split.

This script always generates at least one YAML file, representing the combination that leaves the data untouched (no transform) and uses the default split behavior.
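The fan-out described above (one output config per unique split, with a default split as a fallback so that at least one config is always produced) can be sketched in plain Python. This is a minimal illustration, not stimulus's `generate_split_configs`: the `splits` and `split` keys, the `DEFAULT_SPLIT` value, and the name `expand_unique_splits` are all hypothetical, and uniqueness is assumed to mean "identical after sorting keys".

```python
import copy
import json
from typing import Any

# Hypothetical stand-in for stimulus's default split behavior.
DEFAULT_SPLIT: dict[str, Any] = {"method": "default", "params": {}}


def expand_unique_splits(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one config per unique entry of the hypothetical ``splits`` key.

    Every other top-level key is copied unchanged; the ``splits`` list is
    replaced by a single ``split`` entry. If no splits are given, the
    default split is used, so the result always has at least one config.
    """
    splits = config.get("splits") or [DEFAULT_SPLIT]
    seen: set[str] = set()
    result: list[dict[str, Any]] = []
    for split in splits:
        # Canonical JSON (sorted keys) is a hashable fingerprint for deduplication.
        fingerprint = json.dumps(split, sort_keys=True)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        single = copy.deepcopy(config)  # never mutate the caller's config
        single.pop("splits", None)
        single["split"] = copy.deepcopy(split)
        result.append(single)
    return result


if __name__ == "__main__":
    cfg = {
        "columns": ["x", "y"],
        "splits": [
            {"method": "random", "ratio": 0.8},
            {"ratio": 0.8, "method": "random"},  # same split, different key order
            {"method": "kfold"},
        ],
    }
    print(len(expand_unique_splits(cfg)))  # 2
```

Deep-copying each config keeps the per-split outputs independent, so later per-split edits (for example, attaching transforms) cannot leak between files.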

Source code in src/stimulus/cli/split_split.py
import logging
from typing import Any

import yaml

# Assumed import path: the excerpt does not show where data_config_parser is imported from.
from stimulus.data.interface import data_config_parser

logger = logging.getLogger(__name__)


def split_split(config_yaml: str, out_dir_path: str) -> None:
    """Reads a YAML config file and generates a file per unique split.

    This script reads a YAML file with a defined structure and creates all the YAML files that are
    ready to be passed to the stimulus package.

    The structure of the YAML file is documented separately (TODO: add link to documentation).
    This YAML file and its structure specify how to generate the unique splits and all the
    transformations associated with each split.

    This script always generates at least one YAML file, representing the combination that leaves
    the data untouched (no transform) and uses the default split behavior.
    """
    # Read the YAML experiment config and load it into a dictionary.
    yaml_config: dict[str, Any] = {}
    with open(config_yaml) as conf_file:
        yaml_config = yaml.safe_load(conf_file)

    # Build a ConfigDict from the raw dictionary.
    yaml_config_dict = data_config_parser.ConfigDict(**yaml_config)

    logger.info("YAML config loaded successfully.")

    # Generate one config per unique split.
    split_configs = data_config_parser.generate_split_configs(yaml_config_dict)

    logger.info("Splits generated successfully.")

    # Dump every split config into its own YAML file in out_dir_path, using "test_split" as the base name.
    data_config_parser.dump_yaml_list_into_files(split_configs, out_dir_path, "test_split")

    logger.info("YAML files saved successfully.")
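The final step writes each generated config to its own file under `out_dir_path` with the base name `"test_split"`. The sketch below shows one way such a writer could work using only the standard library; it is not `dump_yaml_list_into_files`. The function name `dump_configs` and the `<base_name>_<index>.yaml` naming scheme are hypothetical. It emits JSON, which YAML 1.2 treats as valid flow-style YAML, so a YAML loader can read the files back; with PyYAML installed, `yaml.safe_dump` would produce the more familiar block style.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def dump_configs(configs: list[dict[str, Any]], out_dir: str, base_name: str) -> list[Path]:
    """Write each config to ``<out_dir>/<base_name>_<index>.yaml`` (hypothetical naming)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the output directory if needed
    written: list[Path] = []
    for index, config in enumerate(configs):
        target = out_path / f"{base_name}_{index}.yaml"
        # JSON output is valid YAML 1.2 flow syntax, so YAML loaders can parse it.
        target.write_text(json.dumps(config, indent=2, sort_keys=True) + "\n", encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in dump_configs([{"split": {"method": "kfold"}}], tmp, "test_split"):
            print(path.name)  # test_split_0.yaml
```

Numbering files by position keeps names unique without inspecting their contents; a scheme derived from the split parameters would be more descriptive but must then guard against collisions.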