split_yaml

CLI module for splitting YAML configuration files.

This module provides functionality to split a single YAML configuration file into multiple YAML files, each containing a specific combination of data transformations and splits. The resulting YAML files can be used as input configurations for the stimulus package.

Functions:

  • get_args

    Parse the arguments when the module is run from the command line.

  • main

    Reads a YAML config file and generates all possible data configurations.

get_args

get_args() -> Namespace

Parse the arguments when the module is run from the command line.

Source code in src/stimulus/cli/split_yaml.py
import argparse


def get_args() -> argparse.Namespace:
    """Parse the arguments when the module is run from the command line."""
    parser = argparse.ArgumentParser(
        description="Split a YAML config file into one YAML file per transform and split combination.",
    )
    parser.add_argument(
        "-j",
        "--yaml",
        type=str,
        required=True,
        metavar="FILE",
        help="The YAML config file that holds all transform - split - parameter info",
    )
    # nargs="?" with const: a bare "-d" (no value) also resolves to "./".
    parser.add_argument(
        "-d",
        "--out_dir",
        type=str,
        required=False,
        nargs="?",
        const="./",
        default="./",
        metavar="DIR",
        help="The output directory where all the YAML files are written. Output YAML files will be called split-#[number].yaml transform-#[number].yaml. Default: ./",
    )

    return parser.parse_args()
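
As a quick illustration of how these two options behave, the sketch below rebuilds the same parser and feeds it explicit argument lists instead of reading `sys.argv`. The helper name `build_parser` and the file and directory names are assumptions made for the example; they are not part of the module.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: rebuilds the parser that get_args() defines."""
    parser = argparse.ArgumentParser(description="")
    parser.add_argument("-j", "--yaml", type=str, required=True, metavar="FILE")
    parser.add_argument(
        "-d",
        "--out_dir",
        type=str,
        required=False,
        nargs="?",
        const="./",
        default="./",
        metavar="DIR",
    )
    return parser


parser = build_parser()

# Only the required option is given: out_dir falls back to `default`.
only_yaml = parser.parse_args(["-j", "config.yaml"])

# "-d" is given without a value: nargs="?" makes argparse use `const`.
bare_flag = parser.parse_args(["-j", "config.yaml", "-d"])

# "--out_dir" is given with a value: the value is stored as-is.
explicit = parser.parse_args(["--yaml", "config.yaml", "--out_dir", "configs/"])

print(only_yaml.yaml, only_yaml.out_dir)
print(bare_flag.out_dir)
print(explicit.out_dir)
```

Passing a list to `parse_args` keeps the example independent of the real command line, which is also a convenient way to exercise an argparse setup in tests.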

main

main(config_yaml: str, out_dir_path: str) -> None

Reads a YAML config file and generates all possible data configurations.

This script reads a YAML with a defined structure and creates all the YAML files ready to be passed to the stimulus package.

The structure of the YAML is described here -> TODO paste here link to documentation. This YAML and its structure summarize how to generate all the transform, split, and respective parameter combinations. Each resulting YAML will hold only one combination of the above three things.

This script will always generate at least one YAML file, which represents the combination that does not touch the data (no transform) and uses the default split behavior.
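
The idea of "one combination per output file" can be pictured as a Cartesian product. The following is a minimal sketch, not the code behind `generate_data_configs` (which is not shown on this page): the input layout, the key names `transforms` and `split`, the example values, and the helper `expand_configs` are all assumptions chosen for illustration.

```python
from itertools import product
from typing import Any


def expand_configs(
    transforms: list[dict[str, Any]],
    splits: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Hypothetical helper: build one config per (transform, split) pair."""
    return [
        {"transforms": transform, "split": split}
        for transform, split in product(transforms, splits)
    ]


# Made-up example values; the real schema is defined by the stimulus package.
transforms = [{"name": "noise"}, {"name": "shuffle"}]
splits = [{"name": "random"}, {"name": "by_group"}, {"name": "k_fold"}]

configs = expand_configs(transforms, splits)

# 2 transforms x 3 splits give 6 combinations, each holding exactly one pair.
print(len(configs))
```

Each element of `configs` would then correspond to one output YAML file.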

Source code in src/stimulus/cli/split_yaml.py
from typing import Any

import yaml

# YamlConfigDict, check_yaml_schema, generate_data_configs and
# dump_yaml_list_into_files are defined or imported elsewhere in the
# source file (not shown in this excerpt).


def main(config_yaml: str, out_dir_path: str) -> None:
    """Reads a YAML config file and generates all possible data configurations.

    This script reads a YAML with a defined structure and creates all the YAML files ready to be passed to
    the stimulus package.

    The structure of the YAML is described here -> TODO paste here link to documentation.
    This YAML and its structure summarize how to generate all the transform, split, and respective parameter combinations.
    Each resulting YAML will hold only one combination of the above three things.

    This script will always generate at least one YAML file, which represents the combination that does not touch the data (no transform)
    and uses the default split behavior.
    """
    # Read the YAML experiment config and load it into a dictionary.
    with open(config_yaml) as conf_file:
        yaml_config: dict[str, Any] = yaml.safe_load(conf_file)

    # Build the typed config object and check that the YAML schema is correct.
    yaml_config_dict: YamlConfigDict = YamlConfigDict(**yaml_config)
    check_yaml_schema(yaml_config_dict)

    # Generate all the YAML configs.
    data_configs = generate_data_configs(yaml_config_dict)

    # Dump all the YAML configs into files, using "test" as the file prefix.
    dump_yaml_list_into_files(data_configs, out_dir_path, "test")
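
The final step writes one file per generated configuration. Since `dump_yaml_list_into_files` is not shown on this page, the sketch below is only a guess at the general shape of such a step: the helper `dump_configs`, the `<prefix>_<index>.yaml` naming, and the use of `json.dumps` are assumptions. JSON is used because it needs only the standard library and a JSON document is also valid YAML 1.2; a YAML serializer would be the natural choice in practice.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def dump_configs(configs: list[dict[str, Any]], out_dir: str, prefix: str) -> list[Path]:
    """Hypothetical helper: write each config to <out_dir>/<prefix>_<index>.yaml."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, config in enumerate(configs):
        target = out_path / f"{prefix}_{index}.yaml"
        # JSON output is used here only to avoid a third-party dependency.
        target.write_text(json.dumps(config, indent=2) + "\n")
        written.append(target)
    return written


# Write two made-up configs into a temporary directory and read one back.
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = dump_configs([{"split": "random"}, {"split": "k_fold"}], tmp_dir, "test")
    names = [path.name for path in paths]
    first = json.loads(paths[0].read_text())

print(names)
```

Returning the written paths makes it easy for a caller, or a test, to check what was produced without listing the directory again.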