transform_csv

CLI module for transforming CSV data files.

Functions:

  • get_args

    Get the arguments when run from the command line.

  • main

    Connect CSV and YAML configuration and handle sanity checks.

  • run

    Run the CSV transformation script.

get_args

get_args() -> Namespace

Get the arguments when run from the command line.

Source code in src/stimulus/cli/transform_csv.py
def get_args() -> argparse.Namespace:
    """Get the arguments when using from the commandline."""
    parser = argparse.ArgumentParser(description="CLI for transforming CSV data files using YAML configuration.")
    parser.add_argument(
        "-c",
        "--csv",
        type=str,
        required=True,
        metavar="FILE",
        help="The file path for the csv containing all data",
    )
    parser.add_argument(
        "-y",
        "--yaml",
        type=str,
        required=True,
        metavar="FILE",
        help="The YAML config file that holds all parameter info",
    )
    parser.add_argument(
        "-o",
        "--output",
        type=str,
        required=True,
        metavar="FILE",
        help="The output file path to write the noised csv",
    )

    return parser.parse_args()
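
The parser above can be exercised without touching `sys.argv` by passing an explicit argument list to `parse_args`. The following is a minimal sketch, not part of the module: `build_parser` is a hypothetical helper that rebuilds the same three options, and the file names are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: rebuild the parser defined in get_args."""
    parser = argparse.ArgumentParser(
        description="CLI for transforming CSV data files using YAML configuration.",
    )
    # Same three required options as get_args, declared in a loop for brevity.
    options = [
        ("-c", "--csv", "The file path for the csv containing all data"),
        ("-y", "--yaml", "The YAML config file that holds all parameter info"),
        ("-o", "--output", "The output file path to write the noised csv"),
    ]
    for short_flag, long_flag, help_text in options:
        parser.add_argument(
            short_flag,
            long_flag,
            type=str,
            required=True,
            metavar="FILE",
            help=help_text,
        )
    return parser


# Placeholder file names; parse_args accepts an explicit list instead of sys.argv.
args = build_parser().parse_args(["-c", "data.csv", "-y", "config.yaml", "-o", "out.csv"])
print(args.csv, args.yaml, args.output)  # data.csv config.yaml out.csv

# Because every option is required, omitting one makes argparse print a usage
# error to stderr and exit with status 2.
try:
    build_parser().parse_args(["-c", "data.csv"])
except SystemExit as exc:
    missing_exit_code = exc.code
```

Each long option name becomes an attribute on the returned `Namespace` (`args.csv`, `args.yaml`, `args.output`), which is what `run` relies on.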

main

main(
    data_csv: str, config_yaml: str, out_path: str
) -> None

Connect CSV and YAML configuration and handle sanity checks.

This launcher connects the CSV file to a single YAML configuration. It should also handle some sanity checks.

Source code in src/stimulus/cli/transform_csv.py
def main(data_csv: str, config_yaml: str, out_path: str) -> None:
    """Connect CSV and YAML configuration and handle sanity checks.

    This launcher will be the connection between the csv and one YAML configuration.
    It should also handle some sanity checks.
    """
    # initialize the CSV processing class; it opens and reads the CSV automatically
    processor = DatasetProcessor(config_path=config_yaml, csv_path=data_csv)

    # initialize the transform manager
    transform_config = processor.dataset_manager.config.transforms
    with open(config_yaml) as f:
        yaml_config = YamlSubConfigDict(**yaml.safe_load(f))
    transform_loader = TransformLoader(seed=yaml_config.global_params.seed)
    transform_loader.initialize_column_data_transformers_from_config(transform_config)
    transform_manager = TransformManager(transform_loader)

    # apply the transformations to the data
    processor.apply_transformation_group(transform_manager)

    # write the transformed data to a new csv
    processor.save(out_path)

run

run() -> None

Run the CSV transformation script.

Source code in src/stimulus/cli/transform_csv.py
def run() -> None:
    """Run the CSV transformation script."""
    args = get_args()
    main(args.csv, args.yaml, args.output)
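
To illustrate how `run` maps the parsed options onto the positional parameters of `main`, here is a self-contained sketch. It is an assumption-laden stand-in, not the module itself: `main` is replaced by a stub that only records its arguments, `get_args` is reduced to the three option names, and the file names are placeholders.

```python
import argparse
import sys
from unittest import mock

# Collects the arguments the stub main receives.
calls = []


def get_args() -> argparse.Namespace:
    """Reduced copy of get_args: same option names, no help text."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--csv", type=str, required=True)
    parser.add_argument("-y", "--yaml", type=str, required=True)
    parser.add_argument("-o", "--output", type=str, required=True)
    return parser.parse_args()


def main(data_csv: str, config_yaml: str, out_path: str) -> None:
    """Stand-in for the real main: records its arguments instead of transforming data."""
    calls.append({"data_csv": data_csv, "config_yaml": config_yaml, "out_path": out_path})


def run() -> None:
    """Same wiring as the documented run."""
    args = get_args()
    main(args.csv, args.yaml, args.output)


# Simulate a command-line invocation by temporarily replacing sys.argv.
fake_argv = ["transform_csv", "-c", "in.csv", "-y", "cfg.yaml", "-o", "out.csv"]
with mock.patch.object(sys, "argv", fake_argv):
    run()

print(calls[0])
```

The stub shows that `--csv` feeds `data_csv`, `--yaml` feeds `config_yaml`, and `--output` feeds `out_path`, in that order.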