shuffle_csv

CLI module for shuffling CSV data files.

Functions:

  • get_args

    Parse the arguments supplied on the command line.

  • main

    Shuffle the labels of the input CSV with a default seed and save the result.

  • run

    Run the CSV shuffling script.

get_args

get_args() -> Namespace

Parse the arguments supplied on the command line.

Returns:

  • Namespace

    Parsed command line arguments.

Source code in src/stimulus/cli/shuffle_csv.py
def get_args() -> argparse.Namespace:
    """Parse the arguments supplied on the command line.

    Returns:
        Parsed command line arguments.
    """
    parser = argparse.ArgumentParser(description="Shuffle rows in a CSV data file.")
    parser.add_argument(
        "-c",
        "--csv",
        type=str,
        required=True,
        metavar="FILE",
        help="The file path for the CSV containing all data",
    )
    parser.add_argument(
        "-y",
        "--yaml",
        type=str,
        required=True,
        metavar="FILE",
        help="The YAML config file that holds all parameter info",
    )
    parser.add_argument(
        "-o",
        "--output",
        type=str,
        required=True,
        metavar="FILE",
        help="The output file path to write the shuffled CSV",
    )

    return parser.parse_args()
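
The parser above reads `sys.argv` directly, which makes it awkward to try out interactively. The following is a minimal sketch, not part of the module, that rebuilds the same three required options and parses an explicit argument list instead. The names `build_parser` and `parse` are hypothetical helpers introduced only for this illustration, and the help strings are shortened.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: rebuild the three required options shown above."""
    parser = argparse.ArgumentParser(description="Shuffle rows in a CSV data file.")
    for short, long, help_text in (
        ("-c", "--csv", "Input CSV file"),
        ("-y", "--yaml", "YAML config file"),
        ("-o", "--output", "Output CSV file"),
    ):
        parser.add_argument(
            short, long, type=str, required=True, metavar="FILE", help=help_text
        )
    return parser


def parse(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse an explicit argument list; None falls back to sys.argv[1:]."""
    return build_parser().parse_args(argv)


args = parse(["-c", "data.csv", "-y", "config.yaml", "-o", "shuffled.csv"])
print(args.csv, args.yaml, args.output)
```

Because every option is declared with `required=True`, omitting any of them makes `argparse` print a usage message and exit with status 2 rather than return a partial `Namespace`.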

main

main(
    data_csv: str, config_yaml: str, out_path: str
) -> None

Shuffle the labels of the input CSV with a default seed and save the result.

Parameters:

  • data_csv (str) –

    Path to input CSV file.

  • config_yaml (str) –

    Path to config YAML file.

  • out_path (str) –

    Path to output shuffled CSV.

TODO: expect major changes once this can select a given shuffle method and is integrated with split.

Source code in src/stimulus/cli/shuffle_csv.py
def main(data_csv: str, config_yaml: str, out_path: str) -> None:
    """Shuffle the labels of the input CSV with a default seed and save the result.

    Args:
        data_csv: Path to input CSV file.
        config_yaml: Path to config YAML file.
        out_path: Path to output shuffled CSV.

    TODO: expect major changes once this can select a given shuffle method and is integrated with split.
    """
    # create a DatasetProcessor object from the config and the csv
    processor = DatasetProcessor(config_path=config_yaml, csv_path=data_csv)

    # shuffle the labels with a default seed. TODO: get the seed from the config if and when it is set there.
    processor.shuffle_labels(seed=42)

    # save the modified csv
    processor.save(out_path)
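
The listing above does not show what `DatasetProcessor.shuffle_labels` does internally. As a rough illustration of the general idea of a seeded label shuffle, the sketch below permutes a single column of a small in-memory CSV using only the standard library. The function `shuffle_column` and the column names `seq` and `label` are assumptions made for this example and are not taken from the stimulus code base.

```python
import csv
import io
import random


def shuffle_column(rows, column, seed=42):
    """Return copies of rows with the values of one column permuted."""
    values = [row[column] for row in rows]
    # A local Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


# Hypothetical toy data: one feature column and one label column.
raw = "seq,label\nAAA,0\nCCC,1\nGGG,2\nTTT,3\n"
rows = list(csv.DictReader(io.StringIO(raw)))
shuffled = shuffle_column(rows, "label", seed=42)

# Features stay in place, the labels are only reordered,
# and the same seed reproduces the same permutation.
assert [r["seq"] for r in shuffled] == ["AAA", "CCC", "GGG", "TTT"]
assert sorted(r["label"] for r in shuffled) == ["0", "1", "2", "3"]
assert shuffled == shuffle_column(rows, "label", seed=42)
```

Fixing the seed, as `main` does with `seed=42`, makes the shuffled output reproducible across runs, which is why the TODO about reading the seed from the config matters if different permutations are ever needed.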

run

run() -> None

Run the CSV shuffling script.

Source code in src/stimulus/cli/shuffle_csv.py
def run() -> None:
    """Run the CSV shuffling script."""
    args = get_args()
    main(args.csv, args.yaml, args.output)