transform_csv
CLI module for transforming CSV data files.
Functions:
- load_transforms_from_config – Load the data config from a path.
- main – Transform the data according to the configuration.
- transform_batch – Transform a batch of data.
load_transforms_from_config
Load the data config from a path.
Parameters:
- data_config_path (str) – Path to the data config file.
Returns:
Source code in src/stimulus/cli/transform_csv.py
main
Transform the data according to the configuration.
Parameters:
- data_csv (str) – Path to input CSV file.
- config_yaml (str) – Path to config YAML file.
- out_path (str) – Path to output transformed CSV.
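The overall flow (read a CSV, transform columns, write a CSV) can be illustrated with a minimal standard-library sketch. This is not the implementation of main: the function name transform_csv_sketch is hypothetical, a plain dict of callables stands in for the YAML config, and dropping rows that contain NaN is an assumption based on the 'remove_row' convention described for transform_batch.

```python
import csv
import math
import tempfile
from pathlib import Path
from typing import Any, Callable


def transform_csv_sketch(
    data_csv: str,
    column_transforms: dict[str, Callable[[list[str]], list[Any]]],
    out_path: str,
) -> None:
    """Read a CSV, transform selected columns, drop NaN-marked rows, write the result."""
    with open(data_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    # Work column-wise so each transform sees a whole column at once.
    columns: dict[str, list[Any]] = {
        name: [row[name] for row in rows] for name in fieldnames
    }
    for name, func in column_transforms.items():
        columns[name] = func(columns[name])

    def is_nan(value: Any) -> bool:
        return isinstance(value, float) and math.isnan(value)

    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(fieldnames)
        for i in range(len(rows)):
            record = [columns[name][i] for name in fieldnames]
            if any(is_nan(value) for value in record):
                continue  # assumption: a NaN marks the row for removal
            writer.writerow(record)


# Demo on a temporary two-row file; the lambda is a hypothetical transform
# that converts scores to float and marks negative ones with NaN.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.csv"
    dst = Path(tmp) / "out.csv"
    src.write_text("name,score\na,1\nb,-2\n")
    transform_csv_sketch(
        str(src),
        {"score": lambda col: [float("nan") if float(v) < 0 else float(v) for v in col]},
        str(dst),
    )
    result_lines = dst.read_text().splitlines()
```

In this demo the second row carries a negative score, so it is marked with NaN and left out of the output file.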
Source code in src/stimulus/cli/transform_csv.py
transform_batch
Transform a batch of data.
This function applies a series of configured transformations to specified columns within a batch. It assumes that each transformation's transform_all method returns a list of the same length as its input.
For 'remove_row' transforms, np.nan is expected in the output list for removed items. The 'add_row' flag's effect on overall dataset structure (like row duplication) is handled outside this function, based on its output.
Parameters:
- batch (LazyBatch) – The input batch of data (a Hugging Face LazyBatch).
- transforms_config (dict[str, list[Any]]) – A dictionary where keys are column names and values are lists of transform objects to be applied to that column.
Returns:
- dict[str, list] – A dictionary representing the transformed batch, with all original columns present and modified columns updated according to the transforms.
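The contract described above can be sketched as follows. This is an illustration under stated assumptions, not the function's source: a plain dict stands in for the LazyBatch, the UpperCase and DropNegative classes are hypothetical transforms that expose only transform_all, the length check and the skipping of unknown columns are assumptions, and float("nan") is used in place of np.nan (the same float value) to avoid a NumPy dependency.

```python
import math
from typing import Any


class UpperCase:
    """Hypothetical transform: upper-cases every string in a column."""

    def transform_all(self, values: list[Any]) -> list[Any]:
        return [value.upper() for value in values]


class DropNegative:
    """Hypothetical 'remove_row'-style transform: marks negative values with NaN."""

    def transform_all(self, values: list[Any]) -> list[Any]:
        return [float("nan") if value < 0 else value for value in values]


def transform_batch_sketch(
    batch: dict[str, list[Any]],
    transforms_config: dict[str, list[Any]],
) -> dict[str, list[Any]]:
    """Apply each column's transforms in order; copy all other columns unchanged."""
    result = {name: list(values) for name, values in batch.items()}
    for column, transforms in transforms_config.items():
        if column not in result:
            continue  # assumption: skip configured columns absent from the batch
        for transform in transforms:
            transformed = transform.transform_all(result[column])
            # Documented assumption: output length equals input length.
            if len(transformed) != len(result[column]):
                raise ValueError(
                    f"{type(transform).__name__} changed the length of column {column!r}"
                )
            result[column] = transformed
    return result


batch = {"seq": ["acgt", "ttga"], "score": [1.5, -2.0], "id": [1, 2]}
config = {"seq": [UpperCase()], "score": [DropNegative()]}
out = transform_batch_sketch(batch, config)

# The negative score is replaced by NaN rather than removed, so every
# column keeps its original length; row removal is left to the caller.
removed_flags = [isinstance(v, float) and math.isnan(v) for v in out["score"]]
```

Because the NaN marker keeps all columns the same length, a caller can decide afterwards which rows to drop without the batch becoming ragged.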
Source code in src/stimulus/cli/transform_csv.py