splitting
This package provides splitter classes for splitting data into train, validation, and test sets.
Modules:
- splitters: This file contains the splitter classes that divide data into train, validation, and test sets.
Classes:
- AbstractSplitter: Abstract class for splitters.
- RandomSplit: This splitter randomly splits the data.
AbstractSplitter
AbstractSplitter(seed: float = 42)
Bases: ABC
Abstract class for splitters.
A splitter splits the data into train, validation, and test sets.
Methods:
- get_split_indexes: Calculates split indices for the data.
- distance: Calculates the distance between two elements of the data.
Parameters:
- seed (float, default: 42): Random seed for reproducibility.
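The interface described above can be illustrated with a self-contained sketch. Because the snippet does not import stimulus, it defines a stand-in `AbstractSplitter` with the documented constructor and the two abstract methods. The parameter names `data_one`/`data_two` and the subclass `ContiguousSplit` (a non-random 70/20/10 split with absolute difference as its distance) are hypothetical and exist only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any


class AbstractSplitter(ABC):
    """Stand-in mirroring the documented interface; not imported from stimulus."""

    def __init__(self, seed: float = 42) -> None:
        self.seed = seed

    @abstractmethod
    def get_split_indexes(self, data: Any) -> list:
        """Return train, validation, and test indices into the original data."""

    @abstractmethod
    def distance(self, data_one: Any, data_two: Any) -> float:
        """Return the distance between two elements of the data."""


class ContiguousSplit(AbstractSplitter):
    """Hypothetical splitter: first 70% train, next 20% validation, rest test."""

    def get_split_indexes(self, data: list) -> list:
        n = len(data)
        n_train = int(0.7 * n)
        n_val = int(0.2 * n)
        indices = list(range(n))
        # Indices always refer to positions in the original data.
        return [
            indices[:n_train],
            indices[n_train : n_train + n_val],
            indices[n_train + n_val :],
        ]

    def distance(self, data_one: float, data_two: float) -> float:
        # Placeholder metric for numeric elements: absolute difference.
        return abs(data_one - data_two)


splitter = ContiguousSplit(seed=1)
train, validation, test = splitter.get_split_indexes(list(range(10)))
```

Declaring both methods with `@abstractmethod` means Python refuses to instantiate any subclass that leaves either one unimplemented.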
Source code in src/stimulus/data/splitting/splitters.py
distance (abstractmethod)
Calculates the distance between two elements of the data.
This is an abstract method that should be implemented by the child class.
Parameters:
Returns:
- distance (float): The distance between the two data points.
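`distance` is left to child classes, and the documentation does not prescribe a metric. As one hypothetical choice for string-valued data such as sequences, Hamming distance counts the positions at which two equal-length strings differ. A subclass could delegate its `distance` to a helper like the one below. The function name `hamming_distance` is an assumption, not part of stimulus. It returns a `float` to match the documented return type.

```python
def hamming_distance(seq_one: str, seq_two: str) -> float:
    """Count the positions at which two equal-length strings differ."""
    if len(seq_one) != len(seq_two):
        raise ValueError("Hamming distance requires sequences of equal length")
    # Compare aligned characters pairwise; each mismatch (True) counts as 1.
    return float(sum(a != b for a, b in zip(seq_one, seq_two)))
```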
get_split_indexes (abstractmethod)
Splits the data. Implementations must always return indices that map to positions in the original data.
This is an abstract method that should be implemented by the child class.
Parameters:
- data (DataFrame): The data to be split.
Returns:
- split_indices (list): The indices for the train, validation, and test sets.
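Because the returned indices must map back to the original data, a caller can sanity-check a splitter's output. This is a minimal sketch that assumes the three sets are meant to be disjoint and to cover every row together, as a train/validation/test split normally is. The documentation itself states only that indices map to the original list. `is_valid_partition` is a hypothetical helper, not part of stimulus.

```python
def is_valid_partition(train: list, validation: list, test: list, n: int) -> bool:
    """Return True if the three index lists together cover range(n) exactly once."""
    combined = [*train, *validation, *test]
    # Comparing the sorted indices with 0..n-1 rejects out-of-range,
    # duplicated (overlapping), and missing indices in a single check.
    return sorted(combined) == list(range(n))
```

For example, `is_valid_partition([0, 3], [1], [2], 4)` holds, whereas an index appearing in two sets makes it fail.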
RandomSplit
Bases: AbstractSplitter
This splitter randomly splits the data.
Parameters:
- split (Optional[list], default: None): List of proportions for the train/validation/test splits.
- seed (int, default: 42): Random seed for reproducibility.
Methods:
- distance: Calculate distance between two data points.
- get_split_indexes: Splits the data indices into train, validation, and test sets.
distance
Calculate distance between two data points.
Parameters:
Returns:
- float: Distance between the points.
get_split_indexes
Splits the data indices into train, validation, and test sets.
The returned index lists can then be used to select the corresponding rows from the data.
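With the documented `dict` input (column names mapped to lists of values), applying an index list means picking the same positions from every column. This is a minimal sketch. `select_rows` and the sample columns are hypothetical and do not come from stimulus.

```python
def select_rows(data: dict[str, list], indices: list[int]) -> dict[str, list]:
    """Keep only the rows at the given positions, in the order of `indices`."""
    return {column: [values[i] for i in indices] for column, values in data.items()}


# Hypothetical column-oriented data matching the documented dict input.
data = {"sequence": ["AC", "GT", "TT", "CA"], "label": [0, 1, 1, 0]}
train_data = select_rows(data, [0, 2])
# {'sequence': ['AC', 'TT'], 'label': [0, 1]}
```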
Parameters:
- data (dict): Dictionary mapping column names to lists of data values.
Returns:
- train (list): The indices for the training set.
- validation (list): The indices for the validation set.
- test (list): The indices for the test set.
Raises:
- ValueError: If the split argument is not a list with length 3.
- ValueError: If the sum of the split proportions is not 1.
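The documented behaviour can be approximated in plain Python: a random split that is reproducible through a seed, returns index lists, and raises `ValueError` on a malformed `split`. This is a sketch, not the library's implementation. `random_split_indexes` is a hypothetical free function that requires `split` explicitly, so it makes no assumption about what the `None` default resolves to. It also uses `random.Random` for seeded shuffling, so its indices will not match those produced by RandomSplit. Finally, it assumes every column has the same length.

```python
import random


def random_split_indexes(
    data: dict, split: list, seed: int = 42
) -> tuple[list, list, list]:
    """Hypothetical sketch of a seeded random train/validation/test split."""
    if len(split) != 3:
        raise ValueError("split must be a list of three proportions")
    # Round before comparing so float sums such as 0.7 + 0.2 + 0.1
    # are not rejected because of tiny rounding errors.
    if round(sum(split), 3) != 1.0:
        raise ValueError(f"split proportions must sum to 1, got {sum(split)}")
    if not data:
        raise ValueError("no data provided for splitting")

    # Assumes all columns hold the same number of rows; use the first one.
    n = len(next(iter(data.values())))
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed gives the same permutation

    n_train = int(split[0] * n)
    n_val = int(split[1] * n)
    train = indices[:n_train]
    validation = indices[n_train : n_train + n_val]
    test = indices[n_train + n_val :]  # test absorbs any rounding remainder
    return train, validation, test
```

Cutting one shuffled permutation into consecutive slices guarantees that the three sets are disjoint. Flooring the train and validation sizes means the test set can end up slightly larger than its nominal proportion.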