splitting
This package provides splitter classes for splitting data into train, validation, and test sets.
Modules:
- splitters – Contains the splitter classes used to split the data.
Classes:
- AbstractSplitter – Abstract class for splitters.
- RandomSplit – This splitter randomly splits the data.
AbstractSplitter
AbstractSplitter(seed: float = 42)
Bases: ABC
Abstract class for splitters.
A splitter splits the data into train, validation, and test sets.
Methods:
- get_split_indexes – Calculates split indices for the data.
- distance – Calculates the distance between two elements of the data.
Parameters:
- seed (float, default: 42) – Random seed for reproducibility.
Source code in src/stimulus/data/splitting/splitters.py
distance abstractmethod
Calculates the distance between two elements of the data.
This is an abstract method that should be implemented by the child class.
Parameters:
Returns:
- distance (float) – The distance between the two data points.
Source code in src/stimulus/data/splitting/splitters.py
get_split_indexes abstractmethod
Splits the data. Always returns indices that map to the original list.
This is an abstract method that should be implemented by the child class.
Parameters:
- data (dict) – The data to be split.
Returns:
- split_indices (list) – The indices for the train and test sets.
Source code in src/stimulus/data/splitting/splitters.py
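The abstract interface described above can be sketched as follows. This is a minimal illustration, not the stimulus source: the `AbstractSplitter` below is a stand-in that mirrors only the documented constructor and the two abstract methods. The `FirstHalfSplit` subclass, the `data_one`/`data_two` argument names, and the two-list return shape are hypothetical, since the reference does not list them.

```python
from abc import ABC, abstractmethod


class AbstractSplitter(ABC):
    """Stand-in mirroring the documented interface (not the stimulus source)."""

    def __init__(self, seed: float = 42) -> None:
        self.seed = seed

    @abstractmethod
    def get_split_indexes(self, data: dict) -> list:
        """Return index lists that map back to the original rows."""

    @abstractmethod
    def distance(self, data_one, data_two) -> float:
        """Return the distance between two data points (argument names assumed)."""


class FirstHalfSplit(AbstractSplitter):
    """Hypothetical splitter: the first half of the rows is train, the rest is test."""

    def get_split_indexes(self, data: dict) -> list:
        # Every column is assumed to have the same number of rows.
        n_rows = len(next(iter(data.values())))
        cut = n_rows // 2
        return [list(range(cut)), list(range(cut, n_rows))]

    def distance(self, data_one, data_two) -> float:
        # Absolute difference, chosen only to keep the example concrete.
        return float(abs(data_one - data_two))


splitter = FirstHalfSplit(seed=0)
train_idx, test_idx = splitter.get_split_indexes({"x": [10, 20, 30, 40]})
```

Because both methods are marked abstract, instantiating `AbstractSplitter` directly raises `TypeError`; a child class has to implement both before it can be used.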
RandomSplit
Bases: AbstractSplitter
This splitter randomly splits the data.
Parameters:
- split (Optional[list], default: None) – List of proportions for the train/val/test splits.
- seed (int, default: 42) – Random seed for reproducibility.
Methods:
- distance – Calculate distance between two data points.
- get_split_indexes – Splits the data indices into train and test sets.
Source code in src/stimulus/data/splitting/splitters.py
distance
Calculate distance between two data points.
Parameters:
Returns:
- float – Distance between the points.
Source code in src/stimulus/data/splitting/splitters.py
get_split_indexes
Splits the data indices into train, validation, and test sets.
The returned lists of indices can then be used to subset the data.
Parameters:
- data (dict) – Dictionary mapping column names to lists of data values.
Returns:
Raises:
- ValueError – If the split argument is not a list with length 3.
- ValueError – If the sum of the split proportions is not 1.
Source code in src/stimulus/data/splitting/splitters.py
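The behaviour described for `get_split_indexes` (validate three proportions, shuffle reproducibly, return index lists) might look roughly like the sketch below. It is an illustration under assumptions, not the code in `splitters.py`: the function name `random_split_indexes` is hypothetical, it uses the standard-library `random` module rather than whatever the package uses, the floating-point tolerance on the sum is a guess, and the three-list return shape is assumed because the Returns section above is empty.

```python
import random


def random_split_indexes(data: dict, split: list, seed: int = 42) -> tuple:
    """Shuffle row indices and cut them into train/validation/test lists."""
    if not isinstance(split, list) or len(split) != 3:
        raise ValueError("split must be a list of three proportions")
    # Tolerance is an assumption; proportions such as 0.7 + 0.2 + 0.1
    # do not sum to exactly 1.0 in floating point.
    if abs(sum(split) - 1.0) > 1e-9:
        raise ValueError("split proportions must sum to 1")

    # Every column is assumed to have the same number of rows.
    n_rows = len(next(iter(data.values())))
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible

    train_end = int(split[0] * n_rows)
    val_end = train_end + int(split[1] * n_rows)
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


data = {
    "sequence": ["A", "C", "G", "T", "AA", "CC", "GG", "TT", "AC", "GT"],
    "label": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
}
train, val, test = random_split_indexes(data, [0.7, 0.2, 0.1], seed=42)

# Use the index lists to subset every column of the original data.
train_data = {name: [values[i] for i in train] for name, values in data.items()}
```

Returning indices rather than copies of the data keeps the split cheap and lets the same index lists be applied to every column, which is what the last line does for the train set.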