class dynn.data.batching.numpy_batching.NumpyBatches(data, targets, batch_size=32, shuffle=True)

    Bases: object
    Wraps a list of numpy arrays and a list of targets as a batch iterator. You can then iterate over this object and get tuples of batch_data, batch_targets ready for use in your computation graph.

    Example for classification:
        # 1000 10-dimensional inputs
        data = np.random.uniform(size=(1000, 10))
        # Class labels
        labels = np.random.randint(10, size=1000)
        # Iterator
        batched_dataset = NumpyBatches(data, labels, batch_size=20)
        # Training loop
        for x, y in batched_dataset:
            # x has shape (10, 20) while y has shape (20,)
            # Do something with x and y
Example for multidimensional regression:
        # 1000 10-dimensional inputs
        data = np.random.uniform(size=(1000, 10))
        # 5-dimensional outputs
        labels = np.random.uniform(size=(1000, 5))
        # Iterator
        batched_dataset = NumpyBatches(data, labels, batch_size=20)
        # Training loop
        for x, y in batched_dataset:
            # x has shape (10, 20) while y has shape (5, 20)
            # Do something with x and y
    Parameters:
        - data (list) – list of numpy arrays containing the data
        - targets (list) – list of targets
        - batch_size (int, optional) – batch size (default: 32)
        - shuffle (bool, optional) – shuffle the dataset at the start of each iteration (default: True)
    __getitem__(index)

        Returns the index-th sample.

        This returns something different every time the data is shuffled. If index is a list or a slice, this will return a batch. The result is a tuple batch_data, batch_target where each element is a numpy array in Fortran layout (for more efficient input in dynet). The batch size is always the last dimension.

        Parameters: index (int, slice) – Index or slice
        Returns: batch_data, batch_target
        Return type: tuple
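The Fortran-layout, batch-last convention can be illustrated with plain numpy. This is a sketch of what indexing with a list produces, not the class's actual internals; the shapes assume 10-dimensional inputs as in the examples above:

```python
import numpy as np

# 100 samples of 10-dimensional data
data = np.random.uniform(size=(100, 10))
indices = [3, 7, 42]  # indexing with a list yields a batch of 3

# Transpose so the batch size (3) is the last dimension,
# then convert to Fortran (column-major) layout
batch_data = np.asfortranarray(data[indices].T)

print(batch_data.shape)                  # (10, 3)
print(batch_data.flags['F_CONTIGUOUS'])  # True
```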
    __init__(data, targets, batch_size=32, shuffle=True)

        Initialize self. See help(type(self)) for accurate signature.
    __len__()

        Returns the number of batches in the dataset (not the total number of samples).

        Returns: number of batches in the dataset, ceil(len(data) / batch_size)
        Return type: int
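The ceiling means a final, smaller batch is counted when the dataset size is not divisible by the batch size. A quick sketch of the arithmetic, using the sizes from the examples above:

```python
import math

# 1000 samples with batch_size=20 (as in the examples above):
# the division is exact, so there are 50 full batches
print(math.ceil(1000 / 20))  # 50

# A non-divisible case with the default batch_size=32:
# 31 full batches plus one final batch of 8 samples
print(math.ceil(1000 / 32))  # 32
```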
    __weakref__

        List of weak references to the object (if defined).
-
just_passed_multiple
(batch_number)¶ Checks whether the current number of batches processed has just passed a multiple of
batch_number
.For example you can use this to report at regular interval (eg. every 10 batches)
Parameters: batch_number (int) – [description] Returns: True
if \(\fraccurrent_batch\)Return type: bool
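The check itself amounts to testing whether the batch counter has reached a multiple of the interval. A standalone sketch of that logic (not the class's actual implementation, which tracks its counter internally):

```python
# Illustrative helper: True when `batches_seen` lands on a
# multiple of `interval`, e.g. every 10th batch
def just_passed_multiple(batches_seen, interval):
    return batches_seen > 0 and batches_seen % interval == 0

# Report points over 100 batches with interval 10
reported = [n for n in range(1, 101) if just_passed_multiple(n, 10)]
print(reported)  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```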
    percentage_done()

        What percentage of the data has been covered in the current epoch.
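The percentage presumably follows from the batch counter and the total number of batches; a sketch of that arithmetic under this assumption (the variable names are illustrative):

```python
# Assumed computation: fraction of batches processed this epoch,
# expressed as a percentage
n_batches = 50      # e.g. 1000 samples / batch_size 20
batches_seen = 20   # batches processed so far this epoch
percentage = 100 * batches_seen / n_batches
print(percentage)  # 40.0
```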
    reset()

        Reset the iterator and shuffle the dataset if applicable.