
class dynn.data.batching.numpy_batching.NumpyBatches(data, targets, batch_size=32, shuffle=True)

Bases: object

Wraps a list of numpy arrays and a list of targets as a batch iterator.

You can then iterate over this object and get tuples of batch_data, batch_targets ready for use in your computation graph.

Example for classification:

import numpy as np
from dynn.data.batching.numpy_batching import NumpyBatches

# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# Class labels (integers in [0, 10))
labels = np.random.randint(10, size=1000)
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (20,)
    # Do something with x and y
    pass

Example for multidimensional regression:

import numpy as np
from dynn.data.batching.numpy_batching import NumpyBatches

# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# 5-dimensional outputs
labels = np.random.uniform(size=(1000, 5))
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (5, 20)
    # Do something with x and y
    pass

Parameters:
- data (list) – List of numpy arrays containing the data
- targets (list) – List of targets
- batch_size (int, optional) – Batch size (default: `32`)
- shuffle (bool, optional) – Shuffle the dataset whenever starting a new iteration (default: `True`)

__getitem__(index)

Returns the index-th sample.

This returns something different every time the data is shuffled.

If index is a list or a slice this will return a batch.

The result is a tuple batch_data, batch_target where each of those is a numpy array in Fortran layout (for more efficient input to DyNet). The batch size is always the last dimension.

Parameters:	index (int, slice) – Index or slice
Returns:	`batch_data, batch_target`
Return type:	tuple
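
The layout described above can be illustrated with plain NumPy. This is a minimal sketch of what such a batch might look like, not the library's implementation: the helper `take_batch` is hypothetical, and moving the batch axis to the end is an assumption based on the statement that the batch size is always the last dimension.

```python
import numpy as np


def take_batch(data, targets, index):
    """Hypothetical helper: select samples and put the batch dimension last."""
    # Select the requested samples; at this point the batch is the first axis
    batch_data = data[index]
    batch_targets = targets[index]
    # Move the batch axis to the end and store in Fortran (column-major) order
    batch_data = np.asfortranarray(np.moveaxis(batch_data, 0, -1))
    batch_targets = np.asfortranarray(np.moveaxis(batch_targets, 0, -1))
    return batch_data, batch_targets


data = np.random.uniform(size=(1000, 10))
labels = np.random.randint(10, size=1000)

# A slice selects a batch of 20 samples
x, y = take_batch(data, labels, slice(0, 20))
print(x.shape, y.shape)
print(x.flags["F_CONTIGUOUS"])
```

With a slice of 20 samples, `x` has shape `(10, 20)` and `y` has shape `(20,)`, matching the shapes quoted in the class examples.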

__init__(data, targets, batch_size=32, shuffle=True): Initialize self. See help(type(self)) for accurate signature.

__len__()

Returns the number of batches in the dataset (not the total number of samples).

Returns:	Number of batches in the dataset, `ceil(len(data) / batch_size)`
Return type:	int
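
As a quick check of the formula above, the following sketch computes the batch count for two batch sizes. The function name `num_batches` is made up for illustration; only the `ceil(len(data) / batch_size)` expression comes from the documentation.

```python
import math


def num_batches(num_samples, batch_size):
    """Illustrative helper: a partial final batch still counts as one batch."""
    return math.ceil(num_samples / batch_size)


print(num_batches(1000, 20))  # 50 (1000 divides evenly by 20)
print(num_batches(1000, 32))  # 32 (31 full batches plus one batch of 8)
```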

__weakref__: list of weak references to the object (if defined)

just_passed_multiple(batch_number)

Checks whether the current number of batches processed has just passed a multiple of batch_number.

For example, you can use this to report at regular intervals (e.g. every 10 batches).

Parameters:	batch_number (int) – Reporting interval, in batches (the check looks for multiples of this value)
Returns:	`True` if the number of batches processed so far has just passed a multiple of `batch_number`
Return type:	bool
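
A typical use is periodic reporting inside a training loop. The sketch below imitates the described behavior with a plain counter. The function here is a standalone stand-in written for illustration, and the modulo test is an assumption about what "just passed a multiple" means, not the library's code.

```python
def just_passed_multiple(batches_done, batch_number):
    """Stand-in: True when batches_done is a positive multiple of batch_number."""
    return batches_done > 0 and batches_done % batch_number == 0


reports = []
for batches_done in range(1, 51):  # pretend we process 50 batches
    # ... a training step would go here ...
    if just_passed_multiple(batches_done, 10):
        # Report every 10 batches
        reports.append(batches_done)

print(reports)  # [10, 20, 30, 40, 50]
```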

percentage_done(): What percent of the data has been covered in the current epoch

reset(): Reset the iterator and shuffle the dataset if applicable
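
To show how progress tracking and resetting could fit together, here is a toy stand-in. `ToyBatches`, its attributes, and `next_batch` are all hypothetical, and the 0 to 100 scale for the percentage is an assumption. The real class may track its state differently.

```python
import numpy as np


class ToyBatches:
    """Illustrative stand-in, not the dynn class."""

    def __init__(self, num_samples, batch_size, shuffle=True):
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.reset()

    def reset(self):
        # Restart from the first sample and optionally draw a new order
        self.position = 0
        self.order = np.arange(self.num_samples)
        if self.shuffle:
            np.random.shuffle(self.order)

    def percentage_done(self):
        # Share of samples already served this epoch, on a 0-100 scale (assumed)
        return 100.0 * self.position / self.num_samples

    def next_batch(self):
        # Return the indices of the next batch and advance the position
        idx = self.order[self.position:self.position + self.batch_size]
        self.position = min(self.position + self.batch_size, self.num_samples)
        return idx


batches = ToyBatches(num_samples=1000, batch_size=20)
for _ in range(25):
    batches.next_batch()
halfway = batches.percentage_done()
print(halfway)  # 50.0 after 25 batches of 20 samples

batches.reset()
print(batches.percentage_done())  # 0.0 after the reset
```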