class dynn.data.batching.sequence_batch.SequenceBatch(sequences, original_idxs=None, pad_idx=None, left_aligned=True)

Bases: object

Batched sequence object with padding

This wraps a list of integer sequences into a single array, padded to the length of the longest sequence. The batch dimension (number of sequences) is the last dimension.

By default, the sequences are padded on the right, which means that they are aligned to the left (they all start at index 0).

Parameters:
  • sequences (list) – List of lists of integers
  • original_idxs (list) – This list should point to the original position of each sequence in the data (before shuffling/reordering). This is useful when you want to access information that was discarded during preprocessing (e.g. the original sentence before numberizing and <unk>-ing in MT).
  • pad_idx (int) – Default index for padding
  • left_aligned (bool, optional) – Align to the left (all sequences start at the same position).
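
The role of original_idxs can be illustrated with a small sketch in plain Python. It does not call dynn; the data and variable names (raw_sentences, numberized, order) are hypothetical and only show how a reordered batch can be traced back to the raw data.

```python
# Hypothetical data: raw sentences and their numberized form.
raw_sentences = ["the cat sat", "hello", "a much longer sentence here"]
numberized = [[4, 7, 9], [12], [3, 15, 2, 8, 6]]

# Reorder the data (here: sort by decreasing length), remembering
# the original position of each sequence.
order = sorted(range(len(numberized)), key=lambda i: -len(numberized[i]))
sequences = [numberized[i] for i in order]
original_idxs = order  # position b in the batch came from index original_idxs[b]

# Later, batch position b can be mapped back to the raw sentence.
recovered = [raw_sentences[i] for i in original_idxs]
```

Passing a list like original_idxs alongside sequences is what lets downstream code look up information that the integer sequences no longer carry.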
__init__(sequences, original_idxs=None, pad_idx=None, left_aligned=True)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

collate(sequences)

Pad and concatenate sequences to an array

Parameters:
  • sequences (list) – List of lists of integers
  • pad_idx (int) – Default index for padding

Returns: max_len x batch_size array
Return type: np.ndarray
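
As a rough illustration of the padding described above, the following NumPy sketch builds a max_len x batch_size array from a list of integer sequences. It is not dynn's implementation: the function name collate_sketch and its pad_idx and left_aligned keyword arguments are assumptions made for this example.

```python
import numpy as np

def collate_sketch(sequences, pad_idx=0, left_aligned=True):
    """Pad integer sequences into a max_len x batch_size array (illustrative only)."""
    max_len = max(len(seq) for seq in sequences)
    # Start from an array filled with the padding index.
    batch = np.full((max_len, len(sequences)), pad_idx, dtype=int)
    for b, seq in enumerate(sequences):
        if left_aligned:
            # Sequence starts at index 0, padding goes at the end.
            batch[:len(seq), b] = seq
        else:
            # Sequence ends at the last index, padding goes at the start.
            batch[max_len - len(seq):, b] = seq
    return batch

batch = collate_sketch([[1, 2, 3], [4], [5, 6]], pad_idx=0)
# Each column is one sequence; batch.shape is (3, 3).
```

Keeping the batch dimension last means that batch[t] holds the t-th token of every sequence, which is convenient when stepping through time.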
get_mask(base_val=1, mask_val=0)

Return a mask expression with specific values for padding tokens.

This will return an expression of the same shape as self.sequences where the i-th element of batch b (counting i from 0) is base_val iff i < lengths[b], and mask_val otherwise.

For example, if size is 4 and lengths is [1,2,4], then the returned mask (with the default base_val=1 and mask_val=0) will be:

    1 0 0 0
    1 1 0 0
    1 1 1 1

(here each row is a batch element)

Parameters:
  • base_val (int, optional) – Value of the mask for non-masked indices (typically 1 for multiplicative masks and 0 for additive masks). Defaults to 1.
  • mask_val (int, optional) – Value of the mask for masked indices (typically 0 for multiplicative masks and -inf for additive masks). Defaults to 0.
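
The difference between multiplicative and additive masks can be sketched in NumPy as follows. This is an illustration under assumptions, not dynn's code: mask_sketch is a hypothetical helper that assumes left-aligned sequences, and it returns a plain array rather than an expression.

```python
import numpy as np

def mask_sketch(lengths, size, base_val=1, mask_val=0):
    """Return a size x batch_size mask for left-aligned sequences (illustrative only)."""
    positions = np.arange(size)[:, None]                  # shape (size, 1)
    is_token = positions < np.asarray(lengths)[None, :]   # shape (size, batch_size)
    # base_val where a real token sits, mask_val where there is padding.
    return np.where(is_token, base_val, mask_val)

# Multiplicative mask: multiply values by it to zero out padding.
mult_mask = mask_sketch([1, 2, 4], size=4)

# Additive mask: add it to scores before a softmax so that padding
# positions end up with zero probability.
add_mask = mask_sketch([1, 2, 4], size=4, base_val=0.0, mask_val=-np.inf)
```

Transposing mult_mask gives one row per batch element, which is the layout used in the example above.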