-
class dynn.data.batching.sequence_batch.SequenceBatch(sequences, original_idxs=None, pad_idx=None, left_aligned=True)
Bases: object
Batched sequence object with padding
This wraps a list of integer sequences into an array padded to the length of the longest sequence. The batch dimension (number of sequences) is the last dimension.
By default the sequences are padded on the right, which means that they are aligned to the left (they all start at index 0).
Parameters:
- sequences (list) – List of lists of integers
- original_idxs (list) – This list should point to the original position of each sequence in the data (before shuffling/reordering). This is useful when you want to access information that has been discarded during preprocessing (e.g. the original sentence before numberizing and <unk>ing in MT).
- pad_idx (int) – Default index for padding
- left_aligned (bool, optional) – Align to the left (all sequences start at the same position).
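The role of original_idxs can be illustrated with a minimal sketch in plain Python. It does not call dynn; the names raw_sentences, order and recovered are hypothetical and exist only for this illustration. It assumes the data is reordered by length before batching, which is one common reason the original positions would be lost.

```python
# Hypothetical raw data: sentences before numberizing, and their integer ids.
raw_sentences = ["the cat sat", "hello", "a b c d"]
sequences = [[4, 7, 9], [12], [3, 5, 6, 8]]

# Reorder the data (here: longest sequence first) while remembering
# where each sequence came from.
order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]), reverse=True)
reordered = [sequences[i] for i in order]

# original_idxs[k] is the position of reordered[k] in the raw data.
original_idxs = order

# Later, recover the untouched sentence for each element of the batch.
recovered = [raw_sentences[i] for i in original_idxs]
print(original_idxs)  # [2, 0, 1]
print(recovered)      # ['a b c d', 'the cat sat', 'hello']
```

A list like original_idxs is the kind of value one would pass as the original_idxs argument alongside the reordered sequences.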
-
__init__(sequences, original_idxs=None, pad_idx=None, left_aligned=True)
Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
list of weak references to the object (if defined)
-
collate(sequences)
Pad and concatenate sequences into an array.
Args:
sequences (list): List of lists of integers
pad_idx (int): Default index for padding
Returns: max_len x batch_size array
Return type: np.ndarray
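The padding layout described here can be sketched in plain Python. This is not the library's implementation: collate_sketch is a hypothetical helper that returns a nested list rather than an np.ndarray, and it assumes that turning left alignment off pushes each sequence to the end of its column.

```python
def collate_sketch(sequences, pad_idx=0, left_aligned=True):
    """Pad integer sequences into a max_len x batch_size nested list."""
    max_len = max(len(seq) for seq in sequences)
    batch_size = len(sequences)
    # Start from a grid filled with the padding index.
    grid = [[pad_idx] * batch_size for _ in range(max_len)]
    for b, seq in enumerate(sequences):
        # Left-aligned sequences start at row 0; otherwise (assumption)
        # they are shifted so that they end at the last row.
        offset = 0 if left_aligned else max_len - len(seq)
        for i, token in enumerate(seq):
            grid[offset + i][b] = token
    return grid


left = collate_sketch([[1, 2, 3], [4], [5, 6]], pad_idx=0)
right = collate_sketch([[1, 2, 3], [4], [5, 6]], pad_idx=0, left_aligned=False)
print(left)   # [[1, 4, 5], [2, 0, 6], [3, 0, 0]]
print(right)  # [[1, 0, 0], [2, 0, 5], [3, 4, 6]]
```

Each column holds one sequence, so the batch dimension is the last one, matching the max_len x batch_size shape stated above.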
-
get_mask(base_val=1, mask_val=0)
Return a mask expression with specific values for padding tokens.
This will return an expression of the same shape as self.sequences where the i-th element of batch b is base_val iff i<=lengths[b] (and mask_val otherwise).
For example, if size is 4 and lengths is [1,2,4] then the returned mask will be: (here each row is a batch element)
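The masking rule can be reproduced with a short plain-Python sketch. mask_sketch is a hypothetical helper, not part of dynn; it builds a nested list with one row per batch element rather than an expression, and it assumes 0-based positions, so a position counts as a real token when i < lengths[b].

```python
def mask_sketch(lengths, size, base_val=1, mask_val=0):
    """Build one mask row per batch element (nested list, not an expression)."""
    return [
        # Position i holds a real token if it falls within the sequence length.
        [base_val if i < length else mask_val for i in range(size)]
        for length in lengths
    ]


mask = mask_sketch([1, 2, 4], size=4)
for row in mask:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 1]
```

Passing other values changes what is written in each region, for example base_val=0 and mask_val=-1 to mark padded positions with -1.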
Parameters: