Welcome to DyNN’s documentation!

dynn package

Subpackages

dynn.data package
Data

This module contains helper functions and classes to manage data. This includes code for minibatching as well as functions for downloading common datasets.

Supported datasets are:

  • Amazon Reviews (elec)
  • CIFAR-10
  • IWSLT 2016
  • MNIST
  • Penn TreeBank
  • SNLI
  • SST
  • WikiText-2 and WikiText-103

class dynn.data.Tree(label, children=None)

Bases: object

Tree object for syntax trees

__init__(label, children=None)

Initialize self. See help(type(self)) for accurate signature.

__str__()

Return str(self).

__weakref__

list of weak references to the object (if defined)

static from_string(string, labelled=True)

Reads linearized tree from string

Parameters:string (str) – Linearized tree
Returns:Tree object
Return type:Tree
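
A brief usage sketch (the linearized format shown here, an SST-style s-expression with integer labels, is an assumption):

from dynn.data import Tree

# Hypothetical linearized tree with integer node labels
t = Tree.from_string("(3 (2 (2 A) (2 movie)) (4 (3 worth) (2 seeing)))")
print(t)  # presumably prints the linearized form back (see __str__ above)
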
Subpackages
dynn.data.batching package
Batching procedures

Iterators implementing common batching strategies.

class dynn.data.batching.NumpyBatches(data, targets, batch_size=32, shuffle=True)

Bases: object

Wraps a list of numpy arrays and a list of targets as a batch iterator.

You can then iterate over this object and get tuples of batch_data, batch_targets ready for use in your computation graph.

Example for classification:

# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# Class labels
labels = np.random.randint(10, size=1000)
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (20,)
    # Do something with x and y

Example for multidimensional regression:

# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# 5-dimensional outputs
labels = np.random.uniform(size=(1000, 5))
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (5, 20)
    # Do something with x and y
Parameters:
  • data (list) – List of numpy arrays containing the data
  • targets (list) – List of targets
  • batch_size (int, optional) – Batch size (default: 32)
  • shuffle (bool, optional) – Shuffle the dataset whenever starting a new iteration (default: True)
__getitem__(index)

Returns the index th sample

This returns something different every time the data is shuffled.

If index is a list or a slice this will return a batch.

The result is a tuple batch_data, batch_target where each of those is a numpy array in Fortran layout (for more efficient input in dynet). The batch size is always the last dimension.

Parameters:index (int, slice) – Index or slice
Returns:batch_data, batch_target
Return type:tuple
__init__(data, targets, batch_size=32, shuffle=True)

Initialize self. See help(type(self)) for accurate signature.

__len__()

This returns the number of batches in the dataset (not the total number of samples)

Returns:
Number of batches in the dataset
ceil(len(data)/batch_size)
Return type:int
__weakref__

list of weak references to the object (if defined)

just_passed_multiple(batch_number)

Checks whether the current number of batches processed has just passed a multiple of batch_number.

For example you can use this to report at regular interval (eg. every 10 batches)

Parameters:batch_number (int) – Interval to check against (in number of batches)
Returns:True if the number of batches processed so far has just passed a multiple of batch_number
Return type:bool
percentage_done()

What percent of the data has been covered in the current epoch
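
For instance, a training loop can combine just_passed_multiple() and percentage_done() for periodic logging. A minimal sketch, reusing the batched_dataset from the example above:

for x, y in batched_dataset:
    # ... forward/backward/update ...
    if batched_dataset.just_passed_multiple(10):
        # Log roughly every 10 batches
        print("Progress:", batched_dataset.percentage_done())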

reset()

Reset the iterator and shuffle the dataset if applicable

class dynn.data.batching.SequenceBatch(sequences, original_idxs=None, pad_idx=None, left_aligned=True)

Bases: object

Batched sequence object with padding

This wraps a list of integer sequences into a nice array padded to the longest sequence. The batch dimension (number of sequences) is the last dimension.

By default the sequences are padded to the right which means that they are aligned to the left (they all start at index 0)

Parameters:
  • sequences (list) – List of list of integers
  • original_idxs (list) – This list should point to the original position of each sequence in the data (before shuffling/reordering). This is useful when you want to access information that has been discarded during preprocessing (eg original sentence before numberizing and <unk> ing in MT).
  • pad_idx (int) – Default index for padding
  • left_aligned (bool, optional) – Align to the left (all sequences start at the same position).
__init__(sequences, original_idxs=None, pad_idx=None, left_aligned=True)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

collate(sequences)

Pad and concatenate sequences to an array

Parameters:sequences (list) – List of lists of integers; padding uses this batch's pad_idx

Returns:max_len x batch_size array
Return type:np.ndarray
get_mask(base_val=1, mask_val=0)

Return a mask expression with specific values for padding tokens.

This will return an expression of the same shape as self.sequences where the i th element of batch b is base_val iff i<=lengths[b] (and mask_val otherwise).

For example, if size is 4 and lengths is [1,2,4] then (with the default base_val=1 and mask_val=0) the returned mask will be:

1 0 0 0
1 1 0 0
1 1 1 1

(here each row is a batch element)

Parameters:
  • base_val (int, optional) – Value of the mask for non-masked indices (typically 1 for multiplicative masks and 0 for additive masks). Defaults to 1.
  • mask_val (int, optional) – Value of the mask for masked indices (typically 0 for multiplicative masks and -inf for additive masks). Defaults to 0.
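
A small sketch of how the two mask flavours are typically obtained (the attention-logit use of the additive mask is illustrative, not part of the API):

import numpy as np
import dynet as dy
from dynn.data.batching import SequenceBatch

# Two sequences of different lengths, padded with index 0 (assumed pad index)
batch = SequenceBatch([[1, 2, 3], [4, 5]], pad_idx=0)

dy.renew_cg()
# Multiplicative mask: 1 on real tokens, 0 on padding
mult_mask = batch.get_mask()
# Additive mask: 0 on real tokens, -inf on padding (e.g. to add to attention logits)
add_mask = batch.get_mask(base_val=0, mask_val=-np.inf)
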
class dynn.data.batching.PaddedSequenceBatches(data, targets=None, max_samples=32, pad_idx=0, max_tokens=inf, strict_token_limit=False, shuffle=True, group_by_length=True, left_aligned=True)

Bases: object

Wraps a list of sequences and a list of targets as a batch iterator.

You can then iterate over this object and get tuples of batch_data, batch_targets ready for use in your computation graph.

Example:

# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols="abcde".split())
# 1000 sequences of various lengths up to 10
data = [np.random.randint(len(dic), size=np.random.randint(10))
        for _ in range(1000)]
# Class labels
labels = np.random.randint(10, size=1000)
# Iterator with at most 20 samples per batch
batched_dataset = PaddedSequenceBatches(
    data,
    targets=labels,
    max_samples=20,
    pad_idx=dic.pad_idx,
)
# Training loop
for x, y in batched_dataset:
    # x is a SequenceBatch object
    # and y has shape (batch_size,)
    # Do something with x and y

# Without labels
batched_dataset = PaddedSequenceBatches(
    data,
    max_samples=20,
    pad_idx=dic.pad_idx,
)
for x in batched_dataset:
    # x is a SequenceBatch object
    # Do something with x
Parameters:
  • data (list) – List of numpy arrays containing the data
  • targets (list) – List of targets
  • pad_idx (int) – Index used at padded positions
  • max_samples (int, optional) – Maximum number of samples per batch
  • max_tokens (int, optional) – Maximum number of tokens per batch. This count doesn’t include padding tokens
  • strict_token_limit (bool, optional) – Padding tokens will count towards the max_tokens limit
  • shuffle (bool, optional) – Shuffle the dataset whenever starting a new iteration (default: True)
  • group_by_length (bool, optional) – Group sequences by length. This minimizes the number of padding tokens. The batches are not strictly IID though.
  • left_aligned (bool, optional) – Align the sequences to the left
__getitem__(index)

Returns the index th sample

The result is a tuple batch_data, batch_target where the first is a batch of sequences and the other is a numpy array in Fortran layout (for more efficient input in dynet).

batch_data is a SequenceBatch object

Parameters:index (int, slice) – Index or slice
Returns:batch_data, batch_target
Return type:tuple
__init__(data, targets=None, max_samples=32, pad_idx=0, max_tokens=inf, strict_token_limit=False, shuffle=True, group_by_length=True, left_aligned=True)

Initialize self. See help(type(self)) for accurate signature.

__len__()

This returns the number of batches in the dataset (not the total number of samples)

Returns:
Number of batches in the dataset
ceil(len(data)/batch_size)
Return type:int
__weakref__

list of weak references to the object (if defined)

just_passed_multiple(batch_number)

Checks whether the current number of batches processed has just passed a multiple of batch_number.

For example you can use this to report at regular interval (eg. every 10 batches)

Parameters:batch_number (int) – Interval to check against (in number of batches)
Returns:True if the number of batches processed so far has just passed a multiple of batch_number
Return type:bool
percentage_done()

What percent of the data has been covered in the current epoch

reset()

Reset the iterator and shuffle the dataset if applicable

class dynn.data.batching.BPTTBatches(data, batch_size=32, seq_length=30)

Bases: object

Wraps a list of sequences as a contiguous batch iterator.

This will iterate over batches of contiguous subsequences of length seq_length. The data is first split into batch_size contiguous streams, so that the batch at each step directly continues the batch from the previous step, which is the usual setup for truncated backpropagation through time (BPTT) in language modeling.

Example:

# Sequence of length 1000
data = np.random.randint(10, size=1000)
# Iterator over subsequences of length 20 with batch size 5
batched_dataset = BPTTBatches(data, batch_size=5, seq_length=20)
# Training loop
for x, y in batched_dataset:
    # x and y have shape (seq_length, batch_size)
    # y[t] == x[t+1]
    # Do something with x and y
Parameters:
  • data (list) – List of numpy arrays containing the data
  • batch_size (int, optional) – Batch size
  • seq_length (int, optional) – BPTT length
__getitem__(index)

Returns the index th sample

The result is a tuple x, next_x of numpy arrays of shape seq_len x batch_size. seq_len is determined by the range specified by index, and next_x[t]=x[t+1] for all t.

Parameters:index (int, slice) – Index or slice
Returns:x, next_x
Return type:tuple
__init__(data, batch_size=32, seq_length=30)

Initialize self. See help(type(self)) for accurate signature.

__len__()

This returns the number of batches in the dataset (not the total number of samples)

Returns:
Number of batches in the dataset
ceil(len(data)/batch_size)
Return type:int
__weakref__

list of weak references to the object (if defined)

just_passed_multiple(batch_number)

Checks whether the current number of batches processed has just passed a multiple of batch_number.

For example you can use this to report at regular interval (eg. every 10 batches)

Parameters:batch_number (int) – Interval to check against (in number of batches)
Returns:True if the number of batches processed so far has just passed a multiple of batch_number
Return type:bool
percentage_done()

What percent of the data has been covered in the current epoch

reset()

Reset the iterator and shuffle the dataset if applicable

class dynn.data.batching.SequencePairsBatches(src_data, tgt_data, src_dictionary, tgt_dictionary=None, labels=None, max_samples=32, max_tokens=99999999, strict_token_limit=False, shuffle=True, group_by_length='source', src_left_aligned=True, tgt_left_aligned=True)

Bases: object

Wraps two lists of sequences as a batch iterator.

This is useful for sequence-to-sequence problems or sentence pairs classification (entailment, paraphrase detection…). Following seq2seq conventions the first sequence is referred to as the “source” and the second as the “target”.

You can then iterate over this object and get tuples of src_batch, tgt_batch ready for use in your computation graph.

Example:

# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols="abcde".split())
# 1000 source sequences of various lengths up to 10
src_data = [np.random.randint(len(dic), size=np.random.randint(10))
            for _ in range(1000)]
# 1000 target sequences of various lengths up to 10
tgt_data = [np.random.randint(len(dic), size=np.random.randint(10))
            for _ in range(1000)]
# Iterator with at most 20 samples per batch
batched_dataset = SequencePairsBatches(
    src_data, tgt_data, max_samples=20
)
# Training loop
for x, y in batched_dataset:
    # x and y are SequenceBatch objects
Parameters:
  • src_data (list) – List of source sequences (list of int iterables)
  • tgt_data (list) – List of target sequences (list of int iterables)
  • src_dictionary (Dictionary) – Source dictionary
  • tgt_dictionary (Dictionary) – Target dictionary
  • max_samples (int, optional) – Maximum number of samples per batch (one sample is a pair of sentences)
  • max_tokens (int, optional) – Maximum number of total tokens per batch (source + target tokens)
  • strict_token_limit (bool, optional) – Padding tokens will count towards the max_tokens limit
  • shuffle (bool, optional) – Shuffle the dataset whenever starting a new iteration (default: True)
  • group_by_length (str, optional) – Group sequences by length. One of "source" or "target". This minimizes the number of padding tokens. The batches are not strictly IID though.
  • src_left_aligned (bool, optional) – Align the source sequences to the left
  • tgt_left_aligned (bool, optional) – Align the target sequences to the left
__getitem__(index)

Returns the index th sample

The result is a tuple src_batch, tgt_batch where each element is a SequenceBatch object

Parameters:index (int, slice) – Index or slice
Returns:src_batch, tgt_batch
Return type:tuple
__init__(src_data, tgt_data, src_dictionary, tgt_dictionary=None, labels=None, max_samples=32, max_tokens=99999999, strict_token_limit=False, shuffle=True, group_by_length='source', src_left_aligned=True, tgt_left_aligned=True)

Initialize self. See help(type(self)) for accurate signature.

__len__()

This returns the number of batches in the dataset (not the total number of samples)

Returns:
Number of batches in the dataset
ceil(len(data)/batch_size)
Return type:int
__weakref__

list of weak references to the object (if defined)

just_passed_multiple(batch_number)

Checks whether the current number of batches processed has just passed a multiple of batch_number.

For example you can use this to report at regular interval (eg. every 10 batches)

Parameters:batch_number (int) – [description]
Returns:True if \(\fraccurrent_batch\)
Return type:bool
percentage_done()

What percent of the data has been covered in the current epoch

reset()

Reset the iterator and shuffle the dataset if applicable

Submodules
class dynn.data.batching.bptt_batching.BPTTBatches(data, batch_size=32, seq_length=30)

Bases: object

Wraps a list of sequences as a contiguous batch iterator.

This will iterate over batches of contiguous subsequences of length seq_length. The data is first split into batch_size contiguous streams, so that the batch at each step directly continues the batch from the previous step, which is the usual setup for truncated backpropagation through time (BPTT) in language modeling.

Example:

# Sequence of length 1000
data = np.random.randint(10, size=1000)
# Iterator over subsequences of length 20 with batch size 5
batched_dataset = BPTTBatches(data, batch_size=5, seq_length=20)
# Training loop
for x, y in batched_dataset:
    # x and y have shape (seq_length, batch_size)
    # y[t] == x[t+1]
    # Do something with x and y
Parameters:
  • data (list) – List of numpy arrays containing the data
  • batch_size (int, optional) – Batch size
  • seq_length (int, optional) – BPTT length
__getitem__(index)

Returns the index th sample

The result is a tuple x, next_x of numpy arrays of shape seq_len x batch_size. seq_len is determined by the range specified by index, and next_x[t]=x[t+1] for all t.

Parameters:index (int, slice) – Index or slice
Returns:x, next_x
Return type:tuple
__init__(data, batch_size=32, seq_length=30)

Initialize self. See help(type(self)) for accurate signature.

__len__()

This returns the number of batches in the dataset (not the total number of samples)

Returns:
Number of batches in the dataset
ceil(len(data)/batch_size)
Return type:int
__weakref__

list of weak references to the object (if defined)

just_passed_multiple(batch_number)

Checks whether the current number of batches processed has just passed a multiple of batch_number.

For example you can use this to report at regular interval (eg. every 10 batches)

Parameters:batch_number (int) – Interval to check against (in number of batches)
Returns:True if the number of batches processed so far has just passed a multiple of batch_number
Return type:bool
percentage_done()

What percent of the data has been covered in the current epoch

reset()

Reset the iterator and shuffle the dataset if applicable

class dynn.data.batching.numpy_batching.NumpyBatches(data, targets, batch_size=32, shuffle=True)

Bases: object

Wraps a list of numpy arrays and a list of targets as a batch iterator.

You can then iterate over this object and get tuples of batch_data, batch_targets ready for use in your computation graph.

Example for classification:

# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# Class labels
labels = np.random.randint(10, size=1000)
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (20,)
    # Do something with x and y

Example for multidimensional regression:

# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# 5-dimensional outputs
labels = np.random.uniform(size=(1000, 5))
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (5, 20)
    # Do something with x and y
Parameters:
  • data (list) – List of numpy arrays containing the data
  • targets (list) – List of targets
  • batch_size (int, optional) – Batch size (default: 32)
  • shuffle (bool, optional) – Shuffle the dataset whenever starting a new iteration (default: True)
__getitem__(index)

Returns the index th sample

This returns something different every time the data is shuffled.

If index is a list or a slice this will return a batch.

The result is a tuple batch_data, batch_target where each of those is a numpy array in Fortran layout (for more efficient input in dynet). The batch size is always the last dimension.

Parameters:index (int, slice) – Index or slice
Returns:batch_data, batch_target
Return type:tuple
__init__(data, targets, batch_size=32, shuffle=True)

Initialize self. See help(type(self)) for accurate signature.

__len__()

This returns the number of batches in the dataset (not the total number of samples)

Returns:
Number of batches in the dataset
ceil(len(data)/batch_size)
Return type:int
__weakref__

list of weak references to the object (if defined)

just_passed_multiple(batch_number)

Checks whether the current number of batches processed has just passed a multiple of batch_number.

For example you can use this to report at regular interval (eg. every 10 batches)

Parameters:batch_number (int) – Interval to check against (in number of batches)
Returns:True if the number of batches processed so far has just passed a multiple of batch_number
Return type:bool
percentage_done()

What percent of the data has been covered in the current epoch

reset()

Reset the iterator and shuffle the dataset if applicable

class dynn.data.batching.padded_sequence_batching.PaddedSequenceBatches(data, targets=None, max_samples=32, pad_idx=0, max_tokens=inf, strict_token_limit=False, shuffle=True, group_by_length=True, left_aligned=True)

Bases: object

Wraps a list of sequences and a list of targets as a batch iterator.

You can then iterate over this object and get tuples of batch_data, batch_targets ready for use in your computation graph.

Example:

# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols="abcde".split())
# 1000 sequences of various lengths up to 10
data = [np.random.randint(len(dic), size=np.random.randint(10))
        for _ in range(1000)]
# Class labels
labels = np.random.randint(10, size=1000)
# Iterator with at most 20 samples per batch
batched_dataset = PaddedSequenceBatches(
    data,
    targets=labels,
    max_samples=20,
    pad_idx=dic.pad_idx,
)
# Training loop
for x, y in batched_dataset:
    # x is a SequenceBatch object
    # and y has shape (batch_size,)
    # Do something with x and y

# Without labels
batched_dataset = PaddedSequenceBatches(
    data,
    max_samples=20,
    pad_idx=dic.pad_idx,
)
for x in batched_dataset:
    # x is a SequenceBatch object
    # Do something with x
Parameters:
  • data (list) – List of numpy arrays containing the data
  • targets (list) – List of targets
  • pad_idx (int) – Index used at padded positions
  • max_samples (int, optional) – Maximum number of samples per batch
  • max_tokens (int, optional) – Maximum number of tokens per batch. This count doesn’t include padding tokens
  • strict_token_limit (bool, optional) – Padding tokens will count towards the max_tokens limit
  • shuffle (bool, optional) – Shuffle the dataset whenever starting a new iteration (default: True)
  • group_by_length (bool, optional) – Group sequences by length. This minimizes the number of padding tokens. The batches are not strictly IID though.
  • left_aligned (bool, optional) – Align the sequences to the left
__getitem__(index)

Returns the index th sample

The result is a tuple batch_data, batch_target where the first is a batch of sequences and the other is a numpy array in Fortran layout (for more efficient input in dynet).

batch_data is a SequenceBatch object

Parameters:index (int, slice) – Index or slice
Returns:batch_data, batch_target
Return type:tuple
__init__(data, targets=None, max_samples=32, pad_idx=0, max_tokens=inf, strict_token_limit=False, shuffle=True, group_by_length=True, left_aligned=True)

Initialize self. See help(type(self)) for accurate signature.

__len__()

This returns the number of batches in the dataset (not the total number of samples)

Returns:
Number of batches in the dataset
ceil(len(data)/batch_size)
Return type:int
__weakref__

list of weak references to the object (if defined)

just_passed_multiple(batch_number)

Checks whether the current number of batches processed has just passed a multiple of batch_number.

For example you can use this to report at regular interval (eg. every 10 batches)

Parameters:batch_number (int) – Interval to check against (in number of batches)
Returns:True if the number of batches processed so far has just passed a multiple of batch_number
Return type:bool
percentage_done()

What percent of the data has been covered in the current epoch

reset()

Reset the iterator and shuffle the dataset if applicable

class dynn.data.batching.parallel_sequences_batching.SequencePairsBatches(src_data, tgt_data, src_dictionary, tgt_dictionary=None, labels=None, max_samples=32, max_tokens=99999999, strict_token_limit=False, shuffle=True, group_by_length='source', src_left_aligned=True, tgt_left_aligned=True)

Bases: object

Wraps two lists of sequences as a batch iterator.

This is useful for sequence-to-sequence problems or sentence pairs classification (entailment, paraphrase detection…). Following seq2seq conventions the first sequence is referred to as the “source” and the second as the “target”.

You can then iterate over this object and get tuples of src_batch, tgt_batch ready for use in your computation graph.

Example:

# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols="abcde".split())
# 1000 source sequences of various lengths up to 10
src_data = [np.random.randint(len(dic), size=np.random.randint(10))
            for _ in range(1000)]
# 1000 target sequences of various lengths up to 10
tgt_data = [np.random.randint(len(dic), size=np.random.randint(10))
            for _ in range(1000)]
# Iterator with at most 20 samples per batch
batched_dataset = SequencePairsBatches(
    src_data, tgt_data, max_samples=20
)
# Training loop
for x, y in batched_dataset:
    # x and y are SequenceBatch objects
Parameters:
  • src_data (list) – List of source sequences (list of int iterables)
  • tgt_data (list) – List of target sequences (list of int iterables)
  • src_dictionary (Dictionary) – Source dictionary
  • tgt_dictionary (Dictionary) – Target dictionary
  • max_samples (int, optional) – Maximum number of samples per batch (one sample is a pair of sentences)
  • max_tokens (int, optional) – Maximum number of total tokens per batch (source + target tokens)
  • strict_token_limit (bool, optional) – Padding tokens will count towards the max_tokens limit
  • shuffle (bool, optional) – Shuffle the dataset whenever starting a new iteration (default: True)
  • group_by_length (str, optional) – Group sequences by length. One of "source" or "target". This minimizes the number of padding tokens. The batches are not strictly IID though.
  • src_left_aligned (bool, optional) – Align the source sequences to the left
  • tgt_left_aligned (bool, optional) – Align the target sequences to the left
__getitem__(index)

Returns the index th sample

The result is a tuple src_batch, tgt_batch where each element is a SequenceBatch object

Parameters:index (int, slice) – Index or slice
Returns:src_batch, tgt_batch
Return type:tuple
__init__(src_data, tgt_data, src_dictionary, tgt_dictionary=None, labels=None, max_samples=32, max_tokens=99999999, strict_token_limit=False, shuffle=True, group_by_length='source', src_left_aligned=True, tgt_left_aligned=True)

Initialize self. See help(type(self)) for accurate signature.

__len__()

This returns the number of batches in the dataset (not the total number of samples)

Returns:
Number of batches in the dataset
ceil(len(data)/batch_size)
Return type:int
__weakref__

list of weak references to the object (if defined)

just_passed_multiple(batch_number)

Checks whether the current number of batches processed has just passed a multiple of batch_number.

For example you can use this to report at regular interval (eg. every 10 batches)

Parameters:batch_number (int) – Interval to check against (in number of batches)
Returns:True if the number of batches processed so far has just passed a multiple of batch_number
Return type:bool
percentage_done()

What percent of the data has been covered in the current epoch

reset()

Reset the iterator and shuffle the dataset if applicable

class dynn.data.batching.sequence_batch.SequenceBatch(sequences, original_idxs=None, pad_idx=None, left_aligned=True)

Bases: object

Batched sequence object with padding

This wraps a list of integer sequences into a nice array padded to the longest sequence. The batch dimension (number of sequences) is the last dimension.

By default the sequences are padded to the right which means that they are aligned to the left (they all start at index 0)

Parameters:
  • sequences (list) – List of list of integers
  • original_idxs (list) – This list should point to the original position of each sequence in the data (before shuffling/reordering). This is useful when you want to access information that has been discarded during preprocessing (eg original sentence before numberizing and <unk> ing in MT).
  • pad_idx (int) – Default index for padding
  • left_aligned (bool, optional) – Align to the left (all sequences start at the same position).
__init__(sequences, original_idxs=None, pad_idx=None, left_aligned=True)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

collate(sequences)

Pad and concatenate sequences to an array

Parameters:sequences (list) – List of lists of integers; padding uses this batch's pad_idx

Returns:max_len x batch_size array
Return type:np.ndarray
get_mask(base_val=1, mask_val=0)

Return a mask expression with specific values for padding tokens.

This will return an expression of the same shape as self.sequences where the i th element of batch b is base_val iff i<=lengths[b] (and mask_val otherwise).

For example, if size is 4 and lengths is [1,2,4] then (with the default base_val=1 and mask_val=0) the returned mask will be:

1 0 0 0
1 1 0 0
1 1 1 1

(here each row is a batch element)

Parameters:
  • base_val (int, optional) – Value of the mask for non-masked indices (typically 1 for multiplicative masks and 0 for additive masks). Defaults to 1.
  • mask_val (int, optional) – Value of the mask for masked indices (typically 0 for multiplicative masks and -inf for additive masks). Defaults to 0.
Submodules
Amazon elec dataset

Various functions for accessing the Amazon Reviews dataset.

dynn.data.amazon.download_amazon(path='.', force=False)

Downloads the Amazon dataset from http://riejohnson.com/software/

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force the redownload even if the files are already at path
dynn.data.amazon.load_amazon(path, tok=True, size='200k')

Loads the Amazon dataset

Returns the train, dev and test sets in a dictionary, each as a tuple containing the reviews and the labels.

Parameters:path (str) – Path to the folder containing the elec2.tar.gz file
Returns:
Dictionary containing the train and test sets
(dictionary of review/labels tuples)
Return type:dict
dynn.data.amazon.read_amazon(split, path, tok=True, size='200k')

Iterates over the Amazon dataset

Example:

for review, label in read_amazon("train", "/path/to/amazon"):
    train(review, label)
Parameters:
  • split (str) – Either "train", "dev" or "test"
  • path (str) – Path to the folder containing the elec2.tar.gz files
Returns:

review, label

Return type:

tuple

Functions for Dataset Caching
dynn.data.caching.cached_to_file(filename)

Decorator to cache the output of a function to a file

Sometimes your workflow will contain functions that are executed once but take a lot of time (typically data preprocessing). This can be annoying when eg. running multiple experiments with different parameters. This decorator provides a solution by running the function once, then saving its output to a file. The next time you call this function, unless the file in question has been deleted, the function will just read its result from the file instead of recomputing everything.

Caveats:

  • By default, if you call the decorated function with different arguments, this will still load the cached output from the first function call with the original arguments. You need to add the update_cache=True keyword argument to force the function to be rerun. Incidentally, the decorated function should not have an argument named update_cache.
  • The serialization is done with pickle, so:
    1. it isn’t super secure (if you care about these things)
    2. it only handles functions where the outputs can be pickled (for now). Typically this wouldn’t work for dynet objects.

Example usage:

@cached_to_file("preprocessed_data.bin")
def preprocess(raw_data):
    # do a lot of preprocessing

# [...] do something else

# This first call will run the function and pickle its output to
# "preprocessed_data.bin" (and return the output)
data = preprocess(raw_data)

# [...] do something else, or maybe rerun the program

# This will just load the output from "preprocessed_data.bin"
data = preprocess(raw_data)

# [...] do something else, or maybe rerun the program

# This will force the function to be rerun and the cached output to be
# updated. You should do that if, for example, the arguments of
# `preprocess` are expected to change
data = preprocess(raw_data, update_cache=True)
Parameters:filename (str) – Name of the file where the cached output should be saved to.
CIFAR10

Various functions for accessing the CIFAR10 dataset.

dynn.data.cifar10.download_cifar10(path='.', force=False)

Downloads CIFAR10 from https://www.cs.toronto.edu/~kriz/cifar.html

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force the redownload even if the files are already at path
dynn.data.cifar10.load_cifar10(path)

Loads the CIFAR10 dataset

Returns the train and test set, each as a list of images and a list of labels. The images are represented as numpy arrays and the labels as integers.

Parameters:path (str) – Path to the folder containing the *-ubyte.gz files
Returns:train and test sets
Return type:tuple
dynn.data.cifar10.read_cifar10(split, path)

Iterates over the CIFAR10 dataset

Example:

for image in read_cifar10("train", "/path/to/cifar10"):
    train(image)
Parameters:
  • split (str) – Either "training" or "test"
  • path (str) – Path to the folder containing the *-ubyte files
Returns:

image, label

Return type:

tuple

Data utilities

Helper functions to download and manage datasets.

dynn.data.data_util.download_if_not_there(file, url, path, force=False, local_file=None)

Downloads a file from the given url if and only if the file doesn’t already exist in the provided path or force=True

Parameters:
  • file (str) – File name
  • url (str) – Url where the file can be found (without the filename)
  • path (str) – Path to the local folder where the file should be stored
  • force (bool, optional) – Force the file download (useful if you suspect that the file might have changed)
  • local_file (str, optional) – File name for the local file (defaults to file)
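
A hypothetical usage sketch (the file name is the standard MNIST archive, used here purely for illustration; the URL is the one listed in the MNIST section below):

from dynn.data.data_util import download_if_not_there

# Only downloads if "train-images-idx3-ubyte.gz" is not already in "data/"
download_if_not_there(
    "train-images-idx3-ubyte.gz",
    "http://yann.lecun.com/exdb/mnist/",
    "data/",
)
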
Dictionary

Dictionary object for holding string to index mappings
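
The full API is not documented here; the sketch below only relies on the Dictionary calls that appear in examples elsewhere on this page (Dictionary(symbols=...), index(), pad_idx and len()):

from dynn.data.dictionary import Dictionary

dic = Dictionary(symbols="the cat sat on the mat".split())
idx = dic.index("cat")       # string -> integer index
print(len(dic), dic.pad_idx)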

IWSLT

Various functions for accessing the IWSLT translation datasets

dynn.data.iwslt.download_iwslt(path='.', year='2016', langpair='de-en', force=False)

Downloads the IWSLT dataset from https://wit3.fbk.eu/archive/

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • year (str, optional) – IWSLT year (for now only 2016 is supported)
  • langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
  • force (bool, optional) – Force the redownload even if the files are already at path
dynn.data.iwslt.load_iwslt(path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')

Loads the IWSLT dataset

Returns the train, dev and test set, each as lists of source and target sentences.

Parameters:
  • path (str) – Path to the folder containing the .tgz file
  • year (str, optional) – IWSLT year (for now only 2016 is supported)
  • langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
  • src_eos (str, optional) – Optionally append an end of sentence token to each source line.
  • tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns:

train, dev and test sets

Return type:

tuple

dynn.data.iwslt.read_iwslt(split, path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')

Iterates over the IWSLT dataset

Example:

for src, tgt in read_iwslt("train", "/path/to/iwslt"):
    train(src, tgt)
Parameters:
  • split (str) – Either "train", "dev" or "test"
  • path (str) – Path to the folder containing the .tgz file
  • year (str, optional) – IWSLT year (for now only 2016 is supported)
  • langpair (str, optional) – src-tgt language pair (for now only {de,fr}-en are supported)
  • src_eos (str, optional) – Optionally append an end of sentence token to each source line.
  • tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns:

Source sentence, Target sentence

Return type:

tuple

MNIST

Various functions for accessing the MNIST dataset.

dynn.data.mnist.download_mnist(path='.', force=False)

Downloads MNIST from http://yann.lecun.com/exdb/mnist/

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force the redownload even if the files are already at path
dynn.data.mnist.load_mnist(path)

Loads the MNIST dataset

Returns MNIST as a dictionary.

Example:

mnist = load_mnist(".")
# Train images and labels
train_imgs, train_labels = mnist["train"]
# Test images and labels
test_imgs, test_labels = mnist["test"]

The images are represented as numpy arrays and the labels as integers.

Parameters:path (str) – Path to the folder containing the *-ubyte.gz files
Returns:MNIST dataset
Return type:dict
dynn.data.mnist.read_mnist(split, path)

Iterates over the MNIST dataset

Example:

for image in read_mnist("train", "/path/to/mnist"):
    train(image)
Parameters:
  • split (str) – Either "training" or "test"
  • path (str) – Path to the folder containing the *-ubyte files
Returns:

image, label

Return type:

tuple

Preprocessing functions

Useful functions for preprocessing data

dynn.data.preprocess.lowercase(data)

Lowercase text

Parameters:data (list,str) – Data to lowercase (either a string or a list [of lists..] of strings)
Returns:Lowercased data
Return type:list, str
dynn.data.preprocess.normalize(data)

Normalize the data to mean 0 std 1

Parameters:data (list,np.ndarray) – data to normalize
Returns:Normalized data
Return type:list,np.array
dynn.data.preprocess.tokenize(data, tok='space', lang='en')

Tokenize text data.

There are 5 tokenizers supported:

  • “space”: split along whitespaces
  • “char”: split in characters
  • “13a”: Official WMT tokenization
  • “zh”: Chinese tokenization (See sacrebleu doc)
  • “moses”: Moses tokenizer (you can specify the language). Uses the sacremoses package.
Parameters:
  • data (list, str) – String or list (of lists…) of strings.
  • tok (str, optional) – Tokenization. Defaults to “space”.
  • lang (str, optional) – Language (only useful for the moses tokenizer). Defaults to “en”.
Returns:

Tokenized data

Return type:

list, str
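
A short sketch chaining the helpers above on a toy input:

from dynn.data import preprocess

sentences = ["The cat sat on the mat.", "Nice weather today!"]
sentences = preprocess.lowercase(sentences)
sentences = preprocess.tokenize(sentences, tok="space")
# [["the", "cat", "sat", "on", "the", "mat."], ["nice", "weather", "today!"]]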

Penn TreeBank

Various functions for accessing the PTB dataset used by Mikolov et al., 2010.

dynn.data.ptb.download_ptb(path='.', force=False)

Downloads the PTB from http://www.fit.vutbr.cz/~imikolov/rnnlm

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force the redownload even if the files are already at path
dynn.data.ptb.load_ptb(path, eos=None)

Loads the PTB dataset

Returns the PTB dataset as a dictionary mapping each split name to a list of sentences (strings).

Parameters:
  • path (str) – Path to the folder containing the simple-examples.tar.gz file
  • eos (str, optional) – Optionally append an end of sentence token to each line
Returns:

dictionary mapping the split name to a list of strings

Return type:

dict

dynn.data.ptb.read_ptb(split, path, eos=None)

Iterates over the PTB dataset

Example:

for sent in read_ptb("train", "/path/to/ptb"):
    train(sent)
Parameters:
  • split (str) – Either "train", "dev" or "test"
  • path (str) – Path to the folder containing the simple-examples.tar.gz file
  • eos (str, optional) – Optionally append an end of sentence token to each line
Returns:

sentence

Return type:

str

Stanford Natural Language Inference

Various functions for accessing the SNLI dataset.

dynn.data.snli.download_snli(path='.', force=False)

Downloads the SNLI dataset from https://nlp.stanford.edu/projects/snli/

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force the redownload even if the files are already at path
dynn.data.snli.load_snli(path, terminals_only=True, binary=False)

Loads the SNLI dataset

Returns the train, dev and test sets in a dictionary, each as a tuple containing the trees and the labels.

Parameters:
  • path (str) – Path to the folder containing the snli_1.0.zip file
  • terminals_only (bool) – Only return the terminals and not the trees
Returns:

Dictionary containing the train, dev and test sets

(tuple of tree/labels tuples)

Return type:

dict

dynn.data.snli.read_snli(split, path, terminals_only=True, binary=False)

Iterates over the SNLI dataset

Example:

for tree, label in read_snli("train", "/path/to/snli"):
    train(tree, label)
Parameters:
  • split (str) – Either "train", "dev" or "test"
  • path (str) – Path to the folder containing the snli_1.0.zip files
  • terminals_only (bool) – Only return the terminals and not the trees
Returns:

tree, label

Return type:

tuple

Stanford Sentiment TreeBank

Various functions for accessing the SST dataset.

dynn.data.sst.download_sst(path='.', force=False)

Downloads the SST dataset from https://nlp.stanford.edu/sentiment/

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force the redownload even if the files are already at path
dynn.data.sst.load_sst(path, terminals_only=True, binary=False)

Loads the SST dataset

Returns the train, dev and test sets in a dictionary, each as a tuple containing the trees and the labels.

Parameters:
  • path (str) – Path to the folder containing the trainDevTestTrees_PTB.zip file
  • terminals_only (bool) – Only return the terminals and not the tree
  • binary (bool) – Binary SST (only positive and negative labels). Neutral labels are discarded
Returns:

Dictionary containing the train, dev and test sets

(tuple of tree/labels tuples)

Return type:

dict

dynn.data.sst.read_sst(split, path, terminals_only=True, binary=False)

Iterates over the SST dataset

Example:

for tree, label in read_sst("train", "/path/to/sst"):
    train(tree, label)
Parameters:
  • split (str) – Either "train", "dev" or "test"
  • path (str) – Path to the folder containing the trainDevTestTrees_PTB.zip files
  • terminals_only (bool) – Only return the terminals and not the tree
  • binary (bool) – Binary SST (only positive and negative labels). Neutral labels are discarded
Returns:

tree, label

Return type:

tuple

Trees

Helper functions to handle tree-structured data

class dynn.data.trees.Tree(label, children=None)

Bases: object

Tree object for syntax trees

__init__(label, children=None)

Initialize self. See help(type(self)) for accurate signature.

__str__()

Return str(self).

__weakref__

list of weak references to the object (if defined)

static from_string(string, labelled=True)

Reads linearized tree from string

Parameters:string (str) – Linearized tree
Returns:Tree object
Return type:Tree
WikiText

Various functions for accessing the WikiText datasets (WikiText-2 and WikiText-103).

dynn.data.wikitext.download_wikitext(path='.', name='2', force=False)

Downloads the WikiText dataset from http://www.fit.vutbr.cz/~imikolov/rnnlm

Parameters:
  • path (str, optional) – Local folder (defaults to “.”)
  • force (bool, optional) – Force the redownload even if the files are already at path
dynn.data.wikitext.load_wikitext(path, name='2', eos=None)

Loads the WikiText dataset

Returns the train, validation and test sets, each as a list of sentences (each sentence is a list of words)

Parameters:
  • path (str) – Path to the folder containing the wikitext-{2|103}-v1.zip file
  • name (str) – Either "2" or "103"
  • eos (str, optional) – Optionally append an end of sentence token to each line
Returns:

dictionary mapping the split name to a list of strings

Return type:

dict

dynn.data.wikitext.read_wikitext(split, path, name='2', eos=None)

Iterates over the WikiText dataset

Example:

for sent in read_wikitext("train", "/path/to/wikitext"):
    train(sent)
Parameters:
  • split (str) – Either "train", "valid" or "test"
  • path (str) – Path to the folder containing the wikitext-{2|103}-v1.zip files
  • eos (str, optional) – Optionally append an end of sentence token to each line
Returns:

list of words

Return type:

list

dynn.layers package
Layers

Layers are the standard unit of neural models in DyNN. Layers are typically used like this:

# Instantiate layer
layer = Layer(parameter_collection, *args, **kwargs)
# [...]
# Renew computation graph
dy.renew_cg()
# Initialize layer
layer.init(*args, **kwargs)
# Apply layer forward pass
y = layer(x)
class dynn.layers.BaseLayer(name)

Bases: object

Base layer interface

__call__(*args, **kwargs)

Execute forward pass

__init__(name)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

init(test=True, update=False)

Initialize the layer before performing computation

For example setup dropout, freeze some parameters, etc…

init_layer(test=True, update=False)

Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer

sublayers

Returns all attributes of the layer which are layers themselves

class dynn.layers.ParametrizedLayer(pc, name)

Bases: dynn.layers.base_layers.BaseLayer

This is the base class for layers with trainable parameters

When implementing a ParametrizedLayer, use self.add_parameters / self.add_lookup_parameters to add parameters to the layer.

__init__(pc, name)

Creates a subcollection for this layer with a custom name

add_lookup_parameters(name, dim, lookup_param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)

This adds a parameter to this layer’s parametercollection

The layer will have 1 new attribute: self.[name] which will contain the lookup parameter object (which you should use in __call__).

You can provide an existing lookup parameter with the lookup_param argument, in which case this parameter will be reused.

The other arguments are the same as dynet.ParameterCollection.add_lookup_parameters

add_parameters(name, dim, param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)

This adds a parameter to this layer’s ParameterCollection.

The layer will have 1 new attribute: self.[name] which will contain the expression for this parameter (which you should use in __call__).

You can provide an existing parameter with the param argument, in which case this parameter will be reused.

The other arguments are the same as dynet.ParameterCollection.add_parameters

init_layer(test=True, update=False)

Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer

lookup_parameters

Return all lookup parameters specific to this layer

parameters

Return all parameters specific to this layer

class dynn.layers.Lambda(function)

Bases: dynn.layers.base_layers.BaseLayer

This layer applies an arbitrary function to its input.

Lambda(f)(x) == f(x)

This is useful if you want to wrap activation functions as layers. The unary operation should be a function taking dynet.Expression to dynet.Expression.

You shouldn’t use this to stack layers though: the wrapped function oughtn’t itself be a layer. If you want to stack layers, use combination_layers.Sequential.

Parameters:function (callable) – The function this layer applies to its input
__call__(*args, **kwargs)

Returns function(*args, **kwargs)

__init__(function)

Initialize self. See help(type(self)) for accurate signature.
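
For instance, wrapping a dynet activation as a layer (a minimal sketch):

import dynet as dy
from dynn.layers import Lambda

# A "layer" that just applies the ReLU non-linearity
relu = Lambda(dy.rectify)

dy.renew_cg()
relu.init()
y = relu(dy.inputVector([-1.0, 2.0]))  # same as dy.rectify(...)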

class dynn.layers.Affine(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Densely connected layer

\(y=f(Wx+b)\)

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • output_dim (int) – Output dimension
  • activation (function, optional) – Activation function (default: identity)
  • dropout (float, optional) – Dropout rate (default 0)
  • nobias (bool, optional) – Omit the bias (default False)
__call__(x)

Forward pass.

Parameters:x (dynet.Expression) – Input expression (a vector)
Returns:\(y=f(Wx+b)\)
Return type:dynet.Expression
__init__(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)

Creates a subcollection for this layer with a custom name
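
A minimal usage sketch, following the layer workflow shown at the top of this section:

import dynet as dy
from dynn.layers import Affine

pc = dy.ParameterCollection()
# 10 -> 5 densely connected layer with 10% dropout
layer = Affine(pc, input_dim=10, output_dim=5, dropout=0.1)

dy.renew_cg()
layer.init(test=False, update=True)  # training mode: dropout on, parameters updated
x = dy.inputVector([0.1] * 10)
y = layer(x)  # y = f(Wx + b), a 5-dimensional expression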

class dynn.layers.Embeddings(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Layer for embedding elements of a dictionary

Example:

# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols=["a", "b"])
# Parameter collection
pc = dy.ParameterCollection()
# Embedding layer of dimension 10
embed = Embeddings(pc, dic, 10)
# Initialize
dy.renew_cg()
embed.init()
# Return a batch of 2 10-dimensional vectors
vectors = embed([dic.index("b"), dic.index("a")])
Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • dictionary (dynn.data.dictionary.Dictionary) – Mapping from symbols to indices
  • embed_dim (int) – Embedding dimension
  • init (dynet.PyInitializer, optional) – How to initialize the parameters. By default this will initialize to \(\mathcal N(0, \frac{1}{\sqrt{\text{embed\_dim}}})\)
  • pad_mask (float, optional) – If provided, embeddings of the dictionary.pad_idx index will be masked with this value
__call__(idxs, length_dim=0)

Returns the input’s embedding

If idxs is a list this returns a batch of embeddings. If it’s a numpy array of shape N x b it returns a batch of b N x embed_dim matrices

Parameters:idxs (list,int) – Index or list of indices to embed
Returns:Batch of embeddings
Return type:dynet.Expression
__init__(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)

Creates a subcollection for this layer with a custom name

weights

Numpy array containing the embeddings

The first dimension is the lookup dimension

class dynn.layers.Residual(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)

Bases: dynn.layers.base_layers.BaseLayer

Adds residual connections to a layer

__call__(*args, **kwargs)

Execute forward pass

__init__(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)

Initialize self. See help(type(self)) for accurate signature.
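
A sketch of wrapping a layer with a residual connection. The input and output dimensions must match for the shortcut to make sense; the exact combination (assumed here to be a weighted sum of the layer output and the shortcut) follows the layer_weight/shortcut_weight arguments:

import dynet as dy
from dynn.layers import Affine, Residual

pc = dy.ParameterCollection()
# Residual block around a 10 -> 10 affine layer
block = Residual(Affine(pc, 10, 10))

dy.renew_cg()
block.init(test=False)
x = dy.inputVector([0.1] * 10)
y = block(x)  # presumably layer_weight * layer(x) + shortcut_weight * x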

class dynn.layers.RecurrentCell(*args, **kwargs)

Bases: object

Base recurrent cell interface

Recurrent cells must provide a default initial value for their recurrent state (eg. all zeros)

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

get_output(state)

Get the cell’s output from the list of states.

For example this would return h from h,c in the case of the LSTM

initial_value(batch_size=1)

Initial value of the recurrent state. Should return a list.

class dynn.layers.StackedRecurrentCells(*cells)

Bases: dynn.layers.base_layers.BaseLayer, dynn.layers.recurrent_layers.RecurrentCell

This implements a stack of recurrent layers

The recurrent state of the resulting cell is the list of the states of all the sub-cells. For example for a stack of 2 LSTM cells the resulting state will be [h_1, c_1, h_2, c_2]

Example:

# Parameter collection
pc = dy.ParameterCollection()
# Stacked recurrent cell
stacked_cell = StackedRecurrentCells(
    LSTM(pc, 10, 15),
    LSTM(pc, 15, 5),
    ElmanRNN(pc, 5, 20),
)
# Inputs
dy.renew_cg()
x = dy.random_uniform(10, batch_size=5)
# Initialize layer
stacked_cell.init(test=False)
# Initial state: [h_1, c_1, h_2, c_2, h_3] of sizes [15, 15, 5, 5, 20]
init_state = stacked_cell.initial_value()
# Run the cell on the input.
new_state = stacked_cell(x, *init_state)
# Get the final output (h_3 of size 20)
h = stacked_cell.get_output(new_state)
__call__(x, *state)

Compute the cell’s output from the list of states and an input expression

Parameters:x (dynet.Expression) – Input vector
Returns:new recurrent state
Return type:list
__init__(*cells)

Initialize self. See help(type(self)) for accurate signature.

get_output(state)

Get the output of the last cell

initial_value(batch_size=1)

Initial value of the recurrent state.

class dynn.layers.ElmanRNN(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer, dynn.layers.recurrent_layers.RecurrentCell

The standard Elman RNN cell:

\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • hidden_dim (int) – Hidden (output) dimension
  • activation (function, optional) – Activation function \(\sigma\) (default: dynn.activations.tanh())
  • dropout (float, optional) – Dropout rate (default 0)
__call__(x, h)

Perform the recurrent update.

Parameters:
  • x (dynet.Expression) – Input vector \(x_t\)
  • h (dynet.Expression) – Previous recurrent state \(h_{t-1}\)
Returns:

Next recurrent state

\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)

Return type:

dynet.Expression

__init__(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)

Creates a subcollection for this layer with a custom name

get_output(state)

Get the cell’s output from the list of states.

For example this would return h from h,c in the case of the LSTM

init_layer(test=True, update=False)

Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer

initial_value(batch_size=1)

Return a vector of dimension hidden_dim filled with zeros

Returns:Zero vector
Return type:dynet.Expression
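
A single-step sketch of the recurrence above (the zero initial state is built by hand here; initial_value() provides the same thing):

import dynet as dy
from dynn.layers import ElmanRNN

pc = dy.ParameterCollection()
cell = ElmanRNN(pc, input_dim=10, hidden_dim=20)

dy.renew_cg()
cell.init(test=False)
x = dy.inputVector([0.1] * 10)
h0 = dy.zeros(20)   # zero initial state
h1 = cell(x, h0)    # h_1 = sigma(W_hh h_0 + W_hx x_1 + b)
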
class dynn.layers.LSTM(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer, dynn.layers.recurrent_layers.RecurrentCell

Standard LSTM

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • hidden_dim (int) – Hidden (output) dimension
  • dropout_x (float, optional) – Input dropout rate (default 0)
  • dropout_h (float, optional) – Recurrent dropout rate (default 0)
__call__(x, h, c)

Perform the recurrent update.

Parameters:
  • x (dynet.Expression) – Input vector \(x_t\)
  • h (dynet.Expression) – Previous output \(h_{t-1}\)
  • c (dynet.Expression) – Previous cell state \(c_{t-1}\)
Returns:

dynet.Expression for the next recurrent states

h and c

Return type:

tuple

__init__(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)

Creates a subcollection for this layer with a custom name

get_output(state)

Get the cell’s output from the list of states.

For example this would return h from h,c in the case of the LSTM

init_layer(test=True, update=False)

Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer

initial_value(batch_size=1)

Return two vectors of dimension hidden_dim filled with zeros

Returns:two zero vectors for \(h_0\) and \(c_0\)
Return type:tuple
class dynn.layers.StackedLSTM(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)

Bases: dynn.layers.recurrent_layers.StackedRecurrentCells

Stacked LSTMs

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • num_layers (int) – Number of layers
  • input_dim (int) – Input dimension
  • hidden_dim (int) – Hidden (output) dimension
  • dropout_x (float, optional) – Input dropout rate (default 0)
  • dropout_h (float, optional) – Recurrent dropout rate (default 0)
__init__(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.Transduction(layer)

Bases: dynn.layers.base_layers.BaseLayer

Feed forward transduction layer

This layer runs one cell on a sequence of inputs and returns the list of outputs. Calling it is equivalent to calling:

[layer(x) for x in input_sequence]
Parameters:layer (base_layers.BaseLayer) – The layer to apply to each element of the input sequence
__call__(input_sequence)

Runs the layer over the input

The output is a list of the output of the layer at each step

Parameters:input_sequence (list) – Input as a list of dynet.Expression objects
Returns:List of the layer’s outputs (one per input element)
Return type:list
__init__(layer)

Initialize self. See help(type(self)) for accurate signature.
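
For example, applying the same affine layer to every element of a sequence (a minimal sketch):

import dynet as dy
from dynn.layers import Affine, Transduction

pc = dy.ParameterCollection()
proj = Transduction(Affine(pc, 10, 5))

dy.renew_cg()
proj.init(test=False)
xs = [dy.inputVector([0.1] * 10) for _ in range(8)]
ys = proj(xs)  # list of 8 five-dimensional expressions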

class dynn.layers.Unidirectional(cell, output_only=False)

Bases: dynn.layers.base_layers.BaseLayer

Unidirectional transduction layer

This layer runs a recurrent cell on a sequence of inputs and produces the resulting sequence of recurrent states.

Example:

# LSTM cell
lstm_cell = dynn.layers.LSTM(dy.ParameterCollection(), 10, 10)
# Transduction layer
lstm = dynn.layers.Unidirectional(lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, batch_size=5) for _ in range(20)]
# Initialize layer
lstm.init(test=False)
# Transduce forward
states = lstm(xs)
# Retrieve last h
h_final = states[-1][0]
Parameters:
  • cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for transduction
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.
__call__(input_sequence, backward=False, lengths=None, left_padded=True, output_only=None, initial_state=None)

Transduces the sequence using the recurrent cell.

The output is a list of the output states at each step. For instance in an LSTM the output is (h1, c1), (h2, c2), ...

This assumes that all the input expression have the same batch size. If you batch sentences of the same length together you should pad to the longest sequence.

Parameters:
  • input_sequence (list) – Input as a list of dynet.Expression objects
  • backward (bool, optional) – If this is True the sequence will be processed from right to left. The output sequence will still be in the same order as the input sequence though.
  • lengths (list, optional) – If the expressions in the sequence are batched, but have different lengths, this should contain a list of the sequence lengths (default: None)
  • left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of longest sequence. Use this to specify whether the sequence is left or right padded.
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
  • initial_state (dy.Expression, optional) – Overrides the default initial state of the recurrent cell
Returns:

List of recurrent states (depends on the recurrent layer)

Return type:

list

__init__(cell, output_only=False)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.Bidirectional(forward_cell, backward_cell, output_only=False)

Bases: dynn.layers.base_layers.BaseLayer

Bidirectional transduction layer

This layer runs a recurrent cell in each direction over a sequence of inputs and produces the resulting sequences of recurrent states.

Example:

# Parameter collection
pc = dy.ParameterCollection()
# LSTM cell
fwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
bwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
# Transduction layer
bilstm = dynn.layers.Bidirectional(fwd_lstm_cell, bwd_lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, batch_size=5) for _ in range(20)]
# Initialize layer
bilstm.init(test=False)
# Transduce forward
fwd_states, bwd_states = bilstm(xs)
# Retrieve last h
fwd_h_final = fwd_states[-1][0]
# For the backward LSTM the final state is at
# the beginning of the sequence (assuming left padding)
bwd_h_final = bwd_states[0][0]
Parameters:
  • forward_cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for forward transduction
  • backward_cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for backward transduction
  • output_only (bool, optional) – Only return the sequences of outputs instead of the sequences of states.
__call__(input_sequence, lengths=None, left_padded=True, output_only=None, fwd_initial_state=None, bwd_initial_state=None)

Transduces the sequence in both directions

The output is a tuple forward_states, backward_states where each forward_states is a list of the output states of the forward recurrent cell at each step (and backward_states for the backward cell). For instance in a BiLSTM the output is [(fwd_h1, fwd_c1), ...], [(bwd_h1, bwd_c1), ...]

This assumes that all the input expressions have the same batch size. If you batch sequences of different lengths together, you should pad them to the length of the longest sequence.

Parameters:
  • input_sequence (list) – Input as a list of dynet.Expression objects
  • lengths (list, optional) – If the expressions in the sequence are batched, but have different lengths, this should contain a list of the sequence lengths (default: None)
  • left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of longest sequence. Use this to specify whether the sequence is left or right padded.
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overrides the value given in the constructor.
  • fwd_initial_state (dy.Expression, optional) – Overrides the default initial state of the forward recurrent cell.
  • bwd_initial_state (dy.Expression, optional) – Overrides the default initial state of the backward recurrent cell.
Returns:

List of forward and backward recurrent states (depends on the recurrent layer)

Return type:

tuple

__init__(forward_cell, backward_cell, output_only=False)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.MaxPool1D(kernel_size=None, stride=1)

Bases: dynn.layers.base_layers.BaseLayer

1D max pooling

Parameters:
  • kernel_size (int, optional) – Default kernel size. If this is not specified, the default is to pool over the full sequence (default: None)
  • stride (int, optional) – Default temporal stride (default: 1)
__call__(x, kernel_size=None, stride=None)

Max pooling over the first dimension.

This takes either a list of N d-dimensional vectors or a N x d matrix.

The output will be a matrix of dimension (N - kernel_size + 1) // stride x d

Parameters:
  • x (dynet.Expression) – Input matrix or list of vectors
  • dim (int, optional) – The reduction dimension (default: 0)
  • kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
  • stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns:

Pooled sequence.

Return type:

dynet.Expression

__init__(kernel_size=None, stride=1)

Initialize self. See help(type(self)) for accurate signature.
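
A minimal usage sketch (the shapes in the comments follow the formula above):

import dynet as dy
import dynn

max_pool = dynn.layers.MaxPool1D()
dy.renew_cg()
max_pool.init(test=False)
x = dy.zeros((20, 10), batch_size=4)   # N=20 steps, d=10 features
pooled = max_pool(x)                    # pool over the full sequence
windows = max_pool(x, kernel_size=3)    # (20 - 3 + 1) x 10 windows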

class dynn.layers.MaxPool2D(kernel_size=None, strides=None)

Bases: dynn.layers.base_layers.BaseLayer

2D max pooling.

Parameters:
  • kernel_size (list, optional) – Default kernel size. This is a list of two elements, one per dimension. If either is not specified, the default is to pool over the entire dimension (default: [None, None])
  • strides (list, optional) – Stride along each dimension (list of size 2, defaults to [1, 1]).
__call__(x, kernel_size=None, strides=None)

Max pooling over the first dimension.

If either of the kernel_size elements is not specified, the pooling will be done over the full dimension (and the stride is ignored)

Parameters:
  • x (dynet.Expression) – Input image (3-d tensor) or matrix.
  • kernel_size (list, optional) – Size of the pooling kernel. If this is not specified, the default specified in the constructor is used.
  • strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns:

Pooled sequence.

Return type:

dynet.Expression

__init__(kernel_size=None, strides=None)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.MeanPool1D(kernel_size=None, stride=1)

Bases: dynn.layers.base_layers.BaseLayer

1D mean pooling.

The stride and kernel size arguments are here for consistency with MaxPool1D but they are unsupported for now.

Parameters:
  • kernel_size (int, optional) – Default kernel size. If this is not specified, the default is to pool over the full sequence (default: None)
  • stride (int, optional) – Default temporal stride (default: 1)
__call__(x, kernel_size=None, stride=None, lengths=None)

Mean pooling over the first dimension.

This takes either a list of N d-dimensional vectors or a N x d matrix.

The output will be a matrix of dimension (N - kernel_size + 1) // stride x d

Parameters:
  • x (dynet.Expression) – Input matrix or list of vectors
  • dim (int, optional) – The reduction dimension (default: 0)
  • kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
  • stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns:

Pooled sequence.

Return type:

dynet.Expression

__init__(kernel_size=None, stride=1)

Initialize self. See help(type(self)) for accurate signature.
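
A minimal usage sketch showing the lengths argument for padded batches:

import dynet as dy
import dynn

mean_pool = dynn.layers.MeanPool1D()
dy.renew_cg()
mean_pool.init(test=False)
x = dy.zeros((20, 10), batch_size=2)   # 2 sequences padded to length 20
avg = mean_pool(x, lengths=[20, 13])   # each sample averaged over its true length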

class dynn.layers.MLPAttention(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multilayer Perceptron based attention

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • hidden_dim (int) – Hidden dimension of the MLP
  • activation (function, optional) – MLP activation (defaults to tanh).
  • dropout (float, optional) – Attention dropout (defaults to 0)
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value

This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).

Parameters:
Returns:

pooled_value, scores, of size (dv,), B and (L,), B respectively

Return type:

tuple

__init__(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Creates a subcollection for this layer with a custom name
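
A minimal usage sketch (the key/value shapes here are assumptions based on the returned sizes described above):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
attend = dynn.layers.MLPAttention(pc, query_dim=10, key_dim=12, hidden_dim=20)
dy.renew_cg()
attend.init(test=False)
# One (batched) query vector attending over L=6 key/value columns
query = dy.zeros((10,), batch_size=2)
keys = dy.zeros((12, 6), batch_size=2)    # key_dim x L
values = dy.zeros((15, 6), batch_size=2)  # dv x L
pooled, scores = attend(query, keys, values)
# pooled has size (15,), 2 and scores has size (6,), 2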

class dynn.layers.BilinearAttention(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Bilinear attention layer.

Here the scores are computed according to

\[\alpha_{ij}=q_i^\intercal A k_j\]

Where \(q_i,k_j\) are the ith query and jth key respectively. If dot_product is set to True this is replaced by:

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

Where \(d\) is the dimension of the keys and queries.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • dot_product (bool, optional) – Compute attention with the dot product only (no weight matrix). This requires that query_dim==key_dim.
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • A (dynet.Parameters, optional) – Specify the weight matrix directly.
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value.

This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).

Parameters:
Returns:

pooled_value, scores, of size (dv,), B and (L,), B respectively

Return type:

tuple

__init__(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.MultiHeadAttention(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multi headed attention layer.

This functions like dot product attention

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

Except the key, query and values are split into multiple heads.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_heads (int) – Number of heads
  • query_dim (int) – Dimension of queries
  • key_dim (int) – Dimension of keys
  • value_dim (int) – Dimension of values
  • hidden_dim (int) – Hidden dimension (must be a multiple of n_heads)
  • out_dim (int) – Output dimension
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • Wq (dynet.Parameters, optional) – Specify the queries projection matrix directly.
  • Wk (dynet.Parameters, optional) – Specify the keys projection matrix directly.
  • Wv (dynet.Parameters, optional) – Specify the values projection matrix directly.
  • Wo (dynet.Parameters, optional) – Specify the output projection matrix directly.
__call__(queries, keys, values, mask=None)

Compute attention weights and return the pooled value.

This expects the queries, keys and values to have dimensions dq x l, dk x L, dv x L respectively.

Returns both the pooled value and the attention weights (list of weights, one per head). You can specify an additive mask when some values are not to be attended to (eg padding).

Parameters:
Returns:

pooled_value, scores, of size (dv,), B and (L,), B respectively

Return type:

tuple

__init__(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Creates a subcollection for this layer with a custom name
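
A minimal self-attention sketch, where queries, keys and values are the same d x L matrix:

import dynet as dy
import dynn

pc = dy.ParameterCollection()
mha = dynn.layers.MultiHeadAttention(
    pc, n_heads=4, query_dim=16, key_dim=16, value_dim=16,
    hidden_dim=32, out_dim=16,
)
dy.renew_cg()
mha.init(test=False)
x = dy.zeros((16, 8), batch_size=2)  # d x L
pooled, weights = mha(x, x, x)       # weights: one expression per head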

class dynn.layers.Conv1D(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

1D convolution along the first dimension

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • num_kernels (int) – Number of kernels (essentially the output dimension)
  • kernel_width (int) – Width of the kernels
  • activation (function, optional) – activation function (default: identity)
  • dropout_rate (float, optional) – Dropout rate (default 0)
  • nobias (bool, optional) – Omit the bias (default False)
  • zero_padded (bool, optional) – Default padding behaviour. Pad the input with zeros so that the output has the same length (default True)
  • stride (int, optional) – Default stride along the length (defaults to 1).
__call__(x, stride=None, zero_padded=None)

Forward pass

Parameters:
  • x (dynet.Expression) – Input expression with the shape (length, input_dim)
  • stride (int, optional) – Stride along the temporal dimension
  • zero_padded (bool, optional) – Pad the input with zeros so that the output has the same length. If this is not specified, the default specified in the constructor is used.
Returns:

Convolved sequence.

Return type:

dynet.Expression

__init__(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)

Creates a subcollection for this layer with a custom name
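
A minimal usage sketch (the output shapes in the comments are assumptions based on the padding/stride description above):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
conv = dynn.layers.Conv1D(pc, input_dim=10, num_kernels=32, kernel_width=3)
dy.renew_cg()
conv.init(test=False)
x = dy.zeros((20, 10), batch_size=4)             # (length, input_dim)
h = conv(x)                                       # (20, 32): zero padded by default
h_narrow = conv(x, stride=2, zero_padded=False)   # shorter, strided output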

class dynn.layers.Conv2D(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

2D convolution

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • num_channels (int) – Number of channels in the input image
  • num_kernels (int) – Number of kernels (essentially the output dimension)
  • kernel_size (list, optional) – Default kernel size. This is a list of two elements, one per dimension.
  • activation (function, optional) – activation function (default: identity)
  • dropout_rate (float, optional) – Dropout rate (default 0)
  • nobias (bool, optional) – Omit the bias (default False)
  • zero_padded (bool, optional) – Default padding behaviour. Pad the image with zeros so that the output has the same width/height (default True)
  • strides (list, optional) – Default stride along each dimension (list of size 2, defaults to [1, 1]).
__call__(x, strides=None, zero_padded=None)

Forward pass

Parameters:
  • x (dynet.Expression) – Input image (3-d tensor) or matrix.
  • zero_padded (bool, optional) – Pad the image with zeros so that the output has the same width/height. If this is not specified, the default specified in the constructor is used.
  • strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns:

Convolved image.

Return type:

dynet.Expression

__init__(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.Flatten

Bases: dynn.layers.base_layers.BaseLayer

Flattens the output such that there is only one dimension left (batch dimension notwithstanding)

Example:

# Create the layer
flatten = Flatten()
# Dummy batched 2d input
x = dy.zeros((3, 4), batch_size=7)
# x.dim() -> (3, 4), 7
y = flatten(x)
# y.dim() -> (12,), 7
__call__(x)

Flattens the output such that there is only one dimension left (batch dimension notwithstanding)

Parameters:x (dynet.Expression) – Input expression
Returns:The flattened expression
Return type:dynet.Expression
__init__()

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.LayerNorm(pc, input_dim, gain=None, bias=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Layer normalization layer:

\(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\)

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • gain (dynet.Parameters, optional) – Specify the gain parameter \(g\) directly.
  • bias (dynet.Parameters, optional) – Specify the bias parameter \(b\) directly.
__call__(x, d=None)

Layer-normalize the input.

Parameters:x (dynet.Expression) – Input expression
Returns:\(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\)
Return type:dynet.Expression
__init__(pc, input_dim, gain=None, bias=None)

Creates a subcollection for this layer with a custom name
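
A minimal usage sketch:

import dynet as dy
import dynn

pc = dy.ParameterCollection()
layer_norm = dynn.layers.LayerNorm(pc, input_dim=10)
dy.renew_cg()
layer_norm.init(test=False)
x = dy.zeros((10,), batch_size=4)
y = layer_norm(x)  # same shape, normalized with learned gain and bias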

class dynn.layers.Sequential(*layers, default_return_last_only=True)

Bases: dynn.layers.base_layers.BaseLayer

A helper class to stack layers into deep networks.

Parameters:
  • layers (list) – A list of dynn.layers.BaseLayer objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. the output of each layer can be fed into the next one)
  • default_return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
__call__(x, return_last_only=None)

Calls all the layers in succession.

Computes layers[n-1](layers[n-2](...layers[0](x)))

Parameters:
  • x (dynet.Expression) – Input expression
  • return_last_only (bool, optional) – Overrides the default
Returns:

Depending on return_last_only, returns either the last expression or a list of all the layers’ outputs (first to last)

Return type:

dynet.Expression, list

__init__(*layers, default_return_last_only=True)

Initialize self. See help(type(self)) for accurate signature.
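
A minimal sketch of a 2-layer MLP (dynn.activations.tanh is the activation referenced elsewhere in these docs):

import dynet as dy
import dynn
from dynn.activations import tanh
from dynn.layers.dense_layers import Affine

pc = dy.ParameterCollection()
mlp = dynn.layers.Sequential(
    Affine(pc, 784, 128, activation=tanh),
    Affine(pc, 128, 10),
)
dy.renew_cg()
mlp.init(test=False)
x = dy.zeros((784,), batch_size=16)
logits = mlp(x)                                  # output of the last layer only
all_outputs = mlp(x, return_last_only=False)     # [hidden, logits]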

class dynn.layers.Parallel(*layers, dim=0, default_insert_dim=False)

Bases: dynn.layers.base_layers.BaseLayer

A helper class to run layers on the same input and concatenate their outputs

This can be used to create 2d conv layers with multiple kernel sizes by concatenating multiple dynn.layers.Conv2D .

Parameters:
  • layers (list) – A list of dynn.layers.BaseLayer objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. each layer takes the same input and the outputs have the same shape everywhere except along the concatenation dimension)
  • dim (int) – The concatenation dimension
  • default_insert_dim (bool, optional) – Instead of concatenating along an existing dimension, insert a new dimension at dim and concatenate.
__call__(x, insert_dim=None, **kwargs)

Calls all the layers in succession.

Computes dy.concatenate([layers[0](x)...layers[n-1](x)], d=dim)

Parameters:
  • x (dynet.Expression) – Input expression
  • insert_dim (bool, optional) – Overrides the default
Returns:

The concatenation of the layers’ outputs along dimension dim

Return type:

dynet.Expression

__init__(*layers, dim=0, default_insert_dim=False)

Initialize self. See help(type(self)) for accurate signature.
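
A minimal sketch in the spirit of the multi-kernel-size use case above, concatenating two Conv1D layers with different widths along the channel dimension:

import dynet as dy
import dynn

pc = dy.ParameterCollection()
multi_conv = dynn.layers.Parallel(
    dynn.layers.Conv1D(pc, input_dim=10, num_kernels=16, kernel_width=3),
    dynn.layers.Conv1D(pc, input_dim=10, num_kernels=16, kernel_width=5),
    dim=1,
)
dy.renew_cg()
multi_conv.init(test=False)
x = dy.zeros((20, 10), batch_size=4)  # (length, input_dim)
h = multi_conv(x)                      # (20, 32): both outputs concatenated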

class dynn.layers.Transformer(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Transformer layer.

As described in Vaswani et al. (2017), this is the “encoder” side of the transformer, ie self attention only.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Hidden dimension (used everywhere)
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
  • return_att (bool, optional) – Defaults to False. Return the self attention weights
Returns:

The output expression (+ the attention weights if return_att is True)

Return type:

tuple, dynet.Expression

__init__(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name
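
A minimal usage sketch on a left-aligned padded batch:

import dynet as dy
import dynn

pc = dy.ParameterCollection()
transformer = dynn.layers.Transformer(pc, input_dim=16, hidden_dim=32, n_heads=4)
dy.renew_cg()
transformer.init(test=False)
x = dy.zeros((16, 5), batch_size=2)                        # input_dim x L
h = transformer(x, lengths=[5, 3])                         # masked self attention
h, att = transformer(x, lengths=[5, 3], return_att=True)   # also get the weights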

class dynn.layers.StackedTransformers(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer transformer.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_layers (int) – Number of layers
  • input_dim (int) – Hidden dimension (used everywhere)
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
  • return_att (bool, optional) – Defaults to False. Return the self attention weights
  • return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns:

The output expression (+ the attention weights if return_att is True)

Return type:

tuple, dynet.Expression

__init__(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.CondTransformer(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Conditional transformer layer.

As described in Vaswani et al. (2017), this is the “decoder” side of the transformer, ie self attention + attention to context.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Hidden dimension (used everywhere)
  • cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
  • n_heads (int) – Number of heads for attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:

The output expression (+ the attention weights if return_att is True)

Return type:

tuple, dynet.Expression

__init__(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Runs the transformer for one step. Useful for decoding.

The “state” of the transformer is the list of the L-1 previous inputs, and its output is the L-th output. This returns a tuple of the new state (the L-1 previous inputs with the L-th input concatenated) and the L-th output.

Parameters:
  • x (dynet.Expression) – Input (dimension input_dim)
  • state (dynet.Expression, optional) – Previous “state” (dimensions input_dim x (L-1))
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:

The new state and the L-th output (+ the attention weights if return_att is True)

Return type:

tuple
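
A decoding-style sketch of the step interface. The initial state and the exact return order are assumptions based on the description above (state is optional, and the returned tuple is the new state followed by the output):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
decoder = dynn.layers.CondTransformer(
    pc, input_dim=16, hidden_dim=32, cond_dim=24, n_heads=4
)
dy.renew_cg()
decoder.init(test=False)
c = dy.zeros((24, 7), batch_size=2)     # "encoder" side context (cond_dim x l)
state = None                             # assumption: no previous inputs yet
outputs = []
for _ in range(5):
    x = dy.zeros((16,), batch_size=2)    # current input (eg. the previous output)
    state, h = decoder.step(state, x, c)
    outputs.append(h)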

class dynn.layers.StackedCondTransformers(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer conditional transformer.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_layers (int) – Number of layers
  • input_dim (int) – Hidden dimension (used everywhere)
  • cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • c (list) – list of contexts (one per layer, each of dim cond_dim x L). If this is not a list (but an expression), the same context will be used for each layer.
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns:

The output expression (+ the attention weights if return_att is True)

Return type:

tuple, dynet.Expression

__init__(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Runs the transformer for one step. Useful for decoding.

The “state” of the multilayered transformer is the list of n_layers L-1 sized inputs, and its output is the output of the last layer. This returns a tuple of the new state (a list of n_layers L sized inputs) and the L-th output.

Parameters:
  • x (dynet.Expression) – Input (dimension input_dim)
  • state (dynet.Expression) – Previous “state” (list of n_layers expressions of dimensions input_dim x (L-1))
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:

The output expression (+ the attention weights if return_att is True)

Return type:

tuple

Submodules
Attention layers
class dynn.layers.attention_layers.BilinearAttention(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Bilinear attention layer.

Here the scores are computed according to

\[\alpha_{ij}=q_i^\intercal A k_j\]

Where \(q_i,k_j\) are the ith query and jth key respectively. If dot_product is set to True this is replaced by:

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

Where \(d\) is the dimension of the keys and queries.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • dot_product (bool, optional) – Compute attention with the dot product only (no weight matrix). This requires that query_dim==key_dim.
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • A (dynet.Parameters, optional) – Specify the weight matrix directly.
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value.

This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).

Parameters:
Returns:

pooled_value, scores, of size (dv,), B and (L,), B respectively

Return type:

tuple

__init__(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.attention_layers.MLPAttention(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multilayer Perceptron based attention

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • hidden_dim (int) – Hidden dimension of the MLP
  • activation (function, optional) – MLP activation (defaults to tanh).
  • dropout (float, optional) – Attention dropout (defaults to 0)
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value

This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).

Parameters:
Returns:

pooled_value, scores, of size (dv,), B and (L,), B respectively

Return type:

tuple

__init__(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.attention_layers.MultiHeadAttention(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multi headed attention layer.

This functions like dot product attention

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

Except the key, query and values are split into multiple heads.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_heads (int) – Number of heads
  • query_dim (int) – Dimension of queries
  • key_dim (int) – Dimension of keys
  • value_dim (int) – Dimension of values
  • hidden_dim (int) – Hidden dimension (must be a multiple of n_heads)
  • out_dim (int) – Output dimension
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • Wq (dynet.Parameters, optional) – Specify the queries projection matrix directly.
  • Wk (dynet.Parameters, optional) – Specify the keys projection matrix directly.
  • Wv (dynet.Parameters, optional) – Specify the values projection matrix directly.
  • Wo (dynet.Parameters, optional) – Specify the output projection matrix directly.
__call__(queries, keys, values, mask=None)

Compute attention weights and return the pooled value.

This expects the queries, keys and values to have dimensions dq x l, dk x L, dv x L respectively.

Returns both the pooled value and the attention weights (list of weights, one per head). You can specify an additive mask when some values are not to be attended to (eg padding).

Parameters:
Returns:

pooled_value, scores, of size (dv,), B and (L,), B respectively

Return type:

tuple

__init__(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Creates a subcollection for this layer with a custom name

Base layer
class dynn.layers.base_layers.BaseLayer(name)

Bases: object

Base layer interface

__call__(*args, **kwargs)

Execute forward pass

__init__(name)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

init(test=True, update=False)

Initialize the layer before performing computation

For example set up dropout, freeze some parameters, etc.

init_layer(test=True, update=False)

Initializes only this layer’s parameters (not recursive). This needs to be implemented for each layer.

sublayers

Returns all attributes of the layer which are layers themselves

class dynn.layers.base_layers.ParametrizedLayer(pc, name)

Bases: dynn.layers.base_layers.BaseLayer

This is the base class for layers with trainable parameters

When implementing a ParametrizedLayer, use self.add_parameters / self.add_lookup_parameters to add parameters to the layer.

__init__(pc, name)

Creates a subcollection for this layer with a custom name

add_lookup_parameters(name, dim, lookup_param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)

This adds a lookup parameter to this layer’s ParameterCollection.

The layer will have 1 new attribute: self.[name] which will contain the lookup parameter object (which you should use in __call__).

You can provide an existing lookup parameter with the lookup_param argument, in which case this parameter will be reused.

The other arguments are the same as dynet.ParameterCollection.add_lookup_parameters

add_parameters(name, dim, param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)

This adds a parameter to this layer’s ParameterCollection.

The layer will have 1 new attribute: self.[name] which will contain the expression for this parameter (which you should use in __call__).

You can provide an existing parameter with the param argument, in which case this parameter will be reused.

The other arguments are the same as dynet.ParameterCollection.add_parameters

init_layer(test=True, update=False)

Initializes only this layer’s parameters (not recursive). This needs to be implemented for each layer.

lookup_parameters

Return all lookup parameters specific to this layer

parameters

Return all parameters specific to this layer
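
A sketch of a custom parametrized layer built on this interface, assuming (as the add_parameters description above suggests) that after init the registered parameter is available as an expression under self.b:

import dynet as dy
from dynn.layers.base_layers import ParametrizedLayer

class Shift(ParametrizedLayer):
    """Hypothetical layer that adds a learned bias vector to its input."""

    def __init__(self, pc, dim):
        # Creates the subcollection and registers one parameter named "b"
        super(Shift, self).__init__(pc, "shift")
        self.add_parameters("b", (dim,))

    def __call__(self, x):
        # self.b holds this parameter's expression after init
        return x + self.b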

Combination layers

Perhaps unsurprisingly, combination layers are layers that combine other layers within one layer.

class dynn.layers.combination_layers.Parallel(*layers, dim=0, default_insert_dim=False)

Bases: dynn.layers.base_layers.BaseLayer

A helper class to run layers on the same input and concatenate their outputs

This can be used to create 2d conv layers with multiple kernel sizes by concatenating multiple dynn.layers.Conv2D .

Parameters:
  • layers (list) – A list of dynn.layers.BaseLayer objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. each layer takes the same input and the outputs have the same shape everywhere except along the concatenation dimension)
  • dim (int) – The concatenation dimension
  • default_insert_dim (bool, optional) – Instead of concatenating along an existing dimension, insert a new dimension at dim and concatenate.
__call__(x, insert_dim=None, **kwargs)

Calls all the layers in succession.

Computes dy.concatenate([layers[0](x)...layers[n-1](x)], d=dim)

Parameters:
  • x (dynet.Expression) – Input expression
  • insert_dim (bool, optional) – Overrides the default
Returns:

The concatenation of the layers’ outputs along dimension dim

Return type:

dynet.Expression

__init__(*layers, dim=0, default_insert_dim=False)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.combination_layers.Sequential(*layers, default_return_last_only=True)

Bases: dynn.layers.base_layers.BaseLayer

A helper class to stack layers into deep networks.

Parameters:
  • layers (list) – A list of dynn.layers.BaseLayer objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. the output of each layer can be fed into the next one)
  • default_return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
__call__(x, return_last_only=None)

Calls all the layers in succession.

Computes layers[n-1](layers[n-2](...layers[0](x)))

Parameters:
  • x (dynet.Expression) – Input expression
  • return_last_only (bool, optional) – Overrides the default
Returns:

Depending on return_last_only, returns either the last expression or a list of all the layers’ outputs (first to last)

Return type:

dynet.Expression, list

__init__(*layers, default_return_last_only=True)

Initialize self. See help(type(self)) for accurate signature.

Convolution layers
class dynn.layers.convolution_layers.Conv1D(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

1D convolution along the first dimension

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • num_kernels (int) – Number of kernels (essentially the output dimension)
  • kernel_width (int) – Width of the kernels
  • activation (function, optional) – activation function (default: identity)
  • dropout_rate (float, optional) – Dropout rate (default 0)
  • nobias (bool, optional) – Omit the bias (default False)
  • zero_padded (bool, optional) – Default padding behaviour. Pad the input with zeros so that the output has the same length (default True)
  • stride (int, optional) – Default stride along the length (defaults to 1).
__call__(x, stride=None, zero_padded=None)

Forward pass

Parameters:
  • x (dynet.Expression) – Input expression with the shape (length, input_dim)
  • stride (int, optional) – Stride along the temporal dimension
  • zero_padded (bool, optional) – Pad the input with zeros so that the output has the same length. If this is not specified, the default specified in the constructor is used.
Returns:

Convolved sequence.

Return type:

dynet.Expression

__init__(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.convolution_layers.Conv2D(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

2D convolution

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • num_channels (int) – Number of channels in the input image
  • num_kernels (int) – Number of kernels (essentially the output dimension)
  • kernel_size (list, optional) – Default kernel size. This is a list of two elements, one per dimension.
  • activation (function, optional) – activation function (default: identity)
  • dropout_rate (float, optional) – Dropout rate (default 0)
  • nobias (bool, optional) – Omit the bias (default False)
  • zero_padded (bool, optional) – Default padding behaviour. Pad the image with zeros so that the output has the same width/height (default True)
  • strides (list, optional) – Default stride along each dimension (list of size 2, defaults to [1, 1]).
__call__(x, strides=None, zero_padded=None)

Forward pass

Parameters:
  • x (dynet.Expression) – Input image (3-d tensor) or matrix.
  • zero_padded (bool, optional) – Pad the image with zeros so that the output has the same width/height. If this is not specified, the default specified in the constructor is used.
  • strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns:

Convolved image.

Return type:

dynet.Expression

__init__(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)

Creates a subcollection for this layer with a custom name

Densely connected layers
class dynn.layers.dense_layers.Affine(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Densely connected layer

\(y=f(Wx+b)\)

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • output_dim (int) – Output dimension
  • activation (function, optional) – activation function (default: identity)
  • dropout (float, optional) – Dropout rate (default 0)
  • nobias (bool, optional) – Omit the bias (default False)
__call__(x)

Forward pass.

Parameters:x (dynet.Expression) – Input expression (a vector)
Returns:\(y=f(Wx+b)\)
Return type:dynet.Expression
__init__(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)

Creates a subcollection for this layer with a custom name
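
A minimal usage sketch:

import dynet as dy
from dynn.layers.dense_layers import Affine

pc = dy.ParameterCollection()
proj = Affine(pc, input_dim=10, output_dim=5)
dy.renew_cg()
proj.init(test=False)
x = dy.zeros((10,), batch_size=3)
y = proj(x)  # y = f(Wx + b), of size (5,), 3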

class dynn.layers.dense_layers.GatedLayer(pc, input_dim, output_dim, activation=<built-in function tanh>, dropout=0.0, Wo=None, bo=None, Wg=None, bg=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Gated linear layer:

\(y=(W_ox+b_o)\circ \sigma(W_gx+b_g)\)

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • output_dim (int) – Output dimension
  • activation (function, optional) – activation function (default: dynet.tanh)
  • dropout (float, optional) – Dropout rate (default 0)
__call__(x)

Forward pass

Parameters:x (dynet.Expression) – Input expression (a vector)
Returns:\(y=(W_ox+b_o)\circ \sigma(W_gx+b_g)\)
Return type:dynet.Expression
__init__(pc, input_dim, output_dim, activation=<built-in function tanh>, dropout=0.0, Wo=None, bo=None, Wg=None, bg=None)

Creates a subcollection for this layer with a custom name

Embedding layers

For embedding discrete inputs (such as words, characters).

class dynn.layers.embedding_layers.Embeddings(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Layer for embedding elements of a dictionary

Example:

# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols=["a", "b"])
# Parameter collection
pc = dy.ParameterCollection()
# Embedding layer of dimension 10
embed = Embeddings(pc, dic, 10)
# Initialize
dy.renew_cg()
embed.init()
# Return a batch of 2 10-dimensional vectors
vectors = embed([dic.index("b"), dic.index("a")])
Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • dictionary (dynn.data.dictionary.Dictionary) – Mapping from symbols to indices
  • embed_dim (int) – Embedding dimension
  • init (dynet.PyInitializer, optional) – How to initialize the parameters. By default this will initialize to \(\mathcal N(0, \frac{1}{\sqrt{\texttt{embed\_dim}}})\)
  • pad_mask (float, optional) – If provided, embeddings of the dictionary.pad_idx index will be masked with this value
__call__(idxs, length_dim=0)

Returns the input’s embedding

If idxs is a list this returns a batch of embeddings. If it’s a numpy array of shape N x b it returns a batch of b N x embed_dim matrices

Parameters:idxs (list,int) – Index or list of indices to embed
Returns:Batch of embeddings
Return type:dynet.Expression
__init__(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)

Creates a subcollection for this layer with a custom name

weights

Numpy array containing the embeddings

The first dimension is the lookup dimension

Functional layers
class dynn.layers.functional_layers.AdditionLayer(layer1, layer2)

Bases: dynn.layers.functional_layers.BinaryOpLayer

Addition of two layers.

This is the layer returned by the addition syntax:

AdditionLayer(layer1, layer2)(x) == layer1(x) + layer2(x)
# is the same thing as
add_1_2 = layer1 + layer2
add_1_2(x) == layer1(x) + layer2(x)
Parameters:
  • layer1 (base_layers.BaseLayer) – First layer
  • layer2 (base_layers.BaseLayer) – Second layer
__init__(layer1, layer2)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.functional_layers.BinaryOpLayer(layer1, layer2, binary_operation)

Bases: dynn.layers.base_layers.BaseLayer

This layer wraps two layers with a binary operation.

BinaryOpLayer(layer1, layer2, op)(x) == op(layer1(x), layer2(x))

This is useful to express the addition of two layers as another layer.

Parameters:
  • layer1 (base_layers.BaseLayer) – First layer
  • layer2 (base_layers.BaseLayer) – Second layer
  • binary_operation (function) – A binary operation on dynet.Expression objects
__call__(*args, **kwargs)

Execute forward pass

__init__(layer1, layer2, binary_operation)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.functional_layers.CmultLayer(layer1, layer2)

Bases: dynn.layers.functional_layers.BinaryOpLayer

Coordinate-wise multiplication of two layers.

CmultLayer(layer1, layer2)(x) == dy.cmult(layer1(x), layer2(x))
Parameters:
  • layer1 (base_layers.BaseLayer) – First layer
  • layer2 (base_layers.BaseLayer) – Second layer
__init__(layer1, layer2)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.functional_layers.ConstantLayer(constant)

Bases: dynn.layers.base_layers.BaseLayer

This is the “zero”-ary layer.

# Takes in numbers
ConstantLayer(5)() == dy.inputTensor([5])
# Or lists
ConstantLayer([5, 6])() == dy.inputTensor([5, 6])
# Or numpy arrays
ConstantLayer(np.ones((10, 12)))() == dy.inputTensor(np.ones((10, 12)))
Parameters:constant (number, np.ndarray) – The constant. It must be a type that can be turned into a dynet.Expression
__call__(*args, **kwargs)

Execute forward pass

__init__(constant)

Initialize self. See help(type(self)) for accurate signature.

init_layer(test=True, update=False)

Initializes only this layer’s parameters (not recursive). This needs to be implemented for each layer.

class dynn.layers.functional_layers.IdentityLayer

Bases: dynn.layers.functional_layers.Lambda

The identity layer does literally nothing

IdentityLayer()(x) == x

It passes its input directly as the output. Still, it can be useful to express more complicated layers like residual connections.

__init__()

Initialize self. See help(type(self)) for accurate signature.
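
A sketch of the residual-connection use case mentioned above, built with the addition syntax:

import dynet as dy
from dynn.layers.dense_layers import Affine
from dynn.layers.functional_layers import IdentityLayer

pc = dy.ParameterCollection()
affine = Affine(pc, 10, 10)
residual_block = affine + IdentityLayer()  # AdditionLayer(affine, IdentityLayer())
dy.renew_cg()
residual_block.init(test=False)
x = dy.zeros((10,), batch_size=2)
y = residual_block(x)  # affine(x) + x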

class dynn.layers.functional_layers.Lambda(function)

Bases: dynn.layers.base_layers.BaseLayer

This layer applies an arbitrary function to its input.

Lambda(f)(x) == f(x)

This is useful if you want to wrap activation functions as layers. The unary operation should be a function taking dynet.Expression to dynet.Expression.

You shouldn’t use this to stack layers though: the function oughtn’t be a layer. If you want to stack layers, use combination_layers.Sequential.

Parameters:
  • function (function) – A unary operation on dynet.Expression objects
__call__(*args, **kwargs)

Returns function(*args, **kwargs)

__init__(function)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.functional_layers.NegationLayer(layer)

Bases: dynn.layers.functional_layers.UnaryOpLayer

Negates the output of another layer:

NegationLayer(layer)(x) == - layer(x)

It can also be used with the - syntax directly:

negated_layer = - layer
# is the same as
negated_layer = NegationLayer(layer)
Parameters:layer (base_layers.BaseLayer) – The layer to which output you want to apply the negation.
__init__(layer)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.functional_layers.SubstractionLayer(layer1, layer2)

Bases: dynn.layers.functional_layers.BinaryOpLayer

Subtraction of two layers.

This is the layer returned by the subtraction syntax:

SubstractionLayer(layer1, layer2)(x) == layer1(x) - layer2(x)
# is the same thing as
sub_1_2 = layer1 - layer2
sub_1_2(x) == layer1(x) - layer2(x)
Parameters:
  • layer1 (base_layers.BaseLayer) – First layer
  • layer2 (base_layers.BaseLayer) – Second layer
__init__(layer1, layer2)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.functional_layers.UnaryOpLayer(layer, unary_operation)

Bases: dynn.layers.base_layers.BaseLayer

This layer wraps a unary operation on another layer.

UnaryOpLayer(layer, op)(x) == op(layer(x))

This is a shorter way of writing:

UnaryOpLayer(layer, op)(x) == Sequential(layer, Lambda(op))

You shouldn’t use this to stack layers though, op oughtn’t be a layer. If you want to stack layers, use combination_layers.Sequential.

Parameters:
  • layer (base_layers.BaseLayer) – The layer to which output you want to apply the unary operation.
  • unary_operation (function) – A unary operation on dynet.Expression objects
__call__(*args, **kwargs)

Returns unary_operation(layer(*args, **kwargs))

__init__(layer, unary_operation)

Initialize self. See help(type(self)) for accurate signature.

Normalization layers
class dynn.layers.normalization_layers.LayerNorm(pc, input_dim, gain=None, bias=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Layer normalization layer:

\(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\)

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • gain (dynet.Parameters, optional) – Specify the gain parameter \(g\) directly.
  • bias (dynet.Parameters, optional) – Specify the bias parameter \(b\) directly.
__call__(x, d=None)

Layer-normalize the input.

Parameters:x (dynet.Expression) – Input expression
Returns:\(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\)
Return type:dynet.Expression
__init__(pc, input_dim, gain=None, bias=None)

Creates a subcollection for this layer with a custom name

Pooling layers
class dynn.layers.pooling_layers.MaxPool1D(kernel_size=None, stride=1)

Bases: dynn.layers.base_layers.BaseLayer

1D max pooling

Parameters:
  • kernel_size (int, optional) – Default kernel size. If this is not specified, the default is to pool over the full sequence (default: None)
  • stride (int, optional) – Default temporal stride (default: 1)
__call__(x, kernel_size=None, stride=None)

Max pooling over the first dimension.

This takes either a list of N d-dimensional vectors or a N x d matrix.

The output will be a matrix of dimension (N - kernel_size + 1) // stride x d

Parameters:
  • x (dynet.Expression) – Input matrix or list of vectors
  • dim (int, optional) – The reduction dimension (default: 0)
  • kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
  • stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns:

Pooled sequence.

Return type:

dynet.Expression

__init__(kernel_size=None, stride=1)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.pooling_layers.MaxPool2D(kernel_size=None, strides=None)

Bases: dynn.layers.base_layers.BaseLayer

2D max pooling.

Parameters:
  • kernel_size (list, optional) – Default kernel size. This is a list of two elements, one per dimension. If either is not specified, the default is to pool over the entire dimension (default: [None, None])
  • strides (list, optional) – Stride along each dimension (list of size 2, defaults to [1, 1]).
__call__(x, kernel_size=None, strides=None)

Max pooling over the first dimension.

If either of the kernel_size elements is not specified, the pooling will be done over the full dimension (and the stride is ignored)

Parameters:
  • x (dynet.Expression) – Input image (3-d tensor) or matrix.
  • kernel_size (list, optional) – Size of the pooling kernel. If this is not specified, the default specified in the constructor is used.
  • strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns:

Pooled sequence.

Return type:

dynet.Expression

__init__(kernel_size=None, strides=None)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.pooling_layers.MeanPool1D(kernel_size=None, stride=1)

Bases: dynn.layers.base_layers.BaseLayer

1D mean pooling.

The stride and kernel size arguments are here for consistency with MaxPool1D but they are unsupported for now.

Parameters:
  • kernel_size (int, optional) – Default kernel size. If this is not specified, the default is to pool over the full sequence (default: None)
  • stride (int, optional) – Default temporal stride (default: 1)
__call__(x, kernel_size=None, stride=None, lengths=None)

Mean pooling over the first dimension.

This takes either a list of N d-dimensional vectors or a N x d matrix.

The output will be a matrix of dimension (N - kernel_size + 1) // stride x d

Parameters:
  • x (dynet.Expression) – Input matrix or list of vectors
  • dim (int, optional) – The reduction dimension (default: 0)
  • kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
  • stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns:

Pooled sequence.

Return type:

dynet.Expression

__init__(kernel_size=None, stride=1)

Initialize self. See help(type(self)) for accurate signature.

dynn.layers.pooling_layers.max_pool_dim(x, d=0, kernel_width=None, stride=1)

Efficient max pooling on GPU, assuming x is a matrix or a list of vectors

Recurrent layers

The particularity of recurrent layers is that their output can be fed back as input. This includes common recurrent cells like the Elman RNN or the LSTM.

class dynn.layers.recurrent_layers.ElmanRNN(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer, dynn.layers.recurrent_layers.RecurrentCell

The standard Elman RNN cell:

\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • hidden_dim (int) – Hidden (output) dimension
  • activation (function, optional) – Activation function \(\sigma\) (default: dynn.activations.tanh())
  • dropout (float, optional) – Dropout rate (default 0)
__call__(x, h)

Perform the recurrent update.

Parameters:
Returns:

Next recurrent state

\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)

Return type:

dynet.Expression

__init__(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)

Creates a subcollection for this layer with a custom name

get_output(state)

Get the cell’s output from the list of states.

For example this would return h from h,c in the case of the LSTM

init_layer(test=True, update=False)

Initializes only this layer’s parameters (not recursive). This needs to be implemented for each layer.

initial_value(batch_size=1)

Return a vector of dimension hidden_dim filled with zeros

Returns:Zero vector
Return type:dynet.Expression
class dynn.layers.recurrent_layers.LSTM(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer, dynn.layers.recurrent_layers.RecurrentCell

Standard LSTM

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • hidden_dim (int) – Hidden (output) dimension
  • dropout_x (float, optional) – Input dropout rate (default 0)
  • dropout_h (float, optional) – Recurrent dropout rate (default 0)
__call__(x, h, c)

Perform the recurrent update.

Parameters:
Returns:

dynet.Expression for the next recurrent states h and c

Return type:

tuple

__init__(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)

Creates a subcollection for this layer with a custom name

get_output(state)

Get the cell’s output from the list of states.

For example this would return h from h,c in the case of the LSTM

init_layer(test=True, update=False)

Initializes only this layer’s parameters (not recursive). This needs to be implemented for each layer.

initial_value(batch_size=1)

Return two vectors of dimension hidden_dim filled with zeros

Returns:two zero vectors for \(h_0\) and \(c_0\)
Return type:tuple
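
A minimal usage sketch for a single recurrent update (assuming dynet is imported as dy; the zero initial states below mirror what initial_value() returns):

# Parameter collection and cell
pc = dy.ParameterCollection()
lstm_cell = dynn.layers.LSTM(pc, input_dim=10, hidden_dim=20)
# Input and initial states (all zeros)
dy.renew_cg()
x = dy.random_uniform(10, -1, 1, batch_size=5)
h0, c0 = dy.zeros(20, batch_size=5), dy.zeros(20, batch_size=5)
# Initialize the layer, then perform one recurrent update
lstm_cell.init(test=False)
h1, c1 = lstm_cell(x, h0, c0)
# The cell's output is h1
output = lstm_cell.get_output([h1, c1])
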
class dynn.layers.recurrent_layers.RecurrentCell(*args, **kwargs)

Bases: object

Base recurrent cell interface

Recurrent cells must provide a default initial value for their recurrent state (eg. all zeros)

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

get_output(state)

Get the cell’s output from the list of states.

For example this would return h from h,c in the case of the LSTM

initial_value(batch_size=1)

Initial value of the recurrent state. Should return a list.

class dynn.layers.recurrent_layers.StackedLSTM(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)

Bases: dynn.layers.recurrent_layers.StackedRecurrentCells

Stacked LSTMs

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • num_layers (int) – Number of layers
  • input_dim (int) – Input dimension
  • hidden_dim (int) – Output (hidden) dimension
  • dropout_x (float, optional) – Input dropout rate (default 0)
  • dropout_h (float, optional) – Recurrent dropout rate (default 0)
__init__(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)

Initialize self. See help(type(self)) for accurate signature.
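
A minimal sketch; usage otherwise follows the StackedRecurrentCells example below:

# 3-layer LSTM, 10-dimensional inputs, 20-dimensional hidden states
pc = dy.ParameterCollection()
stacked_lstm = dynn.layers.StackedLSTM(pc, 3, 10, 20, dropout_x=0.1, dropout_h=0.1)
# One recurrent update (the state is [h_1, c_1, h_2, c_2, h_3, c_3])
dy.renew_cg()
x = dy.random_uniform(10, -1, 1, batch_size=5)
stacked_lstm.init(test=False)
new_state = stacked_lstm(x, *stacked_lstm.initial_value(batch_size=5))
h = stacked_lstm.get_output(new_state)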

class dynn.layers.recurrent_layers.StackedRecurrentCells(*cells)

Bases: dynn.layers.base_layers.BaseLayer, dynn.layers.recurrent_layers.RecurrentCell

This implements a stack of recurrent layers

The recurrent state of the resulting cell is the list of the states of all the sub-cells. For example for a stack of 2 LSTM cells the resulting state will be [h_1, c_1, h_2, c_2]

Example:

# Parameter collection
pc = dy.ParameterCollection()
# Stacked recurrent cell
stacked_cell = StackedRecurrentCells(
    LSTM(pc, 10, 15),
    LSTM(pc, 15, 5),
    ElmanRNN(pc, 5, 20),
)
# Inputs
dy.renew_cg()
x = dy.random_uniform(10, -1, 1, batch_size=5)
# Initialize layer
stacked_cell.init(test=False)
# Initial state: [h_1, c_1, h_2, c_2, h_3] of sizes [15, 15, 5, 5, 20]
init_state = stacked_cell.initial_value()
# Run the cell on the input.
new_state = stacked_cell(x, *init_state)
# Get the final output (h_3 of size 20)
h = stacked_cell.get_output(new_state)
__call__(x, *state)

Compute the cell’s output from the list of states and an input expression

Parameters:x (dynet.Expression) – Input vector
Returns:new recurrent state
Return type:list
__init__(*cells)

Initialize self. See help(type(self)) for accurate signature.

get_output(state)

Get the output of the last cell

initial_value(batch_size=1)

Initial value of the recurrent state.

Residual layers
class dynn.layers.residual_layers.Residual(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)

Bases: dynn.layers.base_layers.BaseLayer

Adds residual connections to a layer

__call__(*args, **kwargs)

Execute forward pass

__init__(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)

Initialize self. See help(type(self)) for accurate signature.
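
A minimal usage sketch. The Affine layer and its signature below are only assumptions for illustration; any layer whose output has the same dimension as its input will do:

# Hypothetical 10 -> 10 feedforward layer (assumption for illustration)
pc = dy.ParameterCollection()
inner = dynn.layers.Affine(pc, 10, 10, activation=dynn.activations.relu)
# With the default weights the output is roughly inner(x) + x
res_layer = dynn.layers.Residual(inner)
# Forward pass
dy.renew_cg()
x = dy.random_uniform(10, -1, 1, batch_size=5)
res_layer.init(test=False)
y = res_layer(x)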

Sequence transduction layers

Sequence transduction layers take in a sequence of expressions and run one layer over each input. They can be feed-forward (each input is treated independently, eg. Transduction) or recurrent (the output at one step depends on the output at the previous step, eg. Unidirectional).

class dynn.layers.transduction_layers.Bidirectional(forward_cell, backward_cell, output_only=False)

Bases: dynn.layers.base_layers.BaseLayer

Bidirectional transduction layer

This layer runs a recurrent cell in each direction over a sequence of inputs and produces the resulting sequence of recurrent states.

Example:

# Parameter collection
pc = dy.ParameterCollection()
# LSTM cell
fwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
bwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
# Transduction layer
bilstm = dynn.layers.Bidirectional(fwd_lstm_cell, bwd_lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, -1, 1, batch_size=5) for _ in range(20)]
# Initialize layer
bilstm.init(test=False)
# Transduce forward
fwd_states, bwd_states = bilstm(xs)
# Retrieve last h
fwd_h_final = fwd_states[-1][0]
# For the backward LSTM the final state is at
# the beginning of the sequence (assuming left padding)
bwd_h_final = bwd_states[0][0]
Parameters:
  • forward_cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for forward transduction
  • backward_cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for backward transduction
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.
__call__(input_sequence, lengths=None, left_padded=True, output_only=None, fwd_initial_state=None, bwd_initial_state=None)

Transduces the sequence in both directions

The output is a tuple forward_states, backward_states where each forward_states is a list of the output states of the forward recurrent cell at each step (and backward_states for the backward cell). For instance in a BiLSTM the output is [(fwd_h1, fwd_c1), ...], [(bwd_h1, bwd_c1), ...]

This assumes that all the input expressions have the same batch size. If you batch sentences of different lengths together, you should pad them to the length of the longest sequence.

Parameters:
  • input_sequence (list) – Input as a list of dynet.Expression objects
  • lengths (list, optional) – If the expressions in the sequence are batched, but have different lengths, this should contain a list of the sequence lengths (default: None)
  • left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of longest sequence. Use this to specify whether the sequence is left or right padded.
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
  • fwd_initial_state (dy.Expression, optional) – Overrides the default initial state of the forward recurrent cell.
  • bwd_initial_state (dy.Expression, optional) – Overrides the default initial state of the backward recurrent cell.
Returns:

List of forward and backward recurrent states

(depends on the recurrent layer)

Return type:

tuple

__init__(forward_cell, backward_cell, output_only=False)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.transduction_layers.SequenceMaskingLayer(mask_value=0.0, left_padded=True)

Bases: dynn.layers.base_layers.BaseLayer

Masks a sequence of batched expressions according to each batch element’s length

This layer applies a mask value to the elements of a sequence of batched expressions which correspond to padding tokens. Typically if you batch a sequence of size 2 and a sequence of size 3, you will pad the first sequence to obtain a list of 3 expressions of batch size 2. This layer will mask the batch element of the last expression corresponding to the padding token in the 1st sequence.

This is useful when doing attention or max-pooling on padded sequences when you want to mask padding tokens with \(-\infty\) to ensure that they are ignored.

Parameters:
  • mask_value (float, optional) – The value to use for masking
  • left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of longest sequence. Use this to specify whether the sequence is left or right padded.
__call__(input_sequence, lengths, left_padded=None)

Runs the layer over the input

The output is a list of the output of the layer at each step

Parameters:
  • input_sequence (list) – Input as a list of dynet.Expression objects
  • lengths (list) – If the expressions in the sequence are batched but have different lengths, this should contain a list of the sequence lengths
  • left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of longest sequence. Use this to specify whether the sequence is left or right padded. Overwrites the value given in the constructor.
Returns:

List of masked expressions

Return type:

list

__init__(mask_value=0.0, left_padded=True)

Initialize self. See help(type(self)) for accurate signature.
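
A minimal usage sketch, assuming a length-3 and a length-2 sequence batched together (so 3 expressions of batch size 2 after padding) and the layer importable from dynn.layers as in the other examples:

# Batched, padded input sequence
dy.renew_cg()
xs = [dy.random_uniform(10, -1, 1, batch_size=2) for _ in range(3)]
# Mask padding positions with -inf (eg. before max-pooling or attention)
masking = dynn.layers.SequenceMaskingLayer(mask_value=float("-inf"))
masking.init(test=True)
masked_xs = masking(xs, lengths=[3, 2])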

class dynn.layers.transduction_layers.Transduction(layer)

Bases: dynn.layers.base_layers.BaseLayer

Feed forward transduction layer

This layer applies another layer to each element of a sequence of inputs and returns the list of outputs. Calling it is equivalent to calling:

[layer(x) for x in input_sequence]
Parameters:layer (base_layers.BaseLayer) – The layer to use for transduction
__call__(input_sequence)

Runs the layer over the input

The output is a list of the output of the layer at each step

Parameters:input_sequence (list) – Input as a list of dynet.Expression objects
Returns:List of outputs (one per input)
Return type:list
__init__(layer)

Initialize self. See help(type(self)) for accurate signature.
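
A minimal usage sketch. The Affine layer used below is only an assumption for illustration; any feed-forward dynn layer works:

# Hypothetical feedforward layer applied independently at every position
pc = dy.ParameterCollection()
inner = dynn.layers.Affine(pc, 10, 5, activation=dynn.activations.relu)
transduce = dynn.layers.Transduction(inner)
# Sequence of 20 inputs
dy.renew_cg()
xs = [dy.random_uniform(10, -1, 1, batch_size=5) for _ in range(20)]
# Equivalent to [inner(x) for x in xs]
transduce.init(test=False)
ys = transduce(xs)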

class dynn.layers.transduction_layers.Unidirectional(cell, output_only=False)

Bases: dynn.layers.base_layers.BaseLayer

Unidirectional transduction layer

This layer runs a recurrent cell over a sequence of inputs and produces the resulting sequence of recurrent states.

Example:

# LSTM cell
lstm_cell = dynn.layers.LSTM(dy.ParameterCollection(), 10, 10)
# Transduction layer
lstm = dynn.layers.Unidirectional(lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, -1, 1, batch_size=5) for _ in range(20)]
# Initialize layer
lstm.init(test=False)
# Transduce forward
states = lstm(xs)
# Retrieve last h
h_final = states[-1][0]
Parameters:
  • cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for transduction
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.
__call__(input_sequence, backward=False, lengths=None, left_padded=True, output_only=None, initial_state=None)

Transduces the sequence using the recurrent cell.

The output is a list of the output states at each step. For instance in an LSTM the output is (h1, c1), (h2, c2), ...

This assumes that all the input expressions have the same batch size. If you batch sentences of different lengths together, you should pad them to the length of the longest sequence.

Parameters:
  • input_sequence (list) – Input as a list of dynet.Expression objects
  • backward (bool, optional) – If this is True the sequence will be processed from right to left. The output sequence will still be in the same order as the input sequence though.
  • lengths (list, optional) – If the expressions in the sequence are batched, but have different lengths, this should contain a list of the sequence lengths (default: None)
  • left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of longest sequence. Use this to specify whether the sequence is left or right padded.
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
  • initial_state (dy.Expression, optional) – Overrides the default initial state of the recurrent cell
Returns:

List of recurrent states (depends on the recurrent layer)

Return type:

list

__init__(cell, output_only=False)

Initialize self. See help(type(self)) for accurate signature.

Transformer layers
class dynn.layers.transformer_layers.CondTransformer(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Conditional transformer layer.

As described in Vaswani et al. (2017). This is the “decoder” side of the transformer, i.e. self attention + attention to the context.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Hidden dimension (used everywhere)
  • hidden_dim (int) – Dimension of the hidden layer of the position-wise MLP
  • cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
  • n_heads (int) – Number of heads for attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:

The output expression (+ the

attention weights if return_att is True)

Return type:

tuple, dynet.Expression

__init__(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Runs the transformer for one step. Useful for decoding.

The “state” of the transformer is the list of the L-1 previous inputs, and its output is the L-th output. This returns a tuple of the new state (the L-1 previous inputs with the L-th input concatenated) and the L-th output.

Parameters:
  • x (dynet.Expression) – Input (dimension input_dim)
  • state (dynet.Expression, optional) – Previous “state” (dimensions input_dim x (L-1))
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:

The new state and the L-th output (+ the attention weights if return_att is True)

Return type:

tuple
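
A minimal usage sketch of the full (non-incremental) forward pass, assuming dynet is imported as dy and the layer is importable from dynn.layers as in the other examples:

# Decoder-side layer: 12-dim model, 24-dim MLP, 16-dim encoder states, 4 heads
pc = dy.ParameterCollection()
layer = dynn.layers.CondTransformer(pc, 12, 24, 16, 4)
# Target-side input (length 5) and encoder-side context (length 7)
dy.renew_cg()
x = dy.random_uniform((12, 5), -1, 1, batch_size=2)
c = dy.random_uniform((16, 7), -1, 1, batch_size=2)
# Causal (upper triangular) self attention, masked conditional attention
layer.init(test=False)
h = layer(x, c, triu=True, lengths_c=[7, 5])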

class dynn.layers.transformer_layers.StackedCondTransformers(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer transformer.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_layers (int) – Number of layers
  • input_dim (int) – Hidden dimension (used everywhere)
  • cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • c (list) – list of contexts (one per layer, each of dim cond_dim x L). If this is not a list (but an expression), the same context will be used for each layer.
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns:

The output expression (+ the

attention weights if return_att is True)

Return type:

tuple, dynet.Expression

__init__(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Runs the transformer for one step. Useful for decoding.

The “state” of the multilayer transformer is a list of n_layers states, each holding the L-1 previous inputs of that layer; its output is the output of the last layer. This returns a tuple of the new state (a list of n_layers states of size L) and the L-th output.

Parameters:
  • x (dynet.Expression) – Input (dimension input_dim)
  • state (list) – Previous “state” (list of n_layers expressions of dimensions input_dim x (L-1))
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:

The output expression (+ the

attention weights if return_att is True)

Return type:

tuple

class dynn.layers.transformer_layers.StackedTransformers(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer transformer.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_layers (int) – Number of layers
  • input_dim (int) – Hidden dimension (used everywhere)
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
  • return_att (bool, optional) – Defaults to False. Return the self attention weights
  • return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns:

The output expression (+ the

attention weights if return_att is True)

Return type:

tuple, dynet.Expression

__init__(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.transformer_layers.Transformer(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Transformer layer.

As described in Vaswani et al. (2017). This is the “encoder” side of the transformer, i.e. self attention only.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Hidden dimension (used everywhere)
  • hidden_dim (int) – Dimension of the hidden layer of the position-wise MLP
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
  • return_att (bool, optional) – Defaults to False. Return the self attention weights
Returns:

The output expression (+ the

attention weights if return_att is True)

Return type:

tuple, dynet.Expression

__init__(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name
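
A minimal usage sketch, assuming dynet is imported as dy and the layer is importable from dynn.layers as in the other examples:

# Encoder-side layer: 12-dim model, 24-dim MLP, 4 attention heads
pc = dy.ParameterCollection()
layer = dynn.layers.Transformer(pc, 12, 24, 4)
# Batch of 2 sequences padded to length 5 (true lengths 5 and 3)
dy.renew_cg()
x = dy.random_uniform((12, 5), -1, 1, batch_size=2)
# Run the layer, masking the padded positions
layer.init(test=False)
h = layer(x, lengths=[5, 3])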

Submodules

Activation functions

Common activation functions for neural networks.

Most of those are wrappers around standard dynet operations (eg. rectify -> relu)

dynn.activations.identity(x)

The identity function

\(y=x\)

Parameters:x (dynet.Expression) – Input expression
Returns:\(x\)
Return type:dynet.Expression
dynn.activations.relu(x)

The Rectified Linear Unit (ReLU)

\(y=\max(0,x)\)

Parameters:x (dynet.Expression) – Input expression
Returns:\(\max(0,x)\)
Return type:dynet.Expression
dynn.activations.sigmoid(x)

The sigmoid function

\(y=\frac{1}{1+e^{-x}}\)

Parameters:x (dynet.Expression) – Input expression
Returns:\(\frac{1}{1+e^{-x}}\)
Return type:dynet.Expression
dynn.activations.tanh(x)

The hyperbolic tangent function

\(y=\tanh(x)\)

Parameters:x (dynet.Expression) – Input expression
Returns:\(\tanh(x)\)
Return type:dynet.Expression
Command line utilities
dynn.command_line.add_dynet_args(parser, new_group=True)

Adds dynet command line arguments to an argparse.ArgumentParser

You can apply this to your argument parser so that it doesn't throw an error when dynet's command line arguments are passed to your program. For a description of the arguments available for dynet, see the official documentation.

Parameters:
  • parser (argparse.ArgumentParser) – Your argument parser.
  • new_group (bool, optional) – Add the arguments in a specific argument group (default: True)
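
A minimal usage sketch:

import argparse

parser = argparse.ArgumentParser(description="My training script")
parser.add_argument("--n-epochs", type=int, default=10)
# Accept dynet's own flags (eg. --dynet-mem, --dynet-gpu) without
# argparse raising an "unrecognized arguments" error
dynn.command_line.add_dynet_args(parser)
args = parser.parse_args()
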
Input/output functions

These functions help writing to and reading from files

dynn.io.load(filename, ignore_invalid_names=False)

Load a ParameterCollection from a .npz file.

This will recover the subcollection structure.

Parameters:
  • filename (str) – File to load from.
  • ignore_invalid_names (bool, optional) – Ignore elements with invalid parameter names in the .npz without raising an exception. This is useful if for some reason the .npz contains other arrays.
Returns:

Loaded ParameterCollection

Return type:

dynet.ParameterCollection

dynn.io.loadtxt(filename, encoding='utf-8')

Read text from a file

dynn.io.populate(pc, filename, ignore_shape_mismatch=False)

Populate a ParameterCollection from a .npz file

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to populate.
  • filename (str) – File to populate from.
  • ignore_shape_mismatch (bool, optional) – Silently ignore shape mismatch between the parameter and the value in the .npz file (just don’t load the parameter and move on)
dynn.io.save(pc, filename, compressed=True)

Save a ParameterCollection as a .npz archive.

Each parameter is an entry in the archive and its name describes the subcollection it lives in.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to save.
  • filename (str) – Target filename. The .npz extension will be appended to the file name if it is not already there.
  • compressed (bool, optional) – Compressed .npz (slower but smaller on disk)
dynn.io.savetxt(filename, txt, encoding='utf-8')

Save text to a file
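
A minimal save/load round trip, assuming pc is an existing dynet.ParameterCollection:

# Save the collection (and its subcollection structure) to model.npz
dynn.io.save(pc, "model.npz")
# Either recover a fresh collection from the archive...
new_pc = dynn.io.load("model.npz")
# ...or populate the values of an existing collection with the same structure
dynn.io.populate(pc, "model.npz")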

Operations

This extends the base dynet library with useful operations.

dynn.operations.nll_softmax(logit, y)

This is the same as dy.pickneglogsoftmax.

The main difference is the shorter name and transparent handling of batches. It computes:

\[-\texttt{logit}[y]+\log\left(\sum_{c'}e^{\texttt{logit}[c']}\right)\]

(softmax then negative log likelihood of y)

Parameters:
  • logit (dynet.Expression) – Logits (unnormalized scores, possibly batched)
  • y (int, list) – Index of the gold class (or list of indices for batched input)
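
A minimal usage sketch, assuming dynet is imported as dy:

# 10-class logits for a batch of 3 examples, and their gold labels
dy.renew_cg()
logits = dy.random_uniform(10, -1, 1, batch_size=3)
labels = [2, 0, 7]
# Per-example negative log likelihood (batches are handled transparently)
nll = dynn.operations.nll_softmax(logits, labels)
loss = dy.mean_batches(nll)
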
dynn.operations.seq_mask(size, lengths, base_val=1, mask_val=0, left_aligned=True)

Returns a mask for a batch sequences of different lengths.

This will return an expression of dimensions (size,) with batch size len(lengths), where the i-th element of batch b is base_val iff i <= lengths[b] (and mask_val otherwise).

For example, if size is 4 and lengths is [1,2,4] then the returned mask will be (here each row is a batch element):

1 0 0 0
1 1 0 0
1 1 1 1

Parameters:
  • size (int) – Max size of the sequence (must be >=max(lengths))
  • lengths (list) – List of lengths
  • base_val (int, optional) – Value of the mask for non-masked indices (typically 1 for multiplicative masks and 0 for additive masks). Defaults to 1.
  • mask_val (int, optional) – Value of the mask for masked indices (typically 0 for multiplicative masks and -inf for additive masks). Defaults to 0.
  • left_aligned (bool, optional) – Defaults to True.
Returns:

dynet.Expression: Mask expression
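
A minimal usage sketch reproducing the example above:

# Multiplicative mask for 3 sequences of lengths 1, 2 and 4, padded to size 4
mult_mask = dynn.operations.seq_mask(4, [1, 2, 4])
# Additive mask (0 on real tokens, -inf on padding), eg. for attention scores
add_mask = dynn.operations.seq_mask(4, [1, 2, 4], base_val=0, mask_val=float("-inf"))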

dynn.operations.squeeze(x, d=0)

Removes a dimension of size 1 at the given position.

Example:

# (1, 20)
x = dy.zeros((1, 20))
# (20,)
squeeze(x, 0)
# (20, 1)
x = dy.zeros((20, 1))
# (20,)
squeeze(x, 1)
# (20,)
squeeze(x, -1)
dynn.operations.stack(xs, d=0)

Like concatenate, but inserts a new dimension along which the expressions are stacked

d=-1 to insert a dimension at the last position

Parameters:
  • xs (list) – List of expressions with the same dimensions
  • d (int, optional) – Position of the dimension you want to insert
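
Example (the shapes below assume the new dimension indexes the stacked expressions):

# Three 10-dimensional vectors
xs = [dy.zeros(10) for _ in range(3)]
# (3, 10)
stack(xs, d=0)
# (10, 3)
stack(xs, d=-1)
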
dynn.operations.unsqueeze(x, d=0)

Insert a dimension of size 1 at the given position

Example:

# (10, 20)
x = dy.zeros((10, 20))
# (1, 10, 20)
unsqueeze(x, 0)
# (10, 20, 1)
unsqueeze(x, -1)
Parameter initialization

Some of these are just less verbose versions of dynet's PyInitializer classes

dynn.parameter_initialization.NormalInit(mean=0, std=1)

Gaussian initialization

Parameters:
  • mean (float, optional) – Mean (default: 0.0)
  • std (float, optional) – Standard deviation (\(\neq\) variance) (default: 1.0)
Returns:

dy.NormalInitializer(mean, std**2)

Return type:

dynet.PyInitializer

dynn.parameter_initialization.OneInit()

Initialize with \(1\)

Returns:dy.ConstInitializer(1)
Return type:dynet.PyInitializer
dynn.parameter_initialization.UniformInit(scale=1.0)

Uniform initialization between -scale and scale

Parameters:scale (float) – Scale of the distribution
Returns:dy.UniformInitializer(scale)
Return type:dynet.PyInitializer
dynn.parameter_initialization.ZeroInit()

Initialize with \(0\)

Returns:dy.ConstInitializer(0)
Return type:dynet.PyInitializer
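
A minimal usage sketch with dynet's add_parameters:

pc = dy.ParameterCollection()
# 20 x 10 weight matrix initialized uniformly in [-0.1, 0.1]
W = pc.add_parameters((20, 10), init=dynn.parameter_initialization.UniformInit(0.1))
# Bias vector initialized to zero
b = pc.add_parameters(20, init=dynn.parameter_initialization.ZeroInit())
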
Training helper functions and classes

Adds new optimizers and LR schedules to dynet.

dynn.training.inverse_sqrt_schedule(warmup, lr0)

Inverse square root learning rate schedule

At step \(t\) , the learning rate has value

\[\texttt{lr}_0\times \min\left(\frac{1}{\sqrt{t}}, \sqrt{\frac{t}{\texttt{warmup}^3}}\right)\]
Parameters:
  • warmup (int) – Number of warmup steps
  • lr0 (float) – Initial learning rate
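
For reference, the schedule value at a given step can be transcribed directly from the formula above (this is only an illustration of the formula, not the dynn API itself):

import math

def inverse_sqrt_lr(t, warmup, lr0):
    # Direct transcription of the formula above
    return lr0 * min(1 / math.sqrt(t), math.sqrt(t / warmup ** 3))

lrs = [inverse_sqrt_lr(t, warmup=4000, lr0=0.1) for t in range(1, 10000)]
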
Utility functions
dynn.util.conditional_dropout(x, dropout_rate, flag)

This helper function applies dropout only if the flag is set to True and the dropout_rate is positive.

Parameters:
  • x (dynet.Expression) – Input expression
  • dropout_rate (float) – Dropout rate
  • flag (bool) – Setting this to false ensures that dropout is never applied (for testing for example)
dynn.util.image_to_matrix(M)

Transforms an ‘image’ with one channel (d1, d2, 1) into a matrix (d1, d2)

dynn.util.list_to_matrix(l)

Transforms a list of N vectors of dimension d into a (N, d) matrix

dynn.util.mask_batches(x, mask, value=0.0)

Apply a mask to the batch dimension

Parameters:
dynn.util.matrix_to_image(M)

Transforms a matrix (d1, d2) into an ‘image’ with one channel (d1, d2, 1)

dynn.util.num_params(pc, params=True, lookup_params=True)

Number of parameters in a given ParameterCollection

dynn.util.sin_embeddings(length, dim, transposed=False)

Returns sinusoidal position encodings.

As described in Vaswani et al. (2017)

Specifically this returns a length x dim matrix \(PE\) such that \(PE[p, 2i]=\sin\left(\frac{p}{10000^{2i/\texttt{dim}}}\right)\) and \(PE[p, 2i+1]=\cos\left(\frac{p}{10000^{2i/\texttt{dim}}}\right)\)

Parameters:
  • length (int) – Length
  • dim (int) – Dimension of the embeddings
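
For reference, the formula can be reproduced with numpy as follows (an illustration only; dynn.util.sin_embeddings computes this for you):

import numpy as np

length, dim = 5, 8
pos = np.arange(length)[:, None]          # positions p
two_i = np.arange(0, dim, 2)[None, :]     # even indices 2i
angles = pos / (10000 ** (two_i / dim))
PE = np.zeros((length, dim))
PE[:, 0::2] = np.sin(angles)              # PE[p, 2i]
PE[:, 1::2] = np.cos(angles)              # PE[p, 2i+1]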
