dynn.layers package

Layers

Layers are the standard unit of neural models in DyNN. Layers are typically used like this:

# Instantiate layer
layer = Layer(parameter_collection, *args, **kwargs)
# [...]
# Renew computation graph
dy.renew_cg()
# Initialize layer
layer.init(*args, **kwargs)
# Apply layer forward pass
y = layer(x)
class dynn.layers.BaseLayer(name)

Bases: object

Base layer interface

__call__(*args, **kwargs)

Execute forward pass

__init__(name)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

init(test=True, update=False)

Initialize the layer before performing computation

For example setup dropout, freeze some parameters, etc…

init_layer(test=True, update=False)

Initializes only this layer's parameters (not recursive). This needs to be implemented for each layer.

sublayers

Returns all attributes of the layer which are layers themselves

class dynn.layers.ParametrizedLayer(pc, name)

Bases: dynn.layers.base_layers.BaseLayer

This is the base class for layers with trainable parameters

When implementing a ParametrizedLayer, use self.add_parameters / self.add_lookup_parameters to add parameters to the layer.

__init__(pc, name)

Creates a subcollection for this layer with a custom name

add_lookup_parameters(name, dim, lookup_param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)

This adds a lookup parameter to this layer's ParameterCollection.

The layer will have 1 new attribute: self.[name] which will contain the lookup parameter object (which you should use in __call__).

You can provide an existing lookup parameter with the lookup_param argument, in which case this parameter will be reused.

The other arguments are the same as dynet.ParameterCollection.add_lookup_parameters

add_parameters(name, dim, param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)

This adds a parameter to this layer’s ParameterCollection.

The layer will have 1 new attribute: self.[name] which will contain the expression for this parameter (which you should use in __call__).

You can provide an existing parameter with the param argument, in which case this parameter will be reused.

The other arguments are the same as dynet.ParameterCollection.add_parameters

init_layer(test=True, update=False)

Initializes only this layer's parameters (not recursive). This needs to be implemented for each layer.

lookup_parameters

Return all lookup parameters specific to this layer

parameters

Return all parameters specific to this layer

class dynn.layers.Lambda(function)

Bases: dynn.layers.base_layers.BaseLayer

This layer applies an arbitrary function to its input.

Lambda(f)(x) == f(x)

This is useful if you want to wrap activation functions as layers. The unary operation should be a function taking dynet.Expression to dynet.Expression.

You shouldn't use this to stack layers though (the wrapped function shouldn't itself be a layer). If you want to stack layers, use combination_layers.Sequential.
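
Example (a minimal sketch, assuming dynet is imported as dy):

# Wrap an activation function as a layer
relu = Lambda(dy.rectify)
# Initialize and apply like any other layer
dy.renew_cg()
relu.init()
y = relu(dy.inputVector([-1.0, 2.0, -3.0]))  # same as dy.rectify(...)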

Parameters:function (function) – The unary function to apply to the input
__call__(*args, **kwargs)

Returns function(*args, **kwargs)

__init__(function)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.Affine(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Densely connected layer

\(y=f(Wx+b)\)
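
Example (a minimal sketch, assuming dynet is imported as dy):

# Parameter collection
pc = dy.ParameterCollection()
# 20 -> 10 affine layer
affine = Affine(pc, 20, 10)
# Initialize
dy.renew_cg()
affine.init(test=False)
# Forward pass on a 20-dimensional input
y = affine(dy.random_uniform(20, -1, 1))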

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • output_dim (int) – Output dimension
  • activation (function, optional) – activation function (default: identity)
  • dropout (float, optional) – Dropout rate (default 0)
  • nobias (bool, optional) – Omit the bias (default False)
__call__(x)

Forward pass.

Parameters:x (dynet.Expression) – Input expression (a vector)
Returns:\(y=f(Wx+b)\)
Return type:dynet.Expression
__init__(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.Embeddings(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Layer for embedding elements of a dictionary

Example:

# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols=["a", "b"])
# Parameter collection
pc = dy.ParameterCollection()
# Embedding layer of dimension 10
embed = Embeddings(pc, dic, 10)
# Initialize
dy.renew_cg()
embed.init()
# Return a batch of 2 10-dimensional vectors
vectors = embed([dic.index("b"), dic.index("a")])
Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • dictionary (dynn.data.dictionary.Dictionary) – Mapping from symbols to indices
  • embed_dim (int) – Embedding dimension
  • init (dynet.PyInitializer, optional) – How to initialize the parameters. By default this will initialize to \(\mathcal N(0, \frac{1}{\sqrt{\texttt{embed\_dim}}})\)
  • pad_mask (float, optional) – If provided, embeddings of the dictionary.pad_idx index will be masked with this value
__call__(idxs, length_dim=0)

Returns the input’s embedding

If idxs is a list this returns a batch of embeddings. If it’s a numpy array of shape N x b it returns a batch of b N x embed_dim matrices

Parameters:idxs (list,int) – Index or list of indices to embed
Returns:Batch of embeddings
Return type:dynet.Expression
__init__(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)

Creates a subcollection for this layer with a custom name

weights

Numpy array containing the embeddings

The first dimension is the lookup dimension

class dynn.layers.Residual(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)

Bases: dynn.layers.base_layers.BaseLayer

Adds residual connections to a layer
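
Example (a minimal sketch; assumes the default shortcut_transform=None acts as the identity shortcut):

# Parameter collection
pc = dy.ParameterCollection()
# Residual connection around a 10 -> 10 affine layer
residual = Residual(Affine(pc, 10, 10))
# Initialize
dy.renew_cg()
residual.init(test=False)
# Output of the affine layer plus the shortcut
y = residual(dy.random_uniform(10, -1, 1))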

__call__(*args, **kwargs)

Execute forward pass

__init__(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.RecurrentCell(*args, **kwargs)

Bases: object

Base recurrent cell interface

Recurrent cells must provide a default initial value for their recurrent state (eg. all zeros)

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

get_output(state)

Get the cell’s output from the list of states.

For example this would return h from h,c in the case of the LSTM

initial_value(batch_size=1)

Initial value of the recurrent state. Should return a list.

class dynn.layers.StackedRecurrentCells(*cells)

Bases: dynn.layers.base_layers.BaseLayer, dynn.layers.recurrent_layers.RecurrentCell

This implements a stack of recurrent layers

The recurrent state of the resulting cell is the list of the states of all the sub-cells. For example for a stack of 2 LSTM cells the resulting state will be [h_1, c_1, h_2, c_2]

Example:

# Parameter collection
pc = dy.ParameterCollection()
# Stacked recurrent cell
stacked_cell = StackedRecurrentCells(
    LSTM(pc, 10, 15),
    LSTM(pc, 15, 5),
    ElmanRNN(pc, 5, 20),
)
# Inputs
dy.renew_cg()
x = dy.random_uniform(10, batch_size=5)
# Initialize layer
stacked_cell.init(test=False)
# Initial state: [h_1, c_1, h_2, c_2, h_3] of sizes [15, 15, 5, 5, 20]
init_state = stacked_cell.initial_value()
# Run the cell on the input.
new_state = stacked_cell(x, *init_state)
# Get the final output (h_3 of size 20)
h = stacked_cell.get_output(new_state)
__call__(x, *state)

Compute the cell’s output from the list of states and an input expression

Parameters:x (dynet.Expression) – Input vector
Returns:new recurrent state
Return type:list
__init__(*cells)

Initialize self. See help(type(self)) for accurate signature.

get_output(state)

Get the output of the last cell

initial_value(batch_size=1)

Initial value of the recurrent state.

class dynn.layers.ElmanRNN(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer, dynn.layers.recurrent_layers.RecurrentCell

The standard Elman RNN cell:

\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • output_dim (int) – Output (hidden) dimension
  • activation (function, optional) – Activation function \(\sigma\) (default: dynn.activations.tanh())
  • dropout (float, optional) – Dropout rate (default 0)
__call__(x, h)

Perform the recurrent update.

Parameters:
  • x (dynet.Expression) – Input vector
  • h (dynet.Expression) – Previous recurrent state
Returns:Next recurrent state \(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)
Return type:dynet.Expression

__init__(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)

Creates a subcollection for this layer with a custom name

get_output(state)

Get the cell’s output from the list of states.

For example this would return h from h,c in the case of the LSTM

init_layer(test=True, update=False)

Initializes only this layer's parameters (not recursive). This needs to be implemented for each layer.

initial_value(batch_size=1)

Return a vector of dimension hidden_dim filled with zeros

Returns:Zero vector
Return type:dynet.Expression
class dynn.layers.LSTM(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer, dynn.layers.recurrent_layers.RecurrentCell

Standard LSTM

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • output_dim (int) – Output (hidden) dimension
  • dropout_x (float, optional) – Input dropout rate (default 0)
  • dropout_h (float, optional) – Recurrent dropout rate (default 0)
__call__(x, h, c)

Perform the recurrent update.

Parameters:
  • x (dynet.Expression) – Input vector
  • h (dynet.Expression) – Previous output \(h_{t-1}\)
  • c (dynet.Expression) – Previous cell state \(c_{t-1}\)
Returns:The next recurrent states h and c
Return type:tuple

__init__(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)

Creates a subcollection for this layer with a custom name

get_output(state)

Get the cell’s output from the list of states.

For example this would return h from h,c in the case of the LSTM

init_layer(test=True, update=False)

Initializes only this layer's parameters (not recursive). This needs to be implemented for each layer.

initial_value(batch_size=1)

Return two vectors of dimension hidden_dim filled with zeros

Returns:two zero vectors for \(h_0\) and \(c_0\)
Return type:tuple
class dynn.layers.StackedLSTM(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)

Bases: dynn.layers.recurrent_layers.StackedRecurrentCells

Stacked LSTMs

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • num_layers (int) – Number of layers
  • input_dim (int) – Input dimension
  • output_dim (int) – Output (hidden) dimension
  • dropout_x (float, optional) – Input dropout rate (default 0)
  • dropout_h (float, optional) – Recurrent dropout rate (default 0)
__init__(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.Transduction(layer)

Bases: dynn.layers.base_layers.BaseLayer

Feed forward transduction layer

This layer applies another layer to each element of an input sequence and returns the list of outputs. Calling it is equivalent to calling:

[layer(x) for x in input_sequence]
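
Example (a minimal sketch, assuming dynet is imported as dy):

# Parameter collection
pc = dy.ParameterCollection()
# Affine layer applied at each position
transduce = Transduction(Affine(pc, 10, 5))
# Initialize
dy.renew_cg()
transduce.init(test=False)
# Apply the affine layer to each element of the sequence
xs = [dy.random_uniform(10, -1, 1) for _ in range(8)]
ys = transduce(xs)  # list of 8 outputs of dimension 5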
Parameters:layer (base_layers.BaseLayer) – The layer to apply to each element of the input sequence
__call__(input_sequence)

Runs the layer over the input

The output is a list of the output of the layer at each step

Parameters:input_sequence (list) – Input as a list of dynet.Expression objects
Returns:List of the layer's outputs at each step
Return type:list
__init__(layer)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.Unidirectional(cell, output_only=False)

Bases: dynn.layers.base_layers.BaseLayer

Unidirectional transduction layer

This layer runs a recurrent cell on a sequence of inputs and produces the resulting sequence of recurrent states.

Example:

# LSTM cell
lstm_cell = dynn.layers.LSTM(dy.ParameterCollection(), 10, 10)
# Transduction layer
lstm = dynn.layers.Unidirectional(lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, batch_size=5) for _ in range(20)]
# Initialize layer
lstm.init(test=False)
# Transduce forward
states = lstm(xs)
# Retrieve last h
h_final = states[-1][0]
Parameters:
  • cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for transduction
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.
__call__(input_sequence, backward=False, lengths=None, left_padded=True, output_only=None, initial_state=None)

Transduces the sequence using the recurrent cell.

The output is a list of the output states at each step. For instance in an LSTM the output is (h1, c1), (h2, c2), ...

This assumes that all the input expressions have the same batch size. If you batch sequences of different lengths together, you should pad them to the length of the longest sequence.

Parameters:
  • input_sequence (list) – Input as a list of dynet.Expression objects
  • backward (bool, optional) – If this is True the sequence will be processed from right to left. The output sequence will still be in the same order as the input sequence though.
  • lengths (list, optional) – If the expressions in the sequence are batched, but have different lengths, this should contain a list of the sequence lengths (default: None)
  • left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of the longest sequence. Use this to specify whether the sequence is left or right padded.
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
  • initial_state (dy.Expression, optional) – Overrides the default initial state of the recurrent cell
Returns:List of recurrent states (depends on the recurrent layer)
Return type:list

__init__(cell, output_only=False)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.Bidirectional(forward_cell, backward_cell, output_only=False)

Bases: dynn.layers.base_layers.BaseLayer

Bidirectional transduction layer

This layer runs a recurrent cell in each direction on a sequence of inputs and produces the resulting sequences of recurrent states.

Example:

# Parameter collection
pc = dy.ParameterCollection()
# LSTM cell
fwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
bwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
# Transduction layer
bilstm = dynn.layers.Bidirectional(fwd_lstm_cell, bwd_lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, batch_size=5) for _ in range(20)]
# Initialize layer
bilstm.init(test=False)
# Transduce forward
fwd_states, bwd_states = bilstm(xs)
# Retrieve last h
fwd_h_final = fwd_states[-1][0]
# For the backward LSTM the final state is at
# the beginning of the sequence (assuming left padding)
bwd_h_final = bwd_states[0][0]
Parameters:
  • forward_cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for the forward pass
  • backward_cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for the backward pass
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.
__call__(input_sequence, lengths=None, left_padded=True, output_only=None, fwd_initial_state=None, bwd_initial_state=None)

Transduces the sequence in both directions

The output is a tuple forward_states, backward_states where forward_states is a list of the output states of the forward recurrent cell at each step (and backward_states for the backward cell). For instance in a BiLSTM the output is [(fwd_h1, fwd_c1), ...], [(bwd_h1, bwd_c1), ...]

This assumes that all the input expressions have the same batch size. If you batch sequences of different lengths together, you should pad them to the length of the longest sequence.

Parameters:
  • input_sequence (list) – Input as a list of dynet.Expression objects
  • lengths (list, optional) – If the expressions in the sequence are batched, but have different lengths, this should contain a list of the sequence lengths (default: None)
  • left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of the longest sequence. Use this to specify whether the sequence is left or right padded.
  • output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
  • fwd_initial_state (dy.Expression, optional) – Overrides the default initial state of the forward recurrent cell.
  • bwd_initial_state (dy.Expression, optional) – Overrides the default initial state of the backward recurrent cell.
Returns:List of forward and backward recurrent states (depends on the recurrent layer)
Return type:tuple

__init__(forward_cell, backward_cell, output_only=False)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.MaxPool1D(kernel_size=None, stride=1)

Bases: dynn.layers.base_layers.BaseLayer

1D max pooling
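
Example (a minimal sketch, assuming dynet is imported as dy):

# Pooling layer (pools over the full sequence by default)
max_pool = MaxPool1D()
# Initialize
dy.renew_cg()
max_pool.init()
# Pool a sequence of 6 vectors of dimension 4
xs = [dy.random_uniform(4, -1, 1) for _ in range(6)]
pooled = max_pool(xs)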

Parameters:
  • kernel_size (int, optional) – Default kernel size. If this is not specified, the default is to pool over the full sequence (default: None)
  • stride (int, optional) – Default temporal stride (default: 1)
__call__(x, kernel_size=None, stride=None)

Max pooling over the first dimension.

This takes either a list of N d-dimensional vectors or a N x d matrix.

The output will be a matrix of dimension (N - kernel_size + 1) // stride x d

Parameters:
  • x (dynet.Expression) – Input matrix or list of vectors
  • dim (int, optional) – The reduction dimension (default: 0)
  • kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
  • stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns:Pooled sequence.
Return type:dynet.Expression

__init__(kernel_size=None, stride=1)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.MaxPool2D(kernel_size=None, strides=None)

Bases: dynn.layers.base_layers.BaseLayer

2D max pooling.

Parameters:
  • kernel_size (list, optional) – Default kernel size. This is a list of two elements, one per dimension. If either is not specified, the default is to pool over the entire dimension (default: [None, None])
  • strides (list, optional) – Stride along each dimension (list of size 2, defaults to [1, 1]).
__call__(x, kernel_size=None, strides=None)

Max pooling over the first two dimensions (width and height).

If either of the kernel_size elements is not specified, the pooling will be done over the full dimension (and the stride is ignored)

Parameters:
  • x (dynet.Expression) – Input image (3-d tensor) or matrix.
  • kernel_size (list, optional) – Size of the pooling kernel. If this is not specified, the default specified in the constructor is used.
  • strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns:Pooled sequence.
Return type:dynet.Expression

__init__(kernel_size=None, strides=None)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.MeanPool1D(kernel_size=None, stride=1)

Bases: dynn.layers.base_layers.BaseLayer

1D mean pooling.

The stride and kernel size arguments are here for consistency with MaxPool1D but they are unsupported for now.

Parameters:
  • kernel_size (int, optional) – Default kernel size. If this is not specified, the default is to pool over the full sequence (default: None)
  • stride (int, optional) – Default temporal stride (default: 1)
__call__(x, kernel_size=None, stride=None, lengths=None)

Mean pooling over the first dimension.

This takes either a list of N d-dimensional vectors or a N x d matrix.

The output will be a matrix of dimension (N - kernel_size + 1) // stride x d

Parameters:
  • x (dynet.Expression) – Input matrix or list of vectors
  • dim (int, optional) – The reduction dimension (default: 0)
  • kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
  • stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns:Pooled sequence.
Return type:dynet.Expression

__init__(kernel_size=None, stride=1)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.MLPAttention(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multilayer Perceptron based attention
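
Example (a minimal sketch; the query is a single vector and the keys/values have one column per position, following the shapes given in __call__ below):

# Parameter collection
pc = dy.ParameterCollection()
# MLP attention layer
attention = MLPAttention(pc, query_dim=10, key_dim=12, hidden_dim=20)
# Initialize
dy.renew_cg()
attention.init(test=False)
# One query, 8 key/value positions
query = dy.random_uniform(10, -1, 1)
keys = dy.random_uniform((12, 8), -1, 1)
values = dy.random_uniform((15, 8), -1, 1)
pooled_value, scores = attention(query, keys, values)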

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • hidden_dim (int) – Hidden dimension of the MLP
  • activation (function, optional) – MLP activation (defaults to tanh).
  • dropout (float, optional) – Attention dropout (defaults to 0)
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value

This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).

Parameters:
Returns:pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type:tuple

__init__(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.BilinearAttention(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Bilinear attention layer.

Here the scores are computed according to

\[\alpha_{ij}=q_i^\intercal A k_j\]

Where \(q_i,k_j\) are the ith query and jth key respectively. If dot_product is set to True this is replaced by:

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

Where \(d\) is the dimension of the keys and queries.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • dot_product (bool, optional) – Compute attention with the dot product only (no weight matrix). This requires that query_dim == key_dim.
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • A (dynet.Parameters, optional) – Specify the weight matrix directly.
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value.

This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).

Parameters:
Returns:pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type:tuple

__init__(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.MultiHeadAttention(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multi headed attention layer.

This functions like dot product attention

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

Except the key, query and values are split into multiple heads.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_heads (int) – Number of heads
  • query_dim (int) – Dimension of queries
  • key_dim (int) – Dimension of keys
  • value_dim (int) – Dimension of values
  • hidden_dim (int) – Hidden dimension (must be a multiple of n_heads)
  • out_dim (int) – Output dimension
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • Wq (dynet.Parameters, optional) – Specify the queries projection matrix directly.
  • Wk (dynet.Parameters, optional) – Specify the keys projection matrix directly.
  • Wv (dynet.Parameters, optional) – Specify the values projection matrix directly.
  • Wo (dynet.Parameters, optional) – Specify the output projection matrix directly.
__call__(queries, keys, values, mask=None)

Compute attention weights and return the pooled value.

This expects the queries, keys and values to have dimensions dq x l, dk x L, dv x L respectively.

Returns both the pooled value and the attention weights (list of weights, one per head). You can specify an additive mask when some values are not to be attended to (eg padding).

Parameters:
Returns:pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type:tuple

__init__(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.Conv1D(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

1D convolution along the first dimension
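
Example (a minimal sketch, assuming dynet is imported as dy):

# Parameter collection
pc = dy.ParameterCollection()
# 16 kernels of width 3 over 10-dimensional inputs
conv = Conv1D(pc, input_dim=10, num_kernels=16, kernel_width=3)
# Initialize
dy.renew_cg()
conv.init(test=False)
# Input sequence of length 20, shape (length, input_dim)
x = dy.random_uniform((20, 10), -1, 1)
y = conv(x)  # zero-padded by default, so the output length is still 20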

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • num_kernels (int) – Number of kernels (essentially the output dimension)
  • kernel_width (int) – Width of the kernels
  • activation (function, optional) – activation function (default: identity)
  • dropout_rate (float, optional) – Dropout rate (default 0)
  • nobias (bool, optional) – Omit the bias (default False)
  • zero_padded (bool, optional) – Default padding behaviour. Pad the input with zeros so that the output has the same length (default True)
  • stride (list, optional) – Default stride along the length (defaults to 1).
__call__(x, stride=None, zero_padded=None)

Forward pass

Parameters:
  • x (dynet.Expression) – Input expression with the shape (length, input_dim)
  • stride (int, optional) – Stride along the temporal dimension
  • zero_padded (bool, optional) – Pad the input with zeros so that the output has the same length (default True)
Returns:Convolved sequence.
Return type:dynet.Expression

__init__(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.Conv2D(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

2D convolution

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • num_channels (int) – Number of channels in the input image
  • num_kernels (int) – Number of kernels (essentially the output dimension)
  • kernel_size (list, optional) – Default kernel size. This is a list of two elements, one per dimension.
  • activation (function, optional) – activation function (default: identity)
  • dropout_rate (float, optional) – Dropout rate (default 0)
  • nobias (bool, optional) – Omit the bias (default False)
  • zero_padded (bool, optional) – Default padding behaviour. Pad the image with zeros so that the output has the same width/height (default True)
  • strides (list, optional) – Default stride along each dimension (list of size 2, defaults to [1, 1]).
__call__(x, strides=None, zero_padded=None)

Forward pass

Parameters:
  • x (dynet.Expression) – Input image (3-d tensor) or matrix.
  • zero_padded (bool, optional) – Pad the image with zeros so that the output has the same width/height. If this is not specified, the default specified in the constructor is used.
  • strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns:Convolved image.
Return type:dynet.Expression

__init__(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.Flatten

Bases: dynn.layers.base_layers.BaseLayer

Flattens the output such that there is only one dimension left (batch dimension notwithstanding)

Example:

# Create the layer
flatten = Flatten()
# Dummy batched 2d input
x = dy.zeros((3, 4), batch_size=7)
# x.dim() -> (3, 4), 7
y = flatten(x)
# y.dim() -> (12,), 7
__call__(x)

Flattens the output such that there is only one dimension left (batch dimension notwithstanding)

Parameters:x (dynet.Expression) – Input expression
Returns:Flattened expression
Return type:dynet.Expression
__init__()

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.LayerNorm(pc, input_dim, gain=None, bias=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Layer normalization layer:

\(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x))+b\)
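
Example (a minimal sketch, assuming dynet is imported as dy):

# Parameter collection
pc = dy.ParameterCollection()
# Layer normalization over 10-dimensional inputs
layer_norm = LayerNorm(pc, 10)
# Initialize
dy.renew_cg()
layer_norm.init(test=False)
# Normalize an input vector
y = layer_norm(dy.random_uniform(10, -1, 1))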

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Input dimension
  • gain (dynet.Parameters, optional) – Specify the gain parameter \(g\) directly.
  • bias (dynet.Parameters, optional) – Specify the bias parameter \(b\) directly.
__call__(x, d=None)

Layer-normalize the input.

Parameters:x (dynet.Expression) – Input expression
Returns:\(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x))+b\)
Return type:dynet.Expression
__init__(pc, input_dim, gain=None, bias=None)

Creates a subcollection for this layer with a custom name

class dynn.layers.Sequential(*layers, default_return_last_only=True)

Bases: dynn.layers.base_layers.BaseLayer

A helper class to stack layers into deep networks.
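
Example (a minimal sketch; assumes dynn.activations.tanh for the hidden activation):

# Parameter collection
pc = dy.ParameterCollection()
# 2-layer MLP: 20 -> 10 -> 2
mlp = Sequential(
    Affine(pc, 20, 10, activation=dynn.activations.tanh),
    Affine(pc, 10, 2),
)
# Initialize
dy.renew_cg()
mlp.init(test=False)
# Forward pass through both layers
y = mlp(dy.random_uniform(20, -1, 1))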

Parameters:
  • layers (list) – A list of dynn.layers.BaseLayer objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. the output of each layer can be fed into the next one)
  • default_return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
__call__(x, return_last_only=None)

Calls all the layers in succession.

Computes layers[n-1](layers[n-2](...layers[0](x)))

Parameters:
  • x (dynet.Expression) – Input expression
  • return_last_only (bool, optional) – Overrides the default
Returns:Depending on return_last_only, returns either the last expression or a list of all the layers' outputs (first to last)
Return type:dynet.Expression, list

__init__(*layers, default_return_last_only=True)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.Parallel(*layers, dim=0, default_insert_dim=False)

Bases: dynn.layers.base_layers.BaseLayer

A helper class to run layers on the same input and concatenate their outputs

This can be used to create 2d conv layers with multiple kernel sizes by concatenating multiple dynn.layers.Conv2D layers.
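
Example (a minimal sketch using two affine layers instead of convolutions):

# Parameter collection
pc = dy.ParameterCollection()
# Two layers on the same 10-dimensional input,
# outputs concatenated along dimension 0
parallel = Parallel(
    Affine(pc, 10, 4),
    Affine(pc, 10, 6),
)
# Initialize
dy.renew_cg()
parallel.init(test=False)
# 4 + 6 = 10-dimensional concatenated output
y = parallel(dy.random_uniform(10, -1, 1))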

Parameters:
  • layers (list) – A list of dynn.layers.BaseLayer objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. each layer takes the same input and the outputs have the same shape everywhere except along the concatenation dimension)
  • dim (int) – The concatenation dimension
  • default_insert_dim (bool, optional) – Instead of concatenating along an existing dimension, insert a new dimension at dim and concatenate.
__call__(x, insert_dim=None, **kwargs)

Calls all the layers on the same input and concatenates their outputs.

Computes dy.concatenate([layers[0](x)...layers[n-1](x)], d=dim)

Parameters:
  • x (dynet.Expression) – Input expression
  • insert_dim (bool, optional) – Overrides the default given in the constructor
Returns:The concatenation of the outputs of all the layers (along dimension dim)
Return type:dynet.Expression

__init__(*layers, dim=0, default_insert_dim=False)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.Transformer(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Transformer layer.

As described in Vaswani et al. (2017). This is the “encoder” side of the transformer, i.e. self attention only.
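
Example (a minimal sketch; hidden_dim is assumed to be the hidden dimension of the position-wise MLP):

# Parameter collection
pc = dy.ParameterCollection()
# Transformer encoder layer with 4 heads
transformer = Transformer(pc, input_dim=16, hidden_dim=32, n_heads=4)
# Initialize
dy.renew_cg()
transformer.init(test=False)
# Input of length 6 (dimensions input_dim x L)
x = dy.random_uniform((16, 6), -1, 1)
h = transformer(x)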

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Hidden dimension (used everywhere)
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
  • return_att (bool, optional) – Defaults to False. Return the self attention weights
Returns:The output expression (+ the attention weights if return_att is True)
Return type:tuple, dynet.Expression

__init__(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name

class dynn.layers.StackedTransformers(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer transformer.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_layers (int) – Number of layers
  • input_dim (int) – Hidden dimension (used everywhere)
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
  • return_att (bool, optional) – Defaults to False. Return the self attention weights
  • return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns:The output expression (+ the attention weights if return_att is True)
Return type:tuple, dynet.Expression

__init__(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.CondTransformer(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Conditional transformer layer.

As described in Vaswani et al. (2017). This is the “decoder” side of the transformer, i.e. self attention + attention to context.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Hidden dimension (used everywhere)
  • cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
  • n_heads (int) – Number of heads for attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:The output expression (+ the attention weights if return_att is True)
Return type:tuple, dynet.Expression

__init__(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Runs the transformer for one step. Useful for decoding.

The “state” of the transformer is the list of the L-1 previous inputs, and its output is the L-th output. This returns a tuple of both the new state (the L-1 previous inputs concatenated with the L-th input) and the L-th output.

Parameters:
  • x (dynet.Expression) – Input (dimension input_dim)
  • state (dynet.Expression, optional) – Previous “state” (dimensions input_dim x (L-1))
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to length_c, you can pass a mask expression directly (useful to reuse masks accross layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:The new state and the L-th output (+ the attention weights if return_att is True)
Return type:tuple

class dynn.layers.StackedCondTransformers(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer conditional transformer.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_layers (int) – Number of layers
  • input_dim (int) – Hidden dimension (used everywhere)
  • cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • c (list) – list of contexts (one per layer, each of dim cond_dim x L). If this is not a list (but an expression), the same context will be used for each layer.
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns:The output expression (+ the attention weights if return_att is True)
Return type:tuple, dynet.Expression

__init__(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Runs the transformer for one step. Useful for decoding.

The “state” of the multilayer transformer is a list of n_layers states of size L-1 (one per layer), and its output is the output of the last layer. This returns a tuple of both the new state (a list of n_layers states of size L) and the L-th output.

Parameters:
  • x (dynet.Expression) – Input (dimension input_dim)
  • state (dynet.Expression) – Previous “state” (list of n_layers expressions of dimensions input_dim x (L-1))
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to length_c, you can pass a mask expression directly (useful to reuse masks accross layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:The new state and the output expression (+ the attention weights if return_att is True)
Return type:tuple