dynn.layers package
Layers
Layers are the standard unit of neural models in DyNN. Layers are typically used like this:
# Instantiate layer
layer = Layer(parameter_collection, *args, **kwargs)
# [...]
# Renew computation graph
dy.renew_cg()
# Initialize layer
layer.init(*args, **kwargs)
# Apply layer forward pass
y = layer(x)
class dynn.layers.BaseLayer(name)
Bases: object

Base layer interface

__call__(*args, **kwargs)
Execute the forward pass

__init__(name)
Initialize self. See help(type(self)) for accurate signature.

__weakref__
List of weak references to the object (if defined)

init(test=True, update=False)
Initialize the layer before performing computation, for example to set up dropout or freeze some parameters.

init_layer(test=True, update=False)
Initializes only this layer's parameters (not recursive). This needs to be implemented for each layer.

sublayers
Returns all attributes of the layer which are layers themselves
class dynn.layers.ParametrizedLayer(pc, name)
Bases: dynn.layers.base_layers.BaseLayer

This is the base class for layers with trainable parameters. When implementing a ParametrizedLayer, use self.add_parameters / self.add_lookup_parameters to add parameters to the layer.

__init__(pc, name)
Creates a subcollection for this layer with a custom name

add_lookup_parameters(name, dim, lookup_param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)
Adds a lookup parameter to this layer's ParameterCollection. The layer will have one new attribute, self.[name], which will contain the lookup parameter object (which you should use in __call__). You can provide an existing lookup parameter with the lookup_param argument, in which case this parameter will be reused. The other arguments are the same as dynet.ParameterCollection.add_lookup_parameters.

add_parameters(name, dim, param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)
Adds a parameter to this layer's ParameterCollection. The layer will have one new attribute, self.[name], which will contain the expression for this parameter (which you should use in __call__). You can provide an existing parameter with the param argument, in which case this parameter will be reused. The other arguments are the same as dynet.ParameterCollection.add_parameters.

init_layer(test=True, update=False)
Initializes only this layer's parameters (not recursive). This needs to be implemented for each layer.

lookup_parameters
Return all lookup parameters specific to this layer

parameters
Return all parameters specific to this layer
class dynn.layers.Lambda(function)
Bases: dynn.layers.base_layers.BaseLayer

This layer applies an arbitrary function to its input: Lambda(f)(x) == f(x). This is useful if you want to wrap activation functions as layers. The function should map dynet.Expression to dynet.Expression. You shouldn't use this to stack layers though: the function ought not be a layer itself. If you want to stack layers, use combination_layers.Sequential.

Parameters: function (function) – A unary operation on dynet.Expression objects

__call__(*args, **kwargs)
Returns function(*args, **kwargs)

__init__(function)
Initialize self. See help(type(self)) for accurate signature.
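For instance, wrapping an activation function as a layer (a minimal sketch):

import dynet as dy
import dynn

# Wrap the tanh activation as a layer
tanh_layer = dynn.layers.Lambda(dy.tanh)

dy.renew_cg()
tanh_layer.init()
x = dy.inputVector([1.0, -2.0, 3.0])
y = tanh_layer(x)  # equivalent to dy.tanh(x)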
class dynn.layers.Affine(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)
Bases: dynn.layers.base_layers.ParametrizedLayer

Densely connected layer

\(y=f(Wx+b)\)

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Input dimension
- output_dim (int) – Output dimension
- activation (function, optional) – Activation function (default: identity)
- dropout (float, optional) – Dropout rate (default 0)
- nobias (bool, optional) – Omit the bias (default False)

__call__(x)
Forward pass.
Parameters: x (dynet.Expression) – Input expression (a vector)
Returns: \(y=f(Wx+b)\)
Return type: dynet.Expression

__init__(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)
Creates a subcollection for this layer with a custom name
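For example, a 10 to 5 dense layer with tanh activation (a minimal sketch):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
dense = dynn.layers.Affine(pc, 10, 5, activation=dy.tanh, dropout=0.1)

dy.renew_cg()
dense.init(test=False, update=True)
x = dy.random_uniform(10, -1, 1, batch_size=4)
y = dense(x)  # y.dim() -> (5,), 4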
class dynn.layers.Embeddings(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)
Bases: dynn.layers.base_layers.ParametrizedLayer

Layer for embedding elements of a dictionary

Example:

# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols=["a", "b"])
# Parameter collection
pc = dy.ParameterCollection()
# Embedding layer of dimension 10
embed = Embeddings(pc, dic, 10)
# Initialize
dy.renew_cg()
embed.init()
# Return a batch of 2 10-dimensional vectors
vectors = embed([dic.index("b"), dic.index("a")])

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- dictionary (dynn.data.dictionary.Dictionary) – Mapping from symbols to indices
- embed_dim (int) – Embedding dimension
- init (dynet.PyInitializer, optional) – How to initialize the parameters. By default this will initialize to \(\mathcal N(0, \frac{1}{\sqrt{\texttt{embed\_dim}}})\)
- pad_mask (float, optional) – If provided, embeddings of the dictionary.pad_idx index will be masked with this value

__call__(idxs, length_dim=0)
Returns the input's embedding.
If idxs is a list this returns a batch of embeddings. If it's a numpy array of shape N x b it returns a batch of b N x embed_dim matrices.
Parameters: idxs (list, int) – Index or list of indices to embed
Returns: Batch of embeddings
Return type: dynet.Expression

__init__(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)
Creates a subcollection for this layer with a custom name

weights
Numpy array containing the embeddings. The first dimension is the lookup dimension.
class dynn.layers.Residual(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)
Bases: dynn.layers.base_layers.BaseLayer

Adds residual connections to a layer

__call__(*args, **kwargs)
Execute the forward pass

__init__(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)
Initialize self. See help(type(self)) for accurate signature.
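A usage sketch (assuming, as the layer_weight/shortcut_weight arguments suggest, that the output combines the wrapped layer's output with its input; the exact combination is not spelled out above):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
# Residual connection around a 10 -> 10 dense layer
res_dense = dynn.layers.Residual(dynn.layers.Affine(pc, 10, 10, activation=dy.tanh))

dy.renew_cg()
res_dense.init(test=False)
x = dy.random_uniform(10, -1, 1)
y = res_dense(x)  # layer_weight * dense(x) + shortcut_weight * x (assumed)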
class dynn.layers.RecurrentCell(*args, **kwargs)
Bases: object

Base recurrent cell interface

Recurrent cells must provide a default initial value for their recurrent state (e.g. all zeros).

__init__(*args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.

__weakref__
List of weak references to the object (if defined)

get_output(state)
Get the cell's output from the list of states. For example this would return h from (h, c) in the case of the LSTM.

initial_value(batch_size=1)
Initial value of the recurrent state. Should return a list.
class dynn.layers.StackedRecurrentCells(*cells)
Bases: dynn.layers.base_layers.BaseLayer, dynn.layers.recurrent_layers.RecurrentCell

This implements a stack of recurrent layers

The recurrent state of the resulting cell is the list of the states of all the sub-cells. For example, for a stack of 2 LSTM cells the resulting state will be [h_1, c_1, h_2, c_2]

Example:

# Parameter collection
pc = dy.ParameterCollection()
# Stacked recurrent cell
stacked_cell = StackedRecurrentCells(
    LSTM(pc, 10, 15),
    LSTM(pc, 15, 5),
    ElmanRNN(pc, 5, 20),
)
# Inputs
dy.renew_cg()
x = dy.random_uniform(10, -1, 1, batch_size=5)
# Initialize layer
stacked_cell.init(test=False)
# Initial state: [h_1, c_1, h_2, c_2, h_3] of sizes [15, 15, 5, 5, 20]
init_state = stacked_cell.initial_value()
# Run the cell on the input
new_state = stacked_cell(x, *init_state)
# Get the final output (h_3 of size 20)
h = stacked_cell.get_output(new_state)

__call__(x, *state)
Compute the cell's output from the list of states and an input expression.
Parameters: x (dynet.Expression) – Input vector
Returns: new recurrent state
Return type: list

__init__(*cells)
Initialize self. See help(type(self)) for accurate signature.

get_output(state)
Get the output of the last cell

initial_value(batch_size=1)
Initial value of the recurrent state.
class dynn.layers.ElmanRNN(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)
Bases: dynn.layers.base_layers.ParametrizedLayer, dynn.layers.recurrent_layers.RecurrentCell

The standard Elman RNN cell:

\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Input dimension
- hidden_dim (int) – Output (hidden) dimension
- activation (function, optional) – Activation function \(\sigma\) (default: dynn.activations.tanh())
- dropout (float, optional) – Dropout rate (default 0)

__call__(x, h)
Perform the recurrent update.
Parameters:
- x (dynet.Expression) – Input vector
- h (dynet.Expression) – Previous recurrent vector
Returns: Next recurrent state \(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)
Return type: dynet.Expression

__init__(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)
Creates a subcollection for this layer with a custom name

get_output(state)
Get the cell's output from the list of states. For example this would return h from (h, c) in the case of the LSTM.

init_layer(test=True, update=False)
Initializes only this layer's parameters (not recursive). This needs to be implemented for each layer.

initial_value(batch_size=1)
Return a vector of dimension hidden_dim filled with zeros
Returns: Zero vector
Return type: dynet.Expression
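A minimal unrolling sketch, following the state-as-list contract of RecurrentCell described above:

import dynet as dy
import dynn

pc = dy.ParameterCollection()
cell = dynn.layers.ElmanRNN(pc, input_dim=10, hidden_dim=20)

dy.renew_cg()
cell.init(test=False)
xs = [dy.random_uniform(10, -1, 1) for _ in range(5)]
state = cell.initial_value()        # zero initial state
for x in xs:
    state = cell(x, *state)         # recurrent update at each step
h_final = cell.get_output(state)    # hidden state after the last step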
class dynn.layers.LSTM(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)
Bases: dynn.layers.base_layers.ParametrizedLayer, dynn.layers.recurrent_layers.RecurrentCell

Standard LSTM

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Input dimension
- hidden_dim (int) – Output (hidden) dimension
- dropout_x (float, optional) – Input dropout rate (default 0)
- dropout_h (float, optional) – Recurrent dropout rate (default 0)

__call__(x, h, c)
Perform the recurrent update.
Parameters:
- x (dynet.Expression) – Input vector
- h (dynet.Expression) – Previous recurrent vector
- c (dynet.Expression) – Previous cell state vector
Returns: dynet.Expression for the next recurrent states h and c
Return type: dynet.Expression

__init__(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)
Creates a subcollection for this layer with a custom name

get_output(state)
Get the cell's output from the list of states. For example this would return h from (h, c) in the case of the LSTM.

init_layer(test=True, update=False)
Initializes only this layer's parameters (not recursive). This needs to be implemented for each layer.
class dynn.layers.StackedLSTM(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)
Bases: dynn.layers.recurrent_layers.StackedRecurrentCells

Stacked LSTMs

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- num_layers (int) – Number of layers
- input_dim (int) – Input dimension
- hidden_dim (int) – Output (hidden) dimension
- dropout_x (float, optional) – Input dropout rate (default 0)
- dropout_h (float, optional) – Recurrent dropout rate (default 0)

__init__(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)
Initialize self. See help(type(self)) for accurate signature.
class dynn.layers.Transduction(layer)
Bases: dynn.layers.base_layers.BaseLayer

Feed forward transduction layer

This layer applies a feed forward layer to each element of a sequence of inputs and returns the list of outputs. Calling it is equivalent to calling:

[layer(x) for x in input_sequence]

Parameters: layer (base_layers.BaseLayer) – The layer to apply to each element of the sequence

__call__(input_sequence)
Runs the layer over the input. The output is a list of the output of the layer at each step.
Parameters: input_sequence (list) – Input as a list of dynet.Expression objects
Returns: List of outputs, one per input
Return type: list

__init__(layer)
Initialize self. See help(type(self)) for accurate signature.
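For example, applying the same dense layer to every element of a sequence (a minimal sketch):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
transduce = dynn.layers.Transduction(dynn.layers.Affine(pc, 10, 5))

dy.renew_cg()
transduce.init(test=False)
xs = [dy.random_uniform(10, -1, 1) for _ in range(20)]
ys = transduce(xs)  # list of 20 5-dimensional expressions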
class dynn.layers.Unidirectional(cell, output_only=False)
Bases: dynn.layers.base_layers.BaseLayer

Unidirectional transduction layer

This layer runs a recurrent cell on a sequence of inputs and produces the resulting sequence of recurrent states.

Example:

# LSTM cell
lstm_cell = dynn.layers.LSTM(dy.ParameterCollection(), 10, 10)
# Transduction layer
lstm = dynn.layers.Unidirectional(lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, -1, 1, batch_size=5) for _ in range(20)]
# Initialize layer
lstm.init(test=False)
# Transduce forward
states = lstm(xs)
# Retrieve last h
h_final = states[-1][0]

Parameters:
- cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for transduction
- output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.

__call__(input_sequence, backward=False, lengths=None, left_padded=True, output_only=None, initial_state=None)
Transduces the sequence using the recurrent cell.
The output is a list of the output states at each step. For instance in an LSTM the output is (h1, c1), (h2, c2), ...
This assumes that all the input expressions have the same batch size. If you batch sentences of different lengths together you should pad to the longest sequence.
Parameters:
- input_sequence (list) – Input as a list of dynet.Expression objects
- backward (bool, optional) – If this is True the sequence will be processed from right to left. The output sequence will still be in the same order as the input sequence though.
- lengths (list, optional) – If the expressions in the sequence are batched, but have different lengths, this should contain a list of the sequence lengths (default: None)
- left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of the longest sequence. Use this to specify whether the sequence is left or right padded.
- output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
- initial_state (dy.Expression, optional) – Overrides the default initial state of the recurrent cell
Returns: List of recurrent states (depends on the recurrent layer)
Return type: list

__init__(cell, output_only=False)
Initialize self. See help(type(self)) for accurate signature.
class dynn.layers.Bidirectional(forward_cell, backward_cell, output_only=False)
Bases: dynn.layers.base_layers.BaseLayer

Bidirectional transduction layer

This layer runs a recurrent cell in each direction on a sequence of inputs and produces the resulting sequence of recurrent states.

Example:

# Parameter collection
pc = dy.ParameterCollection()
# LSTM cells
fwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
bwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
# Transduction layer
bilstm = dynn.layers.Bidirectional(fwd_lstm_cell, bwd_lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, -1, 1, batch_size=5) for _ in range(20)]
# Initialize layer
bilstm.init(test=False)
# Transduce in both directions
fwd_states, bwd_states = bilstm(xs)
# Retrieve last h
fwd_h_final = fwd_states[-1][0]
# For the backward LSTM the final state is at
# the beginning of the sequence (assuming left padding)
bwd_h_final = bwd_states[0][0]

Parameters:
- forward_cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for forward transduction
- backward_cell (recurrent_layers.RecurrentCell) – The recurrent cell to use for backward transduction
- output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.

__call__(input_sequence, lengths=None, left_padded=True, output_only=None, fwd_initial_state=None, bwd_initial_state=None)
Transduces the sequence in both directions.
The output is a tuple forward_states, backward_states where forward_states is a list of the output states of the forward recurrent cell at each step (and backward_states for the backward cell). For instance in a BiLSTM the output is [(fwd_h1, fwd_c1), ...], [(bwd_h1, bwd_c1), ...]
This assumes that all the input expressions have the same batch size. If you batch sentences of different lengths together you should pad to the longest sequence.
Parameters:
- input_sequence (list) – Input as a list of dynet.Expression objects
- lengths (list, optional) – If the expressions in the sequence are batched, but have different lengths, this should contain a list of the sequence lengths (default: None)
- left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of the longest sequence. Use this to specify whether the sequence is left or right padded.
- output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
- fwd_initial_state (dy.Expression, optional) – Overrides the default initial state of the forward recurrent cell.
- bwd_initial_state (dy.Expression, optional) – Overrides the default initial state of the backward recurrent cell.
Returns: List of forward and backward recurrent states (depends on the recurrent layer)
Return type: tuple

__init__(forward_cell, backward_cell, output_only=False)
Initialize self. See help(type(self)) for accurate signature.
class dynn.layers.MaxPool1D(kernel_size=None, stride=1)
Bases: dynn.layers.base_layers.BaseLayer

1D max pooling

Parameters:
- kernel_size (int, optional) – Default kernel size. If this is not specified, the pooling is done over the full first dimension.
- stride (int, optional) – Default temporal stride (default 1)

__call__(x, kernel_size=None, stride=None)
Max pooling over the first dimension.
This takes either a list of N d-dimensional vectors or an N x d matrix.
The output will be a matrix of dimension (N - kernel_size + 1) // stride x d
Parameters:
- x (dynet.Expression) – Input matrix or list of vectors
- kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
- stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns: Pooled sequence.
Return type: dynet.Expression

__init__(kernel_size=None, stride=1)
Initialize self. See help(type(self)) for accurate signature.
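For example (a minimal sketch):

import dynet as dy
import dynn

# Max-pool over windows of 3 consecutive steps
max_pool = dynn.layers.MaxPool1D(kernel_size=3, stride=1)

dy.renew_cg()
max_pool.init()
x = dy.random_uniform((20, 10), -1, 1)  # N x d matrix
y = max_pool(x)  # (20 - 3 + 1) // 1 = 18 x 10 matrix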
class dynn.layers.MaxPool2D(kernel_size=None, strides=None)
Bases: dynn.layers.base_layers.BaseLayer

2D max pooling.

Parameters:
- kernel_size (list, optional) – Default size of the pooling kernel (a list of two elements, one per dimension).
- strides (list, optional) – Default stride along each dimension.

__call__(x, kernel_size=None, strides=None)
Max pooling over the first dimension.
If either of the kernel_size elements is not specified, the pooling will be done over the full dimension (and the stride is ignored).
Parameters:
- x (dynet.Expression) – Input image (3-d tensor) or matrix.
- kernel_size (list, optional) – Size of the pooling kernel. If this is not specified, the default specified in the constructor is used.
- strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns: Pooled sequence.
Return type: dynet.Expression

__init__(kernel_size=None, strides=None)
Initialize self. See help(type(self)) for accurate signature.
class dynn.layers.MeanPool1D(kernel_size=None, stride=1)
Bases: dynn.layers.base_layers.BaseLayer

1D mean pooling.

The stride and kernel size arguments are here for consistency with MaxPool1D but they are unsupported for now.

Parameters:
- kernel_size (int, optional) – Kernel size (unsupported for now)
- stride (int, optional) – Temporal stride (unsupported for now)

__call__(x, kernel_size=None, stride=None, lengths=None)
Mean pooling over the first dimension.
This takes either a list of N d-dimensional vectors or an N x d matrix.
The output will be a matrix of dimension (N - kernel_size + 1) // stride x d
Parameters:
- x (dynet.Expression) – Input matrix or list of vectors
- kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
- stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
- lengths (list, optional) – Sequence lengths for batched inputs of unequal length
Returns: Pooled sequence.
Return type: dynet.Expression

__init__(kernel_size=None, stride=1)
Initialize self. See help(type(self)) for accurate signature.
class dynn.layers.MLPAttention(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)
Bases: dynn.layers.base_layers.ParametrizedLayer

Multilayer Perceptron based attention

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- query_dim (int) – Queries dimension
- key_dim (int) – Keys dimension
- hidden_dim (int) – Hidden dimension of the MLP
- activation (function, optional) – MLP activation (defaults to tanh).
- dropout (float, optional) – Attention dropout (defaults to 0)

__call__(query, keys, values, mask=None)
Compute attention scores and return the pooled value.
This returns both the pooled value and the attention scores. You can specify an additive mask when some values are not to be attended to (e.g. padding).
Parameters:
- query (dynet.Expression) – Query vector of size (dq,), B
- keys (dynet.Expression) – Key vectors of size (dk, L), B
- values (dynet.Expression) – Value vectors of size (dv, L), B
- mask (dynet.Expression, optional) – Additive mask expression
Returns: pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type: tuple

__init__(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)
Creates a subcollection for this layer with a custom name
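For example (a minimal sketch):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
attend = dynn.layers.MLPAttention(pc, query_dim=10, key_dim=16, hidden_dim=32)

dy.renew_cg()
attend.init(test=False)
L = 7                                     # number of source positions
query = dy.random_uniform(10, -1, 1)
keys = dy.random_uniform((16, L), -1, 1)
values = dy.random_uniform((20, L), -1, 1)
pooled_value, scores = attend(query, keys, values)
# pooled_value has size (20,), scores has size (7,)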
class dynn.layers.BilinearAttention(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)
Bases: dynn.layers.base_layers.ParametrizedLayer

Bilinear attention layer.

Here the scores are computed according to

\[\alpha_{ij}=q_i^\intercal A k_j\]

where \(q_i,k_j\) are the ith query and jth key respectively. If dot_product is set to True this is replaced by:

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

where \(d\) is the dimension of the keys and queries.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- query_dim (int) – Queries dimension
- key_dim (int) – Keys dimension
- dot_product (bool, optional) – Compute attention with the dot product only (no weight matrix). This requires that query_dim == key_dim.
- dropout (float, optional) – Attention dropout (defaults to 0)
- A (dynet.Parameters, optional) – Specify the weight matrix directly.

__call__(query, keys, values, mask=None)
Compute attention scores and return the pooled value.
This returns both the pooled value and the attention scores. You can specify an additive mask when some values are not to be attended to (e.g. padding).
Parameters:
- query (dynet.Expression) – Query vectors of size (dq, l), B
- keys (dynet.Expression) – Key vectors of size (dk, L), B
- values (dynet.Expression) – Value vectors of size (dv, L), B
- mask (dynet.Expression, optional) – Additive mask expression for the source side (size (L,), B)
Returns: pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type: tuple

__init__(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)
Creates a subcollection for this layer with a custom name
class dynn.layers.MultiHeadAttention(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)
Bases: dynn.layers.base_layers.ParametrizedLayer

Multi headed attention layer.

This functions like dot product attention

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

except the keys, queries and values are split into multiple heads.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- n_heads (int) – Number of heads
- query_dim (int) – Dimension of queries
- key_dim (int) – Dimension of keys
- value_dim (int) – Dimension of values
- hidden_dim (int) – Hidden dimension (must be a multiple of n_heads)
- out_dim (int) – Output dimension
- dropout (float, optional) – Attention dropout (defaults to 0)
- Wq (dynet.Parameters, optional) – Specify the queries projection matrix directly.
- Wk (dynet.Parameters, optional) – Specify the keys projection matrix directly.
- Wv (dynet.Parameters, optional) – Specify the values projection matrix directly.
- Wo (dynet.Parameters, optional) – Specify the output projection matrix directly.

__call__(queries, keys, values, mask=None)
Compute attention weights and return the pooled value.
This expects the queries, keys and values to have dimensions dq x l, dk x L, dv x L respectively.
Returns both the pooled value and the attention weights (list of weights, one per head). You can specify an additive mask when some values are not to be attended to (e.g. padding).
Parameters:
- queries (dynet.Expression) – Query vectors of size (dq, l), B
- keys (dynet.Expression) – Key vectors of size (dk, L), B
- values (dynet.Expression) – Value vectors of size (dv, L), B
- mask (dynet.Expression, optional) – Additive mask expression for the source side (size (L,), B)
Returns: pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type: tuple

__init__(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)
Creates a subcollection for this layer with a custom name
class dynn.layers.Conv1D(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)
Bases: dynn.layers.base_layers.ParametrizedLayer

1D convolution along the first dimension

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Input dimension
- num_kernels (int) – Number of kernels (essentially the output dimension)
- kernel_width (int) – Width of the kernels
- activation (function, optional) – Activation function (default: identity)
- dropout_rate (float, optional) – Dropout rate (default 0)
- nobias (bool, optional) – Omit the bias (default False)
- zero_padded (bool, optional) – Default padding behaviour. Pad the input with zeros so that the output has the same length (default True)
- stride (int, optional) – Default stride along the length (defaults to 1).

__call__(x, stride=None, zero_padded=None)
Forward pass
Parameters:
- x (dynet.Expression) – Input expression with the shape (length, input_dim)
- stride (int, optional) – Stride along the temporal dimension
- zero_padded (bool, optional) – Pad the sequence with zeros so that the output has the same length (default True)
Returns: Convolved sequence.
Return type: dynet.Expression

__init__(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)
Creates a subcollection for this layer with a custom name
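For example (a minimal sketch):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
# 16 kernels of width 3 over 10-dimensional inputs
conv = dynn.layers.Conv1D(pc, input_dim=10, num_kernels=16, kernel_width=3)

dy.renew_cg()
conv.init(test=False)
x = dy.random_uniform((20, 10), -1, 1)  # length x input_dim
y = conv(x)  # zero-padded by default -> 20 x 16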
class dynn.layers.Conv2D(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)
Bases: dynn.layers.base_layers.ParametrizedLayer

2D convolution

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- num_channels (int) – Number of channels in the input image
- num_kernels (int) – Number of kernels (essentially the output dimension)
- kernel_size (list, optional) – Default kernel size. This is a list of two elements, one per dimension.
- activation (function, optional) – Activation function (default: identity)
- dropout_rate (float, optional) – Dropout rate (default 0)
- nobias (bool, optional) – Omit the bias (default False)
- zero_padded (bool, optional) – Default padding behaviour. Pad the image with zeros so that the output has the same width/height (default True)
- strides (list, optional) – Default stride along each dimension (list of size 2, defaults to [1, 1]).

__call__(x, strides=None, zero_padded=None)
Forward pass
Parameters:
- x (dynet.Expression) – Input image (3-d tensor) or matrix.
- zero_padded (bool, optional) – Pad the image with zeros so that the output has the same width/height. If this is not specified, the default specified in the constructor is used.
- strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns: Convolved image.
Return type: dynet.Expression

__init__(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)
Creates a subcollection for this layer with a custom name
class dynn.layers.Flatten
Bases: dynn.layers.base_layers.BaseLayer

Flattens the output such that there is only one dimension left (batch dimension notwithstanding)

Example:

# Create the layer
flatten = Flatten()
# Dummy batched 2d input
x = dy.zeros((3, 4), batch_size=7)
# x.dim() -> (3, 4), 7
y = flatten(x)
# y.dim() -> (12,), 7

__call__(x)
Flattens the output such that there is only one dimension left (batch dimension notwithstanding)
Parameters: x (dynet.Expression) – Input expression of arbitrary shape
Returns: Flattened expression
Return type: dynet.Expression

__init__()
Initialize self. See help(type(self)) for accurate signature.
class dynn.layers.LayerNorm(pc, input_dim, gain=None, bias=None)
Bases: dynn.layers.base_layers.ParametrizedLayer

Layer normalization layer:

\(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\)

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int, tuple) – Input dimension

__call__(x, d=None)
Layer-normalize the input.
Parameters: x (dynet.Expression) – Input expression
Returns: \(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\)
Return type: dynet.Expression

__init__(pc, input_dim, gain=None, bias=None)
Creates a subcollection for this layer with a custom name
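For example (a minimal sketch):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
norm = dynn.layers.LayerNorm(pc, input_dim=10)

dy.renew_cg()
norm.init(test=False)
x = dy.random_uniform(10, -1, 1)
y = norm(x)  # centered and scaled, then rescaled by the gain g and bias b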
class dynn.layers.Sequential(*layers, default_return_last_only=True)
Bases: dynn.layers.base_layers.BaseLayer

A helper class to stack layers into deep networks.

Parameters:
- layers (list) – A list of dynn.layers.BaseLayer objects. The first layer is the first one applied to the input. It is the programmer's responsibility to make sure that the layers are compatible (e.g. the output of each layer can be fed into the next one)
- default_return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).

__call__(x, return_last_only=None)
Calls all the layers in succession.
Computes layers[n-1](layers[n-2](...layers[0](x)))
Parameters:
- x (dynet.Expression) – Input expression
- return_last_only (bool, optional) – Overrides the default
Returns: Depending on return_last_only, returns either the last expression or a list of all the layers' outputs (first to last)
Return type: dynet.Expression, list

__init__(*layers, default_return_last_only=True)
Initialize self. See help(type(self)) for accurate signature.
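For example, a 2-layer MLP (a minimal sketch):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
# A 2-layer MLP: 10 -> 20 -> 2
mlp = dynn.layers.Sequential(
    dynn.layers.Affine(pc, 10, 20, activation=dy.tanh),
    dynn.layers.Affine(pc, 20, 2),
)

dy.renew_cg()
mlp.init(test=False)
x = dy.random_uniform(10, -1, 1)
y = mlp(x)  # output of the last layer, size (2,)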
class dynn.layers.Parallel(*layers, dim=0, default_insert_dim=False)
Bases: dynn.layers.base_layers.BaseLayer

A helper class to run layers on the same input and concatenate their outputs

This can be used to create 2d conv layers with multiple kernel sizes by concatenating multiple dynn.layers.Conv2D.

Parameters:
- layers (list) – A list of dynn.layers.BaseLayer objects. It is the programmer's responsibility to make sure that the layers are compatible (e.g. each layer takes the same input and the outputs have the same shape everywhere except along the concatenation dimension)
- dim (int) – The concatenation dimension
- default_insert_dim (bool, optional) – Instead of concatenating along an existing dimension, insert a new dimension at dim and concatenate.

__call__(x, insert_dim=None, **kwargs)
Calls all the layers on the same input and concatenates their outputs.
Computes dy.concatenate([layers[0](x)...layers[n-1](x)], d=dim)
Parameters:
- x (dynet.Expression) – Input expression
- insert_dim (bool, optional) – Overrides the default
Returns: The concatenation of all the layers' outputs
Return type: dynet.Expression

__init__(*layers, dim=0, default_insert_dim=False)
Initialize self. See help(type(self)) for accurate signature.
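For example, concatenating 1d convolutions with different kernel widths (a minimal sketch):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
# Concatenate 8 width-2 and 8 width-3 kernel outputs along dimension 1
multi_conv = dynn.layers.Parallel(
    dynn.layers.Conv1D(pc, 10, 8, 2),
    dynn.layers.Conv1D(pc, 10, 8, 3),
    dim=1,
)

dy.renew_cg()
multi_conv.init(test=False)
x = dy.random_uniform((20, 10), -1, 1)
y = multi_conv(x)  # 20 x 16: both outputs concatenated along dimension 1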
class dynn.layers.Transformer(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)
Bases: dynn.layers.base_layers.ParametrizedLayer

Transformer layer.

As described in Vaswani et al. (2017). This is the "encoder" side of the transformer, i.e. self attention only.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Hidden dimension (used everywhere)
- hidden_dim (int) – Hidden dimension of the feed-forward MLP
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)

__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False)
Run the transformer layer.
The input is expected to have dimensions d x L where L is the length dimension.
Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
- return_att (bool, optional) – Defaults to False. Return the self attention weights
Returns: The output expression (+ the attention weights if return_att is True)
Return type: tuple, dynet.Expression

__init__(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)
Creates a subcollection for this layer with a custom name
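For example (a minimal sketch):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
transformer = dynn.layers.Transformer(pc, input_dim=16, hidden_dim=64, n_heads=4)

dy.renew_cg()
transformer.init(test=False)
L = 10                                  # sequence length
x = dy.random_uniform((16, L), -1, 1)   # input_dim x L
h = transformer(x, triu=True)           # causal (upper triangular) self attention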
class dynn.layers.StackedTransformers(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)
Bases: dynn.layers.combination_layers.Sequential

Multilayer transformer.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- hidden_dim (int) – Hidden dimension of the feed-forward MLP
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)

__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False, return_last_only=True)
Run the multilayer transformer.
The input is expected to have dimensions d x L where L is the length dimension.
Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
- return_att (bool, optional) – Defaults to False. Return the self attention weights
- return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns: The output expression (+ the attention weights if return_att is True)
Return type: tuple, dynet.Expression

__init__(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)
Initialize self. See help(type(self)) for accurate signature.
class dynn.layers.CondTransformer(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)
Bases: dynn.layers.base_layers.ParametrizedLayer

Conditional transformer layer.

As described in Vaswani et al. (2017). This is the "decoder" side of the transformer, i.e. self attention + attention to context.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Hidden dimension (used everywhere)
- hidden_dim (int) – Hidden dimension of the feed-forward MLP
- cond_dim (int) – Conditional dimension (dimension of the "encoder" side, used for attention)
- n_heads (int) – Number of heads for attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)

__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)
Run the transformer layer.
The input is expected to have dimensions d x L where L is the length dimension.
Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- c (dynet.Expression) – Context (dimensions cond_dim x l)
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: The output expression (+ the attention weights if return_att is True)
Return type: tuple, dynet.Expression

__init__(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)
Creates a subcollection for this layer with a custom name

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)
Runs the transformer for one step. Useful for decoding.
The "state" of the transformer is the list of L-1 inputs and its output is the Lth output. This returns a tuple of both the new state (L-1 previous inputs + Lth input concatenated) and the Lth output.
Parameters:
- state (dynet.Expression, optional) – Previous "state" (dimensions input_dim x (L-1))
- x (dynet.Expression) – Input (dimension input_dim)
- c (dynet.Expression) – Context (dimensions cond_dim x l)
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: The new state and the output expression (+ the attention weights if return_att is True)
Return type: tuple
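A decoding-loop sketch using step (a minimal sketch; the initial state is assumed to be None, since the state argument is marked optional above):

import dynet as dy
import dynn

pc = dy.ParameterCollection()
decoder = dynn.layers.CondTransformer(pc, input_dim=16, hidden_dim=64, cond_dim=16, n_heads=4)

dy.renew_cg()
decoder.init(test=False)
c = dy.random_uniform((16, 7), -1, 1)  # "encoder" side context, cond_dim x l
state = None                           # no previous inputs at the first step
for _ in range(5):
    x = dy.random_uniform(16, -1, 1)   # next input vector (e.g. an embedding)
    state, h = decoder.step(state, x, c)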
class dynn.layers.StackedCondTransformers(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)
Bases: dynn.layers.combination_layers.Sequential

Multilayer conditional transformer.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- hidden_dim (int) – Hidden dimension of the feed-forward MLP
- cond_dim (int) – Conditional dimension (dimension of the "encoder" side, used for attention)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)

__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)
Run the multilayer transformer.
The input is expected to have dimensions d x L where L is the length dimension.
Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- c (list) – List of contexts (one per layer, each of dim cond_dim x L). If this is not a list (but an expression), the same context will be used for each layer.
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
- return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns: The output expression (+ the attention weights if return_att is True)
Return type: tuple, dynet.Expression

__init__(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)
Initialize self. See help(type(self)) for accurate signature.

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)
Runs the transformer for one step. Useful for decoding.
The "state" of the multilayered transformer is the list of n_layers L-1 sized inputs, and its output is the output of the last layer. This returns a tuple of both the new state (list of n_layers L sized inputs) and the Lth output.
Parameters:
- state (list) – Previous "state" (list of n_layers expressions of dimensions input_dim x (L-1))
- x (dynet.Expression) – Input (dimension input_dim)
- c (dynet.Expression) – Context (dimensions cond_dim x l)
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: The new state and the output expression (+ the attention weights if return_att is True)
Return type: tuple