Transformer layers
class dynn.layers.transformer_layers.CondTransformer(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Conditional transformer layer.

As described in Vaswani et al. (2017). This is the “decoder” side of the transformer, i.e. self-attention plus attention to the context.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Hidden dimension (used everywhere)
- hidden_dim (int) – Hidden dimension of the MLP
- cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
- n_heads (int) – Number of heads for attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L, where L is the length dimension.

Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- c (dynet.Expression) – Context (dimensions cond_dim x l)
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for self-attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self-attention.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
- triu (bool, optional) – Upper-triangular self-attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights.

Returns: The output expression (+ the attention weights if return_att is True)
Return type: dynet.Expression, or tuple if return_att is True
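For illustration, here is a minimal usage sketch (not from the library's documentation): the dimensions, dropout rate, and random inputs are made-up stand-ins for real embeddings, and the three-way unpacking with return_att=True assumes the tuple is (output, self-attention weights, conditional-attention weights).

    import dynet as dy
    from dynn.layers.transformer_layers import CondTransformer

    pc = dy.ParameterCollection()
    # pc, input_dim=128, hidden_dim=256 (MLP), cond_dim=128 (encoder side)
    layer = CondTransformer(pc, 128, 256, 128, n_heads=8, dropout=0.1)

    dy.renew_cg()
    x = dy.random_normal((128, 10))  # decoder input, input_dim x L (L=10)
    c = dy.random_normal((128, 12))  # encoder context, cond_dim x l (l=12)
    # triu=True masks self-attention so each position sees only previous ones
    h = layer(x, c, triu=True)       # output expression, input_dim x L
    # Assumed unpacking when attention weights are requested:
    h, self_att, cond_att = layer(x, c, triu=True, return_att=True)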
__init__(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name.
step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Runs the transformer for one step. Useful for decoding.

The “state” of the transformer is the list of L-1 previous inputs, and its output is the L-th output. This returns a tuple of the new state (the L-1 previous inputs with the L-th input concatenated) and the L-th output.

Parameters:
- state (dynet.Expression, optional) – Previous “state” (dimensions input_dim x (L-1))
- x (dynet.Expression) – Input (dimension input_dim)
- c (dynet.Expression) – Context (dimensions cond_dim x l)
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights.

Returns: The new state and the L-th output (+ the attention weights if return_att is True)
Return type: tuple
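A hedged sketch of a greedy decoding loop built on step. Passing None as the initial (empty) state is an assumption here, not documented behavior, and x_t is a hypothetical stand-in for a token embedding.

    # Continuing from the layer and context c above
    state = None                    # assumed empty initial state
    x_t = dy.random_normal((128,))  # stand-in first-token embedding (input_dim)
    for _ in range(10):             # decode at most 10 positions
        state, h = layer.step(state, x_t, c)
        x_t = h                     # toy choice: feed the output back in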
class dynn.layers.transformer_layers.StackedCondTransformers(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer conditional transformer.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- hidden_dim (int) – Hidden dimension of the MLP
- cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
- n_heads (int) – Number of heads for self-attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L, where L is the length dimension.

Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- c (list) – List of contexts (one per layer, each of dimensions cond_dim x l). If this is an expression rather than a list, the same context is used for every layer.
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for self-attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self-attention.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
- triu (bool, optional) – Upper-triangular self-attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights.
- return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the outputs of all layers).

Returns: The output expression (+ the attention weights if return_att is True)
Return type: dynet.Expression, or tuple if return_att is True
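A minimal sketch with made-up dimensions; per the parameter description above, passing a single expression for c shares the same context across all layers.

    import dynet as dy
    from dynn.layers.transformer_layers import StackedCondTransformers

    pc = dy.ParameterCollection()
    # pc, n_layers=4, input_dim=128, hidden_dim=256, cond_dim=128
    stack = StackedCondTransformers(pc, 4, 128, 256, 128, n_heads=8)

    dy.renew_cg()
    x = dy.random_normal((128, 10))             # input_dim x L
    c = dy.random_normal((128, 12))             # shared context, cond_dim x l
    h_last = stack(x, c, triu=True)             # output of the last layer only
    h_all = stack(x, c, triu=True, return_last_only=False)  # one output per layer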
__init__(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.
step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Runs the transformer for one step. Useful for decoding.

The “state” of the multilayer transformer is a list of n_layers inputs of length L-1, and its output is the output of the last layer. This returns a tuple of the new state (a list of n_layers inputs of length L) and the L-th output.

Parameters:
- state (list) – Previous “state” (list of n_layers expressions of dimensions input_dim x (L-1))
- x (dynet.Expression) – Input (dimension input_dim)
- c (dynet.Expression) – Context (dimensions cond_dim x l)
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights.

Returns: The new state and the output (+ the attention weights if return_att is True)
Return type: tuple
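One decoding step with the stacked variant, as a sketch. Representing the empty state as a list of None entries (one per layer) is an assumption, mirroring the single-layer case above.

    state = [None] * 4                # assumed empty state, one entry per layer
    x_t = dy.random_normal((128,))    # stand-in embedding, dimension input_dim
    state, h = stack.step(state, x_t, c)   # h is the last layer's output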
class dynn.layers.transformer_layers.StackedTransformers(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer transformer.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- hidden_dim (int) – Hidden dimension of the MLP
- n_heads (int) – Number of heads for self-attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L, where L is the length dimension.

Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking.
- triu (bool, optional) – Upper-triangular self-attention. Mask such that each position can only attend to the previous positions.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self-attention weights.
- return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the outputs of all layers).

Returns: The output expression (+ the attention weights if return_att is True)
Return type: dynet.Expression, or tuple if return_att is True
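A sketch of encoding a padded minibatch of two sequences, masking the padding via lengths; the dimensions, batching, and sequence lengths are made up for illustration.

    import dynet as dy
    from dynn.layers.transformer_layers import StackedTransformers

    pc = dy.ParameterCollection()
    # pc, n_layers=4, input_dim=128, hidden_dim=256
    enc = StackedTransformers(pc, 4, 128, 256, n_heads=8)

    dy.renew_cg()
    # Two sequences padded to L=10; the actual lengths are 10 and 7
    x = dy.random_normal((128, 10), batch_size=2)
    h = enc(x, lengths=[10, 7], left_aligned=True)  # input_dim x L, batched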
__init__(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.
class dynn.layers.transformer_layers.Transformer(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Transformer layer.

As described in Vaswani et al. (2017). This is the “encoder” side of the transformer, i.e. self-attention only.

Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Hidden dimension (used everywhere)
- hidden_dim (int) – Hidden dimension of the MLP
- n_heads (int) – Number of heads for self-attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L, where L is the length dimension.

Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking.
- triu (bool, optional) – Upper-triangular self-attention. Mask such that each position can only attend to the previous positions.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self-attention weights.

Returns: The output expression (+ the attention weights if return_att is True)
Return type: dynet.Expression, or tuple if return_att is True
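A single encoder layer in use, as a minimal sketch with made-up dimensions; the two-way unpacking with return_att=True assumes the tuple is (output, attention weights).

    import dynet as dy
    from dynn.layers.transformer_layers import Transformer

    pc = dy.ParameterCollection()
    layer = Transformer(pc, 128, 256, n_heads=8)  # pc, input_dim, hidden_dim

    dy.renew_cg()
    x = dy.random_normal((128, 10))     # input_dim x L
    h = layer(x)                        # unmasked (bidirectional) self-attention
    h, att = layer(x, return_att=True)  # assumed (output, weights) unpacking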
__init__(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name.