Transformer layers
-
class dynn.layers.transformer_layers.CondTransformer(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)
Bases: dynn.layers.base_layers.ParametrizedLayer
Conditional transformer layer.
As described in Vaswani et al. (2017). This is the “decoder” side of the transformer, i.e. self attention + attention to context.
Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Hidden dimension (used everywhere)
- cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
- n_heads (int) – Number of heads for attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)
Run the transformer layer.
The input is expected to have dimensions d x L where L is the length dimension.
Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- c (dynet.Expression) – Context (dimensions cond_dim x l)
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: The output expression (+ the attention weights if return_att is True)
Return type: tuple, dynet.Expression
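The masking options above (lengths, left_aligned, triu) can be illustrated with a small pure-Python sketch. It is not dynn's implementation: the helper names are hypothetical, masks are plain nested lists rather than dynet expressions, and it assumes that left_aligned=True means padding sits on the right and that a position may attend to itself as well as to earlier positions. The row/column orientation (rows as queries) is also a choice made for the illustration and may be transposed relative to the library's mask.

```python
def length_mask(lengths, max_len, left_aligned=True):
    """Return one row of booleans per sequence: True marks a real position."""
    rows = []
    for n in lengths:
        pad = max_len - n
        if left_aligned:
            # Assumption: real tokens come first, padding fills the right.
            rows.append([True] * n + [False] * pad)
        else:
            # Assumption: padding fills the left, real tokens come last.
            rows.append([False] * pad + [True] * n)
    return rows


def causal_mask(length):
    """allowed[i][j] is True when query position i may attend to key position j."""
    # Assumption: a position may attend to itself and to earlier positions.
    return [[j <= i for j in range(length)] for i in range(length)]


def combine(causal, row):
    """Keep only the (query, key) pairs allowed by both masks for one sequence."""
    return [[ok and row[j] for j, ok in enumerate(line)] for line in causal]


pad_mask = length_mask([3, 2], max_len=4)
tri_mask = causal_mask(4)
both = combine(tri_mask, pad_mask[1])
```

Here both describes the second sequence (length 2): every query position is limited to earlier positions that are not padding.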
-
__init__(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)
Creates a subcollection for this layer with a custom name
-
step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)
Runs the transformer for one step. Useful for decoding.
The “state” of the transformer is the list of L-1 inputs and its output is the Lth output. This returns a tuple of both the new state (L-1 previous inputs + Lth input concatenated) and the Lth output.
Parameters:
- x (dynet.Expression) – Input (dimension input_dim)
- state (dynet.Expression, optional) – Previous “state” (dimensions input_dim x (L-1))
- c (dynet.Expression) – Context (dimensions cond_dim x l)
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: The new state and the Lth output (+ the attention weights if return_att is True)
Return type: tuple
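The state handling described for step can be sketched with a toy stand-in. This is only an illustration under stated assumptions: toy_layer and toy_step are hypothetical, scalars stand in for input_dim-sized vectors, a running mean replaces causal self attention, and the state before the first step is assumed to be empty. The point is that stepping one input at a time, while carrying the previous inputs as state, reproduces the outputs of a full causal pass.

```python
def toy_layer(inputs):
    """Hypothetical stand-in for causal self attention: output i averages inputs 0..i."""
    outputs = []
    total = 0.0
    for i, value in enumerate(inputs):
        total += value
        outputs.append(total / (i + 1))
    return outputs


def toy_step(state, x):
    """One decoding step: append x to the state, return (new_state, last output)."""
    new_state = state + [x]  # L-1 previous inputs + the Lth input
    output = sum(new_state) / len(new_state)
    return new_state, output


inputs = [1.0, 3.0, 5.0, 7.0]
state, stepped = [], []  # assumption: the state starts out empty
for x in inputs:
    state, y = toy_step(state, x)
    stepped.append(y)
```

After the loop, state holds all four inputs and stepped matches toy_layer(inputs).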
-
class dynn.layers.transformer_layers.StackedCondTransformers(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)
Bases: dynn.layers.combination_layers.Sequential
Multilayer transformer.
Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)
Run the multilayer transformer.
The input is expected to have dimensions d x L where L is the length dimension.
Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- c (list) – List of contexts (one per layer, each of dimensions cond_dim x L). If this is not a list (but an expression), the same context will be used for each layer.
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns: The output expression (+ the attention weights if return_att is True)
Return type: tuple, dynet.Expression
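The handling of c (one context per layer, or a single context shared by all layers) and of return_last_only can be sketched as follows. This is a pure-Python illustration, not the library's code: expand_contexts and run_stack are hypothetical helpers, the layers are toy functions, and raising ValueError on a list of the wrong length is an assumption made here.

```python
def expand_contexts(c, n_layers):
    """Return one context per layer, repeating c when it is not already a list."""
    if isinstance(c, list):
        if len(c) != n_layers:
            # Assumption: a list of the wrong length is treated as an error.
            raise ValueError("expected one context per layer")
        return c
    return [c] * n_layers


def run_stack(layers, x, c, return_last_only=True):
    """Feed x through the layers in order, pairing layer k with context k."""
    contexts = expand_contexts(c, len(layers))
    outputs = []
    h = x
    for layer, context in zip(layers, contexts):
        h = layer(h, context)
        outputs.append(h)
    return outputs[-1] if return_last_only else outputs


# Hypothetical toy layers: each adds its context to its input.
layers = [lambda h, ctx: h + ctx for _ in range(3)]
last = run_stack(layers, 1, 10)
per_layer = run_stack(layers, 1, [10, 20, 30], return_last_only=False)
```

With a single context, last is the output of the third layer only; with a list and return_last_only=False, per_layer holds one output per layer.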
-
__init__(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)
Initialize self. See help(type(self)) for accurate signature.
-
step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)
Runs the transformer for one step. Useful for decoding.
The “state” of the multilayered transformer is the list of n_layers L-1 sized inputs and its output is the output of the last layer. This returns a tuple of both the new state (list of n_layers L sized inputs) and the Lth output.
Parameters:
- x (dynet.Expression) – Input (dimension input_dim)
- state (list) – Previous “state” (list of n_layers expressions of dimensions input_dim x (L-1))
- c (dynet.Expression) – Context (dimensions cond_dim x l)
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
- return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: The new state and the Lth output (+ the attention weights if return_att is True)
Return type: tuple
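For the multilayer case, each layer keeps its own state, and the output of one layer becomes the input of the next. The sketch below illustrates that bookkeeping only. The names are hypothetical, scalars stand in for vectors, a running mean replaces attention, and starting from empty per-layer states is an assumption.

```python
def toy_step(state, x):
    """Hypothetical single-layer step: append x, output the running mean."""
    new_state = state + [x]
    return new_state, sum(new_state) / len(new_state)


def stacked_step(states, x):
    """Step every layer once; layer k+1 consumes the output of layer k."""
    new_states = []
    h = x
    for state in states:
        new_state, h = toy_step(state, h)
        new_states.append(new_state)
    return new_states, h  # h is the output of the last layer


n_layers = 2
states = [[] for _ in range(n_layers)]  # assumption: empty state before step 1
outputs = []
for x in [2.0, 4.0, 6.0]:
    states, y = stacked_step(states, x)
    outputs.append(y)
```

After three steps, states is a list of n_layers entries, each holding the three inputs that layer has seen so far.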
-
class dynn.layers.transformer_layers.StackedTransformers(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)
Bases: dynn.layers.combination_layers.Sequential
Multilayer transformer.
Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False, return_last_only=True)
Run the multilayer transformer.
The input is expected to have dimensions d x L where L is the length dimension.
Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
- return_att (bool, optional) – Defaults to False. Return the self attention weights
- return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns: The output expression (+ the attention weights if return_att is True)
Return type: tuple, dynet.Expression
-
__init__(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)
Initialize self. See help(type(self)) for accurate signature.
-
class dynn.layers.transformer_layers.Transformer(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)
Bases: dynn.layers.base_layers.ParametrizedLayer
Transformer layer.
As described in Vaswani et al. (2017). This is the “encoder” side of the transformer, i.e. self attention only.
Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- input_dim (int) – Hidden dimension (used everywhere)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False)
Run the transformer layer.
The input is expected to have dimensions d x L where L is the length dimension.
Parameters:
- x (dynet.Expression) – Input (dimensions input_dim x L)
- lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
- return_att (bool, optional) – Defaults to False. Return the self attention weights
Returns: The output expression (+ the attention weights if return_att is True)
Return type: tuple, dynet.Expression
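To make the mask and return_att arguments more concrete, here is a minimal single-head scaled dot-product self attention in pure Python, in the spirit of Vaswani et al. (2017). It is a sketch under strong assumptions and not the layer's implementation: there are no learned projections, no multiple heads, no MLP, and no dropout. The input is a list of L vectors rather than a d x L expression, and the function name is hypothetical. Every row of the mask must allow at least one position.

```python
import math


def self_attention(x, mask=None, return_att=False):
    """Single-head self attention over x, a list of L vectors of equal size."""
    d = len(x[0])
    scale = math.sqrt(d)
    weights = []
    for i, query in enumerate(x):
        # Scaled dot-product scores of position i against every position j.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in x]
        if mask is not None:
            # Disallowed positions get -inf so that their weight becomes 0.
            scores = [s if mask[i][j] else -math.inf
                      for j, s in enumerate(scores)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        weights.append([e / norm for e in exps])
    # Each output is a weighted average of the inputs (values = inputs here).
    output = [
        [sum(w * x[j][k] for j, w in enumerate(row)) for k in range(d)]
        for row in weights
    ]
    return (output, weights) if return_att else output


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
causal = [[j <= i for j in range(3)] for i in range(3)]
out, att = self_attention(x, mask=causal, return_att=True)
```

With the causal mask, the first position can only attend to itself, so its attention row is concentrated on position 0 and its output equals its input.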
-
__init__(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)
Creates a subcollection for this layer with a custom name