Transformer layers

class dynn.layers.transformer_layers.CondTransformer(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Conditional transformer layer.

As described in Vaswani et al. (2017), this is the “decoder” side of the transformer, i.e. self attention + attention to the encoder context.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Hidden dimension (used everywhere)
  • hidden_dim (int) – Dimension of the hidden layer of the MLP
  • cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
  • n_heads (int) – Number of heads for attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:

The output expression (+ the attention weights if return_att is True)

Return type:

tuple or dynet.Expression
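
A minimal usage sketch (not taken from the library’s docs): the dimensions, the dummy inputs and the init(test=...) call are assumptions following the usual dynn/DyNet pattern for a ParametrizedLayer.

    import numpy as np
    import dynet as dy
    from dynn.layers.transformer_layers import CondTransformer

    pc = dy.ParameterCollection()
    # input_dim=64, hidden_dim=128 (MLP), cond_dim=96, 4 attention heads
    decoder = CondTransformer(pc, 64, 128, 96, 4, dropout=0.1)

    dy.renew_cg()
    decoder.init(test=True)  # assumed ParametrizedLayer initialization step
    x = dy.inputTensor(np.random.randn(64, 10))  # decoder input: input_dim x L
    c = dy.inputTensor(np.random.randn(96, 12))  # encoder context: cond_dim x l
    # triu=True masks self attention so each position only sees previous positions
    h = decoder(x, c, triu=True)                 # output: input_dim x L

With return_att=True the result is a tuple containing the output and the attention weights instead of a single expression.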

__init__(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)

Runs the transformer for one step. Useful for decoding.

The “state” of the transformer is the list of the L-1 previous inputs, and its output is the L-th output. This returns a tuple of the new state (the L-1 previous inputs with the L-th input concatenated) and the L-th output.

Parameters:
  • x (dynet.Expression) – Input (dimension input_dim)
  • state (dynet.Expression, optional) – Previous “state” (dimensions input_dim x (L-1))
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:

The new state and the output at the current position (+ the attention weights if return_att is True)

Return type:

tuple
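
A sketch of step-by-step decoding with this layer. The representation of the empty state at the very first step is not specified above, so passing None is a guess; the dimensions and dummy data are also assumptions.

    import numpy as np
    import dynet as dy
    from dynn.layers.transformer_layers import CondTransformer

    pc = dy.ParameterCollection()
    decoder = CondTransformer(pc, 64, 128, 96, 4)

    dy.renew_cg()
    decoder.init(test=True)  # assumed ParametrizedLayer initialization step
    c = dy.inputTensor(np.random.randn(96, 12))  # encoder context: cond_dim x l
    state = None  # assumed representation of the empty state at the first step
    for t in range(5):
        x_t = dy.inputTensor(np.random.randn(64))  # one input_dim vector
        # step returns (new state, output at the current position)
        state, h_t = decoder.step(state, x_t, c)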

class dynn.layers.transformer_layers.StackedCondTransformers(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer transformer.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_layers (int) – Number of layers
  • input_dim (int) – Hidden dimension (used everywhere)
  • hidden_dim (int) – Dimension of the hidden layer of the MLP
  • cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • c (list) – list of contexts (one per layer, each of dim cond_dim x L). If this is not a list (but an expression), the same context will be used for each layer.
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers).
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
  • return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns:

The output expression (+ the attention weights if return_att is True)

Return type:

tuple or dynet.Expression
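
A sketch of running the full stack over a sequence. Passing a single expression as c (shared by all layers) follows the parameter description above; the dimensions, dummy data and init(test=...) call are assumptions.

    import numpy as np
    import dynet as dy
    from dynn.layers.transformer_layers import StackedCondTransformers

    pc = dy.ParameterCollection()
    # 3 layers, input_dim=64, hidden_dim=128, cond_dim=96, 4 heads
    stack = StackedCondTransformers(pc, 3, 64, 128, 96, 4, dropout=0.1)

    dy.renew_cg()
    stack.init(test=True)  # assumed ParametrizedLayer initialization step
    x = dy.inputTensor(np.random.randn(64, 10))  # input_dim x L
    c = dy.inputTensor(np.random.randn(96, 12))  # same context for all 3 layers
    h = stack(x, c, triu=True)  # last layer's output (return_last_only=True)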

__init__(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.

step(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)

Runs the transformer for one step. Useful for decoding.

The “state” of the multilayer transformer is a list of n_layers inputs of length L-1, and its output is the output of the last layer. This returns a tuple of the new state (a list of n_layers inputs of length L) and the L-th output.

Parameters:
  • x (dynet.Expression) – Input (dimension input_dim)
  • state (list) – Previous “state” (list of n_layers expressions of dimensions input_dim x (L-1))
  • c (dynet.Expression) – Context (dimensions cond_dim x l)
  • lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
  • left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
  • mask_c (dynet.Expression, optional) – Defaults to None. As an alternative to lengths_c, you can pass a mask expression directly (useful to reuse masks across layers).
  • return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns:

The new state and the last layer’s output (+ the attention weights if return_att is True)

Return type:

tuple
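
A step-by-step decoding sketch for the stack. How the empty state should be represented at the first step is not documented above, so the [None] * n_layers initialization below is a guess; the rest mirrors the single-layer example.

    import numpy as np
    import dynet as dy
    from dynn.layers.transformer_layers import StackedCondTransformers

    pc = dy.ParameterCollection()
    stack = StackedCondTransformers(pc, 3, 64, 128, 96, 4)

    dy.renew_cg()
    stack.init(test=True)  # assumed ParametrizedLayer initialization step
    c = dy.inputTensor(np.random.randn(96, 12))  # encoder context: cond_dim x l
    state = [None] * 3  # guessed empty per-layer states at the first step
    for t in range(5):
        x_t = dy.inputTensor(np.random.randn(64))  # one input_dim vector
        # step returns (new list of per-layer states, last layer's output)
        state, h_t = stack.step(state, x_t, c)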

class dynn.layers.transformer_layers.StackedTransformers(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.combination_layers.Sequential

Multilayer transformer.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_layers (int) – Number of layers
  • input_dim (int) – Hidden dimension (used everywhere)
  • hidden_dim (int) – Dimension of the hidden layer of the MLP
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False, return_last_only=True)

Run the multilayer transformer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
  • return_att (bool, optional) – Defaults to False. Return the self attention weights
  • return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns:

The output expression (+ the attention weights if return_att is True)

Return type:

tuple or dynet.Expression
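
A sketch of encoding a single sequence with a 3-layer stack, under the same assumed conventions as the examples above (dummy data, the init(test=...) call).

    import numpy as np
    import dynet as dy
    from dynn.layers.transformer_layers import StackedTransformers

    pc = dy.ParameterCollection()
    # 3 layers, input_dim=64, hidden_dim=128, 4 heads
    encoder = StackedTransformers(pc, 3, 64, 128, 4, dropout=0.1)

    dy.renew_cg()
    encoder.init(test=True)  # assumed ParametrizedLayer initialization step
    x = dy.inputTensor(np.random.randn(64, 10))  # input_dim x L
    h = encoder(x)  # last layer's output, input_dim x L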

__init__(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Initialize self. See help(type(self)) for accurate signature.

class dynn.layers.transformer_layers.Transformer(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Bases: dynn.layers.base_layers.ParametrizedLayer

Transformer layer.

As described in Vaswani et al. (2017), this is the “encoder” side of the transformer, i.e. self attention only.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • input_dim (int) – Hidden dimension (used everywhere)
  • hidden_dim (int) – Dimension of the hidden layer of the MLP
  • n_heads (int) – Number of heads for self attention.
  • activation (function, optional) – MLP activation (defaults to relu).
  • dropout (float, optional) – Dropout rate (defaults to 0)
__call__(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False)

Run the transformer layer.

The input is expected to have dimensions d x L where L is the length dimension.

Parameters:
  • x (dynet.Expression) – Input (dimensions input_dim x L)
  • lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
  • left_aligned (bool, optional) – Defaults to True. Used for masking
  • triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
  • mask (dynet.Expression, optional) – Defaults to None. As an alternative to lengths, you can pass a mask expression directly (useful to reuse masks across layers)
  • return_att (bool, optional) – Defaults to False. Return the self attention weights
Returns:

The output expression (+ the attention weights if return_att is True)

Return type:

tuple or dynet.Expression
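
A single-layer self-attention sketch, with the same assumptions as the stacked example above (dummy data, the init(test=...) call); triu=True gives the causal masking described above.

    import numpy as np
    import dynet as dy
    from dynn.layers.transformer_layers import Transformer

    pc = dy.ParameterCollection()
    # input_dim=64, hidden_dim=128, 4 heads
    encoder = Transformer(pc, 64, 128, 4, dropout=0.1)

    dy.renew_cg()
    encoder.init(test=True)  # assumed ParametrizedLayer initialization step
    x = dy.inputTensor(np.random.randn(64, 10))  # input_dim x L
    h = encoder(x, triu=True)  # causal self attention, output input_dim x L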

__init__(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)

Creates a subcollection for this layer with a custom name