Attention layers

class dynn.layers.attention_layers.BilinearAttention(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Bilinear attention layer.

Here the scores are computed according to

\[\alpha_{ij}=q_i^\intercal A k_j\]

Where \(q_i,k_j\) are the \(i\)-th query and \(j\)-th key respectively. If dot_product is set to True this is replaced by:

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

Where \(d\) is the dimension of the keys and queries.
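
The following sketch illustrates the two scoring schemes above in plain numpy (not part of dynn); the names and dimensions are made up for the example:

    import numpy as np

    d = 4                             # key/query dimension
    queries = np.random.randn(3, d)   # rows are the queries q_i
    keys = np.random.randn(5, d)      # rows are the keys k_j
    A = np.random.randn(d, d)         # bilinear weight matrix

    bilinear_scores = queries @ A @ keys.T               # alpha[i, j] = q_i^T A k_j
    dot_product_scores = queries @ keys.T / np.sqrt(d)   # scaled dot product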

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • dot_product (bool, optional) – Compute attention with the dot product only (no weight matrix). This requires that query_dim == key_dim.
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • A (dynet.Parameters, optional) – Specify the weight matrix directly.
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value.

This returns both the pooled value and the attention scores. You can specify an additive mask when some values are not to be attended to (e.g. padding).

Parameters:
  • query (dynet.Expression) – Query vector
  • keys (dynet.Expression) – Key vectors
  • values (dynet.Expression) – Value vectors
  • mask (dynet.Expression, optional) – Additive mask for positions that should not be attended to (e.g. padding)
Returns:
  pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type:
  tuple

__init__(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Creates a subcollection for this layer with a custom name
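
A minimal usage sketch (the dimensions and random inputs below are purely illustrative; depending on the dynn version, the layer may also need an explicit init step after renewing the computation graph):

    import numpy as np
    import dynet as dy
    from dynn.layers.attention_layers import BilinearAttention

    pc = dy.ParameterCollection()
    # Queries of dimension 10, keys of dimension 20, attention dropout 0.1
    attend = BilinearAttention(pc, query_dim=10, key_dim=20, dropout=0.1)

    dy.renew_cg()
    query = dy.inputTensor(np.random.randn(10))      # one query of size (10,)
    keys = dy.inputTensor(np.random.randn(20, 5))    # L=5 keys of size (20,)
    values = dy.inputTensor(np.random.randn(30, 5))  # L=5 values of size (30,)

    # pooled_value has size (30,), scores has size (5,)
    pooled_value, scores = attend(query, keys, values)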

class dynn.layers.attention_layers.MLPAttention(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multilayer Perceptron based attention

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • hidden_dim (int) – Hidden dimension of the MLP
  • activation (function, optional) – MLP activation (defaults to tanh).
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • Wq, Wk, b, V (dynet.Parameters, optional) – Specify the MLP parameters directly.
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value.

This returns both the pooled value and the attention scores. You can specify an additive mask when some values are not to be attended to (e.g. padding).

Parameters:
  • query (dynet.Expression) – Query vector
  • keys (dynet.Expression) – Key vectors
  • values (dynet.Expression) – Value vectors
  • mask (dynet.Expression, optional) – Additive mask for positions that should not be attended to (e.g. padding)
Returns:
  pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type:
  tuple

__init__(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Creates a subcollection for this layer with a custom name
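
A usage sketch with an additive padding mask (again purely illustrative; -1e9 stands in for minus infinity on the padded positions):

    import numpy as np
    import dynet as dy
    from dynn.layers.attention_layers import MLPAttention

    pc = dy.ParameterCollection()
    attend = MLPAttention(pc, query_dim=10, key_dim=20, hidden_dim=32)

    dy.renew_cg()
    query = dy.inputTensor(np.random.randn(10))      # one query of size (10,)
    keys = dy.inputTensor(np.random.randn(20, 6))    # L=6 keys of size (20,)
    values = dy.inputTensor(np.random.randn(30, 6))  # L=6 values of size (30,)
    # Additive mask: 0 on real positions, a large negative value on padding
    mask = dy.inputTensor(np.array([0.0, 0.0, 0.0, 0.0, -1e9, -1e9]))

    pooled_value, scores = attend(query, keys, values, mask=mask)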

class dynn.layers.attention_layers.MultiHeadAttention(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multi headed attention layer.

This functions like dot product attention

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

except that the queries, keys and values are split into multiple heads.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_heads (int) – Number of heads
  • query_dim (int) – Dimension of queries
  • key_dim (int) – Dimension of keys
  • value_dim (int) – Dimension of values
  • hidden_dim (int) – Hidden dimension (must be a multiple of n_heads)
  • out_dim (int) – Output dimension
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • Wq (dynet.Parameters, optional) – Specify the queries projection matrix directly.
  • Wk (dynet.Parameters, optional) – Specify the keys projection matrix directly.
  • Wv (dynet.Parameters, optional) – Specify the values projection matrix directly.
  • Wo (dynet.Parameters, optional) – Specify the output projection matrix directly.
__call__(queries, keys, values, mask=None)

Compute attention weights and return the pooled value.

This expects the queries, keys and values to have dimensions dq x l, dk x L and dv x L respectively, where l is the number of queries and L the number of keys/values.

Returns both the pooled value and the attention weights (a list of weights, one per head). You can specify an additive mask when some values are not to be attended to (e.g. padding).

Parameters:
  • queries (dynet.Expression) – Query vectors
  • keys (dynet.Expression) – Key vectors
  • values (dynet.Expression) – Value vectors
  • mask (dynet.Expression, optional) – Additive mask for positions that should not be attended to (e.g. padding)
Returns:
  pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type:
  tuple

__init__(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Creates a subcollection for this layer with a custom name
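
A usage sketch with illustrative dimensions (note that hidden_dim is a multiple of n_heads, as required; as above, an explicit init step may be needed depending on the dynn version):

    import numpy as np
    import dynet as dy
    from dynn.layers.attention_layers import MultiHeadAttention

    pc = dy.ParameterCollection()
    attend = MultiHeadAttention(
        pc, n_heads=4, query_dim=16, key_dim=16, value_dim=16,
        hidden_dim=64, out_dim=16, dropout=0.1,
    )

    dy.renew_cg()
    queries = dy.inputTensor(np.random.randn(16, 3))  # l=3 queries of size 16
    keys = dy.inputTensor(np.random.randn(16, 7))     # L=7 keys of size 16
    values = dy.inputTensor(np.random.randn(16, 7))   # L=7 values of size 16

    pooled_value, weights = attend(queries, keys, values)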