Attention layers

class dynn.layers.attention_layers.BilinearAttention(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Bilinear attention layer.

Here the scores are computed according to

\[\alpha_{ij}=q_i^\intercal A k_j\]

Where \(q_i,k_j\) are the \(i\)-th query and \(j\)-th key respectively. If dot_product is set to True this is replaced by:

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

Where \(d\) is the dimension of the keys and queries.
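
The following sketch illustrates the two scoring schemes above in plain numpy (not part of dynn); the names and dimensions are made up for the example:

    import numpy as np

    d = 4                             # key/query dimension
    queries = np.random.randn(3, d)   # rows are the queries q_i
    keys = np.random.randn(5, d)      # rows are the keys k_j
    A = np.random.randn(d, d)         # bilinear weight matrix

    bilinear_scores = queries @ A @ keys.T               # alpha[i, j] = q_i^T A k_j
    dot_product_scores = queries @ keys.T / np.sqrt(d)   # scaled dot product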

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • dot_product (bool, optional) – Compute attention with the dot product only (no weight matrix). This requires that query_dim == key_dim.
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • A (dynet.Parameters, optional) – Specify the weight matrix directly.
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value.

This returns both the pooled value and the attention scores. You can specify an additive mask when some values are not to be attended to (e.g. padding).

Parameters:
  • query (dynet.Expression) – Query vector
  • keys (dynet.Expression) – Key vectors
  • values (dynet.Expression) – Value vectors
  • mask (dynet.Expression, optional) – Additive mask for positions that should not be attended to (e.g. padding)
Returns:
  pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type:
  tuple

__init__(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)

Creates a subcollection for this layer with a custom name
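
A minimal usage sketch (the dimensions and random inputs below are purely illustrative; depending on the dynn version, the layer may also need an explicit init step after renewing the computation graph):

    import numpy as np
    import dynet as dy
    from dynn.layers.attention_layers import BilinearAttention

    pc = dy.ParameterCollection()
    # Queries of dimension 10, keys of dimension 20, attention dropout 0.1
    attend = BilinearAttention(pc, query_dim=10, key_dim=20, dropout=0.1)

    dy.renew_cg()
    query = dy.inputTensor(np.random.randn(10))      # one query of size (10,)
    keys = dy.inputTensor(np.random.randn(20, 5))    # L=5 keys of size (20,)
    values = dy.inputTensor(np.random.randn(30, 5))  # L=5 values of size (30,)

    # pooled_value has size (30,), scores has size (5,)
    pooled_value, scores = attend(query, keys, values)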

class dynn.layers.attention_layers.MLPAttention(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multilayer Perceptron based attention

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • query_dim (int) – Queries dimension
  • key_dim (int) – Keys dimension
  • hidden_dim (int) – Hidden dimension of the MLP
  • activation (function, optional) – MLP activation (defaults to tanh).
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • Wq, Wk, b, V (dynet.Parameters, optional) – Specify the MLP parameters directly.
__call__(query, keys, values, mask=None)

Compute attention scores and return the pooled value.

This returns both the pooled value and the attention scores. You can specify an additive mask when some values are not to be attended to (e.g. padding).

Parameters:
  • query (dynet.Expression) – Query vector
  • keys (dynet.Expression) – Key vectors
  • values (dynet.Expression) – Value vectors
  • mask (dynet.Expression, optional) – Additive mask for positions that should not be attended to (e.g. padding)
Returns:
  pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type:
  tuple

__init__(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)

Creates a subcollection for this layer with a custom name
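
A usage sketch with an additive padding mask (again purely illustrative; -1e9 stands in for minus infinity on the padded positions):

    import numpy as np
    import dynet as dy
    from dynn.layers.attention_layers import MLPAttention

    pc = dy.ParameterCollection()
    attend = MLPAttention(pc, query_dim=10, key_dim=20, hidden_dim=32)

    dy.renew_cg()
    query = dy.inputTensor(np.random.randn(10))      # one query of size (10,)
    keys = dy.inputTensor(np.random.randn(20, 6))    # L=6 keys of size (20,)
    values = dy.inputTensor(np.random.randn(30, 6))  # L=6 values of size (30,)
    # Additive mask: 0 on real positions, a large negative value on padding
    mask = dy.inputTensor(np.array([0.0, 0.0, 0.0, 0.0, -1e9, -1e9]))

    pooled_value, scores = attend(query, keys, values, mask=mask)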

class dynn.layers.attention_layers.MultiHeadAttention(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Bases: dynn.layers.base_layers.ParametrizedLayer

Multi headed attention layer.

This functions like dot product attention

\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]

except that the queries, keys and values are split into multiple heads.

Parameters:
  • pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
  • n_heads (int) – Number of heads
  • query_dim (int) – Dimension of queries
  • key_dim (int) – Dimension of keys
  • value_dim (int) – Dimension of values
  • hidden_dim (int) – Hidden dimension (must be a multiple of n_heads)
  • out_dim (int) – Output dimension
  • dropout (float, optional) – Attention dropout (defaults to 0)
  • Wq (dynet.Parameters, optional) – Specify the queries projection matrix directly.
  • Wk (dynet.Parameters, optional) – Specify the keys projection matrix directly.
  • Wv (dynet.Parameters, optional) – Specify the values projection matrix directly.
  • Wo (dynet.Parameters, optional) – Specify the output projection matrix directly.
__call__(queries, keys, values, mask=None)

Compute attention weights and return the pooled value.

This expects the queries, keys and values to have dimensions dq x l, dk x L and dv x L respectively, where l is the number of queries and L the number of keys/values.

Returns both the pooled value and the attention weights (a list of weights, one per head). You can specify an additive mask when some values are not to be attended to (e.g. padding).

Parameters:
  • queries (dynet.Expression) – Query vectors
  • keys (dynet.Expression) – Key vectors
  • values (dynet.Expression) – Value vectors
  • mask (dynet.Expression, optional) – Additive mask for positions that should not be attended to (e.g. padding)
Returns:
  pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type:
  tuple

__init__(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)

Creates a subcollection for this layer with a custom name
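
A usage sketch with illustrative dimensions (note that hidden_dim is a multiple of n_heads, as required; as above, an explicit init step may be needed depending on the dynn version):

    import numpy as np
    import dynet as dy
    from dynn.layers.attention_layers import MultiHeadAttention

    pc = dy.ParameterCollection()
    attend = MultiHeadAttention(
        pc, n_heads=4, query_dim=16, key_dim=16, value_dim=16,
        hidden_dim=64, out_dim=16, dropout=0.1,
    )

    dy.renew_cg()
    queries = dy.inputTensor(np.random.randn(16, 3))  # l=3 queries of size 16
    keys = dy.inputTensor(np.random.randn(16, 7))     # L=7 keys of size 16
    values = dy.inputTensor(np.random.randn(16, 7))   # L=7 values of size 16

    pooled_value, weights = attend(queries, keys, values)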