Attention layers¶
class dynn.layers.attention_layers.BilinearAttention(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)¶
Bases: dynn.layers.base_layers.ParametrizedLayer
Bilinear attention layer.
Here the scores are computed according to

\[\alpha_{ij} = q_i^\intercal A k_j\]

where \(q_i, k_j\) are the ith query and jth key respectively. If dot_product is set to True, this is replaced by

\[\alpha_{ij} = \frac{1}{\sqrt{d}} q_i^\intercal k_j\]

where \(d\) is the dimension of the keys and queries.
Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- query_dim (int) – Queries dimension
- key_dim (int) – Keys dimension
- dot_product (bool, optional) – Compute attention with the dot product only (no weight matrix). This requires that query_dim == key_dim.
- dropout (float, optional) – Attention dropout (defaults to 0)
- A (dynet.Parameters, optional) – Specify the weight matrix directly.
__call__(query, keys, values, mask=None)¶
Compute attention scores and return the pooled value.
This returns both the pooled value and the attention scores. You can specify an additive mask when some values are not to be attended to (e.g. padding).
Parameters:
- query (dynet.Expression) – Query vector of size (dq, l), B
- keys (dynet.Expression) – Key vectors of size (dk, L), B
- values (dynet.Expression) – Value vectors of size (dv, L), B
- mask (dynet.Expression, optional) – Additive mask expression for the source side (size (L,), B)

Returns: pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type: tuple
__init__(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)¶
Creates a subcollection for this layer with a custom name.
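A minimal usage sketch for this layer follows; the dimensions, batch size and the init(test=True) call are illustrative assumptions based on the usual dynn workflow for parametrized layers, not part of this docstring.

    import numpy as np
    import dynet as dy
    from dynn.layers.attention_layers import BilinearAttention

    # Hypothetical sizes: dq=50, dk=40, dv=60, L=10 keys, l=1 query, batch B=4
    pc = dy.ParameterCollection()
    attend = BilinearAttention(pc, query_dim=50, key_dim=40, dropout=0.1)

    dy.renew_cg()
    attend.init(test=True)  # assumed ParametrizedLayer initialization step

    # Toy batched expressions; the last numpy axis is the batch dimension
    query = dy.inputTensor(np.random.rand(50, 1, 4), batched=True)    # (dq, l), B
    keys = dy.inputTensor(np.random.rand(40, 10, 4), batched=True)    # (dk, L), B
    values = dy.inputTensor(np.random.rand(60, 10, 4), batched=True)  # (dv, L), B

    # Pooled value of size (dv,), B and attention scores of size (L,), B
    pooled, scores = attend(query, keys, values)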
class dynn.layers.attention_layers.MLPAttention(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)¶
Bases: dynn.layers.base_layers.ParametrizedLayer
Multilayer Perceptron based attention
Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- query_dim (int) – Queries dimension
- key_dim (int) – Keys dimension
- hidden_dim (int) – Hidden dimension of the MLP
- activation (function, optional) – MLP activation (defaults to tanh).
- dropout (float, optional) – Attention dropout (defaults to 0)
__call__(query, keys, values, mask=None)¶
Compute attention scores and return the pooled value.
This returns both the pooled value and the attention scores. You can specify an additive mask when some values are not to be attended to (e.g. padding).
Parameters:
- query (dynet.Expression) – Query vector of size (dq,), B
- keys (dynet.Expression) – Key vectors of size (dk, L), B
- values (dynet.Expression) – Value vectors of size (dv, L), B
- mask (dynet.Expression, optional) – Additive mask expression

Returns: pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type: tuple
__init__(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)¶
Creates a subcollection for this layer with a custom name.
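A minimal usage sketch, analogous to the one above; the dimensions and the init(test=True) call are again illustrative assumptions. Note that here the query is a single vector of size (dq,), B.

    import numpy as np
    import dynet as dy
    from dynn.layers.attention_layers import MLPAttention

    pc = dy.ParameterCollection()
    attend = MLPAttention(pc, query_dim=50, key_dim=40, hidden_dim=100, dropout=0.1)

    dy.renew_cg()
    attend.init(test=True)  # assumed ParametrizedLayer initialization step

    query = dy.inputTensor(np.random.rand(50, 4), batched=True)       # (dq,), B
    keys = dy.inputTensor(np.random.rand(40, 10, 4), batched=True)    # (dk, L), B
    values = dy.inputTensor(np.random.rand(60, 10, 4), batched=True)  # (dv, L), B

    # Pooled value of size (dv,), B and attention scores of size (L,), B
    pooled, scores = attend(query, keys, values)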
class dynn.layers.attention_layers.MultiHeadAttention(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)¶
Bases: dynn.layers.base_layers.ParametrizedLayer
Multi-headed attention layer.
This functions like dot product attention

\[\alpha_{ij} = \frac{1}{\sqrt{d}} q_i^\intercal k_j\]

except that the keys, queries and values are split into multiple heads.
Parameters:
- pc (dynet.ParameterCollection) – Parameter collection to hold the parameters
- n_heads (int) – Number of heads
- query_dim (int) – Dimension of queries
- key_dim (int) – Dimension of keys
- value_dim (int) – Dimension of values
- hidden_dim (int) – Hidden dimension (must be a multiple of n_heads)
- out_dim (int) – Output dimension
- dropout (float, optional) – Attention dropout (defaults to 0)
- Wq (dynet.Parameters, optional) – Specify the queries projection matrix directly.
- Wk (dynet.Parameters, optional) – Specify the keys projection matrix directly.
- Wv (dynet.Parameters, optional) – Specify the values projection matrix directly.
- Wo (dynet.Parameters, optional) – Specify the output projection matrix directly.
__call__(queries, keys, values, mask=None)¶
Compute attention weights and return the pooled value.
This expects the queries, keys and values to have dimensions dq x l, dk x L, dv x L respectively.
Returns both the pooled value and the attention weights (list of weights, one per head). You can specify an additive mask when some values are not to be attended to (e.g. padding).
Parameters:
- queries (dynet.Expression) – Query vector of size (dq, l), B
- keys (dynet.Expression) – Key vectors of size (dk, L), B
- values (dynet.Expression) – Value vectors of size (dv, L), B
- mask (dynet.Expression, optional) – Additive mask expression for the source side (size (L,), B)

Returns: pooled_value, scores, of size (dv,), B and (L,), B respectively
Return type: tuple
__init__(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)¶
Creates a subcollection for this layer with a custom name.
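A minimal usage sketch; as above, the dimensions and the init(test=True) call are illustrative assumptions. The only additional constraint is that hidden_dim must be a multiple of n_heads (here 128 = 8 * 16).

    import numpy as np
    import dynet as dy
    from dynn.layers.attention_layers import MultiHeadAttention

    pc = dy.ParameterCollection()
    attend = MultiHeadAttention(
        pc, n_heads=8, query_dim=50, key_dim=40, value_dim=60,
        hidden_dim=128, out_dim=50, dropout=0.1,
    )

    dy.renew_cg()
    attend.init(test=True)  # assumed ParametrizedLayer initialization step

    queries = dy.inputTensor(np.random.rand(50, 1, 4), batched=True)  # (dq, l), B
    keys = dy.inputTensor(np.random.rand(40, 10, 4), batched=True)    # (dk, L), B
    values = dy.inputTensor(np.random.rand(60, 10, 4), batched=True)  # (dv, L), B

    # Pooled value plus the per-head attention weights
    pooled, weights = attend(queries, keys, values)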