2024广州疫情_武汉德升建站_乐事薯片软文推广_自动交换友情链接

时间:2025/7/9 20:46:32来源：https://blog.csdn.net/weixin_48846514/article/details/147223166 浏览次数:0次

1. 点积注意力（Dot-Product Attention）

点积注意力是最简单的注意力机制之一，其基本思想是通过计算查询（query）和键（key）之间的点积来得到相似度，进而为每个值（value）分配一个权重。具体步骤如下：

计算相似度：将查询向量和键向量的点积作为它们的相似度。
归一化：对相似度进行softmax归一化，得到注意力权重。
加权求和：根据计算出来的注意力权重，对值（value）进行加权求和，得到上下文向量。

代码实现：

class DotProductAttention(nn.Module):def __init__(self, hidden_size):super(DotProductAttention, self).__init__()self.out_size = hidden_size * 2def forward(self, query, keys):# 计算查询和键的点积相似度scores = (query * keys).sum(-1)scores = scores.unsqueeze(1)# 对相似度进行归一化weights = F.softmax(scores, dim=-1)# 加权求和得到上下文向量context = torch.bmm(weights, keys)return context, weights

在这里：

query 是解码器的隐藏状态。
keys 是编码器的输出。
scores 是查询和键的点积。
weights 是对 scores 进行 softmax 归一化后的注意力权重。
context 是加权求和后的上下文向量，表示当前时刻的注意力上下文。

2. 加性注意力（Additive Attention）

加性注意力是另一种常见的注意力机制，它通过将查询和键分别通过一个线性变换后加和，再通过一个非线性激活函数（如tanh）来计算相似度。

具体步骤如下：

计算相似度：查询和键经过线性变换后加和，经过tanh激活函数，再通过一个线性变换输出最终的相似度。
归一化：对相似度进行softmax归一化，得到注意力权重。
加权求和：根据注意力权重加权求和得到上下文向量。

代码实现：

class AdditiveAttention(nn.Module):def __init__(self, hidden_size):super(AdditiveAttention, self).__init__()self.Wa = nn.Linear(hidden_size, hidden_size)self.Ua = nn.Linear(hidden_size, hidden_size)self.Va = nn.Linear(hidden_size, 1)self.out_size = hidden_size * 2def forward(self, query, keys):# 计算查询和键的加性相似度scores = self.Va(torch.tanh(self.Wa(query) + self.Ua(keys)))scores = scores.squeeze(2).unsqueeze(1)# 对相似度进行归一化weights = F.softmax(scores, dim=-1)# 加权求和得到上下文向量context = torch.bmm(weights, keys)return context, weights

在这里：

query 是解码器的隐藏状态。
keys 是编码器的输出。
scores 是加性计算后的相似度。
weights 是对 scores 进行 softmax 归一化后的注意力权重。
context 是加权求和后的上下文向量。

3. Decoder（解码器）与 Attention

解码器（Decoder）将注意力机制集成到其计算过程中。解码器的输入包括编码器的输出（即 encoder_outputs）和编码器的最后一个隐藏状态（即 encoder_hidden）。在每一步，解码器都计算当前的上下文向量，并根据这个上下文向量生成新的输出。

代码实现：

class AttnDecoderRNN(nn.Module):def __init__(self, hidden_size, output_size, attention_type="none", dropout_p=0.1):super(AttnDecoderRNN, self).__init__()self.embedding = nn.Embedding(output_size, hidden_size)self.attention = get_attention_module(attention_type, hidden_size)self.gru = nn.GRU(self.attention.out_size, hidden_size, batch_first=True)self.out = nn.Linear(hidden_size, output_size)self.dropout = nn.Dropout(dropout_p)def forward(self, encoder_outputs, encoder_hidden, target_tensor=None):batch_size = encoder_outputs.size(0)decoder_input = torch.empty(batch_size, 1, dtype=torch.long, device=device).fill_(SOS_token)decoder_hidden = encoder_hiddendecoder_outputs = []attentions = []for i in range(MAX_LENGTH):decoder_output, decoder_hidden, attn_weights = self.forward_step(decoder_input, decoder_hidden, encoder_outputs)decoder_outputs.append(decoder_output)attentions.append(attn_weights)if target_tensor is not None:# Teacher forcingdecoder_input = target_tensor[:, i].unsqueeze(1)else:_, topi = decoder_output.topk(1)decoder_input = topi.squeeze(-1).detach()decoder_outputs = torch.cat(decoder_outputs, dim=1)decoder_outputs = F.log_softmax(decoder_outputs, dim=-1)attentions = torch.cat(attentions, dim=1)return decoder_outputs, decoder_hidden, attentions

在这段代码中，AttnDecoderRNN 包含了注意力机制，并且在每个时间步长上，根据当前的输入和上下文（从编码器传来的输出和隐藏状态）生成下一个预测值。

关键字：2024广州疫情_武汉德升建站_乐事薯片软文推广_自动交换友情链接

本网仅为发布的内容提供存储空间，不对发表、转载的内容提供任何形式的保证。凡本网注明“来源：XXX网络”的作品，均转载自其它媒体，著作权归作者所有，商业转载请联系作者获得授权，非商业转载请注明出处。

我们尊重并感谢每一位作者，均已注明文章来源和作者。如因作品内容、版权或其它问题，请及时与我们联系，联系邮箱：809451989@qq.com，投稿邮箱：809451989@qq.com

责任编辑：