2024/11/2-2024/11/4:
Most of the ideas come from the video "PyTorch RNN的原理及其手写复现" (the principle of PyTorch's RNN and a hand-written reimplementation):
https://www.bilibili.com/video/BV13i4y1R7jB/?spm_id_from=333.880.my_history.page.click&vd_source=db0d5acc929b82408b1040d67f2b1dde
I changed a few parts so that the implementation follows the formulas and intuition more closely.
Parameter setup and verification against the official API:
First, initialize the necessary parameters:
import torch
import torch.nn as nn

bs, T = 2, 3  # batch size, sequence length
input_size, hidden_size = 2, 3
input = torch.randn(bs, T, input_size)
# initialize the hidden state, used as the hidden-state input at the first time step
h_prev = torch.randn(bs, hidden_size)
The size of input is [batch size, sequence length, input_size]. As the figure below shows, this follows the requirement the official API places on input when the RNN is instantiated with batch_first=True; personally I also find this layout the most intuitive and conventional.
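As a side note (this snippet is my own addition, not from the original post): with batch_first=True the input is batch-major, while PyTorch's default layout is time-major, so the same data only differs by a transpose between the two forms.
rnn_bf = nn.RNN(input_size=2, hidden_size=3, batch_first=True)  # expects (batch, seq_len, input_size)
rnn_tf = nn.RNN(input_size=2, hidden_size=3)                    # default expects (seq_len, batch, input_size)

x = torch.randn(2, 3, 2)                # (bs=2, T=3, input_size=2), batch-major
out_bf, _ = rnn_bf(x)                   # out_bf: (2, 3, 3) -> (batch, seq_len, hidden_size)
out_tf, _ = rnn_tf(x.transpose(0, 1))   # same data in time-major layout; out_tf: (3, 2, 3)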
As for h0, it corresponds to h_prev in our code. Its shape is the same as that of any other hidden state h; it is only needed because computing the hidden state h_t at the current time step requires the hidden state h_{t-1} of the previous time step, as the figure below shows. When we compute h_1 for the first time step there is no earlier step, so we have to supply an h_0 for that computation.
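If no h0 is passed, PyTorch initializes it to zeros (this is documented behavior of the API; the quick check below, including the names rnn_h0 and x, is my own addition):
rnn_h0 = nn.RNN(input_size=2, hidden_size=3, batch_first=True)
x = torch.randn(2, 3, 2)
out_default, _ = rnn_h0(x)                       # h0 omitted -> defaults to zeros
out_zero, _ = rnn_h0(x, torch.zeros(1, 2, 3))    # explicit zero h0 of shape (num_layers, bs, hidden_size)
assert torch.allclose(out_default, out_zero)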
The shape of h0 (and of every hidden state h) depends on the number of RNN layers and on whether the bidirectional option is used.
Here is a quick check of how the shape of h relates to num_layers and bidirectional; the results match the official documentation, which gives h_n the shape (num_layers * num_directions, batch, hidden_size).
import torch
import torch.nn as nn

# unidirectional, single-layer RNN
single_rnn = nn.RNN(4, 3, 1, batch_first=True)
input = torch.randn(1, 2, 4)
output, h_n = single_rnn(input)
output
Out[8]:
tensor([[[-0.8700, -0.8963, -0.9267],
         [ 0.0953, -0.4410, -0.9181]]], grad_fn=<TransposeBackward1>)
h_n
Out[9]: tensor([[[ 0.0953, -0.4410, -0.9181]]], grad_fn=<StackBackward0>)
# bidirectional, single-layer RNN
bi_rnn = nn.RNN(4, 3, 1, batch_first=True, bidirectional=True)
bi_output, bi_h_n = bi_rnn(input)
bi_output.shape
Out[13]: torch.Size([1, 2, 6])
bi_h_n.shape
Out[14]: torch.Size([2, 1, 3])
output.shape
Out[15]: torch.Size([1, 2, 3])
h_n.shape
Out[16]: torch.Size([1, 1, 3])
# unidirectional, multi-layer (2-layer) RNN
sm_rnn = nn.RNN(4, 3, 2, batch_first=True)
sm_output, sm_h = sm_rnn(input)
sm_output.shape
Out[21]: torch.Size([1, 2, 3])
sm_h.shape
Out[22]: torch.Size([2, 1, 3])
# bidirectional, multi-layer (2-layer) RNN
bm_rnn = nn.RNN(4, 3, 2, batch_first=True, bidirectional=True)
bm_output, bm_h = bm_rnn(input)
bm_output
Out[26]:
tensor([[[ 0.6568, -0.0670, -0.7799,  0.2645, -0.9087,  0.9372],
         [ 0.7104,  0.3997, -0.6929, -0.3831, -0.7795,  0.7643]]], grad_fn=<TransposeBackward1>)
bm_output.shape
Out[27]: torch.Size([1, 2, 6])
bm_h.shape
Out[28]: torch.Size([4, 1, 3])
The code block above also verifies the shape of output, which again matches the official information shown below: with bidirectional=True, the RNN concatenates the forward and backward outputs along the last dimension.
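To summarize the pattern: with batch_first=True, output has shape (batch, seq_len, num_directions * hidden_size) and h_n has shape (num_layers * num_directions, batch, hidden_size). The loop below is my own quick sanity check of those formulas over all four configurations (variable names are mine):
for num_layers in (1, 2):
    for bidirectional in (False, True):
        d = 2 if bidirectional else 1
        rnn_test = nn.RNN(4, 3, num_layers, batch_first=True, bidirectional=bidirectional)
        out, h_n = rnn_test(torch.randn(1, 2, 4))
        assert out.shape == (1, 2, d * 3)           # (batch, seq_len, num_directions * hidden_size)
        assert h_n.shape == (num_layers * d, 1, 3)  # (num_layers * num_directions, batch, hidden_size)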
Code implementation:
Based on the verification above and the RNN update formula h_t = tanh(x_t @ W_ih^T + b_ih + h_{t-1} @ W_hh^T + b_hh), we can write the following function:
def rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev):
    # input: [bs, T, input_size]
    # weight_ih: [hidden_size, input_size]
    # weight_hh: [hidden_size, hidden_size]
    # bias_ih: [hidden_size]
    # bias_hh: [hidden_size]
    # h_prev: [bs, hidden_size]
    bs, T, input_size = input.shape
    h_dim = weight_ih.shape[0]
    h_out = torch.zeros(bs, T, h_dim)  # initialize the output tensor
    for t in range(T):
        x = input[:, t, :]  # [bs, input_size]
        w_times_x = torch.matmul(x, weight_ih.T)       # [bs, h_dim]
        w_times_h = torch.matmul(h_prev, weight_hh.T)  # [bs, h_dim]
        h_prev = torch.tanh(w_times_x + w_times_h + bias_ih + bias_hh)  # [bs, h_dim]
        h_out[:, t, :] = h_prev
    return h_out, h_prev.unsqueeze(0)
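For reference, PyTorch also exposes this per-step recurrence as nn.RNNCell, so the same loop could be written with it. The sketch below (names cell, outs, stacked are mine) is not part of the original post:
cell = nn.RNNCell(input_size, hidden_size)  # holds weight_ih, weight_hh, bias_ih, bias_hh internally
h = torch.zeros(bs, hidden_size)
outs = []
for t in range(T):
    h = cell(input[:, t, :], h)     # one step of h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)
    outs.append(h)
stacked = torch.stack(outs, dim=1)  # [bs, T, hidden_size], same layout as h_out above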
Running this function on the test case below produces the following output:
rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)
# unsqueeze(0) adds a leading dimension to make a 3-D tensor, used as the hidden-state input for the first time step
rnn_output, state_final = rnn(input, h_prev.unsqueeze(0))
print("pytorch RNN API:")
print(rnn_output)
print(state_final)

# verify the correctness of rnn_forward
for p, n in rnn.named_parameters():
    print(p, n.shape)

# reuse the parameters of rnn directly
custom_rnn_output, custom_state_final = rnn_forward(input, rnn.weight_ih_l0, rnn.weight_hh_l0,
                                                    rnn.bias_ih_l0, rnn.bias_hh_l0, h_prev)
print("custom rnn forward:")
print(custom_rnn_output)
print(custom_state_final)
print("custom rnn forward:")
print(custom_rnn_output)
print(custom_state_final)
pytorch RNN API:
tensor([[[ 0.3759, -0.7116, -0.8993],
[-0.5924, -0.1507, 0.9623],
[-0.2508, -0.2265, 0.3904]],
[[-0.6226, -0.6587, 0.5304],
[-0.4655, -0.1730, 0.8652],
[-0.1209, -0.5013, 0.0275]]], grad_fn=<TransposeBackward1>)
tensor([[[-0.2508, -0.2265, 0.3904],
[-0.1209, -0.5013, 0.0275]]], grad_fn=<StackBackward0>)
weight_ih_l0 torch.Size([3, 2])
weight_hh_l0 torch.Size([3, 3])
bias_ih_l0 torch.Size([3])
bias_hh_l0 torch.Size([3])
custom rnn forward:
tensor([[[ 0.3759, -0.7116, -0.8993],
[-0.5924, -0.1507, 0.9623],
[-0.2508, -0.2265, 0.3904]],
[[-0.6226, -0.6587, 0.5304],
[-0.4655, -0.1730, 0.8652],
[-0.1209, -0.5013, 0.0275]]], grad_fn=<CopySlices>)
tensor([[[-0.2508, -0.2265, 0.3904],
[-0.1209, -0.5013, 0.0275]]], grad_fn=<UnsqueezeBackward0>)
The official API and our hand-written function produce identical outputs.
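Rather than comparing the printouts by eye, the match can also be asserted numerically (this check is my own addition):
assert torch.allclose(rnn_output, custom_rnn_output, atol=1e-6)
assert torch.allclose(state_final, custom_state_final, atol=1e-6)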
Reusing the rnn_forward function we just wrote, the bidirectional RNN follows analogously:
# define a bidirectional RNN
def bidirectional_rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev,
                              weight_ih_reverse, weight_hh_reverse, bias_ih_reverse, bias_hh_reverse,
                              h_prev_reverse):
    bs, T, input_size = input.shape
    h_dim = weight_ih.shape[0]
    # forward direction
    h_out, h_prev = rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev)
    # backward direction: run rnn_forward on the time-reversed input
    h_out_reverse, h_prev_reverse = rnn_forward(torch.flip(input, [1]), weight_ih_reverse, weight_hh_reverse,
                                                bias_ih_reverse, bias_hh_reverse, h_prev_reverse)
    # flip the backward outputs back to original time order and concatenate along the feature dimension
    h_out_bidirectional = torch.cat([h_out, torch.flip(h_out_reverse, [1])], dim=-1)
    return h_out_bidirectional, torch.cat([h_prev, h_prev_reverse], dim=0)
With the earlier figure in mind, this function is straightforward; the key trick is reversing the input along the time dimension with torch.flip. Alternatively, instead of reusing the earlier function, one can iterate over the time steps in reverse order and compute the backward direction directly, as sketched below.
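A minimal sketch of that alternative (the function name rnn_backward_direction is my own): iterate t from T-1 down to 0 and write each hidden state back at its original time position, which avoids both flip calls.
def rnn_backward_direction(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev):
    bs, T, input_size = input.shape
    h_dim = weight_ih.shape[0]
    h_out = torch.zeros(bs, T, h_dim)
    for t in range(T - 1, -1, -1):  # walk the sequence from the last step to the first
        x = input[:, t, :]
        h_prev = torch.tanh(x @ weight_ih.T + h_prev @ weight_hh.T + bias_ih + bias_hh)
        h_out[:, t, :] = h_prev     # store the result at its original time position
    return h_out, h_prev.unsqueeze(0)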
Verification:
# verify the correctness of the bidirectional RNN
brnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True, bidirectional=True)
h_prev = torch.zeros(2, bs, hidden_size)
brnn_output, brnn_state_final = brnn(input, h_prev)
print("pytorch bidirectional RNN API:")
print(brnn_output)
print(brnn_state_final)

# reuse the parameters of brnn directly
custom_brnn_output, custom_brnn_state_final = bidirectional_rnn_forward(
    input, brnn.weight_ih_l0, brnn.weight_hh_l0, brnn.bias_ih_l0, brnn.bias_hh_l0, h_prev[0],
    brnn.weight_ih_l0_reverse, brnn.weight_hh_l0_reverse, brnn.bias_ih_l0_reverse, brnn.bias_hh_l0_reverse,
    h_prev[1])
print("custom bidirectional rnn forward:")
print(custom_brnn_output)
print(custom_brnn_state_final)
pytorch bidirectional RNN API:
tensor([[[ 0.5844, 0.8434, -0.6575, 0.7862, 0.2991, -0.1547],
[-0.6368, 0.8893, -0.1286, 0.1199, 0.0642, -0.1274],
[ 0.6029, 0.9831, -0.1443, 0.5292, -0.3179, 0.3351]],
[[-0.0248, 0.6121, -0.4242, 0.4264, 0.2619, -0.5996],
[ 0.4378, 0.9871, -0.1419, 0.6944, -0.6108, 0.8240],
[-0.6801, 0.1686, -0.5515, 0.0044, 0.9227, -0.6906]]],
grad_fn=<TransposeBackward1>)
tensor([[[ 0.6029, 0.9831, -0.1443],
[-0.6801, 0.1686, -0.5515]],
[[ 0.7862, 0.2991, -0.1547],
[ 0.4264, 0.2619, -0.5996]]], grad_fn=<StackBackward0>)
custom bidirectional rnn forward:
tensor([[[ 0.5844, 0.8434, -0.6575, 0.7862, 0.2991, -0.1547],
[-0.6368, 0.8893, -0.1286, 0.1199, 0.0642, -0.1274],
[ 0.6029, 0.9831, -0.1443, 0.5292, -0.3179, 0.3351]],
[[-0.0248, 0.6121, -0.4242, 0.4264, 0.2619, -0.5996],
[ 0.4378, 0.9871, -0.1419, 0.6944, -0.6108, 0.8240],
[-0.6801, 0.1686, -0.5515, 0.0044, 0.9227, -0.6906]]],
grad_fn=<CatBackward0>)
tensor([[[ 0.6029, 0.9831, -0.1443],
[-0.6801, 0.1686, -0.5515]],
[[ 0.7862, 0.2991, -0.1547],
[ 0.4264, 0.2619, -0.5996]]], grad_fn=<CatBackward0>)
The reimplementation is successful.
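As before, the bidirectional results can also be checked numerically instead of by eye (my addition):
assert torch.allclose(brnn_output, custom_brnn_output, atol=1e-6)
assert torch.allclose(brnn_state_final, custom_brnn_state_final, atol=1e-6)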