A Roundup of Common PyTorch Problems

June 2, 2020

I recently started switching from Keras to PyTorch. These are my notes on some common problems that come up along the way.

1 Loss is NaN

To turn on NaN checking, call the following function near the top of your Python file:

torch.autograd.set_detect_anomaly(True)

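If a NaN tensor shows up during the backward pass, it raises an exception. With luck it also tells you where the NaN was produced. For example, I once got: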

RuntimeError: Function 'SmoothL1LossBackward' returned nan values in its 0th output.

Sometimes you get NaN with SGD but not with Adam. This usually means the loss or the learning rate is too large, so try lowering the learning rate.

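Other possible sources of NaN, and things to try when locating them:
1. Dirty data: the input itself contains NaN.
2. Enable gradient clipping.
3. Switch to a different parameter initialization method.
4. The input to log is 0. In this case consider adding a small constant inside the log so that it cannot produce NaN, for example: torch.log(inputs + 1e-6)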
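
The items in the list above can be sketched roughly as follows. This is a minimal illustration rather than a drop-in fix: the helper names `check_inputs` and `safe_log`, the toy `nn.Linear` model, and the `max_norm=1.0` threshold are assumptions made for the example and do not come from the original notes.

```python
import torch
import torch.nn as nn


def check_inputs(batch: torch.Tensor) -> bool:
    """Return True if the batch contains no NaN or Inf (item 1: dirty data)."""
    return bool(torch.isfinite(batch).all())


def safe_log(inputs: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """log with a small offset so an input of exactly 0 does not give -inf (item 4)."""
    return torch.log(inputs + eps)


# Hypothetical tiny model, only to show where gradient clipping goes (item 2).
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.rand(8, 4)
assert check_inputs(x)

loss = safe_log(model(x).abs()).mean()
optimizer.zero_grad()
loss.backward()
# Rescale the gradients so that their total norm is at most max_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

Clipping goes between `loss.backward()` and `optimizer.step()`, because it has to act on the gradients that the step is about to use.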

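Addendum: if the setting above does not pinpoint where the NaN comes from, you can add the following check.
Before optimizer.step(), check whether any gradient contains NaN or Inf:

for name, param in model.named_parameters():
    if param.grad is not None:  # parameters without a gradient are skipped
        print(name, torch.isfinite(param.grad).all())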
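
Printing every parameter gets noisy on a large model. A possible variation on the gradient check is to collect only the offending names. The function name `find_nonfinite_grads` and the toy model are hypothetical, and the NaN in the input is injected on purpose to trigger the check.

```python
import torch
import torch.nn as nn


def find_nonfinite_grads(model: nn.Module) -> list:
    """Return the names of parameters whose gradient contains NaN or Inf."""
    bad = []
    for name, param in model.named_parameters():
        if param.grad is None:
            continue  # parameter did not take part in the backward pass
        if not torch.isfinite(param.grad).all():
            bad.append(name)
    return bad


# Hypothetical usage with a toy model and a deliberately corrupted input.
model = nn.Linear(3, 2)
x = torch.tensor([[1.0, float("nan"), 2.0]])
model(x).sum().backward()
bad_names = find_nonfinite_grads(model)
print(bad_names)
```

Here only the weight gradient depends on the input, so the bias is not reported.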

2 Timing a model correctly

To measure the forward-pass time of a model, first put the model in evaluation mode:

model.eval()

When timing on a GPU you also need torch.cuda.synchronize() to wait for the CUDA operations to finish, because CUDA kernels are launched asynchronously:

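import time

torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
start = time.time()
result = model(input)
torch.cuda.synchronize()  # wait until the forward pass has actually finished
end = time.time()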
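
A slightly fuller sketch of the same measurement is given here. The warm-up iterations, the averaging over several runs, `torch.no_grad()` and `time.perf_counter()` are additions that are not in the original snippet, and the function name `time_forward` and the toy model are hypothetical. The code synchronizes only when the input lives on a GPU, so it also runs on a CPU-only machine.

```python
import time

import torch
import torch.nn as nn


def time_forward(model: nn.Module, inputs: torch.Tensor, warmup: int = 3, runs: int = 10) -> float:
    """Return the average forward time in seconds over `runs` iterations."""
    use_cuda = inputs.is_cuda
    model.eval()
    with torch.no_grad():  # no autograd bookkeeping while timing inference
        for _ in range(warmup):  # warm-up runs are not timed
            model(inputs)
        if use_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(inputs)
        if use_cuda:
            torch.cuda.synchronize()
        end = time.perf_counter()
    return (end - start) / runs


# Hypothetical toy model; falls back to the CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).to(device)
avg = time_forward(net, torch.rand(1, 3, 32, 32, device=device))
print("average forward time: {:.6f} s".format(avg))
```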

3 Parameter initialization

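In some tasks you train a blank network from scratch instead of starting from pretrained weights. Initializing the parameters explicitly (for example for Conv2d) can then help the model converge faster. One such initialization scheme is the following (it can usually go at the end of the model's __init__ function):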

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)
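
To make the placement concrete, here is a minimal sketch with the same loop at the end of `__init__`. The class `TinyNet` and its layer sizes are hypothetical and chosen only to keep the example small.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical small network used only to show where the init loop goes."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(8, num_classes)

        # Weight initialization, run once after all layers have been created.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = x.mean([2, 3])  # global average pooling
        return self.fc(x)


net = TinyNet()
out = net(torch.rand(2, 3, 16, 16))
print(out.shape)
```

The loop has to come after the layers are assigned, since `self.modules()` only walks submodules that already exist.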

4 Getting the output of a specific layer in a torchvision model

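In practice you often start from torchvision's pretrained weights and then extract or modify some of the layers. There are two ways to do this.
The first is to copy the full model source and change it to return the intermediate layers you need.
For example, the shufflenetv2 code can be modified like this to return the layers you need (_forward_impl is the original, _forward_impl_with_layers is the modified version):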

    def _forward_impl(self, x):
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.maxpool(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.conv5(x)
        x = x.mean([2, 3])  # globalpool
        x = self.fc(x)
        return x

    def _forward_impl_with_layers(self, x):
        # See note [TorchScript super()]
        layer1 = self.conv1(x)
        layer2 = self.maxpool(layer1)
        layer3 = self.stage2(layer2)
        layer4 = self.stage3(layer3)
        layer5 = self.stage4(layer4)
        x = self.conv5(layer5)
        x = x.mean([2, 3])  # globalpool
        x = self.fc(x)
        return layer1, layer2, layer3, layer4, layer5, x

    def forward(self, x):
        return self._forward_impl_with_layers(x)

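The other approach uses the layers of the torchvision model directly, without copying the source. You may need to study each model's implementation to find the layer you want, for example by printing the model: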

from torchvision import models
model = models.shufflenet_v2_x0_5(pretrained=True)
print('model = {}'.format(model))

The printed result looks like this:

model = ShuffleNetV2(
  (conv1): Sequential(
    (0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (stage2): Sequential(
    (0): InvertedResidual(
      (branch1): Sequential(
        (0): Conv2d(24, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=24, bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (4): ReLU(inplace=True)
      )
      (branch2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=24, bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (1): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=24, bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (2): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=24, bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (3): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=24, bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
  )
  (stage3): Sequential(
    (0): InvertedResidual(
      (branch1): Sequential(
        (0): Conv2d(48, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=48, bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (4): ReLU(inplace=True)
      )
      (branch2): Sequential(
        (0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=48, bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (1): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (2): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (3): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (4): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (5): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (6): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (7): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
  )
  (stage4): Sequential(
    (0): InvertedResidual(
      (branch1): Sequential(
        (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (4): ReLU(inplace=True)
      )
      (branch2): Sequential(
        (0): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (1): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (2): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
    (3): InvertedResidual(
      (branch1): Sequential()
      (branch2): Sequential(
        (0): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (6): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): ReLU(inplace=True)
      )
    )
  )
  (conv5): Sequential(
    (0): Conv2d(192, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (fc): Linear(in_features=1024, out_features=1000, bias=True)
)

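For example, the conv1 layer is available as the attribute below, and calling it on an input tensor gives that layer's output: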

model.conv1
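
Another way to capture an intermediate output without touching the model source is a forward hook, registered with `register_forward_hook` on the submodule. The sketch below uses a hypothetical stand-in class `Toy` so that it runs without downloading weights; only the attribute name `conv1` mirrors the ShuffleNetV2 printout, and the same hook call should work on the torchvision model's `model.conv1`.

```python
import torch
import torch.nn as nn


# Hypothetical stand-in for a torchvision model.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, 24, 3, stride=2, padding=1), nn.ReLU())
        self.fc = nn.Linear(24, 5)

    def forward(self, x):
        x = self.conv1(x)
        return self.fc(x.mean([2, 3]))


model = Toy().eval()
features = {}


def save_output(module, inputs, output):
    # Called by PyTorch right after the module's forward pass finishes.
    features["conv1"] = output.detach()


handle = model.conv1.register_forward_hook(save_output)
with torch.no_grad():
    logits = model(torch.rand(1, 3, 32, 32))
handle.remove()  # detach the hook once it is no longer needed

print(features["conv1"].shape)
```

Compared with copying the source, the hook leaves `forward` and the pretrained weights untouched, at the cost of keeping the captured tensors in a side dictionary.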

5 Fixing the "The NVIDIA driver on your system is too old" error

Sometimes, after installing a particular version of PyTorch (for example 1.5.0), you get the following error message:

The NVIDIA driver on your system is too old (found version 10000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

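A PyTorch build is usually tied to a specific CUDA version. This error probably means that you do not have that CUDA version installed, or that your CUDA version does not match your GPU driver version.
NVIDIA publishes the version compatibility table here: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html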
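
The number in "found version 10000" appears to use CUDA's integer version encoding, 1000 × major + 10 × minor, so 10000 would correspond to a driver that supports up to CUDA 10.0. The helper below is a small sketch under that assumption; the function name `decode_cuda_version` is hypothetical and not part of PyTorch.

```python
def decode_cuda_version(code: int) -> str:
    """Convert an integer such as 10000 into a 'major.minor' string."""
    major = code // 1000
    minor = (code % 1000) // 10
    return "{}.{}".format(major, minor)


print(decode_cuda_version(10000))  # 10.0
print(decode_cuda_version(10020))  # 10.2
```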

If you need to upgrade the driver, you can do it as follows (on Ubuntu):
1) Add the package source:

sudo add-apt-repository ppa:graphics-drivers/ppa && sudo apt update

2) List the available driver versions:

ubuntu-drivers devices

For example, this is the query result on my machine: [screenshot omitted]

3) Install the chosen version (use the compatibility table linked above to pick a suitable one):

sudo apt install nvidia-VERSION_NUMBER_HERE

If you run into package conflicts, try uninstalling the existing driver first and then installing again:

sudo apt --purge autoremove nvidia*

PS: Alternatively, instead of upgrading the driver, you can first check the locally installed CUDA version with:

nvcc --version

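For example, mine reported CUDA 10.0: [screenshot omitted]

So I should install a build that supports CUDA 10.0. PyTorch 1.5 may not be usable in that case, but PyTorch 1.4 still is, and can be installed with: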

pip install torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html

To see exactly which builds are available, refer to this page:
https://download.pytorch.org/whl/torch_stable.html

PS: Other useful commands:
Show the GPU model:

lspci | grep -i nvidia

Show the driver version:

cat /proc/driver/nvidia/version

To see which CUDA version PyTorch was built with, run the following script in your PyTorch environment:

import torch
print('torch.__version__ = {}'.format(torch.__version__))
print('torch.version.cuda = {}'.format(torch.version.cuda))
print('torch.cuda.is_available() = {}'.format(torch.cuda.is_available()))

6 Fixing the "Expected more than 1 value per channel when training" error

If you hit the following error during training:

  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 107, in forward
    exponential_average_factor, self.eps)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 1666, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 32, 1])


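One possible cause is that a batch with batch_size = 1 reached the network, typically the last, incomplete batch of an epoch. In that case, consider passing drop_last=True to the DataLoader, which discards the final batch when it is smaller than the batch size. For example: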

train_loader = torch.utils.data.DataLoader(dataset=train_set, shuffle=False, batch_size=opt.batch_size,
                                               drop_last=True)

If this cannot be avoided, or you really need to train with batch_size = 1, you can also consider replacing the BatchNorm layers in the network with InstanceNorm.
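
To make the cause concrete, here is a minimal sketch that reproduces the situation on the CPU. The dataset of 9 random samples and the helper batch_sizes are made up for illustration; only DataLoader, drop_last and BatchNorm1d come from the text above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# 9 samples with batch_size=4 leave a final batch holding a single sample.
dataset = TensorDataset(torch.randn(9, 32))


def batch_sizes(drop_last):
    loader = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=drop_last)
    return [batch[0].shape[0] for batch in loader]


sizes_keep = batch_sizes(drop_last=False)  # the last batch has 1 sample
sizes_drop = batch_sizes(drop_last=True)   # the incomplete batch is discarded
print(sizes_keep, sizes_drop)

bn = nn.BatchNorm1d(32)
single = torch.randn(1, 32)

# In training mode BatchNorm needs more than one value per channel.
bn.train()
try:
    bn(single)
    failed_in_train = False
except ValueError as err:
    failed_in_train = True
    print(err)

# In eval mode BatchNorm uses its running statistics, so one sample is fine.
bn.eval()
out = bn(single)
print(out.shape)
```

The single-sample batch only fails in training mode, which is why the error tends to show up at the end of an epoch rather than at inference time.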

7 Fixing the "Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead" error

If you see the following error when reading the value of a variable:

RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

There are usually two cases:
In the first, the variable is part of training and needs backpropagation; read its value with var.detach().numpy().
In the second, the variable is not trained and needs no backpropagation; wrap the relevant code in with torch.no_grad() as follows:

    with torch.no_grad():
        your code here
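
Both cases can be tried out in a few lines. This is only a sketch with made-up tensors x, y and z, not code from a real training loop:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2  # y is part of the autograd graph

try:
    y.numpy()
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# Case 1: the tensor is still needed for backpropagation,
# so read its value through a detached view.
values = y.detach().numpy()

# Case 2: no gradients are needed at all, so do not build the graph.
with torch.no_grad():
    z = x * 2
values_no_grad = z.numpy()

print(values, values_no_grad, z.requires_grad)
```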

8 Fixing the "RuntimeError: error executing torch_shm_manager" error

If an error like the following appears during multi-process training:

RuntimeError: error executing torch_shm_manager at "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/bin/torch_shm_manager" at /pytorch/torch/lib/libshm/core.cpp:99
torch_shm_manager: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory
torch_shm_manager: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory

A possible fix is to comment out the following setting (if you have it):

# torch.multiprocessing.set_sharing_strategy('file_system')

9 Fixing the "RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad" error

If an error like the following appears during multi-process training:

Cowardly refusing to serialize non-leaf tensor which requires_grad, 
since autograd does not support crossing process boundaries.  
If you just want to transfer the data, call detach() on the tensor 
before serializing (e.g., putting it on the queue).

The function that raises this error looks like this:

def reduce_tensor(tensor):
    storage = tensor.storage()

    if tensor.requires_grad and not tensor.is_leaf:
        raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
                           "since autograd does not support crossing process boundaries.  "
                           "If you just want to transfer the data, call detach() on the tensor "
                           "before serializing (e.g., putting it on the queue).")

    check_serializing_named_tensor(tensor)
    torch.utils.hooks.warn_if_has_hooks(tensor)

In my case, the cause turned out to be that a model was used to generate data inside a multi-worker DataLoader, while some of that model's parameters still had requires_grad = True.
You can process the model as follows so that the tensors it produces do not require gradients:

        # This model only generates data and needs no backward pass, so freeze it and use eval()
        # Fixes RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad
        for param in self.superpoint.parameters():
            param.requires_grad = False
        self.superpoint.eval()
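
The effect of freezing the parameters can be checked directly. In this sketch a small nn.Linear stands in for the data-generating model (self.superpoint above); the stand-in and the variable names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the data-generating model.
generator = nn.Linear(4, 2)

# Before freezing: the output is a non-leaf tensor that requires grad,
# which is exactly what reduce_tensor refuses to serialize.
out_before = generator(torch.randn(1, 4))

for param in generator.parameters():
    param.requires_grad = False
generator.eval()

# After freezing: the output no longer requires grad.
out_after = generator(torch.randn(1, 4))

print(out_before.requires_grad, out_before.is_leaf)
print(out_after.requires_grad)
```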

10 Fixing identical numpy random values across DataLoader workers

When num_workers > 0, each DataLoader worker is a separate process. With the fork start method, every worker inherits a copy of the parent's numpy random state, so np.random produces the same sequence in every worker unless each worker reseeds it. This is a problem whenever each worker is supposed to generate different random data. Two ways to fix it:

Option 1
Use worker_init_fn to reseed numpy in every worker, for example:

ds = DataLoader(ds, 10, shuffle=False, num_workers=4, worker_init_fn=lambda _: np.random.seed())

Option 2
Add the following two lines at the top of the file:

import torch.multiprocessing as mp
mp.set_start_method('spawn')
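
If the runs should also stay reproducible, a variant of Option 1 is to derive the numpy seed from PyTorch's own per-worker seed instead of calling np.random.seed() with no argument. The following is a sketch of that idea; the function name seed_worker is an arbitrary choice, and the commented-out DataLoader line assumes a dataset you already have.

```python
import numpy as np
import torch


def seed_worker(worker_id):
    # Inside a DataLoader worker, torch.initial_seed() already differs
    # from worker to worker, so it is a convenient source for a numpy seed.
    # numpy only accepts 32-bit seeds, hence the modulo.
    np.random.seed(torch.initial_seed() % 2**32)


# Intended use:
# loader = DataLoader(dataset, batch_size=10, num_workers=4, worker_init_fn=seed_worker)

# Quick check in the main process: the same torch seed gives the same numpy stream.
torch.manual_seed(0)
seed_worker(0)
first = np.random.rand(3)
seed_worker(0)
second = np.random.rand(3)
print(np.allclose(first, second))
```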

11 "KeyboardInterrupt" when debugging a PyTorch project in PyCharm

If this happens only in Debug mode and not in Run mode, it is caused by PyCharm's option to debug subprocesses. Go to File->Settings->Build, Execution, Deployment->Python Debugger and turn off "Attach to subprocess automatically while debugging". [Screenshot of this settings page in the original post]

12 Fixing the "RuntimeError: CUDA error: no kernel image is available for execution on the device" error

If this error appears when running PyTorch, one possible cause is that your GPU is no longer supported by recent PyTorch releases.
For example, from PyTorch 1.3.1 onward the supported GPUs are those with Compute Capability >= 3.7. The Compute Capability of each device is listed here:
https://developer.nvidia.com/cuda-gpus

GPU	Compute Capability
NVIDIA TITAN RTX	7.5
GeForce RTX 2080 Ti	7.5
GeForce RTX 2080	7.5
GeForce RTX 2070	7.5
GeForce RTX 2060	7.5
NVIDIA TITAN V	7.0
NVIDIA TITAN Xp	6.1
NVIDIA TITAN X	6.1
GeForce GTX 1080 Ti	6.1
GeForce GTX 1080	6.1
GeForce GTX 1070	6.1
GeForce GTX 1060	6.1
GeForce GTX 1050	6.1
GeForce GTX TITAN X	5.2
GeForce GTX TITAN Z	3.5
GeForce GTX TITAN Black	3.5
GeForce GTX TITAN	3.5
GeForce GTX 980 Ti	5.2
GeForce GTX 980	5.2
GeForce GTX 970	5.2
GeForce GTX 960	5.2
GeForce GTX 950	5.2
GeForce GTX 780 Ti	3.5
GeForce GTX 780	3.5
GeForce GTX 770	3.0
GeForce GTX 760	3.0
GeForce GTX 750 Ti	5.0
GeForce GTX 750	5.0
GeForce GTX 690	3.0
GeForce GTX 680	3.0
GeForce GTX 670	3.0
GeForce GTX 660 Ti	3.0
GeForce GTX 660	3.0
GeForce GTX 650 Ti BOOST	3.0
GeForce GTX 650 Ti	3.0
GeForce GTX 650	3.0
GeForce GTX 560 Ti	2.1
GeForce GTX 550 Ti	2.1
GeForce GTX 460	2.1
GeForce GTS 450	2.1
GeForce GTS 450*	2.1
GeForce GTX 590	2.0
GeForce GTX 580	2.0
GeForce GTX 570	2.0
GeForce GTX 480	2.0
GeForce GTX 470	2.0
GeForce GTX 465	2.0
GeForce GT 740	3.0
GeForce GT 730	3.5
GeForce GT 730 DDR3,128bit	2.1
GeForce GT 720	3.5
GeForce GT 705*	3.5
GeForce GT 640 (GDDR5)	3.5
GeForce GT 640 (GDDR3)	2.1
GeForce GT 630	2.1
GeForce GT 620	2.1
GeForce GT 610	2.1
GeForce GT 520	2.1
GeForce GT 440	2.1
GeForce GT 440*	2.1
GeForce GT 430	2.1
GeForce GT 430*	2.1

There are two ways to fix this:
1) The simplest fix is to downgrade to an earlier release, for example PyTorch 1.2:

conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch

See:
https://pytorch.org/get-started/previous-versions/

2) If you must use a newer release, you need to build and install it from source:
https://github.com/pytorch/pytorch#from-source
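
To decide quickly which of the two routes applies, the capability of a card can be compared against the threshold. This is a sketch only: is_supported and MIN_CAPABILITY are names invented here, and the (3, 7) threshold is simply the figure quoted above.

```python
MIN_CAPABILITY = (3, 7)  # threshold quoted in the text for PyTorch >= 1.3.1


def is_supported(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) compute capability meets the minimum."""
    return tuple(capability) >= tuple(minimum)


# On a machine with a working GPU the capability can be read with
# torch.cuda.get_device_capability(0), which returns a (major, minor) tuple.
print(is_supported((3, 5)))  # e.g. GeForce GTX 780 in the table above
print(is_supported((6, 1)))  # e.g. GeForce GTX 1080
```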

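13 Note: with DataParallel, only Tensors are split across GPUs

This is a common and easily overlooked issue. If part of your data is stored as Tensors and part as lists or other types, DataParallel can only split the Tensor data automatically. The official documentation explains this clearly: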

Arbitrary positional and keyword inputs are allowed to be passed into DataParallel but some types are specially handled. tensors will be scattered on dim specified (default 0). tuple, list and dict types will be shallow copied. The other types will be shared among different threads and can be corrupted if written to in the model’s forward pass.
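See https://pytorch.org/docs/master/generated/torch.nn.DataParallel.html

So any data that has to be split across GPUs should be a Tensor. But what if, for some reason (variable length, for example), you can only use a list or a similar type? One thing to try is to add an index Tensor when the data is generated, for example in the Dataset: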

indexes = torch.arange(0, batch_size, requires_grad=False)

That way indexes is sliced together with the rest of the batch when it is split across GPUs. For example, if your data contains a list variable all_matches, you can pick out the all_matches entries that belong to the current GPU's slice like this:

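batch_size = indexes.shape[0]  # size of this GPU's slice, not of the full batch
indexes_numpy = indexes.cpu().detach().numpy()

for idx in range(batch_size):
    # Map the position inside this slice back to the position in the full batch
    batch_idx = indexes_numpy[idx]
    for i in range(len(all_matches[batch_idx])):
        one_match_batch = all_matches[batch_idx]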
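
The idea can be tried without any GPU. The sketch below imitates the split with torch.chunk for two hypothetical replicas; the contents of all_matches and the replica count are made up, and real DataParallel does the scattering internally.

```python
import torch

# Variable-length per-sample data that cannot be stored in a single Tensor.
all_matches = [[0], [1, 2], [3, 4, 5], [6]]
batch_size = len(all_matches)
indexes = torch.arange(0, batch_size)

# DataParallel splits tensors along dim 0; torch.chunk imitates that split
# for two hypothetical GPUs, while the list itself is passed along whole.
num_replicas = 2
per_replica = []
for index_slice in indexes.chunk(num_replicas, dim=0):
    picked = [all_matches[i] for i in index_slice.tolist()]
    per_replica.append(picked)

print(per_replica)
```

Each replica ends up with exactly the list entries that belong to its slice of the batch.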

14 Fixing hangs in DistributedDataParallel with uneven inputs

When using DistributedDataParallel, training may hang if, for whatever reason, the processes receive different amounts of data. The following code is a very simple reproduction:

import torch
import torch.distributed as dist
import os
import torch.multiprocessing as mp
import torch.nn as nn

print(torch.__version__)

def worker(rank):
    dist.init_process_group("nccl", rank=rank, world_size=2)
    torch.cuda.set_device(rank)
    model = nn.Linear(1, 1, bias=False).to(rank)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank], output_device=rank)
    # Create uneven inputs, rank 1 will get one more input than rank 0. This will cause a hang.
    inputs = [torch.tensor([1]).float() for _ in range(10 + rank)]
    # inputs = [torch.tensor([1]).float() for _ in range(10)]

    for _ in range(5):
        for inp in inputs:
            loss = model(inp).sum()
            loss.backward()
    torch.cuda.synchronize(device=rank)

if __name__ == '__main__':
    os.environ["MASTER_ADDR"] = "localhost" ; os.environ["MASTER_PORT"] = "29501"
    mp.spawn(worker, nprocs=2, args=())

Running this code will normally hang and never finish.

The fix is simple. PyTorch 1.7.0 and later add a join() context manager to DistributedDataParallel, so a single line, with model.join():, around the training loop is enough:

import torch
import torch.distributed as dist
import os
import torch.multiprocessing as mp
import torch.nn as nn

print(torch.__version__)

def worker(rank):
    dist.init_process_group("nccl", rank=rank, world_size=2)
    torch.cuda.set_device(rank)
    model = nn.Linear(1, 1, bias=False).to(rank)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank], output_device=rank)
    # Create uneven inputs, rank 1 will get one more input than rank 0. This will cause a hang.
    inputs = [torch.tensor([1]).float() for _ in range(10 + rank)]
    # inputs = [torch.tensor([1]).float() for _ in range(10)]
    with model.join():
        for _ in range(5):
            for inp in inputs:
                loss = model(inp).sum()
                loss.backward()
        torch.cuda.synchronize(device=rank)

if __name__ == '__main__':
    os.environ["MASTER_ADDR"] = "localhost" ; os.environ["MASTER_PORT"] = "29501"
    mp.spawn(worker, nprocs=2, args=())

See [8] and [9].

15 Error: RuntimeError: unable to write to file

If the following error is reported at run time:

  File "/home/liuxiao/anaconda3/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/liuxiao/anaconda3/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/liuxiao/anaconda3/envs/pytorch_env/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 321, in reduce_storage
    fd, size = storage._share_fd_()
RuntimeError: unable to write to file </torch_6121_2934860989>

By default PyTorch stores the files it uses for sharing data between processes under names like /torch_xxx. If there is not enough space there, the error above is likely to occur. One workaround is to turn off shared memory by adding the following at the top of the training script:

import sys
import torch
from torch.utils.data import dataloader
from torch.multiprocessing import reductions
from multiprocessing.reduction import ForkingPickler

default_collate_func = dataloader.default_collate


def default_collate_override(batch):
    # Tell the DataLoader not to place collated batches in shared memory
    dataloader._use_shared_memory = False
    return default_collate_func(batch)

setattr(dataloader, 'default_collate', default_collate_override)

# Remove PyTorch's storage reducers so tensors are pickled normally
# instead of being passed between processes through shared memory
for t in torch._storage_classes:
    if sys.version_info[0] == 2:
        if t in ForkingPickler.dispatch:
            del ForkingPickler.dispatch[t]
    else:
        if t in ForkingPickler._extra_reducers:
            del ForkingPickler._extra_reducers[t]
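
Before switching shared memory off, it may be worth checking how much space is actually left. The sketch below rests on an assumption that is not stated in the original text: that on Linux the shared-memory files live under /dev/shm. The helper free_megabytes is likewise invented for this example.

```python
import os
import shutil
import tempfile


def free_megabytes(path):
    """Return the free space at `path` in MiB."""
    return shutil.disk_usage(path).free / (1024 * 1024)


# Assumption: on Linux, shared memory is backed by /dev/shm.
# Fall back to the temp directory so the sketch also runs elsewhere.
shm_path = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
free_mb = free_megabytes(shm_path)
print(f"{shm_path}: {free_mb:.0f} MiB free")
```

If the free space turns out to be small, for instance inside a container, enlarging the shared-memory area may be an alternative to the override above.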

See [10].

16 Error: RuntimeError: view size is not compatible with input tensor's size and stride

If the following error is reported at run time:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Following the hint, you can usually fix it by changing code like:

pts2d[i].clone().view(-1)

so that it calls .contiguous() first:

pts2d[i].clone().contiguous().view(-1)
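
The error and both fixes can be reproduced with a transposed tensor. This is a minimal sketch with a made-up 2x3 tensor, not the pts2d data from the snippet above:

```python
import torch

base = torch.arange(6).reshape(2, 3)
transposed = base.t()  # same storage, different strides: not contiguous

try:
    transposed.view(-1)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print(err)

flat_a = transposed.contiguous().view(-1)  # copy into contiguous memory, then view
flat_b = transposed.reshape(-1)            # reshape copies only when it has to
print(transposed.is_contiguous(), flat_a.tolist(), flat_b.tolist())
```

Both .contiguous().view(-1) and .reshape(-1) give the same result here; reshape is simply the shorter spelling suggested by the error message.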

17 Loading a CUDA model on the CPU

Your model may have been trained with CUDA and saved as a ckpt file, while you want to run prediction on the CPU. In that case you may see the following error:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

The error message already gives the fix. Just load the model like this:

torch.load("your_weights.ckpt",map_location='cpu')
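
A round trip can be sketched on a CPU-only machine as follows. The in-memory buffer replaces the .ckpt file, and the small nn.Linear model is a placeholder; both are assumptions made to keep the example self-contained.

```python
import io

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Save to an in-memory buffer instead of a .ckpt file.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps every stored tensor to the CPU while loading.
state_dict = torch.load(buffer, map_location=torch.device("cpu"))

restored = nn.Linear(4, 2)
restored.load_state_dict(state_dict)
print(all(t.device.type == "cpu" for t in state_dict.values()))
```

With a checkpoint that was saved on a GPU, the same map_location argument is what moves the tensors to the CPU during loading.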

18 Checking whether a GPU is available

Save the following code as test_pytorch_gpu.py and run it to check whether PyTorch can use a GPU:

# Import PyTorch
import torch

# Is PyTorch able to use a GPU?
print(torch.cuda.is_available())

# How many GPUs are there?
print(torch.cuda.device_count())

# The calls below raise an error when no GPU is usable, so guard them
if torch.cuda.is_available():
    # Which GPU is the current GPU?
    print(torch.cuda.current_device())

    # Get the name of the current GPU
    print(torch.cuda.get_device_name(torch.cuda.current_device()))

If everything runs correctly, the output looks similar to the following: [screenshot in the original post]

References

[1] https://blog.csdn.net/weixin_41278720/article/details/80778640
[2] https://discuss.pytorch.org/t/getting-nan-after-first-iteration-with-custom-loss/25929/14
[3] https://zllrunning.github.io/2018/03/24/20180324/
[4] https://github.com/MVIG-SJTU/AlphaPose/issues/402
[5] https://github.com/pytorch/pytorch/issues/5059
[6] https://blog.csdn.net/Nin7a/article/details/104138036
[7] https://blog.csdn.net/sinat_33425327/article/details/84823272
[8] https://gist.github.com/rohan-varma/3906e7f07669f0177801a9f753848550
[9] https://github.com/pytorch/pytorch/issues/38174
[10] https://blog.csdn.net/u012796629/article/details/105936386

About The Author

skylook

An enthusiast of augmented reality and image recognition technology.

技术刘
