PyTorch in Practice: Building an MNIST Handwritten Digit Recognition Model from Scratch

📅 2026/7/4 23:54:42 📝 Programming Learning

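1. Project Overview: The Classic Pairing of PyTorch and MNIST

MNIST handwritten digit recognition is often called the "Hello World" of deep learning. The dataset contains 60,000 training images and 10,000 test images, each a 28×28-pixel grayscale image. It is a good first benchmark for computer vision because it is moderate in size, its features are clear, and its labels are accurate. PyTorch, one of the most popular deep learning frameworks today, has a dynamic computation graph and a Pythonic API that suit teaching and rapid prototyping.

In this project we build, from scratch, a PyTorch neural network that recognizes handwritten digits. Rather than calling a ready-made model, we implement data loading, network construction, and the training loop step by step. Along the way you will learn:

- How to handle image data in PyTorch
- The basic principles of fully connected neural networks
- Which parameters matter most during training and how to tune them
- Practical techniques for evaluating and improving a model

Tip: although today's deep learning frameworks all offer high-level APIs, understanding the underlying implementation is essential for debugging and model optimization. That is why I insist on having students start from the basics.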

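2. Environment Setup and Data Loading

2.1 Configuring the PyTorch Environment

Creating a dedicated environment with Anaconda is recommended:

    conda create -n pytorch_mnist python=3.8
    conda activate pytorch_mnist
    conda install pytorch torchvision torchaudio -c pytorch

Verify the installation:

    import torch

    print(torch.__version__)          # should print something like 1.12.1
    print(torch.cuda.is_available())  # True if a GPU is available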

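2.2 Loading the MNIST Dataset

PyTorch's torchvision package ships with a built-in MNIST loader:

    from torchvision import datasets, transforms

    # Define the preprocessing pipeline
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])

    # Download (if needed) and load the datasets
    train_data = datasets.MNIST(
        root='data', train=True, download=True, transform=transform
    )
    test_data = datasets.MNIST(
        root='data', train=False, transform=transform
    )

Two operations matter here:

- ToTensor() converts a PIL image to a PyTorch tensor and scales the pixel values to the range [0, 1]
- Normalize() standardizes the pixels with MNIST's global mean (0.1307) and standard deviation (0.3081)

Note: these statistics were computed over the MNIST data itself, and standardizing with them generally helps the model converge faster. If you work with your own dataset, you need to recompute them.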
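To make the standardization step concrete, here is a minimal sketch in plain Python (no PyTorch needed) of how such statistics can be computed and what Normalize() then does to a pixel. The helper names channel_stats and normalize and the two tiny 2×2 "images" are made up for illustration; a real dataset would be processed with tensors rather than nested lists.

```python
import math

def channel_stats(images):
    """Mean and (population) standard deviation over every pixel
    of every single-channel image, with pixels already in [0, 1]."""
    pixels = [p for img in images for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, math.sqrt(var)

def normalize(value, mean, std):
    """What Normalize() computes for each pixel: (x - mean) / std."""
    return (value - mean) / std

# Two toy 2x2 "images" standing in for a custom dataset (illustrative only).
toy_images = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [1.0, 0.0]],
]
toy_mean, toy_std = channel_stats(toy_images)

# Range of MNIST pixel values after ToTensor() + Normalize((0.1307,), (0.3081,)).
lo = normalize(0.0, 0.1307, 0.3081)
hi = normalize(1.0, 0.1307, 0.3081)
print(f"toy mean={toy_mean:.4f}, toy std={toy_std:.4f}")
print(f"normalized MNIST range: [{lo:.4f}, {hi:.4f}]")
```

With the MNIST constants, a black pixel (0.0) maps to about -0.42 and a white pixel (1.0) to about 2.82, so the normalized inputs are not confined to [0, 1].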

2.3 Batching and Visualizing the Data

Create the data loaders:

    from torch.utils.data import DataLoader

    batch_size = 64
    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_data, batch_size=batch_size)
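As a quick sanity check on the loaders, the number of batches per epoch follows directly from the dataset size and batch_size. The sketch below is plain Python with no PyTorch dependency; the helper name batches_per_epoch is made up for illustration, and it assumes the final, smaller batch is kept (the DataLoader default, drop_last=False).

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Return (number of batches, size of the last batch) when the
    final partial batch is kept rather than dropped."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

train_batches, train_last = batches_per_epoch(60_000, 64)
test_batches, test_last = batches_per_epoch(10_000, 64)
print(train_batches, train_last)
print(test_batches, test_last)
```

Under that assumption len(train_loader) should be 938, and the last training batch holds 32 images rather than 64, which is why code that processes batches should use len(data) instead of assuming batch_size.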

Inspect one batch:

    import matplotlib.pyplot as plt

    images, labels = next(iter(train_loader))
    print(images.shape)  # torch.Size([64, 1, 28, 28])
    print(labels.shape)  # torch.Size([64])

    # Show the first 16 images
    fig = plt.figure(figsize=(8, 8))
    for i in range(16):
        plt.subplot(4, 4, i + 1)
        plt.imshow(images[i][0], cmap='gray_r')
        plt.title(f"Label: {labels[i].item()}")  # .item() turns the tensor into a plain int
        plt.axis('off')
    plt.tight_layout()
    plt.show()

3. Building the Neural Network Model

3.1 Designing the Fully Connected Network

We start with a simple fully connected network:

    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(28 * 28, 512)
            self.fc2 = nn.Linear(512, 256)
            self.fc3 = nn.Linear(256, 10)

        def forward(self, x):
            x = x.view(-1, 28 * 28)  # flatten each image to a 784-vector
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return F.log_softmax(x, dim=1)  # log-probabilities over the 10 classes

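The network consists of:

- Input layer: 28×28 = 784 neurons
- First hidden layer: 512 neurons
- Second hidden layer: 256 neurons
- Output layer: 10 neurons (one for each digit 0-9)

Rule of thumb: hidden layer sizes are conventionally chosen as powers of two, which tend to map well onto GPU hardware, although at this scale the speed difference is small. Shrinking the layer width step by step from input to output, in a "pyramid" shape, is also common practice.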

3.2 Model Initialization and GPU Acceleration

Instantiate the model and move it to the GPU if one is available:

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = Net().to(device)
    print(model)

Check the number of model parameters:

    def count_parameters(model):
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    print(f"Trainable parameters: {count_parameters(model):,}")
    # Output for this network: Trainable parameters: 535,818
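The parameter count can also be worked out by hand, which is a useful check that the layers are wired as intended. The sketch below is plain Python; linear_params is a hypothetical helper, and it relies only on the fact that a fully connected layer has in_features × out_features weights plus out_features biases.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# (in_features, out_features) for fc1, fc2, fc3 of the Net class above
layer_sizes = [(28 * 28, 512), (512, 256), (256, 10)]
per_layer = [linear_params(i, o) for i, o in layer_sizes]
total = sum(per_layer)
print(per_layer, total)
```

The three layers contribute 401,920, 131,328 and 2,570 parameters, for a total of 535,818. Almost all of the capacity sits in the first layer, which is one reason convolutional layers are attractive for images.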

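4. Model Training and Evaluation

4.1 Implementing the Training Procedure

Define the loss function and the optimizer:

    from torch.optim import Adam

    # The model already returns log-probabilities (log_softmax), so the matching
    # loss is NLLLoss. nn.CrossEntropyLoss would apply log_softmax a second time.
    criterion = nn.NLLLoss()
    optimizer = Adam(model.parameters(), lr=0.001)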
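How the network's log_softmax output relates to the loss value can be checked by hand. The following is a plain-Python sketch, not PyTorch's implementation: the function names and the three-class logits are made up for illustration. It shows that cross-entropy on raw logits equals negative log-likelihood on log_softmax outputs, and that applying log_softmax twice changes nothing.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]

def cross_entropy(logits, target):
    """Cross-entropy on raw logits = log_softmax followed by NLL."""
    return nll(log_softmax(logits), target)

logits = [2.0, 0.5, -1.0]   # illustrative scores for 3 classes
target = 0

log_probs = log_softmax(logits)
loss_from_logits = cross_entropy(logits, target)
loss_from_log_probs = nll(log_probs, target)
twice = log_softmax(log_probs)  # log_softmax applied a second time

print(loss_from_logits, loss_from_log_probs)
```

Because the exponentiated log-probabilities already sum to 1, a second log_softmax subtracts log(1) = 0. Pairing log_softmax with a cross-entropy loss therefore gives the same number as pairing it with NLL, but it does redundant work and hides which layer produces the probabilities.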

The training loop:

    def train(model, device, train_loader, optimizer, epoch):
        model.train()
        train_loss = 0
        correct = 0
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            # Accumulate the loss and the number of correct predictions
            train_loss += loss.item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

            if batch_idx % 100 == 0:
                print(f"Progress: {batch_idx * len(data)}/{len(train_loader.dataset)} "
                      f"({100. * batch_idx / len(train_loader):.0f}%)")

        train_loss /= len(train_loader)  # mean loss per batch
        accuracy = 100. * correct / len(train_loader.dataset)
        print(f"Training set: average loss {train_loss:.4f}, accuracy {accuracy:.2f}%")
        return train_loss, accuracy

4.2 Implementing Test Evaluation

The test function:

    def test(model, device, test_loader):
        model.eval()
        test_loss = 0
        correct = 0
        with torch.no_grad():  # no gradients are needed for evaluation
            for data, target in test_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                test_loss += criterion(output, target).item()
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()

        test_loss /= len(test_loader)  # mean loss per batch
        accuracy = 100. * correct / len(test_loader.dataset)
        print(f"Test set: average loss {test_loss:.4f}, accuracy {accuracy:.2f}%\n")
        return test_loss, accuracy

4.3 The Complete Training Run

Train for several epochs:

    epochs = 10
    train_losses, test_losses = [], []
    train_accs, test_accs = [], []

    for epoch in range(1, epochs + 1):
        print(f"Epoch {epoch}:")
        train_loss, train_acc = train(model, device, train_loader, optimizer, epoch)
        test_loss, test_acc = test(model, device, test_loader)
        train_losses.append(train_loss)
        test_losses.append(test_loss)
        train_accs.append(train_acc)
        test_accs.append(test_acc)

Visualize the training process:

    plt.figure(figsize=(12, 5))

    plt.subplot(1, 2, 1)
    plt.plot(train_losses, label='Training loss')
    plt.plot(test_losses, label='Test loss')
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(train_accs, label='Training accuracy')
    plt.plot(test_accs, label='Test accuracy')
    plt.legend()

    plt.show()

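5. Model Optimization and Improvement

5.1 Learning Rate Scheduling

Add a learning rate scheduler:

    from torch.optim.lr_scheduler import StepLR

    # Start from a higher learning rate and multiply it by 0.1 every 5 epochs
    optimizer = Adam(model.parameters(), lr=0.01)
    scheduler = StepLR(optimizer, step_size=5, gamma=0.1)

Then call it once at the end of every epoch, after that epoch's optimizer updates:

    scheduler.step()
    print(f"Current learning rate: {scheduler.get_last_lr()[0]}")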
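The schedule that StepLR(step_size=5, gamma=0.1) produces is simple enough to compute by hand. This is a plain-Python sketch of the step-decay rule, not the scheduler itself; step_lr is a hypothetical helper and epochs_completed counts how many times scheduler.step() has been called.

```python
def step_lr(base_lr, gamma, step_size, epochs_completed):
    """Step decay: multiply the base rate by gamma once per step_size epochs."""
    return base_lr * gamma ** (epochs_completed // step_size)

# Learning rate used during each of 10 epochs (0 to 9 scheduler steps so far)
schedule = [step_lr(0.01, 0.1, 5, e) for e in range(10)]
for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch}: lr = {lr:.4f}")
```

With these settings the first five epochs train at 0.01 and the next five at 0.001. A 10-epoch run therefore sees exactly one decay step.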

5.2 Adding Dropout to Prevent Overfitting

Modify the network structure:

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(28 * 28, 512)
            self.drop1 = nn.Dropout(0.2)
            self.fc2 = nn.Linear(512, 256)
            self.drop2 = nn.Dropout(0.2)
            self.fc3 = nn.Linear(256, 10)

        def forward(self, x):
            x = x.view(-1, 28 * 28)
            x = self.drop1(F.relu(self.fc1(x)))
            x = self.drop2(F.relu(self.fc2(x)))
            x = self.fc3(x)
            return F.log_softmax(x, dim=1)

Note: Dropout is active only in training mode. PyTorch switches it on and off through model.train() and model.eval(), so the evaluation code must call model.eval() before running the test set.
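What the train/eval switch changes can be seen in a few lines of plain Python. This is a sketch of "inverted" dropout, the variant in which surviving activations are scaled by 1/(1-p) during training so that nothing needs rescaling at test time. The dropout function and its arguments are illustrative, not PyTorch's API.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout: while training, zero each value with probability p
    and scale the survivors by 1/(1-p); otherwise return values unchanged."""
    if not training or p == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]

rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0] * 10_000      # illustrative layer output

train_out = dropout(activations, p=0.2, training=True, rng=rng)
eval_out = dropout(activations, p=0.2, training=False, rng=rng)

train_mean = sum(train_out) / len(train_out)
dropped = sum(1 for v in train_out if v == 0.0)
print(f"dropped {dropped} of {len(train_out)}, mean after dropout = {train_mean:.3f}")
```

In training mode each activation is either 0 or 1.25, and the average stays close to the original 1.0. In evaluation mode the input passes through untouched. Forgetting model.eval() leaves the random zeroing switched on and makes test accuracy noisy and lower than it should be.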

5.3 Improving with a Convolutional Neural Network (CNN)

A more powerful CNN implementation:

    class CNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)   # 1 input channel, 32 filters, 3x3 kernel, stride 1
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)       # 64 channels * 12 * 12 after pooling
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)
            return F.log_softmax(x, dim=1)

This CNN typically reaches a test accuracy above 99%, clearly better than the fully connected network.
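The input size of fc1 (9216) is not arbitrary: it follows from how each layer changes the spatial size of the image. The sketch below traces those sizes in plain Python using the standard output-size formula for convolution and pooling without dilation. The helper names are made up for illustration.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

def max_pool2d_out(size, kernel):
    """Output width/height of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1

side = 28                               # MNIST images are 28x28
side = conv2d_out(side, kernel=3)       # after conv1
after_conv1 = side
side = conv2d_out(side, kernel=3)       # after conv2
after_conv2 = side
side = max_pool2d_out(side, kernel=2)   # after max_pool2d(x, 2)

flat_features = 64 * side * side        # 64 channels come out of conv2
print(after_conv1, after_conv2, side, flat_features)
```

The image shrinks from 28 to 26 to 24, pooling halves it to 12, and 64 × 12 × 12 = 9216. If you change a kernel size, add padding, or feed images of another resolution, this number has to be recomputed or fc1 will raise a shape error.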

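6. Saving and Using the Model

6.1 Saving and Loading the Model

Save the trained model's parameters (its state_dict):

    torch.save(model.state_dict(), "mnist_model.pth")

Load the model:

    model = Net()  # or CNN(); must be the same class that was saved
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine
    model.load_state_dict(torch.load("mnist_model.pth", map_location=device))
    model.to(device)
    model.eval()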

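6.2 Predicting a Single Image

Implement a prediction function:

    from PIL import Image

    def predict_image(img_path, model, device):
        # MNIST digits are light strokes on a dark background; invert
        # dark-on-light images before passing them to this function.
        img = Image.open(img_path).convert('L')  # convert to grayscale
        transform = transforms.Compose([
            transforms.Resize((28, 28)),
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])
        img_tensor = transform(img).unsqueeze(0).to(device)  # add a batch dimension

        model.eval()
        with torch.no_grad():
            output = model(img_tensor)
            pred = output.argmax(dim=1).item()

        plt.imshow(img, cmap='gray_r')
        plt.title(f"Prediction: {pred}")
        plt.axis('off')
        return pred

Usage example:

    pred = predict_image("my_digit.png", model, device)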

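7. Common Problems and Solutions

7.1 Possible Reasons Training Does Not Converge

Poorly chosen learning rate:
- Symptom: the loss fluctuates wildly or barely changes
- Fix: try different learning rates such as 0.1, 0.01 and 0.001
Input data not normalized correctly:
- Symptom: the loss decreases very slowly
- Check: confirm the inputs are scaled to a small range, for example [0, 1] after ToTensor(), or roughly zero mean and unit variance after Normalize()
Problems with the model structure:
- Symptom: accuracy stays low no matter how the parameters are tuned
- Fix: simplify the network or add more layers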
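The two learning-rate symptoms above can be reproduced on a toy problem. The sketch below runs plain gradient descent (not Adam, and not the MNIST model) on f(w) = w², chosen only because its behavior is easy to verify by hand: each step multiplies w by (1 - 2·lr). The three learning rates are illustrative and the right values for a real network will differ.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w -= lr * grad
    return w

results = {lr: gradient_descent(lr) for lr in (1.1, 0.1, 0.001)}
for lr, w in results.items():
    print(f"lr={lr}: w after 50 steps = {w:.6g}")
```

With lr=1.1 the iterate overshoots the minimum on every step and grows without bound, with lr=0.1 it converges quickly toward 0, and with lr=0.001 it has barely moved from its starting value of 1.0 after 50 steps. These correspond to the "fluctuates wildly" and "barely changes" symptoms respectively.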

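7.2 Techniques for Handling Overfitting

Add Dropout layers:
- Add Dropout(0.2~0.5) after the hidden layers

Use L2 regularization:

    optimizer = Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

Data augmentation (apply it to the training set only):

    transform = transforms.Compose([
        transforms.RandomRotation(10),  # rotate by a random angle within ±10 degrees
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])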

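7.3 Running Out of GPU Memory

Reduce the batch size:
- Lower it from 64 to 32 or 16

Use gradient accumulation:

    accumulation_steps = 4

    for batch_idx, (data, target) in enumerate(train_loader):
        ...
        loss = loss / accumulation_steps  # scale so the accumulated gradient is an average
        loss.backward()                   # gradients add up across calls
        if (batch_idx + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()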
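Why the loss is divided by accumulation_steps can be verified numerically. The sketch below uses a one-parameter linear model with a mean-squared-error loss in plain Python; the data points and the helper mse_grad are made up for illustration. It compares the gradient of one large batch against the sum of scaled gradients from four equally sized micro-batches.

```python
def mse_grad(w, batch):
    """Gradient with respect to w of mean((w*x - y)**2) over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

# Eight illustrative (x, y) samples, roughly following y = 2x
data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 13.9), (8.0, 16.0)]
w = 0.5
accumulation_steps = 4
micro_batch_size = len(data) // accumulation_steps

# Gradient of a single batch containing all eight samples
full_grad = mse_grad(w, data)

# Accumulate scaled gradients over four micro-batches of two samples each
accumulated = 0.0
for i in range(accumulation_steps):
    micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
    accumulated += mse_grad(w, micro) / accumulation_steps

print(full_grad, accumulated)
```

The two values agree up to floating-point rounding, so four micro-batches of 16 with accumulation behave like one batch of 64 as far as the gradient is concerned, while only a quarter of the activations need to fit in memory at once. The equivalence assumes equally sized micro-batches and a loss that averages over the batch.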

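Mixed-precision training:

    scaler = torch.cuda.amp.GradScaler()

    # Inside the training loop, for each batch:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # run the forward pass in reduced precision
        output = model(data)
        loss = criterion(output, target)
    scaler.scale(loss).backward()          # scale the loss to avoid gradient underflow
    scaler.step(optimizer)                 # unscale the gradients, then update
    scaler.update()                        # adjust the scale factor for the next step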
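The reason a gradient scaler is needed at all is that half-precision floats cannot represent very small numbers. The sketch below demonstrates this with the standard library's struct module, which can round a Python float to IEEE 754 half precision. The gradient value 1e-8 is made up for illustration, and the scale factor 2**16 is an assumption chosen to match what I understand to be GradScaler's default initial scale.

```python
import struct

def to_half(x):
    """Round a Python float to IEEE 754 half precision (float16) and back."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

tiny_grad = 1e-8          # illustrative gradient, below float16's smallest positive value
scale = 2.0 ** 16         # assumed loss-scale factor

unscaled = to_half(tiny_grad)            # stored directly in float16
scaled = to_half(tiny_grad * scale)      # stored after scaling
recovered = scaled / scale               # unscaled again in higher precision

print(f"direct float16: {unscaled}")
print(f"scaled float16: {scaled}, recovered: {recovered}")
```

Stored directly, the gradient underflows to exactly 0 and the corresponding weight would never be updated. Scaled up first, it survives in float16 and can be divided back to nearly its original value before the optimizer step, which is the job scaler.scale() and scaler.step() perform.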

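8. Directions for Extending the Project

Implement more advanced network architectures:
- Try modern architectures such as ResNet and EfficientNet
- Implement attention mechanisms or a Transformer structure
Build a web application interface:
- Create a prediction API with Flask or FastAPI
- Build an interactive handwriting canvas
Apply transfer learning:
- Fine-tune a model pretrained on MNIST on other digit datasets
- Implement few-shot learning

Model quantization and deployment:

    # Dynamically quantize the Linear layers to 8-bit integers
    quantized_model = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )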
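To give a feel for what storing weights as 8-bit integers involves, here is a deliberately simplified sketch in plain Python. It uses symmetric per-tensor quantization with a single scale factor, which is an assumption made for clarity: quantize_dynamic's actual scheme is more involved and is not reproduced here. The function names and the five weights are made up for illustration.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]
    using one scale factor for the whole list."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integers."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.003, 0.88, -0.5]   # illustrative layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q, scale)
print(f"largest round-trip error: {max_error:.4f}")
```

Each weight now needs one byte instead of four, at the cost of a rounding error of at most half the scale factor. Small weights such as 0.003 are the ones that suffer most, since they collapse to 0, which is why accuracy should be re-measured on the test set after quantizing.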

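Integrate visualization tools:
- Record the training process with TensorBoard
- Implement activation heatmap visualization

In my teaching I have found that many students fall into the "hyperparameter tuning trap": they keep adjusting hyperparameters while neglecting a deeper understanding of the data and the model structure. I suggest beginners first fix one set of parameters (for example batch_size=64, lr=0.001), run the whole training pipeline to completion, and build an end-to-end picture before optimizing anything. Remember: a good model = suitable data + an appropriate structure + sensible training, and none of the three can be missing.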
