Basic Handwritten Number Recognition Network

Training a Convolutional Neural Network (CNN) on handwritten digits is a classic beginner project using the MNIST dataset. Here's a clear, minimal walkthrough using Python, PyTorch, and your CUDA-enabled GPU.

Download the files here but still read through the instructions:

0. Create a New Virtual Environment

bash

cd /home/user/directory/etc
python -m venv venv
source venv/bin/activate

1. Install Dependencies

Ensure Python is installed (>=3.8). Then, install PyTorch with CUDA support:

bash

pip install torch torchvision matplotlib

Heads up: this install will download roughly 5 GiB of packages.

Check if GPU is detected in Python:

python

import torch
print(torch.cuda.is_available())  

This should print True. If it prints False, check your NVIDIA driver and make sure you installed a CUDA-enabled PyTorch build.
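Beyond is_available(), a quick smoke test is to run a small tensor operation on the GPU and confirm the result actually lives there:

```python
import torch

if torch.cuda.is_available():
    # Name of the first CUDA device, e.g. your GPU model.
    print(torch.cuda.get_device_name(0))
    # Tiny matrix multiply on the GPU as a smoke test.
    x = torch.randn(4, 4, device="cuda")
    y = x @ x
    print(y.device)  # cuda:0
else:
    print("CUDA not available; training will fall back to the CPU.")
```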

2. Load the MNIST Dataset

This is a publicly available dataset that torchvision downloads automatically the first time it is needed; it does not have to be downloaded separately in preparation.

python

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([transforms.ToTensor()])

train_set = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_set = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=1000, shuffle=False)

3. Define a Simple CNN

python

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, 1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, 1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(5*5*32, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        return self.net(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CNN().to(device)
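Before training, it is worth sanity-checking the architecture with a dummy batch; the output should have one logit per digit class. The model is repeated here (with the layer-by-layer spatial sizes as comments) so the snippet runs standalone:

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, 1),   # 28x28 -> 26x26
            nn.ReLU(),
            nn.MaxPool2d(2),          # 26x26 -> 13x13
            nn.Conv2d(16, 32, 3, 1),  # 13x13 -> 11x11
            nn.ReLU(),
            nn.MaxPool2d(2),          # 11x11 -> 5x5
            nn.Flatten(),
            nn.Linear(5 * 5 * 32, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.net(x)

model = CNN()
dummy = torch.randn(4, 1, 28, 28)  # batch of 4 fake MNIST-sized images
out = model(dummy)
print(out.shape)  # torch.Size([4, 10]) -- one logit per digit class
```

The shape comments also explain the 5*5*32 in the first Linear layer: after two conv+pool stages, each of the 32 feature maps is 5x5.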

4. Train the Model

python

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(5):  # 5 epochs is enough for MNIST
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1} complete")
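The loop above only prints epoch markers. Tracking the average loss per epoch makes it easier to see whether training is actually progressing. A sketch of the pattern using a tiny stand-in model and synthetic batches (substitute your CNN and train_loader in the real loop):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in model and fake batches so this runs on its own;
# replace with your CNN, criterion, optimizer, and train_loader.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

fake_loader = [
    (torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,)))
    for _ in range(5)
]

for epoch in range(2):
    model.train()
    running_loss = 0.0
    for images, labels in fake_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()  # accumulate per-batch loss
    avg = running_loss / len(fake_loader)
    print(f"Epoch {epoch + 1}: avg loss {avg:.4f}")
```

On real MNIST you should see the average loss fall steeply in the first epoch and then flatten out.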

5. Evaluate Accuracy

python

correct = 0
total = 0
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")
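Once you are happy with the accuracy, saving the trained weights lets you reuse the model without retraining (the supplied script saves to mnist_test_net.pth). A minimal save/reload sketch with a stand-in model, written to a temporary directory:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in for the CNN

# Save only the learned parameters (state_dict), the recommended approach.
path = os.path.join(tempfile.mkdtemp(), "mnist_test_net.pth")
torch.save(model.state_dict(), path)

# To reuse later: rebuild the same architecture, then load the weights.
restored = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
restored.load_state_dict(torch.load(path))
restored.eval()  # switch to inference mode before predicting
```

Note that load_state_dict only restores weights; the Python class defining the architecture must match the one used for training.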

6. Running the Entire Thing

If you are cutting and pasting, you may run into indentation and other errors. In that case, download the complete file here and run it using the commands below.

  • Create some images of handwritten numbers
  • Move them into the mnist directory
  • The script looks for an image named number.png

bash

cd /home/user/directory/etc
python mnist_train_and_test.py

You should see the process run through 5 epochs and output a test accuracy. (The epoch count can be adjusted on line 49.)

7. Proving the Principle Again

At the end of the above test, you should see a prediction of the number in the picture along with a confidence value. The script also saves the model as mnist_test_net.pth so it can be reused to predict numbers in other pictures. Try your other examples from the previous step.

A script is also supplied that loads a selected model and a selected image and returns a prediction.