MNIST Handwritten Digit Classifier

In this project, we develop a neural network classifier for the MNIST handwritten digit dataset. As an added challenge, we derive and implement the model from scratch, avoiding the high-level routines provided by TensorFlow, PyTorch, Keras, and similar libraries.

Introduction

We build a classifier for handwritten digits using the MNIST dataset. Along the way, we describe the network architecture, derive the mathematics behind it, and implement the model in Python.

Examples of Handwritten Digits

Below are examples of images from the MNIST dataset along with their corresponding labels.

Neural Network Architecture

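We will use a simple feedforward neural network with one hidden layer. Going by the weight shapes used in the implementation, the architecture is as follows: an input layer of 784 units (one per pixel of a 28×28 image), a hidden layer of 128 units, and an output layer of 10 units (one per digit class).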
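As a quick sanity check on the size of this model, the sketch below tallies its trainable parameters. The layer sizes 784, 128, and 10 are read off the weight shapes in the implementation; the helper name `count_parameters` is hypothetical and only serves this illustration.

```python
# Layer sizes assumed from the weight shapes used in the implementation.
n_input, n_hidden, n_output = 784, 128, 10

def count_parameters(sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_out * fan_in  # entries of the weight matrix
        total += fan_out           # entries of the bias vector
    return total

n_params = count_parameters([n_input, n_hidden, n_output])
print(n_params)  # 101770
```

Almost all of these parameters sit in the first weight matrix, since the input layer is by far the widest.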

The network uses the sigmoid activation function and is trained using stochastic gradient descent.
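To make these two ingredients concrete, here is a minimal sketch of the sigmoid, its derivative, and a single gradient descent update. The function names and the learning rate of 0.1 are assumptions for illustration, not values taken from the project. The toy loop uses an exact gradient; the "stochastic" variant applies the same update with a gradient estimated from one example or a small batch.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def sgd_step(param, grad, learning_rate=0.1):
    """Move a parameter a small step against its gradient."""
    return param - learning_rate * grad

# Toy usage: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
for _ in range(100):
    w = sgd_step(w, 2.0 * (w - 3.0))
```

After the loop, `w` sits very close to the minimiser at 3.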

Mathematical Derivation

The output of the neural network is computed as follows:

$$\begin{align*} \mathbf{z}^{(1)} &= \mathbf{W}^{(1)} \mathbf{x} + \mathbf{b}^{(1)} \\ \mathbf{a}^{(1)} &= \sigma(\mathbf{z}^{(1)}) \\ \mathbf{z}^{(2)} &= \mathbf{W}^{(2)} \mathbf{a}^{(1)} + \mathbf{b}^{(2)} \\ \mathbf{a}^{(2)} &= \sigma(\mathbf{z}^{(2)}) \end{align*}$$

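Where $\mathbf{x}$ is the input vector (a flattened image), $\mathbf{W}^{(l)}$ and $\mathbf{b}^{(l)}$ are the weight matrix and bias vector of layer $l$, $\mathbf{z}^{(l)}$ and $\mathbf{a}^{(l)}$ are that layer's pre-activation and activation, and $\sigma(z) = 1 / (1 + e^{-z})$ is the sigmoid function applied elementwise.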

The loss is calculated using the cross-entropy function, and its gradients with respect to the weights and biases are computed by backpropagation.
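The text does not say which form of cross-entropy is used, so the following is a worked sketch under an assumption: a per-output (binary) cross-entropy, which pairs naturally with sigmoid outputs. The one-hot target vector $\mathbf{y}$ and the error terms $\boldsymbol{\delta}^{(l)}$ are symbols introduced here for illustration.

$$\begin{align*} L &= -\sum_{k} \left[ y_k \log a^{(2)}_k + (1 - y_k) \log\left(1 - a^{(2)}_k\right) \right] \\ \boldsymbol{\delta}^{(2)} &= \frac{\partial L}{\partial \mathbf{z}^{(2)}} = \mathbf{a}^{(2)} - \mathbf{y} \\ \frac{\partial L}{\partial \mathbf{W}^{(2)}} &= \boldsymbol{\delta}^{(2)} \left(\mathbf{a}^{(1)}\right)^{\top}, \qquad \frac{\partial L}{\partial \mathbf{b}^{(2)}} = \boldsymbol{\delta}^{(2)} \\ \boldsymbol{\delta}^{(1)} &= \frac{\partial L}{\partial \mathbf{z}^{(1)}} = \left( \left(\mathbf{W}^{(2)}\right)^{\top} \boldsymbol{\delta}^{(2)} \right) \odot \mathbf{a}^{(1)} \odot \left(1 - \mathbf{a}^{(1)}\right) \\ \frac{\partial L}{\partial \mathbf{W}^{(1)}} &= \boldsymbol{\delta}^{(1)} \mathbf{x}^{\top}, \qquad \frac{\partial L}{\partial \mathbf{b}^{(1)}} = \boldsymbol{\delta}^{(1)} \end{align*}$$

The second line follows because $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ cancels the denominators that appear when differentiating the logarithms. With a different loss, such as categorical cross-entropy, the expression for $\boldsymbol{\delta}^{(2)}$ would change.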

Python Implementation

The following Python code demonstrates the implementation of the neural network described above.


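      import numpy as np

      class NeuralNetwork:
          """Feedforward network: 784 inputs, 128 hidden units, 10 outputs."""

          def __init__(self):
              # Small random weights break symmetry; biases start at zero
              self.W1 = np.random.randn(128, 784) * 0.01  # hidden-layer weights
              self.b1 = np.zeros((128, 1))                # hidden-layer biases
              self.W2 = np.random.randn(10, 128) * 0.01   # output-layer weights
              self.b2 = np.zeros((10, 1))                 # output-layer biases

          def sigmoid(self, z):
              # Elementwise logistic function
              return 1 / (1 + np.exp(-z))

          def feedforward(self, x):
              # x holds one flattened image per column, shape (784, n)
              z1 = np.dot(self.W1, x) + self.b1   # hidden pre-activation
              a1 = self.sigmoid(z1)               # hidden activation
              z2 = np.dot(self.W2, a1) + self.b2  # output pre-activation
              a2 = self.sigmoid(z2)               # one value in (0, 1) per class
              return a2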

This class initializes the network parameters and defines methods for the activation function and feedforward computation.
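The class stops at the forward pass. As a hedged sketch of how training could be added, the standalone code below pairs the same forward computation with backpropagation and a stochastic gradient descent step. It assumes a per-output (binary) cross-entropy loss, for which the output-layer error is simply `a2 - y`. The function names, the dictionary of parameters, the learning rate, and the random stand-in input are all assumptions for illustration, not the project's own code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, n_in=784, n_hidden=128, n_out=10):
    """Same shapes and scale as the class above, stored in a dict."""
    return {
        "W1": rng.standard_normal((n_hidden, n_in)) * 0.01,
        "b1": np.zeros((n_hidden, 1)),
        "W2": rng.standard_normal((n_out, n_hidden)) * 0.01,
        "b2": np.zeros((n_out, 1)),
    }

def forward(p, x):
    """Return hidden and output activations for a column vector x."""
    a1 = sigmoid(p["W1"] @ x + p["b1"])
    a2 = sigmoid(p["W2"] @ a1 + p["b2"])
    return a1, a2

def loss(p, x, y):
    """Binary cross-entropy summed over the output units (assumed form)."""
    _, a2 = forward(p, x)
    return float(-np.sum(y * np.log(a2) + (1 - y) * np.log(1 - a2)))

def gradients(p, x, y):
    """Backpropagation for a single example (x, y)."""
    a1, a2 = forward(p, x)
    d2 = a2 - y                             # error at the output layer
    d1 = (p["W2"].T @ d2) * a1 * (1 - a1)   # error at the hidden layer
    return {"W1": d1 @ x.T, "b1": d1, "W2": d2 @ a1.T, "b2": d2}

def sgd_step(p, x, y, lr=0.1):
    """Update every parameter in place using one example's gradient."""
    g = gradients(p, x, y)
    for name in p:
        p[name] -= lr * g[name]

rng = np.random.default_rng(0)
params = init_params(rng)
x = rng.random((784, 1))   # random stand-in for a flattened 28x28 image
y = np.zeros((10, 1))
y[3] = 1.0                 # one-hot label for the digit 3
loss_before = loss(params, x, y)
for _ in range(20):
    sgd_step(params, x, y)
loss_after = loss(params, x, y)
```

Repeating the step on a single example only shows that the loss goes down; real training would loop over shuffled MNIST examples or mini-batches, and data loading is left out here.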

Conclusion

We have explored the fundamentals of neural networks by building and training a model to classify handwritten digits. The concepts and implementations provided lay the groundwork for more advanced studies in deep learning.