Implement a matrix-based Dense Neural Network for character recognition on the MNIST dataset. The initial architecture you will program is shown below.
a) Show the matrix equations for the forward pass, i.e., for computing S1, a1, S2 and a2.
b) Show the matrix equations (partial derivatives) for the backpropagation algorithm i.e., for computing:
δ2, δ1, ∇𝑤2, ∇𝑏2, ∇𝑤1, ∇𝑏1
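One possible set of equations for parts a) and b) is sketched here. It assumes a sigmoid activation σ in both layers (as in the testing code of this assignment), the squared-error loss used in the skeleton code, an input column vector x of size 784×1 and a one-hot target y of size 10×1. The symbol ⊙ (written `\odot`) denotes element-by-element multiplication. Treat this as a sketch under those assumptions, not as the official solution.

```latex
\begin{aligned}
% forward pass
S_1 &= W_1 x + b_1, & a_1 &= \sigma(S_1), \\
S_2 &= W_2 a_1 + b_2, & a_2 &= \sigma(S_2), \\
L &= \tfrac{1}{2} \sum_k \left(a_{2,k} - y_k\right)^2, & \sigma(s) &= \frac{1}{1 + e^{-s}}, \\[4pt]
% backpropagation
\delta_2 &= (a_2 - y) \odot a_2 \odot (1 - a_2), \\
\delta_1 &= \left(W_2^{\top} \delta_2\right) \odot a_1 \odot (1 - a_1), \\
\nabla W_2 &= \delta_2 \, a_1^{\top}, & \nabla b_2 &= \delta_2, \\
\nabla W_1 &= \delta_1 \, x^{\top}, & \nabla b_1 &= \delta_1.
\end{aligned}
```

The factors a ⊙ (1 − a) are the derivative of the sigmoid expressed through its output; a different activation function would replace them with its own derivative.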
c) Program the backpropagation algorithm for training and testing on the MNIST dataset. The dataset is provided to you on the CPEG 586 web site. You need to use only 1000 images for training and all 10000 images for testing. Once you download the data folder, there will be three subfolders in it called Training1000, Test10000, and TrainingAll60000.
The Python code for reading the training and testing images appears below (change the folder names in the code to match the folder where you unzipped the data).
Note: np.dot computes the dot product (inner product) when both arguments are 1-D arrays, and a matrix multiplication when both are 2-D arrays. np.multiply does an element-by-element multiplication, and the regular * operator does the same on arrays, so you do not have to use np.multiply.
import os
import sys
import cv2
import numpy as np
from sklearn.utils import shuffle
train = np.empty((1000,28,28),dtype='float64')
trainY = np.zeros((1000,10,1))
test = np.empty((10000,28,28),dtype='float64')
testY = np.zeros((10000,10,1))
# Load in the images
i = 0
for filename in os.listdir('D:/Data/Training1000/'):
    y = int(filename[0])
    trainY[i,y] = 1.0
    train[i] = cv2.imread('D:/Data/Training1000/{0}'.format(filename),0)/255.0 # for color, use 1
    i = i + 1
i = 0 # read test data
for filename in os.listdir('D:/Data/Test10000'):
    y = int(filename[0])
    testY[i,y] = 1.0
    test[i] = cv2.imread('D:/Data/Test10000/{0}'.format(filename),0)/255.0
    i = i + 1
trainX = train.reshape(train.shape[0],train.shape[1]*train.shape[2],1)
testX = test.reshape(test.shape[0],test.shape[1]*test.shape[2],1)
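The earlier note about np.dot and * can be checked on small arrays. The sketch below uses made-up values (u, v, W and x are illustrative names, not part of the assignment) and mirrors the column-vector layout produced by the reshape calls above, where each image becomes a 784×1 array.

```python
import numpy as np

# 1-D arrays: np.dot returns the scalar inner product
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
inner = np.dot(u, v)            # 1*4 + 2*5 + 3*6

# 2-D arrays: np.dot performs a matrix multiplication
W = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])  # shape (2, 3), like a tiny weight matrix
x = u.reshape(3, 1)              # column vector, shape (3, 1)
s = np.dot(W, x)                 # shape (2, 1)

# element-by-element product: * and np.multiply give the same result
elementwise = u * v
same = np.array_equal(elementwise, np.multiply(u, v))

print(inner, s.shape, elementwise, same)
```

Keeping inputs, biases and activations as explicit column vectors means that the weight-times-input products in the network are always 2-D matrix multiplications.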
Try to program the training and testing code yourself. If you are having difficulty, then use the following skeleton code:
numNeuronsLayer1 = 100
numNeuronsLayer2 = 10
numEpochs = 100
#---------------------NN------------------------
w1 = np.random.uniform(low=-0.1,high=0.1,size=(numNeuronsLayer1,784))
b1 = np.random.uniform(low=-1,high=1,size=(numNeuronsLayer1,1))
w2 = np.random.uniform(low=-0.1,high=0.1,size=(numNeuronsLayer2,numNeuronsLayer1))
b2 = np.random.uniform(low=-0.1,high=0.1,size=(numNeuronsLayer2,1))
learningRate = 0.1
for n in range(0,numEpochs):
    loss = 0
    # shuffle data for stochastic behavior
    trainX,trainY = shuffle(trainX, trainY)
    for i in range(trainX.shape[0]):
        # do forward pass
        # your equations for the forward pass
        # do backprop and compute the gradients (* also works instead of np.multiply)
        loss += (0.5 * ((a2-trainY[i])*(a2-trainY[i]))).sum()
        # loss += (0.5 * np.multiply((a2-trainY[i]),(a2-trainY[i]))).sum()
        # your equations for computing the deltas and the gradients
        # adjust the weights
        w2 = w2 - learningRate * gradw2
        b2 = b2 - learningRate * gradb2
        w1 = w1 - learningRate * gradw1
        b1 = b1 - learningRate * gradb1
    print("epoch = " + str(n) + " loss = " + (str(loss)))
print("done training , starting testing..")
accuracyCount = 0
for i in range(testY.shape[0]):
    # do forward pass
    s1 = np.dot(w1,testX[i]) + b1
    a1 = 1/(1+np.exp(-1*s1)) # np.exp operates on the array
    s2 = np.dot(w2,a1) + b2
    a2 = 1/(1+np.exp(-1*s2))
    # determine index of maximum output value
    a2index = a2.argmax(axis = 0)
    if (testY[i,a2index] == 1):
        accuracyCount = accuracyCount + 1
# fraction of test images classified correctly
print("Accuracy = " + str(accuracyCount/testY.shape[0]))
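The placeholders in the training loop could be filled in along the following lines. This is a minimal sketch that assumes sigmoid activations in both layers and the squared-error loss from the skeleton. The helper name forward_backward and the tiny layer sizes (6 inputs, 4 hidden neurons, 3 outputs) are hypothetical and chosen only so that the gradient can be compared quickly against a finite-difference estimate.

```python
import numpy as np

def sigmoid(s):
    # logistic function, applied element by element
    return 1.0 / (1.0 + np.exp(-s))

def forward_backward(x, y, w1, b1, w2, b2):
    # hypothetical helper: loss and gradients for a single (x, y) pair
    s1 = np.dot(w1, x) + b1
    a1 = sigmoid(s1)
    s2 = np.dot(w2, a1) + b2
    a2 = sigmoid(s2)
    loss = (0.5 * (a2 - y) * (a2 - y)).sum()
    # deltas: error terms at the output layer and the hidden layer
    delta2 = (a2 - y) * a2 * (1 - a2)
    delta1 = np.dot(w2.T, delta2) * a1 * (1 - a1)
    # gradients with the same shapes as the parameters
    gradw2 = np.dot(delta2, a1.T)
    gradb2 = delta2
    gradw1 = np.dot(delta1, x.T)
    gradb1 = delta1
    return loss, gradw1, gradb1, gradw2, gradb2

# small random problem (sizes are illustrative, not the MNIST sizes)
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=(6, 1))
y = np.zeros((3, 1))
y[1] = 1.0
w1 = rng.uniform(-0.1, 0.1, size=(4, 6))
b1 = rng.uniform(-0.1, 0.1, size=(4, 1))
w2 = rng.uniform(-0.1, 0.1, size=(3, 4))
b2 = rng.uniform(-0.1, 0.1, size=(3, 1))

loss, gradw1, gradb1, gradw2, gradb2 = forward_backward(x, y, w1, b1, w2, b2)

# central finite difference on one weight as a sanity check
eps = 1e-6
w1_plus = w1.copy()
w1_plus[0, 0] += eps
w1_minus = w1.copy()
w1_minus[0, 0] -= eps
numeric = (forward_backward(x, y, w1_plus, b1, w2, b2)[0]
           - forward_backward(x, y, w1_minus, b1, w2, b2)[0]) / (2 * eps)
print(numeric, gradw1[0, 0])
```

Inside the skeleton, the same expressions would be evaluated with x = trainX[i] and y = trainY[i], followed by the four update lines already given. Comparing an analytic gradient with a finite-difference estimate is a common way to catch sign or transpose mistakes before training on the full data.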
d) The code above implements the Stochastic Gradient Descent (SGD) algorithm, i.e., the weights and biases are updated after each input sample. Implement mini-batch gradient descent, where the gradients are accumulated over a specified batch size (e.g., 10) and then used to update the weights and biases.
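One way to structure the mini-batch variant is sketched below. The function name train_minibatch, the synthetic data and the small layer sizes are assumptions made for illustration. Dividing the accumulated gradients by the batch length is also a design choice of this sketch (the assignment only says the accumulated gradients are used), and shuffling is done with NumPy's permutation rather than sklearn's shuffle so that the example needs only NumPy.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_minibatch(X, Y, numHidden=8, batchSize=10, learningRate=0.5,
                    numEpochs=30, seed=0):
    # X has shape (N, numInputs, 1) and Y has shape (N, numOutputs, 1)
    rng = np.random.default_rng(seed)
    numInputs, numOutputs = X.shape[1], Y.shape[1]
    w1 = rng.uniform(-0.1, 0.1, size=(numHidden, numInputs))
    b1 = rng.uniform(-0.1, 0.1, size=(numHidden, 1))
    w2 = rng.uniform(-0.1, 0.1, size=(numOutputs, numHidden))
    b2 = rng.uniform(-0.1, 0.1, size=(numOutputs, 1))
    losses = []
    for n in range(numEpochs):
        order = rng.permutation(X.shape[0])  # new sample order each epoch
        loss = 0.0
        for start in range(0, X.shape[0], batchSize):
            batch = order[start:start + batchSize]
            # accumulators for the gradients of this batch
            gw1, gb1 = np.zeros_like(w1), np.zeros_like(b1)
            gw2, gb2 = np.zeros_like(w2), np.zeros_like(b2)
            for i in batch:
                x, y = X[i], Y[i]
                a1 = sigmoid(np.dot(w1, x) + b1)
                a2 = sigmoid(np.dot(w2, a1) + b2)
                loss += (0.5 * (a2 - y) * (a2 - y)).sum()
                delta2 = (a2 - y) * a2 * (1 - a2)
                delta1 = np.dot(w2.T, delta2) * a1 * (1 - a1)
                gw2 += np.dot(delta2, a1.T)
                gb2 += delta2
                gw1 += np.dot(delta1, x.T)
                gb1 += delta1
            # one update per batch, using the averaged gradients (assumption)
            scale = learningRate / len(batch)
            w2 -= scale * gw2
            b2 -= scale * gb2
            w1 -= scale * gw1
            b1 -= scale * gb1
        losses.append(loss)
    return (w1, b1, w2, b2), losses

# synthetic stand-in data: the label is the index of the largest of 3 features
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 6, 1))
labels = X[:, :3, 0].argmax(axis=1)
Y = np.zeros((40, 3, 1))
Y[np.arange(40), labels, 0] = 1.0

params, losses = train_minibatch(X, Y)
print(losses[0], losses[-1])
```

With batchSize=1 this reduces to the per-sample SGD of the skeleton, which makes the two methods easy to compare with the same code path.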
e) Compare the performance of mini-batch gradient descent with SGD and see which one produces better accuracy for different hidden-layer sizes. Use 25, 50, 100 and 150 epochs, and 25, 50, 100 and 150 neurons in the hidden layer. Graph the results using Matplotlib in Python. Also implement the tanh and ReLU activation functions and repeat the experiments for the same hidden-layer sizes and epoch counts.
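For the activation-function experiments, the hidden-layer sigmoid and its derivative can be swapped for tanh or ReLU. The sketch below shows one way to write both functions together with their derivatives with respect to the pre-activation s. The function names are illustrative, and setting the ReLU derivative to 0 at s = 0 is a convention chosen here.

```python
import numpy as np

def tanh(s):
    return np.tanh(s)

def tanh_prime(s):
    # d/ds tanh(s) = 1 - tanh(s)^2
    return 1.0 - np.tanh(s) ** 2

def relu(s):
    return np.maximum(0.0, s)

def relu_prime(s):
    # 1 where s > 0, otherwise 0 (value at s == 0 is a convention)
    return (s > 0).astype(float)

s = np.array([[-2.0], [0.0], [3.0]])
print(relu(s).ravel(), relu_prime(s).ravel())
print(tanh(s).ravel(), tanh_prime(s).ravel())
```

With either function f in the hidden layer, the hidden delta becomes delta1 = np.dot(w2.T, delta2) * f_prime(s1), where s1 is the hidden-layer pre-activation and f_prime is tanh_prime or relu_prime.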