I used the pandas library to calculate summary statistics of the traffic signs data set:
Here is a histogram of the data set, showing how the data is distributed:
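As a minimal sketch of the two steps above, the per-class counts can be computed with pandas; the small `y_train` array here is a hypothetical stand-in for the real label array, which has 43 classes:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the training labels; the real y_train has 43 classes
y_train = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])

labels = pd.Series(y_train)
print("number of examples:", labels.size)
print("number of classes:", labels.nunique())

# Per-class counts -- the same data the histogram visualizes
counts = labels.value_counts().sort_index()
print(counts.describe())  # mean/std/min/max of examples per class
```

Plotting `counts` as a bar chart reproduces the class-distribution histogram.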
As a first step, I decided to convert the images to grayscale because color is not the main factor that distinguishes one sign from another; the essence of a sign lies in its shape and symbols.
Here is an example of a traffic sign image before and after grayscaling. The conversion was done simply by averaging the RGB values, so the result is not EXACTLY gray, but it meets the intention of the preprocessing.
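The channel-averaging conversion can be sketched in a few lines of NumPy; the random `rgb` batch below is a hypothetical stand-in for the real image data:

```python
import numpy as np

# Hypothetical RGB batch (N, 32, 32, 3) with uint8 pixel values
rgb = np.random.RandomState(0).randint(0, 256, size=(2, 32, 32, 3), dtype=np.uint8)

# "Grayscale" by averaging the three channels -- not the luminosity-weighted
# conversion, but sufficient for this preprocessing step
gray = rgb.astype(np.float32).mean(axis=3, keepdims=True)

print(gray.shape)  # (2, 32, 32, 1)
```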
As a last step, I normalized the image data. In theory, it's not necessary to normalize numeric x-data (also called independent data). However, practice has shown that when numeric x-data values are normalized, neural network training is often more efficient, which leads to a better predictor. Basically, if numeric data is not normalized and the magnitudes of two predictors are far apart, then a change in the value of a neural network weight has far more relative influence on the x-value with the larger magnitude. For example, given a raw data row with an age of 30 and an income of 38,000, a weight change of 0.1 changes the contribution of the age factor by (0.1 × 30) = 3, but changes the income factor by (0.1 × 38,000) = 3,800.
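The exact normalization formula isn't stated above; a common choice for 8-bit pixels, assumed here, maps values into roughly [-1, 1]:

```python
import numpy as np

# Hypothetical grayscale pixels in [0, 255]
gray = np.array([[0.0, 128.0], [255.0, 64.0]])

# One common normalization (an assumption, not necessarily the one used here):
# shift by 128 and scale by 128 so values land in roughly [-1, 1]
normalized = (gray - 128.0) / 128.0

print(normalized.min(), normalized.max())  # -1.0 0.9921875
```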
I decided to generate additional data because, as the histogram above shows, the number of examples per class is very unbalanced, which may produce an equally unbalanced classifier after training.
To add more data to the data set, I rotated, flipped, and changed the color values of existing images. Even though such an image may still represent the same sign, training the model on rotated, flipped, or color-changed images enhances its performance.
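A minimal sketch of two of these augmentations (flipping and color/brightness shifts) with NumPy; rotation would typically be added with something like `scipy.ndimage.rotate`. The random `images` batch and the `augment` helper are hypothetical illustrations, not the original pipeline:

```python
import numpy as np

rng = np.random.RandomState(42)

# Hypothetical batch (N, 32, 32, 1) of normalized pixels in [-1, 1]
images = rng.uniform(-1.0, 1.0, size=(4, 32, 32, 1)).astype(np.float32)

def augment(img, rng):
    """Return a randomly flipped and brightness-shifted copy of img."""
    out = img.copy()
    if rng.rand() < 0.5:
        out = out[:, ::-1, :]           # horizontal flip
    out = out + rng.uniform(-0.2, 0.2)  # shift the "color" (brightness)
    return np.clip(out, -1.0, 1.0)

augmented = np.stack([augment(img, rng) for img in images])
print(augmented.shape)  # (4, 32, 32, 1)
```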
Here is an example of an original image and an augmented image:
The difference between the original data set and the augmented data set is the following ...
| Layer           | Description                                   |
|:---------------:|:---------------------------------------------:|
| Input           | 32x32x1 grayscale, normalized image           |
| Convolution 5x5 | 1x1 stride, 'VALID' padding, outputs 28x28x6  |
| Max pooling     | 2x2 stride, outputs 14x14x6                   |
| Convolution 5x5 | 1x1 stride, 'VALID' padding, outputs 10x10x16 |
| Max pooling     | 2x2 stride, outputs 5x5x16                    |
| Flatten         | Output = 400                                  |
| Fully connected | Output = 120                                  |
| Dropout         | keep probability: 0.5                         |
| Fully connected | Output = 84                                   |
| Dropout         | keep probability: 0.5                         |
| Fully connected | Output = 43                                   |
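The table above can be sketched as a Keras model (an assumption for illustration; the original was likely built with lower-level TensorFlow ops). ReLU activations are assumed where the table doesn't state them:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Keras sketch of the LeNet-style architecture in the table above;
# layer sizes match the listed output shapes
model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, 5, strides=1, padding="valid", activation="relu"),   # 28x28x6
    layers.MaxPooling2D(pool_size=2, strides=2),                          # 14x14x6
    layers.Conv2D(16, 5, strides=1, padding="valid", activation="relu"),  # 10x10x16
    layers.MaxPooling2D(pool_size=2, strides=2),                          # 5x5x16
    layers.Flatten(),                                                     # 400
    layers.Dense(120, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(84, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(43),                                                     # class logits
])

print(model.output_shape)  # (None, 43)
```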
To train the model, I used an AdamOptimizer with the following hyperparameters:
My final model results were:
test set accuracy of 0.942
I implemented the LeNet neural network architecture and added dropout layers to enhance performance. I chose LeNet because it is proven to work well on the MNIST dataset, and I believed traffic sign images are not too different from handwritten digits. This architecture is also small enough to be tuned easily, yet large enough to deliver strong performance. These assumptions were borne out by the model's accuracy on the training, validation, and test sets shown above. The high accuracy on the training and validation sets shows how well the training process went, and the high accuracy on the test set shows how well the model performs on data that was never used during training.
Here are five traffic signs that I found on the web:
Here are the results of the prediction:
| Image                 | Prediction            |
|:---------------------:|:---------------------:|
| Speed Limit (30km/h)  | Speed Limit (30km/h)  |
| Priority Road         | Priority Road         |
| Speed Limit (50km/h)  | Speed Limit (50km/h)  |
| No vehicle over 3.5ton | No vehicle over 3.5ton |
The model correctly guessed 5 of the 5 traffic signs, an accuracy of 100%, which is higher than its accuracy on the test set (94.2%).
This part shows the predictions for each sign with the 5 highest probabilities. Each row represents the result for the leftmost image, and the numbers in the boxes represent the calculated probability for each image.
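Extracting the top-5 probabilities can be sketched with a softmax over the logits; the random `logits` vector is a hypothetical stand-in for one image's network output (in TensorFlow this is typically done with `tf.nn.softmax` and `tf.nn.top_k`):

```python
import numpy as np

# Hypothetical logits for one image over the 43 sign classes
rng = np.random.RandomState(0)
logits = rng.randn(43)

# Numerically stable softmax, then the 5 highest probabilities
probs = np.exp(logits - logits.max())
probs /= probs.sum()
top5_idx = np.argsort(probs)[::-1][:5]
for i in top5_idx:
    print(f"class {i}: probability {probs[i]:.3f}")
```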