Machine Learning & Image Recognition
By Lucien Gheerbrant and Pierre-Louis Audelan-Ameline
Boy, oh boy… Have you ever struggled to decide whether a picture shows a car, a horse or a deer?! Who hasn't? Well, this project is for you!
Introduction
At the end of 2023, my friend Pierre-Louis Audelan-Ameline and I had the opportunity to dive into a hands-on Machine Learning (ML) practical session, supervised by Pierre-Alain Moellic (great guy). We had to build an Image Recognition (IR) model. IR is one of the key applications of ML: you give an image depicting $x$ to a computer, and (hopefully) it guesses that it's $x$ (and not $y$ or $z$). So… how do we get an accurate Car/Deer/Horse recognition software?
The dataset
What's in there
The dataset we used is CIFAR-3, a modified version of the well-known CIFAR-10 [1] dataset, provided to us by M. Moellic. It is composed of three different classes, as visible in Figure 1 below:

First and foremost, we need to load the dataset:
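A minimal loading sketch, assuming CIFAR-3 is shipped as a NumPy `.npz` archive with one flat row of 3072 pixel values per image (the file name and the `images`/`labels` keys are placeholders):

```python
import numpy as np

# Hypothetical file and key names -- adapt to however CIFAR-3 is actually stored.
data = np.load("cifar3.npz")
X = data["images"].reshape(len(data["labels"]), -1).astype("float32") / 255.0
y = data["labels"]  # one integer label per image (car / deer / horse)

print(X.shape, y.shape)  # expect (n_images, 3072) and (n_images,)
```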
Do the split!
Nevertheless, let's not forget ML's golden rule:
"The test data cannot influence training the model in any way."
- Probably one of the big guys of ML
That is to say, if we do not split the dataset, we will not know whether our model works outside of its knowledge base. One of the problems we could run into, if this is not done, is overfitting.
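A sketch of the split using scikit-learn's `train_test_split`; the 80/20 ratio and the stratification are assumptions:

```python
from sklearn.model_selection import train_test_split

# Hold out a test set the model will never see during training.
# Stratifying keeps the class proportions identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```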
Oh! And let's check whether the dataset is balanced. In other words: is there approximately the same number of pictures in each class?
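A sketch of that check, counting the images per class in each split with `np.unique` (the original used one counter per class):

```python
# Count how many images of each class end up in the train and test splits.
for split_name, labels in [("train", y_train), ("test", y_test)]:
    classes, counts = np.unique(labels, return_counts=True)
    print(split_name, dict(zip(classes.tolist(), counts.tolist())))
```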
Yeah! That's pretty balanced. We are almost good to go.
Dimensionality Reduction with Principal Component Analysis
After loading and balancing the dataset, we moved on to reducing the dimensionality of our data using Principal Component Analysis (PCA). The original dataset has 3072 features (32 $\times$ 32 pixels $\times$ 3 color channels), which is quite high for our poor student laptops. PCA helped us reduce the number of features to a more manageable number while retaining most of the important information. We want to keep the most significant details/features while still reducing dimensionality. That is why we will perform multiple PCAs with different numbers of components: to find the most balanced PCA transform, one that does not take away too much information.


We found that around 200 components captured 95 % of the variance. That's what we'll pick.
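A sketch of the PCA step with scikit-learn: first inspect the cumulative explained variance, then keep 200 components and project both splits (the use of scikit-learn's `PCA` here is an assumption):

```python
from sklearn.decomposition import PCA

# Fit a full PCA once, just to inspect the cumulative explained variance.
pca_full = PCA().fit(X_train)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print("components needed for 95 % variance:", int(np.argmax(cumulative >= 0.95)) + 1)

# Keep 200 components and project the train and test sets into that space.
pca = PCA(n_components=200).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
```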
Supervised Learning Techniques
With the reduced dataset, we applied two supervised learning techniques: Logistic Regression and Gaussian Naive Bayes Classifier. These methods are relatively simple but effective for classification tasks.
Logistic Regression
We trained a Logistic Regression model on the CIFAR-3-GRAY dataset.
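A sketch with scikit-learn's `LogisticRegression`, assuming it was trained on the PCA-reduced features:

```python
from sklearn.linear_model import LogisticRegression

# Multinomial logistic regression on the 200 PCA components.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train_pca, y_train)
print("LR test accuracy:", log_reg.score(X_test_pca, y_test))
```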
59.5 % accuracy! Quite decent… especially given the simplicity of the method.
Gaussian Naive Bayes Classifier
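Same exercise with scikit-learn's `GaussianNB`, again on the PCA-reduced features (an assumption):

```python
from sklearn.naive_bayes import GaussianNB

# Gaussian Naive Bayes: models each feature as a per-class Gaussian.
gnb = GaussianNB()
gnb.fit(X_train_pca, y_train)
print("GNB test accuracy:", gnb.score(X_test_pca, y_test))
```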
Pretty much the same results as Logistic Regression. Pretty nice!
Overfitting? Underfitting?
If we do not want to gloss over overfitting/underfitting problems, we need to compare the accuracy of the model on the train set with its accuracy on the test set. If the train values are way higher than the test values, the model is overfitting.
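For instance, a quick sketch of that comparison with the two classifiers above:

```python
# Accuracy on data the models have seen vs. data they have not.
for name, clf in [("LogisticRegression", log_reg), ("GaussianNB", gnb)]:
    print(f"{name}: train {clf.score(X_train_pca, y_train):.3f}"
          f" / test {clf.score(X_test_pca, y_test):.3f}")
```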
The LR training accuracy is slightly above the test accuracy, with a 3.6 % gap. However, this is low enough to say that there is no overfitting in either of these models.
Deep Learning with Multilayer Perceptron
The good stuff! We then explored deep learning using a Multilayer Perceptron (MLP) model. We designed an MLP with input, hidden, and output layers.
My very first model 🥹
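A sketch of such an MLP in Keras, with two hidden layers; the layer sizes, the optimizer and the sparse categorical loss are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 3

# Input layer, two hidden layers, and a softmax output over the 3 classes.
# The hidden-layer sizes (512 and 256) are placeholder values.
model1 = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

model1.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])
model1.summary()
```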
Alright! Let's see how it does…
outputs1 = model1.fit(X_train, y_train, epochs=10, batch_size=16,
                      validation_data=(X_test, y_test))

We have two hidden layers here. The results are alright compared to the LR and the GNB: we got over the 70 % accuracy bar. But the loss is pretty high, and we will need to play around with the hyperparameters to squeeze out more accuracy.
But that's not all: we can see the model starting to overfit at the 20-epoch mark. After this point, the losses on the training and validation sets start to diverge significantly: the loss keeps shrinking on the training set while it stops improving on the validation set. Furthermore, we can see that the model becomes much better at predicting on the training set than on the validation set around that same 20-epoch mark. That's our "sweet spot" then.
Overfitting, I'm coming for you…
Let's first fight this overfitting by adding some dropout layers. Let's also stop training earlier, as the loss and accuracy seem to stabilize pretty early in the epochs.
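A sketch of that variant: the same architecture with `Dropout` after each hidden layer and a shorter training run (the 0.3 rate and the epoch count are assumptions):

```python
model2 = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.3),   # randomly silence 30 % of the units at each step
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(num_classes, activation="softmax"),
])
model2.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])

# Stop well before the ~20-epoch mark where overfitting kicked in.
outputs2 = model2.fit(X_train, y_train, epochs=10, batch_size=16,
                      validation_data=(X_test, y_test))
```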

The more, the merrier, right?
Alright, I am a naive engineering student. I believe that if I add more parameters, I get a better model, right?! Let's do this then. We went from 2 hidden layers to 4, with a lot more neurons:
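A sketch of that bigger network, with 4 hidden layers and a lot more neurons per layer (the exact sizes are assumptions):

```python
model3 = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    layers.Dense(2048, activation="relu"),
    layers.Dense(1024, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model3.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])
outputs3 = model3.fit(X_train, y_train, epochs=10, batch_size=16,
                      validation_data=(X_test, y_test))
```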

Oh. We get almost the same results as with the previous model. This shows that more neurons do not automatically mean more accuracy; adding complexity can even be harmful to the model.
Advancing with Convolutional Neural Networks
Finally, we used Convolutional Neural Networks (CNN) for processing color images. The CNN model included convolutional layers, pooling, and dense layers.
CNN model
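A sketch of a first CNN in that spirit; the images are reshaped back to 32 × 32 × 3, and the filter counts and kernel sizes are assumptions:

```python
# CNNs work on the 2-D colour images, so undo the flattening first.
X_train_img = X_train.reshape(-1, 32, 32, 3)
X_test_img = X_test.reshape(-1, 32, 32, 3)

cnn1 = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
cnn1.compile(optimizer="adam",
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])
```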
Let's run it!
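Training it the same way as the MLPs (the epoch count is again an assumption):

```python
cnn_out1 = cnn1.fit(X_train_img, y_train, epochs=10, batch_size=16,
                    validation_data=(X_test_img, y_test))
```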

The results are way better than with the MLP and the statistical models. We passed the 80 % accuracy bar, and the loss dropped below 0.4.
Making it better
Let's reduce the complexity, but add some pooling layers and some dropout. We want to diversify the patterns learned by the model. Pooling performs a simple statistical spatial filtering (for instance keeping the maximum of each small region), which summarizes local areas of the feature maps. Let's also reduce the learning rate, add some convolutional layers and add some dense layers.
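A sketch of the revised CNN along those lines: pooling and dropout after each convolutional stage, an extra convolutional and dense layer, and a lower learning rate (every value here is an assumption):

```python
cnn2 = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
# Lower learning rate than the Adam default of 1e-3.
cnn2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])
cnn_out2 = cnn2.fit(X_train_img, y_train, epochs=15, batch_size=16,
                    validation_data=(X_test_img, y_test))
```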

We still start to overfit around the 10th epoch, but we now get pretty good results, with over 85 % accuracy.
Conclusion
The CNN outperforms the MLP and simpler statistical models significantly, achieving an accuracy exceeding 85 %, while the MLP caps at 75 %. This notable difference in performance can be attributed to several key advantages of CNNs over MLPs.
The convolutional nature of CNNs allows them to excel in finding localized patterns within the data. Unlike MLPs, which attempt to identify patterns across the entire image, CNNs focus on smaller, local regions. This enables them to capture intricate details, making them particularly effective in image-related tasks.
Pooling also enhances the robustness of CNNs by making them less sensitive to spatial variations in the input data. MLPs, on the other hand, are more susceptible to changes in the spatial arrangement of features. Pooling helps CNNs still find a pattern despite small changes in its position.
CNNs feature parameter sharing, which reduces the number of parameters and the overall complexity compared to MLPs. By sharing weights across different regions of the input, CNNs efficiently learn and recognize features in various parts of the image. This not only contributes to a more compact model but also enhances the network's ability to generalize to new data, making CNNs less susceptible to overfitting. With fewer parameters to adjust, CNNs are better equipped to discern meaningful features from the data and are less likely than MLPs to memorize noise or irrelevant patterns during training. This contributes to the CNN's robustness and improved performance on unseen data.