Web Application with ML
Posted on January 16, 2025 • 4 minutes • 791 words
Here is a video example:
I created a basic application with two ways to detect numbers from 0-9:
- You can click on a number image, and the digit is detected using a deep learning model.
- You can draw a number, and the model detects which digit it is.
The dataset for training and testing numbers was assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. It consists of 60,000 training images and 10,000 test images. Instead of using test images, I am using my input from either the clicked number or the drawn number.
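If you want to sanity-check the dataset yourself, a small sketch like the following (assuming Keras is installed) prints the shapes of the training and test sets:
from keras.datasets import mnist
# Load MNIST: 60,000 training images and 10,000 test images, each 28x28 pixels
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)
print(train_labels[:5])    # the first few labels, digits between 0 and 9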
Problem
The problem we are solving here is to classify grayscale images of handwritten digits (28 × 28 pixels) into 10 categories (0 through 9).
How to build this application?
Create a basic web application with:
- Backend: FastAPI (FastAPI Documentation)
- Frontend: Basic JavaScript, HTML, and CSS
- Dataset: MNIST (Learn about MNIST) - grayscale images of handwritten digits
- Deep Learning Framework: Keras (About Keras) for creating and training the neural network
- Canva: For creating different handwritten number images (Canva)
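To give a rough idea of how these pieces fit together before diving in, here is a minimal FastAPI skeleton. This is only a sketch under my own assumptions; the module name main and the directory name static are mine, not necessarily what the repository uses:
# main.py - a minimal backend sketch, not the exact code from the repository
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the frontend to call the API from the browser
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Serve the HTML, CSS, JavaScript, and number images under /static
app.mount("/static", StaticFiles(directory="static"), name="static")

# Run with: uvicorn main:app --reload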
Step 1: Create the Neural Network
Follow these steps to build the neural network:
- Get the training and testing data from MNIST
from keras.datasets import mnist
from keras.models import load_model  # can be used later to reload a saved model

# Load the MNIST training and test sets
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
- Create deep learning layers and define activation functions
from keras import layers, Sequential
model = Sequential([
layers.Dense(512, activation="relu"),
layers.Dense(10, activation="softmax")
])
- Compile the model with an optimizer, loss function, and metrics
model.compile(optimizer="adam",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"])
- Preprocess the training data and train the model
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
model.fit(train_images, train_labels, epochs=5, batch_size=128)
- Test the trained model with sample test data
# Take one sample from the test set and preprocess it like the training data
test_image = test_images[0].astype("float32") / 255
test_digit = test_image.reshape(1, test_image.size)
predictions = model.predict(test_digit)
print("\nPredictions:", predictions, "\nPrediction MaxValue:", predictions.argmax(), "\n")
Step 2: Integrate Backend and Frontend
Now, the backend and frontend need to provide input to the trained neural network. The input must be in grayscale format, as the training dataset was grayscale.
For Image Input
- For the click-to-detect functionality, display the number images using basic HTML.
<img src="/static/images/1.png" alt="Image 1" data-image-name="/static/images/1.png">
- For drawing a number, use a canvas element and send its contents to the backend (for example, by converting the canvas to a blob with canvas.toBlob).
<canvas id="drawingArea" width="280" height="280"></canvas>
- Ensure the image is sent to the backend in blob format via a POST API. Here is the JavaScript code used:
images.forEach(image => {
    image.addEventListener('click', async () => {
        const imageName = image.getAttribute('data-image-name');

        // Fetch the image as a Blob
        const response = await fetch(imageName);
        const blob = await response.blob();

        // Create a FormData object to send the file
        const formData = new FormData();
        formData.append('file', blob, imageName);

        // Send the image to the backend
        try {
            const result = await fetch('http://127.0.0.1:8000/number_detection', {
                method: 'POST',
                body: formData
            });

            if (result.ok) {
                const jsonResult = await result.json();
                document.getElementById("output_number").value = jsonResult.result;
            } else {
                console.error('Error processing image:', result.statusText);
            }
        } catch (error) {
            console.error('Error uploading image:', error);
        }
    });
});
Image resize and reshape to 1D array
The backend receives the image, resizes it to 28x28 pixels, and flattens it into a 1D array, because our neural network only accepts input in that format.
# Here "image" is a PIL Image built from the uploaded file, and np is NumPy (import numpy as np)
# Resize to 28x28 pixels
image = image.resize((28, 28))
# Convert to NumPy array and normalize pixel values to [0, 1]
image_array = np.asarray(image, dtype=np.float32) / 255.0
# Flatten the array to 1D
flattened_array = image_array.flatten()
Predict the image
Now, pass this resized and flattened image to your neural network and check whether it predicts the digit in the image correctly. The result prints two parts:
Predictions: [[7.1005906e-07 9.0936519e-02 8.9278919e-01 5.7332212e-04 4.2719766e-05
7.1836407e-03 7.6220306e-03 5.4302714e-06 8.4631785e-04 1.0209130e-07]]
Prediction MaxValue: 2
Here:
- Predictions: Since the final dense layer has 10 neurons (layers.Dense(10, activation="softmax")), we get 10 values in the prediction. Each value is a probability describing how confident the model is that the input corresponds to a specific digit (index 0 is the digit 0, index 1 is the digit 1, and so on). For example, 8.9278919e-01 (or 0.8928) at index 2 is the probability that the input represents the digit 2.
- Prediction MaxValue: The index with the highest probability in the list above is the predicted digit. In our case, 8.9278919e-01 is the maximum probability and it sits at index 2, so the predicted number is 2. Display this number in the output block and you are good to go.
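Putting the backend side together, here is a sketch of how the /number_detection endpoint could look. This is my reconstruction under a few assumptions (Pillow for image handling, and the model saved as mnist_model.keras as in Step 1), not the exact code from the repository:
import io

import numpy as np
from fastapi import FastAPI, File, UploadFile
from keras.models import load_model
from PIL import Image

app = FastAPI()
model = load_model("mnist_model.keras")  # assumed file name from Step 1

@app.post("/number_detection")
async def number_detection(file: UploadFile = File(...)):
    # Read the uploaded blob and open it as a grayscale PIL image
    contents = await file.read()
    image = Image.open(io.BytesIO(contents)).convert("L")

    # Resize to 28x28, normalize to [0, 1], and flatten to match the training format
    image = image.resize((28, 28))
    image_array = np.asarray(image, dtype=np.float32) / 255.0
    flattened_array = image_array.flatten().reshape(1, 28 * 28)

    # Predict and return the most likely digit as JSON
    predictions = model.predict(flattened_array)
    return {"result": int(predictions.argmax())}
Depending on how the clicked or drawn images are rendered, you may also need to invert the pixel values so the digit is light on a dark background, since that is how the MNIST digits are stored.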
And this is how we create a basic web application and predict handwritten numbers using deep learning. This is my first app as well; I enjoyed understanding deep learning with an application in action, so I hope you enjoyed it too.
You can find the complete code here: https://github.com/neetaBirajdar/machine_learning_web_app
Okay then, see you in my next blog!