Human beings are very good at identifying and detecting objects in their surroundings, regardless of the objects' location or situation: when they are upside down, different in color or shape, partially occluded, and so on. As a result, humans make object detection and identification look trivial. On a machine, however, object identification and recognition require a great deal of computation to extract any knowledge about the shapes and structures in an image. So, today we are going to build a real-time object detection model with a modified YOLO neural network.

Object identification and detection in computer vision is the process of finding and recognizing objects in an image or video. The key steps involved in object detection are:

  • 1. Feature Extraction
  • 2. Feature Processing
  • 3. Object Classification

Several traditional approaches to object detection obtained outstanding results. Their pipelines can be summarized in the following four stages:

  • 1. Feature Extraction
  • 2. Feature Coding
  • 3. Feature Aggregation
  • 4. Classification
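The four stages above can be illustrated with a bag-of-visual-words pipeline, one classic traditional design of this kind. The sketch below is an illustration under assumptions, not the method of any specific system: the local descriptors are taken as already extracted (in practice they would come from a hand-crafted extractor such as SIFT or HOG), the codebook and class centroids are toy values, and a nearest-centroid rule stands in for the SVM usually used at the classification stage. All function names are hypothetical.

```python
import numpy as np


def encode_descriptors(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Feature coding: hard-assign each local descriptor to its nearest codeword."""
    # Squared Euclidean distances, shape (num_descriptors, num_codewords)
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def aggregate(codes: np.ndarray, num_codewords: int) -> np.ndarray:
    """Feature aggregation: pool the codes into an L1-normalized histogram."""
    hist = np.bincount(codes, minlength=num_codewords).astype(float)
    return hist / max(hist.sum(), 1.0)


def classify(histogram: np.ndarray, class_centroids: dict) -> str:
    """Classification: nearest class centroid (a stand-in for e.g. an SVM)."""
    return min(class_centroids,
               key=lambda label: np.linalg.norm(histogram - class_centroids[label]))


# Toy 2-D "descriptors" (feature extraction assumed done) and a 3-word codebook
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
codes = encode_descriptors(descriptors, codebook)
hist = aggregate(codes, len(codebook))
centroids = {"object": np.array([0.25, 0.5, 0.25]),
             "background": np.array([1.0, 0.0, 0.0])}
print(codes, hist, classify(hist, centroids))
```

Each stage hands a fixed-size representation to the next, which is why hand-crafted pipelines like this one are usually tuned stage by stage rather than trained end to end.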

Region-based algorithms such as R-CNN (Region-based Convolutional Neural Network) and Fast R-CNN do not extract features from the image as a whole; they classify individual region proposals. The “You Only Look Once” (YOLO) algorithm instead looks at the entire image in a single pass, using a CNN to predict bounding boxes and class probabilities for those boxes. As a result, YOLO detects objects faster than these algorithms. The modified YOLO network improves on the original in the following ways:

  • 1. The loss function of the modified YOLO network is optimized.
  • 2. An Inception-style module has been added to the network structure.
  • 3. A spatial pyramid pooling (SPP) layer is used.
  • 4. The proposed model extracts features from images efficiently and improves object detection significantly.
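The exact spatial pyramid pooling configuration of the modified network is not specified here, so the sketch below assumes the YOLOv3-SPP style: stride-1 max pooling with "same" padding at kernel sizes 5, 9 and 13, with the pooled maps concatenated to the input along the channel axis. It operates on a single (C, H, W) NumPy feature map; the function names are hypothetical and this is not taken from the project code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_same(x: np.ndarray, k: int) -> np.ndarray:
    """Stride-1 max pooling over a (C, H, W) map with 'same' padding (k must be odd)."""
    pad = k // 2
    # Pad with -inf so that padded positions can never be the maximum
    padded = np.pad(x.astype(float), ((0, 0), (pad, pad), (pad, pad)),
                    constant_values=-np.inf)
    # All k x k windows over the spatial axes: shape (C, H, W, k, k)
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))
    return windows.max(axis=(-2, -1))


def spatial_pyramid_pool(x: np.ndarray, kernel_sizes=(5, 9, 13)) -> np.ndarray:
    """Concatenate the input with max-pooled copies of itself along the channel axis."""
    pooled = [max_pool_same(x, k) for k in kernel_sizes]
    return np.concatenate([x.astype(float)] + pooled, axis=0)


# Toy 2-channel 13 x 13 feature map (the grid size of YOLOv3's coarsest scale)
feature_map = np.arange(2 * 13 * 13, dtype=float).reshape(2, 13, 13)
out = spatial_pyramid_pool(feature_map)
print(out.shape)  # (8, 13, 13)
```

Because every pooled map keeps the input's spatial size, the layer multiplies the channel count by four (here 2 to 8) while mixing in context from progressively larger neighbourhoods, without changing the grid the detector predicts on.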


First, an image is taken as input, and then the YOLO algorithm is applied. In our case, the image is divided into a 3 x 3 grid; depending on the image's complexity, it can be divided into any number of grid cells. After the image has been divided, objects are classified and localized in each grid cell, and the confidence score (objectness) of each cell is determined. If no object can be identified in a cell, its objectness and bounding box values are 0. If an object is identified in the cell, its objectness is 1 and its bounding box values are the object's bounding box coordinates.
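The grid encoding described above can be sketched as a target tensor holding one vector per cell. This assumes the common YOLO convention that the cell containing a box's centre is responsible for that object, stores box coordinates normalized to the image size, and appends a one-hot class vector; `encode_grid_targets` and its argument layout are hypothetical, not the project's own code.

```python
import numpy as np


def encode_grid_targets(boxes, image_size, num_classes, grid_size=3):
    """Build a (grid_size, grid_size, 5 + num_classes) training target.

    boxes: iterable of (x_min, y_min, x_max, y_max, class_id), in pixels.
    Each cell holds [objectness, cx, cy, w, h, one-hot class]; cells without
    an object stay all zeros (objectness 0, bounding box 0).
    """
    width, height = image_size
    target = np.zeros((grid_size, grid_size, 5 + num_classes))
    for x_min, y_min, x_max, y_max, class_id in boxes:
        cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
        # The cell containing the box centre is responsible for the object
        # (min() clamps centres lying exactly on the right/bottom edge)
        col = min(int(cx / width * grid_size), grid_size - 1)
        row = min(int(cy / height * grid_size), grid_size - 1)
        cell = np.zeros(5 + num_classes)
        cell[:5] = [1.0, cx / width, cy / height,
                    (x_max - x_min) / width, (y_max - y_min) / height]
        cell[5 + class_id] = 1.0
        target[row, col] = cell  # one object per cell in this simple sketch
    return target


# A 300 x 300 image with one object of class 1 centred in the middle cell
target = encode_grid_targets([(120, 130, 180, 170, 1)], (300, 300), num_classes=2)
print(target[..., 0])  # objectness map: 1 only in the centre cell
```

Only the responsible cell carries a non-zero objectness, which is what lets the network learn to output near-zero confidence for empty cells.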


In a working directory of your choice, create a virtual environment with python3 -m venv <ENV_NAME>, then activate it with source <ENV_NAME>/bin/activate. Finally, install the packages using:

>>> pip install -r requirements.txt


Create a MOV video recording and move it to the working directory. Run ./ <PATH_TO_MOV> to create subdirectories, convert the MOV video to MP4, and download the YOLOv3 dependencies for OpenCV.


  • 1. wget -P yolov3
  • 2. wget -P yolov3
  • 3. wget -P yolov3


You can get the input for this here; extract it into the repository folder.

Python Implementation:
  • 1. Inspiration: Inspired by a project assignment from the course Computer Vision I at
  • 2. Network Used: You Only Look Once (YOLO) Network.

Note: If you face any problem, kindly raise an issue.


From the active environment, run python and observe the frame-by-frame computations while the annotated output video is written.
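Before each frame is annotated, the raw YOLO predictions have to be filtered. The sketch below assumes each prediction row follows the [cx, cy, w, h, objectness, class scores...] layout with coordinates normalized to the frame, takes the final confidence as objectness × best class score, and then applies greedy, class-agnostic non-maximum suppression. The helper names and thresholds are illustrative, not the project's; OpenCV's DNN module also provides `cv2.dnn.NMSBoxes` for the suppression step.

```python
import numpy as np


def decode_detections(rows, frame_w, frame_h, conf_threshold=0.5):
    """Convert 2-D YOLO-style rows [cx, cy, w, h, objectness, class scores...]
    (coordinates normalized to [0, 1]) into pixel boxes [x1, y1, x2, y2]."""
    rows = np.asarray(rows, dtype=float)
    class_ids = rows[:, 5:].argmax(axis=1)
    # Assumption: final confidence = objectness * best class score
    conf = rows[:, 4] * rows[np.arange(len(rows)), 5 + class_ids]
    keep = conf >= conf_threshold
    cx, cy, w, h = (rows[keep, i] for i in range(4))
    boxes = np.stack([(cx - w / 2) * frame_w, (cy - h / 2) * frame_h,
                      (cx + w / 2) * frame_w, (cy + h / 2) * frame_h], axis=1)
    return boxes, conf[keep], class_ids[keep]


def iou(box, others):
    """Intersection over union of one [x1, y1, x2, y2] box against an (N, 4) array."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_others = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area_box + area_others - inter)


def non_max_suppression(boxes, scores, iou_threshold=0.4):
    """Greedy, class-agnostic NMS: keep the best box, drop its heavy overlaps, repeat."""
    order = np.argsort(scores)[::-1]
    kept = []
    while order.size > 0:
        best = order[0]
        kept.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_threshold]
    return kept


# Toy output for one 100 x 100 frame with two classes
rows = [
    [0.50, 0.50, 0.20, 0.20, 0.9, 0.1, 0.8],  # object of class 1
    [0.51, 0.50, 0.20, 0.20, 0.8, 0.2, 0.7],  # near-duplicate of the row above
    [0.20, 0.20, 0.10, 0.10, 0.9, 0.9, 0.1],  # separate object of class 0
    [0.80, 0.80, 0.10, 0.10, 0.2, 0.5, 0.5],  # low confidence, filtered out
]
boxes, scores, class_ids = decode_detections(rows, 100, 100)
kept = non_max_suppression(boxes, scores)
print(kept, class_ids[kept])
```

Here the low-confidence row is dropped by the threshold and the near-duplicate is removed by NMS, leaving one box per object; in a video loop this would run once per frame before drawing.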


The machine configuration for our experiment was as follows:

System:
  • 1. RAM: 8 GB or more
  • 2. Operating System: Windows 10 or Linux
  • 3. Hard Disk Size: 1 TB

Software:
  • 1. Python
  • 2. Keras (TensorFlow backend)
  • 3. Anaconda
  • 4. Object Detection Network: YOLO


In this project, we built a real-time object detection model with a modified YOLO neural network. When generalizing from natural images to other domains, this algorithm outperforms other techniques. Compared with other algorithms, it is easier to implement, more efficient, and faster in real-time use. Also, because YOLO is trained on complete images when predicting bounding boxes, it produces fewer false positives in background areas.

☺ Thanks for your time ☺

What do you think of this “Real-Time Object Detection with Modified YOLO Neural Network” article? Let us know in the comments. (Appreciation, suggestions, and questions are all welcome.)
