Object recognition using neural networks (YOLOv7, etc.)
Project details
1. Required skills:
– Basic knowledge of statistics and probability theory.
– Working knowledge of neural-network methodology and elementary architectures, such as dense (fully connected) networks and convolutional neural networks (CNNs).
– Familiarity with a framework for building and training neural networks, for example PyTorch or TensorFlow.
2. Data format.
The data format must be fixed for the entire training process. Neural networks are inflexible in this regard, and the widely used image resizing (so-called re-scaling) can distort the image and significantly reduce the probability of detecting a small object that occupies only a few, or a few dozen, pixels.
Example:
– input image size: M × N pixels;
– each pixel is coded with a single number: an intensity in [0, 1].
For example: 256 × 192.
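As an illustration of this fixed format, here is a minimal Python sketch that loads a grayscale thermal image into the 256 × 192, [0, 1] representation without any rescaling. The helper name and the use of PIL/NumPy are assumptions for illustration, not part of the project specification.

```python
import numpy as np
from PIL import Image

IMG_W, IMG_H = 256, 192  # fixed M x N format for the whole pipeline


def load_thermal_image(path: str) -> np.ndarray:
    """Load a grayscale thermal image and normalize intensities to [0, 1]."""
    img = Image.open(path).convert("L")  # single intensity channel
    if img.size != (IMG_W, IMG_H):
        # Fail loudly instead of silently re-scaling, which could distort
        # small objects as discussed above.
        raise ValueError(f"expected {IMG_W}x{IMG_H}, got {img.size}")
    return np.asarray(img, dtype=np.float32) / 255.0  # shape (H, W) in [0, 1]
```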
3. Development of a neural network for object identification and verification.
It should be a CNN architecture like AlexNet, VGG, or similar.
For training, it is better to start from an already trained (pretrained) network, adding layers that adapt the input image at the beginning and layers that model the desired output at the end, as in the sketch below.
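A minimal PyTorch sketch of this setup, assuming a VGG16 backbone, a single-channel thermal input, and a classification-style output head; the backbone choice, NUM_CLASSES, and the layer sizes are illustrative assumptions, not fixed by the project.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # assumption: e.g. object present / absent; adjust to the task


class ThermalClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Added layer at the beginning: map the 1-channel thermal input to the
        # 3 channels the pretrained backbone expects. (The pretrained weights
        # were fit on ImageNet statistics, so fine-tuning is still required.)
        self.input_adapt = nn.Conv2d(1, 3, kernel_size=1)
        backbone = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        # Added layers at the end: a new head modeling the desired output.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 256),
            nn.ReLU(),
            nn.Linear(256, NUM_CLASSES),
        )

    def forward(self, x):  # x: (batch, 1, 192, 256)
        x = self.input_adapt(x)
        x = self.features(x)
        x = self.pool(x)
        return self.head(x)
```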
4. Preparation of a training sample for training.
This is the most complex task and involves several steps (a synthesis sketch follows the list):
(a) Acquisition of thermal images from the camera in their final form.
(b) Development of an algorithm for synthesizing artificial images of the desired type, with variations in the number, shape, and size of the objects to be identified. The algorithm should include a well-designed, controlled probability distribution with the parameters needed to steer image generation.
(c) Generating the required number of images.
(d) Adding realistic image noise, either with a separate ready-made algorithm or as part of the synthesis algorithm from point (b).
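A minimal sketch of points (b)–(d), assuming rectangular "hot" objects and additive Gaussian noise; all distribution parameters are illustrative placeholders for the controlled probability distribution described in point (b).

```python
import numpy as np

rng = np.random.default_rng(seed=0)
IMG_W, IMG_H = 256, 192


def synthesize_image(max_objects: int = 5) -> tuple[np.ndarray, list]:
    """Point (b)/(c): return a noiseless image in [0, 1] and the object boxes."""
    img = np.zeros((IMG_H, IMG_W), dtype=np.float32)
    boxes = []
    for _ in range(rng.integers(1, max_objects + 1)):  # controlled object count
        w, h = rng.integers(3, 20, size=2)             # object size in pixels
        x = rng.integers(0, IMG_W - w)                 # uniform position
        y = rng.integers(0, IMG_H - h)
        intensity = rng.uniform(0.5, 1.0)              # "hot" object intensity
        img[y:y + h, x:x + w] = np.maximum(img[y:y + h, x:x + w], intensity)
        boxes.append((x, y, w, h))
    return img, boxes


def add_noise(img: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Point (d): additive Gaussian noise as a simple stand-in for sensor noise."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
```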
5. Training of both neural networks.
Training should be carried out by gradient descent, using either momentum (inertia) or an Adam-type algorithm. The learning rate should be reduced several-fold (for example, divided by 10) two or three times during training.
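A minimal PyTorch sketch of this schedule, using SGD with momentum and a MultiStepLR scheduler that divides the learning rate by 10 twice; the milestone epochs, initial rate, and the train_loader data loader are assumptions for illustration.

```python
import torch

model = ThermalClassifier()  # from the sketch in point 3
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# Alternative: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60], gamma=0.1  # divide the rate by 10 twice
)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(90):
    for images, labels in train_loader:  # train_loader: assumed DataLoader
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step once per epoch so the milestones are in epochs
```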
Outer loops can be added for choosing the architecture and training hyperparameters from a finite (possibly adaptive) set of candidates.