Have you ever wondered how your mobile uses facial recognition to unlock itself? Or how deep learning models detect anomalies in an image? Or how Instagram filters manipulate the images they receive?
In this post, we are going to understand how image processing and computer vision work, and how they are combined with deep learning to create innovative solutions for many day-to-day problems. To get the most out of this article, I recommend reading my article on Artificial Intelligence and Machine Learning first.
Image processing, as the name suggests, is the application of algorithms and techniques that manipulate or modify an image to make it suitable for a given task and use case. Almost every one of us has used image processing at some point: when we use portrait mode to take a selfie, for example, we are using image processing to blur the background.
Computer Vision, on the other hand, is an application of Artificial Intelligence and uses algorithms and techniques to identify patterns within image data. To relate it to image processing, take the portrait-mode example again: image processing blurs the background, artificial intelligence identifies which part of the frame is the background, and the complete system built from the two is a Computer Vision solution.
The answer to the question “How does our mobile know which objects to blur and which to leave sharp?” is Artificial Intelligence or Deep Learning. The answer to the question “How does our mobile blur those objects?” is Image Processing. And the answer to “How does our mobile implement portrait mode?” is Computer Vision.
Image processing is very commonly used to pre-process an image before computer vision algorithms are run on it. As the applications of Artificial Intelligence grow, the fields of Computer Vision and Image Processing are progressing alongside it. This is intuitive: as the use cases of a technology diversify, the demand for its pre-processing techniques grows too.
Now imagine you are given the task of creating an Instagram filter that recognizes the user (a human), blurs everything in the background, and also darkens the background of the image. For now, let us assume that we already have an AI model that detects the user in the image and outputs their coordinates. How can we approach this problem?
While developing AI solutions like this, we need to remember one thing: most of the time in computer vision, models are not trained on color images. The images are first converted to grayscale (black and white) and then used. That is because color images are much more complex to process and take considerably more time to train on. Let us understand the reason behind this.
For processing purposes, images are stored as matrices (2-D arrays). In a color image, each element of the matrix contains 3 numbers, representing the intensities of Red, Green, and Blue on a scale of 0 to 255. In a grayscale image, each element contains only a single number from 0 to 255, where 0 represents black and 255 represents white. These elements are called pixels, and the separate Red, Green, and Blue components are called color channels. Processing three color channels requires considerably more computation and time, and since that can easily be avoided by using grayscale images, most models today take only grayscale images as input.
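To make this concrete, here is a minimal NumPy sketch of the two representations. The image is random dummy data, and the weighted "luma" average used for the conversion is one common convention, not the only one:

```python
import numpy as np

# A hypothetical 4x4 color image: each pixel holds 3 channel values
# (R, G, B), each in the range 0-255.
color_img = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
print(color_img.shape)  # (4, 4, 3) -> height, width, 3 channels

# A grayscale version holds one intensity per pixel (0 = black, 255 = white).
# A common conversion is a weighted average of the three channels.
gray_img = (0.299 * color_img[..., 0]
            + 0.587 * color_img[..., 1]
            + 0.114 * color_img[..., 2]).astype(np.uint8)
print(gray_img.shape)   # (4, 4) -> one value per pixel, a third of the data
```

The grayscale array carries one number per pixel instead of three, which is exactly why it is cheaper to process and train on.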
Coming back to our problem, the first step is to convert the input image from color to grayscale. We can do that with a pre-built algorithm and then feed the result to our AI model. The model processes this image and returns the coordinates of the detected user, which we store for later. Next, we create our blurring function and the function that darkens the background.
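The steps above can be sketched as follows. Note that `detect_user` is a hypothetical stand-in for the AI model we assumed earlier; here it just hard-codes a bounding box so the pipeline runs end to end:

```python
import numpy as np

def to_grayscale(img):
    """Convert an (H, W, 3) color image to (H, W) grayscale (luma weights)."""
    return (img[..., :3] @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

def detect_user(gray):
    """Stand-in for the AI model: returns an (x, y, w, h) bounding box.
    A real detector would be a trained network; this one is hard-coded."""
    return (2, 2, 4, 4)

img = np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8)
gray = to_grayscale(img)
x, y, w, h = detect_user(gray)

# Boolean mask: True over the detected user, False over the background.
mask = np.zeros(gray.shape, dtype=bool)
mask[y:y + h, x:x + w] = True
```

The mask built from the model's coordinates is what lets the next two steps leave the user untouched.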
There are many popular algorithms for blurring a background, one of the most famous being Gaussian Blur; you can read more about it on Wikipedia. Once we have selected an algorithm and written the function, we need to apply it to the image. But how can we leave the detected user out of our processing? We will use something called a mask. A mask covers part of the image and prevents it from being processed; once the rest of the image is processed, we can remove the mask.
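A minimal sketch of the idea, using a crude mean (box) blur in plain NumPy as a stand-in for a true Gaussian blur (in practice you would likely reach for a library routine such as OpenCV's `cv2.GaussianBlur`), applied to a grayscale image for brevity (a color image would be treated channel by channel):

```python
import numpy as np

def box_blur(img, k=5):
    """Crude k x k mean blur, standing in for Gaussian blur."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge").astype(float)
    out = np.zeros(img.shape, dtype=float)
    # Sum all k*k shifted copies of the image, then average.
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            out += padded[pad + dy: pad + dy + img.shape[0],
                          pad + dx: pad + dx + img.shape[1]]
    return (out / k**2).astype(img.dtype)

img = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True  # True over the detected user

blurred = box_blur(img)
# Keep the user's original pixels; use the blurred values everywhere else.
result = np.where(mask, img, blurred)
```

The `np.where` call is the "mask" in action: wherever the mask is True, the original pixel survives untouched.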
Once we have our blurred image, we move on to darkening the background. For that, we reuse the same mask and darken the image by decreasing the value of each color channel of every unmasked pixel. After this, we can remove the mask and output our freshly processed image.
All the strategies and techniques discussed in this segment are part of image processing. Computer Vision also includes the Artificial Intelligence used to build the model that detects the user in the image; developing that model again involves a lot of image processing along with deep learning algorithms.
Computer vision is not limited to combining image processing with AI; it can also include signal processing. It is one of the fastest-growing fields, with numerous applications being implemented and used as I write this article. For innovative minds, there is little to restrict them from finding more and more applications for computer vision, especially now that its applications have diversified into Healthcare, Space (Astronomical) Research, Crime Control, and more.