Posts

Document Orientation Correction

Another application of the technique explained in the article “Discovering a small image inside a large image using Python and OpenCV” is using the same code to find a special mark in an ID card scan and then transforming the scan based on where that mark appears.

For example, here we give the program a sample certificate that was scanned upside down. After locating the mark that belongs in the upper-right corner of the card, we rotate the image as many times as needed to display the certificate in the correct orientation.

Sample code:

import cv2
import numpy as np

def find_image_in_larger_image(small_image, large_image):
    # Check if images are loaded successfully
    if small_image is None or large_image is None:
        print("Error: Unable to load images.")
        return None

    # Get dimensions of the large image
    large_height, large_width = large_image.shape

    # Find the template (small image) within the larger image
    result = cv2.matchTemplate(large_image, small_image, cv2.TM_CCOEFF_NORMED)

    # Define a threshold to consider a match
    threshold = 0.6

    # Find locations where the correlation coefficient is at least the threshold
    locations = cv2.findNonZero((result >= threshold).astype(np.uint8))

    # If no match is found
    if locations is None:
        return None

    # Determine which quadrant of the card each match falls in
    matched_positions = []
    for loc in locations:
        x, y = loc[0]
        position_x = "left" if x < large_width / 2 else "right"
        position_y = "top" if y < large_height / 2 else "bottom"
        matched_positions.append((position_x, position_y))

    return matched_positions

# Example usage
small_image_path = "mark.jpg"
large_image_path = "card.jpg"

small_image = cv2.imread(small_image_path, cv2.IMREAD_GRAYSCALE)
large_image = cv2.imread(large_image_path, cv2.IMREAD_GRAYSCALE)
rotated_large_image = large_image

positions = find_image_in_larger_image(small_image, large_image)
max_rotation = 10  # Set the maximum rotation limit

if positions:
    position_x, position_y = positions[0]
    print("Position: {}, {}".format(position_x, position_y))
    # Rotate 90 degrees at a time until the mark sits in the upper-right corner
    while max_rotation > 0:
        max_rotation -= 1
        rotated_large_image = cv2.rotate(rotated_large_image, cv2.ROTATE_90_CLOCKWISE)
        positions = find_image_in_larger_image(small_image, rotated_large_image)
        if positions:
            position_x, position_y = positions[0]
            print("Position: {}, {}".format(position_x, position_y))
            if position_x == "right" and position_y == "top":
                cv2.imshow("Mark", small_image)
                cv2.imshow("Original", large_image)
                cv2.imshow("Result", rotated_large_image)
                cv2.waitKey(0)
                cv2.destroyAllWindows()
                max_rotation = 0
        else:
            print("No match found after {} rotations.".format(10 - max_rotation))
else:
    print("No match found.")

Click here to view on GitHub.

Face Detection with HOG and MTCNN

Exploring Face Detection Techniques: HOG vs. MTCNN

Face detection is a fundamental task in computer vision with applications ranging from security systems to social media. Two popular methods for face detection are Histogram of Oriented Gradients (HOG) and Multi-task Cascaded Convolutional Networks (MTCNN). In this blog post, we’ll delve into both techniques, providing an overview of their principles and showcasing Python code for each.

Histogram of Oriented Gradients (HOG)

Understanding HOG

HOG is a feature descriptor widely used for object detection. It works by analyzing the distribution of gradients in an image, making it particularly effective for detecting objects with distinct shapes and textures.

How it Works

The HOG algorithm divides an image into small cells, computes a histogram of gradient orientations within each cell, and normalizes these histograms over larger, overlapping blocks. The normalized histograms are then concatenated to form the final feature vector, which is used for training a support vector machine (SVM) or another classifier.
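To make that pipeline concrete, here is a minimal sketch of computing a HOG feature vector with OpenCV’s `cv2.HOGDescriptor`. The window, block, cell, and bin parameters are illustrative choices, and `face_patch.jpg` is a hypothetical input file.

import cv2

# Illustrative parameters: 64x64 window, 16x16 blocks with an 8x8 stride,
# 8x8 cells, and 9 orientation bins per histogram
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

patch = cv2.imread("face_patch.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file
patch = cv2.resize(patch, (64, 64))

# The result is the concatenation of the block-normalized cell histograms
features = hog.compute(patch)
print(len(features))  # 1764 = 7x7 blocks x 4 cells per block x 9 bins

A classifier such as an SVM would then be trained on these fixed-length vectors to separate face windows from non-face windows.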

Python Code Example

We’ll begin by exploring HOG-based face detection using OpenCV in Python. The provided code loads an image, applies the HOG detector, and draws bounding boxes around detected faces.
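As an inline illustration (the post’s full code is linked below), here is a minimal sketch of HOG-based face detection. It uses dlib’s `get_frontal_face_detector()`, which is a HOG plus linear SVM model, with OpenCV for image I/O and drawing; `faces.jpg` is a hypothetical input file. dlib is a common choice here because OpenCV’s built-in `HOGDescriptor` ships a pedestrian model rather than a face model.

import cv2
import dlib

# dlib's frontal face detector is a pre-trained HOG + linear SVM model
detector = dlib.get_frontal_face_detector()

image = cv2.imread("faces.jpg")  # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# The second argument upsamples the image once to help find smaller faces
for rect in detector(gray, 1):
    cv2.rectangle(image, (rect.left(), rect.top()),
                  (rect.right(), rect.bottom()), (0, 255, 0), 2)

cv2.imshow("HOG Faces", image)
cv2.waitKey(0)
cv2.destroyAllWindows()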

View on GitHub

Multi-task Cascaded Convolutional Networks (MTCNN)

Understanding MTCNN

MTCNN is a deep learning-based face detection model designed to handle varied face orientations and scales. It consists of three cascaded networks, a proposal network (P-Net), a refinement network (R-Net), and an output network (O-Net), which together perform face detection, bounding box regression, and facial landmark localization.

How it Works

MTCNN operates in a cascaded manner, with each stage refining the results of the previous one. The first stage detects potential face regions, the second stage refines the bounding boxes, and the third stage locates facial landmarks. The combined information provides accurate face detection.

Python Code Example

Next, we’ll explore face detection using MTCNN with the help of the `mtcnn` library in Python. The code loads an image, applies the MTCNN detector, and displays the image with bounding boxes around detected faces.
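As a rough sketch of that workflow (again, the post’s full code is on GitHub), the following uses the `mtcnn` package’s `detect_faces()`, which returns a bounding box, a confidence score, and five facial landmarks per face; `faces.jpg` is a hypothetical input file.

import cv2
from mtcnn import MTCNN

detector = MTCNN()

image = cv2.imread("faces.jpg")  # hypothetical input file
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # the detector expects RGB input

for face in detector.detect_faces(rgb):
    x, y, w, h = face["box"]
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # Draw the five landmarks (eyes, nose, mouth corners) from the final stage
    for px, py in face["keypoints"].values():
        cv2.circle(image, (px, py), 2, (0, 0, 255), -1)

cv2.imshow("MTCNN Faces", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Note the color-space conversion: OpenCV loads images as BGR, while the MTCNN detector is trained on RGB input.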

View on GitHub

Choosing the Right Method

While both HOG and MTCNN are effective for face detection, each has its strengths and limitations. HOG is robust and computationally efficient, making it suitable for real-time applications. On the other hand, MTCNN excels in handling diverse face orientations and is well-suited for scenarios where faces may appear at different scales and angles.

In the accompanying Python code, we showcase how to implement both techniques and provide insights into their usage. Feel free to experiment with the code and explore which method best fits your specific use case.

Continue reading for a detailed walkthrough of the code, usage instructions, and a discussion on factors influencing detection accuracy.