Classical Computer Vision and Perspective Transformation for Sudoku Extraction

Animated perspective transformation warping the Sudoku grid

of AI hype, it feels like everyone is using Vision-Language Models and large Vision Transformers for every problem in Computer Vision. Many people see these tools as one-size-fits-all solutions and immediately use the latest, shiniest model instead of understanding the underlying signal they want to extract. But oftentimes there’s beauty to simplicity. It’s one of the most important lessons I’ve learned as an engineer: don’t overcomplicate solutions to simple problems.

Animation of processing pipeline — Processing pipeline steps animated

Let me show you a practical application of some simple classical Computer Vision techniques to detect rectangular objects on flat surfaces and apply a perspective transformation to transform the skewed rectangle. Similar methods are widely used, for example, in document scanning and extraction applications.

Along the way you will learn some interesting concepts from standard classical Computer Vision techniques to how to order polygon points and why this is related to a combinatoric assignment problem.

Overview

Detection
- Grayscale
- Edge Detection
- Dilation
- Contour Detection
Perspective Transformation
- Variant A: Simple sort based on sum/diff
- Variant B: Assignment Optimization Problem
- Variant C: Cyclic sorting with anchor
- Applying the Perspective Transformation
Conclusion

Detection

To detect Sudoku grids I considered many different approaches ranging from simple thresholding, hough line transformations or some form of edge detection to training a deep learning model for segmentation or keypoint detection.

Let’s define some assumptions to scope the problem:

The Sudoku grid is clearly and fully visible in the frame with a clear quadrilateral border, with strong contrast from the background.
The surface on which the Sudoku grid is printed needs to be flat, but can be captured from an angle and appear skewed or rotated.

I will show you a simple pipeline with some filtering steps to detect the bounds of our Sudoku grid. On a high level, the processing pipeline looks as follows:

Visualization of processing pipeline steps

Grayscale

In this first step we simply convert the input image from its three color channels to a single channel grayscale image, as we don’t need any color information to process these images.

def find_sudoku_grid(
    image: np.ndarray,
) -> np.ndarray | None:
    """
    Finds the largest square-like contour in an image, likely the Sudoku grid.

    Returns:
        The contour of the found grid as a numpy array, or None if not found.
    """

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Edge Detection

After converting the image to grayscale we can use the Canny edge detection algorithm to extract edges. There are two thresholds to choose for this algorithm that determine if pixels are accepted as edges:

Canny edge thresholds explained — Thresholds of Canny edge detection

In our case of detecting Sudoku grids, we assume very strong edges on the border lines of our grid. We can choose a high upper threshold to reject noise from appearing in our mask, and a lower threshold not too low to reject small noisy edges connected to the main border from showing up in our mask.

A blur filter is often used before passing images to Canny to reduce noise, but in this case the edges are very strong but narrow, hence the blur is omitted.

def find_sudoku_grid(
    image: np.ndarray,
    canny_threshold_1: int = 100,
    canny_threshold_2: int = 255,
) -> np.ndarray | None:
    """
    Finds the largest square-like contour in an image, likely the Sudoku grid.

    Args:
        image: The input image.
        canny_threshold_1: Lower threshold for the Canny edge detector.
        canny_threshold_2: Upper threshold for the Canny edge detector.

    Returns:
        The contour of the found grid as a numpy array, or None if not found.
    """

    ...

    canny = cv2.Canny(gray, threshold1=canny_threshold_1, threshold2=canny_threshold_2)

Dilation

In this next step, we post-process the edge detection mask with a dilation kernel to close small gaps in the mask.

def find_sudoku_grid(
    image: np.ndarray,
    canny_threshold_1: int = 100,
    canny_threshold_2: int = 255,
    morph_kernel_size: int = 3,
) -> np.ndarray | None:
    """
    Finds the largest square-like contour in an image, likely the Sudoku grid.

    Args:
        image: The input image.
        canny_threshold_1: First threshold for the Canny edge detector.
        canny_threshold_2: Second threshold for the Canny edge detector.
        morph_kernel_size: Size of the morphological operation kernel.

    Returns:
        The contour of the found grid as a numpy array, or None if not found.
    """

    ...

    kernel = cv2.getStructuringElement(
        shape=cv2.MORPH_RECT, ksize=(morph_kernel_size, morph_kernel_size)
    )
    mask = cv2.morphologyEx(canny, op=cv2.MORPH_DILATE, kernel=kernel, iterations=1)

Contour Detection

Now that the binary mask is ready, we can run a contour detection algorithm to find coherent blobs and filter down to a single contour with four points.

contours, _ = cv2.findContours(
    mask, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE
)

This initial contour detection will return a list of contours that contain every single pixel that is part of the contour. We can use the Douglas–Peucker algorithm to iteratively reduce the number of points in the contour and approximate the contour with a simple polygon. We can choose a minimum distance between points for the algorithm.

If we assume that even for some of the most skewed rectangle, the shortest side is at least 10% of the circumference of the shape, we can filter the contours down to polygons with exactly four points.

contour_candidates: list[np.ndarray] = []
for cnt in contours:
    # Approximate the contour to a polygon
    epsilon = 0.1 * cv2.arcLength(curve=cnt, closed=True)
    approx = cv2.approxPolyDP(curve=cnt, epsilon=epsilon, closed=True)

    # Keep only polygons with 4 vertices
    if len(approx) == 4:
        contour_candidates.append(approx)

Finally we take the largest detected contour, presumably the final Sudoku grid. We sort the contours by area in reverse order and then take the first element, corresponding to the largest contour area.

best_contour = sorted(contour_candidates, key=cv2.contourArea, reverse=True)[0]

Filtered contour highlighted on original image

Perspective Transformation

Finally we need to transform the detected grid back to its square. To achieve this, we can use a perspective transformation. The transformation matrix can be calculated by specifying where the four points of our Sudoku grid contour need to be placed in the end: the four corners of the image.

rect_dst = np.array(
    [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
)

To match the contour points to the corners, they need to be ordered first, so they can be assigned correctly. Let’s define the following order for our corner points:

Variant A: Simple sort based on sum/diff

To sort the extracted corners and assign them to these target points, a simple algorithm could look at the sum and differences of the x and y coordinates for each corner.

p_sum = p_x + p_y
p_diff = p_x - p_y

Based on these values, it is now possible to differentiate the corners:

The top left corner has both a small x and y value, it has the smallest sum argmin(p_sum)
Bottom right corner has the largest sum argmax(p_sum)
Top right corner has the largest diff argmax(p_diff)
Bottom left corner has the smallest difference argmin(p_diff)

In the following animation, I tried to visualize this assignment of the four corners of a rotating square. The colored lines represent the respective image corner assigned to each square corner.

Animation of a rotating square, each corner with a different color and lines indicating the assignment to image corners

def order_points(pts: np.ndarray) -> np.ndarray:
    """
    Orders the four corner points of a contour in a consistent
    top-left, top-right, bottom-right, bottom-left sequence.

    Args:
        pts: A numpy array of shape (4, 2) representing the four corners.

    Returns:
        A numpy array of shape (4, 2) with the points ordered.
    """
    # Reshape from (4, 1, 2) to (4, 2) if needed
    pts = pts.reshape(4, 2)
    rect = np.zeros((4, 2), dtype=np.float32)

    # The top-left point will have the smallest sum, whereas
    # the bottom-right point will have the largest sum
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]

    # The top-right point will have the smallest difference,
    # whereas the bottom-left will have the largest difference
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]

    return rect

This works well unless the rectangle is heavily skewed, like the following one. In this case, you can clearly see that this method is flawed, as there the same rectangle corner is assigned multiple image corners.

Same assignment procedure fails with a skewed rotating quadrilateral shape

Variant B: Assignment Optimization Problem

Another approach would be to minimize the distances between each point and its assigned corner. This can be implemented using a pairwise_distances calculation between each point and the corners and the linear_sum_assignment function from scipy, which solves the assignment problem while minimizing a cost function.

def order_points_simplified(pts: np.ndarray) -> np.ndarray:
    """
    Orders a set of points to best match a target set of corner points.

    Args:
        pts: A numpy array of shape (N, 2) representing the points to order.

    Returns:
        A numpy array of shape (N, 2) with the points ordered.
    """
    # Reshape from (N, 1, 2) to (N, 2) if needed
    pts = pts.reshape(-1, 2)

    # Calculate the distance between each point and each target corner
    D = pairwise_distances(pts, pts_corner)

    # Find the optimal one-to-one assignment
    # row_ind[i] should be matched with col_ind[i]
    row_ind, col_ind = linear_sum_assignment(D)

    # Create an empty array to hold the sorted points
    ordered_pts = np.zeros_like(pts)

    # Place each point in the correct slot based on the corner it was matched to.
    # For example, the point matched to target_corners[0] goes into ordered_pts[0].
    ordered_pts[col_ind] = pts[row_ind]

    return ordered_pts

Animated rotating skewed quadrilaterl with corners assigned correctly to image corners — Animated rotating skewed quadrilateral with corners assigned correctly to image corners

Even though this solution works, it is not ideal, as it relies on the image distance between the shape points and the corners and it is computationally more expensive because a distance matrix has to be constructed. Of course here in the case of four points assigned this is negligible, but this solution would not be well suited for a polygon with many points!

Variant C: Cyclic sorting with anchor

This third variant is a very lightweight and efficient way to sort and assign the points of the shape to the image corners. The idea is to calculate an angle for each point of the shape based on the centroid position.

Sketch of angles assigned to each corner

Since the angles are cyclic, we need to choose an anchor to guarantee the absolute order of the points. We simply select the point with the lowest sum of x and y.

def order_points(self, pts: np.ndarray) -> np.ndarray:
    """
    Orders points by angle around the centroid, then rotates to start from top-left.

    Args:
        pts: A numpy array of shape (4, 2).

    Returns:
        A numpy array of shape (4, 2) with points ordered."""
    pts = pts.reshape(4, 2)
    center = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
    pts_cyclic = pts[np.argsort(angles)]
    sum_of_coords = pts_cyclic.sum(axis=1)
    top_left_idx = np.argmin(sum_of_coords)
    return np.roll(pts_cyclic, -top_left_idx, axis=0)

Animated rotating skewed quadrilaterl with corners assigned correctly with angle assignment method

We can now use this function to sort our contour points:

rect_src = order_points(grid_contour)

Applying the Perspective Transformation

Now that we know which points need to go where, we can finally move on to the most interesting part: creating and actually applying the perspective transformation to the image.

Animation of applying perspective transformation

Since we already have our list of points for the detected quadrilateral sorted in rect_src, and we have our target corner points in rect_dst, we can use the OpenCV method for calculating the transformation matrix:

warp_mat = cv2.getPerspectiveTransform(rect_src, rect_dst)

The result is a 3×3 warp matrix, defining how to transform from a skewed 3D perspective view to a 2D flat top-down view. To get this flat top-down view of our Sudoku grid, we can apply this perspective transformation to our original image:

warped = cv2.warpPerspective(img, warp_mat, (side_len, side_len))

And voilà, we have our perfectly square Sudoku grid!

Final flat top-down view of Sudoku square after perspective transformation

Conclusion

In this project we walked through a simple pipeline using classical Computer Vision techniques to extract Sudoku grids from pictures. These methods provide a simple way to detect the bounds of the Sudoku grids. Of course due to its simplicity there are some limitations to how well this approach generalizes to different settings and extreme environments such as low light or hard shadows. Using a deep-learning based approach could make sense if the detection needs to generalize to a vast amount of different settings.

Next, a perspective transformation is used to get a flat top-down view of the grid. This image can now be used in further processing, such as extracting the numbers in the grid and actually solving the Sudoku. In a next article we will look further into these natural next steps in this project.

Check out the source code of the project below and let me know if you have any questions or thoughts on this project. Until then, happy coding!

For more details and the full implementation including the code for the all the animations and visualizations, check out the source code of this project on my GitHub:

https://github.com/trflorian/sudoku-extraction

All visualizations in this post were created by the author.

Source link

The post Classical Computer Vision and Perspective Transformation for Sudoku Extraction first appeared on TechToday.

This post originally appeared on TechToday.