I am working on an OCR task to extract information from multiple ID proof documents. One challenge is the orientation of the scanned image: I need to fix the orientation of scanned images of PAN, Aadhaar, Driving License, or any other ID proof.
I have already tried all the approaches suggested on Stack Overflow and other forums, such as OpenCV minAreaRect, the Hough Line Transform, FFT, homography, and Tesseract OSD with --psm 0. None of them work.
The logic should return the angle of the text direction: 0, 90, or 270 degrees. Attached are images at 0, 90, and 270 degrees. This is not about determining skewness.
Here's an approach based on the assumption that the majority of the text is skewed onto one side. The idea is that we can determine the angle based on where the major text region is located:
- Convert image to grayscale and Gaussian blur
- Adaptive threshold to get a binary image
- Find contours and filter using contour area
- Draw filtered contours onto mask
- Split image horizontally or vertically based on orientation
- Count number of pixels in each half
After converting to grayscale and Gaussian blurring, we adaptively threshold to obtain a binary image.
From here we find contours and filter using contour area to remove small noise particles and the large border. Any contours that pass this filter are drawn onto a mask.
To determine the angle, we split the image in half based on its dimensions. If width > height, it must be a horizontal image, so we split in half vertically. If height > width, it must be a vertical image, so we split in half horizontally.
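As a quick sanity check of the split logic, here is a minimal sketch on a synthetic mask, using np.count_nonzero (equivalent to cv2.countNonZero on a single-channel image):

```python
import numpy as np

# Synthetic horizontal mask (wider than tall) with all "text" pixels on the left
mask = np.zeros((100, 200), dtype=np.uint8)
mask[:, :100] = 255

h, w = mask.shape
if w > h:  # horizontal image -> split vertically into left/right halves
    left, right = mask[:, :w // 2], mask[:, w // 2:]
else:      # vertical image -> split horizontally into top/bottom halves
    left, right = mask[:h // 2, :], mask[h // 2:, :]

print(np.count_nonzero(left), np.count_nonzero(right))  # -> 10000 0, so 0 degrees
```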
Now that we have two halves, we can use cv2.countNonZero() to determine the number of white pixels in each half. Here's the logic to determine the angle:
if horizontal:
    if left >= right -> 0 degrees
    else             -> 180 degrees
if vertical:
    if top >= bottom -> 270 degrees
    else             -> 90 degrees
left 9703
right 3975
Therefore the image is at 0 degrees. Here are the results for the other orientations
left 3975
right 9703
We can conclude that the image is rotated 180 degrees
Here are the results for a vertical image. Note that since it's a vertical image, we split horizontally
top 3947
bottom 9550
Therefore the result is 90 degrees
import cv2
import numpy as np

def detect_angle(image):
    mask = np.zeros(image.shape, dtype=np.uint8)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3, 3), 0)
    adaptive = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY_INV, 15, 4)

    # Find contours and keep only those between the noise floor and the
    # border size, drawing them filled onto the mask
    cnts = cv2.findContours(adaptive, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        area = cv2.contourArea(c)
        if 20 < area < 45000:
            cv2.drawContours(mask, [c], -1, (255, 255, 255), -1)

    mask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
    h, w = mask.shape

    # Horizontal image: split vertically, compare left/right halves
    if w > h:
        left = mask[0:h, 0:w//2]
        right = mask[0:h, w//2:]
        left_pixels = cv2.countNonZero(left)
        right_pixels = cv2.countNonZero(right)
        return 0 if left_pixels >= right_pixels else 180
    # Vertical image: split horizontally, compare top/bottom halves
    else:
        top = mask[0:h//2, 0:w]
        bottom = mask[h//2:, 0:w]
        top_pixels = cv2.countNonZero(top)
        bottom_pixels = cv2.countNonZero(bottom)
        return 90 if bottom_pixels >= top_pixels else 270

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle = detect_angle(image)
    print(angle)
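Once the angle is detected, the scan still has to be rotated back upright. A minimal sketch using np.rot90 (cv2.rotate with the ROTATE_90_CLOCKWISE / ROTATE_90_COUNTERCLOCKWISE / ROTATE_180 flags does the same job); note that the direction mapping below is an assumption about how the detected angles relate to your scans, so verify it on a sample:

```python
import numpy as np

def correct_orientation(image, angle):
    # Number of 90-degree counterclockwise turns needed to undo each detected
    # angle; the mapping is an assumption -- verify the direction on a sample scan
    turns = {0: 0, 90: 3, 180: 2, 270: 1}
    return np.rot90(image, k=turns[angle])

img = np.zeros((200, 100, 3), dtype=np.uint8)  # a vertical scan
print(correct_orientation(img, 90).shape)      # -> (100, 200, 3), now horizontal
```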