Problem description
I have these images, from which I want to remove the text in the background. Only the captcha characters should remain (i.e. K6PwKA, YabVzu). The task is to identify these characters later using Tesseract.
This is what I have tried, but the accuracy is not good:
```python
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Users\HPO2KOR\AppData\Local\Tesseract-OCR\tesseract.exe"

img = cv2.imread("untitled.png")
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_filtered = cv2.inRange(gray_image, 0, 75)
cv2.imwrite("cleaned.png", gray_filtered)
```
How can I improve this?
Note: I tried all the suggestions I received for this question, and none of them worked for me.
EDIT: Following Elias's suggestion, I used Photoshop to find the color of the captcha text after converting the image to grayscale; it came out somewhere in the range [100, 105]. I then thresholded the image based on this range, but Tesseract still did not give a satisfactory result.
```python
gray_filtered = cv2.inRange(gray_image, 100, 105)
cv2.imwrite("cleaned.png", gray_filtered)
gray_inv = ~gray_filtered
cv2.imwrite("cleaned.png", gray_inv)
data = pytesseract.image_to_string(gray_inv, lang='eng')
```
Output:
'KEP wKA'
Result:
EDIT 2:
```python
def get_text(img_name):
    lower = (100, 100, 100)
    upper = (104, 104, 104)
    img = cv2.imread(img_name)
    img_rgb_inrange = cv2.inRange(img, lower, upper)
    neg_rgb_image = ~img_rgb_inrange
    cv2.imwrite('neg_img_rgb_inrange.png', neg_rgb_image)
    data = pytesseract.image_to_string(neg_rgb_image, lang='eng')
    return data
```
This gives the following text:
GXuMuUZ
Is there any way to soften it a little?
Solution
Here are two potential approaches and a method to correct distorted text:
Method #1: Morphological operations + contour filtering
1. Obtain binary image. Load image, grayscale, then Otsu's threshold.
2. Remove noise with a morphological opening. Create a rectangular kernel with cv2.getStructuringElement() and then perform morphological operations to remove noise.
3. Filter and remove small noise. Find contours and filter using contour area to remove small particles. We effectively remove the noise by filling in the contour with cv2.drawContours().
4. Perform OCR. We invert the image then apply a slight Gaussian blur. We then OCR using Pytesseract with the --psm 6 configuration option to treat the image as a single block of text. See the Tesseract "Improve quality" documentation for other methods to improve detection, and the Pytesseract configuration options for additional settings.
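The `--psm 6` option in the last step is only one of Tesseract's page segmentation modes. A captcha is a single short string, so mode 7 (single text line) or mode 8 (single word) may also be worth trying. The sketch below shows one way to build such a config string and to tidy the returned text. The helper names `build_tesseract_config` and `clean_ocr_text` are made up for illustration and are not part of Pytesseract. It also assumes the `tessedit_char_whitelist` variable is honored, which depends on the Tesseract version and engine in use.

```python
import string

# Characters a captcha like K6PwKA or YabVzu can contain (an assumption).
ALLOWED = string.ascii_letters + string.digits


def build_tesseract_config(psm=6, whitelist=ALLOWED):
    """Build a config string to pass as config=... to pytesseract.image_to_string."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def clean_ocr_text(raw, allowed=ALLOWED):
    """Drop spaces, newlines and any other character outside the allowed set."""
    return "".join(ch for ch in raw if ch in allowed)


config = build_tesseract_config(psm=7)
cleaned = clean_ocr_text("KEP wKA\n")
```

The config string would then be passed as `pytesseract.image_to_string(result, lang='eng', config=config)`. Note that cleaning only removes stray characters such as the space in 'KEP wKA'; it cannot repair a misread letter.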
Input image -> Binary -> Morph opening -> Contour area filtering -> Invert -> Apply blur to get result
Result from OCR
YabVzu
Code
```python
import cv2
import pytesseract
import numpy as np

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, grayscale, Otsu's threshold
image = cv2.imread('2.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)

# Find contours and remove small noise
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 50:
        cv2.drawContours(opening, [c], -1, 0, -1)

# Invert and apply slight Gaussian blur
result = 255 - opening
result = cv2.GaussianBlur(result, (3,3), 0)

# Perform OCR
data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
cv2.waitKey()
```
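To make the Otsu step in the code above less of a black box, here is a NumPy-only sketch of the idea behind it: choose the gray level that maximizes the between-class variance of the two pixel groups it creates. The function name `otsu_threshold` is made up for illustration, and OpenCV's own implementation may differ in its details, so treat this as a way to understand the step and not as a replacement for `cv2.threshold`.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the level t that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other.
    Expects a uint8 grayscale image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)                 # weight of the "dark" class for each t
    w1 = 1.0 - w0                        # weight of the "bright" class
    cum_mean = np.cumsum(prob * levels)  # running sum of level * probability
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate threshold at once
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))


# Tiny synthetic image: dark "text" pixels (50) on a bright background (200)
demo = np.array([[50, 50, 200, 200],
                 [50, 200, 200, 200]], dtype=np.uint8)
t = otsu_threshold(demo)
binary_inv = np.where(demo > t, 0, 255).astype(np.uint8)  # dark pixels become 255
```

Because Otsu only looks at the histogram, it works well when text and background form two clear peaks and less well when the background noise has a gray level close to the text.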
Method #2: Color segmentation
Since the text we want to extract has a distinguishable contrast from the noise in the image, we can use color thresholding to isolate it. The idea is to convert the image to HSV format and then color threshold with a lower/upper color range to obtain a mask. From there we use the same process to OCR with Pytesseract.
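As a rough illustration of what the color threshold does, the sketch below mimics the behavior of `cv2.inRange` with plain NumPy: a pixel is kept (255) only if every channel lies within the inclusive lower/upper bounds, otherwise it becomes 0. The helper name `in_range` and the four sample pixels are made up for this example.

```python
import numpy as np


def in_range(img, lower, upper):
    """Return 255 where all channels are within [lower, upper], else 0."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = np.all((img >= lower) & (img <= upper), axis=-1)
    return np.where(inside, 255, 0).astype(np.uint8)


# A 2x2 "HSV image" with hand-picked pixels
hsv = np.array([[[30, 100, 90], [30, 100, 200]],      # second pixel: V too high
                [[120, 100, 90], [100, 175, 110]]],   # third pixel: H too high
               dtype=np.uint8)

mask = in_range(hsv, [0, 0, 0], [100, 175, 110])
invert = 255 - mask  # text becomes black on white, as in the OCR step
```

The last pixel sits exactly on the upper bound and is still kept, which shows that the range is inclusive at both ends.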
Input image -> Mask -> Result
Code
```python
import cv2
import pytesseract
import numpy as np

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('2.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)

# Invert image and OCR
invert = 255 - mask
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)

cv2.imshow('mask', mask)
cv2.imshow('invert', invert)
cv2.waitKey()
```
Correcting distorted text
OCR works best when the text is horizontal. To ensure that the text is in an ideal format for OCR, we can perform a perspective transform. After removing all the noise to isolate the text, we can perform a morph close to combine the individual text contours into a single contour. From here we can find the rotated bounding box using cv2.minAreaRect and then perform a four-point perspective transform using imutils.perspective.four_point_transform. Continuing from the cleaned mask, here are the results:
Mask -> Morph close -> Detected rotated bounding box -> Result
Output with the other image
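A four-point perspective transform needs to know which corner of the rotated box is which, and how large the straightened output should be. The sketch below shows one common heuristic for both, using NumPy only. It is an illustration of the idea and not the actual imutils implementation; the names `order_corners` and `target_size` are made up, and the sum/difference trick can misorder corners when the box is rotated by close to 45 degrees.

```python
import numpy as np


def order_corners(pts):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left.

    Image coordinates are assumed, with y growing downward.
    """
    pts = np.asarray(pts, dtype=np.float32)
    s = pts.sum(axis=1)        # x + y: smallest at top-left, largest at bottom-right
    d = pts[:, 1] - pts[:, 0]  # y - x: smallest at top-right, largest at bottom-left
    return np.array([pts[np.argmin(s)], pts[np.argmin(d)],
                     pts[np.argmax(s)], pts[np.argmax(d)]], dtype=np.float32)


def target_size(ordered):
    """Width and height of the straightened output, from the longest edges."""
    tl, tr, br, bl = ordered
    width = int(round(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl))))
    height = int(round(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl))))
    return width, height


# Corners in arbitrary order, as a box-finding step might return them
box = [(110, 20), (10, 20), (10, 60), (110, 60)]
ordered = order_corners(box)
size = target_size(ordered)
```

With the corners ordered and the size known, the remaining work is mapping the four source corners onto the corners of a `width` x `height` rectangle, which is the part the library call handles.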
Updated code to include perspective transform
```python
import cv2
import pytesseract
import numpy as np
from imutils.perspective import four_point_transform

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)

# Morph close to connect individual text into a single contour
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=3)

# Find rotated bounding box then perspective transform
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
rect = cv2.minAreaRect(cnts[0])
box = cv2.boxPoints(rect)
# np.int0 was removed in NumPy 2.0, so cast to int32 explicitly
box = box.astype(np.int32)
cv2.drawContours(image, [box], 0, (36,255,12), 2)
warped = four_point_transform(255 - mask, box.reshape(4, 2))

# OCR
data = pytesseract.image_to_string(warped, lang='eng', config='--psm 6')
print(data)

cv2.imshow('mask', mask)
cv2.imshow('close', close)
cv2.imshow('warped', warped)
cv2.imshow('image', image)
cv2.waitKey()
```
Note: The color threshold range was determined using this HSV threshold script:
```python
import cv2
import numpy as np

def nothing(x):
    pass

# Load image
image = cv2.imread('2.png')

# Create a window
cv2.namedWindow('image')

# Create trackbars for color change
# Hue is from 0-179 for OpenCV
cv2.createTrackbar('HMin', 'image', 0, 179, nothing)
cv2.createTrackbar('SMin', 'image', 0, 255, nothing)
cv2.createTrackbar('VMin', 'image', 0, 255, nothing)
cv2.createTrackbar('HMax', 'image', 0, 179, nothing)
cv2.createTrackbar('SMax', 'image', 0, 255, nothing)
cv2.createTrackbar('VMax', 'image', 0, 255, nothing)

# Set default value for Max HSV trackbars
cv2.setTrackbarPos('HMax', 'image', 179)
cv2.setTrackbarPos('SMax', 'image', 255)
cv2.setTrackbarPos('VMax', 'image', 255)

# Initialize HSV min/max values
hMin = sMin = vMin = hMax = sMax = vMax = 0
phMin = psMin = pvMin = phMax = psMax = pvMax = 0

while True:
    # Get current positions of all trackbars
    hMin = cv2.getTrackbarPos('HMin', 'image')
    sMin = cv2.getTrackbarPos('SMin', 'image')
    vMin = cv2.getTrackbarPos('VMin', 'image')
    hMax = cv2.getTrackbarPos('HMax', 'image')
    sMax = cv2.getTrackbarPos('SMax', 'image')
    vMax = cv2.getTrackbarPos('VMax', 'image')

    # Set minimum and maximum HSV values to display
    lower = np.array([hMin, sMin, vMin])
    upper = np.array([hMax, sMax, vMax])

    # Convert to HSV format and color threshold
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)
    result = cv2.bitwise_and(image, image, mask=mask)

    # Print if there is a change in HSV value
    if (phMin != hMin or psMin != sMin or pvMin != vMin
            or phMax != hMax or psMax != sMax or pvMax != vMax):
        print("(hMin = %d , sMin = %d, vMin = %d), (hMax = %d , sMax = %d, vMax = %d)"
              % (hMin, sMin, vMin, hMax, sMax, vMax))
        phMin = hMin
        psMin = sMin
        pvMin = vMin
        phMax = hMax
        psMax = sMax
        pvMax = vMax

    # Display result image
    cv2.imshow('image', result)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
```
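The script's comment that hue runs from 0 to 179 matters when the color range comes from another tool, such as an image editor that reports hue in degrees and saturation/value in percent. For 8-bit images OpenCV stores hue as degrees divided by two and scales saturation and value to 0-255. The helpers below are a small sketch of that conversion; their names are made up for illustration.

```python
def hue_degrees_to_opencv(deg):
    """Map a hue in degrees to OpenCV's 8-bit hue range (0-179)."""
    return int(deg % 360) // 2


def percent_to_opencv(pct):
    """Map a saturation or value percentage (0-100) to OpenCV's 0-255 range."""
    return round(pct / 100 * 255)


# Example: an editor reports an upper bound of H = 200 degrees, S = 100 %
upper_h = hue_degrees_to_opencv(200)
upper_s = percent_to_opencv(100)
```

Values converted this way are only a starting point; fine-tuning with the trackbar script on the actual image is still the more reliable way to settle on a range.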