Problem description
How can I bypass the Google CAPTCHA using Selenium and Python?
When I try to scrape something, Google gives me a CAPTCHA. Can I bypass the Google CAPTCHA with Selenium and Python?
For example, it is Google reCAPTCHA, which you can see at this link: https://www.google.com/recaptcha/api2/demo
Recommended answer
To start with, when using Selenium's Python client, you should avoid solving or bypassing Google CAPTCHA.
Selenium automates browsers. What you want to achieve with that power is entirely up to the individual, but it is primarily meant for automating web applications through browser clients for testing purposes, and of course it is certainly not limited to that.
On the other hand, CAPTCHA (an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart) is a type of challenge–response test used in computing to determine whether the user is human.
So Selenium and CAPTCHA serve two completely different purposes and ideally shouldn't be used to achieve any interrelated tasks.
Having said that, reCAPTCHA can easily analyze the network traffic and identify your program as a Selenium-driven bot.
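The answer does not say which signals reCAPTCHA relies on. One well-documented signal that any page script can read (a browser property rather than network traffic as such) is navigator.webdriver, which the W3C WebDriver specification exposes as true while the browser is under remote control. The sketch below checks it; the helper name reports_webdriver_flag is hypothetical, and the driver is assumed to be a Selenium WebDriver.

```python
from typing import Any


def reports_webdriver_flag(driver: Any) -> bool:
    """Return True if the current page sees navigator.webdriver === true.

    `driver` is assumed to be a Selenium WebDriver (anything with an
    execute_script method works); the script runs inside the loaded page.
    """
    # The strict comparison yields a JavaScript boolean, which Selenium
    # returns to Python as bool; bool() also guards against a None result.
    return bool(driver.execute_script("return navigator.webdriver === true;"))
```

For example, after driver = webdriver.Chrome() and driver.get("https://www.google.com/recaptcha/api2/demo"), a default ChromeDriver session would be expected to make reports_webdriver_flag(driver) return True.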
However, there are some generic approaches to avoid getting detected while web scraping:
- The first and foremost attribute by which a website can identify your script/program is your monitor size, so it is recommended not to use a conventional (default) viewport size.
- If you need to send multiple requests to a website, keep changing the User Agent on each request. You can find a detailed discussion in Way to change Google Chrome user agent in Selenium?
- To simulate human-like behavior, you may need to slow down the script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). You can find a detailed discussion in How to sleep Selenium WebDriver in Python for milliseconds
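The three suggestions above can be combined in a short sketch, assuming Selenium 4 with a local Chrome. It is illustrative rather than taken from the original answer: the names chrome_arguments, human_pause, fetch_page and SAMPLE_USER_AGENTS are hypothetical, and the user-agent strings are placeholders. Chrome's --user-agent and --window-size switches apply to a whole browser session, so "changing the User Agent on each request" is approximated here by starting a fresh session per page, and the random pause is added on top of an explicit WebDriverWait.

```python
import random
import time
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical placeholder strings: replace with real, current user agents.
SAMPLE_USER_AGENTS: Sequence[str] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) PlaceholderUA/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) PlaceholderUA/2.0",
)


def chrome_arguments(user_agent: str, width: int, height: int) -> List[str]:
    """Chrome command-line switches for one browser session."""
    return [
        f"--user-agent={user_agent}",       # User-Agent sent by this session
        f"--window-size={width},{height}",  # non-default window, hence viewport, size
    ]


def human_pause(min_s: float = 1.0, max_s: float = 3.0,
                sleep: Callable[[float], None] = time.sleep,
                rng: Optional[random.Random] = None) -> float:
    """Sleep for a random number of seconds in [min_s, max_s] and return it."""
    if min_s < 0 or max_s < min_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    delay = (rng or random.Random()).uniform(min_s, max_s)
    sleep(delay)  # time.sleep accepts fractions, e.g. 0.25 means 250 ms
    return delay


def fetch_page(url: str, locator: Tuple[str, str],
               user_agent: Optional[str] = None,
               width: int = 1366, height: int = 900,
               timeout: float = 10.0) -> str:
    """Open url in a fresh Chrome session, wait for locator, pause, return HTML.

    Requires Selenium 4 and a local Chrome. A new session is started per call
    because --user-agent is fixed for the lifetime of a browser session.
    """
    from selenium import webdriver
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    ua = user_agent or random.choice(SAMPLE_USER_AGENTS)
    for arg in chrome_arguments(ua, width, height):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Explicit wait first, then an extra randomized delay on top of it.
        WebDriverWait(driver, timeout).until(EC.presence_of_element_located(locator))
        human_pause()
        return driver.page_source
    finally:
        driver.quit()
```

For instance, fetch_page("https://example.com", (By.TAG_NAME, "body")), with By imported from selenium.webdriver.common.by, would load the page with a randomly chosen placeholder user agent and a 1366×900 window. Keeping the argument builder and the pause free of Selenium imports lets them be checked without a browser.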
However, in a couple of use cases we were able to interact with reCAPTCHA using Selenium; you can find more details in the following discussions:
- How to click on the reCAPTCHA using Selenium and Java
- CSS selector for the reCAPTCHA checkbox using Selenium and VBA Excel
- Find the reCAPTCHA element and click on it -- Python + Selenium
You can find a couple of related discussions in:
- How to make a Selenium script undetectable using GeckoDriver and Firefox through Python?
- Is there a version of Selenium WebDriver that is not detectable?
- How does reCAPTCHA 3 know I'm using Selenium/chromedriver?