Problem description
I have a question about --headless mode in Python Selenium for Chrome.
Code
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
CHROME_DRIVER_DIR = "selenium/chromedriver"
chrome_options = webdriver.ChromeOptions()
caps = DesiredCapabilities().CHROME
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--remote-debugging-port=9222")
chrome_options.add_argument("--headless") # Runs Chrome in headless mode.
chrome_options.add_argument('--no-sandbox')  # Bypass OS security model
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
# Selenium 3 signature: executable_path and desired_capabilities are deprecated in Selenium 4
browser = webdriver.Chrome(desired_capabilities=caps, executable_path=CHROME_DRIVER_DIR, options=chrome_options)
browser.get("https://www.manta.com/c/mm2956g/mashuda-contractors")
print(browser.page_source)
browser.quit()
When I remove chrome_options.add_argument("--headless"), everything works fine, but with --headless I get the following page:
Please enable cookies.
Error 1020 Ray ID: 53fd62b4087d8116 • 2019-12-04 11:19:28 UTC
Access denied
What happened?
This website is using a security service to protect itself from online attacks.
Cloudflare Ray ID: 53fd62b4087d8116 • Your IP: 168.81.117.111 • Performance & security by Cloudflare
What is the difference between normal mode and --headless mode?
Recommended answer
I took your code, removed the optional arguments and added a few arguments to execute the test as follows:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--headless")
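options.add_experimental_option("excludeSwitches", ["enable-automation"])  # hide the "controlled by automated test software" infobar
options.add_experimental_option('useAutomationExtension', False)  # don't load Chrome's automation extension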
# Selenium 3 style; Selenium 4 passes the driver path through a Service object instead
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.manta.com/c/mm2956g/mashuda-contractors")
print(driver.page_source)
driver.quit()
Console Output:
<html class="js" lang="en-US" style="opacity: 1; visibility: visible;"><!--<![endif]--><head>
<title>Access denied | www.manta.com used Cloudflare to restrict access</title>
<meta charset="UTF-8">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">
<meta name="robots" content="noindex, nofollow">
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1">
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection">
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
<div id="cf-error-details" class="cf-error-details-wrapper">
<div class="cf-wrapper cf-header cf-error-overview">
<h1>
<span class="cf-error-type" data-translate="error">Error</span>
<span class="cf-error-code">1020</span>
<small class="heading-ray-id">Ray ID: 53fd7c2fca12d5fc • 2019-12-04 11:36:52 UTC</small>
</h1>
<h2 class="cf-subheadline">Access denied</h2>
</div><!-- /.header -->
<section></section><!-- spacer -->
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="what_happened">What happened?</h2>
<p>This website is using a security service to protect itself from online attacks.</p>
</div>
</div>
</div><!-- /.section -->
<div class="cf-error-footer cf-wrapper">
<p>
<span class="cf-footer-item">Cloudflare Ray ID: <strong>53fd7c2fca12d5fc</strong></span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span>Your IP</span>: 123.201.54.43</span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span>Performance & security by</span> <a id="brand_link" target="_blank">Cloudflare</a></span>
</p>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script type="text/javascript">
window._cf_translation = {};
</script>
</body></html>
From the extracted page source it is clear that, with the --headless argument, you are reaching a page with:
- The title: Access denied | www.manta.com used Cloudflare to restrict access.
- The message: What happened? This website is using a security service to protect itself from online attacks.
The browsing context, i.e. the Chrome browser session, is being detected as a bot and the navigation is blocked.
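One concrete, well-known difference between the two modes is that headless Chrome identifies itself with a HeadlessChrome/<version> token in its User-Agent string, where a normal session sends Chrome/<version>, and a site's bot rules can match on that. A common but not guaranteed workaround is Chrome's --user-agent= switch with the headless token replaced. This sketch only builds that argument string; the helper names and the sample User-Agent are hypothetical, and Cloudflare may rely on many other signals, so this alone may not get past the 1020 block.

```python
HEADLESS_TOKEN = "HeadlessChrome"


def headful_user_agent(user_agent):
    """Replace the HeadlessChrome product token with Chrome, if present."""
    return user_agent.replace(HEADLESS_TOKEN, "Chrome")


def user_agent_argument(user_agent):
    """Build a Chrome command-line switch that overrides the User-Agent header."""
    return f"--user-agent={headful_user_agent(user_agent)}"


if __name__ == "__main__":
    # Illustrative headless User-Agent (hypothetical sample, not taken from the question).
    ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) HeadlessChrome/78.0.3904.108 Safari/537.36")
    print(user_agent_argument(ua))
    # With Selenium (not run here), something like:
    # options.add_argument(user_agent_argument(ua))
```

The replacement is a plain string substitution, so a User-Agent that has no HeadlessChrome token is returned unchanged.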
You can find a couple of relevant discussions in:
- Is there a version of Selenium WebDriver that is not detectable?
- Chrome browser initiated through ChromeDriver gets detected
- Webpage is detecting Selenium WebDriver with ChromeDriver as a bot