本文介紹了如何讓 BeautifulSoup 4 尊重自閉標簽?的處理方法,對大家解決問題具有一定的參考價值,需要的朋友們下面隨著小編來一起學習吧!
問題描述
這個問題是針對 BeautifulSoup4 的問題,這使得它不同于以前的問題:
This question is specific to BeautifulSoup4, which makes it different from the previous questions:
BeautifulSoup 為什么要修改我的自閉合元素?
BeautifulSoup 中的 selfClosingTags
由于 BeautifulStoneSoup
已經(jīng)消失(之前的 xml 解析器),我怎樣才能讓 bs4
尊重一個新的自閉合標簽?例如:
Since BeautifulStoneSoup
is gone (the previous xml parser), how can I get bs4
to respect a new self-closing tag? For example:
import bs4
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])
print soup.prettify()
不會自動關閉 bar
標簽,但會給出提示.bs4 所指的這個樹生成器是什么以及如何自我關閉標簽?
Does not self-close the bar
tag, but gives a hint. What is this tree builder that bs4 is referring to and how to I self-close the tag?
/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
"BS4 does not respect the selfClosingTags argument to the "
<html>
<body>
<foo>
<bar a="3">
</bar>
</foo>
</body>
</html>
推薦答案
解析你傳入的XMLxml"作為 BeautifulSoup 構造函數(shù)的第二個參數(shù).
soup = bs4.BeautifulSoup(S, 'xml')
您需要安裝 lxml.
您不再需要傳遞 selfClosingTags
:
In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print soup.prettify()
<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar a="3"/>
</foo>
這篇關于如何讓 BeautifulSoup 4 尊重自閉標簽?的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網(wǎng)!
【網(wǎng)站聲明】本站部分內(nèi)容來源于互聯(lián)網(wǎng),旨在幫助大家更快的解決問題,如果有圖片或者內(nèi)容侵犯了您的權益,請聯(lián)系我們刪除處理,感謝您的支持!