Question
I am trying to get the whole content between an opening XML tag and its closing counterpart.
Getting the content in straightforward cases like title below is easy, but how can I get the whole content between the tags if mixed content is used and I want to preserve the inner tags?
<?xml version="1.0" encoding="UTF-8"?>
<review>
<title>Some testing stuff</title>
<text sometimes="attribute">Some text with <extradata>data</extradata> in it.
It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag>
or more</sometag>.</text>
</review>
What I want is the content between the two text tags, including any inner tags: Some text with <extradata>data</extradata> in it. It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> or more</sometag>.
For now I use regular expressions, but it gets kind of messy and I don't like this approach. I lean towards an XML-parser-based solution. I looked over minidom, etree, lxml and BeautifulSoup but couldn't find a solution for this case (whole content, including inner tags).
Recommended answer
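from lxml import etree

# Pass bytes: on Python 3, lxml rejects a str that carries an encoding declaration.
t = etree.XML(
b"""<?xml version="1.0" encoding="UTF-8"?>
<review>
<title>Some testing stuff</title>
<text>Some text with <extradata>data</extradata> in it.</text>
</review>"""
)
# encoding='unicode' makes tostring() return str instead of bytes, so it can be joined with t.text.
(t.text + ''.join(etree.tostring(child, encoding='unicode') for child in t)).strip()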
The trick here is that t is iterable and, when iterated, yields all child elements. Because etree does not represent text as separate nodes, you also need to recover the text before the first child tag, with t.text. The text that follows each child is stored in that child's tail, which etree.tostring includes by default.
In [50]: (t.text + ''.join(etree.tostring(child, encoding='unicode') for child in t)).strip()
Out[50]: '<title>Some testing stuff</title>\n<text>Some text with <extradata>data</extradata> in it.</text>'
Or:
In [6]: e = t.xpath('//text')[0]
In [7]: (e.text + ''.join(etree.tostring(child, encoding='unicode') for child in e)).strip()
Out[7]: 'Some text with <extradata>data</extradata> in it.'
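The same idea can be wrapped in a small helper and applied to the multi-line sample from the question. This is only a sketch under a few assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, the name inner_xml is made up for illustration, and it guards against element.text being None (which happens when an element starts directly with a child tag).

```python
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Return the text and child markup found between an element's tags.

    inner_xml is a hypothetical helper name, not part of any library.
    """
    # element.text is None when the element starts directly with a child tag.
    parts = [element.text or ""]
    for child in element:
        # tostring() serializes the child and also its tail,
        # i.e. the text that follows the child's closing tag.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# The <text> element from the question, including its line breaks.
doc = ET.fromstring(
    "<review><title>Some testing stuff</title>"
    '<text sometimes="attribute">Some text with <extradata>data</extradata> in it.\n'
    "It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag>\n"
    "or more</sometag>.</text></review>"
)

content = inner_xml(doc.find("text"))
print(content)
```

With lxml the helper should work unchanged if the import is swapped for `from lxml import etree as ET`, since both libraries expose fromstring, find and tostring with encoding="unicode".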