Problem description
I'm writing an application in Java using import org.jdom.*;
My XML is valid, but sometimes it contains HTML tags. For example, something like this:
<program-title>Anatomy &amp; Physiology</program-title>
<overview>
<content>
For more info click <a href="page.html">here</a>
<p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&#160; Online studies options are available.</p>
</content>
</overview>
<key-information>
<category>Health &amp; Human Services</category>
So my problem is with the <p> tags inside the overview.content node.
I was hoping that this code would work:
Element overview = sds.getChild("overview");
Element content = overview.getChild("content");
System.out.println(content.getText());
But it comes back blank.
How do I return all the text (nested tags and all) from the overview.content node?
Thanks
Recommended answer
content.getText() returns only the element's immediate text, so it is only useful for leaf elements whose content is plain text.
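To see the difference between "immediate text" and "all text", here is a minimal sketch that uses the JDK's built-in org.w3c.dom instead of JDOM (an assumption, so it runs without extra jars). The helper immediateText is hypothetical and mimics JDOM's getText() by concatenating only the direct text children; DOM's getTextContent() roughly corresponds to JDOM's Element.getValue(), which returns all descendant text. Neither one keeps the <a> and <p> tags.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ImmediateTextDemo {

    // Parses an XML string and returns its root element (JDK DOM, no JDOM needed).
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: concatenates only the direct text/CDATA children of e,
    // similar in spirit to JDOM's getText(). Text inside child elements is skipped.
    static String immediateText(Element e) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.TEXT_NODE || n.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element content = parse("<content>For more info click <a href=\"page.html\">here</a>"
                + "<p>Learn more.</p></content>");
        System.out.println("[" + immediateText(content) + "]");   // [For more info click ]
        System.out.println("[" + content.getTextContent() + "]"); // [For more info click hereLearn more.]
    }
}
```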
The trick is to use org.jdom.output.XMLOutputter with the compact format (Format.getCompactFormat()) and to output the element's content list rather than the element itself:
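import java.io.StringWriter;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class Main { // class name is arbitrary
    public static void main(String[] args) throws Exception {
        SAXBuilder builder = new SAXBuilder();
        String xmlFileName = "a.xml";
        Document doc = builder.build(xmlFileName);
        Element root = doc.getRootElement();
        Element overview = root.getChild("overview");
        Element content = overview.getChild("content");

        // Serialize the children of <content> (text and nested tags), not <content> itself.
        XMLOutputter outp = new XMLOutputter();
        outp.setFormat(Format.getCompactFormat());
        //outp.setFormat(Format.getRawFormat());
        //outp.setFormat(Format.getPrettyFormat());
        // getFormat() returns a copy, so configure a Format first and then set it:
        //Format f = Format.getCompactFormat(); f.setTextMode(Format.TextMode.PRESERVE); outp.setFormat(f);
        StringWriter sw = new StringWriter();
        outp.output(content.getContent(), sw);
        System.out.println(sw.toString());
    }
}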
Output
For more info click<a href="page.html">here</a><p>Learn more about the human body. Choose from a variety of Physiology (A&P) designed for complementary therapies.&#160; Online studies options are available.</p>
Explore the other formatting options and adapt the code above to your needs. From the Javadoc for org.jdom.output.Format:
"Class to encapsulate XMLOutputter format options. Typical users can use the standard format configurations obtained by getRawFormat() (no whitespace changes), getPrettyFormat() (whitespace beautification), and getCompactFormat() (whitespace normalization)."
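If adding JDOM is not an option, the same idea (serialize an element's children instead of the element) can be sketched with only the JDK: org.w3c.dom plus an identity javax.xml.transform.Transformer. This is a swapped-in alternative, not JDOM's API; the class InnerXml and its methods are hypothetical names, and unlike the compact format it does not normalize whitespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InnerXml {

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Serializes each child node of e (text and nested elements) with an identity
    // transform, so e's own start and end tags are left out of the result.
    static String innerXml(Element e) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            NodeList kids = e.getChildNodes();
            for (int i = 0; i < kids.getLength(); i++) {
                t.transform(new DOMSource(kids.item(i)), new StreamResult(sw));
            }
            return sw.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not serialize content", ex);
        }
    }

    public static void main(String[] args) {
        Element content = parse("<content>For more info click <a href=\"page.html\">here</a>"
                + "<p>Anatomy &amp; Physiology</p></content>");
        System.out.println(innerXml(content));
    }
}
```

Because the serializer re-escapes text, a literal & in the content comes back as &amp;, and the nested <a> and <p> tags are kept.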