Problem description
I am currently using SAX to extract some information from a number of XML documents I am working with. So far, extracting attribute values has been easy. However, I have no idea how to extract the actual value of a text node.
For example, in the given XML document:
      <w:rStyle w:val="Highlight" />
    </w:rPr>
  </w:pPr>
  <w:r>
    <w:t>Text to Extract</w:t>
  </w:r>
</w:p>
<w:p w:rsidR="00B41602" w:rsidRDefault="00B41602" w:rsidP="007C3A42">
  <w:pPr>
    <w:pStyle w:val="Copy" />
I can extract "Highlight" without any problem by reading the value of val. But I have no idea how to get into the text node and pull out "Text to Extract".
Here is the Java code I have so far for extracting attribute values:
private static final class SaxHandler extends DefaultHandler
{
    // invoked when document parsing starts:
    public void startDocument() throws SAXException
    {
        System.out.println("Document processing starting:");
    }

    // invoked when document parsing finishes:
    public void endDocument() throws SAXException
    {
        System.out.println("Document processing finished.\n");
    }

    // invoked when the parser enters element 'qName':
    public void startElement(String uri, String localName,
            String qName, Attributes attrs) throws SAXException
    {
        if (qName.equalsIgnoreCase("Relationships"))
        {
            // do nothing
        }
        else if (qName.equalsIgnoreCase("Relationship"))
        {
            // read the "Target" attribute of the element...
            String val = attrs.getValue("Target");
            // ...and if it is present...
            if (val != null)
            {
                // ...and its value contains "image"...
                if (val.contains("image"))
                {
                    // ...then get the Id value
                    String id = attrs.getValue("Id");
                    // ...and use substring to isolate and print only the image name and number
                    int begIndex = val.lastIndexOf("/");
                    int endIndex = val.lastIndexOf(".");
                    System.out.println("Id: " + id + " & Target: " + val.substring(begIndex + 1, endIndex));
                }
            }
        }
        else
        {
            throw new IllegalArgumentException("Element '" +
                    qName + "' is not allowed here");
        }
    }

    // invoked when the parser leaves element 'qName'; no action taken:
    public void endElement(String uri, String localName, String qName) throws SAXException
    {
        // do nothing
    }
}
But I have no idea where to start in order to get into that text node and pull out the value inside. Does anyone have any ideas?
Recommended answer
Here is some pseudocode:
private boolean insideElementContainingTextNode;
private StringBuilder textBuilder;

public void startElement(String uri, String localName, String qName, Attributes attrs) {
    // With the default (non-namespace-aware) parser, qName holds the prefixed name "w:t".
    // With a namespace-aware parser, compare localName against "t" instead.
    if ("w:t".equals(qName)) {
        insideElementContainingTextNode = true;
        textBuilder = new StringBuilder();
    }
}

public void characters(char[] ch, int start, int length) {
    // The parser may deliver one text node in several chunks, so append each one.
    if (insideElementContainingTextNode) {
        textBuilder.append(ch, start, length);
    }
}

public void endElement(String uri, String localName, String qName) {
    if ("w:t".equals(qName)) {
        insideElementContainingTextNode = false;
        String theCompleteText = this.textBuilder.toString();
        this.textBuilder = null;
        // use theCompleteText here, e.g. print or store it
    }
}
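To make the idea above concrete, here is a minimal, self-contained sketch of the same pattern: start a buffer in startElement, append in characters, and read the result in endElement. The class name TextNodeExtractor, the helper extract, and the namespace URI urn:example:w are made up for this illustration and are not from the question. It assumes the default SAXParserFactory, which is not namespace-aware, so the element is matched on qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, chosen for this illustration only.
final class TextNodeExtractor {

    // Collects the text content of every <w:t> element, in document order.
    static final class TextHandler extends DefaultHandler {
        private final List<String> texts = new ArrayList<>();
        private StringBuilder current; // non-null only while inside <w:t>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("w:t".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may split one text node across several calls, so append.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("w:t".equals(qName) && current != null) {
                texts.add(current.toString());
                current = null;
            }
        }

        List<String> getTexts() {
            return texts;
        }
    }

    // Hypothetical helper: parses an XML string and returns the <w:t> texts.
    static List<String> extract(String xml) throws Exception {
        // Assumption: default factory settings (not namespace-aware),
        // so qName carries the prefixed name "w:t".
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextHandler handler = new TextHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTexts();
    }

    public static void main(String[] args) throws Exception {
        // The namespace URI below is a placeholder, not the real WordprocessingML URI.
        String xml = "<w:p xmlns:w=\"urn:example:w\">"
                + "<w:r><w:t>Text to Extract</w:t></w:r>"
                + "<w:r><w:t>second run</w:t></w:r>"
                + "</w:p>";
        System.out.println(extract(xml)); // prints [Text to Extract, second run]
    }
}
```

If the factory is switched to namespace-aware mode with setNamespaceAware(true), matching on the uri and localName ("t") arguments is the more robust choice, since the prefix is then only a document-level alias.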