Problem description
I am trying to parse the Stack Overflow data dump. One of its tables, posts.xml, has around 10 million entries. Sample XML:
<?xml version="1.0" encoding="utf-8"?>
<posts>
<row Id="1" PostTypeId="1" AcceptedAnswerId="26" CreationDate="2010-07-07T19:06:25.043" Score="10" ViewCount="1192" Body="&lt;p&gt;Now that the Engineer update has come, there will be lots of Engineers building up everywhere. How should this best be handled?&lt;/p&gt;
" OwnerUserId="11" LastEditorUserId="56" LastEditorDisplayName="" LastEditDate="2010-08-27T22:38:43.840" LastActivityDate="2010-08-27T22:38:43.840" Title="In Team Fortress 2, what is a good strategy to deal with lots of engineers turtling on the other team?" Tags="&lt;strategy&gt;&lt;team-fortress-2&gt;&lt;tactics&gt;" AnswerCount="5" CommentCount="7" />
<row Id="2" PostTypeId="1" AcceptedAnswerId="184" CreationDate="2010-07-07T19:07:58.427" Score="5" ViewCount="469" Body="&lt;p&gt;I know I can create a Warp Gate and teleport to Pylons, but I have no idea how to make Warp Prisms or know if there's any other unit capable of transporting.&lt;/p&gt;

&lt;p&gt;I would in particular like this to built remote bases in 1v1&lt;/p&gt;
" OwnerUserId="10" LastEditorUserId="68" LastEditorDisplayName="" LastEditDate="2010-07-08T00:16:46.013" LastActivityDate="2010-07-08T00:21:13.163" Title="What protoss unit can transport others?" Tags="&lt;starcraft-2&gt;&lt;how-to&gt;&lt;protoss&gt;" AnswerCount="3" CommentCount="2" />
<row Id="3" PostTypeId="1" AcceptedAnswerId="56" CreationDate="2010-07-07T19:09:46.317" Score="7" ViewCount="356" Body="&lt;p&gt;Steam won't let me have two instances running with the same user logged in.&lt;/p&gt;

&lt;p&gt;Does that mean I cannot run a dedicated server on a PC (for example, for Left 4 Dead 2) &lt;em&gt;and&lt;/em&gt; play from another machine?&lt;/p&gt;

&lt;p&gt;Is there a way to run the dedicated server without running steam? Is there a configuration option I'm missing?&lt;/p&gt;
" OwnerUserId="14" LastActivityDate="2010-07-07T19:27:04.777" Title="How can I run a dedicated server from steam?" Tags="&lt;steam&gt;&lt;left-4-dead-2&gt;&lt;dedicated-server&gt;&lt;account&gt;" AnswerCount="1" />
<row Id="4" PostTypeId="1" AcceptedAnswerId="14" CreationDate="2010-07-07T19:11:05.640" Score="10" ViewCount="201" Body="&lt;p&gt;When I get to the insult sword-fighting stage of The Secret of Monkey Island, do I have to learn every single insult and comeback in order to beat the Sword Master?&lt;/p&gt;
" OwnerUserId="17" LastEditorUserId="17" LastEditorDisplayName="" LastEditDate="2010-07-08T21:25:04.787" LastActivityDate="2010-07-08T21:25:04.787" Title="Do I have to learn all of the insults and comebacks to be able to advance in The Secret of Monkey Island?" Tags="&lt;monkey-island&gt;&lt;adventure&gt;" AnswerCount="3" CommentCount="2" />
I would like to parse this XML but load only certain attributes: Id, PostTypeId, AcceptedAnswerId, and two others. Is there a way to make SAX load only these attributes? If so, how? I am pretty new to SAX, so some guidance would help.
Otherwise, loading the whole thing would be very slow, and some of the attributes won't be used anyway, so loading them is wasted work.
One other question: is it possible to jump to a particular row with row Id X? If so, how do I do this?
Recommended answer
The "startElement" SAX event lets you process a single XML element.
In Java code, you must implement this method in your handler:
@Override
public void startElement(String uri, String localName,
                         String qName, Attributes attributes)
        throws SAXException {
    // qName is checked because localName can be empty when the parser is not namespace-aware
    if ("row".equals(qName)) {
        // this code is executed for every XML element "row"
        String id = attributes.getValue("Id"); // attribute names are case-sensitive
        String postTypeId = attributes.getValue("PostTypeId");
        String acceptedAnswerId = attributes.getValue("AcceptedAnswerId");
        // ... read the other two attributes the same way
        // you now have the attribute values for one "row" element
    }
}
For every element, you can access:
- Namespace URI
- XML QName
- XML local name
- Map of attributes; this is where you can extract your two attributes...
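To make the method above concrete, here is a minimal, self-contained sketch of a handler that keeps only the wanted attributes from each "row" element. The names PostsSaxDemo, Post, RowHandler, and parse are hypothetical, chosen for this illustration. Note that the SAX parser still reads every attribute of every element; the saving is that nothing you do not copy out is retained, so memory use stays flat regardless of file size.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class PostsSaxDemo {

    /** Holds only the attributes of interest (hypothetical type for this sketch). */
    static final class Post {
        final String id;
        final String postTypeId;
        final String acceptedAnswerId; // null when the row has no such attribute

        Post(String id, String postTypeId, String acceptedAnswerId) {
            this.id = id;
            this.postTypeId = postTypeId;
            this.acceptedAnswerId = acceptedAnswerId;
        }
    }

    /** Collects one Post per "row" element and ignores every other element. */
    static final class RowHandler extends DefaultHandler {
        final List<Post> posts = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if ("row".equals(qName)) {
                posts.add(new Post(
                        attributes.getValue("Id"),
                        attributes.getValue("PostTypeId"),
                        attributes.getValue("AcceptedAnswerId")));
            }
        }
    }

    /** Streams the input through SAX and returns the collected rows. */
    static List<Post> parse(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RowHandler handler = new RowHandler();
        parser.parse(in, handler);
        return handler.posts;
    }

    public static void main(String[] args) throws Exception {
        // Tiny inline sample standing in for posts.xml
        String xml = "<posts>"
                + "<row Id='1' PostTypeId='1' AcceptedAnswerId='26' Score='10'/>"
                + "<row Id='3' PostTypeId='2' Score='7'/>"
                + "</posts>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (Post p : parse(in)) {
            System.out.println(p.id + " " + p.postTypeId + " " + p.acceptedAnswerId);
        }
    }
}
```

For the real dump, the same parse method could be given a java.io.FileInputStream opened on posts.xml. With 10 million rows you would probably process each row inside startElement (for example, write it to a database) rather than collect everything in a list.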
See the ContentHandler implementation for specific details.
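On the question's second point (jumping to the row with a given Id): SAX reads the document sequentially, so it cannot seek directly to a row. A common workaround is to scan until the wanted row appears and then abort parsing by throwing an exception from the handler. The sketch below assumes that approach; RowFinder and findPostTypeId are hypothetical names, and returning PostTypeId is only an example of what might be extracted.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class RowFinder {

    /**
     * Returns the PostTypeId of the row whose Id equals targetId,
     * or null if no such row exists. Parsing stops at the first match.
     */
    static String findPostTypeId(InputStream in, final String targetId) throws Exception {
        final String[] result = new String[1];   // value captured from the matching row
        final boolean[] found = new boolean[1];  // true once the target row was seen

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes)
                    throws SAXException {
                if ("row".equals(qName) && targetId.equals(attributes.getValue("Id"))) {
                    result[0] = attributes.getValue("PostTypeId");
                    found[0] = true;
                    // Abort the parse on purpose so the rest of the file is not read
                    throw new SAXException("target row found, stopping early");
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(in, handler);
        } catch (SAXException e) {
            if (!found[0]) {
                throw e; // a genuine parse error, not the deliberate early stop
            }
        }
        return result[0];
    }
}
```

The scan is still linear in the position of the row, so for repeated lookups it may be worth a single pass that loads the needed attributes into a database or an in-memory map keyed by Id.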
Bye
UPDATED: improved previous snippet.