問題描述
我有一個這樣的 html 表格:
I have a html table like this :
<table ... >
<tbody ... >
<tr ... >
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
string...
</td>
</tr>
<tr ... >
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
</td>
<td ...>
string...
</td>
</tr>
..............
</tbody>
</table>
這是一個數據表,我需要從中獲取所有數據.該表將有許多行 (
).每行都有一個固定的列(
)(目前是 5 ).記住每個表、tr、td 標簽可能已格式化(其中說...")
This is a data table and I need to get all data from this.
The table will have many rows (<tr></tr>
) . each row will have a fixed columns (<td></td>
)(currently is 5 ).
remember each table,tr,td tag maybe formatted (where say "...")
我希望大家能幫我寫一個正則表達式用于 preg_match_all
函數來獲取這樣的數據:
And I hope everyone can help me to write a regex for preg_match_all
function to get the data like this :
array(
0 => array(
0=> 'some data0',
1=> 'some data1',
2=> 'some data2',
3=> 'some data3',
4=> 'some data4',
)
1 => array(
0=> 'some data0',
1=> 'some data1',
2=> 'some data2',
3=> 'some data3',
4=> 'some data4',
)
2 => array(
0=> 'some data0',
1=> 'some data1',
2=> 'some data2',
3=> 'some data3',
4=> 'some data4',
)
..........
)
現在是你的測試示例,希望你能幫助我?。?!
Now the example for your test, hopfully you can help me!!!
<table border="1" >
<tbody style="" >
<tr style="" >
<td style="color:blue;">
data0
</td>
<td style="font-size:15px;">
data1
</td>
<td style="font-size:15px;">
data2
</td>
<td style="color:blue;">
data3
</td>
<td style="color:blue;">
data4
</td>
</tr>
<tr style="" >
<td style="color:blue;">
data00
</td>
<td style="font-size:15px;">
data11
</td>
<td style="font-size:15px;">
data22
</td>
<td style="color:blue;">
data33
</td>
<td style="color:blue;">
data44
</td>
</tr>
<tr style="color:black" >
<td style="color:blue;">
data000
</td>
<td style="font-size:15px;">
data111
</td>
<td style="font-size:15px;">
data222
</td>
<td style="color:blue;">
data333
</td>
<td style="color:blue;">
data444
</td>
</tr>
</tbody>
</table>
推薦答案
你絕對不想用 Regex 解析 HTML.
You absolutely do NOT want to parse HTML with Regex.
有太多的變體,一方面,更重要的是,正則表達式對于 HTML 的層次結構不是很好.最好使用 XML 解析器或更好的 HTML 特定解析器.
There are far too many variations, for one, and more importantly, regex isn't very good with the hierarchal nature of HTML. It's best to use an XML parser or better-yet an HTML-specific parser.
每當我需要抓取 HTML 時,我傾向于使用 Simple HTML DOM Parser 庫,它需要一個HTML 樹并將其解析為可遍歷的 PHP 對象,您可以在該對象中查詢類似 JQuery 的內容.
Whenever I need to scrape HTML, I tend to use the Simple HTML DOM Parser library, which takes an HTML tree and parses it into a traversable PHP object, which you can query something like JQuery.
<?php
require 'simplehtmldom/simple_html_dom.php';
$sHtml = <<<EOS
<table border="1" >
<tbody style="" >
<tr style="" >
<td style="color:blue;">
data0
</td>
<td style="font-size:15px;">
data1
</td>
<td style="font-size:15px;">
data2
</td>
<td style="color:blue;">
data3
</td>
<td style="color:blue;">
data4
</td>
</tr>
<tr style="" >
<td style="color:blue;">
data00
</td>
<td style="font-size:15px;">
data11
</td>
<td style="font-size:15px;">
data22
</td>
<td style="color:blue;">
data33
</td>
<td style="color:blue;">
data44
</td>
</tr>
<tr style="color:black" >
<td style="color:blue;">
data000
</td>
<td style="font-size:15px;">
data111
</td>
<td style="font-size:15px;">
data222
</td>
<td style="color:blue;">
data333
</td>
<td style="color:blue;">
data444
</td>
</tr>
</tbody>
</table>
EOS;
$oHTML = str_get_html($sHtml);
$oTRs = $oHTML->find('table tr');
$aData = array();
foreach($oTRs as $oTR) {
$aRow = array();
$oTDs = $oTR->find('td');
foreach($oTDs as $oTD) {
$aRow[] = trim($oTD->plaintext);
}
$aData[] = $aRow;
}
var_dump($aData);
?>
和輸出:
array
0 =>
array
0 => string 'data0' (length=5)
1 => string 'data1' (length=5)
2 => string 'data2' (length=5)
3 => string 'data3' (length=5)
4 => string 'data4' (length=5)
1 =>
array
0 => string 'data00' (length=6)
1 => string 'data11' (length=6)
2 => string 'data22' (length=6)
3 => string 'data33' (length=6)
4 => string 'data44' (length=6)
2 =>
array
0 => string 'data000' (length=7)
1 => string 'data111' (length=7)
2 => string 'data222' (length=7)
3 => string 'data333' (length=7)
4 => string 'data444' (length=7)
這篇關于僅從 php 中使用的 preg_match_all 的 html 表中獲取數據的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!