I want to grab the list of players from http://www.atpworldtour.com/Rankings/Singles.aspx
There is a table with class "bioTableAlt"; we need to grab every <tr> after the first one (class "bioTableHead"), which holds the table heading.
Wanted content looks like:
<tr class="oddRow">
<td>2</td>
<td>
<a href="/Tennis/Players/Top-Players/Novak-Djokovic.aspx">Djokovic, Novak</a>
(SRB)
</td>
<td>
<a href="/Tennis/Players/Top-Players/Novak-Djokovic.aspx?t=rb">6,905</a>
</td>
<td>0</td>
<td>
<a href="/Tennis/Players/Top-Players/Novak-Djokovic.aspx?t=pa&m=s">21</a>
</td>
</tr>
<tr>
<td>3</td>
<td>
<a href="/Tennis/Players/Top-Players/Roger-Federer.aspx">Federer, Roger</a>
(SUI)
</td>
<td>
<a href="/Tennis/Players/Top-Players/Roger-Federer.aspx?t=rb">6,795</a>
</td>
<td>0</td>
<td>
<a href="/Tennis/Players/Top-Players/Roger-Federer.aspx?t=pa&m=s">21</a>
</td>
</tr>
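One possible way to pick out those rows is PHP's DOM extension with an XPath query. This is only a sketch under assumptions: the inline $html is a shortened stand-in for the downloaded page, the heading cells are made up, and the helper name selectPlayerRows is hypothetical.

```php
<?php
// Hypothetical helper: returns the data rows (<tr>) of the "bioTableAlt" table,
// skipping the heading row with class "bioTableHead".
function selectPlayerRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect libxml warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word, so "bioTableAltX" would not qualify.
    $query = '//table[contains(concat(" ", normalize-space(@class), " "), " bioTableAlt ")]'
           . '//tr[not(contains(concat(" ", normalize-space(@class), " "), " bioTableHead "))]';

    return iterator_to_array($xpath->query($query));
}

// Shortened stand-in for the page; the heading cells are an assumption.
$html = <<<HTML
<table class="bioTableAlt">
<tr class="bioTableHead"><td>Rank</td><td>Name</td><td>Points</td></tr>
<tr class="oddRow"><td>2</td><td><a href="#">Djokovic, Novak</a> (SRB)</td><td><a href="#">6,905</a></td></tr>
<tr><td>3</td><td><a href="#">Federer, Roger</a> (SUI)</td><td><a href="#">6,795</a></td></tr>
</table>
HTML;

$rows = selectPlayerRows($html);
echo count($rows), " rows\n";
```

Filtering on the class name rather than on "everything after the first <tr>" keeps the query working even if the heading row is not the first one.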
I think the best approach is to build an array(), turn each <tr> into its own entry, and write the final result to a list.txt file, like this:
Array (
[2] => stdClass Object (
[name] => Djokovic, Novak
[country] => SRB
[rank] => 6,905
)
[3] => stdClass Object (
[name] => Federer, Roger
[country] => SUI
[rank] => 6,795
)
)
We're parsing each <tr>:
[2] is the number from the first <td>
[name] is the text of the link inside the second <td>
[country] is the value between (...) in the second <td>
[rank] is the text of the link inside the third <td>
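The field mapping above could be sketched roughly like this, assuming every row has the same cell layout as the sample. The helper name parsePlayerRow is hypothetical, and the demo row is the Federer row from the sample with its href attributes shortened.

```php
<?php
// Hypothetical helper: turns one <tr> element into [position, player object].
function parsePlayerRow(DOMElement $row): array
{
    $cells = $row->getElementsByTagName('td');

    // First <td>: the number used as the array key.
    $position = (int) trim($cells->item(0)->textContent);

    $player = new stdClass();

    // Second <td>: the link text is the name, the country sits inside (...).
    $nameCell = $cells->item(1);
    $player->name = trim($nameCell->getElementsByTagName('a')->item(0)->textContent);
    $player->country = preg_match('/\(([^)]+)\)/', $nameCell->textContent, $m) ? $m[1] : null;

    // Third <td>: link text, kept as a string because of the thousands separator.
    $player->rank = trim($cells->item(2)->getElementsByTagName('a')->item(0)->textContent);

    return [$position, $player];
}

// Demo on a single row in the same layout as the sample.
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td>3</td><td><a href="#">Federer, Roger</a> (SUI)</td>'
    . '<td><a href="#">6,795</a></td><td>0</td><td><a href="#">21</a></td></tr></table>');

$players = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    [$position, $player] = parsePlayerRow($row);
    $players[$position] = $player;
}
print_r($players);
```

The sketch does no error handling: a row with a missing cell or link would fail on a null value, so real pages may need extra checks.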
The final list.txt should contain an array() with ~100 IDs (we are grabbing the page with the first 100 players).
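Writing the array to the file in that print_r() style could look something like the sketch below. The helper name savePlayerList is hypothetical, the two hand-built entries stand in for the ~100 parsed rows, and the temp directory is used only for the demo.

```php
<?php
// Hypothetical helper: writes the print_r() dump of $players to $path.
function savePlayerList(array $players, string $path): void
{
    // print_r() with true as its second argument returns the text instead of printing it.
    if (file_put_contents($path, print_r($players, true)) === false) {
        throw new RuntimeException("Could not write $path");
    }
}

// Two hand-built entries stand in for the parsed rows.
$players = [
    2 => (object) ['name' => 'Djokovic, Novak', 'country' => 'SRB', 'rank' => '6,905'],
    3 => (object) ['name' => 'Federer, Roger', 'country' => 'SUI', 'rank' => '6,795'],
];

$path = sys_get_temp_dir() . '/list.txt'; // temp dir used here only for the demo
savePlayerList($players, $path);
```

A print_r() dump is easy to read but awkward to load back into PHP; if list.txt has to be read again later, serialize() or json_encode() may be a better fit.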
Additionally, it would be great to apply a small fix to each [name] before adding it to the array(): "Federer, Roger" should be converted to "Roger Federer" (just take the part before the comma and move it to the end of the line).
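The name fix could be done with a small helper along these lines. The name flipName is hypothetical, and the sketch assumes the first comma always separates the last name from the first name.

```php
<?php
// Hypothetical helper: "Last, First" -> "First Last".
// Names without a comma are returned unchanged (apart from trimming).
function flipName(string $name): string
{
    $parts = explode(',', $name, 2);
    if (count($parts) < 2) {
        return trim($name);
    }
    return trim(trim($parts[1]) . ' ' . trim($parts[0]));
}

echo flipName('Federer, Roger'), "\n";  // Roger Federer
echo flipName('Djokovic, Novak'), "\n"; // Novak Djokovic
```

Splitting on the first comma only (the limit of 2 in explode()) keeps everything after it together as the first-name part.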
Thanks.