sachin raj aryan has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks, I am looking for some help, as I got stuck while scraping a website for data presented in a table. Below is the HTML I am fetching; I copied it from the browser's developer tools.
<tbody><tr><th colspan="15" style="COLOR:RED;FONT-SIZE:12pt; FONT-WEIGHT:BOLD; TEXT-ALIGN:center;">Amount </th></tr>
<tr><th rowspan="2">Region</th><th colspan="2">Level 31.03.2016</th><th colspan="3">Sanction/Renewal<br>01.04.2016 to 28.02.2017</th><th colspan="2">Level 28.02.2017</th><th colspan="3">Sanction/Renewal<br>During Current Month</th><th colspan="2">Level 26.03.2017</th><th colspan="2">Growth<br>as on<br>26.03.2017</th></tr>
<tr><th>No.</th><th>Balance</th><th>No.</th><th>Limit</th><th>Balance</th><th>No.</th><th>Balance</th><th>No.</th><th>Limit</th><th>Balance</th><th>No.</th><th>Balance</th><th>GDM</th><th>GUM</th></tr>
<tr> <td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820201.HTM">TEMPORARY-01</a></td><td>19600</td><td>288.36</td><td>14306</td><td>272.25</td><td>194.22</td><td>19246</td><td>284.53</td><td>989</td><td>19.02</td><td>12.94</td><td>19450</td><td>290.33</td><td>5.80</td><td>1.97</td></tr>
<tr> <td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820202.HTM">TEMPORARY-02</a></td><td>17417</td><td>167.40</td><td>9466</td><td>123.61</td><td>99.40</td><td>16717</td><td>167.24</td><td>823</td><td>11.71</td><td>9.11</td><td>16721</td><td>169.51</td><td>2.27</td><td>2.11</td></tr>
<tr> <td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820203.HTM">TEMPORARY-03</a></td><td>13545</td><td>180.62</td><td>8395</td><td>144.63</td><td>110.32</td><td>12675</td><td>179.13</td><td>333</td><td>7.38</td><td>5.38</td><td>12630</td><td>180.13</td><td>1.00</td><td>-0.49</td></tr>
<tr> <td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820204.HTM">TEMPORARY-04</a></td><td>21826</td><td>225.82</td><td>10249</td><td>133.52</td><td>113.51</td><td>21558</td><td>230.69</td><td>624</td><td>10.07</td><td>7.84</td><td>21524</td><td>233.99</td><td>3.30</td><td>8.17</td></tr>
<tr> <td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820205.HTM">TEMPORARY-05</a></td><td>41299</td><td>736.24</td><td>34023</td><td>732.70</td><td>601.55</td><td>40822</td><td>732.78</td><td>3177</td><td>76.32</td><td>60.46</td><td>40794</td><td>736.45</td><td>3.67</td><td>0.21</td></tr>
<tr style="BACKGROUND-COLOR:YELLOW;FONT-WEIGHT:BOLD;"> <td style="TEXT-ALIGN:LEFT;">TEMPORARY-TOTAL</td><td>113687</td><td>1598.44</td><td>76439</td><td>1406.71</td><td>1119.00</td><td>111018</td><td>1594.37</td><td>5946</td><td>124.50</td><td>95.73</td><td>111119</td><td>1610.41</td><td>16.04</td><td>11.97</td></tr>
</tbody>
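#!/usr/bin/perl
# Extract every table on the page and append its rows to a file as CSV.
use Modern::Perl;
use WWW::Mechanize;
use HTML::TableExtract;

# Append mode, so repeated runs add to the same output file.
open(my $OUT, '>>', 'papa') or die "Could not open file: $!";

my $mech = WWW::Mechanize->new();
$mech->get('http://xxx.com/tempo/TYPE_CAT/AGL_CROPC0820200.HTM');
my $html_string = $mech->content();

# No headers/depth/count constraints: extract all tables.
my $te = HTML::TableExtract->new();
$te->parse($html_string);

foreach my $ts ( $te->tables ) {
    print "Table (", join( ',', $ts->coords ), "):\n";
    foreach my $row ( $ts->rows ) {
        # Cells covered by a rowspan/colspan come back undefined;
        # print them as empty fields instead of triggering warnings.
        print {$OUT} join( ',', map { $_ // '' } @$row ), "\n";
    }
}
close($OUT) or die "Could not close file: $!";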
Below is my output from the data fetched above, in which the top 9 lines are not properly formatted:
Amount,,,,,,,,,,,,,,
Region,Level 31.03.2016,,Sanction/Renewal 01.04.2016 to 28.02.2017,,,Level 28.02.2017,,Sanction/Renewal During Current Month,,,Level 26.03.2017,,Growth as on 26.03.2017,
,No.,Balance,No.,Limit,Balance,No.,Balance,No.,Limit,Balance,No.,Balance,GDM,GUM
TEMPORARY-01,19600,288.36,14306,272.25,194.22,19246,284.53,989,19.02,12.94,19450,290.33,5.80,1.97
TEMPORARY-02,17417,167.40,9466,123.61,99.40,16717,167.24,823,11.71,9.11,16721,169.51,2.27,2.11
TEMPORARY-03,13545,180.62,8395,144.63,110.32,12675,179.13,333,7.38,5.38,12630,180.13,1.00,-0.49
TEMPORARY-04,21826,225.82,10249,133.52,113.51,21558,230.69,624,10.07,7.84,21524,233.99,3.30,8.17
TEMPORARY-05,41299,736.24,34023,732.70,601.55,40822,732.78,3177,76.32,60.46,40794,736.45,3.67,0.21
TEMPORARY-TOTAL,113687,1598.44,76439,1406.71,1119.00,111018,1594.37,5946,124.50,95.73,111119,1610.41,16.04,11.97
So I don't know how to extract data with rowspan and colspan.
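One possible way to handle the spanned header cells, as a minimal sketch rather than a definitive fix: take the two header rows from the output above (the "Region, Level ..." row and the "No., Balance, ..." row), carry each group label forward across the empty cells its colspan left behind, and join it with the sub-header beneath it. The names `merge_headers`, `@group_row`, and `@field_row` are hypothetical, and the rows are hard-coded here on the assumption that they were taken by position from `$ts->rows`. It also assumes that every empty header cell is the continuation of a span, which holds for this table but may not hold for others.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two header rows as they appear in the output above. Cells covered
# by a colspan or rowspan are undefined (they print as empty fields).
my @group_row = (
    'Region',
    'Level 31.03.2016', undef,
    'Sanction/Renewal 01.04.2016 to 28.02.2017', undef, undef,
    'Level 28.02.2017', undef,
    'Sanction/Renewal During Current Month', undef, undef,
    'Level 26.03.2017', undef,
    'Growth as on 26.03.2017', undef,
);
my @field_row = (
    undef,
    qw(No. Balance No. Limit Balance No. Balance
       No. Limit Balance No. Balance GDM GUM),
);

# Hypothetical helper: build one flat header per column.
sub merge_headers {
    my ($group, $field) = @_;
    my @merged;
    my $current = '';
    for my $i (0 .. $#$group) {
        # A non-empty cell starts a new group; an empty one is treated
        # as the continuation of the previous colspan.
        $current = $group->[$i]
            if defined $group->[$i] && length $group->[$i];
        my $child = $field->[$i];
        # A rowspan (e.g. "Region") leaves no sub-header underneath,
        # so the group label is used on its own.
        push @merged,
            ( defined $child && length $child )
            ? "$current $child"
            : $current;
    }
    return @merged;
}

print join( ',', merge_headers( \@group_row, \@field_row ) ), "\n";
```

The merged row could then be printed once in place of the separate header rows, with the data rows written unchanged after it.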