in reply to table within table

Hi and good day to you, sir

Programming 101: look at the problem slowly and describe what needs to be done in simple steps.

Take a look at perlre, try to implement a solution and come back to us with code that shows any problem you are having.

Anon below is correct: rolling your own regex for a pre-tokenised format is the wrong approach.

print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

Re^2: table within table
by Anonymous Monk on Feb 10, 2010 at 12:03 UTC
    perlre is not for HTML.
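A quick illustration of why (a minimal sketch; the two-level table below is made-up input, not data from the thread): a non-greedy pattern of the same shape as `/<table>[^\000]*?<\/table>/` starts at the outer `<table>` but stops at the first `</table>`, which belongs to the inner table, so the captured text is unbalanced.

```perl
use strict;
use warnings;

# Hypothetical input: one table nested inside another.
my $html = '<table><tr><td><table><tr><td>inner</td></tr></table></td></tr></table>';

# Same shape as /<table>[^\000]*?<\/table>/: leftmost start, shortest end.
my ($first) = $html =~ m{(<table>.*?</table>)}s;

# Count the table tags in the captured text; they do not balance.
my $opens  = () = $first =~ /<table>/g;
my $closes = () = $first =~ m{</table>}g;

print "$first\n";
print "opening tags: $opens, closing tags: $closes\n";
# prints:
# <table><tr><td><table><tr><td>inner</td></tr></table>
# opening tags: 2, closing tags: 1
```

Balancing nested tags needs to track depth, which is exactly what a tokenizer-based approach gives you for free.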
Re^2: table within table
by johncute (Initiate) on Feb 11, 2010 at 09:35 UTC

    Here is my code:

    $ctr=0;
    while(/(<table>[^\000]*?<\/table>)/){
        $text=$1;
        while($text=~/<table>/){
            $tag=$&;
            $ctr=$ctr+1;
            $tag=~s/(<table)/$1$ctr/;
            $text=~s/<table>/$tag/;
        }
        $text=~s/(<table)$ctr>/${1}_level$ctr>/g;
        $text=~s/(<\/table)>/${1}_level$ctr>/g;
        $ctr=0;
        $text=~s/(<table)[0-9]+>/$1>/g;
        $text=~s/(<\/?)(thead|tbody)([^>]*)?>//g;
        $text=~s/(<\/?)(th)([^>]*)?>/$1td>/g;
        while($text =~ /<a href="([^"]*)">[^\000]*?<\/a>/){
            $href = $1;
            $class = "";
            if($href =~ /^http/i){ $class = "http";}
            if($href =~ /^www/i){ $class = "nohttp";}
            if($href =~ /^mailto/i){$class = "mailto";}
            if($href =~ /^ftp/i){ $class = "ftp";}
            if($class eq ""){
                $text =~ s/<a href="([^"]*)">([^\000]*?)<\/a>/$2/;
            }else{
                $text =~ s/<a href="([^"]*)">([^\000]*?)<\/a>/<remotelink hrefclass="$class" href="$1" >$2<\/remotelink>/;
            }
        }
        s/<table>[^\000]*?<\/table>/$text/;
    }

    # Remove table and img tags inside table if an <img /> tag was encountered
    while (/<table_level(2|3)>[^\000]*?<\/table_level\1>/) {
        $table2=$&;
        if ($table2 =~ /<img /) {
            # Remove all table tags including <img /> tag
            $table2=~s/<\/?(table_level(2|3)|tr|td)(\s+[^>]*)?>|<img\s+[^>]*\/>//g;
            s/<table_level(2|3)>[^\000]*?<\/table_level\1>/$table2/;
        } else {
            $table2=~s/(<\/?table_level\d)/$1_temp/g;
            s/<table_level(2|3)>[^\000]*?<\/table_level\1>/$table2/;
        }
    }
    s/(<\/?table_level\d)_temp/$1/g;

    # Extract table inside table if no <img /> tag was encountered
    # inside the inner table.
    while (/(<table_level1>[^\000]*?<\/table_level1>)/) {
        $table1=$1;
        #$table2="";
        $table="";
        while ($table1=~ /<table_level2>([^\000]*?)<\/table_level2>/) {
            $table2=$1;
            $table=$&;
            # Extract inner table and place it after the second level table
            $extracted_table3="";
            while ($table2 =~ s/(<table_level3>[^\000]*?<\/table_level3>)//) {
                $extracted_table3="$extracted_table3\n$1";
            }
            $table2=~s/<table_level2>([^\000]*?)<\/table_level2>/$table$extracted_table3/g;
            #$table2=~s/(<table_level2>[^\000]*?<\/table_level2>)/$1$extracted_table3/g;
            s/(<table_level2>[^\000]*?<\/table_level2>)//;
            #$table2=~s/(<\/?table)_level2/$1_2/g;
        }
        $table1=~s/<table_level2>([^\000]*?)<\/table_level2>//;
        $table1=~s/<table_2>([^\000]*?)<\/table_2>//;
        $table1=~s/(<\/?table)_level1/$1/g;
        s/<table_level1>[^\000]*?<\/table_level1>/$table2$table1/;
    }
    s/(<\/?table)_(level\d|\d)/$1/g;

    And here is my sample data:

    <table>
      <tr>
        <td>
          <table>
            <thead>
              <tr> <th>Vill</th> <th>Hi</th> <th>Au</th> </tr>
            </thead>
            <tbody>
              <tr> <td>Aix</td> <td>40</td> <td>27</td> </tr>
              <tr> <td>Freib</td> <td>30</td> <td></td> </tr>
              <tr> <td>Gdan</td> <td>20</td> <td>13</td> </tr>
              <tr> <td>Gd</td> <td>44</td> <td>14</td> </tr>
              <tr> <td>Gren</td> <td>33</td> <td>22</td> </tr>
              <tr> <td>Karl</td> <td>26</td> <td></td> </tr>
              <tr> <td>La</td> <td>31</td> <td>18</td> </tr>
              <tr> <td></td> <td>30</td> <td>20</td> </tr>
              <tr> <td>Lyon</td> <td>41</td> <td>19</td> </tr>
              <tr> <td>Man</td> <td>22</td> <td></td> </tr>
              <tr> <td>Mar</td> <td>32</td> <td>18</td> </tr>
              <tr> <td>Mar</td> <td>17</td> <td>13</td> </tr>
              <tr> <td>Mon</td> <td>36</td> <td>26</td> </tr>
              <tr> <td>Mul</td> <td>30</td> <td>45</td> </tr>
              <tr> <td>Mun</td> <td>28</td> <td>23</td> </tr>
              <tr> <td>Nice</td> <td>41</td> <td>17</td> </tr>
              <tr> <td>Nims</td> <td>34</td> <td>25</td> </tr>
              <tr> <td>Nio</td> <td>29</td> <td>21</td> </tr>
              <tr> <td>Orleans</td> <td>32</td> <td>17</td> </tr>
              <tr> <td>Pad</td> <td>36</td> <td>20</td> </tr>
              <tr> <td>Paris</td> <td>24</td> <td>29</td> </tr>
              <tr> <td>Perk</td> <td>38</td> <td>29</td> </tr>
              <tr> <td>Poit</td> <td>27</td> <td>24</td> </tr>
              <tr> <td>Prag</td> <td>26</td> <td>16</td> </tr>
              <tr> <td></td> <td>23</td> <td>14</td> </tr>
              <tr> <td>Ren</td> <td>30</td> <td>18</td> </tr>
              <tr> <td>Rot</td> <td>36</td> <td>27</td> </tr>
              <tr> <td>Rou</td> <td>45</td> <td>22</td> </tr>
              <tr> <td>Saint</td> <td>33</td> <td>20</td> </tr>
              <tr> <td>Salon</td> <td>33</td> <td>18</td> </tr>
              <tr> <td>Sev</td> <td>63</td> <td>29</td> </tr>
              <tr> <td>Sop</td> <td>19</td> <td>8</td> </tr>
              <tr> <td>Stra</td> <td>28</td> <td>26</td> </tr>
              <tr> <td>Stut</td> <td>26</td> <td></td> </tr>
              <tr> <td>logne</td> <td>22</td> <td>11</td> </tr>
              <tr> <td>lon</td> <td>31</td> <td>22</td> </tr>
              <tr> <td>use</td> <td>28</td> <td>17</td> </tr>
              <tr> <td>Ts</td> <td>29</td> <td>22</td> </tr>
              <tr> <td>Val</td> <td>36</td> <td>23</td> </tr>
              <tr> <td>Zur</td> <td>29</td> <td>22</td> </tr>
            </tbody>
          </table>
        </td>
        <td>
          <table>
            <tr> <td><span><strong>Legend</strong></span></td> </tr>
            <tr>
              <td>
                <table>
                  <thead>
                    <tr> <th>head1</th> <th>head2</th> </tr>
                  </thead>
                  <tbody>
                    <tr> <td>bon</td> <td></td> <td>0 / 25</td> </tr>
                    <tr> <td>Ton</td> <td></td> <td>25 / 50</td> </tr>
                    <tr> <td>Don</td> <td></td> <td>50 / 75</td> </tr>
                    <tr> <td>Con</td> <td></td> <td>75 / 100</td> </tr>
                    <tr> <td>Trs</td> <td></td> <td> 100</td> </tr>
                  </tbody>
                </table>
              </td>
            </tr>
          </table>
        </td>
      </tr>
      <tr> <td colspan="2"></td> </tr>
      <tr> <td colspan="2">This is a sample content</td> </tr>
      <tr> <td colspan="2"></td> </tr>
      <tr> <td colspan="2">Site : <a href="http://www.yahoo.com" target="_blank">www.yahoo.com</a></td> </tr>
    </table>

    The output should be as follows: if a table consists of 3 levels, level 2 should be placed above level 1, and level 3 should be placed below level 2.

    The output will be:

    level2

    level3

    level1

    If there is a table deeper than the 3rd level, it should be output below the 3rd level, and so on down to the deepest level.

    Example

    level2

    level3

    level4

    level5

    level6

    level7

    level8

    level1

    Hope I explained it well.
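The ordering rule above can be pinned down as a small sketch over table depths alone. This is one reading of the spec, not code from the thread: it assumes the depth of every table is already known and listed in start-tag (document) order, and `output_order` is a hypothetical name. For each level 1 table, its nested tables keep their document order and the level 1 table goes last, which reproduces both examples.

```perl
use strict;
use warnings;

# Hypothetical helper: given table depths in start-tag order, return the
# order in which the tables should be output. Each level 1 table starts a
# new group; within a group the nested tables come first (in document
# order) and the level 1 table comes last.
sub output_order {
    my @depths = @_;
    my (@order, @group);
    for my $d (@depths) {
        if ($d == 1) {
            # flush the previous group: nested tables, then its level 1 table
            push @order, @group[1 .. $#group], $group[0] if @group;
            @group = ();
        }
        push @group, "level$d";
    }
    push @order, @group[1 .. $#group], $group[0] if @group;
    return @order;
}

print join(' ', output_order(1 .. 8)), "\n";
# prints: level2 level3 level4 level5 level6 level7 level8 level1
```

For the sample data the depths in start-tag order are 1, 2, 2, 3 (outer table, city table, legend table, legend's inner table), giving `level2 level2 level3 level1`.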

      Seriously, this is better achieved using one of the HTML::Parser modules. For example, take a look at HTML::TokeParser:
      use strict;
      use warnings;
      use HTML::TokeParser;

      my $p = HTML::TokeParser->new("file.html")   # your source file
          || die "Can't open: $!";
      my $depth = 0;
      while (my $token = $p->get_token) {
          if (lc(${$token}[1]) eq "table") {
              $depth++ if (${$token}[0] eq "S");
              $depth-- if (${$token}[0] eq "E");
              print "$depth\n";
          }
      }
      Try out the code above and see where it takes you.
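Taking the token approach one step further, here is a sketch of the reordering itself. It is not from the thread and assumes one reading of the spec: for each outermost table, every nested table is output in start-tag order (level 2, then its level 3 and deeper tables), each with its own nested tables cut out, followed by the level 1 table. `flatten_tables` and `token_text` are hypothetical names; the token layout they expect is the one HTML::TokeParser's `get_token` returns. The module is only loaded via `require` when a file name is given, so the function can be tried on hand-built tokens without it.

```perl
use strict;
use warnings;

# Raw source text of a token in HTML::TokeParser's layout:
#   ["S", $tag, $attr, $attrseq, $text]   ["E", $tag, $text]
#   ["T", $text, $is_data]  ["C", $text]  ["D", $text]  ["PI", $token0, $text]
sub token_text {
    my ($t) = @_;
    return $t->[0] eq 'S' || $t->[0] eq 'E' || $t->[0] eq 'PI' ? $t->[-1] : $t->[1];
}

# Hypothetical helper: for each outermost (level 1) table, emit its nested
# tables in start-tag order, each with its own nested tables cut out,
# followed by the level 1 table itself.
sub flatten_tables {
    my @tokens = @_;
    my $out = '';
    my @stack;    # text buffer for each currently open <table>
    my @slot;     # position in @done reserved for each open <table>
    my @done;     # finished tables of the current level 1 group, in start order
    for my $t (@tokens) {
        my $is_table = ($t->[0] eq 'S' || $t->[0] eq 'E') && lc($t->[1]) eq 'table';
        if ($is_table && $t->[0] eq 'S') {
            push @done,  undef;            # reserve a place by start-tag order
            push @slot,  $#done;
            push @stack, token_text($t);
        }
        elsif ($is_table && @stack) {      # closes the innermost open table
            $done[ pop @slot ] = pop(@stack) . token_text($t);
            if (!@stack) {                 # level 1 closed: nested first, then level 1
                my ($level1, @nested) = @done;
                $out .= join("\n", @nested, $level1) . "\n";
                @done = ();
            }
        }
        elsif (@stack) { $stack[-1] .= token_text($t) }   # inside a table
        else           { $out       .= token_text($t) }   # outside all tables
    }
    return $out;
}

# Hypothetical command-line use (HTML::TokeParser comes from CPAN):
#   perl flatten_tables.pl file.html
if (@ARGV) {
    require HTML::TokeParser;
    my $p = HTML::TokeParser->new($ARGV[0]) or die "Can't open $ARGV[0]: $!";
    my @tokens;
    while (my $token = $p->get_token) { push @tokens, $token }
    print flatten_tables(@tokens);
}
```

A cell that held a nested table is left as an empty `<td></td>` in its parent, and tables still open at the end of the input are dropped. The `thead`/`th` and link rewriting from the original code would be separate passes over the same tokens.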
