saro has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am a novice at Perl and still learning. I am trying to update the contents of an HTML table saved in a text file. I need to search for a specific string in the HTML table and get the next 12 strings that follow it. Once retrieved, I need to replace those 12 strings with new values and write them back to the HTML table. Below are the table and the steps I tried:
<table class=\"relative-table wrapped\" style=\"width: 100.0%;\"><colgroup><col style=\"\" /><col style=\"width: 5.3799%;\" /><col style=\"width: 3.34865%;\" /><col style=\"width: 5.17157%;\" /><col style=\"width: 5.22365%;\" /><col style=\"width: 9.49449%;\" /><col style=\"width: 20.0674%;\" /><col style=\"\" /><col style=\"width: 5.90074%;\" /><col style=\"width: 3.45282%;\" /><col style=\"width: 15.1716%;\" /><col style=\"width: 5.08824%;\" /></colgroup><tbody><tr><td>Envvironment</td><td colspan=\"1\">Last Updated</td><td colspan=\"1\">RPD</td><td colspan=\"1\">Usage</td><td>Status</td><td>ADW</td><td>Schema</td><td colspan=\"1\">Wallet</td><td colspan=\"1\">FA Source</td><td>OAC</td><td>OA4F-GA-BITBUCKET-REPO</td><td>Comments</td></tr><tr><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\"><a href=\"https://confluence.domaincorp.com/confluence/display/FAW/den00pyy\" title=\"\" rel=\"nofollow\">bus00eqa</a></th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\">25-Apr-2020</th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\">5.6</th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\"><pre title=\"\"><span style=\"color: rgb(29,28,29);\"><span style=\"color: rgb(23,43,77);text-decoration: none;\" title=\"\"><span style=\"color: rgb(23,43,77);text-decoration: none;\" title=\"\"><span style=\"color: rgb(29,28,29);\">CRM<br /></span></span></span></span></pre></th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\">Available</th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\"><span title=\"\">oax487389368_low</span><br title=\"\" /><strong title=\"\"> </strong></th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\"><span style=\"color: rgb(23,43,77);text-decoration: none;\" title=\"\"><span style=\"color: rgb(23,43,77);text-decoration: none;\" title=\"\"><span style=\"color: rgb(29,28,29);\"><span style=\"color: rgb(0,51,102);\" title=\"\"><span style=\"color: rgb(29,28,29);\"><span style=\"color: rgb(23,43,77);text-decoration: none;\" title=\"\"><span style=\"color: rgb(29,28,29);\"><span><span style=\"color: rgb(23,43,77);text-decoration: none;\" title=\"\"><span style=\"color: rgb(29,28,29);\"><span>OAX$OAC/U20O2LRDjg9MN9QSySJov92LF7CrZ3<br /><br /></span></span></span></span></span></span></span></span><br /></span></span></span></th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\"><a class=\"external-link\" href=\"http://slc08ulx.us.domain.com:8000/integ1/test3dp3jan31externalWallet.zip\" title=\"\" rel=\"nofollow\">http://slc08ulx.us.domain.com:8000/integ1/test3dp3jan31externalWallet.zip</a></th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\">FA5.0-20.01</th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\">5.0</th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\"><a title=\"\" rel=\"nofollow\" class=\"external-link\" href=\"http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/25/\">http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/25/</a></th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\">NONE</th></tr><tr><th class=\"highlight-green\" data-highlight-colour=\"green\"><a rel=\"nofollow\" href=\"https://confluence.domaincorp.com/confluence/display/FAW/bus00eqz\" title=\"\">bus00eqz</a></th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\">29-Feb-2020</th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\">3.1</th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\"><p title=\"\">HCM</p></th><th class=\"highlight-green\" data-highlight-colour=\"green\">Available</th><th class=\"highlight-green\" data-highlight-colour=\"green\"><span title=\"\">dbdoublecore</span></th><th class=\"highlight-green\" data-highlight-colour=\"green\"><span style=\"color: rgb(23,43,77);text-decoration: none;\" title=\"\"><span style=\"color: rgb(29,28,29);\"><span><span style=\"color: rgb(23,43,77);text-decoration: none;\" title=\"\"><span style=\"color: rgb(29,28,29);\"><span>C5.632920_OAX$DW/Welcome12345<br /></span></span></span></span></span></span></th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\"><a title=\"\" class=\"external-link\" href=\"http://slc08ulx.us.domain.com:8000/wallets/fawdev/Wallet_oax487389368_low1.zip\" rel=\"nofollow\">http://slc08ulx.us.domain.com:8000/wallets/fawdev/Wallet_oax487389368_low1.zip</a></th><th class=\"highlight-green\" colspan=\"1\" data-highlight-colour=\"green\">FA5.0-20.01</th><th class=\"highlight-green\" data-highlight-colour=\"green\">5.0</th><th class=\"highlight-green\" data-highlight-colour=\"green\"><a class=\"external-link\" rel=\"nofollow\" href=\"http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/10/\">http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/10/</a></th><th class=\"highlight-green\" data-highlight-colour=\"green\">NONE</th></tr><tr><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><p title=\"\"><strong><a rel=\"nofollow\" href=\"https://confluence.domaincorp.com/confluence/display/FAW/bus00eqz\" title=\"\">den00pyy</a></strong></p><p title=\"\"><br /></p></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong>25-Apr-2020</strong></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong>5.6</strong></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong>FPHDEV</strong></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong>Available</strong></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong>oax487389368_low1</strong></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong>test$username/Welcome12345</strong></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong>http://slc08ulx.us.domain.com:8000/integ1/test3dp3jan31externalWallet.zip</strong></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><p title=\"\"><strong>FA5.6-20.01</strong></p></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong>5.0</strong></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong><a rel=\"nofollow\" href=\"http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/10/\" title=\"\">http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/25/</a></strong></td><td class=\"highlight-green\" title=\"Background colour : Green\" colspan=\"1\" data-highlight-colour=\"green\"><strong>NONE</strong></td></tr></tbody></table>

Here is the Perl code I tried, to remove the HTML tags and print only the plain text:

$htmlCode = `cat update_table.txt`;
$htmlCode =~ s|<.+?>| |g;
$htmlCode =~ s/^\s+|\s+$//g;
print "$htmlCode";

This is the output I get now, which is not in a proper format:

Envvironment Last Updated RPD Usage Status ADW Schema Wallet FA Source OAC OA4F-GA-BITBUCKET-REPO Comments bus00eqa 25-Apr-2020 5.6 CRM Available oax487389368_low OAX$OAC/U20O2LRDjg9MN9QSySJov92LF7CrZ3 http://slc08ulx.us.domain.com:8000/integ1/test3dp3jan31externalWallet.zip FA5.0-20.01 5.0 http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/25/ NONE bus00eqz 29-Feb-2020 3.1 HCM Available dbdoublecore C5.632920_OAX$DW/Welcome12345 http://slc08ulx.us.domain.com:8000/wallets/fawdev/Wallet_oax487389368_low1.zip FA5.0-20.01 5.0 http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/10/ NONE den00pyy 25-Apr-2020 5.6 FPHDEV Available oax487389368_low1 test$username/Welcome12345 http://slc08ulx.us.domain.com:8000/integ1/test3dp3jan31externalWallet.zip FA5.6-20.01 5.0

Now I want to search for a string (e.g. bus00eqa) in the output and print the next 12 strings that follow it. Based on that output, I need to replace those strings with new values and apply the same values to the corresponding rows of the HTML table above. How can I achieve this?

Replies are listed 'Best First'.
Re: Update html table rows with new values at runtime
by choroba (Cardinal) on Apr 23, 2020 at 17:58 UTC
    I'd probably use HTML::TableExtract or a real parser (XML::LibXML can parse HTML, too). But for that, I'd need the real HTML table, the one you posted is kind of quoted (note all the backslashes before double quotes). How was the table obtained? Can't you obtain the real HTML? As a poor man's solution, you can just replace each \" with a plain ".
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use HTML::TableExtract;

    my $te = 'HTML::TableExtract'->new;
    $te->parse(do { local $/; <> =~ s/\\"/"/gr });
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            chomp @$row;
            say "@$row[0, 1]";
        }
    }

    Output:

    Envvironment Last Updated
    bus00eqa 25-Apr-2020
    bus00eqz 29-Feb-2020
    den00pyy 25-Apr-2020

    Or, similarly:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use XML::LibXML;

    my $dom = 'XML::LibXML'->load_html(
        string => do { local $/; <> =~ s/\\"/"/gr });
    for my $row ($dom->findnodes('//table/tbody/tr')) {
        my @cells = $row->findnodes('td | th');
        say join ' ', map $_->textContent, @cells[0, 1];
    }


      I ran a REST API call to get the HTML table data in JSON format. With that response, I am trying to manipulate the HTML table row values.

      curl -H "Authorization: Basic XXX"  "https://confluence.domainname.com/confluence/rest/api/content/1873691329?expand=body.storage" | python -mjson.tool
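The backslash-escaped quotes in the posted table are what you would expect if the HTML was copied out of a JSON string, so decoding the JSON should give clean HTML without any search-and-replace. A minimal sketch using the core JSON::PP module follows. The response here is a hypothetical, heavily trimmed sample, and the `body` / `storage` / `value` key path is an assumption based on the `expand=body.storage` parameter in the curl call; check it against the real response.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical, trimmed-down response; the real one would come from the curl call.
# The key path body -> storage -> value is an assumption, not taken from the thread.
my $response = <<'JSON';
{"id":"1873691329","body":{"storage":{"value":"<table class=\"relative-table wrapped\"><tbody><tr><th>bus00eqa</th><th>25-Apr-2020</th></tr></tbody></table>","representation":"storage"}}}
JSON

# Decoding the JSON removes the string escaping, so \" becomes a plain "
my $data = decode_json($response);
my $html = $data->{body}{storage}{value};

print "$html\n";
```

The resulting `$html` can then be handed to whichever HTML parser you prefer.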

        "I ran rest api to get that html table data in the json format."

        You didn't show this; you just provided some messy HTML. The code below prints out the data matching your criteria:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Mojo::DOM;
        use feature 'say';

        # Slurp all of DATA and turn the escaped \" back into plain quotes
        my $html = do { local $/; <DATA> };
        $html =~ s/\\"/"/g;
        my $dom = Mojo::DOM->new( $html );

        # Find the link whose text is the wanted name, then climb a -> th -> tr
        my $link = $dom->find('tr > th > a')->first( sub { $_->text eq 'bus00eqa' } )
            or die "bus00eqa not found\n";
        my $row = $link->parent->parent;

        foreach my $e ( $row->find('th')->each ){
            say $e->all_text;
        }

        __DATA__
        (the escaped HTML table exactly as posted in the original question)

        As far as manipulating the DOM goes see Re: Batch remove URLs or Super Search for more examples. If your data is coming from an online source just use Mojo::UserAgent to get it directly.

        Update: A further explanation, using Mojo::UserAgent you can grab your data, and access the resulting DOM (illustrated above by reading in the HTML you provided, stored within __DATA__) like so:

        use Mojo::UserAgent;

        my $ua  = Mojo::UserAgent->new;
        my $url = 'https://target/url/or/endpoint';

        my $dom  = $ua->get( $url )->res->dom;   # if the result is HTML
        my $json = $ua->get( $url )->res->json;  # if the result is JSON

        The documentation shows examples of authentication. With the method above you can select the row you wish to edit; rather than being hard-coded, the selector could be something you prompt for. Some small additional steps are required if you want to change both the href attribute and the link text, but it's not a big deal. say $dom->content will dump the updated DOM, which you can simply save to a file or send on to another endpoint, whatever your end goal is. Like I said, Super Search will find lots of examples.
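The select-and-update steps described above could look roughly like this with Mojo::DOM. This is only a sketch: the table is a hypothetical three-column sample with already-unescaped quotes, and the example.com URLs and the new date are placeholders, not values from the thread.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical, trimmed sample; hosts and values are placeholders
my $html = '<table><tbody>'
    . '<tr><td>Environment</td><td>Last Updated</td><td>Job</td></tr>'
    . '<tr><th><a href="https://example.com/bus00eqa">bus00eqa</a></th>'
    . '<th>25-Apr-2020</th>'
    . '<th><a href="http://example.com/job/25/">http://example.com/job/25/</a></th></tr>'
    . '<tr><th><a href="https://example.com/bus00eqz">bus00eqz</a></th>'
    . '<th>29-Feb-2020</th>'
    . '<th><a href="http://example.com/job/10/">http://example.com/job/10/</a></th></tr>'
    . '</tbody></table>';

my $dom = Mojo::DOM->new($html);

# Select the row whose first cell contains the wanted name
my $wanted = 'bus00eqa';
my $row = $dom->find('tr')->first(sub {
    my $first = $_->children->first;
    $first && $first->all_text eq $wanted;
}) or die "No row found for $wanted\n";

my @cells = $row->children->each;

# Plain cell: replace its content
$cells[1]->content('30-Apr-2020');

# Link cell: change both the href attribute and the link text
my $new_url = 'http://example.com/job/26/';
my $link    = $cells[2]->at('a');
$link->attr(href => $new_url);
$link->content($new_url);

# The DOM object stringifies to the updated HTML
say $dom;
```

Addressing cells by position keeps the sketch short; with the real 12-column table you may prefer to look up the column index from the header row first.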

Re: Update html table rows with new values at runtime
by marto (Cardinal) on Apr 23, 2020 at 17:28 UTC

    By 'the next occurring 12 strings' do you mean:

    25-Apr-2020 5.6 CRM Available oax487389368_low OAX$OAC/U20O2LRDjg9MN9QSySJov92LF7CrZ3 http://slc08ulx.us.domain.com:8000/integ1/test3dp3jan31externalWallet.zip FA5.0-20.01 5.0 http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/25/ NONE bus00eqz

      I mean including the bus00eqa string itself. The expected output should be as follows:

      bus00eqa 25-Apr-2020 5.6 CRM Available oax487389368_low OAX$OAC/U20O2LRDjg9MN9QSySJov92LF7CrZ3 http://slc08ulx.us.domain.com:8000/integ1/test3dp3jan31externalWallet.zip FA5.0-20.01 5.0 http://bus00cyb.us.domain.com:8080/job/OA4F-3.1-BITBUCKET-REPO/25/ NONE
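Once the table has been parsed into rows (for example by one of the parsers suggested in the other replies), the "find the name and take the whole row" step no longer needs any string counting. A minimal sketch in plain Perl follows, assuming the rows are already available as an array of array references; only the first five columns are shown, and the new values are hypothetical.

```perl
use strict;
use warnings;

# Rows in the shape a table parser would hand back: header first, then data.
# Only the first five columns are shown to keep the sample short.
my @rows = (
    [ 'Envvironment', 'Last Updated', 'RPD', 'Usage', 'Status' ],
    [ 'bus00eqa', '25-Apr-2020', '5.6', 'CRM', 'Available' ],
    [ 'bus00eqz', '29-Feb-2020', '3.1', 'HCM', 'Available' ],
);

# Map each header name to its column index
my @header = @{ $rows[0] };
my %col;
@col{@header} = 0 .. $#header;

# Return the first row whose first cell equals the key, or nothing
sub find_row {
    my ($rows, $key) = @_;
    for my $row (@$rows) {
        return $row if $row->[0] eq $key;
    }
    return;
}

my $row = find_row(\@rows, 'bus00eqa') or die "bus00eqa not found\n";
print "@$row\n";    # the whole row, including the name itself

# Hypothetical new values, keyed by column name
my %update = ( 'Last Updated' => '30-Apr-2020', 'Status' => 'In use' );
$row->[ $col{$_} ] = $update{$_} for keys %update;
print "@$row\n";
```

Because `$row` is a reference into `@rows`, the update changes the table data in place, ready to be written back into the HTML.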