in reply to Strip specific html sequence

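Hi, you are using the match operator where you want to use a string or a compiled regular expression in a variable assignment. my $remove = m/.../; does not store a pattern in $remove: the match runs immediately against $_, and $remove gets its true/false result. Use a string or a qr// object instead: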

perl -wE ' my $remove = q{<div><div class="blue"></div></div>}; my $str = q{</div></div><div><div class="blue"></div></div>}; say $str =~ s/$remove//r;'
Or, with a compiled regex (update 2: ++Laurent_R pointed out that in my original post I had the quote operators for the string and the pattern reversed):
perl -wE ' my $remove = qr{<div><div class="blue"></div></div>}; my $str = q{</div></div><div><div class="blue"></div></div>}; say $str =~ s/$remove//r;'
See http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators.
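For a script rather than a one-liner, the same two approaches might look like the sketch below. The variable names are mine, and the \Q...\E around the plain string is my addition: it makes Perl treat the snippet as literal text, which starts to matter as soon as the string contains regex metacharacters such as . or (.

```perl
use strict;
use warnings;
use feature 'say';

my $str = q{</div></div><div><div class="blue"></div></div>};

# Approach 1: plain string; \Q...\E quotes any regex metacharacters in it
my $remove_text = q{<div><div class="blue"></div></div>};
my $cleaned_1   = $str =~ s/\Q$remove_text\E//r;

# Approach 2: compiled regex object
my $remove_re = qr{<div><div class="blue"></div></div>};
my $cleaned_2 = $str =~ s/$remove_re//r;

say $cleaned_1;    # </div></div>
say $cleaned_2;    # </div></div>
```

Because of /r, $str itself is left unchanged in both cases.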

Note that if you were using warnings, Perl would have told you about this:

perl -wE ' my $remove = m/<div><div class="blue"><\/div><\/div>/;'
Use of uninitialized value $_ in pattern match (m//) at -e line 1.
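To see what $remove actually ends up holding, here is a small sketch (the sample strings are mine). m// with no =~ binds to $_ and runs the match right away, so $remove receives a true/false value rather than a pattern.

```perl
use strict;
use warnings;

# m// without =~ matches against $_, immediately, in scalar context
$_ = q{<div><div class="blue"></div></div>};
my $remove = m/<div><div class="blue"><\/div><\/div>/;
print "after a successful match: '$remove'\n";        # '1'

$_ = 'no divs here';
my $remove_again = m/<div><div class="blue"><\/div><\/div>/;
print "after a failed match: '$remove_again'\n";      # ''
```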

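(update) As for your second question: without /r, s/// changes the string in place and returns the number of substitutions it made (that is the 1 in the first example below), so you have to either perform the substitution on a copy of the string, or use the /r (non-destructive substitution) flag as I showed above: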

$ perl -wE ' my $remove = q{foo}; my $str = q{barfoobaz}; say $str =~ s/$remove//;'
1
$ perl -wE ' my $remove = q{foo}; my $str = q{barfoobaz}; ( my $x = $str ) =~ s/$remove//; say $x;'
barbaz
$ perl -wE ' my $remove = q{foo}; my $str = q{barfoobaz}; say $str =~ s/$remove//r;'
barbaz
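The same three variants as a script, so that the return values and the effect on the original string can be compared side by side (a sketch; $target, $count and $y are names I made up):

```perl
use strict;
use warnings;
use feature 'say';

my $remove = q{foo};
my $str    = q{barfoobaz};

# 1. Plain s/// edits $target in place and returns how many substitutions it made
my $target = $str;
my $count  = ( $target =~ s/$remove// );

# 2. Copy first, then edit the copy; $str is left alone
( my $x = $str ) =~ s/$remove//;

# 3. /r returns the edited string and leaves $str alone
my $y = $str =~ s/$remove//r;

say "count=$count target=$target x=$x y=$y str=$str";
# count=1 target=barbaz x=barbaz y=barbaz str=barfoobaz
```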

Also note that you can use almost any punctuation character (or a bracketing pair such as {}) as the quote delimiter, in order to avoid "leaning toothpick syndrome" (<\/div>).
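A short sketch of the difference (the sample HTML is mine): both patterns below match the same text, but only the slash-delimited one needs every / escaped.

```perl
use strict;
use warnings;

my $html = q{<p>x</p><div><div class="blue"></div></div>};

# Slash delimiters force every / in the pattern to be escaped ...
my $toothpicks = qr/<div><div class="blue"><\/div><\/div>/;
# ... while braces let / appear as-is
my $braces = qr{<div><div class="blue"></div></div>};

my $via_slashes = $html =~ s/$toothpicks//r;
my $via_braces  = $html =~ s{$braces}{}r;

print "$via_slashes\n$via_braces\n";    # <p>x</p> twice
```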

Finally, note that there are CPAN modules for parsing and processing HTML (HTML::Parser and HTML::TreeBuilder, for example), and doing it yourself with regular expressions is generally not recommended, as you are unlikely to anticipate and handle all the edge cases.

Hope this helps!

The way forward always starts with a minimal test.

Re^2: Strip specific html sequence
by koober (Novice) on Dec 10, 2017 at 17:27 UTC

    Thank you for replying. It has moved me forward a little. Upon using:

    my $remove = q{<div><div class="blue"></div></div>};

    the variable then works in the if statement.

    my $str = qr{$line};

    or

    my $str = q{$line};

    gives

    (?^:</div></div><div><div class="blue"></div></div> )

    to the console, and

    ( $line = $str ) =~ s/$remove//;

    gives

    (?^:</div></div> )

    You are right; I did get a warning before but misunderstood it. Now, adding r to the substitution gives another warning:

     Useless use of non-destructive substitution (s///r) in void context at lr.pl line 76.

    So I'm still in void context, which is bad, right? And I now have this

    (?^: )

    to learn about. I also tried using

    while (<$HTML>)

    with

    $_

    and writing to a separate file, which is getting warmer, actually removing some of the correct things, but leaving behind

    (?^:</div></div> )

    I'm also still using print because say doesn't work for me; it asks for a package. If that little lot prompts no further clues to anyone, I shall read on; thanks for your time on this.

      my $remove = q{<div><div class="blue"></div></div>};

      Don't use quoted string constructors to make regex patterns; use qr// (update: to make honest-to-goodness regex objects) (see perlop, perlre, perlretut, and perlrequick). Using ordinary quoted string constructors sets you up for puzzling bugs later.
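One concrete example of such a bug (my own illustration, reusing the now/brown pattern from further down): a qr// object carries its modifiers with it, while a plain string has nowhere to keep them.

```perl
use strict;
use warnings;

my $line = 'how now brown cow';

# A qr// object carries its modifiers (here /x, so the spaces are ignored) ...
my $as_regex = qr{ now \s+ brown }x;
# ... but a plain string cannot: used as a pattern, its literal spaces must
# match, so it needs at least three whitespace characters between the words
my $as_string = q{ now \s+ brown };

my $regex_hit  = $line =~ $as_regex    ? 'match' : 'no match';
my $string_hit = $line =~ /$as_string/ ? 'match' : 'no match';

print "qr object: $regex_hit; plain string: $string_hit\n";
# qr object: match; plain string: no match
```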

      my $str = q{$line};

      This doesn't do what you want: q{} does not interpolate, so it just assigns the literal text $line (dollar sign included) to $str:

      c:\@Work\Perl\monks>perl -wMstrict -le "my $str = q{$line}; print qq{'$str'}; "
      '$line'
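A sketch of the difference between q{}, qq{} and a plain copy (the contents of $line here are a made-up stand-in for a line read from the file):

```perl
use strict;
use warnings;

my $line = 'text read from the file';

my $single = q{$line};     # q{} never interpolates: a dollar sign followed by 'line'
my $double = qq{$line};    # qq{} (like "...") interpolates the variable
my $plain  = $line;        # usually all you need: just copy the value

print "$single\n$double\n$plain\n";
```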

      my $str = qr{$line};

      The problem here is that you seem to be trying to make the entire line you've just read from the file into a pattern. You then remove a piece of the pattern with a substitution:

      c:\@Work\Perl\monks>perl -wMstrict -le "my $remove = qr{ now \s+ brown }xms; ;; my $line = qq{how now brown cow \n}; print qq{<$line>}; ;; my $str = qr{$line}; print $str; ;; ($line = $str) =~ s/$remove//; print qq{<$line>}; "
      <how now brown cow
      >
      (?^:how now brown cow
      )
      <(?^:how  cow
      )>
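      Do you see where the extraneous (?^: ... ) stuff comes from? A qr// object used as a string turns into its pattern wrapped in (?^: ... ), and ($line = $str) copies that string form into $line.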
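The way out is to leave each line as an ordinary string and apply the qr// object to it. Below is a sketch of the read-modify-write loop the OP mentioned trying with while (<$HTML>); the in-memory filehandles and the sample HTML are assumptions standing in for the real input and output files.

```perl
use strict;
use warnings;

my $remove = qr{<div><div class="blue"></div></div>};

# Hypothetical input; in real code this would be the HTML file
my $input = qq{<p>keep</p>\n</div></div><div><div class="blue"></div></div>\n<p>also keep</p>\n};
open my $in, '<', \$input or die "Cannot open input: $!";
my $output = '';
open my $out, '>', \$output or die "Cannot open output: $!";

while ( my $line = <$in> ) {
    # $line stays an ordinary string; only $remove is a regex object
    $line =~ s/$remove//g;
    print {$out} $line;
}
close $in;
close $out;

print $output;
```

Only the blue-div snippet is removed; everything else on each line, including the leading </div></div>, is written out unchanged.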

      Useless use of non-destructive substitution (s///r) in void context

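      You have to use the result of an s///r substitution, for example by assigning it in a statement like
          my $new_line_changed = $old_line_not_changed =~ s/$remove//gr;
      (and I would also recommend the /g "global" modifier, so that every occurrence is removed rather than only the first).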
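A quick sketch of s///r with and without /g (the sample line, with two copies of the snippet, is mine):

```perl
use strict;
use warnings;

my $remove = qr{<div><div class="blue"></div></div>};
my $old_line_not_changed =
  q{A<div><div class="blue"></div></div>B<div><div class="blue"></div></div>C};

# /r returns the modified copy; /g removes every occurrence, not just the first
my $new_line_changed = $old_line_not_changed =~ s/$remove//gr;
my $first_only       = $old_line_not_changed =~ s/$remove//r;

print "$new_line_changed\n$first_only\n";
# ABC
# AB<div><div class="blue"></div></div>C
```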

      Update: Changed variable names in last code example to (hopefully!) clarify the point being made.


      Give a man a fish:  <%-{-{-{-<

      Hi, I made an error in the second example I showed above (pointed out to me by ++Laurent_R). I'll correct it in my earlier post. I committed the copy-pasta sin :-(

      When you compile a regexp using qr{} and then print it as a string, you get the output you showed here:

      $ perl -wE 'my $x = qr{ foo }; say $x'
      (?^u: foo )
      But again, that only showed up in your program because I had the string and the pattern reversed in my example.
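A sketch showing that the (?^...) wrapper only appears when a qr// object is treated as a string; used as a pattern, it matches normally. (In a script without -E, the u flag is typically absent from the wrapper.)

```perl
use strict;
use warnings;

my $x = qr{ foo };

# Interpolated into a string, a qr// object shows its (?^...: ... ) wrapper
my $as_text = "$x";
print "$as_text\n";    # e.g. (?^: foo )

# Used as a pattern, the wrapper is harmless: it only scopes the pattern's modifiers
my $hit = ( 'a foo b' =~ $x ) ? 'yes' : 'no';
print "$hit\n";        # yes
```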

      say can be enabled with -E on the command line for one-liners, or with use feature 'say'; or use 5.010; in your program. It requires Perl 5.10 or newer.
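A minimal script that turns say on with use feature 'say'; (the captured-output part is just my way of showing that say, like print, accepts a filehandle):

```perl
use strict;
use warnings;
use feature 'say';    # or: use 5.010;  (Perl 5.10 or newer either way)

say 'Hello from say';    # same as print, plus a trailing newline

# say also takes a filehandle, just like print
my $captured = '';
open my $fh, '>', \$captured or die "Cannot open in-memory handle: $!";
say {$fh} 'captured line';
close $fh;
print $captured;         # captured line
```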

