http://qs1969.pair.com?node_id=261256


in reply to parsing question

Unfortunately your question isn't terribly clear, so I'm not entirely sure what you're looking for.
However, one suggestion: if you want to match after the last occurrence of something, it may be easier to apply a regex to a reversed string, e.g.:
    my $reverse = reverse $line;
    $reverse =~ s| \w* ; \s* > |>|x;
    $line = reverse $reverse;
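A minimal, self-contained sketch of the reversed-string trick wrapped in a sub. The sub name `strip_last_suffix` is a hypothetical name chosen for illustration, and the sample input is borrowed from the benchmark further down this thread. Because the regex is unanchored, its first match in the reversed string is the last occurrence in the original one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove the LAST "> ;word" suffix from a line.
# Reversing the string turns the last occurrence into the first one,
# which is exactly what an unanchored s/// finds.
sub strip_last_suffix {
    my ($line) = @_;
    my $reverse = reverse $line;         # scalar context: reverses the characters
    $reverse =~ s| \w* ; \s* > |>|x;     # reversed "> ;word" reads "drow; >"
    return scalar reverse $reverse;      # flip back to the original order
}

my $line = "<<HTML>;nbsp dont_strip_me</HTML>> <xyzfdgfghgf> ;strip_me";
print strip_last_suffix($line), "\n";
# prints: <<HTML>;nbsp dont_strip_me</HTML>> <xyzfdgfghgf>
```

Note that `reverse` only reverses the characters of a string in scalar context; in list context, `reverse $line` just returns the one-element list unchanged, hence the explicit `scalar` on the return.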

Re: Re: parsing question
by Chady (Priest) on May 28, 2003 at 09:30 UTC

    or maybe a greedy regex?

    $line =~ s/^(.*>) ;.*/\1;/;


    Chady | http://chady.net/

      In a word, no.

      Reversing the string is much faster.
      Have a look at these benchmarks:

      #!/usr/bin/perl -w
      use strict;
      use Benchmark;

      my $string = "<<HTML>;nbsp dont_strip_me</HTML>> <xyzfdgfghgf> ;strip_me";

      # Reverse the string so the last occurrence becomes the first match.
      sub reversed {
          my $reverse = reverse(shift);
          $reverse =~ s| \w* ; \s* > |>|x;
          return scalar reverse $reverse;
      }

      # Let the greedy .* backtrack to the last '>' before ";word".
      sub greedy {
          my $line = shift;
          $line =~ s|^ (.*>) \s* ; \w* |$1|x;
          return $line;
      }

      print "Reversed: ", reversed($string), "\n";
      print "Greedy: ", greedy($string), "\n";

      # A negative count runs each sub for at least 10 CPU seconds.
      timethese( -10, {
          reversed => sub { reversed( $string ) },
          greedy   => sub { greedy( $string ) },
      } );

      Output:

      Reversed: <<HTML>;nbsp dont_strip_me</HTML>> <xyzfdgfghgf>
      Greedy: <<HTML>;nbsp dont_strip_me</HTML>> <xyzfdgfghgf>
      Benchmark: running greedy, reversed, each for at least 10 CPU seconds...
          greedy: 10 wallclock secs ( 9.98 usr + 0.02 sys = 10.00 CPU) @ 78480.80/s (n=784808)
        reversed: 11 wallclock secs (10.46 usr + 0.00 sys = 10.46 CPU) @ 167660.04/s (n=1753724)

      As you can see, the reversed version runs at over twice the speed (about 167,660 vs. 78,481 iterations per second). On longer strings, the difference may be even greater.
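      A sketch for checking the speed claim on your own data, using `cmpthese` from the same Benchmark module, which prints a relative-rate table. It first confirms the two subs return identical results, then times them on the short string and on a longer one. The long input (the short string prefixed with 1,000 copies of some filler markup) is an assumption for illustration only, and no particular result is implied.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same two approaches as the benchmark above.
sub reversed {
    my $reverse = reverse(shift);          # scalar context: reverse the characters
    $reverse =~ s| \w* ; \s* > |>|x;       # first match == last occurrence in original
    return scalar reverse $reverse;
}

sub greedy {
    my $line = shift;
    $line =~ s|^ (.*>) \s* ; \w* |$1|x;    # greedy .* backtracks to the last usable '>'
    return $line;
}

my $short = "<<HTML>;nbsp dont_strip_me</HTML>> <xyzfdgfghgf> ;strip_me";
my $long  = ("<p>filler</p> " x 1_000) . $short;   # assumed longer input

for my $input ($short, $long) {
    # Only compare timings of subs that agree on the result.
    die "implementations disagree\n"
        unless reversed($input) eq greedy($input);

    printf "length %d:\n", length $input;
    # A negative count means "run each for at least that many CPU seconds".
    cmpthese(-1, {
        reversed => sub { reversed($input) },
        greedy   => sub { greedy($input) },
    });
}
```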

      Also, your regex is wrong. Read through perlre (specifically, the section marked 'Warning on \1 vs $1') to discover why.
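      A minimal sketch of the greedy substitution with the capture written as `$1`, assuming the goal is the output shown in the benchmark above (which drops the semicolon as well). The second substitution is an unrelated toy example ("bookkeeper") that shows where `\1` does belong: inside the pattern, as a backreference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "<<HTML>;nbsp dont_strip_me</HTML>> <xyzfdgfghgf> ;strip_me";

# The greedy .* runs to the end of the string and backtracks, so (.*>)
# captures everything up to the last '>' that is followed by optional
# whitespace, ';' and a word. In the replacement half, the capture is $1.
$line =~ s/^(.*>)\s*;\w*/$1/;
print "$line\n";    # <<HTML>;nbsp dont_strip_me</HTML>> <xyzfdgfghgf>

# \1 is a backreference inside the pattern: (\w)\1 matches a doubled
# character, and $1 in the replacement keeps a single copy of it.
(my $word = "bookkeeper") =~ s/(\w)\1/$1/g;
print "$word\n";    # bokeper
```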