OK, I'll try my best at explaining the "unrolling the loop" mechanism (I don't have MRE, Mastering Regular Expressions, at hand, so feel free to correct me if I am blatantly wrong!):

I use this technique in 2 cases:

When I want to match a string between two one-character delimiters, but the end delimiter can be included in the string through an escape mechanism. For example, a double-quoted string that can include the double quote if it is backslashed. In this case the "naive" m/"[^"]*"/ would not work, because it stops at the first double quote, escaped or not.
When I want to match a string between two delimiters but the end delimiter is several characters long, such as in C comments (/*comment*/). In this case m{/\*(.*?)\*/} would work but it might be slow (I think; I have not benchmarked it).
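
To make the first case concrete, here is a small sketch of how the naive pattern goes wrong. The sample string is made up for illustration:

```perl
use strict;
use warnings;

# Made-up sample: a double-quoted string containing an escaped double quote.
my $text = 'toto "a \" string" tata';

# Naive pattern: [^"]* stops at the first double quote, escaped or not,
# so the match ends right after the backslash.
my ($naive) = $text =~ m/"([^"]*)"/;

print "naive: $naive\n";   # prints: naive: a \
```

The capture is cut short at the backslash, which is exactly what the unrolled pattern avoids.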

In the first case here is how you want to match:

the start delimiter,
then an "easy" string (the normal* part): anything that does not include the escape character,
if you find the escape character then you want to skip the next character (the special part), whatever it is, and then match the following "easy" string (the second normal*), until you get to the next escape character or the end delimiter,
finally you match the end delimiter
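
The steps above boil down to the shape: open, normal*, (special normal*)*, close. Here is a minimal sketch of that shape as a reusable builder. The helper name unrolled and its four arguments are made up for illustration, and it assumes that normal and special can never start on the same character (that is what keeps the loop from backtracking):

```perl
use strict;
use warnings;

# Hypothetical helper: build "open normal* (?: special normal* )* close"
# from four sub-patterns, capturing what lies between the delimiters.
sub unrolled {
    my ($open, $normal, $special, $close) = @_;
    return qr/$open((?:$normal)*(?:(?:$special)(?:$normal)*)*)$close/s;
}

# Double-quoted string with backslash escapes:
#   normal  = anything but a backslash or a double quote
#   special = a backslash followed by any character
my $dq = unrolled( qr/"/, qr/[^\\"]/, qr/\\./s, qr/"/ );

my ($content) = 'toto "a \" string" tata' =~ $dq;
print "$content\n";    # prints: a \" string
```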

Now if you want to match a string with a multi-character end delimiter here is how to do it:

match the start delimiter,
match anything until you get to the first character of the end delimiter,
if that character is not followed by the rest of the end delimiter then you can keep on matching "regular" characters until the next time you find that character,
match the end delimiter

A potential pitfall: make sure you don't consume the character just after the first character of the end delimiter, or things like **/ (the first character of the end delimiter is there twice in a row, once as a regular character and once as the start of the end delimiter) will not be processed properly.
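
Here is a small sketch of that pitfall, comparing a pattern that consumes the character after the * with one that only peeks at it. The variable names and the sample string are made up for illustration:

```perl
use strict;
use warnings;

my $text = 'toto /*foo**/ tata';

# Consuming version: after a '*', [^/] eats the next character. In "**/"
# it eats the second '*', which was the real start of the end delimiter.
my $consuming = qr{/\*([^*]*(?:\*[^/][^*]*)*)\*/};

# Lookahead version: (?!/) only peeks, so the second '*' stays available.
my $peeking   = qr{/\*([^*]*(?:\*(?!/)[^*]*)*)\*/};

print "consuming: ", ($text =~ $consuming ? "matched /$1/" : "no match"), "\n";
print "peeking:   ", ($text =~ $peeking   ? "matched /$1/" : "no match"), "\n";
```

On this input the consuming version finds no match at all, while the lookahead version captures foo*.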

I guess a couple of examples might be appropriate.

First, matching a double-quoted string in which double quotes can be escaped using \":

#!/bin/perl -w
use strict;

while( <DATA>)
  { next if(/^\s*#/); # skip comments in DATA
    chomp;
    # split data into string to match and expected result(s)
    my( $string, @expected)= split /\s*=>\s*/;
    while( $string=~ m{"                              # the start delimiter
                        ([^\\"]*                      # anything but the end of the string or the escape char
                                (?:\\.                #     the escape char preceding an escaped char (any char)
                                      [^\\"]*         #     anything but the end of the string or the escape char
                                             )*)      #     repeat
                                                "}gx) # the end delimiter
      { my $match= $1;
        my $expected= shift @expected;
        unless( $match eq $expected)
          { print "unexpected result line $.: found /$match/, expecting /$expected/\n"; }
      }
  }

__DATA__
# string to match                 => expected results(s)
toto
"a string" tata                   => a string
toto "a string"                   => a string
toto "a string" tata              => a string
toto "a \" string" tata           => a \" string
toto "\" string" tata             => \" string
toto "a\"" tata                   => a\"
toto "\"" tata                    => \"
toto "\"\"" tata                  => \"\"
toto "string 1" "string 2" tata   => string 1 => string 2
toto "string 1                    =>
toto "string 1" "string           => string 1
toto "tata\\" tutu                => tata\\
toto "tata\\\"" tutu              => tata\\\"

And now how to match C-like comments:

#!/bin/perl -w
use strict;

while( <DATA>)
  { chomp;
    next if(/^\s*#/); # skip comments
    # split the data into the string to match and the expected result(s)
    my( $string, @expected)= split /\s*=>\s*/;
    while( $string=~ m{/\*                              # the start delimiter
                        ([^*]*                          # anything but the beginning of the end delimiter
                                (?:\*(?!/)              #   the beginning of the end delimiter, not preceding the rest of it
                                                        #      (?!/) means "not before /", without consuming the next char
                                      [^*]*             #   anything but the beginning of the end delimiter
                                             )*)        #   repeat
                                                \*/}gx) # the end delimiter
      { my $match= $1;
        my $expected= shift @expected || '';
        unless( $match eq $expected)
          { print "unexpected result line $.: found /$match/, expecting /$expected/\n"; }
      }
  }

__DATA__
# string                  => result(s)
toto                      =>
toto /*foo*/ tata         => foo
/*foo*/ tata              => foo
toto /*foo*/              => foo
toto /*foo*bar*/          => foo*bar
toto /*foo**/             => foo*
toto /**/                 =>
/***/                     => *
/*/*/                     => /
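
Since I said the non-greedy version "might be slow" without measuring it, here is a minimal sketch for checking that yourself with the core Benchmark module. The test input is made up, and the unrolled pattern is one along the lines of the example above. I make no claim about which one wins on your perl:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test input: a long comment with many stray '*' characters.
my $text = 'code /*' . ( 'some text * more text ' x 200 ) . '*/ code';

my $lazy     = qr{/\*(.*?)\*/}s;
my $unrolled = qr{/\*([^*]*(?:\*(?!/)[^*]*)*)\*/};

# Sanity check: both patterns must capture the same comment body.
my ($from_lazy)     = $text =~ $lazy;
my ($from_unrolled) = $text =~ $unrolled;
die "patterns disagree" unless $from_lazy eq $from_unrolled;

# Run each pattern for about one CPU second and print a comparison table.
cmpthese( -1, {
    lazy     => sub { my ($c) = $text =~ $lazy },
    unrolled => sub { my ($c) = $text =~ $unrolled },
} );
```

The result will depend on the input (how long the comment is, how many stray * it contains) and on the perl version, so try it with data that looks like yours.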

In reply to Re: Unrolling the loop technique by mirod
in thread Unrolling the loop technique by Anonymous Monk
