in reply to Get chars between 2 markers using regular expressions

use strict;
use warnings;

my $string = "He0Hello~~He2World~~";
while ($string =~ m/He\d(\w+)~~/g) {
    print "$1\n";
}

Update: actually, that's not quite correct. This will only work if your content consists only of alphanumeric or underscore characters (what \w matches). A more precise regex would be:

$string=~m/He\d([^~]+)~~/g

This captures everything up to the first tilde, so a record that has a single tilde inside its data will not match at all. If you need to capture single tildes in your data as well, you need

$string=~m/He\d(.+?)~~/g

which is more accurate but slower (because after each character the non-greedy .+? has to check whether ~~ follows before it can extend the match).
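To see how the three patterns differ, here is a small sketch that runs each one in the same while (//g) loop and collects what it captures. The sample string is made up for illustration: its second record contains a single tilde and its third record contains a space.

```perl
use strict;
use warnings;

# Made-up sample: "Wor~ld" has a single tilde, "a b" has a space.
my $string = "He0Hello~~He2Wor~ld~~He3a b~~";

# Pattern name => compiled regex, kept in a fixed order for printing.
my @patterns = (
    [ '\w+'   => qr/He\d(\w+)~~/   ],
    [ '[^~]+' => qr/He\d([^~]+)~~/ ],
    [ '.+?'   => qr/He\d(.+?)~~/   ],
);

my %captured;
for my $p (@patterns) {
    my ($name, $re) = @$p;
    # //g in a while loop walks through every match; pos() is reset
    # when the match finally fails, so the next pattern starts fresh.
    while ($string =~ /$re/g) {
        push @{ $captured{$name} }, $1;
    }
    print "$name: ", join(', ', @{ $captured{$name} || [] }), "\n";
}
```

This should print:

\w+: Hello
[^~]+: Hello, a b
.+?: Hello, Wor~ld, a b

so \w+ skips both unusual records, [^~]+ skips only the one with the tilde, and .+? captures all three.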

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan

Re^2: Get chars between 2 markers using regular expressions
by pKai (Priest) on Dec 06, 2005 at 14:00 UTC
    which is more accurate but slower (because it needs to look ahead).
    According to this benchmark, on my machine it performs like this:

                   Rate  [^~]+    .+?
    [^~]+  236537/s     --   -22%
    .+?    303038/s    28%     --

    which says that the non-greedy match-all is even faster than the inverted character class.

      You're only matching the regex once, not collecting all instances of the match. The difference gets more pronounced the longer the string becomes:

      use strict;
      use warnings;
      use Benchmark qw(:all);

      my $string = "He0Hello~~He2W~orld~~He0Hello~~He2W~orld~~He0Hello~~He2W~orld~~He0Hello~~He2W~orld~~";
      my $f;

      sub invertedCharclass {
          while ($string =~ m/He\d([^~]+)~~/g) { $f = $1 }
      }

      sub nonGreedy {
          while ($string =~ m/He\d(.+?)~~/g) { $f = $1 }
      }

      cmpthese(-10, {
          '[^~]+' => \&invertedCharclass,
          '.+?'   => \&nonGreedy,
      });
      gives this on my machine:
                     Rate    .+?  [^~]+
      .+?    105088/s     --   -28%
      [^~]+  146676/s    40%     --

Re^2: Get chars between 2 markers using regular expressions
by Anonymous Monk on Dec 06, 2005 at 15:28 UTC
    Your help has been terrific, thank you! I understand this better now. The next thing I want to do, though, kind of extends this...

    He0Title1~~Te1~~Te2~~Te3~~Te4~~He1Title1~~Te5~~Te6~~Te7~~Te8~~He1Title2~~Te9~~Te10~~Te11~~Te12~~

    Instead of getting what is between HeX and ~~, I need to get what is between Te and ~~ within each HeX section.

    So in this case, I'd have an array 1,2,3,4, an array 5,6,7,8 and an array 9,10,11,12.

    /Te(.+?)~~/ gets what is between Te and ~~ (not always digits)

    What I cannot do is get what comes after the ~~ of each HeX, up to the next HeX (if there is one). /~~(.+?)He/ works up to the next He, but at the end of the string there is no He, so it misses the last section.

    Can anyone show me how to do that? Thank you for your help. Stephen.
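A minimal sketch of one way around the end-of-string problem, not taken from the thread: replace the trailing He in /~~(.+?)He/ with a zero-width lookahead that also accepts the end of the string, (?=He\d|\z), and then run the Te regex over each captured section. The sample string assumes the ~~ after Te8 that the expected 5,6,7,8 array implies; variable names are illustrative.

```perl
use strict;
use warnings;

# Sample from the question, assuming "Te8" is followed by "~~".
my $string = "He0Title1~~Te1~~Te2~~Te3~~Te4~~"
           . "He1Title1~~Te5~~Te6~~Te7~~Te8~~"
           . "He1Title2~~Te9~~Te10~~Te11~~Te12~~";

my @sections;
# (?=He\d|\z) is a zero-width lookahead: the lazy .+? stops just before
# the next He<digit> marker OR at the end of the string, so the last
# section is no longer lost. The marker is not consumed, so the next
# //g iteration can still find it.
while ($string =~ /~~(.+?)(?=He\d|\z)/g) {
    my $section = $1;
    # Second regex from the question: everything between Te and ~~.
    my @items = $section =~ /Te(.+?)~~/g;
    push @sections, \@items;
}

print join(',', @$_), "\n" for @sections;
```

This should print 1,2,3,4 then 5,6,7,8 then 9,10,11,12, one array per line. Each match starts at the first ~~ after a HeX marker, i.e. right after the title.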

      You'll want to make a couple of passes. The first one should split on He\d. That will give you a separate string to turn into each array. For each of those strings, you can use your Te regex to extract what you're looking for.
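The two-pass approach described here might look like the following sketch. It assumes, for illustration, that each section's title runs up to the first ~~, and that the ~~ after Te8 in the sample is meant to be there (without it, Te8 would not match the Te regex).

```perl
use strict;
use warnings;

my $string = "He0Title1~~Te1~~Te2~~Te3~~Te4~~"
           . "He1Title1~~Te5~~Te6~~Te7~~Te8~~"
           . "He1Title2~~Te9~~Te10~~Te11~~Te12~~";

my @arrays;
# Pass 1: split on the He<digit> markers. The string starts with a
# marker, so split returns a leading empty field; grep drops it.
for my $chunk (grep { length } split /He\d/, $string) {
    # Each chunk looks like "Title1~~Te1~~Te2~~...": peel off the title
    # (assumed to be everything before the first ~~) from the rest.
    my ($title, $rest) = split /~~/, $chunk, 2;
    # Pass 2: the Te regex, applied to this chunk only.
    my @items = $rest =~ /Te(.+?)~~/g;
    push @arrays, \@items;
}

print join(',', @$_), "\n" for @arrays;
```

Each element of @arrays is a reference to one section's list, so this should print the same three lines as the expected 1,2,3,4 / 5,6,7,8 / 9,10,11,12.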

      Caution: Contents may have been coded under pressure.