hambo has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
(Example 1)
I have the following lines:
This is some text....<Bluebird>..
MBAAAEgAAAQAB
blaah,blaah
=+=+=+=+=+=+=+=+=+=+=+

(Example 2)
but sometimes it looks like this:
This is some text....<B
luebird>..MBAAAEgAAAQAoBA
AAQKAREDSCETRTBDFS
blaah,blaah
=+=+=+=+=+=+=+=+=+=+=+

Now, given this, I am wanting to skip over the text when I find it
(bounded by <Bluebird> and =+=+=+)
So, I have a loop as follows:
LINE: while ( <> ) {
    next LINE if (m/\.{4}<B.*ird>/s .. /(\+\=\+)$/);
    print;
}
The "next" works fine if the <Bluebird> tag is not split over a newline (as in example (1) above), but fails when the <Bluebird> tag is split over a newline (as in example (2) above).

Can anyone show me what I am doing wrong?

Thanks
H
Open Source - Where everything is QED

20040908 Edit by ysth: code in code tags

Replies are listed 'Best First'.
Re: Matching a range of lines using a regex (the .. and ... operators)
by Zaxo (Archbishop) on Sep 01, 2004 at 12:24 UTC

    The initial condition in your flipflop is never met if the tag is split between lines. You've read the '<B' but not the 'ird'.
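A minimal sketch of the point above, using the OP's start pattern unchanged. The helper name `lines_matching_start` and the two sample arrays are made up for illustration; they only show that neither half of a split tag matches on its own, while the joined text does.

```perl
use strict;
use warnings;

# The OP's start pattern, unchanged.
my $start = qr/\.{4}<B.*ird>/s;

# Hypothetical helper: count how many of the given strings match
# the start pattern when each one is tested on its own.
sub lines_matching_start {
    my @lines = @_;
    return scalar grep { $_ =~ $start } @lines;
}

# Example 1: tag intact on one line. Example 2: tag split over two lines.
my @intact = ("This is some text....<Bluebird>..\n", "MBAAAEgAAAQAB\n");
my @split  = ("This is some text....<B\n", "luebird>..MBAAAEgAAAQAoBA\n");

printf "intact: %d, split: %d, joined: %d\n",
    lines_matching_start(@intact),
    lines_matching_start(@split),
    lines_matching_start(join '', @split);
```

The /s modifier only lets `.` cross a newline that is already inside the string being tested, so it cannot help while `<>` hands over one line at a time.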

    Try setting local $/ = '=+=+=+=+=+=+=+=+=+=+=+'; and admitting optional whitespace between tag characters.

    {
        local ($/,$_) = ('=+=+=+=+=+=+=+=+=+=+=+');
        while (<>) {
            s/<\s*?B\s*?l\s*?u\s*?e\s*?b\s*?i\s*?r\s*?d\s*?>.*$//s;
            print
        }
    }
    By dealing with the whole chunk, you don't need the flipflop operator.

    After Compline,
    Zaxo

Re: Matching a range of lines using a regex (the .. and ... operators)
by Random_Walk (Prior) on Sep 01, 2004 at 12:18 UTC
    Here is a small change to concat the next line to the current if the current has <B in it (you may even want to change this to $_ .= <> if /</; just to be sure). Should solve your prob and avoids slurping in the entire file at once. You are also dropping the text on the bluebird line preceding the <bluebird> tag, not sure if this is what you want.

    Updated so it actually reads its data from its data block!

    Regards,
    R.

    #!/usr/local/bin/perl -w
    use strict;

    LINE: while ( <DATA> ) {
        $_ .= <DATA> if /<B/;
        next LINE if (m/\.{4}<B.*ird>/s .. /(\+\=\+)$/);
        print;
    }
    __END__
    (Example 1)
    This is Example 1
    This is some text....<Bluebird>..
    MBAAAEgAAAQAB
    blaah,blaah
    =+=+=+=+=+=+=+=+=+=+=+
    (Example 2)
    but sometimes it looks like this:
    This is some text....<B
    luebird>..MBAAAEgAAAQAoBA
    AAQKAREDSCETRTBDFS
    blaah,blaah
    =+=+=+=+=+=+=+=+=+=+=+

    update

    I fixed this for the case where a line ends in something beginning with <B other than a part of <Bluebird>, and added another test case to the data. Also note that the pattern match given by the OP matches <Blackbird> as well. Anyone know a more elegant regexp to do /<(B|Bl|Blu|Blue|Blueb|Bluebi|Bluebird)$/ ?
    #!/usr/local/bin/perl -w
    use strict;

    LINE: while ( <DATA> ) {
        $_ .= <DATA> while /<(B|Bl|Blu|Blue|Blueb|Bluebi|Bluebird)$/;
        next LINE if (m/\.{4}<B.*ird>/s .. /(\+\=\+)$/);
        print;
    }
    __END__
    (Example 1)
    This is Example 1
    This is some text....<Bluebird>..
    MBAAAEgAAAQAB
    blaah,blaah
    =+=+=+=+=+=+=+=+=+=+=+
    (Example 2)
    but sometimes it looks like this:
    This is some text....<B
    luebird>..MBAAAEgAAAQAoBA
    AAQKAREDSCETRTBDFS
    blaah,blaah
    =+=+=+=+=+=+=+=+=+=+=+
    (Example 3)
    This is designed to break it
    ha ha this will make trouble<B
    lackCrow>
    This is some text....<B
    luebird>..MBAAAEgAAAQAoBA
    AAQKAREDSCETRTBDFS
    blaah,blaah
    =+=+=+=+=+=+=+=+=+=+=+
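On the question of a tidier pattern than the hand-written alternation: one possible approach, offered as a sketch rather than the canonical answer, is to nest optional groups so that each character may only appear if the one before it did. The helper name `prefix_at_eol_re` is hypothetical. Unlike the alternation above, the generated pattern also accepts `<Bluebir`, which the hand-written list appears to omit.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern matching '<' plus any non-empty
# leading part of $word (up to the whole word) at the end of a line.
sub prefix_at_eol_re {
    my ($word) = @_;
    my @chars  = split //, $word;
    my $first  = quotemeta(shift @chars);
    my $nested = '';
    # Wrap from the last character inwards: (?:l(?:u(?:e...)?)?)? and so on.
    $nested = '(?:' . quotemeta($_) . $nested . ')?' for reverse @chars;
    return qr/<$first$nested$/;
}

my $re = prefix_at_eol_re('Bluebird');

# Show which sample lines end in a partial tag and would need the next line.
for my $line ("text....<B\n", "text....<Bluebir\n",
              "trouble<Black\n", "text....<Bluebird>..\n") {
    (my $shown = $line) =~ s/\n\z//;
    printf "%-24s %s\n", $shown, ($line =~ $re ? 'needs next line' : 'complete');
}
```

In the loop above it would be used as `$_ .= <DATA> while /$re/;`, assuming the same DATA block.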
Re: Matching a range of lines using a regex (the .. and ... operators)
by caedes (Pilgrim) on Sep 01, 2004 at 12:18 UTC
    The problem is that your while loop is iterating line-by-line, but you are attempting a multi-line match. Your regex never sees more than one line at a time. In order to do a multi-line match you'll have to slurp the whole file into a scalar and then use the 's' modifier for the regex.

    -caedes

Re: Matching a range of lines using a regex (the .. and ... operators)
by Roger (Parson) on Sep 01, 2004 at 12:33 UTC
    You will have to do some sort of buffering of your input data. Your program reads line-by-line and, by design, will not handle tags spanning multiple lines.

    #!/usr/bin/perl -w
    use strict;

    my $text = do { local $/; <DATA> };     # read entire file
    print $text;
    $text =~ s/<B.*?ird>[^=+]*[=+]*\B//sg;  # remove unwanted stuff
    print $text;
    __DATA__
    (Example 1)
    I have the following lines:
    This is some text....<Bluebird>..
    MBAAAEgAAAQAB
    blaah,blaah
    =+=+=+=+=+=+=+=+=+=+=+
    (Example 2)
    but sometimes it looks like this:
    This is some text....<B
    luebird>..MBAAAEgAAAQAoBA
    AAQKAREDSCETRTBDFS
    blaah,blaah
    =+=+=+=+=+=+=+=+=+=+=+

    And the output is as expected:
    (Example 1)
    I have the following lines:
    This is some text....
    (Example 2)
    but sometimes it looks like this:
    This is some text....

Re: Matching a range of lines using a regex (the .. and ... operators)
by ccn (Vicar) on Sep 01, 2004 at 12:21 UTC

    The trouble is not with the regexp but with the lines it is matched against.

    But we can fix them:

    my $prev = '';
    LINE: while ( <> ) {
        next LINE if "$prev$_" =~ m/\.{4}<B.*ird>/s .. /(\+\=\+)$/;
        print;
        $prev = $_;
    }
Re: Matching a range of lines using a regex (the .. and ... operators)
by Jaap (Curate) on Sep 01, 2004 at 12:21 UTC
    I guess you could remove ALL newlines (\r and \n) from the text first. This assumes you do not need the newlines in the text for anything else.

    If you DO need the newlines, you would want to remove only the newlines between < and Bluebird>, <B and luebird>, <Bl and uebird> etc.
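The second option could be sketched roughly as follows, assuming the text is already held in a scalar and that the tag is only ever broken by a single line ending between two of its characters. The helper name `rejoin_tag` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: rejoin $tag wherever a line ending has split it,
# leaving every other newline in $text untouched.
sub rejoin_tag {
    my ($text, $tag) = @_;
    # Allow one optional CR/LF line ending between any two tag characters.
    my $broken = join '(?:\r?\n)?', map { quotemeta($_) } split //, $tag;
    $text =~ s/$broken/$tag/g;
    return $text;
}

my $input = "This is some text....<B\nluebird>..MBAAAEgAAAQAoBA\nAAQKAREDSCETRTBDFS\n";
print rejoin_tag($input, '<Bluebird>');
```

Once the tag is whole again, the text can be split back into lines and fed to the original flip-flop loop; the cost is that the input has to be in memory rather than streamed.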