legend has asked for the wisdom of the Perl Monks concerning the following question:

I have some data like this:
========KEYWORD 1===========
Text text text
========KEYWORD 1_END=======
========KEYWORD 2===========
Text text text
========KEYWORD 2_END=======
========KEYWORD 3===========
Text text text
========KEYWORD 3_END=======
I want to grab the chunks of text between KEYWORD X and KEYWORD X_END so that I can run regex matches against each whole chunk. Slurp mode has not been any use. Can someone please suggest an approach?

Replies are listed 'Best First'.
Re: Reading chunks of data and then working on it
by thundergnat (Deacon) on Feb 26, 2008 at 19:45 UTC

    Modify the input record separator appropriately to read in a chunk at a time.

    use warnings;
    use strict;

    $/ = "_END=======\n";

    while (<DATA>) {
        s/========KEYWORD.+\n//g;
        print '-' x 80, $_, '-' x 80;
    }
    __DATA__
    ========KEYWORD 1===========
    Text text text
    ========KEYWORD 1_END=======
    ========KEYWORD 2===========
    Text text text
    ========KEYWORD 2_END=======
    ========KEYWORD 3===========
    Text text text
    ========KEYWORD 3_END=======
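A minimal sketch of the same record-separator idea, wrapped in a sub so that the change to `$/` is localized and does not leak into other reads. The name `read_chunks` and the in-memory sample are assumptions made for illustration; they are not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between each KEYWORD header and
# its _END line, one list element per chunk.
sub read_chunks {
    my ($fh) = @_;
    local $/ = "_END=======\n";    # restored automatically when the sub returns
    my @chunks;
    while ( my $record = <$fh> ) {
        # Drop the header line and the _END line from the record.
        $record =~ s/^=+KEYWORD[^\n]*\n//mg;
        push @chunks, $record;
    }
    return @chunks;
}

# Sample input built in memory (assumed layout: one marker per line).
my $sample = join "\n",
    '========KEYWORD 1===========', 'Text', 'text', 'text',
    '========KEYWORD 1_END=======',
    '========KEYWORD 2===========', 'more text',
    '========KEYWORD 2_END=======', '';

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @chunks = read_chunks($fh);
close $fh;

print scalar(@chunks), " chunks\n";
print "[$_]\n" for @chunks;
```

Each element of `@chunks` is then an ordinary string, so a regex with the `/s` or `/m` modifier can be applied to the whole chunk at once.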
      Thanks for all the solutions. I'm currently trying out this code. However, how do I run a regex match that spans multiple lines within each chunk? We now have the "Text text text" part of each chunk, and I want to extract some data from it. For example, consider the following data:
      1 of 10 DOCUMENTS

      TITLE HERE

      AUTHOR: __________

      SUBJECT: ___________________________
      ____________________________________

      BODY: ______________________________
      ____________________________________
      ____________________________________
      ____________________________________

      2 of 10 DOCUMENTS

      TITLE HERE

      AUTHOR: __________

      SUBJECT: ___________________________
      ____________________________________

      BODY: ______________________________
      ____________________________________
      ____________________________________
      ____________________________________

      3 of 10 DOCUMENTS

      TITLE HERE

      AUTHOR: __________

      SUBJECT: ___________________________
      ____________________________________

      BODY: ______________________________
      ____________________________________
      ____________________________________
      ____________________________________
      What I want to do is enclose specific fields in tags. For example, grab the title from the first chunk and enclose it in <title></title>, then the body in <body></body>, and so on. Notice how the body spans multiple lines; that is my problem now. Any suggestions on how to get past this one?

        Something like this?

        #! perl -slw
        use strict;

        my $data = do{ local $/; <DATA> };

        1 while $data =~ m[
            DOCUMENTS \n\n ( .*? ) \n\n(?= AUTHOR: )
            (?{ print "<title> $^N </title>" })
            AUTHOR: ( .*? ) \n\n(?= SUBJECT: )
            (?{ print "<author> $^N </author>" })
            SUBJECT: ( .*? ) \n\n(?= BODY: )
            (?{ print "<subject> $^N </subject>" })
            BODY: ( .*? ) (?: \n\n(?=\d+ \s of ) | $ )
            (?{ print "<body> $^N </body>" })
        ]xsg;

        __DATA__
        1 of 10 DOCUMENTS

        TITLE HERE

        AUTHOR: __________

        SUBJECT: ___________________________
        ____________________________________

        BODY: ______________________________
        ____________________________________
        ____________________________________
        ____________________________________

        2 of 10 DOCUMENTS

        TITLE HERE

        AUTHOR: __________

        SUBJECT: ___________________________
        ____________________________________

        BODY: ______________________________

        3 of 10 DOCUMENTS

        TITLE HERE

        AUTHOR: __________
        __________________

        SUBJECT: ________

        BODY: ______________________________
        ____________________________________
        ____________________________________
        ____________________________________

        Outputs:

        <title> TITLE HERE </title>
        <author> __________ </author>
        <subject> ___________________________
        ____________________________________ </subject>
        <body> ______________________________
        ____________________________________
        ____________________________________
        ____________________________________ </body>
        <title> TITLE HERE </title>
        <author> __________ </author>
        <subject> ___________________________
        ____________________________________ </subject>
        <body> ______________________________ </body>
        <title> TITLE HERE </title>
        <author> __________
        __________________ </author>
        <subject> ________ </subject>
        <body> ______________________________
        ____________________________________
        ____________________________________
        ____________________________________ </body>

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
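For readers who would rather avoid embedded `(?{ ... })` code blocks, here is a hedged sketch of the same multi-line extraction using plain capture groups in a `while` loop. The sub name `parse_documents`, the hash keys, and the sample text are assumptions for illustration, and the layout is assumed to be the blank-line-separated one that the regex above expects.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one hash ref per document with the keys
# title, author, subject and body. /s lets "." cross newlines, /x allows
# the pattern to be laid out over several lines.
sub parse_documents {
    my ($data) = @_;
    my @docs;
    while (
        $data =~ m{
            DOCUMENTS \n\n
            (.*?) \n\n
            AUTHOR:  \s* (.*?) \n\n
            SUBJECT: \s* (.*?) \n\n
            BODY:    \s* (.*?)
            (?= \n\n \d+ \s of \s | \s* \z )
        }xsg
      )
    {
        push @docs, { title => $1, author => $2, subject => $3, body => $4 };
    }
    return @docs;
}

# Sample input (assumed layout: fields separated by blank lines).
my $data = join "\n",
    '1 of 2 DOCUMENTS', '', 'TITLE ONE', '', 'AUTHOR: A. Monk', '',
    'SUBJECT: first line', 'second line', '',
    'BODY: body line 1', 'body line 2', '',
    '2 of 2 DOCUMENTS', '', 'TITLE TWO', '', 'AUTHOR: B. Monk', '',
    'SUBJECT: short', '', 'BODY: only line', '';

for my $doc ( parse_documents($data) ) {
    print "<title>$doc->{title}</title>\n";
    print "<author>$doc->{author}</author>\n";
    print "<subject>$doc->{subject}</subject>\n";
    print "<body>$doc->{body}</body>\n";
}
```

Collecting the fields into a data structure first keeps the matching separate from the output, which may make it easier to change the tags later.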
Re: Reading chunks of data and then working on it
by Roy Johnson (Monsignor) on Feb 26, 2008 at 19:45 UTC
    One way:
    #!perl
    use strict;
    use warnings;

    while (<DATA>) {
        # A header line starts a block; remember its prefix to find the end.
        if (my ($bbegin) = /^(=+KEYWORD \d+)/) {
            my $block_o_text = '';
            while (1) {
                # Stop at EOF so a missing _END line cannot loop forever.
                defined($_ = <DATA>) or last;
                /^\Q$bbegin\E_END/ ? last : ($block_o_text .= $_);
            }
            print "Do something with $block_o_text\n";
        }
    }
    __DATA__
    ========KEYWORD 1===========
    Text text text
    ========KEYWORD 1_END=======
    ========KEYWORD 2===========
    Text text text
    ========KEYWORD 2_END=======
    ========KEYWORD 3===========
    Text text text
    ========KEYWORD 3_END=======

    Caution: Contents may have been coded under pressure.
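Another line-by-line variant, offered as a sketch rather than as part of the reply above, uses Perl's scalar-context range ("flip-flop") operator to track whether the current line is inside a block. The sub name `blocks_from_lines` is hypothetical, and the sketch assumes every header line has a matching _END line.

```perl
use strict;
use warnings;

# Hypothetical helper: collects the text between each KEYWORD header and
# its _END line. Assumes every header has a matching _END line.
sub blocks_from_lines {
    my @lines = @_;
    my ( @blocks, $current );
    for (@lines) {
        # In scalar context ".." is the flip-flop operator: it turns true on
        # the header line and stays true through the _END line. Its value is
        # a sequence number, with "E0" appended on the last line of the range.
        if ( my $seq = /^=+KEYWORD \d+=+$/ .. /^=+KEYWORD \d+_END=+$/ ) {
            if    ( $seq == 1 )     { $current = '' }             # header line
            elsif ( $seq =~ /E0$/ ) { push @blocks, $current }    # _END line
            else                    { $current .= $_ }            # body line
        }
    }
    return @blocks;
}

my @blocks = blocks_from_lines(
    "========KEYWORD 1===========\n",
    "Text\n", "text\n",
    "========KEYWORD 1_END=======\n",
    "stray line outside any block\n",
    "========KEYWORD 2===========\n",
    "more text\n",
    "========KEYWORD 2_END=======\n",
);
print "Do something with $_" for @blocks;
```

Unlike the nested loop, this version does not check that the number on the _END line matches the number on the header.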
Re: Reading chunks of data and then working on it
by kyle (Abbot) on Feb 26, 2008 at 19:47 UTC
    use strict;
    use warnings;
    use Data::Dumper;

    my $keyword;
    my %text_of;

    while (<DATA>) {
        if ( defined $keyword && /^={8}\Q$keyword\E_END={5}/ ) {
            undef $keyword;
            next;
        }
        if ( /^={8}([^=]+)={5}/ ) {
            $keyword = $1;
            next;
        }
        $text_of{ $keyword } .= $_ if defined $keyword;
    }

    print Dumper \%text_of;

    __DATA__
    ========KEYWORD 1===========
    key 1 Text 1
    key 1 text 2
    key 1 text 3
    ========KEYWORD 1_END=======
    ========KEYWORD 2===========
    key 2 Text 1
    key 2 text 2
    key 2 text 3
    ========KEYWORD 2_END=======
    ========KEYWORD 3===========
    key 3 Text 1
    key 3 text 2
    key 3 text 3
    ========KEYWORD 3_END=======

    Output:

    $VAR1 = {
              'KEYWORD 1' => 'key 1 Text 1
    key 1 text 2
    key 1 text 3
    ',
              'KEYWORD 3' => 'key 3 Text 1
    key 3 text 2
    key 3 text 3
    ',
              'KEYWORD 2' => 'key 2 Text 1
    key 2 text 2
    key 2 text 3
    '
            };
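As the dump shows, hash keys come back in no particular order. A small sketch of one way to walk the chunks in keyword order and run a multi-line match on each follows; the sub name `ordered_keywords` is hypothetical, and the hash literal stands in for the `%text_of` built above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the keys sorted by their trailing number,
# so that 'KEYWORD 10' sorts after 'KEYWORD 2'.
sub ordered_keywords {
    my ($text_of) = @_;
    return sort {
        my ($x) = $a =~ /(\d+)\z/;
        my ($y) = $b =~ /(\d+)\z/;
        $x <=> $y;
    } keys %$text_of;
}

# Stand-in for the %text_of hash built in the reply above.
my %text_of = (
    'KEYWORD 1'  => "key 1 Text 1\nkey 1 text 2\nkey 1 text 3\n",
    'KEYWORD 10' => "key 10 Text 1\n",
    'KEYWORD 2'  => "key 2 Text 1\nkey 2 text 2\n",
);

for my $keyword ( ordered_keywords( \%text_of ) ) {
    # /m makes ^ and $ match at every line inside the chunk;
    # /g in list context returns every captured line number.
    my @line_numbers = $text_of{$keyword} =~ /^key \d+ \w+ (\d+)$/mg;
    print "$keyword has lines: @line_numbers\n";
}
```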
Re: Reading chunks of data and then working on it
by igelkott (Priest) on Feb 26, 2008 at 19:50 UTC