in reply to Reading chunks of data and then working on it

Set the input record separator ($/) to the end-of-chunk marker so that each read returns one whole chunk at a time.

use warnings;
use strict;

$/ = "_END=======\n";

while (<DATA>) {
    s/========KEYWORD.+\n//g;
    print '-' x 80, $_, '-' x 80;
}

__DATA__
========KEYWORD 1===========
Text text text
========KEYWORD 1_END=======
========KEYWORD 2===========
Text text text
========KEYWORD 2_END=======
========KEYWORD 3===========
Text text text
========KEYWORD 3_END=======
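If the chunk reading has to live inside a larger program, it may be safer to keep the change to $/ local to one sub. Here is a minimal sketch of that idea. The helper name read_chunks and the in-memory sample input are assumptions made for illustration, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: read every chunk from a filehandle, using $sep as the
# input record separator. 'local' restores the previous $/ on return, so
# other reads in the program keep their normal line-by-line behaviour.
sub read_chunks {
    my ($fh, $sep) = @_;
    local $/ = $sep;
    my @chunks;
    while (my $chunk = <$fh>) {
        # Drop the start and end marker lines, keeping only the text between.
        $chunk =~ s/^========KEYWORD.*\n//mg;
        push @chunks, $chunk;
    }
    return @chunks;
}

# Sample input in the same shape as the data above (contents are made up).
my $input = join '',
    "========KEYWORD 1===========\n",
    "Text text text\n",
    "========KEYWORD 1_END=======\n",
    "========KEYWORD 2===========\n",
    "More text\n",
    "========KEYWORD 2_END=======\n";

# Open the string as an in-memory file so the sketch needs no external file.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my @chunks = read_chunks($fh, "_END=======\n");
close $fh;

print "[$_]" for @chunks;
```

Each element of @chunks then holds the text of one block with its marker lines stripped, ready for further matching.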

Re^2: Reading chunks of data and then working on it
by legend (Sexton) on Feb 26, 2008 at 20:18 UTC
    Thanks for all the solutions. I'm currently trying out this code. However, how do I run a regex match that spans multiple lines within the text of each chunk? We now have the "Text text text" part of each chunk, right? Now I want to extract some data from it. For example, consider the following data:
    1 of 10 DOCUMENTS

    TITLE HERE

    AUTHOR: __________

    SUBJECT: ___________________________
    ____________________________________

    BODY: ______________________________
    ____________________________________
    ____________________________________
    ____________________________________

    2 of 10 DOCUMENTS

    TITLE HERE

    AUTHOR: __________

    SUBJECT: ___________________________
    ____________________________________

    BODY: ______________________________
    ____________________________________
    ____________________________________
    ____________________________________

    3 of 10 DOCUMENTS

    TITLE HERE

    AUTHOR: __________

    SUBJECT: ___________________________
    ____________________________________

    BODY: ______________________________
    ____________________________________
    ____________________________________
    ____________________________________
    What I want to do is wrap specific fields in tags. For example, grab the title from the first chunk and enclose it in <title></title>, then put the body into <body></body> (note that the body spans multiple lines, which is my problem at the moment), and so on. Any suggestions on how to handle this?

      Something like this?

      #! perl -slw
      use strict;

      my $data = do{ local $/; <DATA> };

      1 while $data =~ m[
          DOCUMENTS \n\n
          ( .*? ) \n\n(?= AUTHOR: )
              (?{ print "<title> $^N </title>" })
          AUTHOR: ( .*? ) \n\n(?= SUBJECT: )
              (?{ print "<author> $^N </author>" })
          SUBJECT: ( .*? ) \n\n(?= BODY: )
              (?{ print "<subject> $^N </subject>" })
          BODY: ( .*? ) (?: \n\n(?=\d+ \s of ) | $ )
              (?{ print "<body> $^N </body>" })
      ]xsg;

      __DATA__
      1 of 10 DOCUMENTS

      TITLE HERE

      AUTHOR: __________

      SUBJECT: ___________________________
      ____________________________________

      BODY: ______________________________
      ____________________________________
      ____________________________________
      ____________________________________

      2 of 10 DOCUMENTS

      TITLE HERE

      AUTHOR: __________

      SUBJECT: ___________________________
      ____________________________________

      BODY: ______________________________

      3 of 10 DOCUMENTS

      TITLE HERE

      AUTHOR: __________
      __________________

      SUBJECT: ________

      BODY: ______________________________
      ____________________________________
      ____________________________________
      ____________________________________

      Outputs:

      <title> TITLE HERE </title>
      <author> __________ </author>
      <subject> ___________________________
      ____________________________________ </subject>
      <body> ______________________________
      ____________________________________
      ____________________________________
      ____________________________________ </body>
      <title> TITLE HERE </title>
      <author> __________ </author>
      <subject> ___________________________
      ____________________________________ </subject>
      <body> ______________________________ </body>
      <title> TITLE HERE </title>
      <author> __________
      __________________ </author>
      <subject> ________ </subject>
      <body> ______________________________
      ____________________________________
      ____________________________________
      ____________________________________ </body>

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
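
A possible variation on the regex above, for anyone who would rather not print from inside the pattern: code blocks in (?{ }) run while the match is still being attempted, so plain capture groups collected after each successful match can be easier to reason about. The following is only a sketch under the same assumption as the sample data, namely that fields are separated by blank lines. The helper name parse_documents and the sample field values are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return one hash reference per document found in $data.
# Assumes each field is separated from the next by a blank line.
sub parse_documents {
    my ($data) = @_;
    my @docs;
    while ($data =~ m{
        DOCUMENTS \n\n
        (.*?)                \n\n
        AUTHOR:  [ ]* (.*?)  \n\n
        SUBJECT: [ ]* (.*?)  \n\n
        BODY:    [ ]* (.*?)
        (?= \n\n \d+ [ ] of [ ] | \s* \z )
    }xsg) {
        push @docs, { title => $1, author => $2, subject => $3, body => $4 };
    }
    return @docs;
}

# Made-up sample in the same layout as the data in this thread.
my $data = <<'END';
1 of 10 DOCUMENTS

First title

AUTHOR: A. Writer

SUBJECT: line one
line two

BODY: first body line
second body line

2 of 10 DOCUMENTS

Second title

AUTHOR: B. Writer

SUBJECT: short

BODY: only line
END

for my $doc (parse_documents($data)) {
    print "<$_>$doc->{$_}</$_>\n" for qw(title author subject body);
}
```

The /s modifier lets . match newlines, which is what allows the SUBJECT and BODY captures to span several lines, and the final lookahead stops the body at either the next "N of M" header or the end of the input.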