
Thanks for all the solutions. I'm currently trying out this code. However, how do I do regex matching that spans multiple lines within the text of each chunk? We now have the text of each chunk, right? Now I want to pull some data out of it. For example, consider the following data:
1 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

2 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

3 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________
What I want to do is enclose each field in a tag. For example, grab the title from the first chunk and wrap it in <title></title>, then wrap the body in <body></body>, and so on. The body spans multiple lines, and that is the part I'm stuck on. Any suggestions on how to get past this?
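The multi-line part comes down to one regex detail: by default the "." metacharacter does not match a newline, so a capture such as BODY:(.*) stops at the end of the first line. The /s modifier makes "." match newlines as well, and a lazy .*? followed by \n\n then stops at the next blank line. A minimal sketch on a made-up sample string (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical sample: a BODY field spanning two lines, then a blank line
my $text = "BODY: line one\nline two\n\n2 of 10 DOCUMENTS\n";

# Without /s, "." does not match "\n", so only the first line is captured
my ($without_s) = $text =~ /BODY:\s*(.*)/;

# With /s, "." also matches "\n"; the lazy .*? stops at the first blank line
my ($with_s) = $text =~ /BODY:\s*(.*?)\n\n/s;

print "without /s: [$without_s]\n";
print "with /s:    [$with_s]\n";
```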

Re^3: Reading chunks of data and then working on it
by BrowserUk (Patriarch) on Feb 26, 2008 at 23:15 UTC

    Something like this?

    #! perl -slw
    use strict;

    my $data = do{ local $/; <DATA> };

    1 while $data =~ m[
        DOCUMENTS \n\n ( .*? ) \n\n(?= AUTHOR: )
        (?{ print "<title> $^N </title>" })
        AUTHOR: ( .*? ) \n\n(?= SUBJECT: )
        (?{ print "<author> $^N </author>" })
        SUBJECT: ( .*? ) \n\n(?= BODY: )
        (?{ print "<subject> $^N </subject>" })
        BODY: ( .*? ) (?: \n\n(?=\d+ \s of ) | $ )
        (?{ print "<body> $^N </body>" })
    ]xsg;

    __DATA__
    1 of 10 DOCUMENTS

    TITLE HERE

    AUTHOR: __________

    SUBJECT: ___________________________
    ____________________________________

    BODY: ______________________________
    ____________________________________
    ____________________________________
    ____________________________________

    2 of 10 DOCUMENTS

    TITLE HERE

    AUTHOR: __________

    SUBJECT: ___________________________
    ____________________________________

    BODY: ______________________________

    3 of 10 DOCUMENTS

    TITLE HERE

    AUTHOR: __________
    __________________

    SUBJECT: ________

    BODY: ______________________________
    ____________________________________
    ____________________________________
    ____________________________________

    Outputs:

    <title> TITLE HERE </title>
    <author>  __________ </author>
    <subject>  ___________________________
    ____________________________________ </subject>
    <body>  ______________________________
    ____________________________________
    ____________________________________
    ____________________________________ </body>
    <title> TITLE HERE </title>
    <author>  __________ </author>
    <subject>  ___________________________
    ____________________________________ </subject>
    <body>  ______________________________ </body>
    <title> TITLE HERE </title>
    <author>  __________
    __________________ </author>
    <subject>  ________ </subject>
    <body>  ______________________________
    ____________________________________
    ____________________________________
    ____________________________________ </body>
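A different approach from the (?{ }) code blocks used above, sketched under two assumptions: every record starts with a line of the form "N of M DOCUMENTS", and fields are separated by blank lines. The text is split into records first, and each record's fields are captured with one /xs match and printed only after the whole match has succeeded. The sample data and the @tagged array are hypothetical, chosen just for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input in the same layout as the thread's data
my $data = <<'END';
1 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________

2 of 10 DOCUMENTS

TITLE TWO

AUTHOR: someone

SUBJECT: short

BODY: one line
END

my @tagged;

# Split at each "N of M DOCUMENTS" header line; /m makes ^ and $ match at
# line boundaries. grep drops the empty field before the first header.
for my $record (grep { /\S/ } split /^\d+ of \d+ DOCUMENTS$/m, $data) {
    # /s lets .*? cross newlines; each field ends at the next blank line,
    # and the body runs to the end of the record
    my ($title, $author, $subject, $body) = $record =~ m{
        \A \s* (.*?) \n\n
        AUTHOR:  \s* (.*?) \n\n
        SUBJECT: \s* (.*?) \n\n
        BODY:    \s* (.*?) \s* \z
    }xs or next;

    push @tagged, "<title>$title</title>\n<author>$author</author>\n"
                . "<subject>$subject</subject>\n<body>$body</body>";
}

print "$_\n" for @tagged;
```

Because nothing is printed until a whole record has matched, backtracking inside the pattern cannot print a field twice, which is a known caveat of side effects inside (?{ }) blocks.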

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.