Re^2: Reading chunks of data and then working on it

Thanks for all the solutions. I'm currently trying out this code. However, how do I do regex matching that spans multiple lines within the text of each chunk? We now have the text of every chunk, and I want to pull specific fields out of it. For example, consider the following data:

1 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

2 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

3 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

What I want to do is enclose each field in a tag. For example, grab the title from the first chunk and enclose it in <title></title>, then put the body into <body></body>, and so on. Notice that the body spans multiple lines; that is the part I'm stuck on. Any suggestions on how to handle this?


Replies are listed 'Best First'.

Re^3: Reading chunks of data and then working on it
by BrowserUk (Patriarch) on Feb 26, 2008 at 23:15 UTC

Something like this?

#! perl -slw
use strict;

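# Slurp the whole DATA section into one string.
my $data = do{ local $/; <DATA> };

# Each (?{ ... }) block prints the capture group closed just before it ($^N).
1 while $data =~ m[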
    DOCUMENTS \n\n  ( .*? )     \n\n(?= AUTHOR: )
        (?{ print "<title> $^N </title>"     })
    AUTHOR:         ( .*? )     \n\n(?= SUBJECT: )
        (?{ print "<author> $^N </author>"   })
    SUBJECT:        ( .*? )     \n\n(?= BODY: )
        (?{ print "<subject> $^N </subject>" })
    BODY:           ( .*? ) (?: \n\n(?=\d+ \s of ) | $ )
        (?{ print "<body> $^N </body>"       })
]xsg;

__DATA__
1 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

2 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________


3 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________
__________________

SUBJECT: ________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

Outputs:

<title> TITLE HERE </title>
<author>  __________ </author>
<subject>  ___________________________
____________________________________ </subject>
<body>  ______________________________
____________________________________
____________________________________
____________________________________ </body>
<title> TITLE HERE </title>
<author>  __________ </author>
<subject>  ___________________________
____________________________________ </subject>
<body>  ______________________________
 </body>
<title> TITLE HERE </title>
<author>  __________
__________________ </author>
<subject>  ________ </subject>
<body>  ______________________________
____________________________________
____________________________________
____________________________________ </body>
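
If you would rather collect the tagged records than print them from inside the regex, the same idea can be sketched with ordinary numbered captures. This is only a sketch under the assumption that every record follows the layout shown above; tag_documents and the sample text are made up for illustration and are not part of the code above. It also skips the spaces after each field label and drops trailing blank lines from the body, which is a small difference from the output shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap the four fields of every record in tags and
# return one tagged string per record.
sub tag_documents {
    my ($text) = @_;
    my @tagged;

    # /s lets . match newlines, so a field may span several lines.
    # /x allows the layout and comments; /g walks record by record.
    while ( $text =~ m{
            DOCUMENTS \n\n      ( .*? )  \n\n (?= AUTHOR:  )   # title
            AUTHOR:   [ ]*      ( .*? )  \n\n (?= SUBJECT: )   # author
            SUBJECT:  [ ]*      ( .*? )  \n\n (?= BODY:    )   # subject
            BODY:     [ ]*      ( .*? )                        # body
            # the body stops before the next record header or at end of input
            (?= \n{2,} \d+ \s of \s \d+ \s DOCUMENTS | \s* \z )
        }xsg )
    {
        push @tagged, join "\n",
            "<title>$1</title>",
            "<author>$2</author>",
            "<subject>$3</subject>",
            "<body>$4</body>";
    }
    return @tagged;
}

# Made-up sample in the same layout as the data above.
my $sample = <<'END';
1 of 2 DOCUMENTS

First title

AUTHOR: A. Writer

SUBJECT: line one
line two

BODY: body line one
body line two

2 of 2 DOCUMENTS

Second title

AUTHOR: B. Writer

SUBJECT: short

BODY: only line
END

print "$_\n\n" for tag_documents($sample);
```

Because the lookahead at the end does not consume anything, the next iteration of the while loop picks up at the following record header. The sketch does no escaping of characters such as < or & in the field text, so add that if the result has to be well-formed XML.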

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
