Reading chunks of data and then working on it

legend has asked for the wisdom of the Perl Monks concerning the following question:

Re: Reading chunks of data and then working on it by thundergnat (Deacon) on Feb 26, 2008 at 19:45 UTC
Modify the input record separator (`$/`) appropriately to read in a chunk at a time.

```perl
use warnings;
use strict;

$/ = "_END=======\n";

while (<DATA>) {
    s/========KEYWORD.+\n//g;
    print '-' x 80, $_, '-' x 80;
}

__DATA__
========KEYWORD 1===========
Text
text
text
========KEYWORD 1_END=======
========KEYWORD 2===========
Text
text
text
========KEYWORD 2_END=======
========KEYWORD 3===========
Text
text
text
========KEYWORD 3_END=======
```
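The same idea can be wrapped in a subroutine so that the change to `$/` does not leak into the rest of the program. This is only a sketch: `read_chunks` is a hypothetical helper name, and it reads from a string through an in-memory filehandle so that it needs no external file.

```perl
use strict;
use warnings;

# read_chunks() is a hypothetical helper, not part of the post above.
# It returns the text between each pair of KEYWORD markers.
sub read_chunks {
    my ($text) = @_;

    # Open the string itself as a read-only filehandle.
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

    # local restores the previous value of $/ when the sub returns.
    local $/ = "_END=======\n";

    my @chunks;
    while (my $chunk = <$fh>) {
        # Drop the opening and closing marker lines of this record.
        $chunk =~ s/^=+KEYWORD.*\n//mg;
        push @chunks, $chunk if length $chunk;
    }
    close $fh;
    return @chunks;
}
```

Calling `read_chunks($string)` on data shaped like the sample should give one list element per block, with the marker lines removed.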
Re^2: Reading chunks of data and then working on it by legend (Sexton) on Feb 26, 2008 at 20:18 UTC
Thanks for all the solutions. I'm currently trying out this code. However, how do I do regex matching that spans multiple lines within the text of each chunk? We now have the "Text text text" part, right? Now I want to extract some data from it. For example, consider the following data:

```
1 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

2 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

3 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________
```

What I want to do is enclose specific fields in tags. For example, grab the title from the first chunk and enclose it in <title></title>, then the body (notice how it spans multiple lines; this is the problem for me now) in <body></body>, and so on. Any suggestions on how to get over this one?
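A minimal sketch of one way to approach the question above: match all four fields with a single regex under `/s` (so `.` also matches newlines) and `/x` (so the pattern can be laid out readably). It assumes that blank lines separate the fields and that Perl 5.10 or later is available for named captures. `tag_documents` is a hypothetical helper name.

```perl
use strict;
use warnings;

# tag_documents() is a hypothetical helper. It assumes each field is
# followed by a blank line, which the sample data suggests but does not
# guarantee.
sub tag_documents {
    my ($data) = @_;
    my @out;
    while ($data =~ m{
        DOCUMENTS \n\n
        (?<title>   .*? ) \n\n
        AUTHOR:  \s* (?<author>  .*? ) \n\n
        SUBJECT: \s* (?<subject> .*? ) \n\n
        BODY:    \s* (?<body>    .*? )
        (?= \n\n \d+ \s of \s | \s* \z )
    }xsg) {
        # %+ holds the named captures of the last successful match.
        push @out, map { "<$_>" . $+{$_} . "</$_>" }
                   qw(title author subject body);
    }
    return @out;
}
```

The lazy `.*?` in each capture stops at the first blank line followed by the next field label, so a field can span several lines. The final lookahead ends the body either at the next "N of M" header or at the end of the input.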
Re^3: Reading chunks of data and then working on it by BrowserUk (Patriarch) on Feb 26, 2008 at 23:15 UTC
Something like this?

```perl
#! perl -slw
use strict;

my $data = do{ local $/; <DATA> };

1 while $data =~ m[
    DOCUMENTS \n\n
    ( .*? ) \n\n(?= AUTHOR: )
    (?{ print "<title> $^N </title>" })
    AUTHOR: ( .*? ) \n\n(?= SUBJECT: )
    (?{ print "<author> $^N </author>" })
    SUBJECT: ( .*? ) \n\n(?= BODY: )
    (?{ print "<subject> $^N </subject>" })
    BODY: ( .*? ) (?: \n\n(?=\d+ \s of ) | $ )
    (?{ print "<body> $^N </body>" })
]xsg;

__DATA__
1 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________
____________________________________
____________________________________
____________________________________

2 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________

SUBJECT: ___________________________
____________________________________

BODY: ______________________________

3 of 10 DOCUMENTS

TITLE HERE

AUTHOR: __________
__________________

SUBJECT: ________

BODY: ______________________________
____________________________________
____________________________________
____________________________________
```

Outputs:

```
<title> TITLE HERE </title>
<author> __________ </author>
<subject> ___________________________
____________________________________ </subject>
<body> ______________________________
____________________________________
____________________________________
____________________________________ </body>
<title> TITLE HERE </title>
<author> __________ </author>
<subject> ___________________________
____________________________________ </subject>
<body> ______________________________ </body>
<title> TITLE HERE </title>
<author> __________
__________________ </author>
<subject> ________ </subject>
<body> ______________________________
____________________________________
____________________________________
____________________________________ </body>
```
Re: Reading chunks of data and then working on it by Roy Johnson (Monsignor) on Feb 26, 2008 at 19:45 UTC
One way:

```perl
#!perl
use strict;
use warnings;

while (<DATA>) {
    if (my ($bbegin) = /^(=+KEYWORD \d+)/) {
        my $block_o_text = '';
        while (1) {
            $_ = <DATA>;
            last unless defined $_;    # stop at EOF if the _END marker is missing
            /^\Q$bbegin\E_END/ ? last : ($block_o_text .= $_);
        }
        print "Do something with $block_o_text\n";
    }
}

__DATA__
========KEYWORD 1===========
Text
text
text
========KEYWORD 1_END=======
========KEYWORD 2===========
Text
text
text
========KEYWORD 2_END=======
========KEYWORD 3===========
Text
text
text
========KEYWORD 3_END=======
```

Caution: Contents may have been coded under pressure.
Re: Reading chunks of data and then working on it by kyle (Abbot) on Feb 26, 2008 at 19:47 UTC
```perl
use strict;
use warnings;
use Data::Dumper;

my $keyword;
my %text_of;

while (<DATA>) {
    if ( defined $keyword && /^={8}\Q$keyword\E_END={5}/ ) {
        undef $keyword;
        next;
    }
    if ( /^={8}([^=]+)={5}/ ) {
        $keyword = $1;
        next;
    }
    $text_of{ $keyword } .= $_ if defined $keyword;
}

print Dumper \%text_of;

__DATA__
========KEYWORD 1===========
key 1 Text 1
key 1 text 2
key 1 text 3
========KEYWORD 1_END=======
========KEYWORD 2===========
key 2 Text 1
key 2 text 2
key 2 text 3
========KEYWORD 2_END=======
========KEYWORD 3===========
key 3 Text 1
key 3 text 2
key 3 text 3
========KEYWORD 3_END=======
```

Output:

```
$VAR1 = {
          'KEYWORD 1' => 'key 1 Text 1
key 1 text 2
key 1 text 3
',
          'KEYWORD 3' => 'key 3 Text 1
key 3 text 2
key 3 text 3
',
          'KEYWORD 2' => 'key 2 Text 1
key 2 text 2
key 2 text 3
'
        };
```
Re: Reading chunks of data and then working on it by igelkott (Priest) on Feb 26, 2008 at 19:50 UTC
Look into the "flip-flop" operator (the range operator `..` used in scalar context). See "Re: matching several lines" for an example.
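For readers who have not met it, here is a rough sketch of the flip-flop applied to the KEYWORD data from this thread. It is not taken from the linked node. `extract_blocks` is a hypothetical helper name, and the two marker regexes are assumptions based on the sample data.

```perl
use strict;
use warnings;

# extract_blocks() is a hypothetical helper. It returns the text found
# between each opening KEYWORD marker and its _END marker.
sub extract_blocks {
    my ($text) = @_;
    my @blocks;
    my $current = '';
    for my $line (split /^/, $text) {
        # In scalar context, .. is false until the left test matches, then
        # true up to and including the line where the right test matches.
        # Its value is a counter starting at 1, with "E0" appended on the
        # closing line.
        my $state = ($line =~ /^=+KEYWORD \d+=+$/)
                 .. ($line =~ /^=+KEYWORD \d+_END=+$/);
        next unless $state;                       # outside any block
        if    ($state =~ /E0$/) { push @blocks, $current }   # closing marker
        elsif ($state == 1)     { $current = '' }            # opening marker
        else                    { $current .= $line }        # block content
    }
    return @blocks;
}
```

One caveat: the operator keeps its state between calls, so input that ends in the middle of a block would leave the flip-flop "on" for the next call. For well-formed input like the sample, that does not come up.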