Re: Regexp and reading a file n-lines at time

If I follow you correctly, you want to read in the raw text, tag the components as XML, and print it back out?

You have a fairly loose 'format', but if it looks like the example you posted, maybe there are a few workarounds:

use strict;
use warnings;

## buffer for holding each Recipe
my @buffer;

## read data
while(<DATA>){
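    ## note: \U...\E only upper-cases literal text in the pattern, it does not
    ## match upper case, so test for "no lower-case letters" instead
    if (m/^\d+\.\s*[A-Z][^a-z]*$/){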
        ## begins with a number and is in all upper case
        ## must be a title, so process what we already have
        process_buffer(\@buffer);
        ## reset buffer
        @buffer = ($_,);
    }
    else {
        push @buffer, $_;
    }
}
## don't miss the last one!
process_buffer(\@buffer);

### SUBS ###

sub process_buffer{
    ## collect buffer
    my @buffer = @{$_[0]};
    ## need at least two lines...
    return 0 unless @buffer > 1;
    ## process it
    ...
    
}
__DATA__
1.TITLE OF FIRST RECIPE

abstract

Recipe 1.
Recipe 2.
Recipe ...

Procedure...

I'll leave it up to you how to further break down the sections, but hopefully this is a start.
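As one possible starting point for that, here is a minimal sketch that joins a buffered recipe back into a single string and splits it on blank lines. It assumes the sections really are separated by blank lines, as in the example data; `split_sections` is a hypothetical helper name, not something from your code.

```perl
use strict;
use warnings;

## hypothetical helper: break one buffered recipe into its sections
sub split_sections {
    my @lines = @{ $_[0] };
    ## join the buffered lines back into one string
    my $text = join '', @lines;
    ## ASSUMPTION: sections are separated by one or more blank lines
    my @sections = split /\n\s*\n/, $text;
    ## trim leading/trailing whitespace from each section
    s/^\s+|\s+$//g for @sections;
    ## drop anything that ended up empty
    return grep { length } @sections;
}

## sample buffer, copied from the example data above
my @buffer = (
    "1.TITLE OF FIRST RECIPE\n", "\n",
    "abstract\n", "\n",
    "Recipe 1.\n", "Recipe 2.\n", "Recipe ...\n", "\n",
    "Procedure...\n",
);

my @sections = split_sections(\@buffer);
print scalar(@sections), " sections found\n";
print "[$_]\n" for @sections;
```

Something like this could be called from inside process_buffer, and each section then wrapped in whatever tag you decide it needs. If your real data does not always have the blank lines, you would need a different way to spot the section boundaries.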

Are you familiar with the various XML handling modules on CPAN? I am no expert, but XML::Twig seems popular here, and XML::Quick looks useful for you.

If you want more help on regexes, check out the perldoc tutorial: perlretut

Hope this helps, keep us posted on your progress!

Just a something something...


Re^2: Regexp and reading a file n-lines at time by epimenidecretese (Acolyte) on Feb 01, 2010 at 12:47 UTC
I'm sorry, I made a big mistake in describing the input data. I had written `Recipe 1.` instead of `Ingredient 1.`. Anyway, I've fixed it and I'm working on your tips. I think I will spend some time understanding your code. Thank you very much.
Re^3: Regexp and reading a file n-lines at time by BioLion (Curate) on Feb 01, 2010 at 13:01 UTC
I guess the main thing to understand is using the buffer to break the 'recipes' up so they can be processed individually. Super Search should find you other examples of this idiom in use. The buffer will contain the lines individually, but they can be joined up or whatever you like! There are also many Parse modules on CPAN, but I think your situation is probably specific and simple enough that you don't need to get tangled up in those! Anyway, good luck!

Just a something something...