in reply to Regexp and reading a file n-lines at time

If I follow you correctly, you want to read in the raw text, XML-tag its components, and print it back out?

You have a fairly loose 'format', but if it looks like the example you posted, here is one way to approach it:

```perl
use strict;
use warnings;

## buffer for holding each recipe
my @buffer;

## read data
while (<DATA>) {
    ## begins with a number and a dot, then upper case only (no lower-case
    ## letters): must be a title, so process what we already have.
    ## (\U...\E inside a pattern does not test case, so use a character class.)
    if (m/^\d+\.[A-Z][^a-z]*$/) {
        process_buffer(\@buffer);
        ## reset buffer, starting with the new title line
        @buffer = ($_);
    }
    else {
        push @buffer, $_;
    }
}
## don't miss the last one!
process_buffer(\@buffer);

### SUBS ###
sub process_buffer {
    ## collect buffer
    my @buffer = @{ $_[0] };
    ## need at least two lines...
    return 0 unless @buffer > 1;
    ## process it ... (for now, just print the recipe back out)
    print "----\n", @buffer;
    return 1;
}

__DATA__
1.TITLE OF FIRST RECIPE
abstract
Recipe 1.
Recipe 2.
Recipe ...
Procedure...
```

I'll leave it up to you how to further break down the sections, but hopefully this is a start.
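As one possible starting point, here is a sketch of breaking a single recipe buffer into named parts. It assumes the layout of the example: line 1 is the title, line 2 is the abstract, item lines look like `Recipe 1.` or `Ingredient 1.` (a word, a space, a number and a dot), and everything else is procedure text. The `split_recipe` name and the hash keys are made up for illustration.

```perl
use strict;
use warnings;

## Hypothetical helper: split one recipe buffer into named parts.
## Assumes: line 1 = title, line 2 = abstract,
## lines like "Ingredient 1." = items, anything else = procedure.
sub split_recipe {
    my @lines = @{ $_[0] };
    chomp @lines;
    my %recipe = (
        title     => shift @lines,
        abstract  => shift @lines,
        items     => [],
        procedure => [],
    );
    for my $line (@lines) {
        if ($line =~ /^\w+\s+\d+\./) {
            push @{ $recipe{items} }, $line;
        }
        else {
            push @{ $recipe{procedure} }, $line;
        }
    }
    return \%recipe;
}

## example buffer, shaped like the posted sample data
my @buffer = ("1.TITLE OF FIRST RECIPE\n", "abstract\n",
              "Ingredient 1.\n", "Ingredient 2.\n", "Procedure...\n");
my $r = split_recipe(\@buffer);
print "title: $r->{title}\n";
print "items: ", scalar @{ $r->{items} }, "\n";
print "procedure: @{ $r->{procedure} }\n";
```

Returning a hash reference keeps each part addressable by name, which makes the later XML-tagging step straightforward.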

Are you familiar with the various XML-handling modules on CPAN? I am no expert, but XML::Twig seems popular here, and XML::Quick looks useful for your case.
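If you would rather not pull in a module while you experiment, a hand-rolled sketch of the tagging step might look like this. The element names (`recipe`, `title`, `abstract`, `ingredient`) are hypothetical, and it only escapes `&`, `<` and `>` in text content; for anything beyond simple output, one of the CPAN modules is the safer choice.

```perl
use strict;
use warnings;

## Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, before other entities are added
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

## Wrap escaped text in an element with the given (hypothetical) name.
sub tag {
    my ($name, $text) = @_;
    return "<$name>" . xml_escape($text) . "</$name>";
}

## Build one <recipe> element from its parts.
sub recipe_to_xml {
    my ($title, $abstract, @ingredients) = @_;
    my @out = ("<recipe>",
               "  " . tag(title    => $title),
               "  " . tag(abstract => $abstract));
    push @out, "  " . tag(ingredient => $_) for @ingredients;
    push @out, "</recipe>";
    return join("\n", @out) . "\n";
}

## made-up input, including characters that need escaping
my $xml = recipe_to_xml("1.TITLE OF FIRST RECIPE", "abstract",
                        "Ingredient 1.", "Salt & pepper <to taste>");
print $xml;
```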

If you want more help with regexes, check out the perldoc regex tutorial, perlretut.
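One regex detail worth knowing here: a title test written as `m/^\d+\.\U.+\E$/` does not check for upper case. Inside a pattern, `\U` upper-cases the *pattern text* that follows it, not the string being matched, so `\U.+\E` is simply `.+`. A character class tests the case of the subject string. The sketch below compares the two; `title_checks` is a made-up helper name.

```perl
use strict;
use warnings;

## Returns (loose, strict) match results for one line.
sub title_checks {
    my ($line) = @_;
    ## \U only upper-cases the pattern text ".+" (a no-op), so this accepts any case
    my $loose  = $line =~ /^\d+\.\U.+\E$/       ? 1 : 0;
    ## the character classes require an upper-case letter and no lower-case ones
    my $strict = $line =~ /^\d+\.[A-Z][^a-z]*$/ ? 1 : 0;
    return ($loose, $strict);
}

for my $line ("1.TITLE OF FIRST RECIPE", "2.lower case title") {
    my ($loose, $strict) = title_checks($line);
    print "$line: loose=$loose strict=$strict\n";
}
```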

Hope this helps, keep us posted on your progress!

Just a something something...

Re^2: Regexp and reading a file n-lines at time
by epimenidecretese (Acolyte) on Feb 01, 2010 at 12:47 UTC

    I'm sorry, I made a big mistake in describing the input data.

    I had written "Recipe 1." instead of "Ingredient 1.".

    Anyway, I've fixed it and I'm working on your tips. I think I will spend some time understanding your code. Thank you very much.

      I guess the main thing to understand is using the buffer to break the 'recipes' up so they can be processed individually. Super Search should find you other examples of this idiom in use. The buffer holds the lines individually, but they can be joined up or rearranged however you like!

      There are also many Parse::* modules on CPAN, but I think your situation is probably specific and simple enough that you don't need to get tangled up in those. Anyway, good luck!
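For example, joining the buffered lines back up could be done either way below: concatenating them as-is keeps the original line breaks, while chomping a copy and joining with spaces gives a single line. The sample lines are taken from the example data.

```perl
use strict;
use warnings;

my @buffer = ("1.TITLE OF FIRST RECIPE\n", "abstract\n", "Procedure...\n");

## keep the original line breaks: concatenate the lines as they are
my $as_text = join '', @buffer;

## or strip the newlines and join with single spaces
my @copy = @buffer;    # work on a copy so @buffer keeps its newlines
chomp @copy;
my $one_line = join ' ', @copy;

print $as_text;
print "$one_line\n";
```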
