Selecting text between given words (multi-lines (/n) and multiple occurrences)

nylon has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,
The solution proposed in posting 7633 (How do I extract all text between two keywords like start and end?) does not work for me. I tried it and failed, probably because I'm a Perl newbie. An attempt with grep also failed (the "s" switch didn't work), and I haven't found other good examples on the net. Can somebody help me solve the problem?

My problem is the following:

a 9 MB text file
Multiple parts that must be extracted between "labels"
The parts are not nested but successive
The extracted parts must be in CSV form
It would be nice to have the label names in the CSV

e.g.

start\n
blabla\n
...
blabla\n
begin:\n
useful information1\n
useful information2\n
useful information3\n
end:\n
blabla\n
...
blabla\n
\n
start
blabla\n
...
blabla\n
begin:\n
useful information1\n
end:\n
blabla\n
etc etc

I hope you can help. Thanks in advance.
Firewall
(edited by ybiC: my best effort at intended formatting, with <code> instead of unbalanced & unopened </p>s)


Re: Selecting text between given words (multi-lines (/n) and multiple occurrences) by Paladin (Vicar) on Aug 25, 2003 at 17:29 UTC
Sounds like a good use for the `..` operator in scalar context.

```perl
open FH, "file.txt" or die "Couldn't open file.txt: $!";
while (<FH>) {
    if (/^begin:$/ .. /^end:$/) {
        print;  # or do whatever else you want here.
    }
}
```
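As a side note, the range operator prints the `begin:` and `end:` lines as well. If only the lines in between are wanted, the value that `..` returns in scalar context can be inspected: it is a sequence number starting at 1, with `E0` appended on the line that closes the range. The following is a minimal sketch of that idea, not part of the original reply; the sub name `lines_between` and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Return the lines strictly between the marker lines, dropping the markers.
# lines_between() is a hypothetical helper, not something from the thread.
sub lines_between {
    my ($lines, $begin_re, $end_re) = @_;
    my @kept;
    for (@$lines) {
        # In scalar context `..` yields a sequence number while the range is
        # active: 1 on the opening line, and "E0" is appended on the closing one.
        my $seq = /$begin_re/ .. /$end_re/;
        next unless $seq;         # outside any begin/end block
        next if $seq == 1;        # the opening marker line itself
        next if $seq =~ /E0$/;    # the closing marker line itself
        push @kept, $_;
    }
    return @kept;
}

# Illustrative sample data modelled on the question.
my @sample = map { "$_\n" } (
    'blabla', 'begin:', 'useful information1', 'useful information2',
    'end:',   'blabla', 'begin:', 'useful information3', 'end:', 'blabla',
);
print lines_between(\@sample, qr/^begin:$/, qr/^end:$/);
```

The operator keeps its state between calls to the sub, so an input that ends inside an unterminated block would leave the range open for the next call.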
Re: Re: Selecting text between given words (multi-lines (/n) and multiple occurrences) by nylon (Acolyte) on Aug 26, 2003 at 08:59 UTC
First things first: thanks, Paladin :-) How do I get the actual data between begin & end into an output file? When I try, the file stays empty.

```perl
#-----------------------------------------------------------
#!/usr/bin/perl (This code does not work !!!)
#
# between.pl <output file><input file><begin word><end word>
#
open (FH_output, ">> $ARGV[0]") or die "Couldn't open file.txt: $!";
open FH_input, "< $ARGV[1]" or die "Couldn't open file.txt: $!";
while (<FH_input>) {
    if (/^$ARGV[2]$/ .. /^$ARGV[3]$/) {
        $info = $_ ;
        print FH_output $info;
    }
}
close (FH_input) ;
close (FH_output) ;
```

Thanks, Firewall
(Perl is fun. I'm going to study it some more. But still a long way to go :-)
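The thread never pins down why the output file stayed empty. One possible cause (an assumption, not confirmed by the poster) is that the anchored patterns require the whole line to equal the word, so passing `begin` would not match a line reading `begin:`. Below is a hedged sketch of a variant that keeps the same argument order, uses lexical filehandles with three-argument `open`, and quotes the marker words with `\Q...\E` so characters such as `:` or `.` are taken literally. The sub name `copy_between` is invented for this example.

```perl
use strict;
use warnings;

# Copy every line from the begin marker through the end marker (inclusive)
# from one filehandle to another; returns the number of lines copied.
# copy_between() is a hypothetical helper name.
sub copy_between {
    my ($in, $out, $begin, $end) = @_;
    my $copied = 0;
    while (my $line = <$in>) {
        # \Q...\E quotes regex metacharacters in the marker words;
        # \s* tolerates trailing whitespace before the line end.
        if ($line =~ /^\Q$begin\E\s*$/ .. $line =~ /^\Q$end\E\s*$/) {
            print {$out} $line;
            $copied++;
        }
    }
    return $copied;
}

# Same argument order as the usage comment in the post above:
# between.pl <output file> <input file> <begin word> <end word>
if (@ARGV == 4) {
    my ($out_name, $in_name, $begin, $end) = @ARGV;
    open my $in,  '<',  $in_name  or die "Couldn't open $in_name: $!";
    open my $out, '>>', $out_name or die "Couldn't open $out_name: $!";
    my $n = copy_between($in, $out, $begin, $end);
    close $in;
    close $out or die "Couldn't close $out_name: $!";
    print "$n line(s) written to $out_name\n";
}
```

With the sample data from the question, this would be called with the full labels, for example `perl between.pl out.txt in.txt begin: end:`.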
Re: Selecting text between given words (multi-lines (/n) and multiple occurrences) by zentara (Cardinal) on Aug 26, 2003 at 15:57 UTC
Maybe you could use a flag to print in-between lines. Since it's a 9 meg file you may not want to slurp the whole file in, so detecting multi-line matches may be troublesome.

```perl
#!/usr/bin/perl
$goprint=0;
while (<>){
    if ($_ =~ /^Start(.)/){$goprint=1}
    if ($_ =~ /^END(.)/){$goprint=0;next}
    print "$_" if $goprint == 1;
}
```

or if you can slurp

```perl
#Here's code that finds everything
#between START and END in a paragraph:
undef $/;  # read in whole file, not just one line
while ( <> ) {
    while (/START(.*?)END/sgm) {  # /s makes . cross line boundaries
        print "$1\n";
    }
}
```
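The original question also asked for CSV output with the label names included, which none of the replies show. The flag approach can be extended to collect each block into one row. The sketch below assumes a layout the thread does not specify: one row per block, with the begin label in the first column and each in-between line in its own column. The helper names `csv_field` and `blocks_to_csv` are hypothetical, and the quoting is a minimal hand-rolled version rather than a CSV module.

```perl
use strict;
use warnings;

# Quote one CSV field: wrap it in double quotes if it contains a comma,
# a quote, or a line break, doubling any embedded quotes.
sub csv_field {
    my ($text) = @_;
    return $text unless $text =~ /[",\r\n]/;
    (my $quoted = $text) =~ s/"/""/g;
    return qq{"$quoted"};
}

# Read lines from a filehandle and return one CSV row per begin/end block.
# Assumed layout (not specified in the thread): label first, then each line.
sub blocks_to_csv {
    my ($fh, $begin, $end) = @_;
    my (@rows, @fields);
    my $inside = 0;    # the flag: true while between the two labels
    while (my $line = <$fh>) {
        chomp $line;
        if (!$inside && $line eq $begin) {
            $inside = 1;
            @fields = ($begin);    # label goes in the first column
        }
        elsif ($inside && $line eq $end) {
            $inside = 0;
            push @rows, join ',', map { csv_field($_) } @fields;
        }
        elsif ($inside) {
            push @fields, $line;
        }
    }
    return @rows;
}

# Illustrative in-memory input; a real run would open the 9 MB file instead.
my $sample = "blabla\nbegin:\nuseful information1\nuseful, information2\nend:\nblabla\n";
open my $fh, '<', \$sample or die "Couldn't open in-memory sample: $!";
print "$_\n" for blocks_to_csv($fh, 'begin:', 'end:');
```

Because the file is read line by line, memory use stays small regardless of file size; only the current block is held at any time.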
Re: Re: Selecting text between given words (multi-lines (/n) and multiple occurrences) by nylon (Acolyte) on Aug 29, 2003 at 08:05 UTC
Zentara, thanks for the help. I created a little script that works fine with your code, but if I enter a begin/end string with spaces (e.g. "begin log1", "end log1"), it takes the first two words (e.g. begin & log1) as the begin and end words. Using quotation marks does not work. Thx, Firewall
Re: Re: Re: Selecting text between given words (multi-lines (/n) and multiple occurrences) by nylon (Acolyte) on Sep 01, 2003 at 05:44 UTC
Solved the problem :-) I just used getopt with "". That works fine. Thx everyone, Nylon
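The exact getopt call is not shown in the thread. As a rough illustration of the idea, here is a sketch using the core Getopt::Long module, where each marker string is passed as the value of a named option and quoted on the command line so that the spaces survive. The option names `--begin` and `--end`, the sub name `parse_markers`, and the file name `app.log` are assumptions made for this example.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse the marker strings out of an argument list and return them together
# with whatever non-option arguments remain. Option names are illustrative.
sub parse_markers {
    my (@args) = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt, 'begin=s', 'end=s')
        or die "Usage: $0 --begin STRING --end STRING [file ...]\n";
    defined $opt{$_} or die "Missing --$_\n" for qw(begin end);
    return ($opt{begin}, $opt{end}, @args);
}

# Simulates: perl between.pl --begin "begin log1" --end "end log1" app.log
my ($begin, $end, @files) =
    parse_markers('--begin', 'begin log1', '--end', 'end log1', 'app.log');
print "begin=[$begin] end=[$end] files=[@files]\n";
```

In a real script the sub would be called as `parse_markers(@ARGV)`, and the two strings could then be matched against each line as in the replies above.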