nylon has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,
The solution proposed in posting 7633 (How do I extract all text between two keywords like start and end?) is not working for me. I tried it and failed (probably because I'm a Perl newbie). An attempt with grep also failed (the "s" switch didn't work), and I can't find other good examples on the net. Can somebody help me solve the problem?

My problem is the following:
  1. a 9 MB text file
  2. Multiple parts that must be extracted between "labels"
  3. The parts are not nested but successive
  4. The extracted parts must be in CSV form
  5. It would be nice to have the label names in the CSV
e.g.

start
 blabla
 ...
 blabla
 begin:
 usefull information1
 usefull information2
 usefull information3
 end:
 blabla
 ...
 blabla

 start blabla
 ...
 blabla
 begin:
 usefull information1
 end:
 blabla
 etc etc

I hope you can help. Thanks in advance.
Firewall

edited by ybiC: my best effort at intended formatting, with <code> instead of unbalanced & unopened </p>s


Replies are listed 'Best First'.
Re: Selecting text between given words (multi-lines (/n) and multiple occurrences)
by Paladin (Vicar) on Aug 25, 2003 at 17:29 UTC
    Sounds like a good use for the .. operator in scalar context.
    open FH, "file.txt" or die "Couldn't open file.txt: $!";
    while (<FH>) {
        if (/^begin:$/ .. /^end:$/) {
            print; # or do whatever else you want here.
        }
    }
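A small addition to the snippet above: in scalar context the range operator returns a sequence number (1 on the line that opens the range, and a value with "E0" appended on the line that closes it). That makes it possible to skip the marker lines themselves. A minimal sketch, assuming the markers sit alone on their lines exactly as "begin:" and "end:"; the helper name lines_between is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return only the lines strictly between the markers.
sub lines_between {
    my @lines = @_;
    my @kept;
    for (@lines) {
        # Sequence number of the current line inside the range,
        # or an empty string when we are outside any range.
        my $seq = /^begin:$/ .. /^end:$/;
        next unless $seq;          # outside a begin:/end: block
        next if $seq == 1;         # the "begin:" line itself
        next if $seq =~ /E0$/;     # the "end:" line itself
        push @kept, $_;
    }
    return @kept;
}

my @sample = (
    "blabla\n", "begin:\n", "info1\n", "info2\n", "end:\n",
    "blabla\n", "begin:\n", "info3\n", "end:\n",
);
print lines_between(@sample);
```

If the marker lines should be kept, the plain `if (/^begin:$/ .. /^end:$/)` test is enough.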
      First things first: Paladin thx :-)

      How do I get the actual data between begin & end into an output file? When I try, the file stays empty.
      #-----------------------------------------------------------
      #!/usr/bin/perl (This code does not work !!!)
      #
      # between.pl <output file><input file><begin word><end word>
      #
      open (FH_output, ">> $ARGV[0]") or die "Couldn't open file.txt: $!";
      open FH_input, "< $ARGV[1]" or die "Couldn't open file.txt: $!";
      while (<FH_input>) {
          if (/^$ARGV[2]$/ .. /^$ARGV[3]$/) {
              $info = $_ ;
              print FH_output $info;
          }
      }
      close (FH_input) ;
      close (FH_output) ;
      Thanks,

      Firewall (Perl is fun. I'm going to study it some more. But still a long way to go :-)
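The thread does not say why the output file stayed empty. One possible cause (an assumption, not confirmed by the poster) is that the patterns are anchored with ^ and $, so a marker line that is indented, or a begin word passed without its trailing colon, never matches. A hedged sketch that tolerates surrounding whitespace and quotes any regex metacharacters in the markers; copy_between is a hypothetical helper, and the demo runs on in-memory data instead of real files:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy every line from the begin marker through
# the end marker (inclusive) from the $in handle to the $out handle.
sub copy_between {
    my ($in, $out, $begin, $end) = @_;
    # \Q...\E quotes metacharacters in the markers; \s* allows
    # indentation and trailing blanks (an assumption about the data).
    my $begin_re = qr/^\s*\Q$begin\E\s*$/;
    my $end_re   = qr/^\s*\Q$end\E\s*$/;
    while (my $line = <$in>) {
        print {$out} $line if $line =~ $begin_re .. $line =~ $end_re;
    }
    return;
}

# Demo on in-memory data. For real files one would instead use e.g.
#   open my $in,  '<',  $ARGV[1] or die "Couldn't open $ARGV[1]: $!";
#   open my $out, '>>', $ARGV[0] or die "Couldn't open $ARGV[0]: $!";
my $data   = "blabla\n begin:\n useful information1\n end:\nblabla\n";
my $result = '';
open my $in,  '<', \$data   or die "Couldn't open in-memory input: $!";
open my $out, '>', \$result or die "Couldn't open in-memory output: $!";
copy_between($in, $out, 'begin:', 'end:');
close $out;
close $in;
print $result;
```

Note that the markers must be passed exactly as they appear in the file, including the colon.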
Re: Selecting text between given words (multi-lines (/n) and multiple occurrences)
by zentara (Cardinal) on Aug 26, 2003 at 15:57 UTC
    Maybe you could use a flag to print in-between lines. Since it's a 9 meg file you may not want to slurp the whole file in, so detecting multi-line matches may be troublesome.
    #!/usr/bin/perl
    $goprint=0;
    while (<>){
        if ($_ =~ /^Start(.*)/){$goprint=1}
        if ($_ =~ /^END(.*)/){$goprint=0;next}
        print "$_" if $goprint == 1;
    }
    or if you can slurp
    #Here's code that finds everything
    #between START and END in a paragraph:
    undef $/; # read in whole file, not just one line
    while ( <> ) {
        while (/START(.*?)END/sgm) { # /s makes . cross line boundaries
            print "$1\n";
        }
    }
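Neither snippet above produces the CSV output the original question asked for. Here is a rough sketch built on the slurp approach, under these assumptions (none of them confirmed in the thread): each begin:/end: block becomes one CSV row, the begin marker serves as the label in the first column, and each line of the block becomes one field. The helpers csv_field and blocks_to_csv are made up for illustration; for real data the CPAN module Text::CSV is probably the more robust choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote one CSV field: wrap it in double quotes, doubling embedded quotes.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Hypothetical helper: one CSV row per block found in the slurped $text.
sub blocks_to_csv {
    my ($text, $begin, $end) = @_;
    my @rows;
    # /m lets ^ and $ match at every line; /s lets . cross newlines;
    # .*? stops at the first end marker, so successive blocks stay separate.
    while ($text =~ /^[ \t]*\Q$begin\E[ \t]*\n(.*?)^[ \t]*\Q$end\E[ \t]*$/msg) {
        my $body = $1;
        my @fields;
        for my $line (split /\n/, $body) {
            $line =~ s/^\s+|\s+$//g;    # trim indentation and trailing blanks
            push @fields, $line;
        }
        push @rows, join ',', map { csv_field($_) } $begin, @fields;
    }
    return @rows;
}

my $sample = join "\n",
    'start', ' blabla', ' begin:', ' useful information1',
    ' useful information2', ' end:', ' blabla',
    'start blabla', ' begin:', ' useful information1', ' end:', ' blabla', '';
print "$_\n" for blocks_to_csv($sample, 'begin:', 'end:');
```

For a real file, the text would come from slurping it, e.g. with `local $/;` before reading from the handle.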
      zentara, thanks for the help. I created a little script that works fine with your code, but if I enter a begin/end string with spaces (e.g. "begin log1", "end log1"), it takes the first two words (e.g. begin & log1) as the begin and end words. Using quotation marks does not work. Thx, Firewall
        Solved the problem :-) I just used getopt with "". That works fine.

        Thx everyone,

        Nylon
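The final script was not posted, so the following is only a guess at what "getopt with quotes" might look like. It uses the core module Getopt::Long; the option names --begin and --end and the helper parse_markers are assumptions made for this sketch. The shell hands a quoted string to Perl as a single argument, so a marker containing spaces arrives intact:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: pull --begin and --end out of an argument list
# and return them together with whatever arguments remain (file names).
sub parse_markers {
    my @args = @_;
    my ($begin, $end);
    GetOptionsFromArray(\@args, 'begin=s' => \$begin, 'end=s' => \$end)
        or die "Usage: $0 --begin TEXT --end TEXT [file ...]\n";
    die "Both --begin and --end are required\n"
        unless defined $begin && defined $end;
    return ($begin, $end, @args);
}

# On the command line this would correspond to something like:
#   perl between.pl --begin "begin log1" --end "end log1" input.txt
my ($begin, $end, @files) =
    parse_markers('--begin', 'begin log1', '--end', 'end log1', 'input.txt');
print "begin=[$begin] end=[$end] files=[@files]\n";
```

In a real script one would pass @ARGV to the helper (or simply call GetOptions, which works on @ARGV directly).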