PerlGrok has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Folks:
I'm in a rut over something that should be straightforward but that I can't get my head around. Since I've been working on other sections of this utility I'm developing (and some VB *cough*), I've been waiting for some divine inspiration to show me the light. Hopefully a kind Monk can remove the shadows.

Simply, I need to extract the lines between two 'delimiters', if you will, and save the extractions as separate files. The formatting of the lines needs to stay in place and such. I found the dotdot (..) and dotdotdot (...) operators in the Cookbook, and they find the right area, but I'm lost on capturing the match.

Eg. Input file is like this (the "===" line is the delimiter):

    ================================
    some buncha formatted ascii text
    some buncha formatted ascii text
    some buncha formatted ascii text
    ================================
    second buncha formatted ascii text
    second buncha formatted ascii text
    second buncha formatted ascii text
    ================================

I was doing something like this

    while (<ARGV>) {
        next unless ($_ =~ /$delimiter/);   # look for the ===
        push(@record, $_);                  # make a temp array
        &saveRecord(@record);               # call the save sub
    }

or

    while (<ARGV>) {
        next unless /$delimiter/ ... /$delimiter/;
        # while (something) { push ... }
    }

Anyways, I'm lost on how to grab up just the stuff between the delimiters, count 'em and save 'em...

Thanks in advance!

Replies are listed 'Best First'.
Re: Range Operator Mysteries
by cephas (Pilgrim) on Dec 01, 2000 at 04:47 UTC
    Well, if you know how many ='s make up the record separator, you can just set $/ appropriately (don't forget to local() it though). Then you can also use chomp to throw away the record separator....

    you end up with something like this

    local($/) = '=================';
    while (<INPUT>) {
        chomp;
        # Do something with your record here
    }
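
    A minimal, self-contained sketch of the same idea, assuming the 32-character delimiter from the question and made-up sample text read through an in-memory filehandle. The names $text and @records are only for illustration. Note that $/ is compared as a literal string rather than a regex, so it has to match the delimiter line exactly.

```perl
use strict;
use warnings;

# Hypothetical sample input; the 32-character delimiter comes from the question.
my $delimiter = '=' x 32;
my $text = join '', map { "$_\n" }
    $delimiter, 'first a', 'first b', $delimiter, 'second a', $delimiter;

# Read from an in-memory filehandle so the sketch needs no files on disk.
open my $in, '<', \$text or die "Cannot open in-memory handle: $!";

my @records;
{
    # $/ is a literal string, not a regex, so it must match exactly.
    local $/ = "$delimiter\n";
    while (my $record = <$in>) {
        chomp $record;                 # strips the trailing delimiter line
        next unless length $record;    # skip the empty record before the first delimiter
        push @records, $record;
    }
}
close $in;

printf "Found %d records\n", scalar @records;
```

    Putting the local inside a bare block keeps the changed $/ from leaking into the rest of the program.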


    cephas
Re: Range Operator Mysteries
by repson (Chaplain) on Dec 01, 2000 at 05:09 UTC
    How about this (untested):
    my @data;
    my $count = 0;
    while (<FOO>) {
        /^=+$/ and $count++, next;
        push @{$data[$count]}, $_;
    }
    Then you have a 2D array of sections and items in @data.
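
    For illustration, here is a runnable sketch of that loop, assuming a hard-coded list of sample lines in place of the FOO filehandle (the sample data and the @sections name are made up). It also shows one wrinkle: when the input opens with a delimiter, $data[0] never gets filled in.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the FOO filehandle.
my @lines = map { "$_\n" }
    '=====', 'first a', 'first b', '=====', 'second a', '=====';

my @data;
my $count = 0;
for (@lines) {
    if (/^=+$/) { $count++; next; }   # a delimiter line starts the next section
    push @{ $data[$count] }, $_;       # autovivifies the inner array on first use
}

# $data[0] stays undefined here because the input opens with a delimiter.
my @sections = grep { defined } @data;

printf "Found %d sections\n", scalar @sections;
print "Section 1:\n", @{ $sections[0] };
```

    Filtering with grep is just one way to drop the empty slot; starting $count at -1 would be another, if every input is known to begin with a delimiter.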
Re: Range Operator Mysteries
by mrmick (Curate) on Dec 01, 2000 at 08:14 UTC
    How about:
    # Include the newline in the delimiter...
    $delimiter = "================================\n";
    local($/) = $delimiter;
    my $cnt = 1;
    while (<INPUT>) {
        my $outfile = "> output_file_" . $cnt;
        open(OUT, $outfile) || die "Cannot open $outfile\n$!\n";
        print OUT;
        close(OUT);
        $cnt++;
    }
    Just a simple and dirty example of what I think you are asking for. I used an iterator for giving unique filenames. You may have other requirements. :-)

    Mick
Re: Range Operator Mysteries
by DrManhattan (Chaplain) on Dec 01, 2000 at 09:26 UTC

    This is untested but looks reasonable.

    #!/usr/bin/perl -w
    use strict;

    my $delimiter = qr/================================/;
    my $counter = 0;

    # If the first line of every input file is a delimiter, you
    # can comment this out and the first output file will be
    # opened on the first iteration of the while() loop.
    open FH, ">file$counter" or die "could not open file$counter: $!";

    # Read each file in @ARGV or stdin one line at a time
    while (<>) {
        if (/$delimiter/) {
            # If we hit a delimiter, close the currently
            # open file, and open a new one.
            $counter++;
            close FH;
            open FH, ">file$counter" or die "could not open file$counter: $!";
        } else {
            # Any line other than a delimiter goes into
            # the currently open file
            print FH;
        }
    }

    -Matt

Re: Range Operator Mysteries
by jaymoo (Novice) on Dec 01, 2000 at 10:39 UTC
    Well here's my attempt.
    local $/;
    open(INPUT, shift) or die "Unable to open input file: $!\n";
    $input = <INPUT>;
    close INPUT;
    @input = split /=+\n/, $input;
    foreach $count ( 1 .. $#input ) {
        open(OUT, ">$count.txt") or die "Unable to open output file: $!\n";
        print OUT $input[$count];
        close OUT;
    }
    $count is set to start at 1 to disregard the first array element (assuming the first line of the file is a delimiter line). It's not the most memory-efficient, but it seems to do the trick. -Moo
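
    To make the "start at 1" point concrete, here is a small sketch run on a made-up string instead of a slurped file (the sample text and @parts are hypothetical). split keeps a leading empty field when the pattern matches at the very start of the string and drops trailing empty ones, which is why element 0 is the one to skip.

```perl
use strict;
use warnings;

# Hypothetical slurped contents; the leading delimiter mirrors the question's input.
my $input = "=====\nfirst a\nfirst b\n=====\nsecond a\n=====\n";

my @parts = split /=+\n/, $input;

# Element 0 is the empty string that precedes the first delimiter.
printf "Element 0 is %s\n", length $parts[0] ? 'not empty' : 'empty';

# The real records therefore live at indices 1 .. $#parts.
for my $count ( 1 .. $#parts ) {
    print "--- record $count ---\n", $parts[$count];
}
```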
Re: Range Operator Mysteries
by jeroenes (Priest) on Dec 01, 2000 at 18:04 UTC
    This is not intended as a flame, so no offence, please. Nevertheless, I would like to remark on the resemblance to a thread from a week ago: Control Flow Puzzle. Dominus' problem demanded a more sophisticated delimiter, but the essence was not very different. Does this mean we need an extra Q/A entry?

    How about Yet Another Solution? I proposed a pop kind'o'solution, then:

    undef $/;
    $_ = <>;
    s/(\n\037.*?\n\* Menu.*?\n\037)//;
    $mymenu = $1;
    With a little molding:
    undef $/;
    $_ = <>;
    while ( s/={10,}(\n.*?\n)(={10,}\n)/$2/s ) {
        push(@myblocks, $1);
    }
    print "I counted " . scalar(@myblocks) . " blocks\n";
    while ( $myblock = shift(@myblocks) ) {
        open( OUT, ">$myname" . $i++ );
        print OUT $myblock;
        close OUT;
    }
    The use of {10,} makes the code insensitive to the exact number of ='s. If you used =+, you would also match ='s that appear in the text blocks. I leave the $2 in place to facilitate the next match. The question mark ungreedies the wildcard. I haven't run the code, but it seems quite straightforward (aka likely without flaws).

    Have fun,

    Jeroen
    I was dreaming of guitarnotes that would irritate an executive kind of guy (FZ)

    Update: chipmunk pointed to some mistakes. Have been fixed.

      A slight flaw: without the /s modifier on the substitution, this code will only match blocks that are one line long. (There's also a typo with the quoting of the string 'I counted'.)

      I like the use of {10,} to make sure the delimiter is at least a certain length.
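
      A quick sketch of the /s point, assuming a made-up two-line block between two ten-character delimiters (the variable names are only for illustration). Without /s the dot will not cross a newline, so the pattern cannot span a block of more than one line.

```perl
use strict;
use warnings;

# Hypothetical two-line block between two ten-character delimiters.
my $text = "==========\nline one\nline two\n==========\n";

# A list-context match returns the captures, or an empty list on failure.
my ($without_s) = $text =~ /={10,}(\n.*?\n)={10,}\n/;
my ($with_s)    = $text =~ /={10,}(\n.*?\n)={10,}\n/s;

print 'Without /s: ', ( defined($without_s) ? 'matched' : 'no match' ), "\n";
print 'With /s:    ', ( defined($with_s)    ? 'matched' : 'no match' ), "\n";
```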