harry34 has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code, which extracts data from a file containing 30 data sets that follow the pattern shown in the code. The problem is that it outputs the data as one very long list, so I have no idea where the first data set finishes and the second starts.
I need to modify this code so that each time it sees the string ' Final graph set matrix' it extracts the data (pattern as given in the code) up until it sees the string ' PLUTO4 finished'.
It should then break out of the loop and start over, separating each block of data by a new line or a title, e.g. data set 1 ..... data set n.
If you are interested, I could send the file I am working from.

cheers harry
#!/usr/local/bin/perl
$in_filename = "graph_set.out";
open (IN, "$in_filename") or die "Can't open $in_filename: $!\n";
local $/ = undef;           # undef record separator
my $string = <IN>;          # read whole file into string
close (IN);
my @gset_match = ($string =~ /([A-Z]\s\d+,\s\d+\(\s*\d+\))/g);
open (TEXT, ">graph_set.txt") or die "Can't create graph_set.txt: $!\n";
foreach $_ (@gset_match) {
    print TEXT "$_\n";
}
close TEXT;

Replies are listed 'Best First'.
Re: pattern matching between two specific strings
by Zaxo (Archbishop) on Jul 08, 2003 at 10:23 UTC

    You can break up the input file into records by setting the record separator:

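    #!/usr/local/bin/perl
    my ($in_filename, @records) = 'graph_set.out';
    {
        local $/ = ' PLUTO4 finished';      # each read ends at this marker
        my $start = ' Final graph set matrix';
        open IN, '<', $in_filename
            or die "Can't open $in_filename: ", $!;
        # keep only the text that follows the start marker in each record;
        # the grep skips any chunk that has no start marker at all
        @records = map  { substr $_, index($_, $start) + length($start) }
                   grep { index($_, $start) >= 0 } <IN>;
        chomp @records;                     # strips the trailing end marker
        close (IN);
    }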
    Without seeing your data, I can't say much about your regex or how it parses.
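
To make the record-separator idea concrete, here is a minimal sketch that splits on the end marker and then applies the regex from the question to each record. The sample text is a shortened, hypothetical stand-in for graph_set.out, the helper name split_records and the "data set N" title are illustrative assumptions, and the markers are matched without their leading space in case the real file is indented differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for graph_set.out; the real file is assumed to
# look similar to this.
my $sample = <<'END';
 Final graph set matrix
----------
C 1, 1( 9)
----------
C 2, 2(11) C 1, 1( 4)
 PLUTO4 finished

 Final graph set matrix
----------
C 1, 1( 4)
R 2, 2(18)
 PLUTO4 finished
END

# split_records is an illustrative helper name, not code from the thread.
sub split_records {
    my ($fh) = @_;
    my $start = 'Final graph set matrix';
    local $/ = 'PLUTO4 finished';    # each read stops after the end marker
    my @records;
    while (my $chunk = <$fh>) {
        my $pos = index($chunk, $start);
        next if $pos < 0;            # trailing text with no start marker
        chomp $chunk;                # removes the end marker held in $/
        push @records, substr($chunk, $pos + length($start));
    }
    return @records;
}

# Read from an in-memory filehandle so the sketch runs without a data file.
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my @records = split_records($fh);
close $fh;

my $n = 0;
for my $record (@records) {
    $n++;
    print "data set $n\n";
    print "$_\n" for $record =~ /([A-Z]\s\d+,\s\d+\(\s*\d+\))/g;
    print "\n";
}
```

To use it on the real file, the in-memory open would be replaced by an ordinary open on graph_set.out, and the prints redirected to the output filehandle.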

    After Compline,
    Zaxo

      I'm getting some errors with this code.
      Would it be possible to have your email address, so that I can send you the file I'm working from?
Re: pattern matching between two specific strings
by zby (Vicar) on Jul 08, 2003 at 10:42 UTC
    I would do it on a line-by-line basis (using the .. flip-flop operator):
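    . . .
    my $firsttime = 1;
    while (<IN>) {
        if (/Final graph set matrix/ .. /PLUTO4 finished/) {
            if ($firsttime) {               # first line of a new block
                print TEXT "\nnew data set\n";
                $firsttime = 0;
            }
            # match against the current line ($_), not the slurped $string
            my @gset_match = /([A-Z]\s\d+,\s\d+\(\s*\d+\))/g;
            foreach $_ (@gset_match) {
                print TEXT "$_\n";
            }
        }
        else {
            $firsttime = 1;                 # between blocks: re-arm the title
        }
    }
    . . .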
    I feel there should be some more elegant way to print the delimiters, but can't figure it out.
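
One possibly tidier way to print the delimiters is to use the value the flip-flop returns: in scalar context the range operator yields a sequence number that is 1 on the first line of each range and false outside it. The sketch below relies on that; the helper name label_blocks, the "data set N" title and the shortened sample text are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# label_blocks is an illustrative helper name; the marker strings and the
# regex come from the thread, the title format is an assumption.
sub label_blocks {
    my ($in, $out) = @_;
    my $set = 0;
    while (my $line = <$in>) {
        # Sequence number of this line within the current block:
        # 1 on the start-marker line, false outside any block.
        my $seq = ($line =~ /Final graph set matrix/)
               .. ($line =~ /PLUTO4 finished/);
        next unless $seq;
        print {$out} "\ndata set ", ++$set, "\n" if $seq == 1;
        print {$out} "$_\n" for $line =~ /([A-Z]\s\d+,\s\d+\(\s*\d+\))/g;
    }
    return $set;    # number of blocks seen
}

# Hypothetical, shortened stand-in for the real input file.
my $sample = <<'END';
 Final graph set matrix
----------
C 1, 1( 9)
----------
C 2, 2(11) C 1, 1( 4)
 PLUTO4 finished

 Final graph set matrix
----------
C 1, 1( 4)
R 2, 2(18)
 PLUTO4 finished
END

# In-memory filehandles keep the sketch self-contained.
my $report = '';
open my $in,  '<', \$sample or die "Can't read sample: $!";
open my $out, '>', \$report or die "Can't write report: $!";
my $count = label_blocks($in, $out);
close $in;
close $out;

print $report;
print "\n$count data sets found\n";
```

No separate flag variable is needed here, because the sequence number restarts at 1 whenever a new block begins.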
      I could not get the code you provided to work.
      Below I've listed a portion of the file I'm working from; it contains numerous data sets in the following format. I need to extract the data between Final graph set matrix and PLUTO4 finished each time they occur in the file, with each block separated by a title of some sort.

      hope you can help, cheers
      pattern: /([A-Z]\s\d+,\s\d+\(\s*\d+\))/g
      FILE CONTAINS DATA IN THE FOLLOWING FORMAT

      Final graph set matrix
      ----------
      C 1, 1( 9)
      ----------
      C 2, 2(11) C 1, 1( 4)
      ----------
      C 2, 2(18) C 1, 2(11) C 1, 1( 9)
      R 2, 2( 8)
      ----------
      C 1, 2(11) C 2, 2(18) C 2, 2(11) C 1, 1( 4)
      R 2, 2(18)
      PLUTO4 finished


      Final graph set matrix
      ----------
      C 1, 1( 4)
      ----------
      C 2, 2(18) C 1, 1( 4)
      R 2, 2(18)
      PLUTO4 finished


      Final graph set matrix
      ----------
      C 1, 1( 4)
      ----------
      C 2, 2(18) C 1, 1( 4)
      R 2, 2(18)
      ----------
      C 2, 2(11) C 2, 2(11) C 1, 1( 9)
      ----------
      C 2, 2(11) C 2, 2(11) C 2, 2(18) C 1, 1( 9)
      R 2, 2( 4)
      PLUTO4 finished
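
For reference, a small sketch of how the pattern above behaves on individual lines of this listing. The lines are copied from the sample data; the helper name matches_in is an illustrative assumption. The \s* inside the parentheses is what lets the same pattern accept both "( 9)" and "(11)".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the question, compiled once.
my $pattern = qr/([A-Z]\s\d+,\s\d+\(\s*\d+\))/;

# matches_in is an illustrative helper: it returns every graph-set entry
# found on one line.
sub matches_in {
    my ($line) = @_;
    my @found = $line =~ /$pattern/g;
    return @found;
}

# Lines taken from the listing above.
my @lines = (
    'C 2, 2(18) C 1, 2(11) C 1, 1( 9)',
    'R 2, 2( 8)',
    '----------',
    'PLUTO4 finished',
);

for my $line (@lines) {
    my @found = matches_in($line);
    printf "%-34s %d match(es)  %s\n",
        "'$line'", scalar @found, join(' | ', @found);
}
```

Separator and marker lines yield no matches, so only the graph-set entries end up in the output.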
        Please post the code you derived from mine, and I'll see what the problem with it is.