Re: How best to strip text from a file?

I think the regex you are looking for in this particular case is:

/^\s+Order ID:([a-zA-Z0-9-]+)\s+fiscal cycle:(\d+)/

See perlrequick and perlre for the details.
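To show how the captures come out, here is a minimal sketch of that regex applied to one line. The helper name parse_order_line and the sample line are my own assumptions, not something from your data, so adjust them to the real layout.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (order_id, fiscal_cycle) on a match,
# or an empty list when the line does not match.
sub parse_order_line {
    my ($line) = @_;
    if ($line =~ /^\s+Order ID:([a-zA-Z0-9-]+)\s+fiscal cycle:(\d+)/) {
        return ($1, $2);
    }
    return;
}

# Assumed sample line; the real input may look different.
my $sample = "   Order ID:ABC-123   fiscal cycle:2012";
my ($order_id, $cycle) = parse_order_line($sample);
print "order=$order_id cycle=$cycle\n" if defined $order_id;
```

Note that the pattern insists on leading whitespace and on no space after either colon, so a line that starts directly with "Order ID:" will not match.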

To approach the original problem, I think you should write one routine that reads a record into a buffer and a separate routine that parses a single record. You can then plug different routines into this parsing "framework" to handle different record structures.

Some pseudocode:

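use strict;
use warnings;

# Placeholder markers; replace them with patterns that match your real data.
my ($head_re, $tail_re) = (qr/Start of record/, qr/End of record/);
my $in_rec = 0;   # true while we are inside a record
my @record;       # lines of the record being collected

while (my $line = <>) {
  chomp $line;
  if (!$in_rec) {
    next unless $line =~ $head_re;   # skip anything between records
    $in_rec = 1;
    @record = ();
  }
  push @record, $line;               # head, body and tail lines all end up here
  if ($line =~ $tail_re) {
    $in_rec = 0;
    parse_record(@record);           # hand over the complete record, tail line included
  }
}

# Stub: replace with the real per-record parsing.
sub parse_record {
  my @lines = @_;
  print scalar(@lines), " line(s) in record\n";
}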

I hope this makes sense. Bear in mind that this is only a sketch of the logic I would go for, not the actual parsing.
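For the parsing side, one possible shape for parse_record is a small table of patterns and handlers, with one entry per structure you need to recognise. This is only a sketch under my own assumptions: the @handlers table, the hash keys and the sample record are hypothetical, and the only pattern in it is the regex from the top of this post.

```perl
use strict;
use warnings;

# Hypothetical table: each entry pairs a line pattern with a handler that
# stores the captured values in the record hash.
my @handlers = (
    [ qr/^\s+Order ID:([a-zA-Z0-9-]+)\s+fiscal cycle:(\d+)/,
      sub {
          my ($rec, $id, $cycle) = @_;
          $rec->{order_id}     = $id;
          $rec->{fiscal_cycle} = $cycle;
      } ],
);

# Parse one buffered record (a list of lines) into a hash reference.
sub parse_record {
    my @lines = @_;
    my %rec;
    for my $line (@lines) {
        for my $h (@handlers) {
            my ($re, $code) = @$h;
            if (my @caps = $line =~ $re) {
                $code->(\%rec, @caps);
                last;   # first matching handler wins for this line
            }
        }
    }
    return \%rec;
}

# Assumed record layout; the marker lines are placeholders.
my @record = (
    'Start of record',
    '   Order ID:ABC-123   fiscal cycle:2012',
    'End of record',
);
my $parsed = parse_record(@record);
print "$parsed->{order_id} / $parsed->{fiscal_cycle}\n" if %$parsed;
```

Lines that no handler recognises, such as the start and end markers, are simply ignored, so supporting a new structure should only mean adding another entry to the table.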

Re^2: How best to strip text from a file? by bobdabuilda (Beadle) on Nov 07, 2012 at 02:47 UTC
Thanks for that! I did actually grab a copy of your pseudocode in passing when I noticed your quick reply, so I could start mulling it over and see how best to fit it in with what I'm doing. I think that with that, combined with the code below, I should be able to sort something out. Thanks again :)