jakeeboy has asked for the wisdom of the Perl Monks concerning the following question:

I was wondering if perl formats could be used in reverse. That is, I want to define a format that describes the layout of a report we have at work, then have perl read input based on that format, excluding the headers and footers defined in the report. Is that possible? Thanks for any help. Jake

Re: Input Formats
by Dominus (Parson) on Dec 28, 2000 at 09:55 UTC
    Says jakeeboy:
    I was wondering if perl formats could be used in reverse
    Yes; it's called a regex.

    Instead of this:

    FORMAT INPUT =
    @<<<<<<< Year Sales: @#######.## @||||| @>>>>>>>>
    .
    You write this:
    ($text1, $sales, $inv_number, $text2) =
        $input =~ /^(........) Year Sales: (\d\d\d\d\d\d\d\d\.\d\d) (......) (.........)$/;
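    A minimal sketch of how that regex might be applied to a whole report, assuming (this is not from the original report) that header and footer lines simply do not fit the detail-line layout. The sample lines, @report and @rows are made up for illustration, and .{8} is just shorthand for eight dots:

```perl
use strict;
use warnings;

# Hypothetical report: a header, two detail lines, and a footer.
# The detail lines are built with sprintf so the column widths are exact.
my @report = (
    'ACME Corp Sales Report',
    sprintf('%-8s Year Sales: %011.2f %-6s %9s', 'Widgets', 12345.67, 'A123', 'Smith'),
    sprintf('%-8s Year Sales: %011.2f %-6s %9s', 'Gadgets', 980.5,    'B77',  'Jones'),
    'End of report',
);

my @rows;
for my $input (@report) {
    # Lines that do not fit the layout (headers, footers) fail to match
    # and are skipped.
    my ($text1, $sales, $inv_number, $text2) =
        $input =~ /^(.{8}) Year Sales: (\d{8}\.\d\d) (.{6}) (.{9})$/
        or next;
    push @rows, [$text1, $sales, $inv_number, $text2];
}

# Print the captured fields, separated by "|" so the padding is visible.
print join('|', @$_), "\n" for @rows;
```

    The captured fields keep their padding, so each row still needs trimming before use.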
      After reading japhy's Code Smarter, I thought maybe unpack might do this job more efficiently than a regex? Something like this maybe:
      # Field widths: 9, 17, 10, 5, 6, 1, then the rest; odd-numbered fields hold the data.
      $record = "abcdefghk Year Sales:     0123456.01     peter  x193v 20i39";
      for (unpack "A9 A17 A10 A5 A6 A1 A*", $record) {
          $i++;
          print $_, "\n" if $i % 2;
      }

      I don't really know if unpack would be more efficient or not, but it might show an increase in speed if there was a large record set.
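      One way to find out would be to time both approaches with the core Benchmark module. This is only a sketch: the record layout and the by_unpack and by_regex helpers are made up for illustration, and the numbers will depend on the machine and on the real record format.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical fixed-width record: 9-char name, 17-char label, 10-char amount.
my $record = sprintf '%-9s%-17s%10s', 'abcdefghk', ' Year Sales:', '0123456.01';

sub by_unpack {
    # "x17" skips the 17-character label instead of extracting it.
    return unpack 'A9 x17 A10', $record;
}

sub by_regex {
    # In list context the match returns the two captured fields.
    return $record =~ /^(.{9}).{17}(.{10})/;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { unpack => \&by_unpack, regex => \&by_regex });
```

      Both subs return the same two fields for this record, so the table compares like with like.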

      And just to give an example of how you might start doing this with substr:

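      @format = (9, 17, 10, 5, 6, 1, 11);
      print_record($record, @format);

      sub print_record {
          my ($record, @format) = @_;
          my ($i, $pointer) = (0, 0);    # start counting and reading from the beginning on every call
          for (@format) {
              $i++;
              my $value = substr($record, $pointer, $_);
              $pointer += $_;
              print "value: '$value'\n" if $i % 2;    # odd-numbered fields hold the data
          }
      }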

      This method means that you're liable to end up with spaces padding variable-length data, though. They're not exactly difficult to get rid of, but it's not quite as neat.
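      A small sketch of stripping that padding, using a made-up 6-character field. substr keeps the padding, so it takes a substitution to remove it, whereas the "A" template of unpack drops trailing spaces by itself:

```perl
use strict;
use warnings;

# Hypothetical record whose first field is 6 characters wide.
my $record = 'peter x193v';

my $raw = substr($record, 0, 6);            # 'peter ' (padding kept)
(my $trimmed = $raw) =~ s/^\s+|\s+$//g;     # strip leading and trailing whitespace

my ($via_unpack) = unpack 'A6', $record;    # "A" drops trailing spaces itself

print "'$raw' '$trimmed' '$via_unpack'\n";
```

      Leading padding (as in right-justified fields) survives unpack's "A" too, so the substitution is still needed there.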

      I guess it would also be fairly easy to use whatever method you choose to deconstruct a format definition itself.
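      As a rough illustration of that idea, here is a sketch that turns one picture line into an unpack template. The picture_to_template helper and the sample line are hypothetical, and it only understands the simple @<<<, @|||, @>>> and @##.## fields, treating everything between them as literal text to skip:

```perl
use strict;
use warnings;

# Turn one format picture line into an unpack template.
# Fields become "A<width>"; literal text between fields becomes "x<width>".
sub picture_to_template {
    my ($picture) = @_;
    my @parts;
    while ($picture =~ /\G(?:(\@(?:<+|\|+|>+|#+(?:\.#+)?)?)|([^\@]+))/gc) {
        my ($field, $literal) = ($1, $2);
        push @parts, defined $field ? 'A' . length($field)
                                    : 'x' . length($literal);
    }
    return join ' ', @parts;
}

my $template = picture_to_template('@<<<<<<< Year Sales: @#######.## @||||| @>>>>>>>>');
print "$template\n";

# Hypothetical detail line laid out to match the picture.
my $line = sprintf '%-8s Year Sales: %11.2f %-6s %9s', 'Widgets', 12345.67, 'A123', 'Smith';
my @fields = unpack $template, $line;
s/^\s+// for @fields;    # "A" strips trailing padding only
print join('|', @fields), "\n";
```

      Note that unpack dies if an "x" skip runs past the end of a line, so short lines such as headers and footers would need to be filtered out first.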

Re: Input Formats
by chipmunk (Parson) on Dec 28, 2000 at 08:14 UTC
    Although Perl formats can't be used for input, reading in text and extracting data from it is exactly the kind of task that Perl was designed for. Some Perl features that may be especially useful in your script include split, substr, unpack, and of course regular expressions.

    Here's a very simple script that does something like that:

    #!/usr/local/bin/perl -w
    use strict;

    my %data;
    while (<>) {
        next if /Header/;
        next if /Footer/;
        my($id, $name) = split ' ', $_;
        $data{$id} = $name;
    }
    I hope that will get you started!
Re: Input Formats
by myocom (Deacon) on Dec 28, 2000 at 09:33 UTC
    If the form is beyond what a regex and/or split can easily do, you might check out Parse::RecDescent, which can parse pretty complex formats.