rvf has asked for the wisdom of the Perl Monks concerning the following question:

This is my first question to the monks, so apologies in advance if I break any cardinal rules.

My problem is that I'm trying to parse several files in an odd (to me, at least) format. Each file begins with an ASCII header, which is followed by binary data (a FITS image, to be exact). When viewed in an 80-column display, the header values appear to be on multiple lines, but in fact they are one very long line.

Here is a trimmed-down example of the header (with artificial newlines added for readability)...

SIMPLE = T / file does conform to FITS standard
BZERO = 32768 / zero point
LOISVERS= 'LOIS V1.0.1' / LOIS Version
DETECTOR= 'NAVY TI 800x800' / detector name
INSTRUM = 'Lowell 10 Filter Wheel' / instrument name
FILTNAME= 'V ' / Filter Name
END
[...lots of whitespace...]
[binary data]

Keep in mind that each line you see above is part of a single string in which every statement is padded to 80 characters, creating the illusion of separate lines. What I need to do is extract certain FOO      = bar values, ignore the others, and throw away the binary data after END. Some values are in quotes and some are not. I also can't rely on the values having a fixed length (as INSTRUM demonstrates). I'm somewhat stumped as to what sort of regexp (or something else?) I should approach this with. Any ideas?
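A minimal sketch of the regexp side of the question, assuming each 80-character card has the usual FITS layout (keyword in columns 1-8, then "=", then either a quoted string or a bare value, optionally followed by "/ comment"). The names parse_card and %wanted are hypothetical, chosen for illustration only. Quoted values are matched as a unit so a "/" inside the quotes is not mistaken for a comment.

```perl
use strict;
use warnings;

# Hypothetical whitelist: only these keywords are kept, the rest are ignored.
my %wanted = map { $_ => 1 } qw(DETECTOR INSTRUM FILTNAME);

# Parse one header card into (keyword, value).
# Returns an empty list for cards without "=" (such as END).
sub parse_card {
    my ($card) = @_;
    # Keyword and "=" sign; /g leaves pos() just after the "=" and spaces.
    return unless $card =~ /^([A-Z0-9_-]{1,8})\s*=\s*/g;
    my $key = $1;
    my $val;
    if ($card =~ /\G'((?:[^']|'')*)'/gc) {
        # Quoted string: '' is an escaped quote, trailing blanks are padding.
        ($val = $1) =~ s/''/'/g;
        $val =~ s/\s+$//;
    }
    elsif ($card =~ /\G([^\/]*)/gc) {
        # Bare value (number, T/F): everything up to the comment slash.
        ($val = $1) =~ s/\s+$//;
    }
    return ($key, $val);
}

# Usage: keep only the wanted keywords.
my %header;
for my $card ("DETECTOR= 'NAVY TI 800x800'    / detector name",
              "BZERO   =                32768 / zero point") {
    my ($key, $val) = parse_card($card);
    $header{$key} = $val if defined $key && $wanted{$key};
}
```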

Re: Parsing multiple values in a single line
by extremely (Priest) on Apr 05, 2001 at 23:40 UTC
    Use read with a length of 80 characters, then read one 80-character line at a time until you match m/END\s{77}/. That leaves your file pointer ready to read the rest of the file in different chunks later. Handy.
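A sketch of the read() approach, assuming the file is opened in raw mode (e.g. open my $fh, '<:raw', $path). The function name read_header is hypothetical. One detail worth knowing: the FITS standard pads the header with blanks to a multiple of 2880 bytes, so after END the sketch seeks past that padding to leave the handle at the start of the binary data.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);

# Read 80-byte header cards until the END card.
# Returns a hash ref of keyword => raw value text (comments still attached).
sub read_header {
    my ($fh) = @_;
    my %raw;
    my $bytes = 0;
    while (read($fh, my $card, 80) == 80) {
        $bytes += 80;
        last if $card =~ /^END\s{77}$/;
        if ($card =~ /^([^\s=]{1,8})\s*=\s*(.*?)\s*$/) {
            $raw{$1} = $2;
        }
    }
    # Skip the blank padding up to the next 2880-byte boundary.
    my $pad = (2880 - $bytes % 2880) % 2880;
    seek($fh, $pad, SEEK_CUR) or die "seek failed: $!";
    return \%raw;
}
```

After the call, a plain read on the same handle returns the image data.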

    --
    $you = new YOU;
    honk() if $you->love(perl)

Re: Parsing multiple values in a single line
by thabenksta (Pilgrim) on Apr 05, 2001 at 22:22 UTC

    I think what you need is split

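    my @lines = split /\n/, $datastring;
    my %header;
    foreach my $line (@lines) {
        # Split on the first "=" only, so values containing "=" survive.
        my ($key, $val) = split /\s*=\s*/, $line, 2;
        $header{$key} = $val if defined $val;
    }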
    my $name = 'Ben Kittrell'; $name=~s/^(.+)\s(.).+$/\L$1$2/g; my $nick = 'tha' . $name . 'sta';
Re: Parsing multiple values in a single line
by suaveant (Parson) on Apr 05, 2001 at 22:38 UTC
    If each line is padded to 80 bytes, you could use unpack to split it up into individual lines, then attack it from there...
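A small sketch of the unpack idea. The template "(A80)*" repeats an 80-character field until the string runs out (group syntax needs Perl 5.8 or later), and "A" strips trailing blanks, so the END card comes back as just "END". The name header_cards is hypothetical. Note that unpack still splits the whole string, binary part included; the loop simply stops at END.

```perl
use strict;
use warnings;

# Split the header string into 80-character cards and stop at END.
sub header_cards {
    my ($data) = @_;
    my @cards;
    for my $card (unpack '(A80)*', $data) {
        last if $card eq 'END';   # everything after this is padding/binary
        push @cards, $card;
    }
    return @cards;
}
```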
                    - Ant
Re: Parsing multiple values in a single line
by Anonymous Monk on Apr 05, 2001 at 22:17 UTC
    Assuming that you've read the file into $data, the following should work (untested):
    my @lines = grep(defined, split(/.{80}/, $data));
    my %data;
    foreach (@lines) {
        last if(/^END /);
        if(/^([^\s]+)\s*=\s*(.*?)\s*$/) {
            $data{$1} = $2;
        }
    }
    That will get everything to the right of the = into the %data hash. You may need to process this data further (such as removing what look like / comments, as well as dealing with the quotes).
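One possible post-processing step, sketched under the assumption that each hash value looks like "'NAVY TI 800x800' / detector name" or "32768 / zero point". The name clean_value is hypothetical. Quoted strings are handled first so a "/" inside the quotes is not treated as a comment; FITS writes an embedded quote as two quotes. It could be applied with $data{$_} = clean_value($data{$_}) for keys %data;

```perl
use strict;
use warnings;

# Strip the "/ comment" and the surrounding quotes from a raw header value.
sub clean_value {
    my ($raw) = @_;
    if ($raw =~ /^'((?:[^']|'')*)'/) {
        my $val = $1;
        $val =~ s/''/'/g;   # embedded quote is written as ''
        $val =~ s/\s+$//;   # trailing blanks inside the quotes are padding
        return $val;
    }
    (my $val = $raw) =~ s{\s*/.*}{}s;   # unquoted: drop everything from "/"
    return $val;
}
```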

      split(/.{80}/, $data)

      Actually, that would store one empty field, throw out the first 80 characters of $data as the first delimiter, store a second empty field, throw away the next 80 characters of $data as the second delimiter, store a third empty field, etc. Finally it would return int(length($data)/80) empty fields followed by the last length($data)%80 characters of $data. Not very useful.

      If you change it to grep(defined, split(/(.{80})/, $data)), then it would at least return the "delimiters" (which are really the data you want). But empty fields are "", not undef, so the grep above is pointless.

      So change that first line to: my @lines = grep(length, split(/(.{80})/, $data)); and you should be close.
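For reference, here is the Anonymous Monk loop with that first line changed as suggested, wrapped in a hypothetical parse_header function so it can be tried on a sample string. The capturing split keeps each 80-character record, and grep(length, ...) discards the empty fields between them.

```perl
use strict;
use warnings;

# Anonymous Monk's loop with the corrected split line.
sub parse_header {
    my ($data) = @_;
    my @lines = grep(length, split(/(.{80})/, $data));
    my %data;
    foreach (@lines) {
        last if /^END /;                       # END card is padded with blanks
        if (/^([^\s]+)\s*=\s*(.*?)\s*$/) {
            $data{$1} = $2;                    # raw value, comment included
        }
    }
    return \%data;
}
```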

              - tye (but my friends call me "Tye")