in reply to Reading structured records from a file

Masem has a very straightforward suggestion, but here is another.

If your records are somehow grouped, say, with an extra newline after the last line of one record and before the first line of the next, you can set the input record separator, $/, to read each group at once (the code below uses $/ = "", i.e. paragraph mode). Then, for each group, split it on newlines and assign the pieces to separate hash keys, to wit:

$/ = "";
my @keys = qw(dir name desc date box num1 num2 num3 num4);
my @recs;
while (<>) {    # read a group of lines
    my %rec;
    # split the group at newlines and assign to a hash slice
    @rec{@keys} = split /\n/;
    push @recs, \%rec;
}

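
For anyone who wants to try the idea without a data file, here is a minimal sketch that reads from an in-memory filehandle. The sample data and the shortened key list (dir, name, desc) are made up for illustration, and local is used so that $/ goes back to its old value when the block ends.

```perl
use strict;
use warnings;

# Hypothetical sample data: one field per line, a blank line between records.
my $data = <<'END';
/home/a
alpha
first record

/home/b
beta
second record
END

my @keys = qw(dir name desc);    # shortened key list, for this example only
my @recs;

{
    local $/ = "";               # paragraph mode, restored when the block ends
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    while (my $group = <$fh>) {
        my %rec;
        # split drops the trailing empty fields left by the blank line
        @rec{@keys} = split /\n/, $group;
        push @recs, \%rec;
    }
    close $fh;
}

print "$_->{name}: $_->{desc}\n" for @recs;
```

Each element of @recs is a hash reference, so a field is reached as $recs[0]{name} and so on.
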
If you don't have the extra newlines, you can insert them first with a simple one-liner that inserts a newline before any slash that begins a line.

$ perl -pi -e 's:^/:\n/:' file
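
To see what that substitution does before letting it loose on a real file, here is a rough sketch that applies the same s:^/:\n/: to a few made-up lines, one line at a time, the way the -p loop would. The sample lines are an assumption about what the data looks like.

```perl
use strict;
use warnings;

# Hypothetical input: records run together, each one starting with a "/" line.
my @lines = ("/home/a\n", "alpha\n", "/home/b\n", "beta\n");

my $out = '';
for my $line (@lines) {
    # Work on a copy so the original lines stay untouched.
    (my $copy = $line) =~ s:^/:\n/:;
    $out .= $copy;
}

print $out;
```

Note that the first record also gets a blank line in front of it. That should be harmless when reading with $/ = "", since paragraph mode skips leading empty lines.
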

Update: Corrected $ to @ in @rec{@keys} and added missing trailing : delimiter in one-liner. Also, changed $/ to "", as suggested in this node.

dmm

If you GIVE a man a fish you feed him for a day
But,
TEACH him to fish and you feed him for a lifetime

Re: Re: Reading structured records from a file
by Anonymous Monk on Jan 18, 2002 at 12:52 UTC
    I was thinking just the same as you, but there are some tiny bugs. First, it should be @rec{@keys}, with no scalar sigil. Second, using a delimiter other than / in a regex often means you forget the trailing delimiter when the last character in the pattern is a slash. That's the case here: you forgot the last colon.

    It might also be worth mentioning the special behaviour of $/ = "".

      Whoops!

      Thanks for pointing out those silly errors. You are right about @rec{@keys} -- I actually spotted and corrected that after submitting it for the first time, but I got distracted by something (I was at work) and evidently closed the browser before submitting it again. Ditto the trailing colon in the one-liner.

      Appreciate the catch, however.

      From perlvar:

      input_record_separator HANDLE EXPR
      $INPUT_RECORD_SEPARATOR
      $RS
      $/

      The input record separator, newline by default. Works like awk's RS variable, including treating empty lines as delimiters if set to the null string. (Note: An empty line cannot contain any spaces or tabs.) You may set it to a multi-character string to match a multi-character delimiter, or to undef to read to end of file. Note that setting it to "\n\n" means something slightly different than setting it to "", if the file contains consecutive empty lines. Setting it to "" will treat two or more consecutive empty lines as a single empty line. Setting it to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline. (Mnemonic: / is used to delimit line boundaries when quoting poetry.)
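
The difference described above can be checked with a small sketch like the following. The sample string (two short paragraphs separated by three empty lines) and the helper read_records are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical data: two paragraphs with three empty lines between them.
my $data = "a\nb\n\n\n\nc\nd\n";

# Read every record from the in-memory data using the given separator.
sub read_records {
    my ($sep) = @_;
    local $/ = $sep;
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @para  = read_records("");        # paragraph mode
my @fixed = read_records("\n\n");    # literal two-newline separator

printf "paragraph mode:   %d records\n", scalar @para;
printf "two-newline mode: %d records\n", scalar @fixed;
```

With "" the run of empty lines counts as one separator, while with "\n\n" the leftover newlines turn into an extra record of their own.
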

      Update: emphasis added.

      dmm