in reply to Design hints for a file processor

If you have no other occurrences of Category, you can simply search for that:
#!/usr/bin/perl
use strict;
use warnings;
for (<DATA>) {
    if (m/^\s*Category (.*)$/) {
        print $1, $/;
    }
}

If you want to really parse the file, I'd recommend a simple recursive descent parser; see Parsing with Regexes and Beyond for an explanation. The tokens would be just the lines.

Re^2: Design hints for a file processor
by PhilHibbs (Hermit) on Jul 07, 2008 at 12:18 UTC
    Yes, that's exactly what I do currently - the script is basically a whole load of special cases with no real structure to it. Well, this is my actual code for that:

    $cat = $1 if /^\s+Category "(.+)"/;

    I prefer this notation, I know some people don't.

      If you want structure, use a real parser. Here is one, albeit a bit hacked up:

      It returns a sort of parse tree with an array ref for each block or line, where blocks look like ['BLOCK', $name_of_block, @lines_in_this_block] and lines look like ['LINE', $key, $value].
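
      A minimal sketch of a parser that produces the tree shape described above, not the code originally posted. The input format is an assumption: blocks are taken to open with "BEGIN name" and close with "END name", and every other non-blank line is taken to be a key followed by an optionally quoted value. The name parse_block is made up for illustration.

```perl
use strict;
use warnings;

# Recursive descent over an array of lines; each line is one token.
# ASSUMPTION: blocks look like "BEGIN <name>" ... "END <name>", and any
# other non-blank line is "<key> <value>" with optional double quotes.
sub parse_block {
    my ($lines, $name) = @_;    # $name is undef at the top level
    my @items;
    while (@$lines) {
        my $line = shift @$lines;
        next if $line =~ /^\s*$/;                     # skip blank lines
        if ($line =~ /^\s*BEGIN\s+(\S+)/) {
            push @items, parse_block($lines, $1);     # descend into the block
        }
        elsif ($line =~ /^\s*END\s+(\S+)/) {
            die "Unexpected END $1\n"
                unless defined $name && $1 eq $name;
            return ['BLOCK', $name, @items];
        }
        elsif ($line =~ /^\s*(\S+)\s+"?(.*?)"?\s*$/) {
            push @items, ['LINE', $1, $2];
        }
    }
    die "Missing END $name\n" if defined $name;
    return @items;              # top level: a list of nodes
}
```

      Called as parse_block(\@lines) with no block name, it returns the list of top-level nodes and consumes @lines as it goes.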

      Depending on your exact data format and what you want to extract, hashes might be more suitable.
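
      To illustrate the hash alternative, here is a rough sketch that turns one ['BLOCK', $name, @items] node into a hash. It assumes keys are unique within a block and that no key shares a name with a nested block; whether that holds depends on the data. The name block_to_hash is made up for illustration.

```perl
use strict;
use warnings;

# Convert one ['BLOCK', $name, @items] node into a hash reference.
# ASSUMPTION: keys are unique within a block (a repeated key overwrites
# the earlier value) and never collide with the name of a nested block.
sub block_to_hash {
    my ($node) = @_;
    my ($type, $name, @items) = @$node;
    die "Not a BLOCK node\n" unless $type eq 'BLOCK';
    my %h;
    for my $item (@items) {
        if ($item->[0] eq 'LINE') {
            $h{ $item->[1] } = $item->[2];            # key => value
        }
        else {
            # nested blocks are collected in a list under their block name
            push @{ $h{ $item->[1] } }, block_to_hash($item);
        }
    }
    return \%h;
}
```

      With that, a lookup such as $h->{Category} replaces a scan through the list of lines.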

        The file is up to half a gigabyte, I'm not keeping all that in memory. I could split it up by job, I suppose.
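
        Splitting by job could look roughly like this: read the file one line at a time, buffer only the current job, and hand it to a callback when the job ends. The "BEGIN JOB" and "END JOB" markers are hypothetical stand-ins for whatever delimits a job in the real file, and for_each_job is a made-up name.

```perl
use strict;
use warnings;

# Call $callback->(\@job_lines) once per job, keeping one job in memory.
# ASSUMPTION: a job starts at a line "BEGIN JOB" and ends at "END JOB";
# these marker names are hypothetical.
sub for_each_job {
    my ($fh, $callback) = @_;
    my @job;
    my $in_job = 0;
    while (my $line = <$fh>) {          # one line at a time, never the whole file
        chomp $line;
        $in_job = 1 if $line =~ /^\s*BEGIN\s+JOB\b/;
        next unless $in_job;            # ignore anything outside a job
        push @job, $line;
        if ($line =~ /^\s*END\s+JOB\b/) {
            $callback->([@job]);        # pass a copy, then free the buffer
            @job    = ();
            $in_job = 0;
        }
    }
    warn "Input ended inside a job\n" if $in_job;
}
```

        The callback can then run the Category match from earlier in the thread over each job's lines, so memory use is bounded by the largest single job rather than by the file.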