Re: Design hints for a file processor

If you have no other occurrences of Category, you can simply search for that:

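#!/usr/bin/perl
use strict;
use warnings;
# Read one line at a time; for (<DATA>) would load every line into memory first.
while (my $line = <DATA>) {
    # Print whatever follows "Category" on the line.
    if ($line =~ m/^\s*Category (.*)$/) {
        print $1, "\n";
    }
}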

If you really want to parse the file, I'd recommend a simple recursive descent parser; see Parsing with Regexes and Beyond for an explanation. The tokens would simply be the lines.


Re^2: Design hints for a file processor by PhilHibbs (Hermit) on Jul 07, 2008 at 12:18 UTC
Yes, that's exactly what I do currently - the script is basically a whole load of special cases with no real structure to it. Well, this is my actual code for that: `$cat = $1 if /^\s+Category "(.+)"/;` I prefer this notation, though I know some people don't.
Re^3: Design hints for a file processor by moritz (Cardinal) on Jul 07, 2008 at 12:32 UTC
If you want structure, use a real parser. Here is one, albeit a bit hacked up: Read more... (4 kB) It returns a sort of parse tree with an array ref for each block or line, where blocks look like `['BLOCK', $name_of_block, @lines_in_this_block]` and lines look like `['LINE', $key, $value]`. Depending on your exact data format and what you want to extract, hashes might be more suitable.
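To make the tree shape described above concrete, here is a minimal sketch of a line-based recursive descent parser. It is not the code from the post. The input format is an assumption made purely for illustration: `BEGIN name` opens a block, `END name` closes it, and every other non-blank line is `key "value"`. The names `parse_items` and `@input`, and the sample data, are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed (hypothetical) format: "BEGIN name" opens a block, "END name"
# closes it, and every other non-blank line is: key "value".

# Parse items until the matching END (or end of input at the top level).
# $lines is an array ref used as the token queue; each token is one line.
sub parse_items {
    my ($lines, $open_block) = @_;
    my @items;
    while (@$lines) {
        my $line = shift @$lines;
        next if $line =~ /^\s*$/;                 # skip blank lines
        if ($line =~ /^\s*BEGIN\s+(\S+)\s*$/) {
            my $name = $1;
            # Recurse: a block contains lines and nested blocks.
            push @items, ['BLOCK', $name, parse_items($lines, $name)];
        }
        elsif ($line =~ /^\s*END\s+(\S+)\s*$/) {
            die "Unexpected END $1\n"
                unless defined $open_block && $1 eq $open_block;
            return @items;                        # this block is complete
        }
        elsif ($line =~ /^\s*(\S+)\s+"(.*)"\s*$/) {
            push @items, ['LINE', $1, $2];
        }
        else {
            die "Cannot parse line: $line\n";
        }
    }
    die "Missing END $open_block\n" if defined $open_block;
    return @items;
}

# Hypothetical sample input, one token per line.
my @input = (
    'BEGIN Job',
    '   Name "LoadCustomers"',
    '   Category "Finance"',
    '   BEGIN Record',
    '      Key "value"',
    '   END Record',
    'END Job',
);
my @tree = parse_items(\@input);
```

Each `BEGIN` triggers one recursive call, so nesting depth in the data maps directly onto call depth. If lookups by key matter more than order, the `['LINE', ...]` entries could be collected into a hash per block instead.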
Re^4: Design hints for a file processor by PhilHibbs (Hermit) on Jul 07, 2008 at 13:13 UTC
The file is up to half a gigabyte; I'm not keeping all that in memory. I could split it up by job, I suppose.
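Splitting by job could look roughly like the sketch below, which holds only one job's lines in memory at a time. It assumes, hypothetically, that each job sits between a `BEGIN Job` line and an `END Job` line and that jobs are not nested; the real delimiters may differ. The helper name `for_each_job` and the sample data are made up for illustration, while the `Category` regex is the one quoted earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption (hypothetical): each job sits between a "BEGIN Job" line and
# an "END Job" line, and jobs are not nested inside one another.
sub for_each_job {
    my ($fh, $callback) = @_;
    my $job;                                  # lines of the job being read
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^\s*BEGIN\s+Job\s*$/) {
            $job = [];                        # start collecting a new job
        }
        elsif ($line =~ /^\s*END\s+Job\s*$/) {
            $callback->($job) if $job;        # hand the finished job over
            undef $job;                       # release this job's memory
        }
        elsif ($job) {
            push @$job, $line;
        }
    }
    return;
}

# An in-memory filehandle stands in for the real half-gigabyte file.
my $sample = <<'END_SAMPLE';
BEGIN Job
   Name "LoadCustomers"
   Category "Finance"
END Job
BEGIN Job
   Name "LoadOrders"
   Category "Sales"
END Job
END_SAMPLE

open my $fh, '<', \$sample or die "Cannot open sample: $!";
my @categories;
for_each_job($fh, sub {
    my ($lines) = @_;
    for (@$lines) {
        push @categories, $1 if /^\s*Category "(.+)"/;
    }
});
close $fh;
print "$_\n" for @categories;
```

The callback receives one job's lines and could just as well pass them to a per-job parser, so a structured parse and a bounded memory footprint are not mutually exclusive.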
Re^5: Design hints for a file processor by moritz (Cardinal) on Jul 07, 2008 at 13:21 UTC
Re^6: Design hints for a file processor by PhilHibbs (Hermit) on Jul 09, 2008 at 12:51 UTC