PerlMonks  

How can I parse grouped data?

by Anonymous Monk
on Jul 28, 2005 at 08:41 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

[The original question is not shown: the node fell below the community's threshold of quality.]

Replies are listed 'Best First'.
Re: How can I parse grouped data?
by anonymized user 468275 (Curate) on Jul 28, 2005 at 09:30 UTC
    OK, I agree it's going too far to ask us to write the code. However, hints seem more than reasonable - here is a starting point:

    First, define the grammar that your input obeys in a format that a parser generator can read. For example, if you choose yapp, the Parse::Yapp documentation will tell you how to write the grammar file it needs to parse your input and how to run the relevant lexer and parser. Once you know that, the code generation should be a relatively trivial matter.

    Update: You also have to write the lexer yourself, although it will be simple enough for your example input, and the linked documentation explains it thoroughly. 'Tokens' are the named categories of substrings that you return to the parser as grammatical elements; these might be 'TK_DIGITS', 'TK_ID', etc. 'Values' are the actual substrings found under those categories. Each call to your lexer therefore returns a single substring together with the category under which it falls.

    The documentation doesn't mention that one also normally writes a throwaway routine to consume comments and whitespace.
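
    As a rough illustration of the token/value idea, here is a minimal standalone lexer sketch. It does not load Parse::Yapp; it only follows the convention that each call returns a (token, value) pair and that ('', undef) marks the end of input. The sample input, the '#' comment syntax, and the name make_lexer are assumptions made up for this example, not taken from the original question.

```perl
use strict;
use warnings;

# Build a lexer closure over an input string (make_lexer is a hypothetical name).
# Each call returns one (token, value) pair; ('', undef) signals end of input.
sub make_lexer {
    my ($input) = @_;
    return sub {
        # Throw away leading whitespace and '#' comments (comment syntax assumed).
        $input =~ s/\A(?:\s+|#[^\n]*)+//;
        return ( '', undef ) if $input eq '';
        return ( 'TK_DIGITS', $1 ) if $input =~ s/\A(\d+)//;
        return ( 'TK_ID',     $1 ) if $input =~ s/\A([A-Za-z_]\w*)//;
        # Any other single character (such as '{' or '}') is its own token.
        $input =~ s/\A(.)//s;
        return ( $1, $1 );
    };
}

# Hypothetical input, loosely modelled on the "cell"/"area" fields in this thread.
my $next_token = make_lexer("cell AND2 {\n  area 4  # hypothetical input\n}\n");

my @tokens;
while ( my ( $type, $value ) = $next_token->() ) {
    last if $type eq '';    # end of input
    push @tokens, "$type=$value";
}
print join( ' ', @tokens ), "\n";
```

    In a real grammar you would probably return dedicated tokens for keywords such as 'cell' rather than a generic TK_ID; that choice depends on how the grammar file is written.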

    One world, one people

      Thanks for the hint.
Re: How can I parse grouped data?
by dorward (Curate) on Jul 28, 2005 at 09:10 UTC
Re: How can I parse grouped data?
by graff (Chancellor) on Jul 28, 2005 at 17:28 UTC
    The "yapp" style parsing suggestion is good. If you've used that sort of parsing approach before, or want to learn it, you won't regret doing it that way.

    But for some tasks it may be overkill, and this may be one of those tasks. You just want to keep a selected subset of features from the input file and format them a little differently for output. If the input is consistent in its structure and format, like your examples, then here's a different sort of hint:

        {
            local $/ = "}\n";  # end-of-record string
                               # (might need "}\r\n", if data is CRLF style)
            while (<>)         # this reads a whole multi-line record
            {
                my ( $cellname ) = ( /^\s*cell\s+(\S+)/ );
                my ( $area )     = ( /area\s+(\d+)/ );
                # and similarly for other items of interest...
                # print according to taste
            }
        }

    Read up on the "s" regex modifier and other useful tricks in "perldoc perlre" to work out how you want to handle the other items you're after in each record.
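
    To make the record-at-a-time idea concrete, here is a small self-contained sketch that reads from an in-memory string instead of a file. The sample data (including the "pin" lines) is invented for illustration, since the original input is not shown; only the "cell" and "area" patterns come from the snippet above.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real file layout may differ.
my $data = <<'END';
cell AND2 {
  area 4
  pin A
}
cell INV1 {
  area 2
  pin Y
}
END

my @rows;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
{
    local $/ = "}\n";    # each read returns one whole "cell ... }" record
    while ( my $record = <$fh> ) {
        my ($cellname) = $record =~ /^\s*cell\s+(\S+)/;
        my ($area)     = $record =~ /\barea\s+(\d+)/;
        # /m lets ^ match at the start of every line inside the record;
        # /g in list context collects every match.
        my @pins = $record =~ /^\s*pin\s+(\S+)/mg;
        next unless defined $cellname && defined $area;
        push @rows, join( ' ', $cellname, $area, @pins );
    }
}
close $fh;

print "$_\n" for @rows;
```

    With this sample data the script prints "AND2 4 A" and "INV1 2 Y". The /m modifier is used here because the pattern is anchored per line; /s would matter if a pattern used "." to span several lines of a record.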

    Of course, if your input file format varies significantly from the examples you showed, this sort of approach will tend to be easy to break. Good luck.

      Hi graff,
      Thanks a lot for your valuable guidance.
