There are a few packages that might help, such as Parse::Yapp and Parse::RecDescent. To use either of these you'll need a reasonable understanding of parsing theory (for example, what LR(1) means).
Depending on the situation, you may be able to get away with some simple regular expressions, but if you're dealing with a language of any complexity (for instance, one that allows comment characters inside strings), regexes quickly run out of steam.
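To see why strings matter, here is a small sketch. It assumes `#` comments and double-quoted strings with backslash escapes only, and the sample line is made up. The naive substitution `s/#.*//` chops the string literal in half. A pattern that tries to match a string first, and only then a comment, leaves the string alone:

```perl
use strict;
use warnings;

my $line = 'print "item #1"; # show the first item';

# Naive: everything from the first '#' is treated as a comment,
# so the '#' inside the string literal truncates the code.
(my $naive = $line) =~ s/#.*//;

# String-aware: try a double-quoted string first (put back unchanged via $1),
# otherwise match a '#' comment (replaced with nothing).
# Assumes only "..." strings with backslash escapes.
(my $aware = $line) =~ s{("(?:[^"\\]|\\.)*")|#.*}{defined $1 ? $1 : ''}ge;

print "naive: $naive\n";
print "aware: $aware\n";
```

Single-quoted strings, heredocs, regex literals and so on would each need their own alternative. That is exactly where this approach starts to run out of steam.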
Hello Monks, I usually have to parse a lot of data, and the data usually contain comments (i.e. /* */ or #) that should be ignored. After parsing the data, I need to write out a new file containing the modified data along with the original comments. What ideas or functions can help me with this kind of thing?
That's an awfully open-ended question. As VSarkiss
suggested, parsing in general is a difficult problem (even
the theory he cited solves a simplified version of the
general problem -- context-dependent parsing, where things
have different meanings depending on where they're used, is
extremely difficult), so a general answer would take a lot
of space (and time!). That said, not all parsing is hard.
For instance, if your data look sort of like
key: value # comment
your "parser" is going to be pretty trivial, something like
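# NOTE: untested!
while (my $line = <INPUT>) {
    # key: everything before the first colon;
    # value: everything after it, up to a '#' comment or end of line
    my ($key, $val) = $line =~ /^([^:]+):\s*([^\n#]*?)\s*(?:#.*)?$/;
    next unless defined $key;    # skip blank or malformed lines
    do_stuff_with($key, $val);
}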
If your data are a bit more complex, you may still be able
to describe them with a regular expression. Without going
into too much detail, regexes can match data that don't
depend on "nesting" or "counting". (Perl's "regexes" can,
but they're actually more powerful, theoretically speaking,
than what geeks like me call regular expressions.)
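For example, Perl 5.10 and later support recursive patterns, so a single pattern can match arbitrarily deeply nested parentheses. A textbook regular expression cannot do this. Here is a minimal sketch (the test strings are made up):

```perl
use 5.010;
use strict;
use warnings;

# (?1) recurses into capture group 1, so each nested "(...)" is matched
# by the same rule as the outer one. [^()]++ is a possessive quantifier
# that consumes a run of non-parenthesis characters without backtracking.
my $balanced = qr/(\((?:[^()]++|(?1))*\))/;

my %result;
for my $s ('(a(b)c)', '((x)(y(z)))', '(a(b)c', 'a)b(') {
    $result{$s} = $s =~ /\A$balanced\z/ ? 1 : 0;
    print "$s: ", ($result{$s} ? 'balanced' : 'not balanced'), "\n";
}
```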
Anything more complex than that, and you'll want a real
parser and a lot more theory.
So: what do your data look like?
Update: Oops, forgot to mention something. The
converse problem to parsing (turning a text file into some
sort of data structure) is "pretty-printing" (turning some
sort of data structure into a text file). Pretty-printing
isn't usually considered to be as difficult as parsing
(the hard part about parsing is extracting structure; when
you're pretty-printing something, you know its
structure), but you might run into problems replicating
the comments: most parsers strip out comments from their
input (since comments don't contribute to structure).
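If the format is simple enough, one way around that is to never throw the comments away. Split the text into code chunks and comment chunks, modify only the code chunks, and join everything back together. The sketch below assumes `/* ... */` and `#` comments and no comment markers inside string literals, so VSarkiss's caveat about strings still applies. The input text and the "rename foo to bar" edit are hypothetical stand-ins for your real data and modifications:

```perl
use strict;
use warnings;

# Hypothetical input mixing C-style and shell-style comments.
my $text = <<'END';
foo = 1;   /* foo is the counter */
foo += 2;  # bump foo
END

# A capturing group in split's pattern keeps the delimiters (here, the
# comments) in the result list, interleaved with the non-comment chunks.
# /s lets .*? span newlines, so multi-line /* */ comments are one chunk.
my @chunks = split m{(/\*.*?\*/|#[^\n]*)}s, $text;

for my $chunk (@chunks) {
    next if $chunk =~ m{\A(?:/\*|#)};   # leave comments untouched
    $chunk =~ s/\bfoo\b/bar/g;          # hypothetical edit on code only
}

my $out = join '', @chunks;
print $out;
```

Because the comments are never parsed, they come back byte-for-byte. The trade-off is that this only works while "is this a comment?" can be decided without understanding the rest of the syntax.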
--
FoxtrotUniform
Found a typo in this node? /msg me
% man 3 strfry