
Hello Monks, I usually have to parse a lot of data, and the data usually contains comments that should not be looked at, e.g. /* */ or #. After parsing the data, I need to regenerate a file with most of the modified data (and the original comments). What ideas or functions can help me with this sort of task?

That's an awfully open-ended question. As VSarkiss suggested, parsing in general is a difficult problem (even the theory he cited solves a simplified version of the general problem -- context-dependent parsing, where things have different meanings depending on where they're used, is extremely difficult), so a general answer would take a lot of space (and time!). That said, not all parsing is hard. For instance, if your data look sort of like

key: value     # comment

your "parser" is going to be pretty trivial, something like

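# NOTE: untested!
while (<INPUT>) {
  # key: everything before the first colon
  # value: everything after it, up to a '#' or the end of the line
  my ($key, $val) = /^([^:]+):\s*([^\n#]+)/ or next;  # skip lines that don't match
  do_stuff_with($key, $val);
}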

If your data are a bit more complex, you may still be able to describe them with a regular expression. Without going into too much detail, regexes can match data whose structure doesn't depend on "nesting" or "counting" (arbitrarily deep balanced parentheses are the classic example of data that does). (Perl's "regexes" can handle some of those cases, but they're actually more powerful, theoretically speaking, than what geeks like me call regular expressions.) Anything more complex than that, and you'll want a real parser and a lot more theory.
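To make "counting" concrete, here is a rough sketch of checking that parentheses balance. It needs a depth counter, which is exactly what a textbook regular expression can't keep for arbitrary depth. The sub name is_balanced is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every '(' in $text has a matching ')',
# and 0 otherwise.
sub is_balanced {
    my ($text) = @_;
    my $depth = 0;
    for my $ch (split //, $text) {
        $depth++ if $ch eq '(';
        if ($ch eq ')') {
            $depth--;
            return 0 if $depth < 0;    # closing paren with nothing open
        }
    }
    return $depth == 0 ? 1 : 0;
}

print is_balanced('(a (b) c)') ? "balanced\n" : "unbalanced\n";
print is_balanced('(a (b c)')  ? "balanced\n" : "unbalanced\n";
```

If your data need this kind of bookkeeping (nested blocks, matching delimiters), that's usually the sign that a single regex won't be enough.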

So: what do your data look like?

Update: Oops, forgot to mention something. The converse problem to parsing (turning a text file into some sort of data structure) is "pretty-printing" (turning some sort of data structure into a text file). Pretty-printing isn't usually considered to be as difficult as parsing (the hard part about parsing is extracting structure; when you're pretty-printing something, you know its structure), but you might run into problems replicating the comments: most parsers strip out comments from their input (since comments don't contribute to structure).
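One way around the comment problem, as a rough sketch rather than a tested solution: have your parser keep the comment text alongside each record instead of throwing it away, so the pretty-printer can put it back. This assumes the simple key: value # comment format from above, and that '#' never appears inside a key or value. The names parse_line and format_line are made up for illustration.

```perl
use strict;
use warnings;

# Split one line into its parts, keeping the comment instead of discarding it.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    if ($line =~ /^([^:#]+):\s*([^#]*?)(\s*#.*)?$/) {
        return { key => $1, value => $2, comment => defined $3 ? $3 : '' };
    }
    return { raw => $line };    # blank or comment-only line: keep verbatim
}

# The "pretty-printer": rebuild a line from the parsed parts.
sub format_line {
    my ($rec) = @_;
    return $rec->{raw} if exists $rec->{raw};
    return "$rec->{key}: $rec->{value}$rec->{comment}";
}

my @input = (
    "# settings",
    "name: fred     # the user",
    "shell: bash",
);

my @records = map { parse_line($_) } @input;

# Modify the data without touching the comments.
for my $rec (@records) {
    next if exists $rec->{raw};
    $rec->{value} = uc $rec->{value};
}

print format_line($_), "\n" for @records;
```

Note that this normalizes the whitespace between the colon and the value to a single space. If you need the output to match the input byte for byte, you'd have to capture that whitespace as well.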

-- F o x t r o t U n i f o r m Found a typo in this node? /msg me % man 3 strfry

In reply to Re: Help With Parsing and Commenting by FoxtrotUniform
in thread Help With Parsing and Commenting by EchoAngel
