sachaer has asked for the wisdom of the Perl Monks concerning the following question:

I need line-of-code statistics for an XML file (assuming the file's syntax is correct). Empty lines and comment lines should not be counted. The comment syntax for XML files is simple:

<!-- xxx --> as a single-line comment
or
<!--
xyz xyz xyz
-->
as a multi-line comment

What I have done is:

read the file line by line
skip empty lines, comment lines, and multi-line comment lines
close the file
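
The steps above can be sketched roughly as follows. This is only an illustration, not the poster's actual code: the sub name `count_code_lines` and the sample data are made up, and it assumes well-formed XML in which `<!--` never appears inside a CDATA section or an attribute value.

```perl
use strict;
use warnings;

# Count lines that contain something other than whitespace or comment text.
# Hypothetical helper; a line-based scan cannot see CDATA or attribute context.
sub count_code_lines {
    my ($fh) = @_;
    my $count      = 0;
    my $in_comment = 0;    # true while inside an unclosed <!-- ... -->

    while ( my $line = <$fh> ) {
        if ($in_comment) {
            # Drop everything up to the closing "-->"; if it is absent,
            # the whole line is comment text and is skipped.
            next unless $line =~ s/^.*?-->//;
            $in_comment = 0;
        }
        # Remove comments that open and close on this line.
        $line =~ s/<!--.*?-->//g;
        # A comment that opens here but closes on a later line.
        $in_comment = 1 if $line =~ s/<!--.*$//s;
        $count++ if $line =~ /\S/;
    }
    return $count;
}

my $xml = <<'XML';
<root>
  <!-- one-line comment -->

  <!--
  xyz xyz xyz
  -->
  <item>text</item> <!-- trailing comment -->
</root>
XML

# Read the sample through an in-memory filehandle.
open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";
print count_code_lines($fh), "\n";
close $fh;
```

A line that mixes markup with a trailing comment is still counted, since something other than comment text remains on it.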

Is that enough? Should I use an XML parser (remember, I don't need to check for syntax errors), and if so, how?

Thanks in advance.

Re: How to count lines of code for XML file?
by BrowserUk (Patriarch) on Jun 06, 2003 at 19:00 UTC

    Define "line" for this exercise. How many lines do you see in each of the following?

    <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
    <html>
    <head>
    <title>stuff</title>
    </head>
    <body>
    <div>
    <para>more stuff</para>
    </div>
    </body>
    </html>

    <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
    <html>
    <head>
    <title>
    stuff
    </title>
    </head>
    <body>
    <div>
    <para>
    more stuff
    </para>
    </div>
    </body>
    </html>

    <!doctype html public "-//w3c//dtd html 4.0 transitional//en"> <html><head><title>stuff</title></head><body><div><para>more stuff</para></div></body></html>

    Is your answer the same for each, or different?

    If it's the same, what is the value of the metric?

    If it's different, what is the value of the metric?

    In other words, what are you trying to measure?
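
To make the point concrete, here is a small sketch that measures three layouts of the same markup, modelled loosely on the examples above. The sub name `measure` and the shortened sample strings are made up for illustration, and the tag count uses a crude regex rather than a real parser.

```perl
use strict;
use warnings;

# Return (non-blank physical lines, tags) for a chunk of markup.
# Hypothetical helper; the tag pattern is a rough approximation only.
sub measure {
    my ($text) = @_;
    my $lines = () = $text =~ /^.*\S.*$/mg;       # lines with any non-whitespace
    my $tags  = () = $text =~ /<[^!?][^>]*>/g;    # skips <!...> and <?...?>
    return ( $lines, $tags );
}

# The same five elements laid out three different ways.
my %layout = (
    one_tag_per_line => "<html>\n<head>\n<title>stuff</title>\n</head>\n</html>\n",
    text_on_own_line => "<html>\n<head>\n<title>\nstuff\n</title>\n</head>\n</html>\n",
    single_line      => "<html><head><title>stuff</title></head></html>\n",
);

for my $name ( sort keys %layout ) {
    my ( $lines, $tags ) = measure( $layout{$name} );
    printf "%-17s lines=%d tags=%d\n", $name, $lines, $tags;
}
```

The line counts come out as 5, 7 and 1 for the three layouts, while the tag count is 6 every time, so a line-based metric says more about formatting than about content.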


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: How to count lines of code for XML file?
by graff (Chancellor) on Jun 07, 2003 at 04:42 UTC
    What I have done is: read the file line by line; skip empty lines, comment lines, and multi-line comment lines; close the file. Is it enough?
    I thought you said you needed to report some sort of statistic, too. ;^)

    The preference of whether or not to use a parser would depend on how simple or complex the input data is -- and how consistent or variable it is. Even if you don't need to worry about XML syntax errors (really? are you sure about that?), I would think that if you don't do some sort of sanity checking on the data, your statistics may turn out to be (ahem) inaccurate in ways you might not expect...

    You should just check the docs for one or another XML parser module on CPAN to see whether it's suitable for you, and to see how to use it.

    And... what is it exactly that you are trying to count? Lines? Elements? Tags? Text contained within tags? Depending on your goals, a parsing module could make things a lot easier in the long run.

    If you know the data well, and it's fairly simple and consistent, and your output statistics seem reasonable, then you are probably doing well enough with your current approach; if it ain't broke, don't fix it. (But it's hard for me to say for sure, since you don't show any of your Perl code or any real data, and you don't say whether you're having any problem with it.)

Re: How to count lines of code for XML file?
by hacker (Priest) on Jun 07, 2003 at 12:27 UTC
    It's important to remember that when dealing with SGML, XML, HTML and other "stream" methods of storage, there really is only one line of data. It may be visually separated by newlines in your editor or viewer to look like separate "lines", but there really is only one line.

    I think what you want to do is count how many ^ to $ spans you have, ignoring whole-line comments, /^\s*<!-- .*?-->\s*$/, and pure blank lines, /^\s*$/, and report the result. That shouldn't be hard at all.
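
A minimal sketch of that filter, using the two patterns quoted above as-is. The sub name `count_simple` and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Count lines that are neither blank nor a complete one-line comment.
# Hypothetical helper built around the two patterns from the post.
sub count_simple {
    my (@lines) = @_;
    my $count = 0;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;                  # pure blank line
        next if $line =~ /^\s*<!-- .*?-->\s*$/;    # whole-line comment
        $count++;
    }
    return $count;
}

my @sample = (
    "<root>\n",
    "  <!-- a comment -->\n",
    "\n",
    "  <item/>\n",
    "</root>\n",
);
print count_simple(@sample), "\n";    # 3 lines survive the two filters
```

Note that the comment pattern only matches a comment that opens and closes on the same line, and it expects a space after `<!--`. Every line of a multi-line comment would still be counted unless the loop also tracks whether it is inside an open comment.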