Xiong has asked for the wisdom of the Perl Monks concerning the following question:

I'm still hacking Smart::Comments to print to other than STDERR (as S::C::Any). The original source filter inserts replacement code all on one line, no matter how many statements it inserts or how complex. I find this disagreeable; I'm somewhat compulsive about breaking and indenting statements and expressions for readability.

I don't just want the replacement code readable in ::Any's source; I want it readable on those occasions when I dump the filtered code to screen or file.

So, I broke up the long (sometimes very long) lines. I've used two approaches: inserting literal newlines into the replacement code strings and joining multiple lines with:

join qq{\n},
    qq< #some code >,
    qq< #more code { >,
    qq< #indented code >,
    qq< }; >,
    ;

Literal newlines look better in module source but I'd tolerate the join, although I find it ugly and full of manual work. Both produce equally well-formatted dumps.

Trouble is, both methods fail to preserve caller's line numbering. This throws off error reports and even S::C's own system for formatting smart output. This, I've discovered, is why TheDamian put all replacement code on one line, no matter what.
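The line shift is easy to reproduce outside a real source filter. The following is only a minimal sketch: `naive_filter` is a hypothetical stand-in for a single filtering rule (not Smart::Comments' actual code), and a string eval stands in for the filtered file. The replacement spans two lines, so every statement after it is reported one line too late.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one filter rule: replace a '###' comment line
# with a statement broken over two lines (note the \n in the replacement).
sub naive_filter {
    my ($source) = @_;
    $source =~ s{^###[ \t]*(.*)$}{push \@main::LOG,\n    q{$1};}gm;
    return $source;
}

my $source = join "\n",
    'our @LOG;',
    '### checkpoint',
    '__LINE__;',        # third line of the caller's code
    '';

# The eval returns the value of __LINE__ as perl sees it after filtering.
my $reported = eval naive_filter($source);
print "caller wrote line 3, perl reports line $reported\n";
```

The smart comment text is pasted into `q{...}` unescaped here, which is good enough for a demonstration but not for real input.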

Is there a solution? I looked at the #line directive. I thought this pseudocode might work:

$saved_line = get_line(?);    # get source line number somehow
s{whatever;}
 {whatever;
  inserted code;
  and another line of the same;
#line $saved_line
 }g

...but you can see right away why this will fail. The call from S::C to Filter::Simple::FILTER() consists of several global search-and-replace filtering rules; the entire code to be filtered is passed in, and out, as $_. So, as the filter acts, it makes all the replacements (of a given kind) at one time.
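One possible workaround, offered as a sketch rather than anything the thread confirms: with the /e modifier the replacement is code, and inside it `$-[0]` holds the offset of the current match in the unmodified string, so the line number can be recovered by counting the newlines before that offset. The sub name and the generated `push @main::LOG` statement are hypothetical. Two assumptions are worth stating loudly: this only holds for the first rule applied (later rules would see text whose physical line count has already changed), and counting from the start of the string for every match gets slow on large files.

```perl
use strict;
use warnings;

# Hypothetical single filter rule. The multi-line replacement ends with a
# '#line' directive so the line that follows keeps its original number.
sub filter_with_line_fix {
    my ($source) = @_;
    my $filtered = $source;    # substitute in a copy, count in the original
    $filtered =~ s{^###[ \t]*(.*)$}{
        my $text = $1;
        # Newlines before the match offset give the current line number.
        my $line = 1 + ( substr( $source, 0, $-[0] ) =~ tr/\n// );
        join "\n",
            'push @main::LOG,',
            "    q{$text};",
            '#line ' . ( $line + 1 );
    }gme;
    return $filtered;
}

my $source = join "\n",
    'our @LOG;',
    '### first',
    'my @seen = (__LINE__);',    # third line as written
    '### second',
    'push @seen, __LINE__;',     # fifth line as written
    '\@seen;',
    '';

my $seen = eval filter_with_line_fix($source);
print "lines seen: @$seen\n";
```

Because the directive sits on its own line directly before the original newline, the statement after each smart comment is numbered as the caller wrote it, even though the dump is now several physical lines longer.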

Obviously, I can rewrite the entire FILTER block to split $_ into an array and iterate through it, maintaining a private line count, filtering line-by-line, and restoring with a #line directive when needed. There is even some promise of efficiency: if I do this, I will first do a match (m//, not s///) on each source line and, if it doesn't contain '###', next LOOP. Compare this to vanilla Smart::Comments, where an entire source file is subjected to no fewer than 14 global regex search-and-replace filtering rules. I can also next LOOP on a successful search-replace, since for any given smart comment, only one filtering rule applies.
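The line-by-line plan described above might look roughly like this. It is a sketch under assumptions: `@RULES` holds a single hypothetical rule rather than the real fourteen, `filter_by_line` is an invented name, and the cheap pre-check uses `index` instead of m//. It also ignores context, so a '###' inside a string or heredoc would be treated as a smart comment.

```perl
use strict;
use warnings;

# Hypothetical rule table: each rule gets the line and its number and returns
# replacement text, or nothing if it does not apply.
my @RULES = (
    sub {
        my ( $line, $line_no ) = @_;
        return unless $line =~ /^([ \t]*)###[ \t]*(.*)$/;
        my ( $indent, $text ) = ( $1, $2 );
        return join "\n",
            "${indent}push \@main::LOG,",
            "${indent}    q{$text};",
            '#line ' . ( $line_no + 1 ),    # next line keeps its number
            '';
    },
);

sub filter_by_line {
    my ($source) = @_;
    my $line_no = 0;                        # private line count
    my @out;
  LINE:
    for my $line ( split /^/, $source ) {   # lines keep their newlines
        $line_no++;
        if ( index( $line, '###' ) >= 0 ) { # cheap test before any rule runs
            for my $rule (@RULES) {
                my $replacement = $rule->( $line, $line_no );
                next unless defined $replacement;
                push @out, $replacement;
                next LINE;                  # only one rule applies per comment
            }
        }
        push @out, $line;
    }
    return join '', @out;
}

my $source = join "\n",
    'our @LOG;',
    '### first',
    'my @seen = (__LINE__);',
    '### second',
    'push @seen, __LINE__;',
    '\@seen;',
    '';

my $seen = eval filter_by_line($source);
print "lines seen: @$seen\n";
```

Since the counter follows the original lines rather than the output, it stays correct no matter how many physical lines earlier replacements produced.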

However, my understanding is that the regex engine is extremely efficient, while explicit iteration may not be so good. There's also the additional work of tearing up the FILTER call and implementing the line-by-line approach.

I'm willing to bite the bullet and do the work to get clean module code, clean source dumps, accurate error reporting, and more time efficiency from the filter. I'm not sure it's worth it for no improvement in efficiency and I'm certainly against paying a big efficiency penalty and doing extra work just for clean-looking code.

join q{ } would make for more readable module code but not help the dumps much. join qq{\t} might help the dumps a little but it's less than ideal.

My fantasy solution has a magic character that produces a newline onscreen but does not increment perl's line counter.

Endorsements? Guesstimates? Alternatives?

Updates:

2010-06-30:

BrowserUk proposes Perl::Tidy for cleaning up dumps. This is an attractive solution but still requires work; it's not immediately clear how to get the dumps piped through perltidy since most smart output is not valid Perl code. Also, this does nothing towards cleaning up module source. I agree it's worth more than one or two thoughts.

Tidying up, though, is only part of the potential gain from splitting up the to-be-filtered code and managing it line-by-line. I'm hoping a Monk could guess whether to expect a significant improvement in performance.

2010-07-01:

I'm starting to feel a strong pull toward refactoring the FILTER call into a line-by-line approach. This would allow me to test whether a construct was needed before generating it. Something about global replacements over a source file possibly thousands of lines long gives me the willies. Does anybody have a gut feeling about this?

If I don't hear yea or nay, I'm headed toward a fork and benchmark. If iterating through source lines is obviously stupid (for some senior-Monk value of 'obviously'), I'd like to avoid that.
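A benchmark of that kind could start from something like the sketch below, which uses the core Benchmark module. Everything about the workload is an assumption: the fourteen rules are synthetic single-line rewrites keyed on invented `tagN:` markers, `emit` is a hypothetical function name that only appears in generated text, and the 2000-line input has a smart comment on every tenth line. Real rules and real source could easily shift the balance either way, so the numbers only say something about this toy setup.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic rule set: [ compiled pattern, tag name ] pairs (hypothetical).
my @RULES = map {
    my $tag = "tag$_";
    [ qr/^###[ \t]*\Q$tag\E:[ \t]*(.*)$/m, $tag ]
} 1 .. 14;

# Synthetic source: every tenth line is a smart comment.
my $source = join '', map {
    $_ % 10
        ? "my \$x$_ = $_;\n"
        : "### tag" . ( 1 + $_ % 14 ) . ": value $_\n"
} 1 .. 2000;

# Every rule runs globally over the whole source.
sub global_filter {
    my ($text) = @_;
    for my $rule (@RULES) {
        my ( $re, $tag ) = @$rule;
        $text =~ s/$re/emit('$tag', q{$1});/g;
    }
    return $text;
}

# Lines without '###' are skipped; the first matching rule ends the search.
sub line_filter {
    my ($text) = @_;
    my @lines = split /^/, $text;
    for my $line (@lines) {
        next if index( $line, '###' ) < 0;
        for my $rule (@RULES) {
            my ( $re, $tag ) = @$rule;
            last if $line =~ s/$re/emit('$tag', q{$1});/;
        }
    }
    return join '', @lines;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    global  => sub { global_filter($source) },
    by_line => sub { line_filter($source) },
} );
```

Both subs should produce identical output for this input, which is worth checking before trusting any timing.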

- the lyf so short, the craft so long to lerne -

Replies are listed 'Best First'.
Re: Line Numbers in Filtered Code
by BrowserUk (Patriarch) on Jun 29, 2010 at 15:53 UTC
    I don't just want the replacement code readable in ::Any's source; I want it readable on those occasions when I dump the filtered code to screen or file.

    I think you're allowing your compulsion to get the better of your judgement.

    The biggest single problem with code generation is the error reporting. The second is people being tempted into hand-editing the generated source. By making it readable, you're simply tempting people to read it. And once they do, it's a small step to just tweaking it here and there...with all the subsequent problems that creates.

    Generated source code should never be seen. You as the author of the module will need to look at it during development, but rather than expending energy on trying to format it nicely, why not just Perl::Tidy it when you need to?

    You save yourself a bunch of work on the formatting; avoid creating (and having to solve) another bunch of problems with line numbering; and avoid leading users into the temptation of actually reading the generated code.

    Seems like a 3-way win for the inconvenience of having to tidy.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.