I'm still hacking Smart::Comments to print to destinations other than STDERR (as S::C::Any). The original source filter inserts all of its replacement code on one line, no matter how many statements it inserts or how complex they are. I find this disagreeable; I'm somewhat compulsive about breaking and indenting statements and expressions for readability.

I don't just want the replacement code readable in ::Any's source; I want it readable on those occasions when I dump the filtered code to screen or file.

So, I broke up the long (sometimes very long) lines. I've used two approaches: inserting literal newlines into the replacement code strings and joining multiple lines with:

```perl
join qq{\n},
    qq< #some code >,
    qq< #more code { >,
    qq< #indented code >,
    qq< }; >,
    ;
```

Literal newlines look better in module source but I'd tolerate the join, although I find it ugly and full of manual work. Both produce equally well-formatted dumps.

Trouble is, both methods fail to preserve the caller's line numbering. This throws off error reports and even S::C's own system for formatting smart output. This, I've discovered, is why TheDamian put all replacement code on one line, no matter what.
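A small, self-contained illustration of the numbering problem. The three-line "source" and both replacement strings are hypothetical toys, not the real S::C rules: the same smart comment is replaced once by a single line and once by three lines, and `__LINE__` on the following line shows the shift.

```perl
use strict;
use warnings;

# Hypothetical three-line "source": a smart comment on line 2,
# and a line-number probe on line 3.
my $original = join "\n",
    'my $x = 1;',
    '### x: $x',
    '__LINE__;';

# One-line replacement, the way vanilla Smart::Comments does it.
(my $single = $original) =~ s{^### .*$}{print STDERR "x: \$x\\n";}m;

# Multi-line replacement: two extra newlines are inserted.
(my $multi = $original) =~ s{^### .*$}{print STDERR\n    "x: \$x",\n    "\\n";}m;

# Each eval returns the value of __LINE__ on its last line.
my $reported_single = eval $single;    # original numbering kept
my $reported_multi  = eval $multi;     # shifted by the inserted lines

print "single-line replacement: line $reported_single\n";    # line 3
print "multi-line replacement:  line $reported_multi\n";     # line 5
```

Every line after a multi-line replacement is reported two lines too late, and the error compounds with each further expansion.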

Is there a solution? I looked at the #line directive. I thought this pseudocode might work:

```
$saved_line = get_line(?);    # get source line number somehow
s{whatever;}
 {whatever;
  inserted code;
  and another line of the same;
#line $saved_line
 }g
```

...but you can see right away why this will fail. The call from S::C to Filter::Simple::FILTER() consists of several global search-and-replace filtering rules, and the entire code to be filtered is passed in, and out, as $_. As the filter acts, each rule makes all of its replacements at once, so there is no single "current" source line whose number I could save.

Obviously, I can rewrite the entire FILTER block to split $_ into an array of lines and iterate through it, maintaining a private line count, filtering line by line, and restoring the numbering with a #line directive when needed. There is even some promise of efficiency: I would first do a match (m//, not s///) on each source line and, if it doesn't contain '###', skip straight to the next one with next LOOP. Compare this to vanilla S::C, where an entire source file is subjected to no fewer than 14 global regex search-and-replace rules. I can also next LOOP after a successful search-and-replace, since only one filtering rule applies to any given smart comment.
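A minimal sketch of that line-by-line approach, assuming a single toy rule in place of the real S::C rules. `filter_lines()` and its `$first_line` parameter are hypothetical; inside a Filter::Simple FILTER block, `$_` presumably starts on the line after the `use` statement, so the real starting number would have to be obtained somehow. Per perlsyn, a `#line N` directive must begin at the start of a line and sets the number of the *next* line.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Smart::Comments or Filter::Simple.
# Expands each '### text' line into several lines of code, then emits
# a line directive so Perl resumes counting at the original number.
# $first_line is the original line number of the first line of $source.
sub filter_lines {
    my ($source, $first_line) = @_;
    my @out;
    my $line_no = $first_line - 1;

    LINE:
    for my $line (split /\n/, $source, -1) {
        $line_no++;

        # Cheap m// test first: most lines hold no smart comment.
        if ($line !~ /###/) {
            push @out, $line;
            next LINE;
        }

        # Toy rule standing in for the real S::C rules.
        if ($line =~ /^([ \t]*)###\s*(.*?)\s*$/) {
            my ($indent, $text) = ($1, $2);
            $text =~ s/([\\'])/\\$1/g;    # escape for a '...' literal
            push @out,
                "${indent}print STDERR",
                "${indent}    '$text',",
                "${indent}    qq{\\n};",
                # Must start in column 1; it numbers the NEXT line.
                '#line ' . ($line_no + 1);
            next LINE;    # only one rule applies per smart comment
        }

        push @out, $line;    # '###' elsewhere on the line: leave it alone
    }
    return join "\n", @out;
}

my $src = join "\n", 'my $x = 1;', '### hello', 'my $y = 2;', '__LINE__;';
print eval(filter_lines($src, 1)), "\n";    # prints 4; "hello" goes to STDERR
```

Three lines are inserted above the probe, yet `__LINE__` still reports 4, its original number, because the directive resets the count after each expansion. Since the loop counter is always at hand, the directive costs one extra generated line per smart comment.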

However, my understanding is that Perl's regex engine is extremely efficient, while explicit iteration in Perl code may not be. There's also the extra work of tearing up the FILTER call and implementing the line-by-line approach.

I'm willing to bite the bullet and do the work if it gets me clean module code, clean source dumps, accurate error reporting, and a faster filter. I'm not sure it's worth it if there's no gain in efficiency, and I'm certainly against paying a big efficiency penalty and doing extra work just for clean-looking code.

Joining with q{ } would make the module code more readable but wouldn't help the dumps much, since everything still lands on one line. Joining with qq{\t} might help the dumps a little, but it's less than ideal.

My fantasy solution has a magic character that produces a newline onscreen but does not increment perl's line counter.

Endorsements? Guesstimates? Alternatives?

Updates:

2010-06-30:

BrowserUk proposes Perl::Tidy for cleaning up dumps. This is an attractive solution but still requires work; it's not immediately clear how to get the dumps piped through perltidy since most smart output is not valid Perl code. Also, this does nothing towards cleaning up module source. I agree it's worth more than one or two thoughts.

Tidying up, though, is only part of the potential gain from splitting up the to-be-filtered code and managing it line-by-line. I'm hoping a Monk could guess whether to expect a significant improvement in performance.

2010-07-01:

I'm starting to feel a strong pull toward refactoring the FILTER call into a line-by-line approach. This would allow me to test whether a construct was needed before generating it. Something about global replacements over a source file possibly thousands of lines long gives me the willies. Does anybody have a gut feeling about this?

If I don't hear yea or nay, I'm headed toward a fork and benchmark. If iterating through source lines is obviously stupid (for some senior-Monk value of 'obviously'), I'd like to avoid that.
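If it does come to a benchmark, a rough harness could look like the sketch below, using the core Benchmark module. The generated source, the fourteen `tagN` rules, and both filter subs are hypothetical stand-ins (the count only echoes the 14 rules mentioned above); real timings will depend on the real rules and on how dense smart comments are in real code, so any result is indicative at best.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: 5,000 lines, one smart comment every 50 lines.
my @lines;
for my $n (1 .. 5000) {
    push @lines, $n % 50
        ? "my \$v$n = $n;"
        : '### tag' . (1 + $n / 50 % 14) . ": v$n";
}
my $source = join "\n", @lines;

# Fourteen toy rules; each turns '### tagN: ...' into a print statement.
my @rules;
for my $n (1 .. 14) {
    push @rules, qr/^([ \t]*)###\s*tag$n:(.*)$/m;
}

# Vanilla style: every rule is a global s/// over the whole source.
sub global_passes {
    my ($code) = @_;
    for my $re (@rules) {
        $code =~ s{$re}{${1}print STDERR q{$2};}g;
    }
    return $code;
}

# Line-by-line: skip lines without '###', apply the first matching rule.
sub line_by_line {
    my ($code) = @_;
    my @out;
    LINE:
    for my $line (split /\n/, $code, -1) {
        if ($line !~ /###/) {
            push @out, $line;
            next LINE;
        }
        RULE:
        for my $re (@rules) {
            last RULE if $line =~ s{$re}{${1}print STDERR q{$2};};
        }
        push @out, $line;
    }
    return join "\n", @out;
}

# Only time them if they agree; otherwise the comparison is meaningless.
die "filters disagree\n" unless global_passes($source) eq line_by_line($source);

cmpthese(-1, {
    global       => sub { global_passes($source) },
    line_by_line => sub { line_by_line($source) },
});
```

Checking that both subs produce identical text before timing them keeps the comparison honest; swapping in the real S::C rules and a real module's source would make the numbers meaningful.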

- the lyf so short, the craft so long to lerne -

In reply to Line Numbers in Filtered Code by Xiong