in reply to AutoMagic HTML

Well thanks, but those posts aren't really what I was hoping for -- perhaps I should abstract the problem a bit more to make it more Monk-y?

I have some replacements to perform on a text file which are best done as line-by-line processing on the file.

I have some more which are best done as applying patterns to a block of text which happens to contain newlines.

I could slurp the file into an array, do the line-based replacements on the array, then join it and do the rest; or I could start by slurping the whole file into one big string and do everything with multi-line regexes. Which is best?



“Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
M-J D

Re: Re: AutoMagic HTML
by Skeeve (Parson) on Jun 24, 2003 at 06:08 UTC
    > which is best?

    Best is what YOU think is best. TMTOWTDI and no one can tell what's the best way if you don't define "best" ;-)
    Is best the code that is

    • fastest
    • shortest
    • least memory consuming
    • clearest to understand
    • most obfuscated
    • ...

    Having said that, I would do it on a line-by-line basis, something like this

    while (<>) {
        # replace italics and bold
        s{^([ib])\s+(.*)}[<$1>$2</$1>];
        # find ordered lists
        if (my $hit = /^\[\s*$/ .. /^\]\s*$/) {
            if ($hit == 1)      { print "<ol>\n";  next; }
            # the range operator appends "E0" to its last sequence number
            if ($hit =~ /e0/i)  { print "</ol>\n"; next; }
            s[^][<li>];
            s[$][</li>];
        }
        print;    # emit the (possibly rewritten) line
    }
    This won't work with cascaded, ordered lists, but it's a starting point.
      > • fastest
      > • shortest
      > • least memory consuming

      Sorry, good point, I meant "fastest", as this rendering is supposed to happen on the fly as HTML is output to the browser.



        Trying it a few different ways and testing it out with Benchmark is the best way to know for sure.
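        A rough sketch of what such a comparison might look like, using the core Benchmark module. The sample data and the sub names by_line and by_slurp are made up for illustration, and only the bold/italic rule from the reply above is timed; real input and real patterns could easily give a different ranking.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input: "b ..." / "i ..." lines mark bold / italic text.
my $text = join '', map { ("b bold $_\n", "plain $_\n", "i italic $_\n") } 1 .. 500;

# Approach 1: split into lines (newlines kept) and substitute on each line.
sub by_line {
    my ($in) = @_;
    my $out = '';
    for my $line (split /^/, $in) {
        $line =~ s{^([ib])[ \t]+(.*)}{<$1>$2</$1>};
        $out .= $line;
    }
    return $out;
}

# Approach 2: one substitution over the whole string.
# /m lets ^ match at the start of every line, /g repeats the match.
sub by_slurp {
    my ($in) = @_;
    (my $out = $in) =~ s{^([ib])[ \t]+(.*)}{<$1>$2</$1>}mg;
    return $out;
}

# Run each approach for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    by_line  => sub { by_line($text) },
    by_slurp => sub { by_slurp($text) },
});
```

        Both subs should produce the same output on this data, so the table only reflects speed. It is worth checking that equality on your own input before trusting the numbers.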

        My intuition is that you'll go faster with smaller "bites", since you'll be slinging less data around, so I would suspect that reading it in a line at a time, then using a special sub or state when you need multiline input, would be fastest. But that's just a guess...
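        To make the "line at a time, plus a bit of state" idea concrete, here is a minimal sketch. The render sub and its $depth counter are my own illustration, not anything from the posts above; the "[" / "]" list markers and the b/i prefixes are borrowed from Skeeve's example. The nested-list HTML it emits is simplistic.

```perl
use strict;
use warnings;

# Render marked-up lines one at a time. The only state carried between
# lines is $depth: how many "[ ... ]" list blocks are currently open.
sub render {
    my @lines = @_;
    my $depth = 0;
    my $out   = '';
    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ /^\[\s*$/) {              # a lone "[" opens a list
            $depth++;
            $out .= "<ol>\n";
            next;
        }
        if ($depth && $text =~ /^\]\s*$/) {    # a lone "]" closes the innermost list
            $depth--;
            $out .= "</ol>\n";
            next;
        }
        $text =~ s{^([ib])\s+(.*)}{<$1>$2</$1>};   # single-line rule: italics and bold
        $text = "<li>$text</li>" if $depth;        # inside a list, wrap as an item
        $out .= "$text\n";
    }
    return $out;
}

print render("b Title\n", "[\n", "first\n", "i second\n", "]\n", "done\n");
```

        Because the counter goes up and down, an inner "[" does not end the outer list early, which is the case the flip-flop version above cannot handle.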