in reply to Re: stripping characters from html
in thread stripping characters from html

I agree with keeping the data as UTF-8, etc.

s/[^\x00-\x7f]+//g;

may be more readable as what I believe is the equivalent POSIX class:

s/[^[:ascii:]]+//g;
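
For anyone who wants to check the equivalence rather than take it on faith, here is a minimal sketch. The helper names and the sample string are made up for illustration, and it only compares the first few hundred code points, so treat it as a sanity check rather than a proof.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub strip_by_range { my ($s) = @_; $s =~ s/[^\x00-\x7f]+//g; return $s; }
sub strip_by_class { my ($s) = @_; $s =~ s/[^[:ascii:]]+//g; return $s; }

# Made-up sample: ASCII text with a few non-ASCII characters mixed in.
my $sample = "caf\x{e9} \x{2014} na\x{ef}ve \x{263a} text";

# Count the code points on which the two negated classes disagree.
my $mismatches = grep {
    my $c = chr $_;
    ($c =~ /[^\x00-\x7f]/ ? 1 : 0) != ($c =~ /[^[:ascii:]]/ ? 1 : 0)
} 0 .. 0x2FF;

print "range: ", strip_by_range($sample), "\n";
print "class: ", strip_by_class($sample), "\n";
print "code points that disagree: $mismatches\n";
```

On an ASCII-based platform I would expect both substitutions to give the same string and the disagreement count to be zero.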

Re^3: stripping characters from html
by graff (Chancellor) on Aug 04, 2010 at 02:55 UTC
    ... may be more readable as (what I believe is the equivalent POSIX class) ...

    Right -- and I totally agree (and yes I'm pretty sure the POSIX expression is equivalent). But "more readable" can be different things to different people; e.g. a specific numeric range can lead to less uncertainty or doubt, compared to having to recall the exact syntax and meaning of an expression consisting of extra punctuation around a term that tends to be misused or misunderstood by less experienced programmers...

      In this case your argument seems to better support a "more writeable" thesis than the "more readable" thesis you seem to be propounding. I find the POSIX version much more readable than the character range alternative, although I'd be very unlikely to write the POSIX version for exactly the "recall the exact syntax" issue you mention (combined of course with innate laziness).

      True laziness is hard work
      I agree with Grandfather on this one. While the first version may accurately specify the range, without an additional comment it does not convey the purpose of the range. The second version has the advantage of advertising that purpose.
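
If the numeric range is preferred anyway, one way to supply that missing comment is the /x modifier, which allows a comment inside the pattern itself. This is only a sketch; the helper name and the sample file name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the numeric range, with its purpose stated in a comment.
sub strip_non_ascii {
    my ($text) = @_;
    $text =~ s{
        [^\x00-\x7f]+    # any run of characters outside 7-bit ASCII
    }{}gx;
    return $text;
}

print strip_non_ascii("r\x{e9}sum\x{e9}.html"), "\n";
```

The braces are used as delimiters so the pattern can span several lines; the comment text must not contain a closing brace, or it would end the pattern early.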