in reply to RE: form parsing, hex, HTML formatting
in thread form parsing, hex, HTML formatting

Yes, well, the regex actually works.
#!/usr/bin/perl -w

my $str = "the %0A quick %2b brown %63 fox %63 jumped %24 over %7e the + %4f%45 lazy %25 dog";

print "MYWAY: ", myway(), "\nREGEXPWAY: ", regexpway(), "\n";

sub myway {
    $_ = $str;
    while (/%[0-9A-Fa-f]{2}/) {
        $_ = $&;
        /[0-9A-Fa-f]{2}/;
        $ob[0] = hex($&);
        $temp = pack("C*", @ob);
        $name =~ s/%[0-9A-Fa-f]{2}/$temp/;
        $_ = $name;
    }
    return $_;
}

sub regexpway {
    $_ = $str;
    s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;
    return $_;
}
__END__

chh@scallop test> perl unescape
Use of uninitialized value in substitution (s///) at unescape line 14.
Use of uninitialized value in pattern match (m//) at unescape line 9.
Use of uninitialized value in print at unescape line 5.
MYWAY:
REGEXPWAY: the
 quick + brown c fox c jumped $ over ~ the OE lazy % dog
chh@scallop test>

RE: RE: RE: form parsing, hex, HTML formatting
by muppetBoy (Pilgrim) on May 22, 2000 at 12:30 UTC
    I'm not entirely sure what is going on here.
    Although it's a particularly messy piece of code, it does actually work! I was concerned by the results you got, so I re-ran the code myself and was unable to get the same results.
    Because I am lazy I left the -w and use strict out of the code I posted (in my actual code everything is a little more strict). That could have something to do with the effect you have seen, although with -w I do not get any errors and it works OK. Try this:
    #!/usr/local/bin/perl -w
    use strict;

    my $desc = "I %2Blike %3A cheese";
    my ($str, @ob);

    print "TEST: ",test(),"\n";

    sub test {
        $_ = $desc;
        while (/%[0-9A-Za-z]{2}/) {
            $_ = $&;
            /[0-9A-Za-z]{2}/;
            $ob[0] = hex($&);
            $str = pack("C*", @ob);
            $desc =~ s/%[0-9A-Za-z]{2}/$str/;
            $_ = $desc;
        }
        return $desc;
    }

    returns:
    TEST: I +like : cheese
    This definitely works OK for me.
    UPDATE: My apologies, I've just noticed what was wrong with the code that I originally posted. $name is never initialised, so the loop runs through one iteration and fails. As I didn't want to actually change $str I changed the code a little, but forgot to check it properly. This should work:
    ...
    'myway' => sub {
        $_ = $str;
        $name = $str;
        while (/%[0-9A-Fa-f]{2}/) {
            $_ = $&;
            ...
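For anyone following along, here is a minimal, self-contained sketch of the loop approach with the working copy initialised before the loop. It is not the exact code from the update above: the variable `$char` and the use of a capture group in place of `$&` are assumptions made so that it runs cleanly under `use strict` and `use warnings`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "I %2Blike %3A cheese";

# Loop-based decoder: the working copy is set up *before* the loop,
# which is the step that was missing in the version that printed nothing.
sub myway {
    my $name = $str;                          # working copy; $str stays untouched
    while ($name =~ /%([0-9A-Fa-f]{2})/) {    # capture the hex digits of the next escape
        my $char = pack("C", hex($1));        # two hex digits -> one byte
        $name =~ s/%[0-9A-Fa-f]{2}/$char/;    # replace the first remaining escape
    }
    return $name;
}

print "MYWAY: ", myway(), "\n";
```

One caveat of rescanning from the start on every pass: if an escape decodes to a literal `%` that happens to be followed by two hex digits, the loop will decode it a second time. The single-pass `s///eg` version does not have that problem.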
    I guess the lessons learnt are:
    • Always use -w and use strict
    • Check the code works before you post it
    • Write clear, maintainable code
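To illustrate the first lesson, the sketch below shows how `use strict` reports the undeclared `$name` at compile time instead of letting the script run and print nothing. The snippet is compiled through a string `eval` purely so the error can be captured and printed; the names `$snippet` and `$strict_error` are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A cut-down version of the mistake: $name is used but never declared.
my $snippet = <<'END_CODE';
use strict;
my $str = "a %2B b";
$name =~ s/%2B/+/;
END_CODE

# String eval compiles the snippet; under strict the compile fails
# and the message is left in $@ rather than killing this script.
my $ok           = eval "$snippet; 1";
my $strict_error = $ok ? '' : $@;

print "strict says: $strict_error";
```

The message names the offending variable (`Global symbol "$name" requires explicit package name`), which points straight at the line that -w could only hint at with an "uninitialized value" warning at run time.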
    BTW the new benchmark timings are:
    Benchmark: timing 500000 iterations of myway, regexpway...
        myway: 162 wallclock secs (162.07 usr + 0.00 sys = 162.07 CPU)
        regexpway: 58 wallclock secs (58.52 usr + 0.01 sys = 58.53 CPU)
    which make a lot more sense.
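A comparison like this can be reproduced with the core Benchmark module. The following is only a sketch: the harness that produced the timings above was not posted in full, so the layout, the iteration count, and the sanity check are assumptions. The loop version here is the corrected one, with its working copy initialised.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $str = "the %0A quick %2b brown %63 fox %63 jumped %24 over %7e "
        . "the + %4f%45 lazy %25 dog";

# Loop version: one match plus one substitution per escape.
sub myway {
    my $name = $str;
    while ($name =~ /%([0-9A-Fa-f]{2})/) {
        my $char = pack("C", hex($1));
        $name =~ s/%[0-9A-Fa-f]{2}/$char/;
    }
    return $name;
}

# Single global substitution with an evaluated replacement.
sub regexpway {
    (my $copy = $str) =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;
    return $copy;
}

# Timings only mean something if both versions give the same answer.
die "results differ\n" unless myway() eq regexpway();

# Iteration count kept small so the sketch finishes quickly; raise it
# for steadier numbers.
my $results = timethese(10_000, {
    myway     => \&myway,
    regexpway => \&regexpway,
});
cmpthese($results);
```

Checking that the two subs agree before timing them would have caught the original problem, where the broken version looked fast only because it gave up after one iteration.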
    I now feel older, wiser and more than a little bit stupid.
      I figured you had a version that actually did work. FWIW, this is the only thing I could come up with that is as fast as the substitution. Unfortunately, it slows down when you put in the check for legit hex values (and it is much less readable).
      sub noregex {
          local $_ = $str;
          my $pos = 0;
          while ( (my $idx = index($_, '%', $pos)) > -1) {
              $pos = $idx + 1;
              my $code = substr($_, $pos, 2);
              substr $_, $idx, 3, pack("C*", hex $code);
          }
          return $_;
      }
      __END__

      Benchmark: timing 100000 iterations of NO_REGEX, REGEX...
        NO_REGEX: 8 wallclock secs ( 8.77 usr + 0.00 sys = 8.77 CPU) @ 11402.51/s (n=100000)
           REGEX: 9 wallclock secs ( 9.15 usr + 0.00 sys = 9.15 CPU) @ 10928.96/s (n=100000)
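For reference, one way the check for legitimate hex values might look is sketched below. This is an assumption about what such a check could be, not the variant that was actually timed: the name `noregex_checked`, the argument passing, and the use of `tr` to count hex digits are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# index/substr decoder that leaves malformed escapes untouched.
sub noregex_checked {
    my ($in) = @_;
    my $out  = $in;
    my $pos  = 0;
    while ((my $idx = index($out, '%', $pos)) > -1) {
        $pos = $idx + 1;                      # resume after this '%'
        my $code = substr($out, $pos, 2);     # may be shorter than 2 at end of string
        # tr/// with an empty replacement only counts matching characters,
        # so this accepts exactly two hex digits without using a regex.
        next unless length($code) == 2
                 && ($code =~ tr/0-9A-Fa-f//) == 2;
        substr($out, $idx, 3, pack("C", hex $code));
    }
    return $out;
}

print noregex_checked("I %2Blike %3A cheese"), "\n";
print noregex_checked("100% %zz sure"), "\n";
```

Because `$pos` always moves past the character just written, a decoded `%` is never rescanned, so `%2541` comes out as `%41` rather than `A`. The extra `length` and `tr` calls on every escape are the kind of overhead that eats into the speed advantage mentioned above.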