Re: Hex regex fails in subroutine

What encoding do you use to save the file? When I save it as Windows-1252 and change it to actually do something, the string gets replaced:

La Conner&#8230;.Shelter Bay Fee Simple

Here's how the important bytes are represented:

00000000: 2321 2f75 7372 2f62 696e 2f70 6572 6c0a  #!/usr/bin/perl.
00000010: 7573 6520 7374 7269 6374 3b0a 7573 6520  use strict;.use 
00000020: 7761 726e 696e 6773 3b0a 0a0a 6d79 2024  warnings;...my $
00000030: 636c 6d6e 4e6d 203d 206d 7920 2463 6c6d  clmnNm = my $clm
00000040: 6e56 616c 203d 2027 4c61 2043 6f6e 6e65  nVal = 'La Conne
00000050: 72e2 80a6 2e53 6865 6c74 6572 2042 6179  r....Shelter Bay
00000060: 2046 6565 2053 696d 706c 6527 3b0a 6d79   Fee Simple';.my
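To read the dump: the three bytes `e2 80 a6` at offset 0x51 are what sits between "La Conner" and ".Shelter". A minimal sketch, assuming those bytes are meant as UTF-8 (the variable names are made up for illustration), shows which character they decode to:

```perl
use strict;
use warnings;

# The three bytes that follow "La Conner" at offset 0x51 in the dump.
my $bytes = "\xE2\x80\xA6";

# utf8::decode works in place and returns true if the bytes were valid UTF-8.
my $chars = $bytes;
my $ok    = utf8::decode($chars);

printf "valid UTF-8: %s, length: %d, code point: U+%04X (decimal %d)\n",
    ($ok ? 'yes' : 'no'), length($chars), ord($chars), ord($chars);
# → valid UTF-8: yes, length: 1, code point: U+2026 (decimal 8230)
```

U+2026 is the horizontal ellipsis, and its decimal value 8230 is the number that appears in the `&#8230;` entity in the replaced string.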

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^2: Hex regex fails in subroutine by Version7 (Novice) on Sep 30, 2023 at 00:34 UTC
It's UTF-8. The data I receive comes from a REST API, so the CP1252 characters are UTF-8 encoded. This data is going into a database that will be used for the web with UTF-8 encoding, so I need to replace these characters accordingly.
Re^3: Hex regex fails in subroutine by NERDVANA (Priest) on Sep 30, 2023 at 07:16 UTC
If the data is always UTF-8 encoded, it might save some effort to decode it first and then look for the problem characters. BTW, you can do all this in a single pass, if performance matters:
`sub convert_to_html_entities { my $str= shift; utf8::decode($str); $str =~ s/[\x{201A}-\x{2122}]/ '&#'.ord($&).';' /ger; }`
You could even just wholesale replace all non-ASCII characters to completely sidestep the encoding problem:
`sub convert_nonascii_to_html_entities { my $str= shift; utf8::decode($str); $str =~ s/[^\x20-\x7E]/ '&#'.ord($&).';' /ger; }`
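As a rough check of the two functions above, the sketch below lays them out over several lines and feeds them sample input. The input literals and the `$raw`/`$html` names are assumptions for illustration; `$raw` is built from the UTF-8 bytes shown in the hex dump earlier in the thread. The `/r` modifier needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Same logic as the one-liners in the post, only reformatted.
sub convert_to_html_entities {
    my $str = shift;
    utf8::decode($str);    # UTF-8 bytes -> characters, in place
    return $str =~ s/[\x{201A}-\x{2122}]/ '&#'.ord($&).';' /ger;
}

sub convert_nonascii_to_html_entities {
    my $str = shift;
    utf8::decode($str);
    # Anything outside printable ASCII (space through tilde) becomes an entity.
    return $str =~ s/[^\x20-\x7E]/ '&#'.ord($&).';' /ger;
}

# Hypothetical input: raw UTF-8 bytes, as they might arrive from the API.
my $raw  = "La Conner\xE2\x80\xA6.Shelter Bay Fee Simple";
my $html = convert_to_html_entities($raw);
print "$html\n";
# → La Conner&#8230;.Shelter Bay Fee Simple

# The second variant also rewrites control characters such as tab.
print convert_nonascii_to_html_entities("a\tb"), "\n";
# → a&#9;b
```

Two things to keep in mind: the range in the first function starts at U+201A, so characters such as the en dash (U+2013) or the curly single quotes (U+2018, U+2019) fall outside it, and the second function turns tabs and newlines into entities as well, since they are not in `\x20-\x7E`. Widen or narrow the character classes if that is not what you want.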
Re^4: Hex regex fails in subroutine by AnomalousMonk (Archbishop) on Sep 30, 2023 at 13:40 UTC
See also haukex's article on dynamic regex alternations.

Give a man a fish: `<%-{-{-{-<`
Re^5: Hex regex fails in subroutine by NERDVANA (Priest) on Sep 30, 2023 at 22:50 UTC