in reply to Is there a way to make these two regex lines cleaner?
You could replace the first line with a tr///cd, which should be a bit faster. The second line is the usual way to trim whitespace from a string in Perl, so it's fine the way it is.
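As a rough sketch of what that could look like: the two regexes from the question are not quoted in this post, so the code below assumes the first one deleted every non-ASCII character and the second one was the common trim idiom. The helper name clean_line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper. ASSUMPTION: the first regex in the question removed
# all non-ASCII characters; the original code is not shown in this post.
sub clean_line {
    my ($line) = @_;

    # /c complements the search list and /d deletes what matches, so this
    # deletes every character outside the range 0x00-0x7F.
    $line =~ tr/\x00-\x7F//cd;

    # Common trim idiom: strip leading and trailing whitespace.
    $line =~ s/^\s+|\s+$//g;

    return $line;
}

print clean_line("  caf\x{e9} au lait \n"), "\n";
```

Note that this simply drops the non-ASCII character, so "café" becomes "caf".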
However, "ï»¿" is what the UTF-8 byte order mark (BOM) looks like when the file is encoded in UTF-8 but was opened with the wrong encoding: the three bytes EF BB BF get displayed as three Latin-1 characters. So instead of that first regex, you probably want to open the file with open my $fh, '<:raw:encoding(UTF-8)', $filename or die "$filename: $!"; and then do a $line =~ s/\A\N{U+FEFF}//; on the first line of the file. This has the major advantage that any other UTF-8 encoded characters in the file will be decoded correctly. You won't get "strange characters"; you'll get the correct Unicode characters, assuming there are no other encoding issues. This really is the correct way to solve the issue. If you then still want to turn the text into ASCII-only, see e.g. Text::Unidecode.
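Here is a minimal, self-contained sketch of that approach. The sample file contents, the temporary file, and the @lines array are made up for the demonstration; only the open call and the BOM-stripping substitution come from the advice above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo setup (an assumption for illustration): write a small UTF-8 file
# that starts with a BOM, as raw bytes.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "\xEF\xBB\xBF", "caf\xC3\xA9\n", "second line\n";
close $tmp or die "$filename: $!";

# Decode the file as UTF-8 while reading.
open my $fh, '<:raw:encoding(UTF-8)', $filename or die "$filename: $!";

my @lines;
while ( my $line = <$fh> ) {
    # After decoding, the BOM is the single character U+FEFF.
    # It can only appear at the very start of the first line.
    $line =~ s/\A\N{U+FEFF}// if $. == 1;
    chomp $line;
    push @lines, $line;
}
close $fh or die "$filename: $!";

# "café" is now four characters, not five bytes.
printf "%d lines, first line has %d characters\n",
    scalar @lines, length $lines[0];
```

If you print the decoded strings, remember to put an encoding layer on the output handle as well, e.g. binmode STDOUT, ':encoding(UTF-8)';.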
Updated: A few edits for clarification. Also: If you have further issues with encoding, I have some brief advice on what to post to get the best answers here.