I'm maintaining a web page that organizes a list of items into ranges based on first letter: 0-9, A-E, F-H, etc.
The existing code makes no provision for non-ASCII characters and silently skips any item that does not match the current character class, e.g. m/^[A-Fa-f]/. The expected input will be Latin-1, but it would be nice to have a place for other characters if they come up.
After reading this thread and googling, the best option seems to be Text::Unidecode, which "converts" Unicode to ASCII so that the existing ASCII regexes keep working. This has the advantage of being quick and simple, and of ensuring that all items fall under some category.
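For comparison, here is a minimal sketch of a related approach that uses only core modules. It is not Text::Unidecode: it decomposes each string with Unicode::Normalize, strips the combining marks, and sends anything that still fails to match to a catch-all bucket. The bucket layout beyond 0-9, A-E and F-H, the 'Other' label, and the name bucket_for are assumptions made for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical bucket layout; only 0-9, A-E and F-H come from the question.
my @buckets = (
    [ '0-9' => qr/^[0-9]/  ],
    [ 'A-E' => qr/^[A-E]/i ],
    [ 'F-H' => qr/^[F-H]/i ],
    [ 'I-Z' => qr/^[I-Z]/i ],
);

# Return the bucket label for an already-decoded (character) string.
sub bucket_for {
    my ($item) = @_;

    # Decompose accented letters into base letter + combining marks,
    # then drop the marks, so E-acute is matched as a plain "E".
    my $folded = NFD($item);
    $folded =~ s/\p{Mn}+//g;

    for my $bucket (@buckets) {
        my ($label, $re) = @$bucket;
        return $label if $folded =~ $re;
    }

    # Nothing matched: give the item a home instead of silently skipping it.
    return 'Other';
}

# Example calls (escapes keep the source file pure ASCII):
#   bucket_for("\x{C9}clair")   # E-acute folds to "E"
#   bucket_for("\x{4E2D}")      # a CJK character has no Latin base letter
```

Letters with no canonical decomposition (for example Æ or ß) end up in 'Other' here, whereas Text::Unidecode would transliterate them to ASCII, so the two approaches are not interchangeable.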
But this seems like a common problem, so how have others approached it?
tia, qq
Update: added regex snippet for clarity and fixed typos.