I believe Modern Perl should have a module that can easily parse these simple Unicode CSV records. It should handle them in any character encoding scheme of Unicode: UTF-8, UTF-16, or UTF-32. And it should handle the Unicode byte order mark seamlessly.
Why not?
🎥Film🎥🎬🎥Year🎥🎬🎥Awards🎥🎬🎥Nominations🎥🎬🎥Director🎥
🎥12 Years a Slave🎥🎬2013🎬3🎬9🎬🎥🎥🎥 Steve McQueen🎥
🎥Argo🎥🎬2012🎬3🎬7🎬🎥🎥🎥 Ben Affleck🎥
🎥The Artist🎥🎬2012🎬5🎬10🎬🎥🎥🎥 Michel Hazanavicius🎥
🎥The King's Speech🎥🎬2010🎬4🎬12🎬🎥🎥🎥 Tom Hooper🎥
🎥The Hurt Locker🎥🎬2009🎬6🎬9🎬🎥🎥🎥 Kathryn Bigelow🎥
🎥Slumdog Millionaire🎥🎬2008🎬8🎬10🎬🎥🎥🎥 Danny Boyle🎥
🎥No Country for Old Men🎥🎬2007🎬4🎬8🎬🎥🎥🎥 Joel Coen 🎥🎥 Ethan Coen🎥
🎥The Departed🎥🎬2006🎬4🎬5🎬🎥🎥🎥 Martin Scorsese🎥

sep_char    🎬 U+1F3AC CLAPPER BOARD (UTF-8: F0 9F 8E AC)
quote_char  🎥 U+1F3A5 MOVIE CAMERA (UTF-8: F0 9F 8E A5)
escape_char 🎥 U+1F3A5 MOVIE CAMERA (UTF-8: F0 9F 8E A5)

"Film","Year","Awards","Nominations","Director"
"12 Years a Slave",2013,3,9,"🎥 Steve McQueen"
"Argo",2012,3,7,"🎥 Ben Affleck"
"The Artist",2012,5,10,"🎥 Michel Hazanavicius"
"The King's Speech",2010,4,12,"🎥 Tom Hooper"
"The Hurt Locker",2009,6,9,"🎥 Kathryn Bigelow"
"Slumdog Millionaire",2008,8,10,"🎥 Danny Boyle"
"No Country for Old Men",2007,4,8,"🎥 Joel Coen 🎥 Ethan Coen"
"The Departed",2006,4,5,"🎥 Martin Scorsese"
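To make the byte order mark requirement concrete, here is a minimal sketch of BOM sniffing with the core Encode module. The helper name decode_with_bom is hypothetical, and falling back to UTF-8 when no BOM is present is my own assumption, not something any existing CSV module promises.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: sniff a byte order mark, strip it, and decode
# the remaining octets into a Perl character string.
sub decode_with_bom {
    my ($bytes) = @_;

    # Longer marks come first, because the UTF-32LE BOM (FF FE 00 00)
    # begins with the UTF-16LE BOM (FF FE). That also means UTF-16LE
    # text whose first character is U+0000 would be misread as UTF-32LE.
    my @boms = (
        [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
        [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
        [ "\xEF\xBB\xBF",     'UTF-8'    ],
        [ "\xFE\xFF",         'UTF-16BE' ],
        [ "\xFF\xFE",         'UTF-16LE' ],
    );

    for my $bom (@boms) {
        my ($mark, $encoding) = @$bom;
        if (substr($bytes, 0, length $mark) eq $mark) {
            my $body = substr($bytes, length $mark);
            return decode($encoding, $body);
        }
    }

    # Assumption: input without a BOM is treated as UTF-8.
    return decode('UTF-8', $bytes);
}
```

Once the octets are decoded, 🎬 and 🎥 are each a single character, so the encoding scheme no longer matters to whatever does the actual CSV parsing.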
I recognize that the current XS module for parsing CSV records, Text::CSV_XS (marvelously maintained by Tux), may not be the right basis for a new, fully Unicode-capable module. But because Perl's native Unicode capabilities exceed those of most other programming languages, Perl should have a proper FSM-based Unicode CSV parser, even if it's pure Perl and not XS.
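To show how little machinery this needs once the input is a decoded character string, here is a rough pure-Perl sketch of a four-state FSM for a single record. The function name parse_record is hypothetical. It borrows the sep_char and quote_char option names from the list above, assumes the escape character is the same as the quote character (as in my example), and does not handle line breaks inside quoted fields.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse one already-decoded, already-chomped CSV
# record. Assumes escape_char is the same character as quote_char, so a
# doubled quote inside a quoted field stands for one literal quote.
sub parse_record {
    my ($line, %opt) = @_;
    my $sep   = $opt{sep_char}   // ',';
    my $quote = $opt{quote_char} // '"';

    my @fields;
    my $field = '';
    my $state = 'FIELD_START';

    for my $ch (split //, $line) {
        if ($state eq 'FIELD_START') {
            if    ($ch eq $quote) { $state = 'QUOTED' }
            elsif ($ch eq $sep)   { push @fields, '' }    # empty field
            else                  { $field = $ch; $state = 'UNQUOTED' }
        }
        elsif ($state eq 'UNQUOTED') {
            if ($ch eq $sep) { push @fields, $field; $field = ''; $state = 'FIELD_START' }
            else             { $field .= $ch }
        }
        elsif ($state eq 'QUOTED') {
            if ($ch eq $quote) { $state = 'QUOTE_SEEN' }
            else               { $field .= $ch }
        }
        else {    # QUOTE_SEEN: the quote either closed the field or escapes another quote
            if    ($ch eq $quote) { $field .= $quote; $state = 'QUOTED' }
            elsif ($ch eq $sep)   { push @fields, $field; $field = ''; $state = 'FIELD_START' }
            else                  { die "Unexpected character after closing quote\n" }
        }
    }

    die "Unterminated quoted field\n" if $state eq 'QUOTED';
    push @fields, $field;    # the last field has no trailing separator
    return @fields;
}

# Usage: the "12 Years a Slave" record, with code point escapes so the
# source file itself needs no "use utf8".
my ($sep, $quote) = ("\x{1F3AC}", "\x{1F3A5}");
my $record = join $sep,
    "${quote}12 Years a Slave${quote}", 2013, 3, 9,
    "${quote}${quote}${quote} Steve McQueen${quote}";
my @fields = parse_record($record, sep_char => $sep, quote_char => $quote);
```

Because every comparison is a plain string eq on single characters, nothing in the state machine cares whether the separator is a comma or U+1F3AC.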
I long ago accepted that Unicode conformance and comparative slowness go hand in hand 👫. So what? Look what you're trading a few seconds here and there for: the technological foundation of World Peace ☮ and Universal Love 💕.
UPDATE: Removed references to core module. I don't care about that. I just want a Unicode-capable Perl CSV module.
In reply to Re^4: Speeds vs functionality
by Jim
in thread Speeds vs functionality
by Tux