in reply to Re: uparse - Parse Unicode strings
in thread uparse - Parse Unicode strings

Wow, very impressive! ... agree with kcott that it deserves its own CUFP page.

I played briefly with your command on Ubuntu using perl v5.38:

~/pm/Tux$ perl -CEO -wE'say "\x{1F468}\x{1F3FD}\x{200D}\x{2708}\x{FE0F}"'
👨🏽‍✈️

~/pm/Tux$ echo -e '\U1F468\U1F3FD\U200D\U2708\UFE0F'
👨🏽‍✈️

AFAICT, the output from the perl -CEO and the bash echo -e commands above is identical, namely:

👨🏽‍✈️

Running this command produced useful output (that seems to match yours), despite the error messages:

~/pm/Tux$ echo -e '\U1F468\U1F3FD\U200D\U2708\UFE0F' | xargs uchar -v
Can't exec "locate": No such file or directory at ~/pm/Tux/uchar line 103.
👨 U1f468 \N{MAN}
🏽 U1f3fd \N{EMOJI MODIFIER FITZPATRICK TYPE-4}
‍ U0200d \N{ZERO WIDTH JOINER}
✈ U02708 \N{AIRPLANE}
️ U0fe0f \N{VARIATION SELECTOR-16}

Using CODE blocks instead of pre:

~/pm/Tux$ echo -e '\U1F468\U1F3FD\U200D\U2708\UFE0F' | xargs uchar -v
Can't exec "locate": No such file or directory at ~/pm/Tux/uchar line 103.
👨 U1f468 \N{MAN}
🏽 U1f3fd \N{EMOJI MODIFIER FITZPATRICK TYPE-4}
‍ U0200d \N{ZERO WIDTH JOINER}
✈ U02708 \N{AIRPLANE}
️ U0fe0f \N{VARIATION SELECTOR-16}
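
For anyone curious how one visible glyph turns into five lines of output, here is a rough sketch that splits a UTF-8 byte stream into code points using only od and shell arithmetic. It is not how uchar works internally (I have not checked), and codepoints is a made-up helper name. It assumes well-formed UTF-8 input and does not look up character names.

```shell
#!/bin/sh
# codepoints: read UTF-8 bytes on stdin, print one "U+XXXX" line per code point.
# Hypothetical helper, not part of uchar; assumes well-formed UTF-8 input.
codepoints() {
    code=0
    need=0
    # od emits each input byte as an unsigned decimal number.
    for byte in $(od -An -v -tu1); do
        if [ "$byte" -lt 128 ]; then        # 0xxxxxxx: single-byte ASCII
            code=$byte; need=0
        elif [ "$byte" -ge 240 ]; then      # 11110xxx: lead byte, 3 more follow
            code=$(( byte & 7 )); need=3
        elif [ "$byte" -ge 224 ]; then      # 1110xxxx: lead byte, 2 more follow
            code=$(( byte & 15 )); need=2
        elif [ "$byte" -ge 192 ]; then      # 110xxxxx: lead byte, 1 more follows
            code=$(( byte & 31 )); need=1
        else                                # 10xxxxxx: continuation byte
            code=$(( (code << 6) | (byte & 63) )); need=$(( need - 1 ))
        fi
        # A code point is complete once no continuation bytes are pending.
        if [ "$need" -eq 0 ]; then
            printf 'U+%04X\n' "$code"
        fi
    done
}

# The pilot emoji from above, written as UTF-8 bytes in octal so the demo
# does not depend on the shell's \U support or on the current locale.
printf '\360\237\221\250\360\237\217\275\342\200\215\342\234\210\357\270\217' | codepoints
```

The demo line should print U+1F468, U+1F3FD, U+200D, U+2708 and U+FE0F, one per line, which are the same five code points uchar -v reports.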

👁️🍾👍🦟

Re^3: uparse - Parse Unicode strings
by Tux (Canon) on Nov 20, 2023 at 08:54 UTC

    Fetch again. Now guarded. /me wonders how people work on a devel machine without mlocate :)


    Enjoy, Have FUN! H.Merijn

      > /me wonders how people work on a devel machine without mlocate :)

      Let me try to explain why I'd never heard of mlocate. :)

      We run the identical version of Perl with an identical set of CPAN modules on our many different Unix boxes (multiple versions of: AIX, HP-UX, Solaris, Red Hat Enterprise Linux (RHEL), Digital UNIX, Tru64 UNIX, IRIX, UnixWare, SCO Unix, ...).

      -- from Re: putting perl and modules in your source code repository

      When your typical work day for over twenty years has been spread across Windows boxes and many different Unix flavours, you naturally lean towards standard POSIX commands (such as find and xargs), rather than system-specific ones (such as locate/mlocate), because you know they're available out-of-the-box everywhere.

      Better, as indicated at Unix shell versus Perl, is to avoid a motley mix of Unix shell and Windows batch scripts by writing everything in Perl ("It's easier to port a shell than a shell script").

      If I had a job where I spent most of my day on a Linux development machine, it would make sense to invest considerable time in mastering Linux-specific dev tools (interested to learn BTW if you get to spend most of your work day beavering away on a Linux dev box).

      Now that I know about mlocate I might get around to installing it at home on my Ubuntu VM - more likely if you, or some other kind Perl monk, sold me with some examples of how it makes development more enjoyable. :)

      Updated: minor changes to wording.

      👁️🍾👍🦟

        locate (aka mlocate, aka slocate) is a lifesaver. I used it a couple of days ago (admittedly not on a dev box) to solve a very simple problem. Some user of a client's server had been uploading files to a website from a Mac and had managed to litter the filesystem with .DS_Store files. This is a BIG filesystem we're talking about, and they were using far more than a small fraction of it. My task was to clean up this mess. I could have constructed a find command, but coupling that with rm is always dicey, especially on a production box. The slow user would run the find twice: once to check and once to delete. The faster user would run the find once, store the output in a file, eyeball the file, and then feed the file to xargs to do the deletion.

        The fastest user employs locate instead of find. It produced the output in under a second without hammering the disks and I was able to run it once to check the pattern:

        $ locate /.DS_Store

        and then again (because it's cheap) to clear the files:

        $ locate /.DS_Store | xargs rm

        Having eyeballed the output from the first command I could see that only the requisite files were matched and none of the paths had spaces or anything else that might trip up the pipeline.
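
If the paths had contained spaces or newlines, a NUL-delimited version of the same check-then-delete routine would be safer. Below is a minimal sketch using find, since that is the fallback on boxes without locate. The helper names list_matches, show_listed and delete_listed are made up for illustration, and it assumes a find with -print0 and an xargs with -0 (the GNU and BSD versions have both). mlocate's locate has a -0 option that can feed xargs -0 the same way.

```shell
#!/bin/sh
# Two-step cleanup sketch; list_matches, show_listed and delete_listed are
# hypothetical helper names. Assumes find supports -print0 and xargs
# supports -0 (GNU and BSD versions do).

# Step 1: record every .DS_Store file under $1 in file $2, NUL-delimited so
# that spaces or newlines in a path cannot split an entry.
list_matches() {
    find "$1" -type f -name .DS_Store -print0 > "$2"
}

# Step 2: print the recorded list one path per line for eyeballing.
show_listed() {
    tr '\0' '\n' < "$1"
}

# Step 3: delete exactly the recorded paths; the tree is not scanned again.
# "--" stops a path that starts with "-" from being read as an option.
delete_listed() {
    xargs -0 rm -f -- < "$1"
}

# Example session (the paths are placeholders):
#   list_matches /srv/www /tmp/ds_store.list
#   show_listed /tmp/ds_store.list      # review before going further
#   delete_listed /tmp/ds_store.list
```

Keeping the listing and the deletion as separate steps means that what gets removed is exactly what was reviewed, even if files appear or vanish in between.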

        Using it locally on my dev box is equally useful. We may have variously patched versions of code kicking around in several locations and these might require updates. Trivial to search with locate and more flexible than find because the -r option allows for regex-based matching which is so much simpler than trying to wrestle globs for complex patterns.

        Occasionally I have to admin servers where the person who set them up hasn't installed locate and it doesn't take long before I'm swearing at them, installing it and running updatedb just so I can get on with my work. It's an invaluable tool and if it isn't present/available on MSWin32 then that's (yet another) black mark against that particular OS.


        🦛

        That long list is not too dissimilar from what I worked on during various contracts in the past: remove a few, but add in others that are probably now only of interest to museum curators, such as Primix and DG-UX.

        Like you, I tended to focus on having a very sound knowledge of general commands available everywhere; mostly ignoring those that were only useful on some specific systems.

        — Ken

      Thanks! Your new version is working nicely for me now.

      BTW, I found by experimenting that it seems to work fine for my simple needs even without xargs:

      ~/pm/Tux$ echo -e '\U1F468\U1F3FD\U200D\U2708\UFE0F' | xargs ./uchar -v
      👨 U1f468 \N{MAN}
      🏽 U1f3fd \N{EMOJI MODIFIER FITZPATRICK TYPE-4}
      ‍ U0200d \N{ZERO WIDTH JOINER}
      ✈ U02708 \N{AIRPLANE}
      ️ U0fe0f \N{VARIATION SELECTOR-16}
      
      ~/pm/Tux$ ./uchar -v '\U1F468\U1F3FD\U200D\U2708\UFE0F'
      👨 U1f468 \N{MAN}
      🏽 U1f3fd \N{EMOJI MODIFIER FITZPATRICK TYPE-4}
      ‍ U0200d \N{ZERO WIDTH JOINER}
      ✈ U02708 \N{AIRPLANE}
      ️ U0fe0f \N{VARIATION SELECTOR-16}
      

      Update: Note that \U1F3FD (🏽) is EMOJI MODIFIER FITZPATRICK TYPE-4, the skin color modifier character for skin type 4 on the Fitzpatrick scale, used above to change the skin color of the airline pilot. Discipulus also used it to change the skin color of the man student at Re: Emojis for Perl Monk names (Discipulus and SpaceCowboy and LanX).

      👁️🍾👍🦟