
Fetch again. Now guarded. /me wonders how people work on a devel machine without mlocate :)


Enjoy, Have FUN! H.Merijn

Re^4: uparse - Parse Unicode strings (locate/find/xargs)
by eyepopslikeamosquito (Archbishop) on Dec 02, 2023 at 10:07 UTC

    > /me wonders how people work on a devel machine without mlocate :)

    Let me try to explain why I'd never heard of mlocate. :)

    We run the identical version of Perl with an identical set of CPAN modules on our many different Unix boxes (multiple versions of: AIX, HP-UX, Solaris, Red Hat Enterprise Linux (RHEL), Digital UNIX, Tru64 UNIX, IRIX, UnixWare, SCO Unix, ...).

    -- from Re: putting perl and modules in your source code repository

    When your typical work day for over twenty years has been spread across Windows boxes and many different Unix flavours, you naturally lean towards standard POSIX commands (such as find and xargs), rather than system-specific ones (such as locate/mlocate), because you know they're available out-of-the-box everywhere.

    Better, as indicated at Unix shell versus Perl, is to avoid a motley mix of Unix shell and Windows batch scripts by writing everything in Perl ("It's easier to port a shell than a shell script").

    If I had a job where I spent most of my day on a Linux development machine, it would make sense to invest considerable time in mastering Linux-specific dev tools (interested to learn BTW if you get to spend most of your work day beavering away on a Linux dev box).

    Now that I know about mlocate I might get around to installing it at home on my Ubuntu VM - more likely if you, or some other kind Perl monk, sold me with some examples of how it makes development more enjoyable. :)

    Updated: minor changes to wording.

    👁️🍾👍🦟

      locate (aka mlocate, aka slocate) is a lifesaver. I used it a couple of days ago (admittedly not on a dev box) to solve a very simple problem. Some user of a client's server had been uploading files to a website from a Mac and had managed to litter the filesystem with .DS_Store files. This is a BIG filesystem we're talking about and they were not just using a small fraction of it. My task was to clean up this mess. I could have constructed a find command but coupling that with rm is always dicey especially on a production box, so the slow user would run the find twice, once to check and another to delete. The faster user would run the find once and store the output in a file, eyeball the file and then use xargs from the file to do the deletion.
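      The "store the output in a file, eyeball it, then delete from the file" workflow might look roughly like the sketch below. The temporary tree and variable names are hypothetical stand-ins for the real filesystem, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do; some older Unix flavours may not).

```shell
#!/bin/sh
# Hypothetical demo tree standing in for the real filesystem.
root=$(mktemp -d)
mkdir -p "$root/site/a dir" "$root/site/b"
touch "$root/site/a dir/.DS_Store" "$root/site/b/.DS_Store" "$root/site/b/keep.txt"

list=$(mktemp)

# Step 1: run the expensive find once, saving NUL-separated paths.
find "$root" -name .DS_Store -type f -print0 > "$list"

# Step 2: eyeball the list (NULs become newlines for display only).
tr '\0' '\n' < "$list"

# Step 3: delete exactly the paths that were reviewed, nothing else.
xargs -0 rm -f -- < "$list"

# Confirm the result: no .DS_Store left, unrelated files untouched.
remaining=$(find "$root" -name .DS_Store | wc -l | tr -d ' ')
if [ -f "$root/site/b/keep.txt" ]; then kept=yes; else kept=no; fi
echo "remaining: $remaining, keep.txt kept: $kept"

rm -rf "$root" "$list"
```

      Keeping the list on disk means the deletion acts on the reviewed paths rather than on a second, possibly different, search result.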

      The fastest user employs locate instead of find. It produced the output in under a second without hammering the disks and I was able to run it once to check the pattern:

      $ locate /.DS_Store

      and then again (because it's cheap) to clear the files:

      $ locate /.DS_Store | xargs rm

      Having eyeballed the output from the first command I could see that only the requisite files were matched and none of the paths had spaces or anything else that might trip up the pipeline.

      Using it locally on my dev box is equally useful. We may have variously patched versions of code kicking around in several locations and these might require updates. Trivial to search with locate and more flexible than find because the -r option allows for regex-based matching which is so much simpler than trying to wrestle globs for complex patterns.

      Occasionally I have to admin servers where the person who set them up hasn't installed locate and it doesn't take long before I'm swearing at them, installing it and running updatedb just so I can get on with my work. It's an invaluable tool and if it isn't present/available on MSWin32 then that's (yet another) black mark against that particular OS.


      🦛

        $ locate /.DS_Store

        and then again (because it's cheap) to clear the files:

        $ locate /.DS_Store | xargs rm

        find can do the same, without needing a possibly outdated database:

        find / -name .DS_Store -type f

        (-print is implicit, -type f restricts to regular files)

        And then, to avoid various traps with "funny" path names, pass found path names around ASCII-NUL separated:

        find / -name .DS_Store -type f -print0 | xargs -0 rm
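        A small demonstration of the trap, as a sketch: the directory name with a space is a made-up example, and -print0 / -0 are assumed to be supported by the local find and xargs.

```shell
#!/bin/sh
# Hypothetical demo directory whose name contains a space.
demo=$(mktemp -d)
mkdir -p "$demo/my photos"
touch "$demo/my photos/.DS_Store"

# Whitespace-separated: xargs splits the path at the space, so rm is handed
# two names that do not exist and the real file survives.
find "$demo" -name .DS_Store -type f | xargs rm 2>/dev/null || true
if [ -e "$demo/my photos/.DS_Store" ]; then plain=survived; else plain=deleted; fi

# NUL-separated: the whole path arrives at rm as a single argument.
find "$demo" -name .DS_Store -type f -print0 | xargs -0 rm
if [ -e "$demo/my photos/.DS_Store" ]; then nul=survived; else nul=deleted; fi

echo "plain xargs: $plain, xargs -0: $nul"

rm -rf "$demo"
```

        Here the failure is harmless, but with a less fortunate split the fragments could name real files.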

        Or invoke rm directly from find:

        find / -name .DS_Store -type f -exec rm {} \;

        (Backslash or quotes around the semicolon are needed in bash)

        The same, but be smarter (like xargs, collect arguments instead of invoking rm for every single file):

        find / -name .DS_Store -type f -exec rm {} \+

        (Unlike the semicolon, the plus is not special to bash, so the backslash here is harmless but optional)
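        The difference between the two -exec forms can be made visible with a harmless command. This is only a sketch: echo stands in for rm, and the three file names are invented for the demo.

```shell
#!/bin/sh
# Hypothetical demo directory with three files.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"

# "-exec ... {} \;" runs the command once per file: three echo calls,
# so three lines of output.
per_file=$(find "$demo" -type f -exec echo {} \; | wc -l | tr -d ' ')

# "-exec ... {} +" collects the paths and runs the command once with all
# of them: a single echo call, so a single line of output.
# The + is written unescaped here.
batched=$(find "$demo" -type f -exec echo {} + | wc -l | tr -d ' ')

echo "per-file invocations: $per_file, batched invocations: $batched"

rm -rf "$demo"
```

        With only three files the saving is trivial; across a large filesystem it is the difference between one process per file and a handful of processes in total.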

        Or have find delete the file without forking a separate process:

        find / -name .DS_Store -type f -delete

        Optionally show what is deleted while deleting:

        find / -name .DS_Store -type f -print -delete

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      That long list is not too dissimilar from what I worked on during various contracts in the past: remove a few, but add in others that are probably now only of interest to museum curators, such as Primix and DG/UX.

      Like you, I tended to focus on having a very sound knowledge of general commands available everywhere; mostly ignoring those that were only useful on some specific systems.

      — Ken

Re^4: uparse - Parse Unicode strings
by eyepopslikeamosquito (Archbishop) on Nov 20, 2023 at 11:57 UTC

    Thanks! Your new version is working nicely for me now.

    BTW, I found by experimenting that it seems to work fine for my simple needs even without xargs:

    ~/pm/Tux$ echo -e '\U1F468\U1F3FD\U200D\U2708\UFE0F' | xargs ./uchar -v
    👨 U1f468 \N{MAN}
    🏽 U1f3fd \N{EMOJI MODIFIER FITZPATRICK TYPE-4}
    ‍ U0200d \N{ZERO WIDTH JOINER}
    ✈ U02708 \N{AIRPLANE}
    ️ U0fe0f \N{VARIATION SELECTOR-16}
    
    ~/pm/Tux$ ./uchar -v '\U1F468\U1F3FD\U200D\U2708\UFE0F'
    👨 U1f468 \N{MAN}
    🏽 U1f3fd \N{EMOJI MODIFIER FITZPATRICK TYPE-4}
    ‍ U0200d \N{ZERO WIDTH JOINER}
    ✈ U02708 \N{AIRPLANE}
    ️ U0fe0f \N{VARIATION SELECTOR-16}
    

    Update: Note that \U1F3FD (🏽) is EMOJI MODIFIER FITZPATRICK TYPE-4, the skin color modifier character representing skin type 4 on the Fitzpatrick scale, used above to change the skin color of the airline pilot. It was also used by Discipulus to change the skin color of the man student at Re: Emojis for Perl Monk names (Discipulus and SpaceCowboy and LanX).

    👁️🍾👍🦟