My goal is minification rather than obfuscation, but the more I try to minify it, the more it falls into this category :-)

I want to reproduce the output of the Unix utility hexdump -C.

    sub hexdump($data) {
      $data =~ s/\G(?|
          ( \0{16} )+ (?{ '*' })
        | ( .{1,16} ) (?{
            sprintf '%08X %-50s|%-16s|', $-[1],
              join(' ', unpack('(H2)8a0(H2)8', $1)),
              $1 =~ tr{[\0-\x1F\x7F-\xFF]}{.}r
          })
      )/$^R\n/xgr;
    }

Do any fellow monks see opportunities for additional savings? Obviously whitespace can be removed, but I'm leaving it in for ease of discussion.

Re: hexdump -C
by tybalt89 (Monsignor) on Oct 13, 2025 at 15:25 UTC

    Here's a couple of tweaks...

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11166492
    use warnings;
    use Path::Tiny;
    use feature 'signatures';

    my $file = path('../tmp.tmp'); # FIXME for testing
    system "hexdump -C $file; echo";
    print hexdump( $file->slurp );

    sub hexdump($data)
      {
      $data =~ s/\G(?|
          ( \0{16} )+ (?{ '*' })
        | ( .{1,16} ) (?{
            sprintf '%08X %-50s|%s|', $-[1],
              "@{[unpack '(H2)8a0(H2)8', $1]}",
              $1 =~ y{ -~}{.}cr
          })
      )/$^R\n/xsgr . sprintf "%08x\n", length $data;
      }

    Outputs:

    00000000 6f 6e 65 0a 74 77 6f 0a 74 68 72 65 65 20 91 92 |one.two.three ..|
    00000010 20 66 6f 75 72 0a 66 69 76 65 0a 6f 6e 65 0a 74 | four.five.one.t|
    00000020 77 6f 0a 74 68 72 65 65 20 2c 2c 20 66 6f 75 72 |wo.three ,, four|
    00000030 0a 66 69 76 65 0a |.five.|
    00000036

    00000000 6f 6e 65 0a 74 77 6f 0a 74 68 72 65 65 20 91 92 |one.two.three ..|
    00000010 20 66 6f 75 72 0a 66 69 76 65 0a 6f 6e 65 0a 74 | four.five.one.t|
    00000020 77 6f 0a 74 68 72 65 65 20 2c 2c 20 66 6f 75 72 |wo.three ,, four|
    00000030 0a 66 69 76 65 0a |.five.|
    00000036

      Nice! I forgot that y/// existed, and didn't know about the /c flag.
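
For anyone else who had not met those flags, here is a tiny sketch of what `y{ -~}{.}cr` does. `y` is just another name for `tr`; `/c` complements the search list, so everything outside the range space..tilde is selected; `/r` returns the translated copy instead of modifying the operand. The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $raw = "one\ntwo\t\x91!";            # made-up sample data

# /c: complement the range ' '..'~', so every non-printable byte is selected.
# /r: return the translated copy and leave $raw untouched (needs perl 5.14+).
my $shown = $raw =~ y{ -~}{.}cr;

print "$shown\n";                        # one.two..!
```

One side note: `tr`/`y` takes a plain character list, not a regex character class, so square brackets written inside the search list are treated as literal `[` and `]` characters.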
Re: hexdump -C
by ikegami (Patriarch) on Oct 14, 2025 at 01:22 UTC

    The use of (?{}) doesn't help.

    s/\G(?|(\0{16})+(?{'*'})|(.{1,16})(?{...}))/$^R\n/sgr    What you used.
    s/\G(\0{16})|\G.{1,16}/($1?"*":...)."\n"/segr            Could be used.
    s/\G.{1,16}/($&eq"\0"x16?"*":...)."\n"/segr              What I used.
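
To make the comparison concrete, here is a toy sketch (not the hexdump itself) of one substitution written both ways. The input string, the 3-character chunk size, and the use of `uc` are arbitrary choices for illustration. In the first form the value of the last `(?{ })` block is picked up from `$^R`; in the second, `/e` evaluates the replacement as code.

```perl
use strict;
use warnings;

my $in = "abcdefgh";                     # arbitrary sample input

# Form 1: compute the replacement inside the pattern, fetch it via $^R.
my $via_code_block = $in =~ s/\G(.{1,3})(?{ uc $1 })/$^R\n/sgr;

# Form 2: compute the replacement in the replacement part with /e.
my $via_e_flag = $in =~ s/\G.{1,3}/ uc($&) . "\n" /segr;

print $via_code_block eq $via_e_flag ? "same\n" : "different\n";
print $via_e_flag;
```

Both forms should produce the same three lines here; the `/e` form simply needs no capture group and no `$^R`.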

    Combining this change and tybalt89's changes, we get

    #!/usr/bin/perl

    use v5.36;
    use Path::Tiny qw( path );

    sub hexdump( $data ) {
       $data =~ s/\G.{1,16}/
          ( $& eq "\0"x16
            ? "*"
            : sprintf "%08X %-50s|%s|",
                 $-[0],
                 "@{[unpack q{(H2)8a0(H2)8}, $&]}",
                 $& =~ y{ -~}{.}cr
          ) . "\n"
       /segr
       . sprintf "%08x\n", length $data
    }

    print hexdump( path( $0 )->slurp );

      Actually, it was solving a problem: a run of \0 lines needs to be replaced by a single '*', not one '*' for every 16 zeroes.

      But, as I went to create an example for you, I realized I misunderstood the '*' behavior. It prints one full line of 00 00 00 ... and then the '*' means "repeat the previous line". I'd only ever seen it replace zeroes with '*' because that's the most likely repetition to appear in data files.

      $ perl -E 'say "\0"x64 ."A"x64' | hexdump -C
      00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
      *
      00000040 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
      *
      00000080 0a |.|
      00000081
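
For reference, the "repeat the previous line" rule can be spelled out without any golfing. This is only a readable sketch of my understanding of the behavior shown above, not a claim about how hexdump is implemented; the name `hexdump_squeezed` is made up, and the column widths assume the usual two-space separators in `hexdump -C` output.

```perl
use strict;
use warnings;

# Readable (non-golfed) sketch of the "*" rule: a 16-byte chunk identical to
# the previous one is suppressed, and one "*" stands in for the whole run.
sub hexdump_squeezed {
    my ($data) = @_;
    my ($out, $prev, $starred) = ('', undef, 0);

    for (my $off = 0; $off < length $data; $off += 16) {
        my $chunk = substr $data, $off, 16;

        if (defined $prev && $chunk eq $prev) {
            $out .= "*\n" unless $starred++;    # only one "*" per run
            next;
        }
        ($prev, $starred) = ($chunk, 0);

        my @hex   = unpack '(H2)*', $chunk;
        my $left  = join ' ', @hex[0 .. ($#hex < 7 ? $#hex : 7)];
        my $right = @hex > 8 ? join(' ', @hex[8 .. $#hex]) : '';
        (my $text = $chunk) =~ tr/ -~/./c;      # non-printables become "."

        $out .= sprintf "%08x  %-23s  %-23s  |%s|\n", $off, $left, $right, $text;
    }
    return $out . sprintf "%08x\n", length $data;
}

print hexdump_squeezed("\0" x 64 . "A" x 64 . "\n");
```

Run on the same input as the shell example, this should give six lines: the zero line, `*`, the `41` line, `*`, the short `0a` line, and the final offset.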

      Combining everyone's ideas, I now get:

      sub hexdump($data) {
        $data =~ s/\G(.{1,16})(\1+)?/
          sprintf "%08X %-50s|%s|\n%s",
            $-[0],
            "@{[unpack q{(H2)8a0(H2)8},$1]}",
            $1 =~ y{ -~}{.}cr,
            "*\n"x!!$+[2]
        /segr . sprintf "%08X", $+[0]
      }
        That is so cool! I modified it slightly so it also runs on TinyPerl 5.8:

        sub hexdump {
          my $s;
          $_[0] =~ s/\G([\0-\xff]{1,16})/
            $s = $1;
            $s =~ y|\x20-\x7e|.|c;
            sprintf("%08X %-50s|%s|\n", $-[0], "@{[unpack q{(H2)8a0(H2)8}, $1]}", $s);
          /ge;
          return $_[0];
        }

        I don't understand why I had to use ([\0-\xff]{1,16}) instead of (.{16}). The latter won't capture anything, which is weird. The second thing I don't understand is why this works:

        "@{[unpack q{(H2)8a0(H2)8}, $1]}"
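
A small sketch that may help with both questions; the sample strings are made up. On the first: without the /s flag, `.` does not match a newline, and `{16}` (as opposed to `{1,16}`) demands exactly 16 characters, so a chunk that contains "\n" or is shorter than 16 bytes can fail to match. On the second: `[ LIST ]` builds an anonymous array, `@{ }` dereferences it, and an array interpolated into a double-quoted string is joined with `$"` (a space by default). As far as I can tell, the `a0` in the template contributes an empty string, which is what produces the extra space between the two groups of eight.

```perl
use strict;
use warnings;

my $data = "ab\ncd";                     # made-up sample containing a newline

# Without /s, "." refuses to match "\n", so the chunk stops early.
my ($no_s)   = $data =~ /^(.{1,16})/;    # "ab"
my ($with_s) = $data =~ /^(.{1,16})/s;   # "ab\ncd"
# [\0-\xff] matches any byte, newline included, even without /s.
my ($class)  = $data =~ /^([\0-\xff]{1,16})/;

# "@{[ LIST ]}": build an anonymous array, dereference it, and let string
# interpolation join the elements with $" (a single space by default).
my $hex = "@{[ unpack '(H2)*', 'AB' ]}";

print length($no_s), " ", length($with_s), " ", length($class), " [$hex]\n";
```

This should print `2 5 5 [41 42]`.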