
One for the regexp fans

by Odud (Pilgrim)
on Aug 04, 2000 at 13:22 UTC

Odud has asked for the wisdom of the Perl Monks concerning the following question:

I have a variable that contains, for example, the following:


(If you're curious it represents the MAC address of a LAN card).
What I want to do is to turn it into the following format:


i.e. remove the dots and add leading zeroes where the original was only one hex digit. After some experiments I came up with the following snippet:
($mac = $addr) =~ s/(\.|^)([0-9a-f])(?=\.|$)/${1}0$2/g; $mac =~ s/\.//g;
This works, but I was wondering if there is a neater way - and in particular can it be done in a single statement?

As usual I look forward to everyone's contributions and suggestions.
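
For illustration, here is a minimal sketch of the two-step snippet above run against a made-up address. The sample value is an assumption; the example values from the original post are not shown here.

```perl
use strict;
use warnings;

# Hypothetical sample value (the post's own example was not preserved).
my $addr = '8.0.20.1a.b.c';

# Step 1: copy $addr into $mac, then prefix a 0 to every lone hex digit
# that sits between dots or at either end of the string.
(my $mac = $addr) =~ s/(\.|^)([0-9a-f])(?=\.|$)/${1}0$2/g;

# Step 2: drop the dots.
$mac =~ s/\.//g;

print "$mac\n";   # prints 0800201a0b0c
```

The lookahead `(?=\.|$)` is what keeps two-digit groups such as `20` and `1a` untouched: the digit is only padded when a dot or the end of the string follows it directly.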


Re: One for the regexp fans
by davorg (Chancellor) on Aug 04, 2000 at 13:37 UTC

    Do MAC addresses always have six sections? If so, you could do something like this:

    my $fmt = '%02x' x 6; $mac = sprintf $fmt, map hex, split /\./, $mac;

    It's still two lines, but the first is just there so I don't have to type '%02x' six times :)

    Oh, and it's not really a regex solution - sorry!
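
The split / hex / sprintf idea can be sketched as a small helper. The sub name `pack_mac` and the sample address are assumptions for illustration, and the sketch only makes sense if there really are six dot-separated groups.

```perl
use strict;
use warnings;

# Hypothetical helper name; assumes exactly six dot-separated hex groups.
sub pack_mac {
    my ($addr) = @_;
    # Split on dots, turn each hex string into a number, then print each
    # number back as two zero-padded lowercase hex digits.
    return sprintf '%02x' x 6, map { hex($_) } split /\./, $addr;
}

my $mac = pack_mac('8.0.20.1a.b.c');   # made-up sample value
print "$mac\n";                        # prints 0800201a0b0c
```

One side effect worth knowing: because each group goes through a number, upper-case input such as `1A` comes back as lower-case `1a`.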


    European Perl Conference - Sept 22/24 2000, ICA, London
      davorg says:
      It's still two lines, but the first is just there so I don't have to type '%02x' six times :)
      Well then, don't!
      $mac = sprintf "%02x" x 6, map hex, split /\./, $mac;
      There. One line. :)

      -- Randal L. Schwartz, Perl hacker

      Another common format for MAC address display is nnnn:nnnn:nnnn (common amongst Cisco gear, anyway).   In either case, it's the same 12 hex digits, only the delimiter and break-points change.

      Is simple code that handles both formats possible?

        You could probably do something like:
        if ($str =~ m!\.!) { $str =~ s/\.?([0-9a-fA-F]+)/substr("0$1", -2)/ge; } else { $str =~ s/:?([0-9a-fA-F]+)/substr("000$1", -4)/ge; }


      Nice idea - I'd got a bit hung up on the solution being an RE. I think that they are always "dotted-sex!" format (the string comes from calling netstat -i and I have to cope with the different formats produced by HP-UX, AIX, and OSF1). Perhaps we can have a pint together at yapc::Europe?
        Not much to offer except curiosity and frustration at my own efforts. On your HP-UX interface queries, does it render a "raw" 12-character address (no colon or dash delimiter) with leading zeroes, or does it give you the delimiter? I'm using "lanscan -a" on the HP-UX interfaces, getting my 12-character MAC address, trying to insert colons every two characters, then stripping the leading zeroes. That's my fallback position after I baked my brain trying to make the DEC OSF1 `netstat -i | grep '<Link>' | egrep -v "s10|lo0|ppp0" | awk '{print $1}' | sort -u` system call for-loop kludge work, which at least renders multi-line output of valid, live interfaces. If I can get either to work, I'll die a happy I thrust my head through the display :-) ...---... SOS !!! -raddude
Re: One for the regexp fans
by Corion (Patriarch) on Aug 04, 2000 at 13:54 UTC

    I first tried a pack / unpack approach, that didn't work and then went to perlman:perlre and read a bit about the zero-width lookbehind operators ... And this is what I came up with :

    #!/usr/bin/perl -w
    use strict;

    my $Test = "";
    print "\n";

    # A feeble try with pack/unpack, that doesn't work
    # print join( ".", unpack( "H2" x 6, pack( "H2" x 6, split( /\./, $Test )))), "\n";

    # Now a regex which uses the zero-width lookbehind and works
    # the "defined $1 ? "0$1" : "" part is ugly, I admit this ...
    # but I see no other way around it ;-)
    $Test =~ s/(?(?<![a-f0-9])([a-f0-9]))?\./(defined $1?"0$1":"")/eig;
    print $Test, "\n";
Re: One for the regexp fans
by Maqs (Deacon) on Aug 04, 2000 at 14:42 UTC
    try this:
    $str =~ s/(.[^\.]*)(\.?)/(substr("0" x 2 .$1, -2)).$2/ge;

      For the substr part, why not just:
      (substr ("0$1", -2))

      Update: In fact I don't see any reason for the second bracketed expression in the regex. Also I can't see how it expands the first hex number if that's necessary. Is this better? or have I missed something subtle?

      $str =~ s/\.?([0-9a-fA-F]+)/substr("0$1", -2)/ge;


        yep. your variant is an enhanced one. I tried only to make a general idea. :)
      Is my interpretation correct? The first () matches one or more non-dot characters and the second () matches the trailing . or nothing at the end of the string. Then you build a string that has at least two leading zeros and extract the rightmost 2 characters and tag on the trailing . or nothing. I quite like this solution as well. Both you and davorg have come up with good alternatives, thanks. I should think about using substr more - unfortunately it sits in my mind sharing a location with peek and poke! and so doesn't come out to play much these days.
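
The reading described above (pad, keep the rightmost two characters, re-attach the dot) can be checked with a small sketch. It is a variation on the substr idea rather than Maqs's exact regex, and the sample value is made up.

```perl
use strict;
use warnings;

# Hypothetical sample value, not taken from the original post.
my $str = '8.0.20.1a.b.c';

# For each run of non-dot characters: prepend "00", keep the rightmost
# two characters, then re-attach the trailing dot (if there was one).
$str =~ s/([^.]+)(\.?)/substr("00$1", -2) . $2/ge;

print "$str\n";   # prints 08.00.20.1a.0b.0c
```

Note that the dots survive here because `$2` is put back. Leaving `. $2` out of the replacement would give the dot-free form in the same pass.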
(Adam: JoinMapSplit) RE: One for the regexp fans
by Adam (Vicar) on Aug 04, 2000 at 20:55 UTC
    $mac = join '', map { /^([0-9a-f])$/i ? "0$1" : $_ } split /\./, $addr;
    The nice thing about perl is that it allows you to code in a manner that simulates speech. You said something like, "I want to split up the address at the dots, prefix zeros to single digit hex values, and join it all together to make a value." Which, incidentally, is exactly what the above one-liner does.
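
To make the "split, prefix, join" reading concrete, here is the same pipeline spelled out one step at a time. This is only a sketch; the intermediate variable names and the sample address are assumptions.

```perl
use strict;
use warnings;

my $addr = '8.0.20.1a.b.c';   # hypothetical sample value

# 1. Split up the address at the dots.
my @groups = split /\./, $addr;

# 2. Prefix a zero to every group that is a single hex digit.
my @padded = map { /^[0-9a-f]$/i ? "0$_" : $_ } @groups;

# 3. Join it all together.
my $mac = join '', @padded;

print "$mac\n";   # prints 0800201a0b0c
```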
      How about using sprintf: $mac = join "", map {sprintf "%02s", $_} split '\.', $addr
        Forgot to put CODE tags, so I am replying to myself.

        $mac = join "", map {sprintf "%02s", $_} split '\.', $addr

Simple RE answer
by tilly (Archbishop) on Aug 05, 2000 at 00:18 UTC
    $mac =~ s/(^|\.)([0-9a-f]?)(?=[0-9a-f])/$2 || 0/eg;

    That solves the original problem. ybiC pointed out that MAC addresses also come in another format, with groups of 4 characters separated by colons. The amended question was whether both formats can be handled together. The following does that:

    $mac =~ s/[^0-9a-f]*([0-9a-f]?)([0-9a-f])/($1 || 0) . $2/eg;
    In both cases the delimiter is gobbled by not being cached in a backref, and if it does not appear in even groups then a 0 will appear.

    Danger alert. In the pair "05" the 0 gets replaced, so if you wanted to use this to insert anything other than 0 you would need to check not just the truth of $1, but rather its length.
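
A minimal sketch of that length-based variant, assuming a hypothetical helper `pad_pairs` that takes the pad character as a second argument (neither the name nor the sample values come from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: same pattern as the second substitution above, but
# with a caller-chosen pad character and a length test instead of a
# truth test, so a leading "0" digit is kept rather than replaced.
sub pad_pairs {
    my ($mac, $pad) = @_;
    $mac =~ s/[^0-9a-f]*([0-9a-f]?)([0-9a-f])/(length($1) ? $1 : $pad) . $2/eg;
    return $mac;
}

my $dotted = pad_pairs('8.05.1a', '0');         # dotted form, made-up value
my $marked = pad_pairs('8.05.1a', 'x');         # pad char other than 0
my $cisco  = pad_pairs('0800:201a:0b0c', '0');  # groups of four with colons

print "$dotted\n";   # prints 08051a
print "$marked\n";   # prints x8051a
print "$cisco\n";    # prints 0800201a0b0c
```

With `$1 || $pad` the second call would have produced `x8x51a`, since the string "0" is false in Perl. Like the original, this character class is lower-case only, so upper-case hex digits would be swallowed as delimiters.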