BUU has asked for the wisdom of the Perl Monks concerning the following question:

Two questions, really. The first: I recently found myself with an array of hashes and needed to create a new array in which each hash had fewer elements. To solve this I eventually just did:
my @old_hashes;
my @new_hashes;
my @wanted_keys = qw/foo bar baz/; #etc
@new_hashes = map{ my $x = $_; +{ map { $_=>$x->{$_} } @wanted_keys } } @old_hashes;
This solved the problem, but nesting maps seemed a tad hackish, and it seemed to me that a much cleaner solution could be devised using hash slices. I couldn't think of one, though, so I ask the assorted monks here: can anyone think of a better way to do this?


My second question: I have some strange strings, neither fixed-width nor delimited, that I need to parse. They look like this:
6 2 78 testing stuff 0 69.68.119.54:28960 34756 25000
7 4 118 [:EsU:]|BLaZE| 0 24.86.4.164:28960 7248 5000
6 2 78 tessssssstinggggggggggg REAAAAA 40 69.68.119.54:28960 34756 25000
You'll notice that most of the fields are separated by whitespace, except that the middle field can itself contain embedded whitespace! My solution was to devise a regex that looks basically like this:
my @cols = m/ (\d{1,3})
              \ + (-?\d+)
              \ + (\d{1,4}|CNCT)
              \ (.+?)(?:\^7)?
              \ + (\d{1,6})
              \ (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:-?\d{1,5})
              \ + (\d{1,5})
              \ + (\d{3,5}) /x;
Anyone see a better way to parse the above data?

After thinking about the above, it occurred to me that I could split on whitespace, pop off the last four fields, shift off the first three, and then join whatever's left. But that would be destructive to the middle field, since I would have no way of knowing exactly how much whitespace the split consumed before finding each next field.
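The whitespace concern only applies to an unlimited split. A minimal sketch, assuming the first three and last four fields never contain whitespace (true of the sample lines): split with a LIMIT of 4 leaves the rest of the line untouched, and an end-anchored regex can then peel the four trailing fields off that remainder. The sub name parse_line is hypothetical, and the third sample line has extra spaces added to its middle field purely to show that they survive.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumes the first three and last four fields never
# contain whitespace; only the middle field may.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # LIMIT 4: the fourth element is the rest of the line, unsplit,
    # so whitespace inside the middle field is preserved exactly.
    my ($f1, $f2, $f3, $rest) = split ' ', $line, 4;
    return unless defined $rest;
    # Anchor at the end: the last four whitespace-free tokens are the
    # trailing fields, and everything before them is the middle field.
    my @tail = $rest =~ /^(.*?)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$/
        or return;
    return ($f1, $f2, $f3, @tail);
}

my @lines = (
    "6 2 78 testing stuff 0 69.68.119.54:28960 34756 25000",
    "7 4 118 [:EsU:]|BLaZE| 0 24.86.4.164:28960 7248 5000",
    # Extra spaces added to the middle field to show they survive.
    "6 2 78 tessssssstinggggggggggg   REAAAAA 40 69.68.119.54:28960 34756 25000",
);

# Print each row in the quoted, comma-separated form shown in the question.
for my $line (@lines) {
    print join(', ', map { qq{"$_"} } parse_line($line)), "\n";
}
```

Because the trailing fields are matched with \S+ rather than the exact shapes used in the regex above, this is more forgiving of odd values, and correspondingly worse at rejecting malformed lines.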

Note: the desired output for the input lines above, with each field quoted and the fields separated by commas, should be basically:
"6", "2", "78", "testing stuff", "0", "69.68.119.54:28960", "34756", "25000"
"7", "4", "118", "[:EsU:]|BLaZE|", "0", "24.86.4.164:28960", "7248", "5000"
"6", "2", "78", "tessssssstinggggggggggg REAAAAA", "40", "69.68.119.54:28960", "34756", "25000"


Note 2: You'll notice that the sixth field is an IP address with a port. I just use a simple regex that matches four groups of 1-3 digits followed by some kind of port, because I already know the IP is valid; I only need to extract it, not validate it.
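A small sketch of that extract-but-don't-validate approach, using a hypothetical helper named extract_ip_port. It captures the dotted quad and the port separately and, as intended, would accept something like 999.1.1.1 since nothing is validated. The optional minus sign on the port mirrors the -? in the regex above.

```perl
use strict;
use warnings;

# Hypothetical helper: pull "a.b.c.d:port" out of a line without validating it.
# Returns (ip, port) on success, or an empty list if nothing matches.
sub extract_ip_port {
    my ($text) = @_;
    return $text =~ /\b(\d{1,3}(?:\.\d{1,3}){3}):(-?\d{1,5})\b/ ? ($1, $2) : ();
}

my ($ip, $port) = extract_ip_port(
    "6 2 78 testing stuff 0 69.68.119.54:28960 34756 25000");
print "$ip $port\n";    # prints "69.68.119.54 28960"
```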

Re: Parsing bizarre non delimted data and hash slices
by davorg (Chancellor) on Feb 01, 2004 at 10:18 UTC
    I don't know if it's any "better", but I'd probably solve your first problem something like this:
    my @new_hashes = map { my %h; @h{@wanted_keys} = @{$_}{@wanted_keys}; \%h } @old_hashes;
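A self-contained run of the hash-slice approach, with made-up sample data. Note that the slice assignment creates every wanted key in the new hash, with an undef value whenever the old hash lacks that key. On Perl 5.20 and later, key/value hash slices shorten it further: %{$_}{LIST} returns the key => value pairs directly, so the temporary hash is no longer needed. The variable @new_hashes_kv is just an illustrative name.

```perl
use v5.20;    # needed only for the key/value slice further down
use strict;
use warnings;

my @wanted_keys = qw/foo bar baz/;

# Made-up sample data for illustration.
my @old_hashes = (
    { foo => 1, bar => 2, baz => 3, qux  => 4 },
    { foo => 5, bar => 6, baz => 7, quux => 8 },
);

# Hash slice on both sides: copy only the wanted keys into a fresh hash.
my @new_hashes = map {
    my %h;
    @h{@wanted_keys} = @{$_}{@wanted_keys};
    \%h;
} @old_hashes;

# Perl 5.20+: a key/value slice returns (key => value, ...) pairs directly.
my @new_hashes_kv = map { +{ %{$_}{@wanted_keys} } } @old_hashes;

for my $h (@new_hashes) {
    print join(', ', map { "$_=$h->{$_}" } sort keys %$h), "\n";
}
```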
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Parsing bizarre non delimted data and hash slices
by ysth (Canon) on Feb 01, 2004 at 10:42 UTC
    With regard to your first question, you've done it about as simply as it can be done; the only other option I see is to change:
    @new_hashes = map{ my $x = $_; +{ map { $_=>$x->{$_} } @wanted_keys } } @old_hashes;
    to
    @new_hashes = map { my %x; @x{@wanted_keys}=@$_{@wanted_keys}; \%x } @old_hashes; # untested
    Update: I see someone else already said that. To provide some new content, here's a different answer to the second question:
    @cols = (split(' ', $data, 4), (map scalar reverse, reverse split(' ', reverse($data), 4))[1..3]);
    Update: I actually "tested" that, and completely failed to notice that it did nothing like what I wanted. Try this instead:
    @cols = split(' ', $data, 4);
    @cols[7,6,5,4,3] = map scalar reverse, split ' ', reverse($cols[3]), 5;
    or the all in one version:
    @cols = ((split(' ', $data, 4))[0..2],
             reverse map scalar reverse, split ' ', reverse((split ' ', $data, 4)[-1]), 5);
    Side note: ${\(expr)} does the same thing as (expr)[-1], and in fewer characters if you can omit the parentheses.
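The corrected two-statement version above packs several steps into one line. Below is the same reverse-and-split idea unrolled into one step per statement, as a sketch; the sub name parse_reversed is hypothetical. Like the one-liner, it assumes the three leading and four trailing fields contain no whitespace.

```perl
use strict;
use warnings;

# Hypothetical sub: an unrolled version of the reverse/split trick.
sub parse_reversed {
    my ($data) = @_;
    # 1. Split off the three leading fields. With LIMIT 4 the fourth
    #    element is the rest of the line, middle-field whitespace intact.
    my @cols = split ' ', $data, 4;
    # 2. Reverse the remainder as a string so the trailing fields come first,
    #    e.g. "stuff 0 1.2.3.4:5 6 7" becomes "7 6 5:4.3.2.1 0 ffuts".
    my $backwards = reverse $cols[3];
    # 3. Split with LIMIT 5: four (reversed) trailing fields, then the
    #    reversed middle field as the untouched remainder. Leading
    #    whitespace, such as a reversed trailing newline, is skipped.
    my @pieces = split ' ', $backwards, 5;
    # 4. Un-reverse each piece.
    @pieces = map { scalar reverse $_ } @pieces;
    # 5. The first piece is the last column, so fill the row from the end.
    @cols[7, 6, 5, 4, 3] = @pieces;
    return @cols;
}

print '<', join('><', parse_reversed(
    "6 2 78 testing stuff 0 69.68.119.54:28960 34756 25000\n")), ">\n";
# prints <6><2><78><testing stuff><0><69.68.119.54:28960><34756><25000>
```

Reversing is what lets split count fields from the right: a LIMIT always leaves the unsplit remainder at the end of the string, so reversing the string puts the trailing fields first and makes the middle field the protected remainder.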

      Doesn't work for me:

      while ( <DATA> ) {
          my $data = $_;
          my @cols = (split(' ', $data, 4), (map scalar reverse, reverse split(' ', reverse($data), 5))[1..3]);
          print "<$_>" for @cols;
          print "\n";
      }
      __DATA__
      6 2 -78 testing stuff 0 69.68.119.54:28960 34756 25000
      7 4 118 [:EsU:]|BLaZE| 0 24.86.4.164:28960 7248 5000
      6 2 78 tesssssssting gg gggggggg REAAAAA 40 69.68.119.54:28960 34756 25000

      Output:

      <6><2><-78><testing stuff 0 69.68.119.54:28960 34756 25000
      ><0><69.68.119.54:28960><34756>
      <7><4><118><[:EsU:]|BLaZE| 0 24.86.4.164:28960 7248 5000
      ><0><24.86.4.164:28960><7248>
      <6><2><78><tesssssssting gg gggggggg REAAAAA 40 69.68.119.54:28960 34756 25000
      ><40><69.68.119.54:28960><34756>

      I had something similar, albeit so ridiculous that I didn't post it, but at least it seems to work :-)

      while ( <DATA> ) {
          my @tmp = split ' ', $_, 4;
          my @start = splice @tmp, 0, 3;
          @tmp = split ' ', (reverse $tmp[0]), 5;
          $_ = reverse for @tmp;
          my @parsed = ( @start, reverse @tmp );
          print "<$_>" for @parsed;
          print "\n";
      }

      Output (with the same DATA):

      <6><2><-78><testing stuff><0><69.68.119.54:28960><34756><25000>
      <7><4><118><[:EsU:]|BLaZE|><0><24.86.4.164:28960><7248><5000>
      <6><2><78><tesssssssting gg gggggggg REAAAAA><40><69.68.119.54:28960><34756><25000>

      But as I'm still trying to understand your code, I can't see where the error is... ;-)

      dave

      Update: Eagle-eyed monks will already have noticed that my code is not quite the same as the original, namely here: reverse($data), 5, where ysth had: reverse($data), 4. This is (of course :) because I started playing with it before posting. However, the output in both cases appears to be the same...

Re: Parsing bizarre non delimted data and hash slices
by Roger (Parson) on Feb 01, 2004 at 10:40 UTC
    use strict;
    use warnings;
    use Data::Dumper;
    while (<DATA>) {
        my @fields = /(\d+)\s+(\d+)\s+(\d+)\s+(.*?)\s+(\d+)\s+(\d+\.\d+\.\d+\.\d+\:\d+)\s+(\d+)\s+(\d+)$/;
        local $" = '","';
        print qq{"@fields"\n};
    }
    __DATA__
    6 2 78 testing stuff 0 69.68.119.54:28960 34756 25000
    7 4 118 [:EsU:]|BLaZE| 0 24.86.4.164:28960 7248 5000
    6 2 78 tessssssstinggggggggggg REAAAAA 40 69.68.119.54:28960 34756 25000
    And the output -
    "6","2","78","testing stuff","0","69.68.119.54:28960","34756","25000"
    "7","4","118","[:EsU:]|BLaZE|","0","24.86.4.164:28960","7248","5000"
    "6","2","78","tessssssstinggggggggggg REAAAAA","40","69.68.119.54:28960","34756","25000"