Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

Some time ago I wrote this short routine to manipulate my data structure (an array of hashes).

for my $h (@suggestionsTemp) {
    push @suggestionsUnsorted, ($targets{$$h{targetL}}={targetL=>$$h{targetL}})
        unless $targets{$$h{targetL}};
    $targets{$$h{targetL}}{origin}{$$h{origin}}++;
    $targets{$$h{targetL}}{count}++;
}
$$_{origin} = join ', ', sort keys %{$$_{origin}} for values %targets;

It counts how many times each value of 'targetL' occurs in the structure, stores that frequency in 'count', and joins the 'origin' values of all records that share the same 'targetL'. I am wondering whether the same routine can be made to match 'targetL' (the test done here with unless) in a case-insensitive way.
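To make the described behaviour concrete, here is a minimal sketch of the same grouping run on a few hypothetical records. The sample data and the names `@records` and `@groups` are assumptions for illustration only. It shows that keying the hash on the raw 'targetL' value treats 'Akzent' and 'akzent' as two separate groups:

```perl
use strict;
use warnings;

# Hypothetical sample records, for illustration only.
my @records = (
    { targetL => 'Akzent', origin => 'IB' },
    { targetL => 'akzent', origin => 'EU' },
    { targetL => 'Akzent', origin => 'EU' },
);

my ( %targets, @groups );
for my $h (@records) {
    my $key = $h->{targetL};    # raw value, so the match is case sensitive
    # Create the group the first time this exact spelling is seen.
    push @groups, ( $targets{$key} = { targetL => $key } )
        unless $targets{$key};
    $targets{$key}{origin}{ $h->{origin} }++;    # remember each origin seen
    $targets{$key}{count}++;                     # count every occurrence
}

# Flatten each per-group origin hash into a sorted, comma-separated string.
$_->{origin} = join ', ', sort keys %{ $_->{origin} } for values %targets;

printf "%s: count=%d, origin=%s\n", @{$_}{qw(targetL count origin)}
    for @groups;
# Akzent: count=2, origin=EU, IB
# akzent: count=1, origin=EU
```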

Replies are listed 'Best First'.
Re: data manipulation case insensitive match
by Corion (Patriarch) on Nov 16, 2019 at 14:21 UTC

    If you want to count case-insensitively, count under a canonicalized version of the value, such as lc or fc of the 'targetL' value. To find existing entries, you will then need to check your hash for all keys for which lc $key equals lc of that 'targetL' value.
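As a rough sketch of that idea: the list `@values` and the variable names below are hypothetical, and `fc` needs Perl 5.16 or later with `use feature 'fc'` (on older perls `lc` can be substituted). Counting under a case-folded key, and then finding every original spelling that folds to the same key, might look like this:

```perl
use strict;
use warnings;
use feature 'fc';    # fc() is available from Perl 5.16 on

# Hypothetical input values, for illustration only.
my @values = qw(Ahnenforschung akzent Akzent AKZENT);

# Count under the canonical (case-folded) form of each value.
my %count;
$count{ fc $_ }++ for @values;

# Find every original spelling whose folded form equals the folded target.
my $target = 'Akzent';
my $times  = $count{ fc $target };
my @same   = grep { fc($_) eq fc($target) } @values;

print "$target occurs $times times: @same\n";
# Akzent occurs 3 times: akzent Akzent AKZENT
```

The original spellings stay untouched; only the lookup key is folded.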

    But as you don't show us what your data structure is, or how it comes to be, it is very hard to give you more concrete advice.

    My best hint is to calculate the data before or while constructing your data structure, or, if this data structure is the result of a database query, to leverage the database functionality to calculate the number of targetL instances instead.

      Shame on me, I forgot to post the data structure. Here it is:

      $VAR1 = [
        { 'origin' => 'IB', 'targetL' => 'Ahnenforschung', 'sourceL' => '' },
        { 'sourceL' => '', 'targetL' => 'akzent', 'origin' => 'IB' },
        { 'sourceL' => '', 'origin' => 'EU', 'targetL' => 'Akzent' },
        { 'origin' => 'IB', 'targetL' => 'Akzent', 'sourceL' => '' }
      ]

      Expecting:

      $VAR1 = [
        { 'count' => 2, 'origin' => 'IB, EU', 'targetL' => 'Akzent' },
        { 'origin' => 'IB', 'targetL' => 'Ahnenforschung', 'count' => '1' }
      ]

      My script works fine, but it counts 'Akzent' and 'akzent' as different entities, whereas I would like them counted and grouped together. The data structure is built in a complex way, so this manipulation has to be performed on the data structure itself, not before.

      Note that 'targetL' in the first record of my expected output could be either 'Akzent' or 'akzent' (I do not care which), but I do not want to lowercase my whole data structure, i.e. I want to preserve the original values as much as possible.

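        Then do your counting in a case-insensitive way:

        # $results is assumed to be the array reference holding your records.
        my %count;
        $count{ lc $_ }++ for map { $_->{targetL} } @$results;
        my $winner  = "akzent";            # lowercased key to look up
        my $total   = $count{ $winner };   # number of records sharing that key
        my @matched = grep { lc $_->{targetL} eq $winner } @$results;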
Re: data manipulation case insensitive match
by tybalt89 (Monsignor) on Nov 16, 2019 at 16:47 UTC
    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11108780
    use warnings;
    use Data::Dump 'dd';

    my @suggestionsTemp = (
      { origin => "IB", sourceL => "", targetL => "Ahnenforschung" },
      { origin => "IB", sourceL => "", targetL => "akzent" },
      { origin => "EU", sourceL => "", targetL => "Akzent" },
      { origin => "IB", sourceL => "", targetL => "Akzent" },
    );
    dd \@suggestionsTemp;

    my @suggestionsUnsorted;
    my %targets;

    for my $h (@suggestionsTemp) {
      push @suggestionsUnsorted, ($targets{lc $$h{targetL}}={targetL=>$$h{targetL}})
        unless $targets{lc $$h{targetL}};
      $targets{lc $$h{targetL}}{origin}{$$h{origin}}++;
    #  $targets{lc $$h{targetL}}{count}++;
    }
    for ( values %targets ) {
      $$_{count} = keys %{$$_{origin}};
      $$_{origin} = join ', ', sort keys %{$$_{origin}};
    }
    @suggestionsTemp = values %targets;
    dd \@suggestionsTemp;

    Outputs:

    [
      { origin => "IB", sourceL => "", targetL => "Ahnenforschung" },
      { origin => "IB", sourceL => "", targetL => "akzent" },
      { origin => "EU", sourceL => "", targetL => "Akzent" },
      { origin => "IB", sourceL => "", targetL => "Akzent" },
    ]
    [
      { count => 1, origin => "IB", targetL => "Ahnenforschung" },
      { count => 2, origin => "EU, IB", targetL => "akzent" },
    ]
Re: data manipulation case insensitive match
by tangent (Parson) on Nov 16, 2019 at 16:41 UTC
    Keeping your original code you could try this:
    for my $h (@suggestionsTemp) {
        my $targetL = lc $h->{targetL};
        push @suggestionsUnsorted, ($targets{$targetL}={targetL=>$targetL})
            unless $targets{$targetL};
        $targets{$targetL}{origin}{$h->{origin}}++;
        $targets{$targetL}{count}++;
    }
    $_->{origin} = join ', ', sort keys %{$_->{origin}} for values %targets;
    Note: changed the way hash references are dereferenced to use the arrow operator.

    Output:

    $VAR1 = [
      { 'targetL' => 'ahnenforschung', 'count' => 1, 'origin' => 'IB' },
      { 'count' => 3, 'origin' => 'EU, IB', 'targetL' => 'akzent' }
    ];
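
The two posted outputs differ on what 'count' means: one reports the number of distinct origins (2 for 'akzent'), the other the number of matching records (3). The following is a sketch, not any poster's exact code, that groups case-insensitively while keeping the first-seen spelling of 'targetL' and the input order, and records both figures. The keys `occurrences`, `origins`, and `seen` and the array `@grouped` are names assumed here for illustration.

```perl
use strict;
use warnings;

# Sample records as posted earlier in the thread.
my @suggestionsTemp = (
    { origin => 'IB', sourceL => '', targetL => 'Ahnenforschung' },
    { origin => 'IB', sourceL => '', targetL => 'akzent' },
    { origin => 'EU', sourceL => '', targetL => 'Akzent' },
    { origin => 'IB', sourceL => '', targetL => 'Akzent' },
);

my ( %targets, @grouped );
for my $h (@suggestionsTemp) {
    my $key = lc $h->{targetL};    # lowercased only for the lookup key
    # First time this key is seen: keep the original spelling and input order.
    push @grouped, ( $targets{$key} = { targetL => $h->{targetL} } )
        unless $targets{$key};
    $targets{$key}{seen}{ $h->{origin} }++;    # tracks distinct origins
    $targets{$key}{occurrences}++;             # counts every matching record
}

for my $g (@grouped) {
    my $seen = delete $g->{seen};
    $g->{origins} = keys %$seen;                    # number of distinct origins
    $g->{origin}  = join ', ', sort keys %$seen;    # sorted, comma-separated
}

printf "%s: occurrences=%d, origins=%d, origin=%s\n",
    @{$_}{qw(targetL occurrences origins origin)} for @grouped;
# Ahnenforschung: occurrences=1, origins=1, origin=IB
# akzent: occurrences=3, origins=2, origin=EU, IB
```

Iterating over `@grouped` rather than `values %targets` keeps the groups in the order they first appeared, since hash value order in Perl is not guaranteed.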