lampros21_7 has asked for the wisdom of the Perl Monks concerning the following question:

Hello

My query is quite different, but I am still sure it can be done with Perl, although I don't know how.

I have an array of all the bigrams that exist in the alphabet. A bigram is a two-letter combination.

I have all the bigram combinations in an array so $bigram[0] is aa and $bigram[675] is zz.

$bigram[0] = "aa"; $bigram[1] = "ab"; ...$bigram[675] = "zz"

Now I have a second array, @bigram_list, in which every element is a bigram. I want to check which of the bigrams in @bigram occur in @bigram_list. I then want to build another array, the same size as @bigram, that holds a 1 for each bigram that occurs once or more in @bigram_list and a 0 for each bigram that doesn't.

For example:

@bigram : as described above

@bigram_list = ("ac","ab","wd","xs");

So here, the new array that I want to hold the numbers will have a 0 in every element except at the positions of "ac", "ab", "wd" and "xs", where there should be a 1.

Sorry if this is complicated; I tried to make it as understandable as I could. Does anyone know of a way I could do this? Thanks for looking.
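
To make the target concrete, here is a minimal sketch of where the 1s would land. It assumes @bigram is ordered 'aa' .. 'zz' as described above; bigram_index is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

my @bigram      = ( 'aa' .. 'zz' );
my @bigram_list = ( "ac", "ab", "wd", "xs" );

# Hypothetical helper: position of a bigram within 'aa' .. 'zz'.
# Counting letters from a = 0, the index is 26 * first + second.
sub bigram_index {
    my ($bg) = @_;
    my ( $first, $second ) = map { ord($_) - ord('a') } split //, $bg;
    return 26 * $first + $second;
}

for my $bg (@bigram_list) {
    my $i = bigram_index($bg);
    print "$bg -> index $i ($bigram[$i])\n";
}
```

With the sample list this reports indexes 2, 1, 575 and 616, so those four elements of the new array would be 1 and the other 672 would be 0.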

Re: Identifying bigrams and making a note of which ones exist
by jdporter (Paladin) on Mar 10, 2006 at 20:41 UTC

    Well, first, we can make @bigram like this:

    @bigram = ( 'aa' .. 'zz' ); # magic!

    Next, do the searching and remembering (this is just one way to do it):

    # make a "set" of the ones in the list:
    my %bigram_list;
    @bigram_list{@bigram_list} = ();

    # corresponds to (and same size as) @bigram.
    my @bigrams_existing = map { exists $bigram_list{$_} ? 1 : 0 } @bigram;
    We're building the house of the future together.

      Fleshing your code out a little into a full example gives:

      use strict;
      use warnings;
      use Data::Dump::Streamer;

      my @bigram = ( 'aa' .. 'zz' ); # magic!
      my @testBigrams = split ' ', do {local $/; <DATA>;};

      # make a "set" of the ones in the list:
      my %bigram_list;
      @bigram_list{@testBigrams} = (0) x @testBigrams;

      # corresponds to (and same size as) @bigram.
      my @bigrams_existing = map { exists $bigram_list{$_} ? 1 : 0 } @bigram;

      Dump (\@bigrams_existing);

      __DATA__
      gr an df at he rj dp or te rj kv ap la ne ts ca pe ww

      Note in particular that the line @bigram_list{@bigram_list} = (); becomes @bigram_list{@testBigrams} = (0) x @testBigrams; and starts to make some sense.
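
As a side note on the two slice assignments, here is a small sketch (the variable names and sample data are made up for illustration) suggesting that both forms create the same keys and differ only in the stored values. Since the lookup uses exists, either form should work as a set.

```perl
use strict;
use warnings;

my @seen_list = qw(ac ab wd xs);   # hypothetical sample input

# Hash slice with an empty list: one key per element, every value undef.
my %set_undef;
@set_undef{@seen_list} = ();

# Same keys, but each value is explicitly 0.
my %set_zero;
@set_zero{@seen_list} = (0) x @seen_list;

# exists() tests for the key, not the value, so both sets agree.
for my $bigram (qw(ab zz)) {
    printf "%s: undef-set=%d zero-set=%d\n",
        $bigram,
        exists $set_undef{$bigram} ? 1 : 0,
        exists $set_zero{$bigram}  ? 1 : 0;
}
```

A test on the value instead (for example `if ($set_zero{$_})`) would report every bigram as absent in both hashes, which is why exists is the safer check here.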


      DWIM is Perl's answer to Gödel
Re: Identifying bigrams and making a note of which ones exist
by roboslug (Sexton) on Mar 11, 2006 at 03:09 UTC
    Another, similar approach, just because it was a simple and fun little exercise. Depending on the data sets, this might be faster or slower. The primary difference is that it uses a position hash for the bigrams. Then, there is the sexy output. * grin *
    # Setup vars - a zero map and position map.
    # $pos is declared first so the map block sees the lexical (strict-safe).
    my $pos = 0;
    my %bg_pos = map { ($_, $pos++); } ('aa'..'zz');
    my @bg_map = (0) x 676;

    # Input var
    my @bg_input = qw[ ab ba ca pe ne yt zz zy gh fg ui jk lk mn ad ];

    # Do the actual map of matching bigrams
    $bg_map[ $bg_pos{$_} ] = 1 for @bg_input;

    # Output (note: shift empties @bg_map as it prints)
    print qq[
    =====================================================
    Row=1st Letter,Col=2nd Letter
    =====================================================
      ], join(' ',('a'..'z')),"\n\n",
      map {(
          "$_ ",
          ( map { (shift @bg_map)." "; } ('a'..'z') ),
          "\n"
      )} ('a'..'z');
    Example Output:
Re: Identifying bigrams and making a note of which ones exist
by planetscape (Chancellor) on Mar 11, 2006 at 12:53 UTC
Re: Identifying bigrams and making a note of which ones exist
by Lu. (Hermit) on Mar 11, 2006 at 13:19 UTC
    Do you have to use arrays? I would use a hash and write something like this:
    my @bigram      = ( 'aa' .. 'zz' );
    my @bigram_list = ( "ac", "ab", "wd", "xs" );
    my $bigram_str  = join " ", @bigram;

    my %bigram_exist;
    foreach my $bigram_test (@bigram_list) {
        # Every token in $bigram_str is two letters separated by spaces,
        # so a two-letter pattern can only match a whole bigram.
        if ( $bigram_str =~ m/$bigram_test/ ) {
            $bigram_exist{$bigram_test} = 1;
        }
        else {
            $bigram_exist{$bigram_test} = 0;
        }
    }
    If you really need an array, then change the foreach into a for:

    # (...)
    my @bigram_exist;
    for ( my $i = 0 ; $i < @bigram_list ; $i++ ) {
        # Copy the element first so the regex interpolation is unambiguous.
        my $bigram_test = $bigram_list[$i];
        if ( $bigram_str =~ m/$bigram_test/ ) {
            $bigram_exist[$i] = 1;
        }
        else {
            $bigram_exist[$i] = 0;
        }
    }

    Either way you'll obtain a hash or an array with as many keys/items as your list.
Re: Identifying bigrams and making a note of which ones exist
by ff (Hermit) on Mar 12, 2006 at 05:14 UTC
    You could even tweak your resulting array a bit to accumulate counts of instances of the bigram rather than just saving '1' or '0':

    use strict;
    use warnings;

    my @bg_data = qw(
        aa aa aa sd gd sd bf bf sd gd sd df sb eb xd bf dr et bf bf sd bd
        aa aa aa ab ab ab ab ab ab ab ab ab sb eb ad ad ad ad ad bf sd bd
    );

    # Note and accumulate how many times each bigram is used....
    my %bg_stats;
    $bg_stats{$_}++ foreach @bg_data;

    # Store the data in the 676-element array....
    my @bg_final = map{ $bg_stats{$_} ||= 0 } ( 'aa' .. 'zz' );

    #print "$_\n" foreach @bg_final;