Linguist has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to do pattern matching for linguistic analysis using a 2 dimensional array. In one dimension are the words in English, in the second are all the grammar codes. I am new to Perl and programming in general, so I'm not sure if my problem is with the logic or the code.

Here is the part of the code that isn't doing what I expect it to. I have also left in my comments so you can see what I intend to do. Please let me know if you need more of my code to make any sense of this.

Any suggestions would be great. Thanks!

        $textarray[$iwrdnum][1] = $tagfields[0]; #the member located in the position 0 of the tagfields array, is placed into position 1 in the textarray array
        $textarray[$iwrdnum][2] = $tagfields[1];
        $textarray[$iwrdnum][3] = $tagfields[2];
        $textarray[$iwrdnum][4] = $tagfields[3];
        $textarray[$iwrdnum][5] = $tagfields[4];
        $iwrdnum++;
    }#end loop 2

    for ($i = 0; $i <@textarray; $i++)
    {#start loop 3
        if ($textarray[$i][1] =~ /\b(nn|nns|nvbg|np|nps|npl)\b/)
        {#start if conditon
            for ($i3=1; $i3<=9; $i3++) #HERE IS WHERE I THINK THE PROBLEM MUST BE. What I think I'm doing is starting at the next row to the 9th row down, looking row by row
            { #start for loop
                if ($textarray[$i][1] =~ /\b(at|ati)\b/ ) #if the tag is at or ati then
                {#start if condition
                    print (OUTFILE "$textarray[$i][0]$textarray[$i+1][0]$textarray[$i+2][0]$textarray[$i+3][0]$textarray[$i+4][0]$textarray[$i+5]\n ");
                }#end if condition
            }#end for loop
        }#end if condition
    }#end loop 3

Replies are listed 'Best First'.
Re: looping in 2d array
by Laurent_R (Canon) on Nov 01, 2014 at 18:59 UTC
    Here is the part of the code that isn't doing what I expect it to.

    It would be nice to tell us what you expect your code to do, and in which respect it does not do that.

    A couple of comments on your code. It has already been pointed out that the $i3 variable is not used in your loop, which seems to be a possible error (but we do not know since we don't really know what you are trying to do). Having said that, that loop would be more perlish this way:

    for my $i3 (1..9) { # ...
    Similarly, the other loop:
    for my $i (0..$#textarray) { # ...
    The important thing above is not so much the different syntax, but the fact that this code declares the $i3 and $i variables with the my operator and gives them a lexical scope. All your variables should be declared and you should always use the following pragmas:
    use strict;
    use warnings;
    near the top of your program. They will help you find errors or deprecated/dangerous constructs.
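
As a small illustration of what the strict pragma catches, here is a minimal sketch. The snippet text and the variable names $count and $coutn are made up for this example; the misspelled name is compiled through a string eval so that the failure can be inspected instead of stopping the program.

```perl
use strict;
use warnings;

# Hypothetical snippet with a misspelled variable ($coutn instead of $count).
my $snippet = q{
    use strict;
    my $count = 1;
    $coutn = 2;    # typo: this name was never declared with "my"
    1;
};

# A string eval compiles the snippet at run time, so the error lands in $@.
my $compiled = eval $snippet;
my $error    = $@;

print defined($compiled) ? "compiled fine\n" : "strict refused it: $error";
```

Without use strict, the misspelled variable would silently become a new global and the mistake would go unnoticed.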

    Update: fixed a missing closing parenthesis in one of my loop examples. Thanks to AnomalousMonk for pointing out the typo through the chatterbox.
Re: looping in 2d array
by hippo (Archbishop) on Nov 01, 2014 at 16:42 UTC

    The one thing which looks a little odd is that the only references to $i3 are in the for statement itself. You have set up this variable to loop from 1 to 9 but then you never use the variable inside the loop. Is this intentional?
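
If the intent is what the comment in the original code describes (start at the next row and look up to nine rows further down), one possible way to bring $i3 into play is sketched below. This is only a guess at the intended logic: the sample rows, the @hits array and the $row variable are assumptions made for the illustration, and the output format is simplified.

```perl
use strict;
use warnings;

# Hypothetical sample rows: [ word, first tag ]. The real @textarray is built elsewhere.
my @textarray = (
    [ 'dogs',  'nns' ],
    [ 'chase', 'vb'  ],
    [ 'the',   'at'  ],
    [ 'cat',   'nn'  ],
);

my @hits;
for my $i (0 .. $#textarray) {
    # Only rows whose first tag is one of the noun tags start a search.
    next unless $textarray[$i][1] =~ /\b(nn|nns|nvbg|np|nps|npl)\b/;

    for my $i3 (1 .. 9) {
        my $row = $i + $i3;              # the row $i3 places after the noun
        last if $row > $#textarray;      # do not run off the end of the data
        if ($textarray[$row][1] =~ /\b(at|ati)\b/) {
            push @hits, "$textarray[$i][0] ... $textarray[$row][0]";
        }
    }
}
print "$_\n" for @hits;
```

The important difference from the original is that the inner test looks at row $i + $i3 rather than at row $i again.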

Re: looping in 2d array
by AnomalousMonk (Archbishop) on Nov 01, 2014 at 20:28 UTC

    Some general points which do not address the OPed question (BTW: I think hippo is on the mark with the observation that  $i3 is never used), but may be of interest:

    • $textarray[$iwrdnum][1] = $tagfields[0]; #the ...
      $textarray[$iwrdnum][2] = $tagfields[1];
      $textarray[$iwrdnum][3] = $tagfields[2];
      $textarray[$iwrdnum][4] = $tagfields[3];
      $textarray[$iwrdnum][5] = $tagfields[4];
      This can, IMHO, be expressed with more clarity/concision/maintainability with array slices (see Slices in perldata) (untested):
          @{ $textarray[$iwrdnum] }[ 1 .. 5 ] = @tagfields[ 0 .. 4 ];

    • Also in the interest of concision, the statement
          print (OUTFILE "$textarray[$i][0]$textarray[$i+1][0]$textarray[$i+2][0]$textarray[$i+3][0]$textarray[$i+4][0]$textarray[$i+5]\n ");
      could be re-written as (also untested)
          print OUTFILE join '', map $textarray[ $i+$_ ][0], 0 .. 5;
          print OUTFILE "\n ";
      (I assume the final  $textarray[$i+5] in the OPed code should really have been  $textarray[$i+5][0]). See map.

    • The capturing groups in a couple of regexes that look like  /\b(nn|nns|nvbg|np|nps|npl)\b/ can be replaced by non-capturing  (?:pattern) groupings. The captured substrings never seem to be used, and if this is true, the conventional wisdom is that non-capturing grouping is a bit faster. This could make a significant difference if you're crunching lotsa data. See  "(?:pattern)" in Extended Patterns in perlre; also Non-capturing groupings in perlretut.
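
Since the snippets in the three points above are marked untested, here is a minimal, self-contained sketch that exercises them. The sample data, the @rows array and the $line and $is_noun variables are assumptions made for the illustration; only the slice assignment, the join/map expression and the non-capturing regex come from the points above.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real @tagfields comes from the OP's parsing loop.
my @tagfields = qw(nn at vb jj in);
my @textarray;
my $iwrdnum = 0;

$textarray[$iwrdnum][0] = 'cat';
@{ $textarray[$iwrdnum] }[ 1 .. 5 ] = @tagfields[ 0 .. 4 ];   # slice assignment

# Six rows with a word in column 0, to exercise the join/map form of the print.
my @rows = map { [ $_ ] } qw(a b c d e f);
my $i    = 0;
my $line = join '', map $rows[ $i + $_ ][0], 0 .. 5;

# Non-capturing group: matches the same tags without saving anything in $1.
my $is_noun = $textarray[0][1] =~ /\b(?:nn|nns|nvbg|np|nps|npl)\b/ ? 1 : 0;

print "@{ $textarray[0] }\n";
print "$line\n";
print "noun tag: $is_noun\n";
```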

Re: looping in 2d array
by toolic (Bishop) on Nov 01, 2014 at 16:47 UTC
    using a 2 dimensional array. In one dimension are the words in English, in the second are all the grammar codes.
    This terse description leads me to believe a hash may be easier to work with than an array-of-arrays.

    More info in the form of http://sscce.org would be helpful.

Re: looping in 2d array
by GotToBTru (Prior) on Nov 01, 2014 at 21:51 UTC

    So $textarray[1, 2, 3, 4, ...] are the words, $textarray[2][1, 2, 3, ... 9] are the grammar codes for the third word. If the first code matches nn|nns|...npl then for each of the 9 grammar codes, if the first code matches at|ati, print the first 6 grammar codes. That's what it looks like you are trying to do. Perhaps you can explain what you want to do with one word, and then figure out how to do it with the whole array.

    1 Peter 4:10
      So $textarray[1, 2, 3, 4, ...] are the words, $textarray[2][1, 2, 3, ... 9] are the grammar codes ...

      NB: The Perlish syntax for these slices is (something like)  @textarray[1, 2, 3, 4] and  @{ $textarray[2] }[1, 2, 3, 7 .. 9] respectively.
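
A minimal sketch of those two slices, assuming a layout in which column 0 of each row is the word and columns 1 to 9 are the grammar codes. The sample values and the @rows, @codes and @words names are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical layout: column 0 is the word, columns 1..9 are grammar codes.
my @textarray = (
    [ qw(zero a1 a2 a3 a4 a5 a6 a7 a8 a9) ],
    [ qw(one  b1 b2 b3 b4 b5 b6 b7 b8 b9) ],
    [ qw(two  c1 c2 c3 c4 c5 c6 c7 c8 c9) ],
);

my @rows  = @textarray[ 1, 2 ];                      # slice of rows (array references)
my @codes = @{ $textarray[2] }[ 1, 2, 3, 7 .. 9 ];   # slice of one row's columns
my @words = map { $_->[0] } @textarray;              # column 0 of every row

print "@codes\n";
print "@words\n";
```

Note that a slice across rows yields array references, so collecting one column from every row needs a map as in the last assignment.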

Re: looping in 2d array
by FloydATC (Deacon) on Nov 02, 2014 at 09:28 UTC

    First of all, in most situations you'll find it much easier to use foreach instead of for. That way, you're letting Perl do all the tedious work with keeping track of indices for you.

    Second, as pointed out already, I would consider using a hash instead of an array. Arrays are for looping, hashes are for looking stuff up. I have a very strong gut feeling a data structure containing words will be used mostly for lookups, so let's try to model the data in a way most suitable for the purpose, whatever it is.

    Example 1

    Consider a hash of hashes. Since I have no idea what the tags 'nn', 'nns' etc. are, I have given them generic names. If naming the tags makes no sense, please just skip this example.

    my $words = {
      'foo' => {
        named_property_x => 'nn',
        named_property_y => 'nns',
      },
      'bar' => {
        named_property_x => 'nvbg',
        named_property_y => 'np',
      },
    };

    my $lookup = 'foo';
    print $words->{$lookup}->{'named_property_x'}; # Would print 'nn'

    # keys is needed here; iterating over the bare hash would also visit the values
    foreach my $named_property (keys %{$words->{$lookup}}) {
      # This would print all the named properties for a specific word, in no particular order
      print $named_property . ':' . $words->{$lookup}->{$named_property} . "\n";
    }

    # To add a new word and/or tag:
    $words->{'baz'}->{'named_property_z'} = 'nps';

    # To delete a tag:
    delete $words->{'baz'}->{'named_property_z'};

    # To check for a specific tag:
    if ( $words->{'baz'}->{'named_property_z'} eq 'nps' ) { ... }

    Example 2

    OK so let's say naming the tags makes no sense. Let's instead use a hash of hashes where the tags themselves are used as keys for easy lookup:

    my $words = {
      'foo' => {
        nn => 1,
        nns => 1,
      },
      'bar' => {
        nvbg => 1,
        np => 1,
      },
    };

    my $lookup = 'foo';
    my $tag = 'nn';
    print $words->{$lookup}->{$tag}; # Would print '1'

    foreach my $tag (keys %{$words->{$lookup}}) {
      # This would print all the tags for a specified word, in no particular order
      print $tag . "\n";
    }

    # To add a new word and/or tag:
    $words->{'baz'}->{'nps'} = 1;

    # To delete a tag:
    delete $words->{'baz'}->{'nps'};

    # To check for a specific tag:
    if ( $words->{'baz'}->{'nps'} ) { ... }

    Example 3

    Now, in your OP the code seems to indicate that the first tag holds a special importance, so let's try a hash of arrays to preserve the order:

    my $words = {
      'foo' => [ 'nn', 'nns' ],
      'bar' => [ 'nvbg', 'np' ],
    };

    my $lookup = 'foo';
    print $words->{$lookup}->[0]; # Would print 'nn', the first tag

    foreach my $tag (@{$words->{$lookup}}) {
      # This would print the tags for a specified word, in the order in which they were defined
      print $tag . "\n";
    }

    # To add a new word and/or tag:
    push @{$words->{'baz'}}, 'nps';

    # To delete a tag:
    @{$words->{'baz'}} = grep { $_ ne 'nps' } @{$words->{'baz'}};

    # To check for a specific tag:
    if ( grep { $_ eq 'nps' } @{$words->{'baz'}} ) { ... }

    Notice that deleting and testing for specific tags now became a little bit trickier.

    NOTE: I have not tested this code so it may not even work properly, thus illustrating two points: 1) that arrays are trickier to work with than hashes and 2) don't simply pick one of these examples to work with, they're only meant to give you ideas for different ways to model your data until it fits your needs.

    Your best fit is probably not one of these examples anyway, because only you know the actual meaning of your data and how it will be used.

    -- FloydATC

    Time flies when you don't know what you're doing

      While I have no inclination to quarrel with FloydATC's code, his statement in para 1 may be misleading (I, for one, find it easier to type 3 letters rather than seven).

      In fact, while some programmers use for and foreach differently, they are indistinguishable except for the spelling ...and either can be used in a C-style loop or a Perlish version with or without an explicit iterator:

      C:\>perl -E "my @arr=qw(a b c d); my $a; for $a(@arr) {say $a;}"
      a
      b
      c
      d

      C:\>perl -E "my @arr=qw(a b c d); my $a; foreach $a(@arr) {say $a;}"
      a
      b
      c
      d

      And here's brian d foy's response to a question on SO ( http://stackoverflow.com/questions/2279471/whats-the-difference-between-for-and-foreach-in-perl ) back in 2010:

      "They used to be separate ideas, but now they are synonyms. Perl figures out what to do based on what's in the ( ) instead of using the keyword. Blame the people who couldn't type an extra four characters. :)
      C:\>perl -e "foreach ($i=0;$i<10;$i++) { print $i }" # NB: '-e' & 'print'
      0123456789

      And though foy's statement doesn't offer a version or date when "used to be separate ideas" changed, nodes here in the Monastery, back at least as far as 2000, disagree -- some globally, and some only with respect to Perl 4 vs. Perl 5.

      Just for completeness or merely to be excessively didactic :-), if one changes foy's (5.8?) code to my (5.18) "for" ...

      C:\>perl -E "for ($i=0;$i<10;$i++) { say $i }" # Irrelevant differences '-E' & 'say'
      0
      1
      2
      3
      4
      5
      6
      7
      8
      9


      Come, let us reason together: Spirit of the Monastery

        This is true. And beside the point :-)

        My point was to let Perl worry about the incrementing and bounds checking, eliminating many common problems such as accidentally using the wrong counter or off-by-one errors. Personally, I find the keyword foreach coupled with a clear, recognizable variable name to be more readable than the for keyword which I'm familiar with from several inferior languages. Although I know perfectly well that Perl will do what I mean either way, I keep a clear distinction between the two to avoid confusion.

        -- FloydATC

        Time flies when you don't know what you're doing
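
The point about letting Perl do the incrementing and bounds checking can be sketched as follows. This is only an illustration under assumptions: the sample rows and the @nouns, $row, $word and @tags names are invented here, and the regex is the noun-tag pattern from the original question in its non-capturing form.

```perl
use strict;
use warnings;

# Hypothetical rows: [ word, tag, ... ].
my @textarray = (
    [ 'the', 'at'  ],
    [ 'cat', 'nn'  ],
    [ 'sat', 'vbd' ],
);

my @nouns;
foreach my $row (@textarray) {
    my ($word, @tags) = @$row;    # unpack the row; no counter to get wrong
    push @nouns, $word if $tags[0] =~ /\b(?:nn|nns|nvbg|np|nps|npl)\b/;
}
print "@nouns\n";
```

When a loop needs to look at neighbouring rows, as the original question apparently does, an index is still required, and iterating over 0 .. $#textarray keeps the bounds handling in Perl's hands.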