Re: arrays of arrays

I've been a little confused about all this too, so I googled "text corpora" and came up with this Wikipedia link:

http://en.wikipedia.org/wiki/Text_corpus

That page shows that data like this:

the/article book/noun
he/pronoun is/verb ill/adjective

is made up of words tagged with the parts of speech they represent. What he wants to do is split the word/part-of-speech pairs apart and print each sentence's words, followed by the corresponding parts of speech in the same order:

the book article noun
he is ill pronoun verb adjective

In other words, he wants a program that finds the (\w+)\/(\w+) pairs, splits each one, pushes the word onto one array and the part of speech onto another, and once the line has been parsed, 'emits' the data in order: first the array of words, then the array of parts of speech. This word-space-number example is just a step toward getting his text corpora code working.

That's the explanation; I hope it helps.

This is my code example:

#!/usr/bin/perl
use warnings;
use strict;
my @words;
my @parts_of_speech;
while(my $sentence = <DATA>) {
    @words = ();
    @parts_of_speech = ();
    while ($sentence =~ /(\w+)\/(\w+)/g ) {
        push(@words, $1) if $1;
        push(@parts_of_speech, $2) if $2;
    }
    print $_, " " for @words;
    print $_, " " for @parts_of_speech;
    print "\n";
}

__DATA__
the/article book/noun
he/pronoun is/verb ill/adjective

The output is:

C:\Code>perl linguistic.pl
the book article noun
he is ill pronoun verb adjective

Update: cleanup
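As an alternative, here is a sketch (not the poster's method) that splits each line on whitespace and then splits each token on its first '/', instead of looping over a regex. It assumes every token has the form word/tag, and tokens without a '/' are simply skipped. The sub name split_with_split and the sample lines are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one line of word/tag tokens into
# "words... tags..." using split instead of a regex loop.
sub split_with_split {
    my ($sentence) = @_;
    my (@words, @tags);
    # split ' ' splits on runs of whitespace and ignores leading whitespace,
    # so a trailing newline from <DATA> does not produce an extra token.
    for my $token (split ' ', $sentence) {
        # Limit of 2: split only on the first '/'.
        my ($word, $tag) = split m{/}, $token, 2;
        next unless defined $tag;    # skip tokens that contain no '/'
        push @words, $word;
        push @tags,  $tag;
    }
    return join ' ', @words, @tags;
}

# Hypothetical sample input in the same format as the __DATA__ section.
my @sample = (
    'the/article book/noun',
    'he/pronoun is/verb ill/adjective',
);
print split_with_split($_), "\n" for @sample;
```

Because this version does not rely on \w+, tokens containing non-word characters (for example a hyphenated word, or punctuation tagged as ./.) are kept whole, whereas the regex version above would truncate or skip them.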

Re^2: arrays of arrays by jwkrahn (Abbot) on Sep 07, 2007 at 17:20 UTC
my @words;
my @parts_of_speech;
while(my $sentence = <DATA>) {
    @words = ();
    @parts_of_speech = ();
    while ($sentence =~ /(\w+)\/(\w+)/g ) {
        push(@words, $1) if $1;
        push(@parts_of_speech, $2) if $2;
    }
    print $_, " " for @words;
    print $_, " " for @parts_of_speech;
    print "\n";
}

The arrays @words and @parts_of_speech don't need to be in file scope; you should declare them inside the loop. The pattern \w+ always matches at least one character, so the only way $1 or $2 can be false is if the captured text is exactly '0'; the tests on $1 and $2 are therefore superfluous. Your print statements are overly complicated; they could be simplified to:

print "@words @parts_of_speech\n";
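Putting those suggestions together, a revised version might look like the sketch below. It is a minimal sketch, not jwkrahn's code: the helper name split_tagged_line and the inline sample lines are hypothetical, and reading lines from <DATA> as in the original would work the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one line of word/tag pairs -> "words... tags...".
sub split_tagged_line {
    my ($sentence) = @_;
    # Declared inside the sub, so every call starts with empty arrays.
    my (@words, @parts_of_speech);
    # \w+ always captures at least one character, so no "if $1" test is needed
    # (such a test would wrongly skip a word that is exactly "0").
    while ($sentence =~ m{(\w+)/(\w+)}g) {
        push @words,           $1;
        push @parts_of_speech, $2;
    }
    # Interpolating an array joins its elements with $" (a single space by default).
    return "@words @parts_of_speech";
}

# Hypothetical sample input in the same format as the __DATA__ section.
my @sample = (
    'the/article book/noun',
    'he/pronoun is/verb ill/adjective',
);
print split_tagged_line($_), "\n" for @sample;
```

For the two sample lines this prints the same two lines as the original script, minus the trailing space the original left at the end of each line.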
Re^2: arrays of arrays by monkantar (Initiate) on Sep 07, 2007 at 15:54 UTC
Dear grep, toolic, Gangabass and dwm04, thanks a lot for your advice! I realize I still have a lot to learn, and this first visit to this website was very interesting. dwm04, your script does exactly what I want, thanks!

monkantar