I've been a little confused about all this too, so I googled textcorpora and came up with this Wikipedia link:

http://en.wikipedia.org/wiki/Text_corpus

From that article you can see that the data

the/article book/noun
he/pronoun is/verb ill/adjective

are words tagged with the parts of speech they represent. What he wants to do is split the word/part-of-speech pairs back into sentences, each followed by the equivalent parts of speech in the same order:

the book article noun
he is ill pronoun verb adjective

So what he wants is a program that sees (\w+)\/(\w+) pairs, splits them, pushes each half onto its own array and, once a line has been parsed, emits the data in sequential order: first the array of words, then the array of parts of speech. This word-space-number example is just a step on the way to getting his textcorpora stuff working.

That's the explanation; I hope it helps.

This is my code example:

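#!/usr/bin/perl
use warnings;
use strict;

my @words;
my @parts_of_speech;

while (my $sentence = <DATA>) {
    # Start each input line with empty arrays.
    @words           = ();
    @parts_of_speech = ();

    # Collect every word/part-of-speech pair on the line.
    # A successful match always sets both captures, so push unconditionally
    # (testing "if $1" would silently drop a word such as "0").
    while ($sentence =~ /(\w+)\/(\w+)/g) {
        push(@words, $1);
        push(@parts_of_speech, $2);
    }

    # Words first, then their parts of speech, in the same order.
    print $_, " " for @words;
    print $_, " " for @parts_of_speech;
    print "\n";
}

__DATA__
the/article book/noun
he/pronoun is/verb ill/adjective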
The output is:

C:\Code>perl linguistic.pl
the book article noun
he is ill pronoun verb adjective
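
If the two lists are needed for something other than printing, the same split can be wrapped in a subroutine that hands back both arrays by reference. This is only a sketch: the name split_tagged and the sample sentence passed to it are my own assumptions, not something from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_tagged is a hypothetical helper name, not part of the original post.
# It takes one line of word/part-of-speech pairs and returns two array
# references: the words, and the parts of speech in the same order.
sub split_tagged {
    my ($sentence) = @_;
    my (@words, @parts_of_speech);

    # Same pattern as before, with m{} delimiters so the slash
    # does not need escaping.
    while ($sentence =~ m{(\w+)/(\w+)}g) {
        push @words, $1;
        push @parts_of_speech, $2;
    }
    return (\@words, \@parts_of_speech);
}

# Example call on a single tagged sentence.
my ($words, $tags) = split_tagged("he/pronoun is/verb ill/adjective");
print join(" ", @$words, @$tags), "\n";
```

Returning references keeps the words and the tags separate, so the caller can decide whether to print them, store them per sentence, or compare them.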
Update: cleanup

In reply to Re: arrays of arrays by dwm042
in thread arrays of arrays by monkantar
