lobs has asked for the wisdom of the Perl Monks concerning the following question:

So I am trying to create a subroutine that either returns an array or a hash. The purpose is so that I can have a feature array and store it in a hash table that keeps track of how many times each word repeats. It is supposed to simulate a bag-of-words. This is what I have so far.
sub getFeatures {
    my @tokens = split /\s+/, $_[0];
    return @tokens;
}

if ($currentSentence =~ /<@> <s> ([\S\s]+) <head>lines?<\/head> ([\S\s]+) <\/s> <@>/) {
    my @temp = getFeatures("$1 $2");
    foreach my $variable (@temp) {
        $features{$variable}++;
    }
}
$currentSentence is predefined.

Re: assign array to hash
by shmem (Chancellor) on Apr 06, 2016 at 09:55 UTC
    So I am trying to create a subroutine that either returns an array or a hash.

    A subroutine returns neither a hash nor an array. It returns a list. See perlsub:

    The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. Any arrays or hashes in these call and return lists will collapse, losing their identities--but you may always use pass-by-reference instead to avoid this. Both call and return lists may contain as many or as few scalar elements as you'd like. (Often a function without an explicit return statement is called a subroutine, but there's really no difference from Perl's perspective.)
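
    A minimal sketch of the flattening the perlsub passage describes, and of returning references to avoid it. The names flat_pair and ref_pair are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Returning an array and a hash together: both collapse into one flat list.
sub flat_pair {
    my @words  = ('a', 'b');
    my %counts = (a => 1);
    return (@words, %counts);    # caller receives 'a', 'b', 'a', 1
}

# Returning references keeps the two structures separate.
sub ref_pair {
    my @words  = ('a', 'b');
    my %counts = (a => 1);
    return (\@words, \%counts);
}

my @flat = flat_pair();
my ($words_ref, $counts_ref) = ref_pair();

print scalar(@flat), "\n";          # 4 elements, boundaries lost
print scalar(@$words_ref), "\n";    # 2
print $counts_ref->{a}, "\n";       # 1
```

    With the flat version the caller cannot tell where the array ends and the hash begins; with references each structure arrives intact.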

    You have @ sigils in your regular expression. You need to escape them with a backslash.

    Your captures $1 and $2 may each consist of one or more words separated by spaces, which you then split further in the sub. You can do without that sub by using map and a statement modifier (see perlsyn):

    if ($currentSentence =~ /<\@> <s> ([\S\s]+) <head>lines?<\/head> ([\S\s]+) <\/s> <\@>/) {
        $features{$_}++ for map { split } $1, $2;
    }
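
    To try the same idea end to end, here is a self-contained sketch. The sample sentence is invented for illustration, and the captures are copied into lexicals ($before, $after, both hypothetical names) before splitting, which is a cautious variation rather than a requirement.

```perl
use strict;
use warnings;

# Hypothetical input in the format the regex expects.
my $currentSentence =
    '<@> <s> he waited in <head>lines</head> for hours in vain </s> <@>';

my %features;

# A match in list context returns the captured groups.
if (my ($before, $after) = $currentSentence
        =~ /<\@> <s> ([\S\s]+) <head>lines?<\/head> ([\S\s]+) <\/s> <\@>/) {
    # split with no arguments splits $_ on whitespace.
    $features{$_}++ for map { split } $before, $after;
}

# Print each word with its count, in sorted order.
print "$_: $features{$_}\n" for sort keys %features;
```

    Here "in" occurs on both sides of the head word, so its count ends up at 2 while every other word is counted once.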
Re: assign array to hash
by Eily (Monsignor) on Apr 06, 2016 at 09:47 UTC

    Do you really need the array? A bag of words is independent of word order, and with the keys function you still have access to all your words.

    sub toBagOfWords {
        my %bag;
        $bag{$_}++ for split /\s+/, shift;
        %bag;
    }

    my %hash  = toBagOfWords "I see their knavery this is to make an ass of me";
    my @words = keys %hash;    # if you still need the list of words, in a "random" order

Re: assign array to hash
by NetWallah (Canon) on Apr 06, 2016 at 21:48 UTC
    What are you trying to achieve with the regular expression:
    ([\S\s]+)
    The doc says:
    \s Match a whitespace character
    \S Match a non-whitespace character
    So, that would be the same as:
    (.+)


      There's a slight difference (I'm not sure whether that was the intent, though): . doesn't match a newline, unless used with the /s modifier, while \s matches it.
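
      A small sketch of that difference, using a made-up two-line string:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

my ($dot)      = $text =~ /(.+)/;        # . stops at the newline
my ($dot_s)    = $text =~ /(.+)/s;       # /s lets . match the newline too
my ($any_char) = $text =~ /([\S\s]+)/;   # \s already includes the newline

print length($dot), "\n";        # 10
print length($dot_s), "\n";      # 22
print length($any_char), "\n";   # 22
```

      So ([\S\s]+) behaves like (.+) with the /s modifier, not like plain (.+).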

Re: assign array to hash
by Anonymous Monk on Apr 06, 2016 at 08:28 UTC
    So you have accomplished that?