in reply to First word

Here is an SSCCE for you repurposing your own regex:

#!/usr/bin/env perl
use strict;
use warnings;

my @AoA = (
    ['first word', 'greek latin'],
    ['alpha omega', 'beta test']
);
my @firsties;
for my $outer (@AoA) {
    for my $inner (@$outer) {
        push @firsties, $inner =~ /^(\w+)/;
    }
}
print "@firsties\n";

Re^2: First word
by afoken (Chancellor) on Oct 22, 2016 at 07:04 UTC

    Unfortunately, the definition of the \w character class does not match what natural languages consider as "word characters". See also perlre.

    Example:

    /tmp>cat 1174444-mod.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    my @AoA = (
        ['first word', 'greek latin'],
        ['alpha omega', 'beta test'],
        ["don't forget", "can't work", "won't fix" ],
        ["Kindergärten Kindergarten"]
    );
    my @firsties;
    for my $outer (@AoA) {
        for my $inner (@$outer) {
            push @firsties, $inner =~ /^(\w+)/;
        }
    }
    print "@firsties\n";
    /tmp>perl 1174444-mod.pl
    first greek alpha beta don can won Kinderg
    /tmp>perl -v

    This is perl 5, version 18, subversion 1 (v5.18.1) built for x86_64-linux-thread-multi

    Copyright 1987-2013, Larry Wall

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using "man perl" or "perldoc perl".  If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.

    /tmp>

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Well, OK. Replace
      $inner =~ /^(\w+)/
      with:
      $inner =~ /^(\S+)/
      and we get:
      print "@firsties\n"; #first greek alpha beta don't can't won't Kindergärten
      Good point.
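      Without Unicode rules in effect, \w matches only the ASCII characters that can be used within a Perl identifier, [0-9_A-Za-z]. The apostrophe is never a \w character, and in this script (which does not declare use utf8) the bytes of "ä" presumably do not qualify either.
      Sometimes, as in this case, \S (any non-whitespace character) is more useful.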
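To compare the candidate patterns side by side, here is a minimal sketch. It assumes Perl 5.14 or later (for the /a and /u regex modifiers) and a source file saved as UTF-8. The helper names are hypothetical, chosen only for this illustration, and are not from the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                              # decode non-ASCII literals in this file as characters
binmode STDOUT, ':encoding(UTF-8)';    # encode characters again on output

# Hypothetical helpers, one per candidate pattern.

# /a restricts \w to [0-9_A-Za-z].
sub first_word_ascii   { my ($s) = @_; my ($w) = $s =~ /^(\w+)/a;    return $w }

# /u applies Unicode rules, so letters such as "ä" count as \w.
sub first_word_unicode { my ($s) = @_; my ($w) = $s =~ /^(\w+)/u;    return $w }

# Unicode word characters plus the apostrophe, for contractions.
sub first_word_apos    { my ($s) = @_; my ($w) = $s =~ /^([\w']+)/u; return $w }

# Everything up to the first whitespace character.
sub first_token        { my ($s) = @_; my ($w) = $s =~ /^(\S+)/;     return $w }

for my $phrase ("don't forget", "Kindergärten Kindergarten") {
    printf "%-28s ascii=%s unicode=%s apos=%s token=%s\n",
        $phrase,
        first_word_ascii($phrase),
        first_word_unicode($phrase),
        first_word_apos($phrase),
        first_token($phrase);
}
```

Which pattern is right depends on the data: \S+ also keeps trailing punctuation such as a comma, while [\w']+ stops at it.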
Re^2: First word
by Anonymous Monk on Oct 21, 2016 at 12:02 UTC

    Thanks for your quick reply. I modified the script as follows:

    my(@kmers);
    for my$x (@AoA){
        for my$y (@$x){
            push @kmers, $y =~ /^(\w+)/;
            print @kmers."\n";
        }
    }

    but I am getting this output: 1 2 3 4 5 6 7 8 9 10 11 12 ... Is it printing just the scalar value? If so, why is that? I often run into this problem in other situations.

      Try:

      print "@kmers\n";

      But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

      Since you haven't provided an SSCCE it is rather hard to say with certainty but the suspicion has to be that your @AoA is incorrectly formed and may not even be an AoA at all. Perhaps it is a deeper structure but that's pure speculation since you have not shared it.

      Update: GotToBTru looks to have found the likely reason. I can't imagine why Anonymonk changed that from my example.

      GotToBTru is right. I'll tell you why...
      With print @kmers."\n";, the "." (string concatenation) operator puts @kmers in scalar context, where an array evaluates to its number of elements. If you use a comma instead, print @kmers,"\n";, you get the contents of the @kmers array, but with nothing between the elements, because print joins its list with $, (empty by default). Interpolating the array in a string, print "@kmers\n";, joins the elements with $" (a single space by default), which is probably what you want. See the examples below.

      Of course, you probably want to move the print outside of the loop so that you just get the final result, not the intermediate results for each element?

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @AoA = (
          ['firstRowCol1 asdf87534', 'firstRowCol2 junk lj6t90'],
          ['secondRowCol1 mnhibvygt7','secondRowCol2 7d7d5434']
      );
      my(@kmers);
      for my $x (@AoA) {
          for my $y (@$x) {
              push @kmers, $y =~ /^(\w+)/;  #first word of each element
              print "@kmers\n";
          }
      }
      __END__
      In the above code,

      print @kmers."\n"; #prints...
      1
      2
      3
      4

      print @kmers,"\n"; #prints...
      firstRowCol1
      firstRowCol1firstRowCol2
      firstRowCol1firstRowCol2secondRowCol1
      firstRowCol1firstRowCol2secondRowCol1secondRowCol2

      print "@kmers\n"; #prints...
      firstRowCol1
      firstRowCol1 firstRowCol2
      firstRowCol1 firstRowCol2 secondRowCol1
      firstRowCol1 firstRowCol2 secondRowCol1 secondRowCol2

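If only the final result is wanted, the print can move out of the loop. The following is a minimal sketch of that variant, reusing the sample data from the example above. The first_words helper is a hypothetical name introduced for illustration, not something from the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @AoA = (
    ['firstRowCol1 asdf87534',   'firstRowCol2 junk lj6t90'],
    ['secondRowCol1 mnhibvygt7', 'secondRowCol2 7d7d5434'],
);

# Hypothetical helper: collect the first word of every element of every row.
sub first_words {
    my @rows = @_;
    my @words;
    for my $row (@rows) {
        for my $cell (@$row) {
            push @words, $cell =~ /^(\w+)/;
        }
    }
    return @words;
}

my @kmers = first_words(@AoA);

# Printed once, after the loops have finished.
print scalar(@kmers), "\n";    # explicit scalar context: 4
print "@kmers\n";              # interpolation joins the elements with $" (a space)
{
    local $, = ',';            # output field separator used by print LIST
    print @kmers;              # firstRowCol1,firstRowCol2,secondRowCol1,secondRowCol2
    print "\n";
}
```

Writing scalar(@kmers) makes the count explicit, so a reader does not have to work out which context an operator such as "." imposes.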
      Update: I noticed that you were using tab characters in the code. This is not a good idea because a number of problems arise, not the least of which is that there is no standard definition of how wide a tab should be. In your program editor, set the option "convert tabs to spaces". That way the indentation will look the same to me as it does to you, even though I'm using a different editor.

      Another Update: As per the post from Hippo, there are certainly some folks who disagree with my opinion about tabs. I don't want to re-hash this, especially since this point was not a focus of the OP's original question. For those interested, read the thread comments and make up your own mind.

        I noticed that you were using tab characters in the code. This is not a good idea because ...

        For clarity, the use of tabs is a personal decision (or a coding standards one). See the thread "Tabs vs Spaces lets give this a go", where this was all hashed out previously, so there's no point in going over it again here. Just note that the above is only Marshall's opinion on the matter.