in reply to Repeating the same command in different portions of input

tobyink has provided an excellent solution. For your future reference--and in case the need arises again--there are Perl modules that can be used for parsing the kind of text you have. Here's an example that uses Mojo::DOM to parse your <a> tags:

use strict;
use warnings;
use Mojo::DOM;

my $text = <<END;
<a>
word1
word2
word3
</a>
<a>
word4
word5
</a>
<a>
word6
word7
</a>
END

my $dom = Mojo::DOM->new($text);
my $i   = 1;

for my $chunk ( $dom->find('a')->each ) {
    print 'Chunk ' . $i++ . ': ' . $chunk->text . "\n";
}

Output:

Chunk 1: word1 word2 word3
Chunk 2: word4 word5
Chunk 3: word6 word7

Thus, each group that you need to analyze is available as $chunk->text within the for loop.

Hope this helps!

Re^2: Repeating the same command in different portions of input
by albascura (Novice) on Jan 15, 2013 at 20:50 UTC

    It really helps, thanks.

    I was wondering: I see that $chunk->text doesn't preserve the newline at the end of each word. Since I need to check things that are on separate lines (I simplified my code a little in the previous example), could I do something like this:

    for my $chunk ( $dom->find('s')->each ) {
        my @values = split('\n', $chunk);
        foreach my $line (@values) {
            # do stuff on every line
        }
    }

    I'm trying it right now. I hope it works.

    Thanks again!

      Yes, splitting the 'chunk' is a good solution! However, since you've noticed the chunk lacks newlines, change:

      my @values = split('\n', $chunk);

      to

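      my @values = split /\s+/, $chunk->text;
      • This splits on whitespace, and it splits $chunk->text rather than $chunk itself, because $chunk stringifies to the element's markup, tags included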
      • It uses a regex literal rather than a quoted string; split compiles its first argument as a pattern either way, so '\n' is treated as the regex \n, and the regex form makes that explicit
      • Parentheses are optional
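
      As a minimal sketch of the per-line loop, here is the split applied to a plain string. The variable $chunk_text is a hypothetical stand-in for one chunk's text, and it assumes the text still carries leading and trailing whitespace; whether yours does depends on how it was extracted. Under that assumption, /\s+/ yields an empty first field, while the special pattern ' ' skips leading whitespace:

      ```perl
      use strict;
      use warnings;

      # Hypothetical stand-in for one chunk's text: one word per line,
      # with the surrounding whitespace left in place (an assumption).
      my $chunk_text = "\nword1\nword2\nword3\n";

      # /\s+/ splits on runs of whitespace, but leading whitespace
      # produces an empty first field.
      my @with_regex = split /\s+/, $chunk_text;

      # The special pattern ' ' also splits on whitespace, and it
      # skips any leading whitespace first.
      my @with_space = split ' ', $chunk_text;

      print scalar(@with_regex), " fields: ", join( '|', @with_regex ), "\n";
      print scalar(@with_space), " fields: ", join( '|', @with_space ), "\n";

      for my $line (@with_space) {
          # do stuff on every line
          print "Line: $line\n";
      }
      ```

      If the text has no leading whitespace, both forms return the same list, so either one works in that case.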