Can anyone simplify this code

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Can anyone simplify this code by GrandFather (Saint) on Jan 11, 2007 at 09:18 UTC
It can be cleaned up a little using statement modifiers and the defaults for split:

```perl
use strict;
use warnings;

my $word = "test";

while (<DATA>) {
    next if !/\b($word)\b/;
    my @arr = split;
    my $col = "";
    my $i   = 0;
    foreach my $k (@arr) {
        $i++;
        $col .= "$i\t" if $k eq $word;
    }
    print "The word repeated in Line $. and in column $col\n";
}

__DATA__
This is a test from tester okay
nothing
message test center test
test in proress ... test one test two
```

Prints:

```
The word repeated in Line 1 and in column 4
The word repeated in Line 3 and in column 2 4
The word repeated in Line 4 and in column 1 5 7
```

DWIM is Perl's answer to Gödel
Re^2: Can anyone simplify this code by Anonymous Monk on Jan 11, 2007 at 12:28 UTC
Thanks a lot for your help.
Re: Can anyone simplify this code by johngg (Canon) on Jan 11, 2007 at 10:40 UTC
You can let the regular expression do the work of finding where the word is and printing its positions. This does a global match in an empty while loop and uses the `@-` array, which records the start positions of the last match (see also `@+` for the end positions). It also uses a regular expression code block `(?{...})` to print the position once the match has been found.

```perl
use strict;
use warnings;
use re q{eval};

my $word   = q{test};
my $rxWord = qr{\b$word\b};

# This is the one that does the work.
#
my $rxWordPos = qr{\b($word)\b(?{print $-[0], q{ }})};

while (<DATA>) {
    next unless m{$rxWord};
    print qq{Match found on line $., column };
    while (m{$rxWordPos}g) {;}
    print qq{\n};
}

__END__
This is a test from tester okay
nothing
message test center test
test in proress ... test one test two
a tester in this line
```

Here's the output:

```
Match found on line 1, column 10
Match found on line 3, column 8 20
Match found on line 4, column 0 20 29
```

I hope this is of use.

Cheers,

JohnGG

Update: Simpler version eliminating the regular expression code block.

```perl
use strict;
use warnings;

my $word   = q{test};
my $rxWord = qr{\b($word)\b};

while (<DATA>) {
    next unless m{$rxWord};
    print qq{Match found on line $., column };
    while (m{$rxWord}g) {
        print qq{$-[0] };
    }
    print qq{\n};
}

__END__
This is a test from tester okay
nothing
message test center test
test in proress ... test one test two
a tester in this line
```
Re^2: Can anyone simplify this code by Anonymous Monk on Jan 11, 2007 at 12:35 UTC
Thanks a lot for your help and solution.
Re: Can anyone simplify this code by ambrus (Abbot) on Jan 11, 2007 at 11:45 UTC
Take care with that code, because your `\b` regular expression has a different idea about words than the `\s` one. For example, if you change the first data line to this:

`This is a test-thingy from tester okay`

then you'll get the following weird output:

`The word repeated in Line 1 and in column`

This may or may not be a problem depending on your data. To correct it, don't use one condition to decide whether the word occurs in a line and then search for the positions of the word in a different way, unless that's really necessary for performance reasons. That's code duplication, and it's not surprising that it causes problems. Here's an example of how you do the search only once.

Read more... (719 Bytes)
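The collapsed example is not reproduced in this thread. As a stand-in, here is a minimal sketch of the "search only once" idea. It assumes that a "word" means a whole whitespace-separated field; that is only one possible definition, and the helper name `word_columns` and the inlined data lines are hypothetical, not taken from ambrus's code. The same field comparison decides both whether a line is reported and which columns are listed, so the two can no longer disagree.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based column of every
# whitespace-separated field that is exactly equal to $word.
sub word_columns {
    my ($word, $line) = @_;
    my @fields = split ' ', $line;    # awk-style split, ignores leading blanks
    return grep { $fields[$_ - 1] eq $word } 1 .. @fields;
}

# Sample lines inlined instead of read from DATA (an assumption for brevity).
my @lines = (
    'This is a test-thingy from tester okay',
    'nothing',
    'message test center test',
    'test in proress ... test one test two',
);

my $line_no = 0;
for my $line (@lines) {
    $line_no++;
    my @cols = word_columns('test', $line);
    next unless @cols;    # the one search also decides whether to print
    print "The word repeated in Line $line_no and in column @cols\n";
}
```

With this definition the `test-thingy` line is simply skipped, because no field equals `test`. If hyphenated forms should count, the comparison inside `grep` is the single place to change.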
Re^2: Can anyone simplify this code by Anonymous Monk on Jan 11, 2007 at 12:36 UTC
Thanks a lot for the solution.
Re: Can anyone simplify this code by shmem (Chancellor) on Jan 11, 2007 at 10:38 UTC
```perl
while (<DATA>) {
    while (/\b$word\b/g) {
        print "The word repeated in Line $. and in column ",
            scalar(split /\s+/, $`) + 1, "\n";
    }
}
```

although this throws

`Use of implicit split to @_ is deprecated`

if running with `-w`. Dunno why - I'm splitting `$PREMATCH`, no?

<update>

ambrus and Melly pointed me in the right direction - I overlooked the "to". In void or scalar context, split assigns its result to `@_`, and that usage is deprecated.

```perl
scalar(my @s = split /\s+/, $`) + 1
```

fixes that. This version produces the OP's output:

```perl
while (<DATA>) {
    my @arr = ();
    push @arr, scalar(my @s = split /\s+/, $`) + 1 while /\b$word\b/g;
    print "The word repeated in Line $. and in column ", join("\t", @arr), "\n" if @arr;
}
```

</update>

Note also that using $` incurs a performance hit. See Devel::SawAmpersand.

(Does anybody know how to rewrite this using captures and `$1` instead of $` ?)

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ / /\_¯/(q / ---------------------------- \__(m.====·.(_("always off the crowd"))."· ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
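One possible answer to the question about captures, offered as a sketch rather than as shmem's own solution: anchor each match with `\G` and capture the text skipped since the previous match, then rebuild the prematch from those captures. The helper name `columns_via_captures` and the use of `\Q...\E` to quote the word are assumptions added here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: column numbers of $word in $line, computed from
# captures instead of from the prematch variable.
sub columns_via_captures {
    my ($word, $line) = @_;
    my ($seen, @cols) = ('');
    # \G resumes where the previous match ended; $1 is the text skipped
    # since then and $2 is the matched word itself.
    while ($line =~ /\G(.*?)\b(\Q$word\E)\b/g) {
        $seen .= $1;                      # $seen now holds the full prematch
        my @before = split ' ', $seen;    # whitespace-separated fields so far
        push @cols, scalar(@before) + 1;
        $seen .= $2;                      # step past the matched word
    }
    return @cols;
}

print join("\t", columns_via_captures('test', 'message test center test')), "\n";
```

For the sample line this reports columns 2 and 4. Like the version above, it counts the whitespace-separated fields before each match, so it inherits the same mismatch between `\b` and whitespace splitting that ambrus describes. Another route that avoids the prematch variable is `substr($line, 0, $-[0])`, using the `@-` array from johngg's reply.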
Re^2: Can anyone simplify this code by Anonymous Monk on Jan 11, 2007 at 12:32 UTC
Thanks a lot for your help.
Re: Can anyone simplify this code by Samy_rio (Vicar) on Jan 11, 2007 at 10:05 UTC
Hi,

Try it like this, TIMTOWTDI:

```perl
my $word = "test";
while (<DATA>) {
    next if !/$word/i;
    while (/\b($word)\b/gi) {
        my $pre = $`;
        my $col;
        ($pre eq "") ? ($col = 0) : ($col = split /\s+/, $pre);
        print "The word repeated in Line " . $. . " and in column " . ++$col . "\n";
    }
}
__DATA__
This is a test from tester okay
nothing
message test center test
test in proress ... test one test two
__END__
```

Output as:

```
The word repeated in Line 1 and in column 4
The word repeated in Line 3 and in column 2
The word repeated in Line 3 and in column 4
The word repeated in Line 4 and in column 1
The word repeated in Line 4 and in column 5
The word repeated in Line 4 and in column 7
```

Updated

Regards,
Velusamy R.

eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@\|6%,53!-9@2~j';
Re^2: Can anyone simplify this code by Anonymous Monk on Jan 11, 2007 at 12:27 UTC
Hi Velusamy,

Thanks a lot. But there was a problem in the output, so I made a small change in your code:

`($pre eq "") ? ($col = 0):($col = split/\s+/, $pre);`

Now it's giving the proper output:

```
The word repeated in Line 1 and in column 4
The word repeated in Line 3 and in column 2
The word repeated in Line 3 and in column 4
The word repeated in Line 4 and in column 1
The word repeated in Line 4 and in column 5
The word repeated in Line 4 and in column 7
```

Thanks