Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I wrote some code to find the line and column position of each occurrence of a word.
Could you please simplify this code or suggest a better approach?
my $word="test";
while(<DATA>) {
    if(/\b($word)\b/) {
        @arr = split(/\s+/,$_);
        $col = "";
        $i=0;
        foreach $k (@arr) {
            if($k eq "$word") {
                $col.=($i+1)."\t";
            }
            $i++;
        }
        print "The word repeated in Line ".$.." and in column ".$col."\n";
    }
}
__DATA__
This is a test from tester okay
nothing
message test center test
test in proress ... test one test two
Thanks

Replies are listed 'Best First'.
Re: Can anyone simplify this code
by GrandFather (Saint) on Jan 11, 2007 at 09:18 UTC

    It can be cleaned up a little using statement modifiers and defaults for split:

    use strict;
    use warnings;

    my $word="test";
    while(<DATA>) {
        next if ! /\b($word)\b/;
        my @arr = split;
        my $col = "";
        my $i=0;
        foreach my $k (@arr) {
            $i++;
            $col .= "$i\t" if $k eq $word;
        }
        print "The word repeated in Line $. and in column $col\n";
    }
    __DATA__
    This is a test from tester okay
    nothing
    message test center test
    test in proress ... test one test two

    Prints:

    The word repeated in Line 1 and in column 4
    The word repeated in Line 3 and in column 2 4
    The word repeated in Line 4 and in column 1 5 7

    DWIM is Perl's answer to Gödel
      Thanks a lot for your help
Re: Can anyone simplify this code
by johngg (Canon) on Jan 11, 2007 at 10:40 UTC
    You can let the regular expression do the work of finding where the word is and printing its positions. The code below does a global match in an empty while loop and uses the @- array, which records the start offsets of the last match (see also @+ for the end offsets). It also uses a regular expression code block (?{...}) to print the position once the match has been found.

    use strict;
    use warnings;
    use re q{eval};

    my $word = q{test};
    my $rxWord = qr{\b$word\b};

    # This is the one that does the work.
    #
    my $rxWordPos = qr{\b($word)\b(?{print $-[0], q{ }})};

    while (<DATA>) {
        next unless m{$rxWord};
        print qq{Match found on line $., column };
        while (m{$rxWordPos}g) {;}
        print qq{\n};
    }

    __END__
    This is a test from tester okay
    nothing
    message test center test
    test in proress ... test one test two
    a tester in this line

    Here's the output

    Match found on line 1, column 10
    Match found on line 3, column 8 20
    Match found on line 4, column 0 20 29

    I hope this is of use.

    Cheers,

    JohnGG

    Update: Simpler version eliminating the regular expression code block.

    use strict;
    use warnings;

    my $word = q{test};
    my $rxWord = qr{\b($word)\b};

    while (<DATA>) {
        next unless m{$rxWord};
        print qq{Match found on line $., column };
        while (m{$rxWord}g) {
            print qq{$-[0] };
        }
        print qq{\n};
    }

    __END__
    This is a test from tester okay
    nothing
    message test center test
    test in proress ... test one test two
    a tester in this line
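
To illustrate the @- and @+ arrays mentioned in this reply, here is a minimal sketch that collects both the start and the end offset of every match on a line. The helper name match_spans is hypothetical, and the \Q...\E quoting is an addition so that any metacharacters in the word are matched literally; neither appears in the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, end] offset pairs for each
# whole-word match of $word in $line. Offsets are zero-based; the end
# offset is the position just past the last matched character.
sub match_spans {
    my ($word, $line) = @_;
    my @spans;
    while ($line =~ m{\b\Q$word\E\b}g) {
        push @spans, [ $-[0], $+[0] ];
    }
    return @spans;
}

for my $span (match_spans('test', 'message test center test')) {
    print "start $span->[0], end $span->[1]\n";
}
```

For the sample line this should report the spans 8 to 12 and 20 to 24, which agrees with the start offsets 8 and 20 shown in the output above.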
      Thanks a lot for your help and solution
Re: Can anyone simplify this code
by ambrus (Abbot) on Jan 11, 2007 at 11:45 UTC

    Take care with that code because your \b regular expression has a different idea about words than the \s one. For example, if you change the first data line to this:

    This is a test-thingy from tester okay
    then you'll get the following weird output.
    The word repeated in Line 1 and in column

    This may or may not be a problem depending on your data.

    To correct this, don't use one condition to determine whether the word occurs in a line and then search for its positions in a different way, unless that's really necessary for performance reasons. That's code duplication, and it's not surprising that it causes problems.

    Here's an example of how you do the search only once.
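
A minimal sketch of that idea (an illustration only, not necessarily the code ambrus posted): split each line on whitespace and compare the fields directly, so that one and the same test decides both whether the word occurs and where. The helper name word_columns and the inline list of lines are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based numbers of the
# whitespace-separated fields in $line that are exactly equal to $word.
sub word_columns {
    my ($word, $line) = @_;
    my @fields = split ' ', $line;
    return grep { $fields[$_ - 1] eq $word } 1 .. scalar @fields;
}

# Sample lines, held in an array here instead of a __DATA__ section.
my @lines = (
    "This is a test-thingy from tester okay",
    "message test center test",
);

my $line_no = 0;
for my $line (@lines) {
    $line_no++;
    my @cols = word_columns('test', $line);
    # Report a line only when the search itself found something.
    print "The word repeated in Line $line_no and in column @cols\n" if @cols;
}
```

With this approach the first sample line is not reported at all, because the field "test-thingy" is not equal to "test", so a line can no longer be announced with an empty column list.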

      Thanks a lot for the solution
Re: Can anyone simplify this code
by shmem (Chancellor) on Jan 11, 2007 at 10:38 UTC
    while(<DATA>) {
        while(/\b$word\b/g) {
            print "The word repeated in Line $. and in column ",
                scalar(split /\s+/, $`)+1,"\n";
        }
    }

    although this throws

    Use of implicit split to @_ is deprecated

    if running with -w. Dunno why - I'm splitting $PREMATCH, no?

    <update>

    ambrus and Melly pointed me in the right direction - I overlooked the "to". In void or scalar context, split assigns its result to @_, and that usage is deprecated.

    scalar(my @s = split /\s+/, $`)+1

    fixes that. This version produces the OP's output:

    while(<DATA>) {
        my @arr = ();
        push @arr, scalar(my @s=split/\s+/,$`)+1 while /\b$word\b/g;
        print "The word repeated in Line $. and in column ", join("\t",@arr) ,"\n" if @arr;
    }

    </update>

    Note also that using $` carries a performance penalty. See Devel::SawAmpersand. (Does anybody know how to rewrite this using captures and $1 instead of $` ?)
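
One possible way to avoid $`, sketched here with substr and the @- array rather than with captures and $1: the text before a match is the same as substr($line, 0, $-[0]), so the words preceding the match can be counted from that prefix. The helper name columns_without_prematch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based word column of each whole-word match,
# computed from the text before the match without touching $`.
sub columns_without_prematch {
    my ($word, $line) = @_;
    my @cols;
    while ($line =~ /\b$word\b/g) {
        my $prefix = substr $line, 0, $-[0];   # same text $` would hold
        my @before = split ' ', $prefix;       # words preceding the match
        push @cols, scalar(@before) + 1;
    }
    return @cols;
}

print join("\t", columns_without_prematch('test', 'test in proress ... test one test two')), "\n";
```

Like the $` version, this still relies on \b to find the word, so a line containing "test-thingy" would be counted as a match.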

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Thanks a lot for your help.
Re: Can anyone simplify this code
by Samy_rio (Vicar) on Jan 11, 2007 at 10:05 UTC

    Hi, Try like this,

    TIMTOWTDI

    my $word="test";
    while(<DATA>) {
        next if !/$word/i;
        while(/\b($word)\b/gi) {
            my $pre = $`;
            my $col;
            ($pre eq "") ? ($col = 0):($col = split/\s+/, $pre);
            print "The word repeated in Line ".$.." and in column ".++$col."\n";
        }
    }
    __DATA__
    This is a test from tester okay
    nothing
    message test center test
    test in proress ... test one test two
    __END__

    Output as:
    ----------
    The word repeated in Line 1 and in column 4
    The word repeated in Line 3 and in column 2
    The word repeated in Line 3 and in column 4
    The word repeated in Line 4 and in column 1
    The word repeated in Line 4 and in column 5
    The word repeated in Line 4 and in column 7

    Updated

    Regards,
    Velusamy R.


    eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';

      Hi Velusamy,
      Thanks a lot. But there is some problem in the output...
      I made a small change in your code:
          ($pre eq "") ? ($col = 0):($col = split/\s+/, $pre);
      Now it's giving the proper output.
      The word repeated in Line 1 and in column 4
      The word repeated in Line 3 and in column 2
      The word repeated in Line 3 and in column 4
      The word repeated in Line 4 and in column 1
      The word repeated in Line 4 and in column 5
      The word repeated in Line 4 and in column 7
      Thanks