ashok13123 has asked for the wisdom of the Perl Monks concerning the following question:

I have a file, file.txt, containing:
    january
    february
    egypt
    moon
    saturday
I want to search for each of these elements and report if any of them is missing. I can do it with:
    open(FH,"file.txt");
    @arr=<FH>;
    $var=join("",@arr);
    if($var=~ /january/) {
        print "january\n";
    }
    else {
        print "january not present\n";
    }
    if($var=~/february/) {
        print "february\n";
    }
    else {
        print "february not present\n";
    }
    if($var=~/egypt/) {
        print "Egypt is present";
    }
    else {
        print "Egypt not present";
    }
    etc...........
How can I reduce the code size while still reporting any element that is missing? One more question: if one of the elements to search for is a regex, like
    a.*e
    <sample>(.*?)</sample>
    etc
how can I include those values in the search and keep the pattern-matching behaviour?

Replies are listed 'Best First'.
Re: How to club different lines of program into one
by McDarren (Abbot) on May 25, 2009 at 14:42 UTC
    I think part of your question is missing. That is, you haven't specified which of the words in your file are supposed to be present and which are not.

    But in any case, consider the following:

    darren@dino:~/perl$ cat words.txt
    january
    february
    egypt
    moon
    saturday
    darren@dino:~/perl$ cat missing.pl
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $word_file = 'words.txt';
    my @required_words = qw/january larry_wall february holiday egypt moon saturday/;
    my %words;

    open my $in, '<', $word_file or die "$!\n";
    while (my $line = <$in>) {
        chomp $line;
        $words{$line}++;
    }
    close $in;

    for my $word (@required_words) {
        if ($words{$word}) {
            print "Required word $word is present\n";
        }
        else {
            print "Required word $word is missing\n";
        }
    }
    darren@dino:~/perl$ perl missing.pl
    Required word january is present
    Required word larry_wall is missing
    Required word february is present
    Required word holiday is missing
    Required word egypt is present
    Required word moon is present
    Required word saturday is present

    Hope this helps,
    Darren

Re: How to club different lines of program into one
by arc_of_descent (Hermit) on May 25, 2009 at 14:42 UTC
Re: How to club different lines of program into one
by AnomalousMonk (Archbishop) on May 25, 2009 at 14:47 UTC
    One possible approach:
    >perl -wMstrict -le
    "my @terms = qw(january foo february egypt moon saturday bar);
     my $search = qr{ @{[ join '|', @terms ]} }xms;
     my $str = 'in february the moon is not visible in egypt on saturday night';
     my %count;
     ++$count{$1} while $str =~ m{ ($search) }xmsg;
     for my $term (@terms) {
         print qq{$term is }, $count{$term} ? '' : 'NOT ', 'present';
     }
    "
    january is NOT present
    foo is NOT present
    february is present
    egypt is present
    moon is present
    saturday is present
    bar is NOT present
Re: How to club different lines of program into one
by GrandFather (Saint) on May 25, 2009 at 23:48 UTC

    If you use a hash to record your results you don't need to slurp the file and can generate a little more information. Consider:

    use strict;
    use warnings;

    my %matches = map {$_ => 0} qw(january february egypt);

    while (<DATA>) {
        chomp;
        ++$matches{$_} if exists $matches{$_};
    }

    for my $word (sort keys %matches) {
        if (! $matches{$word}) {
            print "Didn't find $word.\n";
        }
        elsif (1 == $matches{$word}) {
            print "Found $word once.\n";
        }
        else {
            print "Found $word $matches{$word} times.\n";
        }
    }

    __DATA__
    january
    february
    january
    moon
    saturday

    Prints:

    Didn't find egypt.
    Found february once.
    Found january 2 times.

    True laziness is hard work
Re: How to club different lines of program into one
by akho (Hermit) on May 25, 2009 at 14:39 UTC
    use strict;
    use warnings;

    my @words = qw{january february egypt};

    open my $file, '<', 'file.txt';
    my $text = do { local $/; scalar <$file>; };
    close $file;

    for my $word (@words) {
        print (($text =~ /\Q$word\E/) ? $word : "$word not present");
    }

    Didn't test it, however.

    upd: fixed syntax error

      ...and the lack of testing shows.

      syntax error at 766040.pl line 7, near "close"

      You're missing a terminal semicolon at the end of line 6.

      where file.txt is:

      001: january
      002: february
      003: egypt
      004: moon
      005: saturday

      The presence or absence of the line numbers reflects laziness and slow downloads but makes no difference here.

      Suggestion: Use 3-arg opens and test each one (... or die "Can't open $file: $!\n";).
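
A minimal sketch of the checked-open suggestion above. The helper name slurp_file and the file name no_such_file.txt are made up for illustration; they are not from anyone's post in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a file, dying with a useful message on failure.
sub slurp_file {
    my ($path) = @_;

    # Low-precedence "or" applies to the whole open() call. With "||" and no
    # parentheses around the arguments, the test would bind to $path instead.
    open my $fh, '<', $path or die "Can't open $path: $!\n";

    # Undefine the input record separator locally so one read returns it all.
    my $text = do { local $/; <$fh> };
    close $fh;
    return $text;
}

# Assumed not to exist, so the die branch is exercised and caught by eval.
my $text = eval { slurp_file('no_such_file.txt') };
print defined $text ? "read ok\n" : "error: $@";
```

If you prefer "||", parenthesize the arguments: open(my $fh, '<', $path) || die ...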

      Also, IMO, McDarren's response below strikes an appropriate chord. If the list of "wanted" words is in file.txt, then testing for their presence merely burns cycles and inconveniences electrons to no purpose whatsoever.

      Hence, one might infer that OP failed to specify the issue adequately, and that leads to another question: is the intent to find the "wanted" words *anywhere* within the text, or is it to test the text line-by-line and report per-line? One might guess from OP's wording that it's the former (update for clarity: in which case, slurping the file is fine [size issues aside] but would NOT be a good approach in the latter case), but in any case, leaving the reader guessing doesn't always get the best answer.
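
To make the two readings of the question concrete, here is a rough sketch of both. The sub names missing_anywhere and lines_by_word and the sample data are assumptions for illustration only, and an in-memory filehandle stands in for file.txt.

```perl
use strict;
use warnings;

my @wanted = qw(january february egypt);
my $data   = "january\nfebruary\njanuary\nmoon\nsaturday\n";

# (a) Whole-text check: which words are absent from the text entirely?
sub missing_anywhere {
    my ($text, @words) = @_;
    return grep { index($text, $_) < 0 } @words;
}

# (b) Per-line report: on which line numbers does each word appear?
sub lines_by_word {
    my ($fh, @words) = @_;
    my %seen = map { $_ => [] } @words;
    while (my $line = <$fh>) {
        for my $word (@words) {
            # $. holds the current line number of the handle just read.
            push @{ $seen{$word} }, $. if index($line, $word) >= 0;
        }
    }
    return \%seen;
}

print "missing: $_\n" for missing_anywhere($data, @wanted);

# Read the same data line by line through an in-memory filehandle.
open my $fh, '<', \$data or die "Can't open in-memory file: $!\n";
my $seen = lines_by_word($fh, @wanted);
close $fh;

for my $word (@wanted) {
    my @nums = @{ $seen->{$word} };
    print "$word: ", (@nums ? "line(s) @nums" : "not found"), "\n";
}
```

Variant (a) needs the whole text in memory; variant (b) never holds more than one line, which matters only for large files.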

      But, all that said, a question (perhaps ignorant) for akho: why scalar <$file>; for this application?

        I was writing this on a machine without Perl, thus the syntax error and no testing; I also tried to be extra safe with context in the do. That scalar is not necessary.

        Sorry for the confusion, if there was any.

        As for the OP's intent: it is hard to understand. But the title question was "How to club different lines of program into one", so I tried to do the same thing the OP's code does, but in fewer lines.

        And I don't have an excuse for not testing my open except that I usually use autodie.

Re: How to club different lines of program into one
by Marshall (Canon) on May 26, 2009 at 01:47 UTC
    "slurping" in a whole file into a scalar variable sounds like a good idea in this case. $text = <IN>; looks fine to me. There is no need for @text = <IN>.

    I would suggest the use of the Perl function index() rather than regex in this situation.

    index requires an exact match, so you should convert the search term and the search text to the same case. This "casing" operation is very fast. The index function will quit on the first match, which is an advantage over regex in this situation.

    As always, your mileage may vary! Short "how to" is shown below.

    #!/usr/bin/perl -w
    use strict;

    my @listOfWords = qw (january february egypt moon saturday zoos zoo thingies thing );
    my $text = "moon. I love full moons but this it has been a long thing since yesterday on the beach. And a whole buch of BLAH.\nYet another february line.\n More jan stuff goes here. What a zoo this text searching thing can be!";

    print"\n\nUsing ListOfWord Tokens\n";
    foreach my $word (@listOfWords) {
        if ( index($text,"$word")>= 0) {
            print "word: $word\t found\n";
        }
        else {
            print "word: $word\t NOT found\n";
        }
    }
    __END__
    Using ListOfWord Tokens
    word: january NOT found
    word: february found
    word: egypt NOT found
    word: moon found
    word: saturday NOT found
    word: zoos NOT found
    word: zoo found
    word: thingies NOT found
    word: thing found
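
The how-to above matches case-sensitively. One way the "casing" step might look, as a sketch only: the helper name contains_ci and the sample text are invented here, not part of the post.

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive substring test built on index().
# Both arguments are lowercased so that "February" matches "february".
sub contains_ci {
    my ($text, $word) = @_;
    return index(lc($text), lc($word)) >= 0;
}

my $text = "Yet another February line.\nWhat a ZOO this can be!";
for my $word (qw(february zoo egypt)) {
    printf "word: %-10s %s\n", $word,
        contains_ci($text, $word) ? 'found' : 'NOT found';
}
```

Like the original, this is a substring test, so "zoo" would also be reported as found in a text that only contains "zoos". When checking many words against one text, lowercasing the text once outside the loop avoids repeating that work.
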
      $text = <IN>; will not work like you want it to if you do not undef $/.
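
A small sketch of the difference, using an in-memory filehandle with made-up contents in place of a real file:

```perl
use strict;
use warnings;

my $data = "january\nfebruary\negypt\n";

# With the default $/ ("\n"), a scalar-context read returns only one line.
open my $in1, '<', \$data or die "Can't open in-memory file: $!\n";
my $first = <$in1>;
close $in1;

# With $/ localized to undef, the same kind of read returns everything.
open my $in2, '<', \$data or die "Can't open in-memory file: $!\n";
my $all = do { local $/; <$in2> };
close $in2;

print "default read: ", length($first), " characters\n";
print "slurped read: ", length($all),   " characters\n";
```

Using local inside a do block restores $/ as soon as the block ends, so later line-by-line reads are unaffected.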

      Regexen stop after the first match; you may gain a performance benefit, but not for the reason you cite.

        Thanks for the clarifications!

        As far as performance goes, it probably doesn't make any difference. So here we just have another way of doing things.

Re: How to club different lines of program into one
by ig (Vicar) on May 26, 2009 at 20:06 UTC

    For yet another alternative that allows you to include regular expressions in the patterns to be found:

    use strict;
    use warnings;

    my $filename = 'file.txt';

    open(FH, '<', $filename) or die "$filename: $!";
    my $text = do { local $/; <FH> };
    close(FH);

    my @patterns = qw(january february egypt a.*e <sample>(.*?)</sample> etc);

    print map { $text =~ /$_/ ? "$_: found\n" : "$_: not found\n" } @patterns;

    __END__
    january: found
    february: found
    egypt: found
    a.*e: not found
    <sample>(.*?)</sample>: not found
    etc: not found

    Depending on how you want your patterns interpreted you might want to add the s or m modifiers to the pattern match, to change the behavior of '^', '$' and '.'. See perlre for details.
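
A sketch of how those modifiers change the outcome, assuming file.txt holds one word per line (the text is inlined here so the example is self-contained, and the labels are only for display):

```perl
use strict;
use warnings;

my $text = "january\nfebruary\negypt\nmoon\nsaturday\n";

# qr// compiles a pattern once and keeps its modifiers attached to it.
my %pattern = (
    'a.*e (default)'    => qr/a.*e/,      # '.' does not match a newline
    'a.*e with /s'      => qr/a.*e/s,     # '.' matches newlines too
    '^egypt$ (default)' => qr/^egypt$/,   # anchors apply to the whole string
    '^egypt$ with /m'   => qr/^egypt$/m,  # anchors apply to each line
);

for my $label (sort keys %pattern) {
    printf "%-20s %s\n", $label,
        $text =~ $pattern{$label} ? 'found' : 'not found';
}
```

With this sample text, only the /s and /m variants should match: no single line has an "e" after an "a", and "egypt" is neither the first nor the last line of the string.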