in reply to search and extract lines which contain a word

On a unix system:
grep 'gateway' infile > outfile
Or, using Perl on a unix system:
perl -ne'print if /gateway/' infile > outfile
On a Windows system:
perl -ne"print if /gateway/" infile > outfile

An example using file handles:

# Usage: script.pl infile outfile
#    or
# Usage: perl script.pl infile outfile

use strict;
use warnings;

my ($qfn_in, $qfn_out) = @ARGV;

open(my $fh_in, '<', $qfn_in)
    or die("Unable to read file \"$qfn_in\": $!\n");
open(my $fh_out, '>', $qfn_out)
    or die("Unable to create file \"$qfn_out\": $!\n");

while (<$fh_in>) {
    if (/gateway/) {
        print $fh_out $_;
    }
}

Update: Fixed copy & paste error mentioned in reply.
Update: Must have been asleep! Fixed missing my mentioned in reply.

Re^2: search and extract lines which contain a word
by cdarke (Prior) on Mar 26, 2008 at 12:36 UTC
    Slight typo in the file handles code:
    open($fh_out, '<', $qfn_out)
    Should read:
    open($fh_out, '>', $qfn_out)
Re^2: search and extract lines which contain a word
by rudder (Scribe) on Mar 26, 2008 at 14:52 UTC

    Note, when using strict, both $fh_in and $fh_out must have "my" in front of them in the calls to open.

      I would argue that my should be there either way, but that Perl only detects the error when use strict; is in effect.

      But that's debatable. Anyway, thanks. Fixed.
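To make the point about my and strict concrete, here is a minimal sketch. It compiles the same open call twice through a string eval, once with strict in effect and once without. The variable names ($data, $ok_strict, $ok_loose, $err_strict) are illustrative only, and the in-memory filehandle is just a stand-in for a real file.

```perl
use strict;
use warnings;

# With strict in effect, an undeclared $fh is a compile-time error.
my $ok_strict = eval q{
    use strict;
    my $data = "gateway\n";
    open($fh, '<', \$data) or die $!;
    1;
};
my $err_strict = $@;

# Without strict, the same code compiles; $fh silently becomes a package global.
my $ok_loose = eval q{
    no strict;
    my $data = "gateway\n";
    open($fh, '<', \$data) or die $!;
    close($fh);
    1;
};

print "with strict:    ", ($ok_strict ? "compiled\n" : "rejected: $err_strict");
print "without strict: ", ($ok_loose  ? "compiled\n" : "rejected: $@");
```

The strict version should be rejected with a "Global symbol ... requires explicit package name" message, while the non-strict version runs and quietly uses a global.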

Re^2: search and extract lines which contain a word
by m@cky (Initiate) on Mar 27, 2008 at 06:38 UTC
    Hi,
    perl -ne "print if /gateway/" infile > outfile
    This method works great! But how can I search for multiple strings? E.g. I want to search for lines containing the strings 'gateway' and 'TIMESTAMP' and extract the entire line if found. Please kindly advise, thank you!
      Hi Experts,
      Forget my last question. I've managed to get it done, though I'm not sure I did it 'intelligently'. =P
      Can anyone please advise on this instead? Based on the current code below, how do I EXCLUDE lines that contain any of the names in the array, even when the strings 'gateway' & 'timestamp' are found in that line? E.g.
      ## To exclude extracting lines with username 'andrew' ##
      dcndksckdsnckdc gateway jdscjscjsdh timestamp andrew dkcd
      print "Specify full directory of logfile.\n";
      $file = <STDIN>;
      chomp $file;
      print "\nIdentified file is $file\n";
      print "Is this correct? (y/n)\n";
      $confirm = <STDIN>;
      chomp $confirm;

      if ($confirm eq "y") {
          print "\nCommencing work on $file.\n";
          workings();
      }
      else {
          print "Terminating program...\n";
      }

      ## Subroutine to open & read logfile
      sub workings {
          print"In Subroutine workings now.\n";
          open (LOGFILE, $file);
          print "Opening logfile...\n";
          open (OUT, ">>", 'D:\Temp\test.txt') or die "$!";
          print "Opening output file...\n";
          print "Checking for string GATEWAY...\n";
          while (<LOGFILE>) {
              if (/gateway/) {
                  $count = $count + 1;
                  print OUT;
                  print "Extracting line ...\n";
              }
              elsif (/TIMESTAMP/) {
                  $count = $count + 1;
                  print OUT;
                  print "Extracting line ...\n";
              }
          }
          print "\nTotal of $count lines extracted.\n";
          close OUT;
          close LOGFILE;
      }

      ##Array to contain usernames
      @names = qw(mddinzam khairulz sawaikha caoyx jchew khamshae hartinib
                  teev xiaolh yettynm yussofyu leegm narayam karuppa doraira
                  edanor razaliha sambanr jwong lamks linwb rashid ongsp ooisc
                  mohdrosn);

        That prints out the lines containing either or both of "gateway" and "TIMESTAMP". Your if is equivalent to

        if (/gateway/ || /TIMESTAMP/) {
            $count = $count + 1;
            print OUT;
            print "Extracting line ...\n";
        }

        However, I was under the impression you want lines containing both. If you want the lines containing both, your if would look like

        if (/gateway/ && /TIMESTAMP/) {
            $count = $count + 1;
            print OUT;
            print "Extracting line ...\n";
        }

        As for excluding the listed names, one way is to start by building a regex using one of the following two methods:

        my ($exclude_re) = map qr/$_/, join '|', map quotemeta, @names;

        or

        use Regexp::List qw( );
        my $exclude_re = Regexp::List->new()->list2re(@names);

        Then just make sure the line doesn't match that regexp:

        if (/gateway/ && /TIMESTAMP/ && !/$exclude_re/) {
            $count = $count + 1;
            print OUT;
            print "Extracting line ...\n";
        }
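As a rough, self-contained sketch of the exclusion approach, the following builds the regex with the quotemeta/join method and applies all three conditions to a few made-up lines. The shortened name list, the sample lines, and the helper keep_line are assumptions for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical, shortened name list standing in for the full @names array.
my @names = qw(andrew rashid jwong);

# quotemeta escapes regex metacharacters so each name is matched literally;
# the escaped names are then joined into one alternation and compiled once.
my ($exclude_re) = map qr/$_/, join '|', map quotemeta, @names;

# Hypothetical helper: returns 1 when the line contains both words
# and none of the excluded names, 0 otherwise.
sub keep_line {
    my ($line) = @_;
    return ($line =~ /gateway/ && $line =~ /TIMESTAMP/ && $line !~ $exclude_re) ? 1 : 0;
}

# Made-up sample input.
my @sample = (
    "aaa gateway bbb TIMESTAMP ccc\n",         # both words, no name: kept
    "aaa gateway bbb TIMESTAMP andrew ccc\n",  # excluded name present: dropped
    "aaa gateway only\n",                      # only one word: dropped
    "TIMESTAMP only\n",                        # only one word: dropped
);

my @kept = grep { keep_line($_) } @sample;
print @kept;
```

Note that the match is case-sensitive and that a name also matches inside a longer word; adding the /i modifier or \b anchors would change that, if that is what you need.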
      perl -ne "print if /gateway/ && /TIMESTAMP/" infile > outfile
      You can have multiple alternatives:

      if /(gateway|TIMESTAMP)/

      Put parentheses around the group of alternatives and a | between each alternative. Note that this matches lines containing either word, not only lines containing both.
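To show how the alternation differs from the && form, here is a small sketch that runs both tests over the same made-up lines. The sample data and the array names @either and @both are illustrative only.

```perl
use strict;
use warnings;

# Made-up sample input.
my @lines = (
    "x gateway y\n",
    "x TIMESTAMP y\n",
    "x gateway y TIMESTAMP z\n",
    "nothing here\n",
);

# Alternation: a line qualifies if it contains either word.
my @either = grep { /(gateway|TIMESTAMP)/ } @lines;

# Two separate matches joined with &&: a line qualifies only if it contains both.
my @both = grep { /gateway/ && /TIMESTAMP/ } @lines;

printf "either: %d lines, both: %d lines\n", scalar @either, scalar @both;
```

With this input the alternation selects three lines and the && form selects one, so pick whichever matches the behaviour you actually want.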