shynee has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have 26 Cisco switch dumps. These are large files with close to a million files each. I tried opening the individual files and reading them, but it is extremely slow. I ended up using Unix grep instead, which is fast.

Here is my code.

 my $hba_alias = `grep $host_hi $switchdir/*.swc|grep ":  device-alias"|head -1`;

At some places I am grepping individual files like this

 my $hba_swport = `grep $hba_wwn $switchdir/$switchfqdn|grep "^fc"|head -1`;

I want to grep without opening the files, which is a lot faster, and also get away from using Unix commands. Any ideas how I can accomplish this?

TIA

-Shynee

Re: grep equivalent in perl
by afoken (Chancellor) on Nov 12, 2011 at 18:39 UTC

    Please use <c> and </c> around your code, as you were told when you wrote your posting. Please update your existing posting now.

    I think your problem is that you read the entire file into memory. That does not work well for large files.

    To re-implement grep in Perl, use open, a while loop that reads the file line by line, a pattern match, and finally close. If you need to search more than one file, wrap that in another loop that iterates over all the files to be searched.

    Skeleton:

    foreach my $fn (qw( /what/ever.file /another/file.txt /third/file/here )) {
        open my $f, '<', $fn or die "Can't open '$fn': $!";
        while (<$f>) {    # <-- special case, RTFM
            chomp;        # <-- you don't want the newlines, do you?
            if (/interesting/) {
                print "Found in line $. of file '$fn': $_\n";
            }
        }
        close $f;
    }
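    The skeleton above can be wrapped into a reusable sub. This is a minimal sketch, not part of the original reply: `grep_files` is a hypothetical name, and the optional `$limit` argument is an assumption meant to play the role of `| head -N`. It reads one line at a time, so memory use does not grow with file size, and it stops reading as soon as enough matches have been collected.

```perl
use strict;
use warnings;

# grep_files: hypothetical helper (not a core or CPAN function) that mimics
# `grep PATTERN FILE...`, optionally stopping after $limit matches the way
# `| head -N` would. Files are read line by line, never slurped.
sub grep_files {
    my ($regex, $limit, @files) = @_;
    my @hits;
    FILE: for my $fn (@files) {
        open my $fh, '<', $fn or do { warn "Can't open '$fn': $!\n"; next FILE };
        while (my $line = <$fh>) {
            next unless $line =~ $regex;
            chomp $line;
            push @hits, [ $fn, $., $line ];    # file name, line number, text
            # Stop reading everything once we have enough matches.
            last FILE if defined $limit && @hits >= $limit;
        }
        close $fh;
    }
    return @hits;
}
```

    For the second command in the question, something like `my ($hit) = grep_files(qr/^fc.*\Q$hba_wwn\E/, 1, "$switchdir/$switchfqdn");` would return the first `fc` line mentioning the WWN (text in `$hit->[2]`), assuming the WWN appears on the same line after the `fc` prefix.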

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: grep equivalent in perl
by graff (Chancellor) on Nov 12, 2011 at 19:36 UTC
    Based on the OP snippets, it looks like you just want a small amount of output from all those input files: if the value of "$host_hi" occurs together with ": device-alias" in any of the *.swc files, you only keep the first matching line, from the first file where a match was found. If that's all you really need, you can save a lot of time by only reading data until a match is found:
    my $switchdir = '/some/path';
    my $host_hi   = qr/some pattern of interest/;
    my $hba_alias;
    my $found_name;
    for my $name ( <$switchdir/*.swc> ) {
        open my $in, '<', $name or do { warn "unable to open $name: $!\n"; next; };
        while (<$in>) {
            if ( /$host_hi/ and /:\s+device-alias/ ) {
                $hba_alias  = $_;
                $found_name = $name;
                last;
            }
        }
        last if $hba_alias;    # as soon as we find a match, we're done
    }
    print "match found in $found_name: $hba_alias";
    (updated to make sure the matched file name would be accessible after exiting the for loop)

    As for grepping in individual files for other patterns (and assigning initial matches, if any, to other variables), it depends on what you really are doing, but I'd be inclined to set up a hash of file names with patterns to search for, and would use that to store matches that are found, as well:

    my %seeking = (
        "$switchdir/$switchfqdn" => { regex => qr/^fc.*?$hba_wwn/ },
        # ... more filename => { regex => ... } entries
    );
    for my $filename ( keys %seeking ) {
        open my $in, '<', $filename or do { warn "$filename: $!\n"; next; };
        while (<$in>) {
            if ( /$seeking{$filename}{regex}/ ) {
                $seeking{$filename}{found} = $_;
                last;
            }
        }
        if ( $seeking{$filename}{found} ) {
            print "found in $filename: $seeking{$filename}{found}";
        }
        else {
            warn "no matches in $filename for $seeking{$filename}{regex}\n";
        }
    }
    And if any of the specific file searches involve *.swc files, there's probably a reasonable way to include the file-specific searches inside the loop that handles all *.swc files, so that if you happen to be looking for more than one pattern in a given file, you can get all the results you want without having to read any file more than once.
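    A minimal sketch of that single-pass idea, under stated assumptions: `scan_once` is a hypothetical helper, and it records only the first match of each pattern across the files in the order given, which mirrors `| head -1`. Each line is tested against every pattern still outstanding, and reading stops as soon as all of them have been found.

```perl
use strict;
use warnings;

# scan_once: hypothetical helper, not a core or CPAN function.
# $wanted maps a label to a compiled regex. For each label it records the
# first matching line (searching @files in order) as [file name, line].
# Each file is read at most once, and reading stops as soon as every
# pattern has been matched.
sub scan_once {
    my ($wanted, @files) = @_;
    my $need = keys %$wanted;    # number of patterns to satisfy
    my %found;
    FILE: for my $fn (@files) {
        open my $fh, '<', $fn or do { warn "$fn: $!\n"; next FILE };
        while (my $line = <$fh>) {
            chomp $line;
            for my $label (keys %$wanted) {
                next if exists $found{$label};    # already have this one
                $found{$label} = [ $fn, $line ] if $line =~ $wanted->{$label};
            }
            last FILE if keys(%found) == $need;   # nothing left to look for
        }
    }
    return \%found;
}
```

    Testing every remaining pattern on every line costs a little more CPU per line, but with files this large the time spent reading them is likely to dominate, which is why reading each file only once pays off.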

    One more update: you didn't give any clues about what sorts of patterns you're looking for, but reading the file contents in Perl and applying Perl regexes to the data is both a lot safer and a lot more flexible than trying to interpolate a regex into a shell command line that you run via backticks or a system() call. Backslashes, whitespace, parens, brackets, *? and so on are pretty common in regexes, and pretty hard to control when passing stuff to a shell.
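    To make that point concrete, here is a small sketch (the sub name `lines_containing` is made up for illustration). It treats the search string as literal text via `\Q...\E`, so any regex metacharacters in it are escaped, and because the match runs inside Perl there is no shell quoting to get right.

```perl
use strict;
use warnings;

# lines_containing: hypothetical helper returning the lines of $file that
# contain $needle as a literal substring. \Q...\E (quotemeta) escapes regex
# metacharacters such as ( ) * ? in $needle; no shell is involved, so spaces
# and quotes in $needle need no extra escaping either.
sub lines_containing {
    my ($needle, $file) = @_;
    open my $fh, '<', $file or die "Can't open '$file': $!";
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, $line if $line =~ /\Q$needle\E/;
    }
    close $fh;
    return @hits;
}
```

    If the value is always a plain string, `index($line, $needle) >= 0` is an equivalent literal test that avoids the regex engine altogether.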

Re: grep equivalent in perl
by ww (Archbishop) on Nov 12, 2011 at 18:55 UTC
    Title question: "grep equivalent in Perl"?
                Answer: grep

    But that's not the same as the key question in your narrative, namely "how do I 'grep without opening the files...'?"

    At least as I understand it, that translates to "I want to read something (inside) files without opening the file."

    Better to try for an answer to that at Magic_Monks or Sorcerers,Inc.

    So, looking for another approach involving Perl, and assuming you meant, in your second sentence, either (less likely) 'these are close to a million large files' or (more likely, I guess[1]) 'these files contain close to a million lines each':

    • Are you opening any of these files more than once? If so, refactoring to avoid redundant opens might help.
    • Are your pipes causing your system to thrash the swap disk? If so, do you have enough RAM to hold an entire file, and can you refactor to take advantage of that?
    [1] Making us guess what you mean can be a matter of ignorance of the subject matter, sloppy writing, or sloppy thought. If it's not the first of those, see I know what I mean. Why don't you?; in fact, see it anyway.
    See also: On asking for help and How do I post a question effectively?.
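    If the answer to the RAM question is yes, one way to avoid redundant opens is to read each file once and keep its lines around for later searches. This is a sketch under that assumption; `%cache`, `lines_of` and `first_match` are hypothetical names, and `//=` needs Perl 5.10 or later. If each file really holds close to a million lines, check memory before caching all 26 at once.

```perl
use strict;
use warnings;

# Hypothetical in-memory cache: file name => array ref of chomped lines.
# Each file is opened and read only once; later searches reuse the copy.
# Only sensible when all cached files fit in RAM together.
my %cache;

sub lines_of {
    my ($file) = @_;
    return $cache{$file} //= do {
        open my $fh, '<', $file or die "Can't open '$file': $!";
        chomp(my @lines = <$fh>);
        \@lines;
    };
}

# first_match: first cached line of $file matching $regex.
sub first_match {
    my ($regex, $file) = @_;
    for my $line (@{ lines_of($file) }) {
        return $line if $line =~ $regex;
    }
    return;    # undef in scalar context
}
```

    The trade-off is deliberate: later searches never touch the disk, but a change to a file after its first read is not seen until the cache entry is deleted.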
Re: grep equivalent in perl
by davido (Cardinal) on Nov 12, 2011 at 19:18 UTC

    How can you expect us to help you with code you didn't show?

    The specification: You would like to search a bunch of really big files and print lines that match triggers.

    Qualifying the solution: You need speed, and don't want to use system grep.

    What you've tried so far: ????????????????

    How ???????????? failed: It was too slow.

    What you tried instead: system call to grep

    How that failed: You don't want to use a system call.

    What you need: Help with ???????????????? so that it can be faster.

    What we can do for you: Look at ????????????????? and offer suggestions on how ?????????????? might be improved upon so that it's faster.

    Where we must start: ?????????????????????

    Does this make sense? You are seeking help with a solution that you came up with that was too slow. But instead of showing us that solution, you provided us with an alternative that you don't want to use because it requires a system call. We could come up with a lot of ideas on how to do what you're asking for, but how do we know if our solution is any better than the one you tried but didn't show us? I'm not going to spend time working up a solution only to have you tell me that you already tried that.


    Dave

Re: grep equivalent in perl
by hbm (Hermit) on Nov 16, 2011 at 21:46 UTC

    In your first example, you grep all of your "large" files, and then use only the first result. How many large files are you opening unnecessarily?

    How does this compare:

    my $hba_alias = `for f in $switchdir/*.swc ; do grep $host_hi \$f && break; done |grep ": device-alias"`;

    That's analogous to graff's last if $hba_alias;
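    For comparison, the same early-exit logic can be written without the shell. A sketch, with `first_file_hits` as a hypothetical name: it collects the lines of the first file that match the host pattern, skips the remaining files, and then filters for the device-alias lines. Like the shell loop, it stops at the first file containing the host at all, even if that file has no device-alias line; graff's loop instead keeps going until a single line matches both conditions.

```perl
use strict;
use warnings;

# first_file_hits: hypothetical helper mirroring the shell loop
#   for f in FILES; do grep HOST $f && break; done | grep ": device-alias"
# Collect the lines of the first file that match $host_re, skip the
# remaining files, then keep only the lines that also match $filter_re.
sub first_file_hits {
    my ($host_re, $filter_re, @files) = @_;
    my @hits;
    for my $fn (@files) {
        # An unreadable file is skipped, like grep's error exit in the loop.
        open my $fh, '<', $fn or do { warn "$fn: $!\n"; next };
        while (my $line = <$fh>) {
            push @hits, $line if $line =~ $host_re;    # grep HOST $f
        }
        close $fh;
        last if @hits;                                 # && break
    }
    return grep { $_ =~ $filter_re } @hits;            # | grep ": device-alias"
}
```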