in reply to Re: Searching through text files
in thread Searching through text files

I have to disagree. Depending on how complex your regex is and what you want to do once you find the string in the file, grep is usually NOT more efficient than Perl. One major reason is that grep uses a text-directed (DFA-style) regex engine with POSIX "longest match" semantics: even after it has found a match, it has to keep scanning to see whether a longer one is possible. Perl uses a regex-directed (NFA-style) engine that returns the leftmost match it finds first, so the instant it has a match, the Perl regex engine returns it and moves on.
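
A quick way to see the difference (the string and pattern are just a toy example, along the lines of the ones in Friedl's book):

    # Perl's regex-directed engine reports the first match it finds:
    my $str = 'oneselfsufficient';
    $str =~ /one(self)?(selfsufficient)?/;
    print "Perl matched '$&'\n";    # prints: Perl matched 'oneself'

    # A POSIX longest-match engine (like the one in a traditional grep or awk)
    # is required to report 'oneselfsufficient' for the same pattern, because
    # it has to keep going until it knows no longer match is possible.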

I've seen this demonstrated when dealing with very large log files (>2 GB). Perl was able to do in a few minutes what grep took 10+ minutes to accomplish.

Check out Jeffrey Friedl's book Mastering Regular Expressions. It's an absolutely fascinating read on regexes and regex engines.

Later

Re: Re: Re: Searching through text files
by ambrus (Abbot) on Mar 24, 2004 at 14:00 UTC

    While this is definitely true for some regexps, Anonymous says in his question:

    I have to write a script to search for a string, it has to search for a string in about a thousand different text files.

    For strings, probably grep -F is fast enough.
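
    For example, something like this (the *.txt glob is just a placeholder for however you name those thousand files):

        grep -F -l -- 'your string here' *.txt
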

    If, however, the needle string contains newlines or nul characters, then this may be difficult to achieve with grep, so it may be better for perl. Also, if the file has very long lines (or no newlines at all), you can't make grep print only where the string is, it either wants to print the whole line, or the line number, or only give you a truth value. In such cases, Perl may be better (or some other program). Also, on windows, if you only have find installed, no real grep, you may have to use Perl.