Snowman has asked for the wisdom of the Perl Monks concerning the following question:

I need to check the first line of a large number (>100) of files for the presence of '#!...' and return the line.

Obviously I can open each file and check the first line, but this is time consuming.

Is there a quick way to do this?

--
The Snowman
snowman@notreally.co.uk

Replies are listed 'Best First'.
Re: Quickly reading the first line of a large number of files...
by scain (Curate) on Aug 08, 2001 at 17:31 UTC
    Non-Perl answer:

        head -1 *files | grep "#!" > shebang.txt

    Obviously a unix/linux answer.

    Scott

    D'oh! Before anyone else writes in, this won't exactly work, since you'll only get the #! line but not the file it came from. Gimme a second...

    Update 2: OK, you didn't ask for the file it came from, so it is a workable solution. If you want the file as well, do the head and then something like this:

        @temp = `head -1 *files`;
        for ($i = 0; $i < scalar @temp; $i++) {
            if ($temp[$i] =~ /^#!/) {    # fixed a typo
                # $temp[$i-1] is head's "==> filename <==" header for that file
                print "$temp[$i-1], $temp[$i]\n";
            }
        }
Re: Quickly reading the first line of a large number of files...
by arturo (Vicar) on Aug 08, 2001 at 17:34 UTC

    In order to read the first line, you have to open the file one way or another; the only question is whether you do it in Perl or not. I suppose you could write a routine in C, but since you're asking on a Perl site, that's probably not in the cards. You could use (e.g.) your system's grep utility, telling it to print the filename and line number for each match, and then filter that list for matches that fall on the first line -- but that will scan the whole file each time.
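    Something along these lines (untested sketch; assumes GNU grep's -H and -n switches and a list of filenames already sitting in @files):

        # grep -H -n prefixes every match with "filename:lineno:",
        # so keep only the hits that occur on line 1.  grep still
        # scans each whole file, as noted above.
        my @hits = `grep -H -n '^#!' @files`;
        for my $hit (@hits) {
            my ($file, $lineno, $line) = split /:/, $hit, 3;
            print "$file: $line" if $lineno == 1;
        }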

    The best place to look for a speedup would be, I suppose, to cut down on the number of files you have to search by using file tests -- certainly skip directories, for example, and maybe you could assume that only executable files have #! lines worth looking at, not that that's necessarily a safe assumption.
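    Untested sketch of that kind of pruning, assuming the candidate names are already in @files:

        # keep plain files only; -x is the optional (and not entirely
        # safe) shortcut of assuming only executables carry a #! line
        my @candidates = grep { -f $_ && -x _ } @files;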

    HTH

    perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); +$rose = "smells sweet to degree $n"; *other_name = *rose; print "$oth +er_name\n"'
Re: Quickly reading the first line of a large number of files...
by $code or die (Deacon) on Aug 08, 2001 at 17:37 UTC
    Obviously I can open each file and check the first line, but this is time consuming.

    I can't think of any other way of doing it without "something" having to open the file!
        my @perlscripts;
        foreach (@files) {
            open HANDLE, $_ or die "can't open $_";
            push @perlscripts, $_ if <HANDLE> =~ /^#!/;
            close HANDLE;
        }
        print $_, "\n" for @perlscripts;
    Error: Keyboard not attached. Press F1 to continue.
Re: Quickly reading the first line of a large number of files...
by Hofmator (Curate) on Aug 08, 2001 at 17:33 UTC

    If you're on a *nix system then maybe use the head command, something like

        my @lines = grep {/^#!/} `head -1 -q *`;

    but whether the overhead of the system call is outweighed by the quicker execution depends on what you do with it. Benchmark it ...
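    A quick way to settle it (untested sketch; assumes a perl whose Benchmark module provides cmpthese, and the filenames in @files):

        use Benchmark qw(cmpthese);
        cmpthese(-5, {
            'perl open' => sub {
                for my $f (@files) {
                    open FH, $f or next;
                    my $first = <FH>;
                    close FH;
                }
            },
            'head -1'   => sub { my @lines = `head -1 -q @files` },
        });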

    -- Hofmator

Re: Quickly reading the first line of a large number of files...
by Snowman (Acolyte) on Aug 08, 2001 at 17:41 UTC

    The list of files has already been stripped of those ending in recognizable extensions.

    I think it's going to be a case of getting out Benchmark and testing Perl's open against unix 'head -1'...

    Thanks

    --
    The Snowman
    snowman@notreally.co.uk
      If all of the files are in the same directory with many others, then that may be one of your performance problems. Many file operations have to scan the directory listing, and as the listing gets long, this gets inefficient.

      If you can scatter your files across many different directories in some way, you will probably get a performance increase. I believe that the Reiser filesystem is also specifically designed to be fast in this case, and others probably are as well, so you may be able to improve performance at the OS level. (Then again, you may not, and that kind of performance tweak is likely to be forgotten when the next admin looks at the system.)

Re: Quickly reading the first line of a large number of files...
by c-era (Curate) on Aug 08, 2001 at 17:32 UTC
    The only other alternative I can think of is to use a system call, and that usually takes longer. I didn't find anything with a quick search on CPAN, but someone will correct me if I am wrong.