(tye)Re: get the line of ith occurrence of '>' without OPENING THE FILE

I can understand everyone saying that what you are saying doesn't make sense.

I just wanted to drop a quick note saying there actually are cases where you can't open a huge file from within Perl but where you can do:

    cat huge.file | perlscript

It is a strange consequence of how "large file support" was retrofitted into some operating system(s). In order to open a really large file (typically one over 2GB) under these operating systems you have to use a special version of open() that Perl will only use if you've compiled Perl to include "large file support". The operating system prevents you from opening the file otherwise, just in case you plan on using seek() on the file handle.

In some ways I think this is a pretty silly decision. But I haven't analyzed the situation much, and I can still understand at least the temptation to do it, so it is probably a reasonable design that just looks stupid until you understand the trade-offs better.

In any case, such operating system(s) will have a program called 'cat' that knows how to open and read such huge files so you just need to use:

    open( FILE, "cat $filename |" ) or die...

and make very sure that $filename doesn't contain "interesting" characters. I'd show the secure way to do that, but I don't think there is one simple answer that covers all of the variables (versions of Perl, versions of operating systems, etc.). I guess, since we are assuming 'cat', we can also assume /bin/sh and so

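    open( FILE, qq<cat \Q$filename\E |> )
         or  die ...;

is probably bullet-proof, short of a newline in $filename. (Note that the \Q...\E part is deliberately not wrapped in double quotes: inside double quotes /bin/sh keeps most of the backslashes that \Q adds.)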
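
For what it's worth, a Perl new enough to support the list form of a piped open (5.8 and later, as far as I know) can skip /bin/sh entirely, so no quoting is needed at all. A minimal sketch, where the sub name open_via_cat is made up for illustration and is not from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: read a file through 'cat' without involving a
# shell. With the list form of open, each argument reaches cat as-is,
# so spaces, quotes, and '$' in the name are harmless.
# Caveat: a name starting with '-' would still be taken by cat as an
# option, so pass "./-name" in that case.
sub open_via_cat {
    my ($filename) = @_;
    open( my $fh, '-|', 'cat', $filename )
        or die "Can't run cat on $filename: $!";
    return $fh;
}
```

Called as `my $fh = open_via_cat($filename);`, the returned handle reads like any other, e.g. `while ( my $line = <$fh> ) { ... }`. It does not help on a Perl too old for list-form pipes, which is where the /bin/sh version still earns its keep.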

- tye (echo "but my friends call me "'"'"$Tye"'"')

P.S. Please, please, please! Register a username at the site. It will make it easier for us to help you when you have these problems figuring out why you think your question has disappeared and will make it easier for you to find your questions and their answers.

Failing that, please, please, please, don't keep posting the same question over and over! You need to have a little patience. When you post a question, it doesn't appear on the main pages of the web site right away. That does not mean that your question has been lost and it doesn't even mean that there aren't people already reading your question and writing answers to it.


Replies are listed 'Best First'.
Re: (tye)Re: get the line of ith occurrence of '>' without OPENING THE FILE by clintp (Curate) on Oct 05, 2002 at 23:13 UTC
If we're going as far as having access to "cat" and "shell", then it might be fun to tinker with "dd" as well. Then the OP can get to just the part of the file he wants:

    open(THIRDMB, "dd if=$filename bs=1024 skip=2048 count=1024 |") or die;
    while(<THIRDMB>) { ... }   # Reads third MB of file only.
    close(THIRDMB);

dd's useful in Unix for skipping around in large files at the shell level where seek(2) might be a bit more complicated.
Re: (tye)Re: get the line of ith occurrence of '>' without OPENING THE FILE by richardjfrench (Initiate) on Oct 05, 2002 at 22:47 UTC
Hi, I'm new to Perl so I don't know how to do it that way. Conceptually though, from a generalised UNIX/Linux standpoint:

    cat filename | wc -l

Say this returns 80000.

    head -40000 filename > file1sthalf
    tail -40000 filename > file2ndhalf
    head -20000 file1sthalf > filename1stquarter
    tail -20000 file1sthalf > filename2ndquarter

...iterate for file2ndhalf
...then cut the resulting 4 quarters into 8ths by the same method

This doesn't count and evenly distribute your >'s but it will give you 8 smaller files to play with. This method is crude and not at all Perlesque but I just don't have the knowledge ... yet: I'd be interested to see the Perl code to do what I described. Then run some Perl code to do your line of i-th occurrence.

Rich

In the Gulf of Mexico No one can hear you scream
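
The splitting isn't strictly needed for the original question, since a filter fed by cat only ever holds one line at a time. A rough sketch in Perl, assuming "ith occurrence" means the i-th '>' character overall (the thread never pins that down) and using a made-up sub name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based line number and the text of the
# line holding the $i-th '>' character read from $fh, or an empty list if
# the input has fewer than $i of them.
# Assumption: every '>' counts, even several on one line.
sub line_of_ith_gt {
    my ( $fh, $i ) = @_;
    return if $i < 1;
    my ( $seen, $lineno ) = ( 0, 0 );
    while ( my $line = <$fh> ) {
        $lineno++;
        $seen += ( $line =~ tr/>// );    # tr counts '>' without changing $line
        return ( $lineno, $line ) if $seen >= $i;
    }
    return;
}
```

In a script fed by `cat huge.file | perlscript 3`, the call would be something like `my ( $n, $text ) = line_of_ith_gt( \*STDIN, shift @ARGV );`. If only lines that start with '>' should count, swapping the tr for a test such as `$line =~ /^>/` is the obvious variation.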
Re: Re: (tye)Re: get the line of ith occurrence of '>' without OPENING THE FILE by grazer (Sexton) on Oct 07, 2002 at 14:05 UTC
Why this:

    cat filename | wc -l

when you can do:

    wc -l filename

useless use of cat awards