qhayaal,
It doesn't really matter what happens under the covers: the computer simply must go through the file line by line to find two things - the text you're cutting on (":"), and the newline that signals the start of the next line. Whether you do this via while(<$fh>) { my ($field) = split /:/; do_stuff_with($field); }, or via Text::xSV, or via my @fields = `cut -d: -f1 $file`, or even my @fields = `awk -F: '{print \$1}' $file`, the computer will go through the file, line by line, inspecting characters. (Note that with the split example, we give it a list context - split is special in that it can "see" how many fields are wanted, and will split into only one more field than that - so it will look for only one ":" in the string, which is already as efficient as split can get.)
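For example, here's the split version written out in full - a minimal sketch, where do_stuff_with stands in for whatever you actually do with the field, and $file holds your filename:

    open my $fh, '<', $file or die "Can't open $file: $!";
    while (<$fh>) {
        chomp;                        # only matters on a line with no ':'
        my ($field) = split /:/;      # list assignment: perl splits only as far as needed
        do_stuff_with($field);
    }
    close $fh;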
If you really think you need the speed, first try it with one of the above (I would recommend one of the first two). If it really is too slow (I doubt it will be), there are some optimisations you can make:
- Instead of split, use index to find the first ':', and use substr to extract the beginning of the line: my $field = substr($_, 0, index($_, ':'));. Watch out, though: if a line has no ':', index returns -1, and substr($_, 0, -1) silently hands you everything but the last character, so you want a guard (first sketch below). The speed difference may be enough - although I would never write code this way without first seeing whether split was fast enough, since split is so much easier to use, and since we're already using an optimised split.
- Instead of reading in an entire line, play with $/. However, this is reserved for really advanced users, IMO. (I think of myself as really advanced, and I would never do this.) The idea is to have the read operation itself scan the input for ':' as the separator of the first field, then reset $/ back to "\n" for slurping the rest of the line, so you can set it back to ':' for the first field of the next line (second sketch below). This way you scan each character only once - whether it's for ':' or for "\n". It is also dangerous, though: if a line has no ':', you will read it in together with the next line, and get a vastly different answer. To avoid that, you could look for "\n"s in the input and split on those ... but now we're back to scanning each character twice - once for "\n" and once for ":".
- And next, the same as the last idea, except use awk to do it (third sketch below). Awk can scan for ':' and "\n" both at the same time (looping through each character once). The disadvantages are that you need to spawn another process (some overhead here, which may eat up any savings), and that awk programming is harder than perl programming, IMO :-}.
- Finally, write your own input routine which reads a block of data into memory and loops through the characters one at a time (fourth sketch below). This is only for the seriously advanced, though, since it'll be really easy to munge this.
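First, the index/substr variant, with the no-':' guard mentioned above (same hypothetical do_stuff_with as before):

    while (<$fh>) {
        chomp;
        my $pos = index($_, ':');
        my $field = $pos >= 0 ? substr($_, 0, $pos) : $_;   # whole line if no ':'
        do_stuff_with($field);
    }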
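Second, the $/ games, for the morbidly curious. This is a sketch only - as warned above, a line with no ':' will silently swallow the next line into $field:

    open my $fh, '<', $file or die "Can't open $file: $!";
    {
        local $/ = ':';                    # the read now stops at the first ':'
        while (defined(my $field = <$fh>)) {
            chomp $field;                  # chomp honours $/, so this strips the ':'
            {
                local $/ = "\n";           # back to normal just long enough...
                my $rest = <$fh>;          # ...to throw away the rest of the line
            }
            do_stuff_with($field);
        }
    }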
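Third, if you do go the awk route, you can at least stream its output through a pipe instead of slurping it all into an array (the list form of open avoids any shell-quoting headaches):

    open my $awk, '-|', 'awk', '-F:', '{print $1}', $file
        or die "Can't spawn awk: $!";
    while (my $field = <$awk>) {
        chomp $field;
        do_stuff_with($field);
    }
    close $awk or warn "awk exited with status $?";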
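And fourth, the rough shape of the roll-your-own block reader. I cheat here and use index within the buffer rather than a true character-by-character loop, since a per-character loop in pure perl would almost certainly be slower than split, not faster:

    open my $fh, '<', $file or die "Can't open $file: $!";
    my $buf = '';
    while (read($fh, $buf, 65536, length $buf)) {      # append each block to the buffer
        while ((my $nl = index($buf, "\n")) >= 0) {
            my $line = substr($buf, 0, $nl);
            substr($buf, 0, $nl + 1, '');              # drop the line plus its "\n"
            my $pos = index($line, ':');
            do_stuff_with($pos >= 0 ? substr($line, 0, $pos) : $line);
        }
    }
    if (length $buf) {                                 # a final line with no trailing "\n"
        my $pos = index($buf, ':');
        do_stuff_with($pos >= 0 ? substr($buf, 0, $pos) : $buf);
    }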
Now, having said all that, I want to reiterate:
TEST OUT THE SPLIT (or Text::xSV) FIRST. It's probably more than fast enough, with the least amount of effort. Most of the other suggestions above will shave only a fraction of a percent off the time, if they shave anything at all, while eating huge amounts of programmer time and creating plenty of bugs to find and eradicate.