G'day TJCooper,

I see you already have answers regarding the main thrust of your question, i.e. "Memory usage". My response here touches on other aspects of your posted code.

Using [$index], [$index+1] and [$index+2] does not make it clear what data you're accessing. That makes the code harder to read and maintain, and more error-prone. Consider the improvement in clarity if those appeared as these alternatives:

[$index]   -> [$index_of{Strand}]
[$index+1] -> [$index_of{Type}]
[$index+2] -> [$index_of{Pos}]

In "Re^2: Memory usage while tallying instances of lines in a .txt file", you show two potential formats for your input data. In the first, the wanted columns sit at the positions you've hard-coded; in the second, they keep the same relative order but are shifted one position along because an additional column has been added before them. Given that your input format already varies, it could change again in the future: for instance, a column could be added between your wanted columns, or the order of those columns could change.

You can achieve the improvement in clarity indicated above, get rid of the need to load a module (i.e. List::Util) to handle a few dozen bytes of an 800MB file, and protect yourself against future changes, with this line of code:

@index_of{@headers} = 0 .. $#headers;

See "perldata: Slices" if you're unfamiliar with that construct. Here's example code using your two current formats and two potential future ones:

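#!/usr/bin/env perl

use strict;
use warnings;

# Append a newline to every print (the job the -l switch was doing).
# Some systems pass "perl -l" to env as a single argument, so the
# switch is not portable on an env shebang line.
$\ = "\n";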

my @test_headers = (
    [qw{Strand Type Pos Length Form Adjustment}],
    [qw{ID Strand Type Pos Length Form Adjustment}],
    [qw{Strand XXX Type Pos Length Form Adjustment}],
    [qw{Pos Type Length Strand Form Adjustment}],
);

for (@test_headers) {
    my @headers = @$_;
    my %index_of;
    @index_of{@headers} = 0 .. $#headers;
    print "Headers:      @headers";
    print "Strand index: $index_of{Strand}";
    print "Type index:   $index_of{Type}";
    print "Pos index:    $index_of{Pos}";
}

Output:

Headers:      Strand Type Pos Length Form Adjustment
Strand index: 0
Type index:   1
Pos index:    2
Headers:      ID Strand Type Pos Length Form Adjustment
Strand index: 1
Type index:   2
Pos index:    3
Headers:      Strand XXX Type Pos Length Form Adjustment
Strand index: 0
Type index:   2
Pos index:    3
Headers:      Pos Type Length Strand Form Adjustment
Strand index: 3
Type index:   1
Pos index:    0
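
To tie this back to tallying, here is a minimal sketch of how %index_of might be used on actual data rows. I haven't seen your full data, so the tab separator, the sample rows and the %tally key layout are all assumptions for illustration only; in a real script the lines would come from your filehandle rather than an array.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical sample data: a header line followed by tab-separated rows.
my @lines = (
    join("\t", qw{ID Strand Type Pos Length Form Adjustment}),
    join("\t", qw{a1 + exon 100 50 x 0}),
    join("\t", qw{a2 - exon 200 60 y 1}),
    join("\t", qw{a3 + exon 100 70 z 2}),
);

my ($header_line, @rows) = @lines;

# Map each header name to its column position.
my @headers = split /\t/, $header_line;
my %index_of;
@index_of{@headers} = 0 .. $#headers;

# Look up the wanted positions once, outside the loop.
my @wanted = @index_of{qw{Strand Type Pos}};

# Count how often each Strand/Type/Pos combination occurs.
my %tally;
for my $row (@rows) {
    my ($strand, $type, $pos) = (split /\t/, $row)[@wanted];
    $tally{"$strand\t$type\t$pos"}++;
}

for my $key (sort keys %tally) {
    print "$key => $tally{$key}\n";
}
```

Because the positions are looked up by name, the loop body stays the same if the header line gains, loses or reorders columns.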

Another potential improvement would be to read your input with Text::CSV (which will run more quickly if you also have Text::CSV_XS installed). CSV stands for comma-separated values; however, by changing the "sep_char" attribute, the module works equally well for tab-, pipe-, whatever-separated values. Whenever you need to deal with data in these types of formats, I'd recommend reaching for this module first and only attempting to roll your own custom solution as a last resort.
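
As a rough sketch of what that could look like for tab-separated input (the sample data and the choice of columns are assumptions; a real script would open your data file instead of an in-memory string):

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Text::CSV;

# Hypothetical tab-separated sample held in memory.
my $data = join "\n",
    join("\t", qw{ID Strand Type Pos}),
    join("\t", qw{a1 + exon 100}),
    join("\t", qw{a2 - exon 200}),
    '';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# sep_char switches the parser from commas to tabs.
my $csv = Text::CSV->new({ binary => 1, sep_char => "\t", auto_diag => 1 });

# Use the first line as column names so each row comes back as a hash ref.
$csv->column_names(@{ $csv->getline($fh) });

my @records;
while (my $row = $csv->getline_hr($fh)) {
    my @fields = @{$row}{qw{Strand Type Pos}};
    push @records, \@fields;
    print "@fields\n";
}
close $fh;
```

With getline_hr the fields are addressed by header name, so this gives the same protection against column changes as the %index_of approach.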

— Ken

In reply to Re: Memory usage while tallying instances of lines in a .txt file by kcott
in thread Memory usage while tallying instances of lines in a .txt file by TJCooper
