Re^2: Using hash keys to separate data

Marshall,

I suppose I was over-thinking it. Your method looks like it reads the file in a single pass, and for the large input file I'm working with, I think that will be beneficial.

I only included the key list because I thought it would be the easiest way to split the input data into separate files. However, the input data is already well sorted, so your method should work.
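Here is a minimal Perl sketch of the single-pass approach discussed in this thread. It is not Marshall's actual code, which is not quoted in this post. It assumes the key is the first whitespace-separated field and that the input is sorted, so all lines for a key are adjacent. The helper name split_sorted_by_key and the callback that opens an output handle per key are hypothetical. If the input were not sorted, a key that reappeared later would reopen its file with '>' and overwrite the earlier lines.

```perl
use strict;
use warnings;

# Hypothetical helper: copies each line of key-sorted input to an output
# handle chosen by the line's first whitespace-separated field. Because the
# input is sorted, all lines for a key are contiguous, so the file is read
# once from top to bottom and only one output handle is open at a time.
sub split_sorted_by_key {
    my ($in_fh, $open_out) = @_;   # $open_out->($key) must return a writable handle
    my ($current_key, $out_fh);
    while (my $line = <$in_fh>) {
        next if $line =~ /^\s*$/;                   # ignore blank lines
        my ($key) = split ' ', $line;               # first field is the key
        if (!defined $current_key || $key ne $current_key) {
            close $out_fh if $out_fh;               # finished with the previous key
            $out_fh      = $open_out->($key);
            $current_key = $key;
        }
        print {$out_fh} $line;
    }
    close $out_fh if $out_fh;
    return;
}

# Example usage (file names are made up): one "<key>.txt" file per key.
# open my $in, '<', 'input.txt' or die "input.txt: $!";
# split_sorted_by_key($in, sub {
#     my ($key) = @_;
#     open my $fh, '>', "$key.txt" or die "$key.txt: $!";
#     return $fh;
# });
```

Passing a callback keeps the file-naming decision outside the loop. That makes it easy to test with in-memory handles or to change the output layout.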

One more general question: even with my code and the suggestions everyone has given, I still get a warning message, although the output is correct. Is there any way to get rid of it, or is it just something I will have to live with? The message is about uninitialized values, and I was trying to fix this earlier. Still, I suppose that if the output is correct, that is all that matters.

Re^3: Using hash keys to separate data
by Marshall (Canon) on Jun 29, 2011 at 22:43 UTC

Yes, once the data is sorted, the algorithm just reads the file once, in a linear pass. So this should be great for your humongous file.

The warning message should give you a line number in the code, and often you also get the line number of the input file. One common cause of an uninitialized value is a blank line in the file: split then returns no fields, so the variables you assign from it stay undefined. An extra carriage return is easy to miss because it is "invisible". I often put next if /^\s*$/; at the top of the read loop, which skips to the next input line if the current line contains nothing but whitespace.
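Here is a minimal sketch of that blank-line guard inside a typical read loop. The sub name read_records, the two-field record layout and the sample data are assumptions for illustration. Without the next line, a blank line would make split return an empty list and leave $name and $value undefined. Interpolating them would then produce "Use of uninitialized value" warnings. While a filehandle is being read, Perl usually appends the input line number (from $.) to such warnings.

```perl
use strict;
use warnings;

# Hypothetical reader: each non-blank line is assumed to hold a name and a value.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;              # skip empty and whitespace-only lines
        chomp $line;
        my ($name, $value) = split ' ', $line; # both would be undef on a blank line
        push @records, "$name=$value";         # the interpolation that would warn
    }
    return @records;
}

# Demo on a made-up sample that contains an empty and a whitespace-only line.
my $sample = "apple 3\n\n   \npear 5\n";
open my $sample_fh, '<', \$sample or die "cannot open in-memory file: $!";
print "$_\n" for read_records($sample_fh);     # prints apple=3, then pear=5
```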

I think I already mentioned that you should normally split on whitespace, /\s+/, which is essentially what the default split does. Whitespace (\s) includes the space itself plus \n, \r, \f and \t, and any contiguous run of those characters is treated as a single separator and removed. Splitting only on tab characters (\t) can cause problems if the data sometimes contains extra space characters that you cannot see in the editor.
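Here is a small illustration of the difference, using a made-up line that has leading spaces and fields separated by a mix of tabs and spaces. One detail is worth knowing: split with the special pattern ' ' (which is what a bare split uses) also ignores leading whitespace. An explicit /\s+/ instead produces an empty first field when the line starts with whitespace.

```perl
use strict;
use warnings;

# Made-up sample line: leading spaces, then fields separated by a mix of
# tabs and spaces that look identical in most editors.
my $line = "  alpha\t beta  \tgamma";

my @by_default = split ' ', $line;   # special ' ' pattern: runs of whitespace, leading whitespace ignored
my @by_ws      = split /\s+/, $line; # runs of whitespace, but leading whitespace yields an empty first field
my @by_tab     = split /\t/, $line;  # tabs only: stray spaces stay inside the fields

# Show each result with '|' between fields so empty fields and spaces are visible.
print "split ' '   : ", join('|', @by_default), "\n";   # alpha|beta|gamma
print "split /\\s+/ : ", join('|', @by_ws), "\n";       # |alpha|beta|gamma
print "split /\\t/  : ", join('|', @by_tab), "\n";      #   alpha| beta  |gamma
```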

I think you are on the right track - keep at it!
