Re^2: Get unique fields from file
by Marshall (Canon) on Jan 07, 2022 at 03:49 UTC
I do like this general approach. However, the OP is talking about a sizable file of 500 MB. Depending upon the data, of course, your HoH (hash of hashes) structure could consume quite a bit more memory than the size of the file itself.
I came up with a representation (at this post) where each column value occurs only once, as a hash key. The value of each hash key is an array describing whether that value does not appear in a column at all, appears exactly once in a column, or appears more than once in a column.
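As a rough illustration of that kind of representation, here is a minimal sketch. It is not the code from the post referred to above: the names `build_seen` and `once_in_column`, the state codes `ABSENT`/`ONCE`/`MULTIPLE`, the one-array-slot-per-column layout, and the sample rows are all assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical state codes; the actual encoding is an assumption.
use constant { ABSENT => 0, ONCE => 1, MULTIPLE => 2 };

# Build a hash: value => [ state for column 0, state for column 1, ... ].
# $rows is a reference to an array of array refs, one per input line.
sub build_seen {
    my ($rows, $ncols) = @_;
    my %seen;
    for my $row (@$rows) {
        for my $col (0 .. $ncols - 1) {
            my $value = $row->[$col];
            next unless defined $value;
            # Each distinct value is stored only once, as a hash key.
            $seen{$value} //= [ (ABSENT) x $ncols ];
            my $state = $seen{$value}[$col];
            $seen{$value}[$col] = $state == ABSENT ? ONCE : MULTIPLE;
        }
    }
    return \%seen;
}

# Values that occur exactly once in the given column.
sub once_in_column {
    my ($seen, $col) = @_;
    my @values = sort grep { $seen->{$_}[$col] == ONCE } keys %$seen;
    return @values;
}

# Made-up sample data standing in for the split lines of a file.
my @rows = (
    [ 'a', 'x', 'a' ],
    [ 'b', 'x', 'c' ],
    [ 'a', 'y', 'd' ],
);
my $seen = build_seen(\@rows, 3);
print "column $_: @{[ once_in_column($seen, $_) ]}\n" for 0 .. 2;
```

With this layout, memory grows with the number of distinct values rather than with the number of rows, and the columns where a value repeats can be recovered by looking for the `MULTIPLE` state in its array.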
We each interpreted "unique" to mean something different.
My data structure:
Of course, I could generate the same output as yours from my data structure, because I know the columns where a term appeared more than once.