PerlMonks
Re: Get unique fields from file
by davido (Cardinal) on Jan 06, 2022 at 16:53 UTC
I didn't see specifically what you're having trouble with; there didn't seem to be a specific question. You are correct that a hash is a good approach. I probably wouldn't reach for the uniq function first, though: since you want unique values per field, using uniq would mean holding the whole file in memory at once (even if there's a high rate of duplication within fields). Instead, I would do the unique filtering early, line by line. That way, if there are many collisions within a given field, you're only ever holding one instance, which can be much more memory friendly.

You're dealing with a style of CSV. It's '|'-separated csv, so |sv, but I prefer using a CSV parser for that so I don't have to deal with the intricacies of embedded / escaped separators. The Text::CSV module can grab the headers for you and can pre-organize the data into header => value pairs. Here's an example of how that could look:
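A minimal sketch of how such a `csv_to_unique_within_field` function could look (the function name and its filehandle/separator interface come from the post; the body, option choices, and sample data below are illustrative assumptions, not the exact original code):

```perl
use strict;
use warnings;
use Text::CSV;

# Sketch (assumed implementation): take a filehandle and an optional
# separator, defaulting to comma, and collect each field's unique values.
sub csv_to_unique_within_field {
    my ($fh, $sep) = @_;
    $sep //= ',';

    my $csv = Text::CSV->new({ sep_char => $sep, binary => 1, auto_diag => 1 });
    my $headers = $csv->getline($fh);    # first row holds the column names

    # header => { value => 1 }: duplicates within a field collapse to a
    # single hash key, so only one instance is ever held in memory.
    my %unique;
    while ( my $row = $csv->getline($fh) ) {
        $unique{ $headers->[$_] }{ $row->[$_] } = 1 for 0 .. $#$headers;
    }

    # Collapse the inner hashes into sorted arrays of unique values.
    return { map { $_ => [ sort keys %{ $unique{$_} } ] } keys %unique };
}

# Example on in-memory pipe-separated data:
my $data = "name|color\nAlice|red\nBob|red\nAlice|blue\n";
open my $in, '<', \$data or die $!;
my $unique = csv_to_unique_within_field( $in, '|' );
print "$_: @{ $unique->{$_} }\n" for sort keys %$unique;
# prints:
#   color: blue red
#   name: Alice Bob
```

Because each value is stored as a hash key the moment its line is read, a file with heavy repetition within a column costs only one entry per distinct value, which is the memory-friendliness argument above.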
The meat here is the csv_to_unique_within_field function. You pass it a filehandle and a separator; if no separator is provided, it assumes comma. The function reads the header row, then walks the file line by line, recording each field's values as keys in a per-header hash, so duplicates within a field collapse to a single entry as they're read.
After this I just print the headers and the unique values each one contained. Since we filtered in only unique values per header, it's just a straightforward data-structure print.

Dave
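That final print step could be as simple as the following sketch (the hash shape and field names here are hypothetical, assuming the unique values were collected into a hash of arrays keyed by header):

```perl
use strict;
use warnings;

# Hypothetical result structure: header => array-ref of unique values.
my %unique_by_field = (
    name  => [qw(Alice Bob)],
    color => [qw(blue red)],
);

# Print each header followed by the unique values it contained.
for my $field ( sort keys %unique_by_field ) {
    print "$field: ", join( ', ', @{ $unique_by_field{$field} } ), "\n";
}
```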
In Section: Seekers of Perl Wisdom