Reading Text File into Hash to Parse and Sort

Perl_Derek has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Reading Text File into Hash to Parse and Sort by Laurent_R (Canon) on Apr 21, 2015 at 21:21 UTC
Just some comments:

- Your data separator is apparently the pipe ("|") character. Why are you splitting on the tab ("\t") character?
- When you do `print <FILE>; #Prints entire file`, this moves the filehandle to the end of the file, so that the next line, `while (<FILE>){`, will not get any lines.
- You say `#Date` for the first field of @array, but I fail to see a date in your input data.
- If you are thinking of sorting data, then you probably don't want to use a hash, but rather an array.

What you are trying to do with your code is not entirely clear to me, but this might get you closer to what you want:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file_name = 'FB_LPWAP.txt';
    open my $FILE, '<', $file_name or die "ERROR: Could not open file $!"; # better syntax for opening a file
    # print <$FILE>; don't do that, you will get no data in the while loop
    while (<$FILE>){
        # with the data shown, you should split on pipe or on a pattern such as /\|\s*/
        my @array = split /\t/, $_;
        print join "\t", @array, "\n";
    }
    close ($FILE);

But as I said, your program does not seem to match the data you've shown. Also, it does not make much sense to split data on tabs and then to print the fields separated by a tab. There was really no need to split it in the first place (unless you wanted to split on pipes and separate the data with tabs, but then there would probably be easier ways to replace pipes with tabs, using regular expressions, for example).

Je suis Charlie.
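As a rough illustration of the pipe-splitting advice above, here is a minimal sketch. The real `FB_LPWAP.txt` is not shown in the thread, so the sample rows (borrowed from the layout used in another reply) and the `@lines` and `@rows` names are assumptions for demonstration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample rows in the pipe-delimited layout seen elsewhere in this
# thread; the poster's actual file is not available.
my @lines = (
    "TICKER| CO. NAME| PRICE| MARKET CAP| INDUSTRY\n",
    "ABC | ABC Co.| 15.5| 5000| Industrials\n",
    "AB | Alpha Beta| 12| 2500| Materials\n",
);

my @rows;
for my $line (@lines) {
    chomp $line;
    # Split on a literal pipe and swallow any whitespace around it.
    my @fields = split /\s*\|\s*/, $line;
    push @rows, \@fields;
    # Re-join with tabs only for display.
    print join("\t", @fields), "\n";
}
```

Keeping each row as an array reference in `@rows` leaves the data available for sorting later, rather than printing it and losing it.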
Re: Reading Text File into Hash to Parse and Sort by toolic (Bishop) on Apr 21, 2015 at 21:08 UTC
Consider loading your data into a hash-of-hashes structure:

    use warnings;
    use strict;

    my %data;
    <DATA>; # Discard header
    while (<DATA>) {
        chomp;
        # Split on the pipe, dropping any whitespace around it
        my ($tick, @cols) = split /\s*\|\s*/;
        my %comp;
        @comp{qw(name price cap ind)} = @cols;
        $data{$tick} = { %comp };
    }

    use Data::Dumper;
    $Data::Dumper::Sortkeys=1;
    print Dumper(\%data);

    __DATA__
    TICKER| CO. NAME| PRICE| MARKET CAP| INDUSTRY
    ABC | ABC Co.| 15.5| 5000| Industrials
    AB | Alpha Beta| 12| 2500| Materials
    DOZ | ZZZZZ| 5.05| 2800| Telecom
    DX | DX Co.| 77.2| 12000| Industrials
    DXX | DXX Co.| 50.25| 9000| Utilities

Prints:

    $VAR1 = {
              'AB' => {
                        'cap' => '2500',
                        'ind' => 'Materials',
                        'name' => 'Alpha Beta',
                        'price' => '12'
                      },
              'ABC' => {
                         'cap' => '5000',
                         'ind' => 'Industrials',
                         'name' => 'ABC Co.',
                         'price' => '15.5'
                       },
              'DOZ' => {
                         'cap' => '2800',
                         'ind' => 'Telecom',
                         'name' => 'ZZZZZ',
                         'price' => '5.05'
                       },
              'DX' => {
                        'cap' => '12000',
                        'ind' => 'Industrials',
                        'name' => 'DX Co.',
                        'price' => '77.2'
                      },
              'DXX' => {
                         'cap' => '9000',
                         'ind' => 'Utilities',
                         'name' => 'DXX Co.',
                         'price' => '50.25'
                       }
            };

Refer to perldsc. You need to decide how to manipulate it from there.
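One way to manipulate such a hash of hashes is to sort its keys by an inner value. The following is only a sketch: the `%data` literal is a cut-down copy of the structure shown above, and `@by_price` and `@caps` are names invented for the illustration.

```perl
use strict;
use warnings;

# A cut-down copy of the hash-of-hashes from the reply above.
my %data = (
    ABC => { name => 'ABC Co.',    price => '15.5', cap => '5000',  ind => 'Industrials' },
    AB  => { name => 'Alpha Beta', price => '12',   cap => '2500',  ind => 'Materials'   },
    DX  => { name => 'DX Co.',     price => '77.2', cap => '12000', ind => 'Industrials' },
);

# A hash has no order of its own, so sort its keys by the inner value.
# <=> compares numerically; use cmp for string columns such as 'name'.
my @by_price = sort { $data{$a}{price} <=> $data{$b}{price} } keys %data;

for my $tick (@by_price) {
    printf "%-4s %-12s %6.2f\n", $tick, $data{$tick}{name}, $data{$tick}{price};
}

# Isolate a single column as a plain list, in the same sorted order.
my @caps = map { $data{$_}{cap} } @by_price;
```

The sort block never touches the hash itself; it only decides the order in which the keys are visited.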
Re^2: Reading Text File into Hash to Parse and Sort by SimonPratt (Friar) on Apr 22, 2015 at 08:30 UTC
Not bad. I would change the way the keys are generated slightly, though. This change allows the data vendor to do weird things, like change the order of headers or add / remove columns, without affecting your underlying process (unless they completely balls it up).

    my %item_map = ( 'CO.NAME' => 'name', 'MARKETCAP' => 'cap', 'INDUSTRY' => 'ind' );
    chomp(my $headers = <DATA>);
    # Strip whitespace from each header, then map it to a short key (or lowercase it)
    my @headers = map {s/\s//g; ($item_map{$_} || lc $_)} split /\s*\|\s*/, $headers;

Yes, purists may point out that this is serious over-engineering. I would retort that if they had to maintain loading filesets from a couple of hundred different FTSE feeds they might rethink their position (though to be fair, FTSE are a lot better these days).
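To show how header-derived keys could feed the rest of the loader, here is a hedged sketch. The `@input` sample, the use of a hash slice per row, and the assumption that a TICKER column is always present are illustrative choices, not part of the reply above.

```perl
use strict;
use warnings;

my %item_map = ( 'CO.NAME' => 'name', 'MARKETCAP' => 'cap', 'INDUSTRY' => 'ind' );

# Assumed sample input: header line first, then pipe-delimited data rows.
my @input = (
    'TICKER| CO. NAME| PRICE| MARKET CAP| INDUSTRY',
    'ABC | ABC Co.| 15.5| 5000| Industrials',
    'AB | Alpha Beta| 12| 2500| Materials',
);

# Derive the hash keys from the header line instead of hard-coding them.
my $header_line = shift @input;
my @headers = map { my $h = $_; $h =~ s/\s//g; $item_map{$h} || lc $h }
              split /\s*\|\s*/, $header_line;

my %data;
for my $line (@input) {
    my %row;
    @row{@headers} = split /\s*\|\s*/, $line;   # hash slice keyed by header
    my $tick = delete $row{ticker};             # assumes a TICKER column exists
    $data{$tick} = \%row;
}

print "$_: $data{$_}{name}\n" for sort keys %data;
```

Because each row is assigned through `@row{@headers}`, a reordered header line reorders the keys with it and the rest of the code is unaffected.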
Re: Reading Text File into Hash to Parse and Sort by NetWallah (Canon) on Apr 21, 2015 at 20:59 UTC
Take out "print <FILE>;", because that will require you to reset the EOF pointer before re-reading. Instead, you can put a simple "print;" inside the "while" loop. Also, your separators are "| ", not "\t". Change the "split". Your column references are lost because your code is not formatted correctly. Please follow the suggestions above.

"You're only given one little spark of madness. You mustn't lose it." - Robin Williams
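If the whole file really must be read twice, the handle can be rewound with the built-in `seek`. The sketch below uses an in-memory filehandle and made-up data as stand-ins for the real file; the variable names are hypothetical.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for the real file (assumed data).
my $text = "AB | Alpha Beta\nDX | DX Co.\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @all   = <$fh>;   # list context: reads every remaining line
my $after = <$fh>;   # undef, because the handle is now at end of file

seek $fh, 0, 0 or die "Cannot rewind: $!";   # back to the start

my $count = 0;
while (my $line = <$fh>) {
    print $line;     # print inside the loop instead of slurping first
    $count++;
}
close $fh;
```

In most scripts the simpler fix is the one suggested above: drop the initial slurp and print each line as the loop reads it.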
Re: Reading Text File into Hash to Parse and Sort by GotToBTru (Prior) on Apr 21, 2015 at 21:30 UTC
Nothing in your post uses hashes, so we can't even imagine what kinds of problems you are having. But let's assume you want a hash keyed by the different columns of each row. You ultimately want an array of hashes, one array element for each line in the file, each array element a hash with the columns.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @columns = qw/Date ClosingPrice AveragePrice/;
    my (@lines,@array);

    open (FILE, '<', 'FB_LPWAP.txt') or die ("ERROR: Could not read file");

    foreach my $line (<FILE>){
        chomp $line;   # drop the newline so it does not end up in the last column
        push @array, do { my %h; @h{@columns} = split /\t/,$line; \%h };
    }

    close (FILE);

    foreach my $row (@array) {
        foreach my $col (reverse keys %{$row}) {
            printf "%s=>%s ",$col,$row->{$col}
        }
        print "\n";
    }

Update #1 - thanks to tye for the hash slice solution.

Update #2 - by the time I figured all this out, there were already better solutions. Hope I learned something ...

Dum Spiro Spero
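Once the rows are in an array of hashes, sorting by one column is a single `sort` call. This is a sketch under assumptions: the column names come from the reply above, but the tab-separated values in `@raw` are invented, since the real file is not shown.

```perl
use strict;
use warnings;

# Column names as in the reply above; the tab-separated rows are made up.
my @columns = qw/Date ClosingPrice AveragePrice/;
my @raw = (
    "2015-04-20\t83.09\t82.50",
    "2015-04-17\t80.78\t81.90",
    "2015-04-21\t83.62\t83.10",
);

# Build one hash per line with a hash slice.
my @rows = map { my %h; @h{@columns} = split /\t/, $_; \%h } @raw;

# Sort the records numerically by one column, highest first.
my @by_close = sort { $b->{ClosingPrice} <=> $a->{ClosingPrice} } @rows;

for my $row (@by_close) {
    # Iterate over @columns rather than keys %$row to keep a stable order.
    print join(' ', map { "$_=>$row->{$_}" } @columns), "\n";
}
```

Printing via `@columns` instead of `keys` avoids depending on hash ordering, which Perl does not guarantee.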
Re^2: Reading Text File into Hash to Parse and Sort by Perl_Derek (Initiate) on Apr 22, 2015 at 16:29 UTC
Thank you for all of the great feedback. It will help in my understanding of Perl to see how the Monks work through problems. The example code I provided worked for me. It allowed me to parse data from a text file using arrays, but I am trying to do this on a larger scale with a new file with hashes. Ultimately, I am trying to learn how to create a hash to sort data by a specific column from a text file. I would also like to learn how to parse data from a text file using hashes. I don't know how to reference all columns from a text file using hashes and then isolate one specific column to sort or parse. This is one of many knowledge gaps for me. There is nothing in the books or posts I have read which shows me how to do this. It seems to be pretty basic, but I just can't find this information. I am just getting started with Perl, so I will go back to the books and come back with better-organized questions in the future. Thank you for your time and all of your responses.
Re^3: Reading Text File into Hash to Parse and Sort by Anonymous Monk on Apr 22, 2015 at 19:51 UTC
"I am trying to do this on a larger scale with a new file with hashes. Ultimately, I am trying to learn how to create a hash to sort data by a specific column from a text file. I would also like to learn how to parse data from a text file using hashes."

Tux has already pointed out the better tool: Text::CSV. It will handle all of the details people forget about when dealing with delimited files like what you have. Assuming the first line of the file has a list of column names, you'll end up with something like:

    use strict;
    use warnings;
    use Text::CSV;
    # I had to specify "XS" in the use line and in the call to new, below
    #use Text::CSV_XS;

    open my $fh, '<', 'filename.txt'
        or die "Cannot open filename.txt: $!\n";
    my $parser = Text::CSV->new({ sep_char => '|' });
    $parser->column_names($parser->getline($fh));

    # $parser->getline_hr_all() returns a reference to an array;
    # which is easy enough to use, but the @{...} syntax unpacks
    # it to an array, which you might find convenient
    my @rows = @{$parser->getline_hr_all($fh)};
    close $fh;

    # sort can take a block of code to specify how things should
    # be sorted, which in your case would be "by which columns"
    # use cmp for string sort; <=> for numeric sort, and
    # Unicode::Collate ( https://metacpan.org/pod/Unicode::Collate )
    # for anything complex
    for my $row (sort { $a->{col_name} cmp $b->{col_name} } @rows) {
        ...
    }
    for my $row (sort { $a->{number} <=> $b->{number} } @rows) {
        ...
    }
Re: Reading Text File into Hash to Parse and Sort by GotToBTru (Prior) on Apr 21, 2015 at 19:15 UTC
Please consult this link to see how to improve the formatting of your post. It is much easier to help when we can actually read it.

Dum Spiro Spero
Re^2: Reading Text File into Hash to Parse and Sort by Perl_Derek (Initiate) on Apr 21, 2015 at 19:47 UTC
That is not how I entered the information. Thank you for the link.
Re^3: Reading Text File into Hash to Parse and Sort by GotToBTru (Prior) on Apr 21, 2015 at 19:54 UTC
It might not be how you entered it ... but it is how it displays. Put <c> ... </c> tags around your code and it might just look like what you entered. Example: I entered the following pair of statements exactly the same. The first pair has no tags. The second does.

this is on one line this is on the next line

    this is on one line
    this is on the next line

Dum Spiro Spero
Re: Reading Text File into Hash to Parse and Sort by Tux (Canon) on Apr 22, 2015 at 16:45 UTC
Looks like a variation of CSV, so I am triggered :) Changing the `|` to a `TAB` is left as an exercise to the user :P

    $ perl -MText::CSV_XS=csv -MData::Peek -e'DDumper csv (in => "test.csv", key => "TICKER", sep => "|", allow_whitespace => 1)'
    {
        AB => {
            'CO. NAME' => 'Alpha Beta',
            INDUSTRY => 'Materials',
            'MARKET CAP' => 2500,
            PRICE => 12,
            TICKER => 'AB'
        },
        ABC => {
            'CO. NAME' => 'ABC Co.',
            INDUSTRY => 'Industrials',
            'MARKET CAP' => 5000,
            PRICE => '15.5',
            TICKER => 'ABC'
        },
        DOZ => {
            'CO. NAME' => 'ZZZZZ',
            INDUSTRY => 'Telecom',
            'MARKET CAP' => 2800,
            PRICE => '5.05',
            TICKER => 'DOZ'
        },
        DX => {
            'CO. NAME' => 'DX Co.',
            INDUSTRY => 'Industrials',
            'MARKET CAP' => 12000,
            PRICE => '77.2',
            TICKER => 'DX'
        },
        DXX => {
            'CO. NAME' => 'DXX Co.',
            INDUSTRY => 'Utilities',
            'MARKET CAP' => 9000,
            PRICE => '50.25',
            TICKER => 'DXX'
        }
    }

Enjoy, Have FUN! H.Merijn
Re: Reading Text File into Hash to Parse and Sort by AnomalousMonk (Archbishop) on Apr 21, 2015 at 20:31 UTC
I see the OP now uses a `<br />` tag on each code line within a paragraph block. Please Update your post to use `<c> ... </c>` or `<code> ... </code>` tags around each block of code, data or input/output. Please see Markup in the Monastery and Writeup Formatting Tips.

Give a man a fish: `<%-(-(-(-<`
Re: Reading Text File into Hash to Parse and Sort by Anonymous Monk on Apr 21, 2015 at 21:03 UTC
One key point ... and you may as well clarify it now, not later ... is that the data-structure you are using here is an array, not a hash. In some languages (PHP, say ...), there is no difference between the two: what Perl calls a hash, PHP actually calls an array. Perl draws a sharp distinction between the two, and does not confuse its nomenclature. In Perl, arrays (and lists) are ordered collections, referenced by numeric index (≥ 0). A hash is an unordered collection referenced by a string key. Both are one-dimensional, although “references” can be used to mimic multi-dimensional structures.

(Slight hand-waving intended in the previous description of things.)
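The distinction described above can be seen in a few lines. This is a minimal sketch; every variable name and value here is invented for illustration.

```perl
use strict;
use warnings;

my @prices   = (15.5, 12, 5.05);            # array: ordered, numeric index
my %price_of = (ABC => 15.5, AB => 12);     # hash: unordered, string key

print "first price: $prices[0]\n";
print "AB price: $price_of{AB}\n";

# Nesting is done with references: here, a hash whose values are array refs.
my %history = ( AB => [ 11.8, 12, 12.3 ] );
my $latest  = $history{AB}[-1];             # last element of the inner array
print "latest AB price: $latest\n";

# A hash has no inherent order, so impose one on the keys when it matters.
my @sorted_keys = sort keys %price_of;
```

Note the sigils: `$prices[0]` uses square brackets for an array element, while `$price_of{AB}` uses braces for a hash element.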