Perl_Derek has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am just getting started with Perl and I have not used any programming languages in the past. I hope you can help me with this.

I am not able to effectively use hashes to parse, sort or replace data from a text file. I have read through a couple of books (Learning Perl, Perl by Example) and browsed Perl websites, but I am still having difficulty grasping the concepts. For example, I am trying to take a text file and sort the results in any way I choose. I can't seem to find a real-world example where I read a file into a hash and parse/sort and/or replace data. Here is my code for parsing data using arrays which works:
#!/usr/bin/perl
use strict;
use warnings;

open (FILE, '<', 'FB_LPWAP.txt') or die ("ERROR: Could not read file");
print <FILE>; #Prints entire file
while (<FILE>){
    my @array = split /\t/, $_;
    print "$array[0]\t"; #Date
    print "$array[1]\t"; #Closing Price
    print "$array[2]\n"; #Weighted Average Price
}
close (FILE);


I still have a lot to learn, but I am hoping someone could help point me in the right direction, as I have not been able to use hashes on a text file that has multiple columns of data without receiving a number of syntax/compilation errors. The text file I have has multiple columns and I would like to parse certain columns using hashes. I would also like to sort the data. I don't know how to reference my columns. Here is the sample data in my test file:

TICKER| CO. NAME| PRICE| MARKET CAP| INDUSTRY
ABC | ABC Co.| 15.5| 5000| Industrials
AB | Alpha Beta| 12| 2500| Materials
DOZ | ZZZZZ| 5.05| 2800| Telecom
DX | DX Co.| 77.2| 12000| Industrials
DXX | DXX Co.| 50.25| 9000| Utilities


Thank you.

Replies are listed 'Best First'.
Re: Reading Text File into Hash to Parse and Sort
by Laurent_R (Canon) on Apr 21, 2015 at 21:21 UTC
    Just some comments:

    - Your data separator is apparently the pipe ("|") character. Why are you splitting on the tab ("\t") character?

    When you do:

    print <FILE>; #Prints entire file
    this moves the filehandle to the end of the file, so that the next line:
    while (<FILE>){
    will not get any line.

    - You say #Date for the first field of @array, but I fail to see a date in your input data.

    - If you are thinking of sorting data, then you probably don't want to use a hash, but rather an array.

    What you are trying to do with your code is not entirely clear to me, but this might get you closer to what you want:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file_name = 'FB_LPWAP.txt';
    open my $FILE, '<', $file_name or die "ERROR: Could not open file $!"; # better syntax for opening a file
    # print <$FILE>; don't do that, you will get no data in the while loop
    while (<$FILE>){
        my @array = split /\t/, $_; # with the data shown, you should split on pipe or on a pattern such as /\|\s*/
        print join "\t", @array, "\n";
    }
    close ($FILE);

    But as I said, your program does not seem to match the data you've shown. Also, it does not make much sense to split data on tabs and then to print the fields separated by a tab. There was really no need to split it in the first place (unless you wanted to split on pipes and separate the data with tabs, but, then, there would probably be easier ways to replace pipes by tabs, using regular expressions, for example).
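
    The points above (split on the pipe, keep the rows in an array if you want to sort) can be sketched as follows. This is only an illustration under some assumptions: the sample data is held in a string and read through an in-memory filehandle so the example is self-contained, and PRICE is assumed to be the third column (index 2). The names @rows and @by_price are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data held in a string so the sketch is self-contained;
# with a real file you would open the file name instead.
my $text = <<'END';
TICKER| CO. NAME| PRICE| MARKET CAP| INDUSTRY
ABC | ABC Co.| 15.5| 5000| Industrials
AB | Alpha Beta| 12| 2500| Materials
DOZ | ZZZZZ| 5.05| 2800| Telecom
DX | DX Co.| 77.2| 12000| Industrials
DXX | DXX Co.| 50.25| 9000| Utilities
END

open my $fh, '<', \$text or die "Could not open in-memory data: $!";
my $header = <$fh>;    # read the header line and set it aside

my @rows;
while (my $line = <$fh>) {
    chomp $line;
    # Split on a pipe with optional whitespace on either side;
    # each row is stored as a reference to an array of fields.
    push @rows, [ split /\s*\|\s*/, $line ];
}
close $fh;

# Sort numerically on the PRICE column (index 2), lowest first
my @by_price = sort { $a->[2] <=> $b->[2] } @rows;

# Print the sorted rows with tabs between the fields
print join("\t", @$_), "\n" for @by_price;
```

    To sort on a text column such as the company name, the comparison would use cmp instead of <=>.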

    Je suis Charlie.
Re: Reading Text File into Hash to Parse and Sort
by toolic (Bishop) on Apr 21, 2015 at 21:08 UTC
    Consider loading your data into a hash-of-hashes structure:
    use warnings;
    use strict;

    my %data;
    <DATA>; # Discard header
    while (<DATA>) {
        chomp;
        my ($tick, @cols) = split /\s*\|\s*/;
        my %comp;
        @comp{qw(name price cap ind)} = @cols;
        $data{$tick} = { %comp };
    }

    use Data::Dumper;
    $Data::Dumper::Sortkeys=1;
    print Dumper(\%data);

    __DATA__
    TICKER| CO. NAME| PRICE| MARKET CAP| INDUSTRY
    ABC | ABC Co.| 15.5| 5000| Industrials
    AB | Alpha Beta| 12| 2500| Materials
    DOZ | ZZZZZ| 5.05| 2800| Telecom
    DX | DX Co.| 77.2| 12000| Industrials
    DXX | DXX Co.| 50.25| 9000| Utilities

    Prints:

    $VAR1 = {
              'AB' => {
                        'cap' => '2500',
                        'ind' => 'Materials',
                        'name' => 'Alpha Beta',
                        'price' => '12'
                      },
              'ABC' => {
                         'cap' => '5000',
                         'ind' => 'Industrials',
                         'name' => 'ABC Co.',
                         'price' => '15.5'
                       },
              'DOZ' => {
                         'cap' => '2800',
                         'ind' => 'Telecom',
                         'name' => 'ZZZZZ',
                         'price' => '5.05'
                       },
              'DX' => {
                        'cap' => '12000',
                        'ind' => 'Industrials',
                        'name' => 'DX Co.',
                        'price' => '77.2'
                      },
              'DXX' => {
                         'cap' => '9000',
                         'ind' => 'Utilities',
                         'name' => 'DXX Co.',
                         'price' => '50.25'
                       }
            };

    Refer to perldsc. You need to decide how to manipulate it from there.
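
    As one possible way to manipulate it from there, here is a small sketch of sorting a hash of hashes by a chosen column. It assumes the same keys (name, price, cap, ind) as the structure above; the hash is written out literally so the example runs on its own, and @by_cap and @by_ind are names invented for the illustration.

```perl
use strict;
use warnings;

# A hash of hashes with the same shape as the structure above
my %data = (
    AB  => { name => 'Alpha Beta', price => 12,    cap => 2500,  ind => 'Materials'   },
    ABC => { name => 'ABC Co.',    price => 15.5,  cap => 5000,  ind => 'Industrials' },
    DOZ => { name => 'ZZZZZ',      price => 5.05,  cap => 2800,  ind => 'Telecom'     },
    DX  => { name => 'DX Co.',     price => 77.2,  cap => 12000, ind => 'Industrials' },
    DXX => { name => 'DXX Co.',    price => 50.25, cap => 9000,  ind => 'Utilities'   },
);

# A hash has no order of its own, so sort the list of keys instead.
# Numeric comparison (<=>) on market cap, largest first:
my @by_cap = sort { $data{$b}{cap} <=> $data{$a}{cap} } keys %data;

# String comparison (cmp) on industry, with the ticker breaking ties:
my @by_ind = sort { $data{$a}{ind} cmp $data{$b}{ind} || $a cmp $b } keys %data;

# Walk the tickers in market-cap order and print selected columns
printf "%-4s %-12s %6d\n", $_, $data{$_}{name}, $data{$_}{cap} for @by_cap;

# Pick out a single column for a single ticker
print "DX price: $data{DX}{price}\n";
```

    The sort never reorders the hash itself; it only produces an ordered list of tickers that you then use to look things up.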

      Not bad. I would change the way the keys are generated slightly, though. This change allows the data vendor to do weird things, like change the order of headers or add / remove columns without affecting your underlying process (unless they completely balls it up).

      my %item_map = (
          'CO.NAME'   => 'name',
          'MARKETCAP' => 'cap',
          'INDUSTRY'  => 'ind'
      );
      chomp(my $headers = <DATA>);
      my @headers = map {s/\s//g; ($item_map{$_} || lc $_)} split /\s*\|\s*/, $headers;

      Yes, purists may point out that this is serious over-engineering. I would retort that if they had to maintain loading filesets from a couple of hundred different FTSE feeds they might rethink their position (though to be fair, FTSE are a lot better these days)
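
      For illustration, here is one way the header-driven keys might be used to load the rows. This is a sketch, not the original author's full code: the data is read from a string through an in-memory filehandle instead of DATA, a copy of each header is cleaned rather than modifying $_ in place, and it assumes the file always has a TICKER column to key on.

```perl
use strict;
use warnings;

# Two sample rows are enough to show the idea
my $text = <<'END';
TICKER| CO. NAME| PRICE| MARKET CAP| INDUSTRY
ABC | ABC Co.| 15.5| 5000| Industrials
AB | Alpha Beta| 12| 2500| Materials
END

# Headers (with whitespace removed) that should get a shorter name
my %item_map = (
    'CO.NAME'   => 'name',
    'MARKETCAP' => 'cap',
    'INDUSTRY'  => 'ind',
);

open my $fh, '<', \$text or die "Could not open in-memory data: $!";

chomp(my $header_line = <$fh>);
# Strip whitespace from a copy of each header, then map it to a short
# name, falling back to the lower-cased header (ticker, price).
my @headers = map { (my $h = $_) =~ s/\s//g; $item_map{$h} || lc $h }
              split /\s*\|\s*/, $header_line;

my %data;
while (my $line = <$fh>) {
    chomp $line;
    my %row;
    # Hash slice: pair each header name with the field in the same position
    @row{@headers} = split /\s*\|\s*/, $line;
    $data{ $row{ticker} } = \%row;
}
close $fh;

print "$_: $data{$_}{name}, cap $data{$_}{cap}\n" for sort keys %data;
```

      Because the keys come from the header line, a reordered or extra column in the feed changes which field lands under which key without any change to the loading code.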

Re: Reading Text File into Hash to Parse and Sort
by NetWallah (Canon) on Apr 21, 2015 at 20:59 UTC
    Take out "print <FILE>;".

    It reads the filehandle all the way to end of file, so you would have to rewind the handle before re-reading.

    Instead, you can put a simple "print;" inside the "while" loop.

    Also, your separators are "| ", not "\t". Change the "split".
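
    To illustrate the end-of-file point, here is a minimal sketch showing that a second read returns nothing until the handle is rewound with seek. It uses a two-line in-memory string rather than the real file, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";
open my $fh, '<', \$text or die "Could not open in-memory data: $!";

my @first_pass = <$fh>;    # list context reads every remaining line
my @empty      = <$fh>;    # nothing left: the handle is at end of file

# Rewind to the start of the data before reading again
seek $fh, 0, 0 or die "Could not rewind: $!";

my $count = 0;
while (my $line = <$fh>) {
    print $line;           # print each line as it is processed
    $count++;
}
close $fh;
```

    Printing inside the loop, as suggested above, avoids the need for seek altogether.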

    Your column references are lost because your code is not formatted correctly. Please follow the suggestions above.

            "You're only given one little spark of madness. You mustn't lose it."         - Robin Williams

Re: Reading Text File into Hash to Parse and Sort
by GotToBTru (Prior) on Apr 21, 2015 at 21:30 UTC

    Nothing in your post uses hashes, so we can't even imagine what kinds of problems you are having. But let's assume you want a hash keyed by column name for each row. You ultimately want an array of hashes: one array element for each line in the file, each array element a hash holding that line's columns.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @columns = qw/Date ClosingPrice AveragePrice/;
    my (@lines,@array);

    open (FILE, '<', 'FB_LPWAP.txt') or die ("ERROR: Could not read file");
    foreach my $line (<FILE>){
        push @array, do { my %h; @h{@columns} = split /\t/,$line; \%h };
    }
    close (FILE);

    foreach my $row (@array) {
        foreach my $col (reverse keys %{$row}) {
            printf "%s=>%s ",$col,$row->{$col}
        }
        print "\n";
    }

    Update #1 - thanks to tye for the hash slice solution. Update #2 - by the time I figured all this out, there were already better solutions. Hope I learned something ...

    Dum Spiro Spero

      Thank you for all of the great feedback. It will help in my understanding of Perl to see how the Monks work through problems.

      The example code I provided worked for me. It allowed me to parse data from a text file using arrays, but I am trying to do this on a larger scale with a new file with hashes.

      Ultimately, I am trying to learn how to create a hash to sort data by a specific column from a text file. I would also like to learn how to parse data from a text file using hashes.

      I don't know how to reference all columns from a text file using hashes and then isolate one specific column to sort or parse. This is one of many knowledge gaps for me.

      There is nothing in the books or posts I have read which shows me how to do this. It seems to be pretty basic, but I just can't find this information.

      I am just getting started with Perl, so I will go back to the books and come back with more well-organized questions in the future.

      Thank you for your time and all of your responses.

        I am trying to do this on a larger scale with a new file with hashes.

        Ultimately, I am trying to learn how to create a hash to sort data by a specific column from a text file. I would also like to learn how to parse data from a text file using hashes.

        Tux has already pointed out the better tool: Text::CSV. It will handle all of the details people forget about when dealing with delimited files like what you have. Assuming the first line of the file has a list of column names, you'll end up with something like:
        use strict;
        use warnings;
        use Text::CSV;
        # I had to specify "XS" in the use line and in the call to new, below
        #use Text::CSV_XS;

        open my $fh, '<', 'filename.txt'
            or die "Cannot open filename.txt: $!\n";
        my $parser = Text::CSV->new({ sep_char => '|' });
        $parser->column_names($parser->getline($fh));

        # $parser->getline_hr_all() returns a reference to an array;
        # which is easy enough to use, but the @{...} syntax unpacks
        # it to an array, which you might find convenient
        my @rows = @{$parser->getline_hr_all($fh)};
        close $fh;

        # sort can take a block of code to specify how things should
        # be sorted, which in your case would be "by which columns"
        # use cmp for string sort; <=> for numeric sort, and
        # Unicode::Collate ( https://metacpan.org/pod/Unicode::Collate )
        # for anything complex
        for my $row (sort { $a->{col_name} cmp $b->{col_name} } @rows) {
            ...
        }
        for my $row (sort { $a->{number} <=> $b->{number} } @rows) {
            ...
        }
Re: Reading Text File into Hash to Parse and Sort
by GotToBTru (Prior) on Apr 21, 2015 at 19:15 UTC

    Please consult this link to see how to improve the formatting of your post. It is much easier to help when we can actually read it.

    Dum Spiro Spero
      That is not how I entered the information. Thank you for the link.

        It might not be how you entered it .. but it is how it displays. Put <c> .. </c> tags around your code and it might just look like what you entered. Example: I entered the following pair of statements exactly the same. The first pair has no tags. The second does.

        this is on one line this is on the next line
        this is on one line
        this is on the next line
        Dum Spiro Spero
Re: Reading Text File into Hash to Parse and Sort
by Tux (Canon) on Apr 22, 2015 at 16:45 UTC

    Looks like a variation of CSV, so I am triggered :)

    Changing the | to a TAB is left as an exercise to the user :P

    $ perl -MText::CSV_XS=csv -MData::Peek -e'DDumper csv (in => "test.csv", key => "TICKER", sep => "|", allow_whitespace => 1)'
    {   AB => {
            'CO. NAME' => 'Alpha Beta',
            INDUSTRY => 'Materials',
            'MARKET CAP' => 2500,
            PRICE => 12,
            TICKER => 'AB'
            },
        ABC => {
            'CO. NAME' => 'ABC Co.',
            INDUSTRY => 'Industrials',
            'MARKET CAP' => 5000,
            PRICE => '15.5',
            TICKER => 'ABC'
            },
        DOZ => {
            'CO. NAME' => 'ZZZZZ',
            INDUSTRY => 'Telecom',
            'MARKET CAP' => 2800,
            PRICE => '5.05',
            TICKER => 'DOZ'
            },
        DX => {
            'CO. NAME' => 'DX Co.',
            INDUSTRY => 'Industrials',
            'MARKET CAP' => 12000,
            PRICE => '77.2',
            TICKER => 'DX'
            },
        DXX => {
            'CO. NAME' => 'DXX Co.',
            INDUSTRY => 'Utilities',
            'MARKET CAP' => 9000,
            PRICE => '50.25',
            TICKER => 'DXX'
            }
        }

    Enjoy, Have FUN! H.Merijn
Re: Reading Text File into Hash to Parse and Sort
by AnomalousMonk (Archbishop) on Apr 21, 2015 at 20:31 UTC

    I see the OP now uses a  <br /> tag on each code line within a paragraph block. Please update your post to use  <c> ... </c> or  <code> ... </code> tags around each block of code, data or input/output. Please see Markup in the Monastery and Writeup Formatting Tips.


    Give a man a fish:  <%-(-(-(-<

Re: Reading Text File into Hash to Parse and Sort
by Anonymous Monk on Apr 21, 2015 at 21:03 UTC

    One key point ... and you may as well clarify it now, not later ... is that the data-structure you are using here is an array, not a hash.

    In some languages (PHP, say ...), there is no difference between the two:   what Perl calls a hash, PHP actually calls an array.   Perl draws a sharp distinction between the two, and does not confuse its nomenclature.

    In Perl, arrays (and lists) are ordered collections, referenced by numeric index (≥ 0).   A hash is an unordered collection referenced by a string key.   Both are one-dimensional, although “references” can be used to mimic multi-dimensional structures.

    (slight hand-waving intended in the previous description of things.)
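
    The distinction described above can be shown in a few lines. This is only a sketch with made-up sample values: an array indexed by number, a hash indexed by string key, and an array of hash references standing in for a two-dimensional structure.

```perl
use strict;
use warnings;

# Array: ordered, indexed by number starting at 0
my @row = ('DX', 'DX Co.', 77.2);
print "Second field: $row[1]\n";

# Hash: unordered, indexed by string key
my %company = (ticker => 'DX', name => 'DX Co.', price => 77.2);
print "Price: $company{price}\n";

# References let one structure hold others: an array of hash references
my @companies = (
    { ticker => 'DX',  price => 77.2 },
    { ticker => 'DOZ', price => 5.05 },
);
print "First ticker: $companies[0]{ticker}\n";

# The array keeps its order, so it can be sorted by any hash key
my @tickers_by_price =
    map  { $_->{ticker} }
    sort { $a->{price} <=> $b->{price} } @companies;
print "@tickers_by_price\n";
```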