As a programmer and teacher of the Perl programming language, I often get destabilizing questions. In one of the last classes I gave, while I was talking about hashes, someone asked me, "What is it used for? When would I ever need that?" Of course, for me (and probably for you too) hashes are quite practical, but put on the spot like that, I didn't know what to say, so I talked about the %ENV hash and built an example with it.

Today I found an interesting use for hashes. I wish I had thought of it during my class, but I didn't, so I would like to share it for the benefit of newer Perl programmers.

Imagine you have to read a Space Separated Value file or Comma Separated Value (CSV) file. It's easy because the fields are always in the same order. For example:

# firstname lastname age
joe builder 9
bob plumber 66
dora squarepants 10
diego simpson 11

You can do this:

open( my $fh, '<', 'file' ) or die "Error: $!";
my @lines = <$fh>;
close( $fh );

foreach my $line ( @lines ) {
  
  # Skipping if the line is empty or a comment
  next if ( $line =~ /^\s*$/ );
  next if ( $line =~ /^\s*#/ );
  
  my ($firstname, $lastname, $age) = split( /\s+/, $line );
  
  # then do whatever you have to
}

But then someday someone gives you a new file with the fields in a different order, plus extra fields you don't need. Here is the new file:

# lastname firstname age gender phone
mcgee bobby 27 M 555-555-5555
kincaid marl 67 M 555-666-6666
hofhazards duke 22 M 555-696-6969

What do you do? Do you change your code with an if statement? Do you alter the file to reorder the fields and remove the extra ones? No! You use hashes!

Here is the solution:

open( my $fh, '<', 'file' ) or die "Error: $!";
my @lines = <$fh>;
close( $fh );

my @keys = split( /\s+/, $lines[0] );
shift( @keys ); # to remove the # as the first field

foreach my $line ( @lines ) {
  
  # Skipping if the line is empty or a comment
  next if ( $line =~ /^\s*$/ );
  next if ( $line =~ /^\s*#/ );
  
  my %hash;
  @hash{ @keys } = split( /\s+/, $line );
  
  # then do whatever you have to
}

Note that the first line of the file is important: it gives you the order of the fields. Even if it's not there when you receive the file, you can easily add it.

Note the @hash{ } syntax. This is called a hash slice: the @ sigil means you are accessing a list of elements from the hash rather than a single one. The @keys array contains the keys in the order written at the top of the file, so @hash{ @keys } is like @hash{ qw(lastname firstname age gender phone) } or @hash{ 'lastname', 'firstname', 'age', 'gender', 'phone' }, except that it keeps working when the fields are not in the same order as in the previous file.
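Since everything depends on that header line, here is a small sketch of pulling the keys out of it. The header_keys name is hypothetical (it is not part of the code above), and unlike the split-then-shift version it assumes you also want to accept a header written without a space after the #.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a header comment such as
# "# lastname firstname age" into a list of field names.
sub header_keys {
    my ($header) = @_;
    $header =~ s/^\s*#\s*//;      # drop the leading '#' and any blanks around it
    return split ' ', $header;    # split ' ' skips leading and trailing whitespace
}

my @keys = header_keys("# lastname firstname age gender phone\n");
print join( ',', @keys ), "\n";
```

The resulting @keys can be used in the hash slice exactly as before.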

The split of the line returns a list so doing this:

@hash{ @keys } = split( /\s+/, $line );

is the same as this:

@hash{'lastname', 'firstname', 'age', 'gender', 'phone' } = split( /\s+/, $line );

or this:

($hash{'lastname'}, $hash{'firstname'}, $hash{'age'}, $hash{'gender'}, $hash{'phone'}) = split( /\s+/, $line );
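If you want to convince yourself that the three forms behave the same, a quick sketch like the following should do. The hash names are made up for the comparison, and the line is taken from the sample file above.

```perl
use strict;
use warnings;

my @keys = qw(lastname firstname age gender phone);
my $line = "mcgee bobby 27 M 555-555-5555\n";

# Form 1: hash slice driven by the @keys array
my %by_array;
@by_array{ @keys } = split( /\s+/, $line );

# Form 2: hash slice with the key names written out
my %by_list;
@by_list{ 'lastname', 'firstname', 'age', 'gender', 'phone' } = split( /\s+/, $line );

# Form 3: one scalar hash element per field
my %by_element;
( $by_element{'lastname'}, $by_element{'firstname'}, $by_element{'age'},
  $by_element{'gender'}, $by_element{'phone'} ) = split( /\s+/, $line );

# Print the three values side by side for each key
for my $key (@keys) {
    print "$key: $by_array{$key} / $by_list{$key} / $by_element{$key}\n";
}
```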

Also, if the file contains fields you don't need, it doesn't matter: they just become hash entries you never read. As long as all the required fields are there, your code will keep working.
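To make that concrete, here is a minimal sketch that runs the same code over both sample layouts. The parse_records helper and the in-memory arrays standing in for the two files are assumptions made for the example; the loop body is the one shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of a file whose first line is a
# "# field field ..." header and returns one hash reference per data line.
sub parse_records {
    my @lines = @_;

    my @keys = split( /\s+/, $lines[0] );
    shift( @keys );    # remove the '#' that comes first

    my @records;
    foreach my $line ( @lines ) {
        next if ( $line =~ /^\s*$/ );    # skip empty lines
        next if ( $line =~ /^\s*#/ );    # skip comments, including the header

        my %hash;
        @hash{ @keys } = split( /\s+/, $line );
        push @records, \%hash;
    }
    return @records;
}

# Stand-ins for the old and the new file
my @old_file = ( "# firstname lastname age\n", "joe builder 9\n", "bob plumber 66\n" );
my @new_file = ( "# lastname firstname age gender phone\n", "mcgee bobby 27 M 555-555-5555\n" );

# The same three fields are read from both layouts; gender and phone are ignored
foreach my $record ( parse_records(@old_file), parse_records(@new_file) ) {
    print "$record->{firstname} $record->{lastname} is $record->{age}\n";
}
```

For comma-separated input, the pattern passed to split would change (to /,/ for example), but data with quoted fields is better left to a dedicated parser such as the Text::CSV module.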

I hope this will be useful for you someday! Good luck!

A for will get you from A to Z; a while will get you everywhere.

-- greengaroo

In reply to Cool way to parse Space Separated Value and CSV files by greengaroo
