hash and arrays

sandy_1028 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: hash and arrays by ccn (Vicar) on Nov 08, 2008 at 11:28 UTC
`=>` is usually used to separate a hash key from a hash value, but it can also be used instead of a comma in other cases. From perldoc perldata: It is often more readable to use the `=>` operator between key/value pairs. The `=>` operator is mostly just a more visually distinctive synonym for a comma, but it also arranges for its left-hand operand to be interpreted as a string -- if it's a bareword that would be a legal simple identifier (`=>` doesn't quote compound identifiers, that contain double colons). This makes it nice for initializing hashes.
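A minimal sketch of the point above, assuming nothing beyond core Perl: the same list is written once with plain commas and once with `=>`, and the two compare equal. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# The same four-element list, written with commas and with fat commas.
my @with_comma = ('Mon', 0, 'Tue', 1);
my @with_fat   = (Mon => 0, Tue => 1);   # the barewords Mon and Tue are quoted for us

# A fat comma outside a hash: here it only documents that the two items belong together.
my @pair = (width => 80);

print "identical lists\n" if "@with_comma" eq "@with_fat";
print "pair has ", scalar(@pair), " elements\n";
```

The quoting only applies to a simple bareword directly to the left of `=>`; anything else on the left is evaluated as an ordinary expression.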
Re: hash and arrays by gone2015 (Deacon) on Nov 09, 2008 at 00:27 UTC
With apologies if a brief tutorial was not what you wanted...

When to use Hash and Arrays in the program ?

Arrays and Hashes are similar in that they are containers for multiple things (scalars), and you access those things using a form of key. In the case of an Array the key (usually called an index) is an integer -- so `$foo[25]` is item `25` in the array `@foo`. In the case of a Hash the key is a string -- so `$bar{'homburg'}` is the `homburg` item in the hash `%bar`.

Arrays have an implied ordering: item 0 comes before item 1 and so on. The contents of an array may be treated as a list. Hashes have no ordering whatsoever. So when `keys(%bar)` gives you a list of all the keys in the hash, they are in no predictable order -- in particular, not in the order things were put into the hash (except by pure chance) -- this is a common trap that people fall into.

Arrays are used where your key values are simple integers (in a reasonable range), or as containers for lists. For example, the array `@days = ('Mon', 'Tue', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun')` can be accessed by index: `$days[4]` gives 'Fri', day 4 of the week, where day 0 is 'Mon'. Or treated as a list: `foreach my $d (@days)` sets `$d` to each day name in turn, from `'Mon'` to `'Sun'`.

Hashes are used where your keys are arbitrary -- noting that the keys you use are converted to strings. For example, the hash `%day_num = ('Mon' => 0, 'Tue' => 1, 'Wed' => 2, 'Thu' => 3, 'Thurs' => 3, 'Fri' => 4, 'Sat' => 5, 'Sun' => 6)` can be accessed by the name of a day to get the day number: `$day_num{'Thurs'}` gives `3`. Note that `$day_num{'sat'}` will give `undef`, because there is no such entry -- hashes are very literal minded about the keys. The function `exists` will tell you whether a given key exists in the hash, so `exists($day_num{'sat'})` would return false, in this case. Other essential functions for using hashes are `keys`, `values` and `each`.
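The two containers described above can be put side by side in a short, runnable sketch. It reuses `@days` and `%day_num` from the text; everything else is just illustration.

```perl
use strict;
use warnings;

my @days    = ('Mon', 'Tue', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun');
my %day_num = (Mon => 0, Tue => 1, Wed => 2, Thu => 3, Thurs => 3,
               Fri => 4, Sat => 5, Sun => 6);

# Array: integer index in, element out.
print "day 4 is $days[4]\n";

# Hash: string key in, value out.
print "Thurs is day $day_num{Thurs}\n";

# Keys are matched literally, so 'sat' is not 'Sat'.
print "no entry for 'sat'\n" unless exists $day_num{sat};

# The array keeps its order; the hash only knows how many entries it has.
print scalar(@days), " days, ", scalar(keys %day_num), " names\n";
```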
You will also find `sort` used quite a bit with `keys`, to impose order on chaos.

You can use a Hash to implement a sparse array. So, `$sa{79}, $sa{200000}, $sa{123456}` could be entries in a sparse array. Note that the absent entries would all appear as `undef`. Hash keys are strings, but if you use a number, as above, it will be converted to a string -- Perl is broadminded, and generally does not discriminate between string and numeric values (which has its own little quirks, but that's another story).

This is a big topic. I recommend a little reading !

In which scenario => is used?

As others have said, broadly speaking there is no difference between '`=>`' and '`,`' apart from the appearance. I used '`=>`' between each key and its value in the "literal hash" above. I could just as well have used '`,`', but the '`=>`' makes the key/value pairs more obvious.

However, that is not the whole story. Perl has the notion of a "bareword", whose genesis is lost in the mists of time. A bareword is simply a thing that looks like an identifier with no "sigil" (leading '`$`', '`@`', '`%`', ... etc) and no trailing '`()`'. Deciding what a bareword means is a bit of a problem for Perl. If it knows of a subroutine which is defined to take no arguments, Perl will (generally) treat a bareword as a call of that subroutine -- this is how "constants" (as defined by `use constant`) are implemented. Otherwise Perl either has to know what to do, or must guess. If you `use strict` (and, frankly, you need a good reason not to), Perl will throw a compile time error rather than guess at the meaning of a bareword.

There are cases where Perl knows what to do with barewords, and hashes are a prime example: when accessing a hash entry you can write `$bar{'homburg'}` or `$bar{homburg}`. Between the '`{}`' Perl treats a bareword as a literal string. Anything else is treated as an expression, whose result is converted to string form, if required.
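To make the sparse-array and bareword-key remarks above concrete, here is a small sketch. The hash `%sa` and its indices come from the text; the stored values are placeholders I made up.

```perl
use strict;
use warnings;

# A hash standing in for a sparse array: only three "slots" exist.
my %sa;
$sa{79}     = 'a';
$sa{200000} = 'b';
$sa{123456} = 'c';

# The keys are strings, so ask for a numeric sort explicitly;
# a plain sort would order them as text.
my @indices = sort { $a <=> $b } keys %sa;
print "@indices\n";

# Absent entries read as undef, and reading one does not create it.
print "slot 80 is empty\n" unless defined $sa{80};
print scalar(keys %sa), " slots in use\n";

# Inside the braces a bareword is taken as a literal string.
my %bar = (homburg => 'hat');
print "same entry\n" if $bar{homburg} eq $bar{'homburg'};
```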
Note that in `$spa{0001}` the `0001` is not a bareword, it is a trivial expression, whose result is converted to the string `'1'`. So `$spa{0001}` is not equivalent to `$spa{'0001'}`, appearance notwithstanding.

This is where '`=>`' and '`,`' differ. The '`=>`' tells Perl to treat any bareword before it as a literal string. So the following are equivalent: `('Mon', 0)`, `('Mon' => 0)` and `(Mon => 0)`. Note that a bareword is not required in these cases, but if a bareword is present, it is treated as a literal string.

The relationship between "constants" and barewords is slightly tricky. Consider: `use strict; use warnings ; use constant HOMBURG => 'HA' ; my %bar = (homburg => 'homburg!', HOMBURG => 'HomBurg!', HOMBURG, 'Ha!') ; print join(', ', %bar), "\n" ; my $k = HOMBURG ; print "\$k = HOMBURG -> \$k=$k\n" ; print "\$bar{homburg}=$bar{ homburg }, \$bar{HOMBURG}=$bar{HOMBURG}, ", "\$bar{\$k}=$bar{$k}, \$bar{+HOMBURG}=$bar{+HOMBURG}\n" ;`

whose output is:
homburg, homburg!, HA, Ha!, HOMBURG, HomBurg!
$k = HOMBURG -> $k=HA
$bar{homburg}=homburg!, $bar{HOMBURG}=HomBurg!, $bar{$k}=Ha!, $bar{+HOMBURG}=Ha!

which shows a number of things:

Where Perl is expecting (but not requiring) a bareword, it does not treat the bareword as a "constant". So in `HOMBURG => 'HomBurg!'` and `$bar{HOMBURG}` the `HOMBURG` is not the "constant" whose value is `'HA'`. This may or may not be a disappointment.

This special handling means that `$bar{HOMBURG}` and `$bar{$k}` are not equivalent, even though `HOMBURG` has been assigned to `$k`. (If there wasn't a faint whiff of magic, it wouldn't be Perl.)

If you want the value of the "constant" `HOMBURG` as a hash key, you need to persuade Perl that there's an expression. In this example I used '`+HOMBURG`', which is one convention. You could also write `$bar{HOMBURG()}`, to force Perl to treat it as the subroutine which the "constant" "is" (mostly, but that's another story).
Though not shown, the same applies to `+HOMBURG => 'Ha!'`

If you're still awake, you may be asking yourself why `+HOMBURG` doesn't generate an error, given that the value of `HOMBURG` is manifestly not numeric. (The clever people who know the answer to this one can leave now.) So:

In numeric expressions Perl will generally accept a string that looks like a number, as a number. So if we start with the string `'123 456'` and split it `my ($a, $b) = split(/ /, '123 456')` it would be reasonable to suppose that `$a`, and `$b` were strings, and in a less enlightened language `$a + $b` would be an error (or might give `'123456'`). Perl happily returns the result `579`, and you may never have considered that this might be surprising.

Of course, if the string was `'zlxq 456'` then the addition does not do anything useful: Perl has no defined way of adding `'zlxq'` and `'456'` together, so it treats `'zlxq'` as `0`, returns `456`, and (under `use warnings`) complains that the argument isn't numeric. (Which may, or may not, be a surprise.)

Some apparently numeric operations, however, are defined to work on strings which do not look like numbers. In particular unary '`+`' and '`-`', so: `my $w = '0001' ; my $x = 'zlqq' ; my $y = '-zlxq' ; my $z = '+zlxq' ; print " \$w=$w, \$x=$x, \$y=$y, \$z=$z\n" ; print "+\$w=", +$w, ", +\$x=", +$x, ", +\$y=", +$y, ", +\$z=", +$z, "\n" ; print "-\$w=", -$w, ", -\$x=", -$x, ", -\$y=", -$y, ", -\$z=", -$z, "\n" ;`

gives:
 $w=0001, $x=zlqq, $y=-zlxq, $z=+zlxq
+$w=0001, +$x=zlqq, +$y=-zlxq, +$z=+zlxq
-$w=-1, -$x=-zlqq, -$y=+zlxq, -$z=-zlxq

because unary '`+`' is defined to have no effect whatsoever on its operand (so isn't in the slightest bit interested whether the operand looks like a number or like a bunch of bananas). Unary '`-`', on the other hand, will treat its operand as a number, if it can; otherwise it prefixes the string with a '`-`' character; unless the string starts with '`+`' or '`-`' ... (yes, this is all defined behaviour).

Unary '`-`' is defined to accept a bareword.
However, unlike '`=>`', "constants" are evaluated, so: `use strict; use warnings ; use constant HOMBURG => 'HA' ; print "-HOMBURG=", -HOMBURG, ", -foo=", -foo, "\n" ;`

gives:
Ambiguous use of -HOMBURG resolved as -&HOMBURG() at sigs.pl line 5.
-HOMBURG=-HA, -foo=-foo

(where the warning indicates the degree of wonderfulness involved here).

Conversely, where we have a numeric value Perl will happily convert it to a string if required. So, `($w + 9) . $x` yields the string `'10zlxq'` (given that `$w = '0001'` and `$x = 'zlxq'`). This process is known as "stringification", and applies not just to simple numeric values but to almost everything -- in a number of simply wonderful ways. (In less enlightened languages there is a sharp distinction between strings and numbers, and it is up to the programmer to explicitly convert between the two.)

So far, so good. For (most) numeric operations we expect that Perl (helpfully) will convert strings to numbers, if it can. And, for string operations we expect that Perl (helpfully) will convert numbers to strings. So there's no effective difference between numbers and strings that can be converted to numbers...

...up to a point. The bitwise operations behave differently where Perl thinks the operand is a string. For example, unary '`~`' will take a numeric value (forced to integer form) and return the '1's complement. If the value is a string, however, it will not attempt to convert it to a number, but will return a string with every bit of the original string inverted. Thus: `my $x = '0x1234' ; my $y = 0x1234 ; printf "\$x=%s, ~\$x=%s\n", show($x), show(~$x) ; printf "\$y=0x%X, ~\$y=0x%X\n", $y, ~$y ; printf "hex(\$x)=0x%X, ~\hex($x)=0x%X\n", hex($x), ~hex($x) ; printf "\$y=%s,\n ~\$y=%s\n", show($y), show(~$y) ; sub show { return '"'.
join('', map { sprintf('\\x%02X', ord($_)) } split(//, $_[0])) .'"' ; } ;`

produces:
$x="\x30\x78\x31\x32\x33\x34", ~$x="\xCF\x87\xCE\xCD\xCC\xCB"
$y=0x1234, ~$y=0xFFFFFFFFFFFFEDCB
hex($x)=0x1234, ~hex(0x1234)=0xFFFFFFFFFFFFEDCB
$y="\x34\x36\x36\x30",
 ~$y="\x31\x38\x34\x34\x36\x37\x34\x34\x30\x37\x33\x37\x30\x39\x35\x34\x36\x39\x35\x35"

which shows the difference between the string operand `$x` and the numeric operand `$y`. (The last line shows that there is no sleight of hand here -- `show()` simply renders the string form of its argument in hex.)

The binary bitwise operations '`&`', '`\|`' and '`^`' will convert a string operand to numeric form if the other is numeric. But if both arguments are strings, then it will perform the operation, byte-wise, between the strings. The `vec` function also treats strings as collections of bits. I leave as homework what happens with utf8 strings, and whether '`<<`' and '`>>`' will operate on strings.

Of course, this raises the question: how can you tell when Perl thinks something is a string or a number? AFAIK this is not defined. However, the result of a numeric operation can be relied on to be a number, and the result of a string operation can be relied on to be a string. I/O operations work with strings, so watch out for: `use strict; use warnings ; my $z = <DATA> ; printf "\$z=%s, ~\$z=%s\n", show($z), show(~$z) ; sub show { return '"'. join('', map { sprintf('\\x%02X', ord($_)) } split(//, $_[0])) .'"' ; } ; __DATA__ 0x1234`

which gives:
$z="\x30\x78\x31\x32\x33\x34\x0A", ~$z="\xCF\x87\xCE\xCD\xCC\xCB\xF5"

To convert a string to a number you can simply add zero (as in: `$x += 0 ;` or `($x + 0)`), though that assumes base 10. `oct` works for octal and numbers prefixed `0x` and `0b`, but it's a small disappointment that there isn't a single function that will convert a string as if it were a general Perl literal number.
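Since there is no single built-in for this, one way to paper over the gap is a small wrapper around `oct` and numeric addition. This is only a sketch: `to_number` is a hypothetical helper name, not a core function, and it deliberately ignores signs on hex/binary/octal input and underscores inside numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string into a number roughly the way
# a Perl numeric literal would be read.
sub to_number {
    my ($text) = @_;

    # Drop surrounding whitespace, e.g. the newline left by <DATA>.
    $text =~ s/^\s+|\s+$//g;

    # oct() understands the 0x and 0b prefixes as well as plain octal.
    return oct($text) if $text =~ /^0[xXbB]/;
    return oct($text) if $text =~ /^0[0-7]+$/;

    # Everything else is read as decimal by adding zero.
    return $text + 0;
}

printf "0x%X\n", to_number("0x1234\n");
printf "%d\n",   to_number('0b101');
printf "%d\n",   to_number('0755');
printf "%d\n",   to_number('42');
```

Passing input through something like this before a bitwise operation makes sure Perl sees a number rather than a string of digits.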
(Most numeric operations in Perl do not distinguish between floating point and integer arguments. Bitwise operations are an exception there too, requiring their operand(s) to be converted to integer form -- at which point it may start to matter to you whether your system supports 32 or 64 bit integers.)

Finally, while on the topic of quirks in Perl's handling of strings... When evaluated as a boolean (true/false) value an empty string is false and a non-empty string is considered true, except for '0'. Note that this is not treating the string as a number -- to be false a non-empty string must be exactly one character long, and that character must be '`0`' ! So: `use strict; use warnings ; foreach my $z ('0', '0000', '+0', '-0', ' 0', '0 ', '0E0') { try($z) ; } ; sub try { my ($z) = @_ ; my $bool = $z ? 'True' : 'False' ; my $zero = $z == 0 ? '==' : '!=' ; printf "%7s is %5s and %2s 0\n", "'$z'", $bool, $zero ; } ;`

gives:
'0' is False and == 0
'0000' is True and == 0
'+0' is True and == 0
'-0' is True and == 0
' 0' is True and == 0
'0 ' is True and == 0
'0E0' is True and == 0

You will often see code which tests strings thus: `if ($string)`. This is shorter than `if ($string ne '')`, and has the advantage of working if `$string` is `undef`. If `$string` can ever be `'0'`, however, you will regret not having written `if (defined($string) && ($string ne ''))`, clumsy though that may appear !
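If the clumsy test above is needed in more than one place, it can be wrapped once. A minimal sketch, where `has_content` is a hypothetical name of my own choosing:

```perl
use strict;
use warnings;

# Hypothetical helper: true for any defined, non-empty string, including '0'.
sub has_content {
    my ($string) = @_;
    return defined($string) && $string ne '';
}

# Compare the plain truth test with the explicit one.
for my $s ('0', '', 'abc', undef) {
    my $label = defined $s ? "'$s'" : 'undef';
    printf "%-6s plain test: %-5s has_content: %s\n",
        $label,
        ($s              ? 'true' : 'false'),
        (has_content($s) ? 'true' : 'false');
}
```

The only row where the two tests disagree is `'0'`, which is exactly the case the warning above is about.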
Re: hash and arrays by jwkrahn (Abbot) on Nov 08, 2008 at 11:29 UTC
Any time you use `,` you can use `=>` instead.
Re: hash and arrays by apl (Monsignor) on Nov 08, 2008 at 12:58 UTC
I strongly suggest you pick up a good introductory book like Programming perl by Wall and Schwartz.
Re^2: hash and arrays by jwkrahn (Abbot) on Nov 09, 2008 at 15:53 UTC
That should be Programming Perl. (See: What's the difference between "perl" and "Perl"?) Or are you suggesting that sandy_1028 buy the first edition, which is out of print, because Randal is no longer a co-author of that particular book?
Re^3: hash and arrays by apl (Monsignor) on Nov 10, 2008 at 11:28 UTC
Thank you for the correction. Dinosaur that I am, I simply picked up the copy I keep at my PC and cited the relevant information...
Re: hash and arrays by zentara (Cardinal) on Nov 08, 2008 at 18:41 UTC
Both are basically a way to store data. Arrays are indexed in numerical order, 0 .. size_of_array - 1. You retrieve an element with `my $val = $array[$number];` It is often desired to retrieve the data by a name, instead of a number, so hashes are used. They are often called linked-lists in c. There are 2 lists, the keys and the values, so you can access data with `$hash{$key_name}`. Hashes are very convenient, but are slower. That is it in a nutshell.
I'm not really a human, but I play one on earth Remember How Lucky You Are
Re^2: hash and arrays by ikegami (Patriarch) on Nov 08, 2008 at 19:44 UTC
"[Hashes] are often called linked-lists in c."

Hashes aren't linked lists. Not even close. Hashes are completely different than linked lists. A hash is an array of vectors of key-value pairs. The array is indexed by a hash of the key. Hash functions return an as-unique-as-possible number for a string, but always the same number for equivalent strings. The array grows as elements are added to the hash. This keeps the vectors very short if an appropriate hashing algorithm is used. The vectors could very well be implemented as linked lists, but each list contains only the elements that hash to the same bucket (array element).

Hashes are much more efficient than linked lists. Since the first step to add, delete or fetch an element from the data structure is to locate the element in the data structure, let's consider how long that takes. With a linked list, every key needs to be compared against the searched key if the element is absent ( `O(N)`* ), or half that on average if the element is present ( Avg case `O(N)`* ). With a hash, only the keys that hash to the same bucket are compared. With a properly working hashing algorithm, that should be a tiny number no matter how many elements are in the data structure. ( Avg case `O(1)`* )

* — Treats the key length as bounded, as usual.
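The bucket idea described above can be sketched as a toy table in plain Perl. This is an illustration only, not how Perl implements its hashes: the hash function, the fixed bucket count, and the names `bucket_index`, `table_store` and `table_fetch` are all assumptions made for the example, and the table never grows.

```perl
use strict;
use warnings;

# Toy hash table: an array of buckets, each bucket a list of [key, value] pairs.
my $NUM_BUCKETS = 8;                        # fixed size, assumed for the sketch
my @buckets     = map { [] } 1 .. $NUM_BUCKETS;

# Illustrative hash function: same string in, same bucket number out.
sub bucket_index {
    my ($key) = @_;
    my $sum = 0;
    $sum = ($sum * 33 + ord($_)) % 1_000_003 for split //, $key;
    return $sum % $NUM_BUCKETS;
}

sub table_store {
    my ($key, $value) = @_;
    my $bucket = $buckets[ bucket_index($key) ];
    for my $pair (@$bucket) {               # only this one bucket is scanned
        if ($pair->[0] eq $key) { $pair->[1] = $value; return; }
    }
    push @$bucket, [ $key, $value ];
    return;
}

sub table_fetch {
    my ($key) = @_;
    my $bucket = $buckets[ bucket_index($key) ];
    for my $pair (@$bucket) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;                                 # absent key: undef in scalar context
}

table_store($_, length $_) for qw(Mon Tue Wed Thurs Fri Sat Sun);
table_store('Mon', 'updated');
print table_fetch('Thurs'), "\n";
```

A lookup compares the searched key only against the pairs in one bucket, which is the difference from walking a single linked list of every element.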
Re: hash and arrays by JadeNB (Chaplain) on Nov 10, 2008 at 00:31 UTC
Everybody else gave useful answers. I simply can't help myself. `=>` is used when you want to compute a minimum; as, for example, `[ $a => $b ]->[ $b <= $a ]`.
Re^2: hash and arrays by repellent (Priest) on Nov 11, 2008 at 08:13 UTC
Not quite true. I would not recommend the use of `[ $a => $b ]->[ $b <= $a ]` as it relies on too much syntax magic. The first set of brackets `[ ]` is used to indicate an anonymous array, whereas the second set of brackets `->[ ]` de-references the array. `=>` is used as a fat comma (not greater-than-or-equal-to), whereas `<=` is used as less-than-or-equal-to. Hence, `[ $a => $b ]->[ $b <= $a ]` is the same as `[ $a, $b ]->[ ($b <= $a) ]`, which evaluates either to `[ $a, $b ]->[0]` (returning `$a`) or to `[ $a, $b ]->[1]` (returning `$b`).
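For comparison, here is the trick next to two plainer ways of getting a minimum. A sketch only: `min_by_index_trick` is a made-up wrapper name, and `min` comes from the core module List::Util.

```perl
use strict;
use warnings;
use List::Util qw(min);

# The trick from the post above, wrapped so it can be compared.
sub min_by_index_trick {
    my ($x, $y) = @_;
    # The comparison yields 1 (pick $y) or a false value used as index 0 (pick $x).
    return [ $x => $y ]->[ $y <= $x ];
}

for my $pair ([3, 7], [7, 3], [5, 5]) {
    my ($x, $y) = @$pair;
    printf "%d, %d -> trick %d, ternary %d, List::Util %d\n",
        $x, $y,
        min_by_index_trick($x, $y),
        ($x < $y ? $x : $y),
        min($x, $y);
}
```

All three agree; the ternary and `min` versions say what they mean without leaning on the fat comma.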