Re^2: Help on format or better way to do..?

Re^3: Help on format or better way to do..? by graff (Chancellor) on Nov 28, 2007 at 00:43 UTC
I don't know what you mean by "the mixture", and I don't understand how the "scenario" with "Bill" vs. "William" is different from what was handled in the earlier replies by toolic and me.

If "B,C,D,E" are all nicknames for "A", and "A,F,G,H" are all nicknames for "B", and if the purpose of this data structure is to provide a one-shot lookup for a given string (i.e. to get the "immediate nickname set" for that string), then okay, that structure makes sense. (Well, sort of, I guess... but are you saying you have cases where A is a nickname for B and B is also a nickname for A? I'm having trouble with that.)

But if the purpose is to pursue all possible "respelling" relations in a set (e.g. "A" can be respelled as any of "B,C,D,E", and for each of those, use the same structure to find all possible respellings), then you have a problem of circularity: A can be respelled as B, which can be respelled as A, which can be respelled as B, which... (infinite loop).

Actually, it's not at all clear now what you are really trying to do, so I'm not sure what advice to give about the data structure. There are two basic directions that seem to be at issue:

Tracking many-to-one relations: for each member in a set of N "nicknames", relate it to a specific member in a set of M "real" names, where M < N, and two or more nicknames can refer to the same real name; this involves a simple hash where each hash key points to exactly one value, but the hash values can be non-unique.

Tracking one-to-many relations: for each member in a set of M "real" names, list the set of one or more "nicknames" that are synonymous; this involves a hash of arrays, and if this structure is derived from the many-to-one set described above, it cannot be the case that a given array value shows up under more than one hash key (because each "nickname" relates to only one "real name").
If the structure you are looking for is not one of those two, then you need to be more clear about what kind of structure you are looking for and how you want it to organize things. You seem to be giving us simplified fake examples, and maybe they are too simple or maybe they don't accurately reflect your data or your task. What are you really trying to do?

UPDATE: There is a third direction you might be thinking about: many-to-many relations, e.g. a nickname like "Chas" might relate to both "Charles" and "Chastity", but "Chuck" is also a nickname for "Charles". This is another hash of arrays, where some array values (the "real" names in this case) can occur with two or more hash keys.

In any case, the important thing is that the hash keys are one set of entities (e.g. nicknames), and the hash values, whether scalars or arrays, are a distinct set of entities (e.g. real names). Just don't get them confused.
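As a minimal sketch of the three shapes described above, the structures might look something like this in Perl. The hash names and the sample names are illustrative assumptions, not the original poster's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data only; the variable names are assumptions for this sketch.

# 1. Many-to-one: each nickname (key) points to exactly one real name.
#    Values may repeat, keys may not.
my %real_name_for = (
    Bill  => 'William',
    Will  => 'William',
    Chuck => 'Charles',
);

# 2. One-to-many: invert the hash above into a hash of arrays.
#    Because it is derived from (1), no nickname lands under two keys.
my %nicknames_for;
for my $nick ( sort keys %real_name_for ) {
    push @{ $nicknames_for{ $real_name_for{$nick} } }, $nick;
}

# 3. Many-to-many: a nickname may relate to more than one real name,
#    so the same real name can appear under several keys.
my %real_names_for = (
    Chas  => [ 'Charles', 'Chastity' ],
    Chuck => ['Charles'],
);

print "Bill is a nickname for $real_name_for{Bill}\n";
print "William has nicknames: @{ $nicknames_for{William} }\n";
print "Chas could be: @{ $real_names_for{Chas} }\n";
```

In all three cases the keys and the values stay in separate sets (nicknames on one side, real names on the other), which is what keeps lookups from going in circles.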
Re^4: Help on format or better way to do..? by learnperl (Acolyte) on Nov 29, 2007 at 22:59 UTC
Thanks Graff,

Sorry I didn't get a chance to reply sooner. My grandmother passed away, and it has been a very emotional time, since she had a major impact on my life.

To answer your question, "What are you really trying to do?": this is a module that I am creating to work along with a few other modules. Eventually I am planning on running my Perl script against the website to get the data.

What I want to do here is, when the user enters a name, check whether there are nicknames for that name and return them. If the user enters a nickname, let's say "Will", it should give "Willy" and "William" as the results for "Will". If the user enters "William", it should give "Willy" and "Will" as the result. The main idea is to find multiple ways of representing a name, with the emphasis on nicknames. I don't have to cover spelling mistakes and other scenarios, since I handle them in a different module.

So, mathematically, I am thinking of a set-theory implementation. I am sorry I misled you in my original post. The concept of the hash I am trying to have is:

    A => B, C, D, E
    B => A, C, D, E
    C => A, B, D, E
    D => A, B, C, E
    E => A, B, C, D

I didn't post the entire script, since it has other subroutines that have no impact on this situation; also, I am new to Perl, so the code is 200 lines long. This is all the code I have that is related to the issue. Sorry, I should have posted this before rather than only a part of the code.

    #!/usr/bin/perl
    use Text::Soundex;
    use WWW::Mechanize;
    #use MakeRegex;

    $DEBUG   = 1;
    $SOUNDEX = 0;

    print STDOUT "\nEnter name: ";
    $name = <STDIN>;
    chomp $name;

    =pod

    $result = ucfirst( $name );
    $totalResult .= $result . " ";
    print ( "\n---------- Entered Name\n\n" );
    print ($result);

    print ( "\n\n---------- Transposed Letters\n\n" );
    $result = lc transposeName( $name );
    $totalResult .= $result . " ";
    print ($result);
    calcSoundex( $result );

    print ( "\n\n---------- Dropped Character\n\n" );
    $result = lc dropCharName( $name );
    $totalResult .= $result . " ";
    print ($result);
    calcSoundex( $result );

    =cut

    # hash table that contains nicknames
    %nicknames = (
        Abe     => 'Abraham',
        Abram   => 'Abraham Abe',
        Bill    => 'William Will',
        Will    => 'Bill William willy',
        Richard => 'Rick Dick Ric Ricky',
        Rick    => 'Richard Dick Ricky',
    );

    sub matchNickname {
        my $nickTemp;
        foreach my $key (keys %nicknames) {
            my $value = $nicknames{$key};
            # compare case-insensitively against the name the user typed
            if (lc $name eq lc $key) {
                print "user typed ".$name." Match the key ".$key." and value is $value\n\n";
                $nickTemp = $value;
                @nickArray = split '\W+', $nickTemp;
                foreach $nickTemp (@nickArray) {
                    if ($DEBUG) { print "The name $name could be <$nickTemp> or "; }
                }
            }
            #print "$key ==> $value\n";
            #print "$nickTemp\n";
        }
        #print "Value of nickTemp is $nickTemp\n";
    }

Thanks
LearnPerl
Re^5: Help on format or better way to do..? by graff (Chancellor) on Nov 30, 2007 at 02:03 UTC
the concept of the hash I am trying to have is,

    A => B, C, D, E
    B => A, C, D, E
    C => A, B, D, E
    D => A, B, C, E
    E => A, B, C, D

It strikes me as a bit wasteful to have so many copies of all the names. And it would be worthwhile to figure out an easy way to initialize the complete structure from the simplest possible listing of name sets. Here's what I would propose:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # start with an array of arrays (list of lists, which can come from a data file):
    # each row contains all the "synonymous" names in a set, and
    # each element in the row becomes a hash key whose value is
    # a reference to the whole row/set of synonymous names:

    my %altnames;  # will be a HoA

    while (<DATA>) {
        my @nameset = split;
        for my $name ( @nameset ) {
            if ( ! exists( $altnames{$name} )) {
                $altnames{$name} = \@nameset;
            }
            else {
                my $newname = $name;
                $newname .= " " while ( exists( $altnames{$newname} ));
                $altnames{$newname} = \@nameset;
            }
        }
    }

    for my $name ( qw/Allen Bill Chuck Dave Edward Jan/ ) {
        my $modifier = '';
        for ( grep /^$name\s*$/, keys %altnames ) {
            print "The name $_ is $modifier a member of the set @{$altnames{$_}}\n";
            $modifier = 'also';
        }
    }

    __DATA__
    Allen Al
    Charles Chuck Chas
    David Dave
    Edward Eddie Ed
    Janet Jan
    Janice Jan
    William Will Willie Bill Billie

That plays a little game with the name strings to make sure that you can keep track of different name sets containing the same nickname: just add spaces at the end of a previously seen nickname until it becomes a unique key in the altnames hash. Then when searching the hash, make sure to look for the target name optionally followed by spaces. (If the results are going to a web page, the extra spaces won't affect the display.)

Note that by storing multiple references to the same array in the hash, you are not using extra memory to store copies of the names: each nameset is stored exactly once.
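For comparison, here is a sketch of a different way to handle a nickname that appears in more than one set. This is not the trailing-space approach described above: it swaps in a hash whose values are lists of set references, so each name is looked up by its exact string. The `@rows` data and the `%sets_for` name are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows; in practice these could be read from a data file.
my @rows = (
    'Janet Jan',
    'Janice Jan',
    'William Will Willie Bill Billie',
);

# Each name maps to a list of references to every set it belongs to.
# The sets themselves are still stored only once.
my %sets_for;
for my $row (@rows) {
    my @nameset = split ' ', $row;    # fresh array on each iteration
    push @{ $sets_for{$_} }, \@nameset for @nameset;
}

for my $name (qw/Bill Jan/) {
    # Fall back to an empty list for names that were never seen.
    for my $set ( @{ $sets_for{$name} || [] } ) {
        print "The name $name is a member of the set @$set\n";
    }
}
```

The trade-off is one extra level of dereferencing on lookup, in exchange for plain `exists $sets_for{$name}` checks with no padded keys and no `grep` over all the hash keys.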