Creating a text index for a text file

texuser74 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a plain ASCII text file sorted alphabetically.

e.g.

AA for apple

A for apple

BB for ball

B for ball

C for ....

I need to insert :A: before the first line that starts with A, and similarly :B:, :C: ... through :Z:. The marker should appear only at the first occurrence of A, B and so on.

i.e. my output should look like this:

:A:

AA for apple

A for apple

:B:

BB for ball

B for ball

:C:

C for ....

:E:...

:Z:...

Z for Zebra

Please help me

janitored by ybiC: Retitle from "Find", balanced<tt>'s around example text for legibility

Re: Find by Roger (Parson) on Nov 26, 2003 at 08:54 UTC
I think this should be a good exercise and learning experience for you. There are many solutions ...

Solution 1

```perl
#!/usr/local/bin/perl -w
use strict;

my %letter;
while (<DATA>) {
    print("$_"), next if /^\s$/;    # print and skip empty lines
    my $c = substr $_, 0, 1;        # get first character
    if (! exists $letter{$c}) {     # have we seen it before?
        $letter{$c}++;
        print ":$c:\n";
    }
    print;
}

__DATA__
AA for apple
A for apple
BB for ball
B for ball
C for ....
```

And the output is -

```
:A:
AA for apple
A for apple
:B:
BB for ball
B for ball
:C:
C for ....
```

Solution 2

A more perl-ish solution, with regular expressions

```perl
#!/usr/local/bin/perl -w
use strict;

my %letter;
foreach (<DATA>) {
    s/^(.)/$letter{$1}++ ? $1 : ":$1:\n$1"/e;
    print;
}

__DATA__
AA for apple
A for apple
...
```

Solution 3

Another variant: read the file into a scalar and do the search and replace in one go:

```perl
#!/usr/local/bin/perl -w
use strict;

local $/;
my $data = <DATA>;
my %letter;
$data =~ s/^(.)/$letter{$1}++ ? $1 : ":$1:\n$1"/emg;
print "$data";

__DATA__
AA for apple
A for apple
...
```
Re: Re: Find by sgifford (Prior) on Nov 26, 2003 at 09:34 UTC
We really don't need to keep a hash of letters, since the file is already sorted. We just need to keep track of what the last letter seen was:

```perl
#!/usr/local/bin/perl -w
use strict;

my $lastlet;
while (<DATA>) {
    print, next if /^\s*$/;        # print and skip empty lines
    my $c = uc substr $_, 0, 1;    # get first character
    print ":$c:\n" if (!$lastlet || $lastlet ne $c);
    $lastlet = $c;
    print;
}

__DATA__
AA for apple
A for apple
BB for ball
B for ball
C for ....
```
Re: Re: Find by texuser74 (Monk) on Nov 26, 2003 at 09:30 UTC
Your Solution 1 is fabulous; I am adopting it. Many thanks,

raj
Re: Re: Re: Find by Roger (Parson) on Nov 26, 2003 at 09:46 UTC
No problem. ;-) By the way, solution 1 can be simplified further -

```perl
#!/usr/local/bin/perl -w
use strict;

my %letter;
while (<DATA>) {
    print, next if /^\s*$/;
    my $c = substr $_, 0, 1;
    print ":$c:\n" if ! $letter{$c}++;
    print;
}

__DATA__
AA ...
A..
```

Hey, didn't you like my second and third solutions? ;-)
Re: Re: Re: Re: Find by texuser74 (Monk) on Nov 27, 2003 at 02:53 UTC
Re: Re: Find by duff (Parson) on Nov 26, 2003 at 15:19 UTC
"I think this should be a good exercise and learning experience for you."

Then you proceed to give him answers! Fish giver! But you're not alone; there is much shameful homework-doing in response to this node. :-)

PerlJam
Re: Find by Abigail-II (Bishop) on Nov 26, 2003 at 09:28 UTC
A one-liner:

```
perl -i -pe '!/\w/ or $f {$&} ++ or $_ = ":$&:\n\n$_"' your_file
```

Or:

```
perl -i -aF'(?<=.)' -pe '$F [0] !~ /\w/ or $f {$F [0]} ++ or $_ = ":$F[0]:\n\n$_"' your_file
```

Abigail
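For anyone who finds the first one-liner dense, here is a rough long-hand sketch of the same idea. It is not Abigail's code: the helper name `add_markers` and the sample lines are assumptions made for illustration, and it prints to STDOUT instead of editing the file in place as `-i` does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the input lines
# with a ":X:" marker in front of the first line for each initial character.
sub add_markers {
    my @lines = @_;
    my %seen;    # word characters that already have a marker
    my @out;
    for my $line (@lines) {
        # Like the one-liner, key on the first word character in the line.
        # Lines with no word character (blank lines) pass through untouched.
        if ($line =~ /(\w)/ && !$seen{$1}++) {
            push @out, ":$1:\n\n";    # marker plus a blank line, as in "\n\n" above
        }
        push @out, $line;
    }
    return @out;
}

# Made-up sample input; replace with lines read from your own file.
print add_markers("AA for apple\n", "A for apple\n", "\n", "BB for ball\n");
```

The hash plays the same role as `%f` in the one-liner: the post-increment returns 0 the first time a character is seen, so the marker is emitted exactly once per character.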
Re: Find by l3nz (Friar) on Nov 26, 2003 at 09:00 UTC
This is quite easy: you want single letters between ':'s to be sorted as if they were without the ':'. While we're at it, we do a case-insensitive sort. That's the beauty of perl hashes! :-)

```perl
use strict;

my %aq = map ( (lc ((/^:(.):/) ? $1 : $_) => $_) , <DATA>);
foreach my $k ( sort keys %aq ) {
    print $aq{$k};
}

__DATA__
D for Donkey
B for Box
B for Bubble
C for Cat
A for Alpha
:A:
:B:
:C:
:D:
```

This stuff prints:

```
:A:
A for Alpha
:B:
B for Box
B for Bubble
:C:
C for Cat
:D:
D for Donkey
```

Hope this is what you were looking for.
Re: Find by Anonymous Monk on Nov 26, 2003 at 09:04 UTC
If your file is not too big to fit in memory, then

```
perl -ne 'push@{$ndx{uc substr($_,0,1)}},$_;END{for(sort keys%ndx){print":$_:\n",@{$ndx{$_}}}}' sorted.txt
```
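Spelled out, the grouping approach above might look like the sketch below. The function name `group_by_initial` and the sample lines are hypothetical, and skipping blank lines is an added assumption: the one-liner as written would file a blank line under a key consisting of the newline character itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: bucket lines by their upper-cased first character,
# then return each bucket preceded by its ":X:" marker, in sorted key order.
sub group_by_initial {
    my %index;
    for my $line (@_) {
        next if $line =~ /^\s*$/;    # assumption: drop blank lines
        push @{ $index{ uc substr($line, 0, 1) } }, $line;
    }
    my @out;
    for my $letter (sort keys %index) {
        push @out, ":$letter:\n", @{ $index{$letter} };
    }
    return @out;
}

# Made-up sample input, deliberately unsorted.
print group_by_initial("B for ball\n", "a for apple\n", "AA for apple\n");
```

Because every line is held in the hash until the end, the input does not have to be sorted already; lines keep their original order within each letter.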