texuser74 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a plain ascii text file sorted alphabetically.

e.g.

AA for apple

A for apple

BB for ball

B for ball

C for ....

I need to insert :A: before the first line that starts with A, and similarly :B:, :C:, ... up to :Z:, but only at the first occurrence of each letter.

i.e. my output should look like this:

:A:

AA for apple

A for apple

:B:

BB for ball

B for ball

:C:

C for ....

:E:...

:Z:...

Z for Zebra

Please help me

janitored by ybiC: Retitle from "Find", balanced<tt>'s around example text for legibility

Replies are listed 'Best First'.
Re: Find
by Roger (Parson) on Nov 26, 2003 at 08:54 UTC
    I think this should be a good exercise and learning experience for you. There are many solutions ...

    Solution 1
    #!/usr/local/bin/perl -w
    use strict;

    my %letter;
    while (<DATA>) {
        print("$_"), next if /^\s*$/;   # print and skip empty lines
        my $c = substr $_, 0, 1;        # get first character
        if (! exists $letter{$c}) {     # have we seen it before?
            $letter{$c}++;
            print ":$c:\n";
        }
        print;
    }

    __DATA__
    AA for apple
    A for apple
    BB for ball
    B for ball
    C for ....
    And the output is -
    :A:
    AA for apple
    A for apple
    :B:
    BB for ball
    B for ball
    :C:
    C for ....
    Solution 2
    A more perl-ish solution, with regular expressions
    #!/usr/local/bin/perl -w
    use strict;

    my %letter;
    foreach (<DATA>) {
        s/^(.)/$letter{$1}++ ? $1 : ":$1:\n$1"/e;
        print;
    }

    __DATA__
    AA for apple
    A for apple
    ...
    Solution 3
    Another variant, read the file into a scalar and do search and replace in one go:
    #!/usr/local/bin/perl -w
    use strict;

    local $/;
    my $data = <DATA>;
    my %letter;
    $data =~ s/^(.)/$letter{$1}++ ? $1 : ":$1:\n$1"/emg;
    print "$data";

    __DATA__
    AA for apple
    A for apple
    ...
      We really don't need to keep a hash of letters, since the file is already sorted. We just need to keep track of what the last letter seen was:
      #!/usr/local/bin/perl -w
      use strict;

      my $lastlet;
      while (<DATA>) {
          print, next if /^\s*$/;          # print and skip empty lines
          my $c = uc substr $_, 0, 1;      # get first character
          print ":$c:\n" if (!$lastlet || $lastlet ne $c);
          $lastlet = $c;
          print;
      }

      __DATA__
      AA for apple
      A for apple
      BB for ball
      B for ball
      C for ....
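      The same last-letter idea can be wrapped in a subroutine so that it works on any list of lines, not only on DATA. This is just a sketch: the name add_headers is made up for illustration, and it assumes the input is already sorted so that lines sharing a first letter sit next to each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# add_headers is a hypothetical helper, not part of the code above.
# Assumes the lines are already sorted, so equal first letters are adjacent.
sub add_headers {
    my @lines = @_;
    my $last  = '';
    my @out;
    for my $line (@lines) {
        if ($line =~ /^\s*$/) {          # pass blank lines through untouched
            push @out, $line;
            next;
        }
        my $c = uc substr $line, 0, 1;   # first character, upper-cased
        push @out, ":$c:\n" if $c ne $last;
        $last = $c;
        push @out, $line;
    }
    return @out;
}

print add_headers("AA for apple\n", "A for apple\n", "BB for ball\n", "B for ball\n");
```

      To run it against a real file, the lines could be read with something like `open my $fh, '<', $file` and passed in as `add_headers(<$fh>)`.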

      Your Solution 1 is fabulous, I am adopting it.

      many thanks

      raj

        No problem. ;-)

        By the way, solution 1 can be simplified further -
        #!/usr/local/bin/perl -w
        use strict;

        my %letter;
        while (<DATA>) {
            print, next if /^\s*$/;
            my $c = substr $_, 0, 1;
            print ":$c:\n" if ! $letter{$c}++;
            print;
        }

        __DATA__
        AA ...
        A..
        Hey, didn't you like my second and third solutions? ;-)

      I think this should be a good exercise and learning experience for you.
      Then you proceed to give him answers! Fish giver! But you're not alone, there is much shameful homework-doing in response to this node. :-)

Re: Find
by Abigail-II (Bishop) on Nov 26, 2003 at 09:28 UTC
    A one liner:
    perl -i -pe '!/\w/ or $f {$&} ++ or $_ = ":$&:\n\n$_"' your_file
    Or:
    perl -i -aF'(?<=.)' -pe '$F [0] !~ /\w/ or $f {$F [0]} ++ or $_ = ":$F[0]:\n\n$_"' your_file

    Abigail
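    For readers new to one-liners, the first command above can be unrolled roughly as follows. This is a sketch, not the original author's code: mark_line is a hypothetical name, and %$seen stands in for %f. Note that /\w/ finds the first word character anywhere in the line, that the check is case-sensitive, and that -i rewrites your_file in place (using -i.bak instead keeps a backup copy).

```perl
use strict;
use warnings;

# mark_line is a hypothetical name; %$seen plays the role of %f in the one-liner.
sub mark_line {
    my ($line, $seen) = @_;
    # /(\w)/ captures the first word character anywhere in the line,
    # which is what $& holds after /\w/ in the one-liner.
    if ($line =~ /(\w)/ && !$seen->{$1}++) {
        $line = ":$1:\n\n$line";    # header, blank line, then the original line
    }
    return $line;
}

my %seen;
print mark_line($_, \%seen) for "AA for apple\n", "\n", "A for apple\n", "BB for ball\n";
```

    Lines with no word character at all (such as blank lines) pass through unchanged, just as the leading `!/\w/ or` does in the one-liner.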

Re: Find
by l3nz (Friar) on Nov 26, 2003 at 09:00 UTC
    This is quite easy: you want single letters between ':'s to be sorted as if they were without the ':'. As we're at it, we do a case-insensitive sort. That's the beauty of perl hashes! :-)
    use strict;

    my %aq = map ( (lc ((/^:(.):/) ? $1 : $_) => $_) , <DATA>);
    foreach my $k ( sort keys %aq ) {
        print $aq{$k};
    }

    __DATA__
    D for Donkey
    B for Box
    B for Bubble
    C for Cat
    A for Alpha
    :A:
    :B:
    :C:
    :D:
    This stuff prints:
    :A:
    A for Alpha
    :B:
    B for Box
    B for Bubble
    :C:
    C for Cat
    :D:
    D for Donkey
    Hope this is what you were looking for.
Re: Find
by Anonymous Monk on Nov 26, 2003 at 09:04 UTC

    If your file is not too big to fit in memory, then

    perl -ne 'push@{$ndx{uc substr($_,0,1)}},$_;END{for(sort keys%ndx){print":$_:\n",@{$ndx{$_}}}}' sorted.txt
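    Spelled out, the one-liner above builds a hash of arrays keyed by the upper-cased first character and prints each group under its header at the end. A rough sketch follows; group_by_initial is a made-up name, and it assumes the input has no blank lines, since a blank line would be grouped under its own newline character.

```perl
use strict;
use warnings;

# group_by_initial is a hypothetical name used only for this sketch.
sub group_by_initial {
    my @lines = @_;
    my %ndx;    # upper-cased first character => array ref of lines
    push @{ $ndx{ uc substr($_, 0, 1) } }, $_ for @lines;
    # Emit each header followed by its lines, in sorted key order.
    return map { (":$_:\n", @{ $ndx{$_} }) } sort keys %ndx;
}

print group_by_initial("B for ball\n", "AA for apple\n", "BB for ball\n");
```

    Because the groups are sorted by key at the end, this variant does not depend on the input being sorted by first letter; lines inside a group keep their input order.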