Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have the code below. The __DATA__ section contains two files in Standard Intermediate Format; each file begins with a run of control characters. I need to list every tag that appears in braces, and a tag that occurs more than once should be listed only once. The code I have written displays only some of the tags. Please suggest a solution. How can I display all the tags without repeats? If a tag occurs twice within the same file, I need to print "The key DATE is multiple". For example, DATE occurs twice in the first file. How can I report it as multiple?
#!/usr/bin/perl
my %hTmp;
while($_=<DATA>){
    if($_ =~ m/\{/){
        next if $_ =~ m/^\d+/; #remove empty lines
        print $_ unless ($hTmp{$_}++);
    }
}
__DATA__
^C^D^V^V^A os01 0002 010101 R S 0012310002 00003466^B{IT} R
{SOURCETAG} 0012310002
{ACCESSION} 000000
{PUBLICATION} THE ORLANDO SENTINEL
{EDITION} METRO
{DATE} 010101
{DATE} 010102
{TDATE} Monday, January 1, 2001
{SECTION} SPECIAL SECTION
{PAGE} E2
{ZONE} FLORIDA
{KEYWORDS} VOLUNTEER SUPPORT
{SECTION} SPECIAL SECTION1
{SEND} YES
^C^D^V^V^A os01 0003 010101 R S 0012310003 00001558^B{IT} R
{SOURCETAG} 0012310003
{ACCESSION} 000000
{PUBLICATION} THE ORLANDO SENTINEL
{HI}hi
{EDITION} METRO
{DATE} 010101
{TDATE} Monday, January 1, 2001
{SEND} YES

Replies are listed 'Best First'.
Re: remove duplicate tags
by jbt (Chaplain) on Aug 10, 2009 at 10:52 UTC
    One way to do it:

    #!/usr/bin/perl
    use strict;

    my %hTmp;
    while (<DATA>) {
        # Count each tag that starts a line; the braces are escaped so the
        # pattern is also accepted by newer perls.
        if (/^\{(\w*)\}/) {
            $hTmp{$1}++;
        }
    }
    for my $key (keys %hTmp) {
        if ($hTmp{$key} > 1) {
            print "The key $key is multiple\n";
        }
        else {
            print "$key\n";
        }
    }
    __DATA__
    ^C^D^V^V^A os01 0002 010101 R S 0012310002 00003466^B{IT} R
    {SOURCETAG} 0012310002
    {ACCESSION} 000000
    {PUBLICATION} THE ORLANDO SENTINEL
    {EDITION} METRO
    {DATE} 010101
    {DATE} 010102
    {TDATE} Monday, January 1, 2001
    {SECTION} SPECIAL SECTION
    {PAGE} E2
    {ZONE} FLORIDA
    {KEYWORDS} VOLUNTEER SUPPORT
    {SECTION} SPECIAL SECTION1
    {SEND} YES
    ^C^D^V^V^A os01 0003 010101 R S 0012310003 00001558^B{IT} R
    {SOURCETAG} 0012310003
    {ACCESSION} 000000
    {PUBLICATION} THE ORLANDO SENTINEL
    {HI}hi
    {EDITION} METRO
    {DATE} 010101
    {TDATE} Monday, January 1, 2001
    {SEND} YES
      Thanks for the reply. The __DATA__ section holds two files, and ^C^D^V^V^A marks the start of each file. How can I check for duplicate tags within each file?
      For example, in the first file {DATE} and {SECTION} each appear twice, so only {DATE} and {SECTION} should be reported as multiple. How can I do that?
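
One way to get per-file counts while still reading line by line is to reset the counters whenever a start-of-file marker is seen. The following is only a minimal sketch, not a tested solution for real SIF files: the helper name `count_tags_per_file` is made up for illustration, the sample is a shortened copy of the posted data, and it assumes the marker appears as the literal text `^C^D^V^V^A`, as it does in the posted __DATA__.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns one array ref per
# file, each holding [tag, count] pairs in the order the tags first appeared.
sub count_tags_per_file {
    my @lines = @_;
    my ( @files, %count, @order );

    for my $line (@lines) {
        # Assumption: the marker is the literal text ^C^D^V^V^A.
        if ( index( $line, '^C^D^V^V^A' ) >= 0 ) {
            # A new file starts here: store the previous file and reset.
            push @files, [ map { [ $_, $count{$_} ] } @order ] if @order;
            %count = ();
            @order = ();
        }
        # Collect every {TAG} on the line, wherever it occurs.
        for my $tag ( $line =~ /\{(\w+)\}/g ) {
            push @order, $tag unless $count{$tag}++;
        }
    }
    # Store the last file, which has no marker after it.
    push @files, [ map { [ $_, $count{$_} ] } @order ] if @order;
    return @files;
}

# Shortened sample taken from the posted data.
my @sample = (
    "^C^D^V^V^A os01 0002 010101 R S 0012310002 00003466^B{IT} R\n",
    "{DATE} 010101\n",
    "{DATE} 010102\n",
    "{SEND} YES\n",
    "^C^D^V^V^A os01 0003 010101 R S 0012310003 00001558^B{IT} R\n",
    "{DATE} 010101\n",
    "{SEND} YES\n",
);

for my $file ( count_tags_per_file(@sample) ) {
    for my $pair (@$file) {
        my ( $tag, $n ) = @$pair;
        my $msg = $n > 1 ? "The key $tag is multiple\n" : "$tag\n";
        print $msg;
    }
    print "\n";    # blank line between files
}
```

Because the tag pattern is not anchored to the start of the line, this also picks up `{IT}`, which follows `^B` on the header line. Tags are reported in first-seen order rather than hash order.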
Re: remove duplicate tags
by bichonfrise74 (Vicar) on Aug 11, 2009 at 16:58 UTC
    A possible way...
    #!/usr/bin/perl
    use strict;

    my @files = do { local $/ = '^C^D^V^V^A'; <DATA> };

    for (@files) {
        my %tags = ();
        my (@temp_keys) = $_ =~ /\{(\w+)\}/g;
        map { $tags{$_}++ } @temp_keys;

        for ( keys %tags ) {
            if ( $tags{$_} > 1 ) {
                print "The key $_ is multiple.\n";
            }
            else {
                print "$_\n";
            }
        }
    }
    __DATA__
    ^C^D^V^V^A os01 0002 010101 R S 0012310002 00003466^B{IT} R
    {SOURCETAG} 0012310002
    {ACCESSION} 000000
    {PUBLICATION} THE ORLANDO SENTINEL
    {EDITION} METRO
    {DATE} 010101
    {DATE} 010102
    {TDATE} Monday, January 1, 2001
    {SECTION} SPECIAL SECTION
    {PAGE} E2
    {ZONE} FLORIDA
    {KEYWORDS} VOLUNTEER SUPPORT
    {SECTION} SPECIAL SECTION1
    {SEND} YES
    ^C^D^V^V^A os01 0003 010101 R S 0012310003 00001558^B{IT} R
    {SOURCETAG} 0012310003
    {ACCESSION} 000000
    {PUBLICATION} THE ORLANDO SENTINEL
    {HI}hi
    {EDITION} METRO
    {DATE} 010101
    {TDATE} Monday, January 1, 2001
    {SEND} YES
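
Two caveats about the approach above may be worth a sketch: `keys` returns tags in no particular order, and real files presumably carry the marker as actual control bytes rather than the caret text shown in the post. The version below is a hedged illustration only. The helper name `tag_report` is hypothetical, the byte sequence `\x03\x04\x16\x16\x01` (Ctrl-C, Ctrl-D, Ctrl-V, Ctrl-V, Ctrl-A) is an assumption about the real data, and the sample text is a shortened stand-in for the posted files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $text on $sep and return, for each file, an
# array ref of report lines sorted by tag name.
sub tag_report {
    my ( $text, $sep ) = @_;
    my @reports;

    for my $chunk ( split /\Q$sep\E/, $text ) {
        my %tags;
        $tags{$_}++ for $chunk =~ /\{(\w+)\}/g;
        next unless %tags;    # skip the empty chunk before the first marker

        my @lines;
        for my $tag ( sort keys %tags ) {
            push @lines, $tags{$tag} > 1 ? "The key $tag is multiple." : $tag;
        }
        push @reports, \@lines;
    }
    return @reports;
}

# Assumption: the marker is stored as real control bytes in the input.
my $sep = "\x03\x04\x16\x16\x01";

# Shortened stand-in for the two posted files.
my $text = join '',
    $sep, " os01 0002 010101 R S 0012310002 00003466\x02", "{IT} R\n",
    "{DATE} 010101\n", "{DATE} 010102\n", "{SEND} YES\n",
    $sep, " os01 0003 010101 R S 0012310003 00001558\x02", "{IT} R\n",
    "{DATE} 010101\n", "{SEND} YES\n";

for my $report ( tag_report( $text, $sep ) ) {
    print "$_\n" for @$report;
    print "\n";    # blank line between files
}
```

Passing the separator as a parameter means the same helper can be called with the literal string `'^C^D^V^V^A'` for data pasted as text, as in this thread.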