make a search pattern and remove duplicate from the file

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a file with input like the following:

pet="cat"|hate="rat"|like="dog"
hate="rat"|like="dog"|pet="cat"
hate="rat"|like="horse"|pet="cat"
pet="cow"|hate="rat"|like="dog"
hate="rat"|like="dog"|pet="cow"

And the output I'm looking for is:

pet="cat"|hate="rat"|like="dog"
pet="cow"|hate="rat"|like="dog"

I want to print the first instance wherever I find a unique value for pet and hate.

Thanks

2018-05-16 Athanasius added code and paragraph tags


Replies are listed 'Best First'.
Re: make a search pattern and remove duplicate from the file by jimpudar (Pilgrim) on May 11, 2018 at 04:24 UTC
You should use HTML tags to format your post. It's very difficult to tell what you are asking the way you posted it! I'm assuming you meant this:

    Hi ,

    I have a file which have input as like below

    pet="cat"|hate="rat"|like="dog"
    hate="rat"|like="dog"|pet="cat"
    hate="rat"|like="horse"|pet="cat"
    pet="cow"|hate="rat"|like="dog"
    hate="rat"|like="dog"|pet="cow"

    And output im looking is

    pet="cat"|hate="rat"|like="dog"
    pet="cow"|hate="rat"|like="dog"

    Any first instance where I find unique value for pet and hate I wanted to print.

    Thanks

You should have tried your hand at writing a solution and posted the code, but since I'm feeling generous, here is a solution which you can pipe your input file to:

    #! /usr/bin/env perl

    use strict;
    use warnings;

    my %pet_hate;

    while (<>) {
        my ($pet)  = /pet="(\w+)"/  or next;
        my ($hate) = /hate="(\w+)"/ or next;
        my ($like) = /like="(\w+)"/ or next;

        $pet_hate{"$pet:$hate"} //= {
            pet  => $pet,
            hate => $hate,
            like => $like,
        };
    }

    foreach (values %pet_hate) {
        printf(qq{pet="%s"|hate="%s"|like="%s"\n},
            $_->{pet}, $_->{hate}, $_->{like});
    }

Best,
Jim
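One caveat about the script in the reply above: it prints `values %pet_hate`, and Perl makes no promise about the order in which a hash returns its values, so the output lines may not follow the order of the input file. The following is a minimal sketch of an order-preserving variant, assuming the same `name="value"` field format. The helper name `first_by_pet_hate` and the in-code sample array are illustrative assumptions, not part of the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the first line seen for each (pet, hate)
# pair, preserving input order. Lines missing either field are skipped.
sub first_by_pet_hate {
    my @lines = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        my ($pet)  = $line =~ /\bpet="(\w+)"/  or next;
        my ($hate) = $line =~ /\bhate="(\w+)"/ or next;
        # $seen{...}++ is false only the first time a key appears.
        push @kept, $line unless $seen{"$pet:$hate"}++;
    }
    return @kept;
}

# Sample data copied from the question.
my @input = (
    'pet="cat"|hate="rat"|like="dog"',
    'hate="rat"|like="dog"|pet="cat"',
    'hate="rat"|like="horse"|pet="cat"',
    'pet="cow"|hate="rat"|like="dog"',
    'hate="rat"|like="dog"|pet="cow"',
);

print "$_\n" for first_by_pet_hate(@input);
```

Because the kept lines go into an array rather than being read back out of the hash, this prints the original text of each first occurrence, in file order. To read from a file instead, the sample array could be replaced with lines read from `<>` and chomped.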
Re: make a search pattern and remove duplicate from the file by NetWallah (Canon) on May 11, 2018 at 06:36 UTC
Here is a one-liner that generates the structure you need to analyze this info:

    $ perl -MData::Dumper -lanF\\\| -e 'for (@F){@x=split /=/;$h{$x[0]}=$x[1]};$pet_hate{$h{pet}}{HATE}{$h{hate}}++;$pet_hate{$h{pet}}{LIKE}{$h{like}}++; }{print Dumper \%pet_hate' your-file.txt
    $VAR1 = {
      '"cow"' => {
        'HATE' => {
          '"rat"' => 3
        },
        'LIKE' => {
          '"dog"' => 3
        }
      },
      '"cat"' => {
        'LIKE' => {
          '"horse"' => 1,
          '"dog"' => 2
        },
        'HATE' => {
          '"rat"' => 3
        }
      }
    };

I did not understand your specifications well enough to figure out how you came to get the output you want, but you can loop through this structure (or something close) to generate that.

Memory fault -- brain fried
Re^2: make a search pattern and remove duplicate from the file by jimpudar (Pilgrim) on May 11, 2018 at 16:15 UTC
I think what he is asking for is to print out only the first instance of any line which has a unique value of the `pet, hate` tuple. Any other lines should be ignored.

Since the structure you are building does not keep track of which line came first, looping over it will never bring you to the answer he is looking for.

If you really wanted to do this with a one-liner (and I'm definitely not saying you should), I would do this, which will print exactly the result he asked for:

    perl -wlF'\|' -e 'for (@F) { /(\w+)="(\w+)"/; $r{$1} = $2 } $x="$r{pet}:$r{hate}"; $s{$x} ? next : ++$s{$x} && print' <input
    pet="cat"|hate="rat"|like="dog"
    pet="cow"|hate="rat"|like="dog"

Best,
Jim
Re^3: make a search pattern and remove duplicate from the file by NetWallah (Canon) on May 11, 2018 at 20:15 UTC
Your comment propagates the OP's usage of the word "unique" - and that causes confusion because your implementation reports the "first" usage of the tuple, which may not necessarily be a "unique" occurrence.

In order to find "unique", you will need to do that after all records have been ingested, and post-process, as my code does.

Anyway - this only illustrates non-specific specifications - and these nits are not worth picking.

Cheers.

Memory fault -- brain fried
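To make the distinction in this reply concrete, here is a rough two-pass sketch of the strict reading of "unique": count every (pet, hate) pair first, then keep only lines whose pair occurs exactly once. The helper names `pet_hate_key` and `only_once` and the extra `emu` line are assumptions added for illustration. They do not come from the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build the "pet:hate" key for a line,
# or return undef if either field is missing.
sub pet_hate_key {
    my ($line) = @_;
    my ($pet)  = $line =~ /\bpet="(\w+)"/  or return undef;
    my ($hate) = $line =~ /\bhate="(\w+)"/ or return undef;
    return "$pet:$hate";
}

# Hypothetical helper: keep only lines whose (pet, hate) pair occurs
# exactly once in the whole input. This needs all records up front.
sub only_once {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my $key = pet_hate_key($line);
        $count{$key}++ if defined $key;
    }
    return grep {
        my $key = pet_hate_key($_);
        defined $key && $count{$key} == 1;
    } @lines;
}

my @input = (
    'pet="cat"|hate="rat"|like="dog"',
    'hate="rat"|like="dog"|pet="cat"',
    'hate="rat"|like="horse"|pet="cat"',
    'pet="cow"|hate="rat"|like="dog"',
    'hate="rat"|like="dog"|pet="cow"',
    'pet="emu"|hate="fox"|like="ant"',   # made-up line, not in the question
);

print "$_\n" for only_once(@input);
```

With only the five lines from the question, this reading would print nothing, because the cat/rat and cow/rat pairs each appear more than once. That suggests the "first occurrence" interpretation is the one matching the requested output.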
Re^4: make a search pattern and remove duplicate from the file by jimpudar (Pilgrim) on May 12, 2018 at 15:30 UTC