
Please check my updated code and suggest modifications, because I am not getting the output :(

Re^3: File Intersection problem
by johngg (Canon) on Nov 13, 2008 at 22:41 UTC

    A few points about your updated code

    • Although you are checking for the success of your open statements, your error messages are pretty useless: they don't say which file you were trying to open when the failure occurred, and they don't give any indication of the o/s error (see $! or $OS_ERROR in perlvar) that might have caused the failure.
    • chomp defaults to operating on $_ so your chomp($_); can simply be written chomp; to save you some typing.
    • Having read your File1.txt, you have a look at the hash you have created by iterating over the key/value pairs using each in a while loop (your print statement is actually commented out, so I guess you just used this for debugging). That works for simple hashes but quickly becomes unwieldy when data structures are more complicated. The Data::Dumper module is part of the standard Perl distribution and is an invaluable tool if you want to examine your data.
    • You do a regex substitution along these lines, s{...}{}xms, but you used a pattern that is one contiguous string. The whole point of the x flag is to allow you to intersperse spaces (and comments if you wish) in the pattern to make it more readable and easily understood.
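A minimal sketch of the points above, not your actual code. File1.txt is the name from this thread, but the "name number" layout of each line, the %wanted hash and the read_names sub are assumptions made purely for illustration, and an in-memory string stands in for the real file so the example runs on its own.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: read "name number" lines into a hash.
# The layout of File1.txt is an assumption for this sketch.
sub read_names {
    my ($path) = @_;

    # Name the file and include $! so a failure can be diagnosed
    open my $fh, '<', $path
        or die "Cannot open '$path' for reading: $!";

    my %wanted;
    while ( <$fh> ) {
        chomp;                            # operates on $_ by default
        my ( $name, $number ) = split;    # splits $_ on whitespace
        next unless defined $name;        # skip blank lines
        $wanted{$name} = $number;
    }
    close $fh or die "Cannot close '$path': $!";
    return %wanted;
}

# An in-memory file stands in for File1.txt (invented sample data)
my $sample = "alpha 10\nbeta 20\n";
my %wanted = read_names( \$sample );

# Dump the whole structure instead of hand-rolling an each loop
$Data::Dumper::Sortkeys = 1;
print Dumper( \%wanted );

# With the x flag, whitespace and comments inside the pattern are ignored
my $line = "  trailing junk   ";
$line =~ s{ \s+ \z    # whitespace at the very end of the string
          }{}xms;
print "[$line]\n";
```

For a real file you would pass the file name rather than a reference to a string; the error message would then show that name alongside the o/s error text.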

    Judging by the desired output in your OP, my understanding of your requirement is this

    • Read and parse File1.txt to obtain a list of names (and, possibly, associated numbers) used to filter File2.txt
    • Read and process File2.txt line by line
    • For each line:
      • Extract the preamble preceding the first set of name-associated data and print it without a newline
      • Extract each set of name-associated data, determine the name contained in that data and only print that data if the name occurs in the list, again no newline
      • When all of the name-associated data groups have been processed print a newline
    When you start processing lines in File2.txt, you rather jump the gun by substituting nothing for the first name-associated data group before you know whether you actually need to keep it, and also before you do anything with the preamble. The need to identify the preamble as well as to extract each data set leads me to slightly reconsider the compiled regular expression (see Regexp Quote Like Operators in perlop) I gave in my earlier reply. I would remove the second capture around the name, so that it becomes

    my $rxExtractNameData = qr {(?x) ( \s+ \w+ \s+ \([A-Z]\) \s+ \(\d*%\) \s+ \d+\s+\+\s+[+-]?\d+ ) };

    which would allow me to use it both for the preamble and the data groups

    ...
    while( <$file2FH> )
    {
        # reject line unless it has name data
        next unless m{^(.*?)$rxExtractNameData};
        print $1;    # preamble captured in $1

        # global match to extract to an array
        my @dataGroups = m{$rxExtractNameData}g;
        ...
    }
    ...

    Once you have the name-associated data groups you can loop over them pulling out each name, test whether it is in the hash parsed from File1.txt and, if so, print the data group.
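That last step might be sketched like this. The pattern is the one given above; everything else is an assumption for illustration: the filter_line sub is a hypothetical helper that returns the text instead of printing it, %wanted stands in for the hash parsed from File1.txt, and the sample line is invented only to fit the pattern, since I can't see your real data here.

```perl
use strict;
use warnings;

# Pattern exactly as given in the post above
my $rxExtractNameData = qr {(?x) ( \s+ \w+ \s+ \([A-Z]\) \s+ \(\d*%\) \s+ \d+\s+\+\s+[+-]?\d+ ) };

# Hypothetical stand-in for the hash parsed from File1.txt
my %wanted = ( alpha => 10, beta => 20 );

# Hypothetical helper: return the filtered version of one File2.txt line,
# or undef if the line holds no name data at all
sub filter_line {
    my ( $line, $wanted ) = @_;

    # The preamble is everything before the first data group
    return unless $line =~ m{^(.*?)$rxExtractNameData};
    my $out = $1;

    # A global match in list context returns every captured data group
    my @dataGroups = $line =~ m{$rxExtractNameData}g;

    for my $group ( @dataGroups ) {
        # The name is the first word in the group
        my ( $name ) = $group =~ m{ \A \s+ (\w+) }x;
        $out .= $group if exists $wanted->{$name};
    }
    return "$out\n";
}

# Invented sample line, shaped only to fit the pattern above
my $sample = 'ID001 alpha (A) (50%) 12 + 3 gamma (B) (25%) 7 + -2 beta (C) (%) 1 + 0';
my $filtered = filter_line( $sample, \%wanted );
print $filtered if defined $filtered;
```

Building the output string and printing it once per line is just a convenience for the sketch; printing each piece as you go, as described above, works equally well.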

    I hope that I have correctly understood your requirements and that these thoughts will be useful.

    Cheers,

    JohnGG

Re^3: File Intersection problem
by brsaravan (Scribe) on Nov 13, 2008 at 11:01 UTC
    Try this code.
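    # %href is assumed to be the hash already built from File1.txt
    open( my $fd, '<', 'File2.txt' ) or die "Cannot open File2.txt: $!";
    while ( my $line = <$fd> ) {
        chomp $line;
        # count how many keys of %href occur somewhere in this line
        my $cnt = grep { $line =~ /\Q$_\E/ } keys %href;
        print "$line $cnt\n";
    }
    close $fd;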