in reply to partial matching of lines in perl

I prepared the following code before I realized that AnomalousMonk had already made almost the same suggestion as an advanced point. I have chosen to post it because it does not produce your expected output. Perhaps we misunderstand your requirement.
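use strict;
use warnings;

# Sample data held in in-memory files (references to heredoc strings).
my $file1 = \<<"END1";
he is man
don't you
what goes on
END1

my $file2 = \<<"END2";
he is
what are
try to do
END2

# Read the substrings from file2, one per line.
open my $h2, '<', $file2 or die "cannot open file2: $!";
my @a2 = <$h2>;
close $h2;
chomp @a2;

# Build one alternation regex; quotemeta makes each line match literally,
# and grep drops blank lines, which would otherwise match everything.
my $match = join '|', map { quotemeta $_ } grep { length $_ } @a2;
$match = qr/$match/;

# Print every line of file1 that contains any of the substrings.
open my $h1, '<', $file1 or die "cannot open file1: $!";
my @a1 = <$h1>;
close $h1;
print grep {$_ =~ $match} @a1;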

OUTPUT:

he is man
Bill

Re^2: partial matching of lines in perl
by AnomalousMonk (Archbishop) on Jun 13, 2020 at 06:54 UTC
    ... the following code ... does not produce your expected output. Perhaps we misunderstand your requirement.

    I'm also confused about Sidd@786's expected output. I can see that 'he is man' from file1 should be output because it has 'he is' from file2 as an exact substring. But Sidd@786 also seems to be saying in the OP that 'what goes on' should also be output, and I don't see how that's possible given the (somewhat vaguely presented) data and my (similarly vague) understanding of the requirement. Perhaps Sidd@786 can clarify things for us.

    I have code solutions for both index-based and dynamic regex approaches, but I'm a bit reluctant to post because the OP has too strong a smell of homework about it. Perhaps I'll post them tomorrow.


    Give a man a fish:  <%-{-{-{-<

Re^2: partial matching of lines in perl
by Sidd@786 (Initiate) on Jun 15, 2020 at 08:24 UTC
    Thanks for helping. Please also help me with the reverse of the same problem, i.e. I want to find partially mismatched lines. The output should be:
    1. don't you
    2. what goes on

      Here's a variation based on index that seems to satisfy your requirement insofar as I understand it as discussed here, here and here.

      Note that this solution is O(n1 * n2) (the product of the numbers of lines in the two files) because it depends on a nested loop, whereas the regex-based solution presented by BillKSmith here is O(n). Unfortunately, the regex-based solution imposes a tighter limit on the size of the substrings file it can support: at least several hundred, but surely no more than several thousand substring lines. The index-based solution, while potentially much slower, can support a few, perhaps several, million lines of substrings. (Caveat: These are all estimates.)

      With both approaches, the number of lines to be searched for substrings is unlimited if those lines are processed one at a time in a while-loop. My code identifies both lines that match some substring and lines that do not match any substring, so comment out whichever branch of the if-else conditional you do not need. (There's also a bit of ornamental code that highlights the substring that was found.)
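A minimal Perl sketch of an index-based approach along these lines (an illustration, not the code originally posted). It assumes the sample data from this thread, one entry per line, held in hypothetical in-memory files. The substrings from file2 are loaded once; file1 is read line by line in a while-loop, and index() is tried against every substring, which is the O(n1 * n2) nested loop described above. The square-bracket highlighting is an arbitrary choice.

```perl
use strict;
use warnings;

# Assumed sample data: in-memory stand-ins for file1 (lines to search)
# and file2 (substrings to look for).
my $file1 = \<<"END1";
he is man
don't you
what goes on
END1

my $file2 = \<<"END2";
he is
what are
try to do
END2

# Load all substrings into memory once; drop blank lines, because
# index() finds an empty string in every line.
open my $h2, '<', $file2 or die "cannot open file2: $!";
chomp(my @substrings = <$h2>);
close $h2;
@substrings = grep { length $_ } @substrings;

my (@matched, @unmatched);

# Read file1 one line at a time, so its size is not limited by memory.
open my $h1, '<', $file1 or die "cannot open file1: $!";
while (my $line = <$h1>) {
    chomp $line;

    # Inner loop: find the first substring that occurs in this line.
    my ($found, $pos);
    for my $sub (@substrings) {
        $pos = index $line, $sub;
        if ($pos >= 0) { $found = $sub; last; }
    }

    if (defined $found) {
        # Ornamental: bracket the substring that was found.
        substr($line, $pos, length $found) = "[$found]";
        push @matched, $line;
    }
    else {
        push @unmatched, $line;    # no substring occurs in this line
    }
}
close $h1;

print "matched:   $_\n" for @matched;
print "unmatched: $_\n" for @unmatched;
```

With the sample data this reports [he is] man as matched and don't you and what goes on as unmatched; either branch of the if-else can be dropped if only one group is wanted.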



      ... finding reverse of the same problem ... find partially mismatched lines.

      There is again a lack of clarity. I would define the "reverse of the same problem" as "find all lines in file1 that do not match any string in file2 as a substring." But "find partially mismatched lines" can be taken IMHO to mean "find all lines in file1 in which some part does not match any string in file2." All lines in file1 have some part that does not match anything in file2, but I doubt this is what you really mean.

      If I take the former of the two interpretations above as your intended requirement ("find all lines in file1 that do not match any string in file2 as a substring"), then the code provided by BillKSmith here can easily be adapted by changing the statement
          print grep { $_ =~ $match } @a1;
      to
          print grep { $_ !~ $match } @a1;
      (!~ vice =~). This change produces the output you seem to be specifying here.
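For reference, a minimal self-contained sketch of that adaptation, using the sample data from this thread (the one-entry-per-line layout of both files is an assumption). It also escapes the file2 strings with quotemeta, an addition so that characters such as . or * are matched literally rather than as regex metacharacters.

```perl
use strict;
use warnings;

# Assumed sample data held in in-memory files.
my $file1 = \<<"END1";
he is man
don't you
what goes on
END1

my $file2 = \<<"END2";
he is
what are
try to do
END2

open my $h2, '<', $file2 or die "cannot open file2: $!";
chomp(my @a2 = <$h2>);
close $h2;

# One alternation of literal (quotemeta-escaped), non-empty strings.
my $match = join '|', map { quotemeta $_ } grep { length $_ } @a2;
$match = qr/$match/;

open my $h1, '<', $file1 or die "cannot open file1: $!";
my @a1 = <$h1>;
close $h1;

# !~ keeps only the lines that contain none of the file2 strings.
my @unmatched = grep { $_ !~ $match } @a1;
print @unmatched;
```

This prints don't you and what goes on, each on its own line.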

      Again, please see "How do I post a question effectively?", "How (Not) To Ask A Question" and "I know what I mean. Why don't you?" for help with asking questions more clearly: please help us to help you. (And please do try to take a look at "Short, Self-Contained, Correct Example" and "How to ask better questions using Test::More and sample data".)



        Referring to the same problem: I want the output in a file. How can I get it?
      I cannot give you the "reverse of the same problem" because I still do not understand the original problem. I stated that the code I posted did not pass your single test case. I posted it to demonstrate my understanding of the problem. I expected you to post a clarification.

      You now mention "partially mismatched" lines. I cannot think of any interpretation of this phrase that is consistent with your new test case. In addition to all the suggestions from AnomalousMonk, I also tried: "Select a line from file1 if it does (not) contain any word which appears in file2" (where "word" is defined as all the text between regex word boundaries).
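A minimal sketch of that word-based interpretation, for anyone who wants to try it. It assumes a "word" is a maximal run of \w characters (so don't splits into don and t), uses the same assumed sample data in hypothetical in-memory files, and collects both the lines that do and the lines that do not contain a file2 word.

```perl
use strict;
use warnings;

# Assumed sample data held in in-memory files.
my $file1 = \<<"END1";
he is man
don't you
what goes on
END1

my $file2 = \<<"END2";
he is
what are
try to do
END2

# Put every word that appears in file2 into a set (hash keys).
open my $h2, '<', $file2 or die "cannot open file2: $!";
my %seen;
while (my $line = <$h2>) {
    $seen{$_} = 1 for $line =~ /\w+/g;
}
close $h2;

my (@with_word, @without_word);
open my $h1, '<', $file1 or die "cannot open file1: $!";
while (my $line = <$h1>) {
    # Does any word of this line also appear somewhere in file2?
    if (grep { exists $seen{$_} } $line =~ /\w+/g) {
        push @with_word, $line;
    }
    else {
        push @without_word, $line;
    }
}
close $h1;

print "contains a file2 word:\n", @with_word;
print "contains no file2 word:\n", @without_word;
```

With the sample data, the first group is he is man and what goes on, and the second is don't you.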

      Please post unambiguous requirements and several test cases. It is important that they all be exactly correct.

      Bill
        ... unambiguous requirements and several test cases. It is important that they all be exactly correct.

        Good luck with that.

