Re^3: partial matching of lines in perl
by AnomalousMonk (Archbishop) on Jun 15, 2020 at 10:46 UTC
Here's a variation based on Perl's built-in index function that seems to satisfy your requirement, insofar as I understand it from the discussion here, here and here.
Note that this solution is O(n1 * n2) (the product of the numbers of lines in the two files) because it depends on a nested loop, whereas the regex-based solution presented by BillKSmith here is O(n). Unfortunately, the regex-based solution imposes a tighter limit on the size of the substrings file it can support: at least several hundred, but surely no more than several thousand substring lines. The index-based solution, while potentially much slower, can support a few, perhaps several, million lines of substrings. (Caveat: these are all estimates.)
With both approaches, the number of lines to be searched for substrings is unlimited if those lines are processed one at a time in a while loop. The code below identifies both lines that match some substring and lines that do not match any substring, so comment out whichever branch of the if-else conditional you do not need. (There's also a bit of ornamental code that highlights the substring that was found.)
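The original index-based code did not survive in this copy of the thread. The following is only a minimal sketch of the same nested-loop idea, not AnomalousMonk's code: the substrings file is slurped into an array, the other file is read line by line, and each line is checked against every substring with index (a literal search, so characters like '.' have no special meaning). The sub names first_substring_in and report_matches, and the two-path command line, are hypothetical.

```perl
use strict;
use warnings;

# Return the first substring in @$subs that occurs anywhere in $line,
# or undef if none does. index() does a plain literal search (no regex).
sub first_substring_in {
    my ($line, $subs) = @_;
    for my $sub (@$subs) {
        return $sub if index($line, $sub) >= 0;
    }
    return;
}

# Hypothetical driver: $lines_path holds the lines to search,
# $subs_path holds one substring per line.
sub report_matches {
    my ($lines_path, $subs_path) = @_;
    open my $sfh, '<', $subs_path or die "Cannot open $subs_path: $!";
    chomp(my @subs = <$sfh>);
    close $sfh;
    @subs = grep { length($_) } @subs;   # an empty substring would match every line

    open my $lfh, '<', $lines_path or die "Cannot open $lines_path: $!";
    while (my $line = <$lfh>) {          # one line at a time: file size is not a limit
        chomp $line;
        my $found = first_substring_in($line, \@subs);
        if (defined $found) {            # comment out whichever branch you don't need
            print "MATCH    '$line' (contains '$found')\n";
        }
        else {
            print "NO MATCH '$line'\n";
        }
    }
    close $lfh;
}

report_matches(@ARGV) if @ARGV == 2;
```

Run it as, e.g., perl script.pl file1.txt file2.txt. The inner loop over @subs is what makes this O(n1 * n2).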
Give a man a fish: <%-{-{-{-<
Re^3: partial matching of lines in perl
by BillKSmith (Monsignor) on Jun 16, 2020 at 15:50 UTC
I cannot give you the "reverse of the same problem" because I still do not understand the original problem. I stated that the code I posted did not pass your single test case. I posted it to demonstrate my understanding of the problem. I expected you to post a clarification.
You now mention "partially mismatched" lines. I cannot think of any interpretation of this phrase that is consistent with your new test case. In addition to all the suggestions from AnomalousMonk, I also tried: "Select a line from file1 if it does (not) contain any word that appears in file2" (where "word" is defined as all the text between regex word boundaries).
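The word-based interpretation can be sketched with a hash lookup instead of substring search. This is only an illustration, not the code that was actually tried: it assumes a "word" is a maximal run of \w characters (one reading of "text between regex word boundaries"), and the sub names word_set, has_listed_word and select_lines are hypothetical.

```perl
use strict;
use warnings;

# Build a lookup set of every word appearing in the given lines (file2).
# Assumption: a word is a maximal run of \w characters.
sub word_set {
    my %set;
    $set{$_} = 1 for map { /(\w+)/g } @_;
    return \%set;
}

# True (1) if $line contains at least one word from the set, else 0.
# Whole words only: 'pineapple' does not match 'apple'.
sub has_listed_word {
    my ($line, $words) = @_;
    for my $w ($line =~ /(\w+)/g) {
        return 1 if exists $words->{$w};
    }
    return 0;
}

# $want = 1 selects lines that DO contain a listed word,
# $want = 0 selects lines that do NOT.
sub select_lines {
    my ($want, $words, @lines) = @_;
    return grep { has_listed_word($_, $words) == $want } @lines;
}
```

Because each word is looked up in a hash, the cost grows with the total length of file1 rather than with the product of the two line counts.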
Please post unambiguous requirements and several test cases. It is important that they all be exactly correct.
... unambiguous requirements and several test cases. It is important that they all be exactly correct.
Good luck with that.
Re^3: partial matching of lines in perl
by AnomalousMonk (Archbishop) on Jun 15, 2020 at 09:07 UTC
... finding reverse of the same problem ... find partially mismatched lines.
There is again a lack of clarity. I would define the "reverse of the same problem" as "find all lines in file1 that do not match any string in file2 as a substring." But "find partially mismatched lines" can be taken IMHO to mean "find all lines in file1 in which some part does not match any string in file2." All lines in file1 have some part that does not match anything in file2, but I doubt this is what you really mean.
If I take the former of the two interpretations above as your intended requirement ("find all lines in file1 that do not match any string in file2 as a substring"), then the code provided by BillKSmith here can easily be adapted by changing the statement
print grep { $_ =~ $match } @a1;
to
print grep { $_ !~ $match } @a1;
(!~ vice =~). This change produces the output you seem to be specifying here.
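BillKSmith's full code is not reproduced here, so this is a hedged sketch of how a $match regex of that kind could be built and used, assuming @a1 holds the lines of file1 and the substrings come from file2. The alternation is made from the substrings with quotemeta so they match literally; the sub names build_alternation, matching and non_matching are hypothetical.

```perl
use strict;
use warnings;

# Build one compiled regex that matches any of the given strings literally.
# quotemeta stops characters such as '.' or '(' from acting as regex syntax.
# With no non-empty strings, return a pattern that can never match.
sub build_alternation {
    my @subs = grep { length($_) } @_;
    return qr/(?!)/ unless @subs;
    my $alt = join '|', map { quotemeta($_) } @subs;
    return qr/$alt/;
}

# The original statement: keep lines that contain some substring.
sub matching     { my ($re, @lines) = @_; return grep { $_ =~ $re } @lines }

# The reversed statement: keep lines that contain no substring at all.
sub non_matching { my ($re, @lines) = @_; return grep { $_ !~ $re } @lines }
```

The only difference between the two filters is =~ versus !~. Regex size is what limits how many substrings this approach can handle.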
Again, please see "How do I post a question effectively?", "How (Not) To Ask A Question" and "I know what I mean. Why don't you?" for help with asking questions more clearly. Please help us to help you. (And please do try to take a look at "Short, Self-Contained, Correct Example" and "How to ask better questions using Test::More and sample data".)
Referring to the same problem: I want the output in a file. How can I get it?
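One way to send the output of an existing print grep ... statement to a file without rewriting it is to select a different default output handle. This is a minimal sketch; the helper name print_to_file is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with the default output handle pointed at
# $out_path, so an unchanged statement such as
#     print grep { $_ !~ $match } @a1;
# writes into the file instead of to STDOUT.
sub print_to_file {
    my ($out_path, $code) = @_;
    open my $out, '>', $out_path or die "Cannot open $out_path for writing: $!";
    my $old = select $out;               # print with no handle now goes to $out
    my $ok  = eval { $code->(); 1 };     # trap errors so STDOUT is always restored
    my $err = $@;
    select $old;                         # restore the previous default handle
    close $out or die "Cannot close $out_path: $!";
    die $err unless $ok;
}
```

Usage: print_to_file('out.txt', sub { print grep { $_ !~ $match } @a1 });. Simpler alternatives are printing straight to the handle (print {$out} grep ...) or redirecting in the shell (perl script.pl > out.txt).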
Also (referring to the same problem), the program must work only with file locations: we are not supposed to write the actual text of the files into the program, and we also have to print the output to another file.
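A sketch of a script that meets those constraints: the three file locations come from the command line, all text is read from the files, and results go to the output file. The script name, argument order and sub names (read_lines, filter_file) are hypothetical. It uses the literal-alternation regex and keeps lines that match no substring (the !~ variant discussed earlier in the thread).

```perl
use strict;
use warnings;

# Hypothetical command line:
#   perl filter.pl FILE1 FILE2 OUTFILE
# FILE1: lines to search; FILE2: one substring per line; OUTFILE: results.

# Read a file into a list of chomped lines.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

sub filter_file {
    my ($file1, $file2, $out_path) = @_;
    my @subs = grep { length($_) } read_lines($file2);
    # Literal alternation; (?!) never matches if FILE2 has no substrings.
    my $alt   = join '|', map { quotemeta($_) } @subs;
    my $match = @subs ? qr/$alt/ : qr/(?!)/;

    open my $in,  '<', $file1    or die "Cannot open $file1: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path for writing: $!";
    while (my $line = <$in>) {
        # Keep lines matching no substring; use =~ to keep matching lines instead.
        print {$out} $line if $line !~ $match;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
}

if (@ARGV) {
    die "usage: $0 FILE1 FILE2 OUTFILE\n" unless @ARGV == 3;
    filter_file(@ARGV);
}
```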