in reply to Re^2: compare files by words
in thread compare files by words

They're almost the same two files. They differ only in diacritics, and I just need to know how many words have different diacritics. I don't need to know the details.

In this case your approach above seems fine. Did you try it? Did it fail somehow? One thing you do "have" to do is make it strict-safe. Then, for the word comparison, I'd write:

no warnings 'uninitialized'; ($words1[$_] eq $words2[$_] ? $good : $bad)++ for 0..(@words1>@words2 ? $#words1 : $#words2);

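(I suppose you want to count a word as bad if it has no corresponding word at all in the other file. Otherwise you should change > into <. In the latter case the no warnings 'uninitialized' wouldn't be necessary, since no index would run past the end of the shorter array.)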
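
For what it's worth, here is how the one-liner above might look as a small strict-safe script. This is only a sketch: the count_words helper and the sample sentences are made up for illustration, and it assumes the words are already split into two arrays.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: count matching and differing word positions.
sub count_words {
    my ($w1, $w2) = @_;
    my ($good, $bad) = (0, 0);

    # Run up to the last index of the longer list, so a word with no
    # counterpart is compared against undef and counted as bad.
    my $last = @$w1 > @$w2 ? $#$w1 : $#$w2;

    # Comparing against undef would otherwise warn.
    no warnings 'uninitialized';
    ($w1->[$_] eq $w2->[$_] ? $good : $bad)++ for 0 .. $last;

    return ($good, $bad);
}

# Made-up sample data: same sentence with and without diacritics,
# plus one extra trailing word in the first list.
my @words1 = split ' ', 'el nino llego manana temprano';
my @words2 = split ' ', 'el niño llegó mañana';

my ($good, $bad) = count_words(\@words1, \@words2);
print "good: $good, bad: $bad\n";    # prints "good: 1, bad: 4"
```

The no warnings line sits inside the sub, so the rest of the script keeps its warnings.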

Update: you also probably don't want to split on / /, but on ' ', which is more likely to do what you mean and is in fact also the default.
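
A quick way to see the difference between the two patterns, using a made-up input line with leading and doubled spaces:

```perl
use strict;
use warnings;

my $line = "  foo  bar baz";

# A literal single-space pattern produces an empty field for each
# leading space and for every doubled space.
my @by_regex = split / /, $line;

# The special string ' ' skips leading whitespace and splits on
# any run of whitespace.
my @by_space = split ' ', $line;

# prints "6 fields vs 3 fields"
print scalar(@by_regex), " fields vs ", scalar(@by_space), " fields\n";
```

With / / the empty fields would be compared as if they were words, which would throw the counts off whenever the spacing of the two files differs.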

Re^4: compare files by words
by ysth (Canon) on May 31, 2007 at 10:02 UTC
    How about
    use List::Util "min"; ... my $words = min(scalar(@words1), scalar(@words2)); $total += $words; $bad += grep $words1[$_] ne $words2[$_], 0 .. ($words - 1); ... print "good:", $total - $bad;
    possibly switching max for min.
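
    Spelled out as something runnable, the idea might look like the sketch below. The count_bad helper and the sample word lists are made up for illustration. Note that min takes a flat list of numbers, so the array sizes are passed explicitly.

    ```perl
    use strict;
    use warnings;
    use List::Util qw(min);

    # Hypothetical helper: returns (positions compared, positions differing).
    sub count_bad {
        my ($w1, $w2) = @_;

        # Only positions present in both lists are compared.
        my $words = min(scalar @$w1, scalar @$w2);

        # grep in scalar context returns the number of matches.
        my $bad = grep { $w1->[$_] ne $w2->[$_] } 0 .. $words - 1;

        return ($words, $bad);
    }

    my ($total, $bad) = count_bad([qw(a b c d)], [qw(a x c)]);
    print "good: ", $total - $bad, "\n";    # prints "good: 2"
    ```

    With min the unmatched trailing word is simply ignored, so no undef is ever compared and no warnings need to be switched off.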
      use List::Util "min";

      Yep, I like and use that. But for only two values a simple ternary is more appropriate IMHO. Of course keeping a $total is also fine and interesting, and I had thought of it myself. But after all the $bad vs $good approach is more symmetric and thus I prefer it. That's just me of course.

        I've seen people get that "simple ternary" backwards :) and min reads a lot easier. In a one-liner, I'd use the ternary.

        You could do bad & good and no total and still have the grep do the counting of one or the other. Still not quite symmetric, though.
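
        Roughly like this, as a sketch only. The tally helper and the sample lists are made up, and unmatched trailing words are counted as bad here:

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: grep counts the bad positions,
        # and good is whatever is left over.
        sub tally {
            my ($w1, $w2) = @_;
            my $last = @$w1 > @$w2 ? $#$w1 : $#$w2;

            # Words past the end of the shorter list compare against undef.
            no warnings 'uninitialized';
            my $bad = grep { $w1->[$_] ne $w2->[$_] } 0 .. $last;

            return ($last + 1 - $bad, $bad);
        }

        my ($good, $bad) = tally([qw(a b c d)], [qw(a x c)]);
        print "good: $good, bad: $bad\n";    # prints "good: 2, bad: 2"
        ```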