Re^2: compare files by words

Re^3: compare files by words by blazar (Canon) on May 31, 2007 at 08:25 UTC
"It's almost the same two files. They differ in diacritics only. And I just need to know how many words have different diacritics. I don't need to know details."

In this case your approach above seems fine. Did you try it? Did it fail somehow? One thing you "have" to do is make it strict-safe. Then, for the word comparison, I'd write: `no warnings 'uninitialized'; ($words1[$_] eq $words2[$_] ? $good : $bad)++ for 0..(@words1>@words2 ? $#words1 : $#words2);` (I suppose you want to count a word as bad if it has no corresponding word at all in the other file. Otherwise you should change `>` into `<`. In the latter case the `no warnings` line wouldn't be necessary, since every index would exist in both arrays.)

Update: you also probably don't want to split on `/ /`, but on `' '`, which is more likely to do what you mean, and in fact is also the default.
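For readers who want to try the idea, here is a minimal strict-safe sketch of the comparison described above. The helper name `count_words` and the sample strings are assumptions made for illustration, not part of the original code; it also shows why `split ' '` is usually preferable to `split / /` when the input may contain runs of spaces.

```perl
use strict;
use warnings;
use utf8;    # the sample strings below contain literal accented characters

# Hypothetical helper: count matching and differing word positions
# between two lines of text.
sub count_words {
    my ($line1, $line2) = @_;

    # split ' ' skips leading whitespace and splits on runs of whitespace;
    # split / / would produce empty fields for consecutive spaces.
    my @words1 = split ' ', $line1;
    my @words2 = split ' ', $line2;

    my ($good, $bad) = (0, 0);

    # Walk up to the last index of the longer list, so that a word with
    # no counterpart is counted as bad.
    my $last = @words1 > @words2 ? $#words1 : $#words2;

    for my $i (0 .. $last) {
        # Past the end of the shorter list one side is undef.
        no warnings 'uninitialized';
        ($words1[$i] eq $words2[$i] ? $good : $bad)++;
    }
    return ($good, $bad);
}

my ($good, $bad) = count_words("el nino  llego tarde", "el niño llegó tarde hoy");
print "good: $good, bad: $bad\n";
```

Swapping `>` for `<` in the computation of `$last` would restrict the loop to positions present in both lists, at which point the `no warnings` line could be dropped.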
Re^4: compare files by words by ysth (Canon) on May 31, 2007 at 10:02 UTC
How about `use List::Util "min"; ... my $words = min(scalar(@words1), scalar(@words2)); $total += $words; $bad += grep $words1[$_] ne $words2[$_], 0 .. ($words - 1); ... print "good:", $total - $bad;` possibly using `max` instead of `min`.
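A self-contained sketch of this running-total variant might look like the following. The helper `count_shared` and the sample line pairs are hypothetical, standing in for the elided file-reading code. Note that `min` from List::Util takes a flat list of values, so the sketch passes the two element counts explicitly rather than the arrays themselves.

```perl
use strict;
use warnings;
use utf8;    # the sample strings below contain literal accented characters
use List::Util qw(min);

# Hypothetical helper: compare only the positions both word lists share.
sub count_shared {
    my ($words1, $words2) = @_;    # array references

    # scalar(@array) yields the element count; passing the arrays directly
    # would flatten them and make min() compare the words themselves.
    my $n = min(scalar(@$words1), scalar(@$words2));

    # grep in scalar context returns the number of mismatching positions.
    my $bad = grep { $words1->[$_] ne $words2->[$_] } 0 .. $n - 1;

    return ($n, $bad);
}

my ($total, $bad) = (0, 0);

# Sample line pairs standing in for lines read from the two files.
for my $pair ( [ "el nino llego", "el niño llegó hoy" ],
               [ "buenos dias",   "buenos días" ] ) {
    my @w1 = split ' ', $pair->[0];
    my @w2 = split ' ', $pair->[1];
    my ($n, $diff) = count_shared(\@w1, \@w2);
    $total += $n;
    $bad   += $diff;
}

print "good: ", $total - $bad, ", bad: $bad\n";
```

With `min`, unmatched trailing words are simply ignored; using `max` instead would count them, but the comparison would then touch undefined elements and need the `uninitialized` warning silenced.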
Re^5: compare files by words by blazar (Canon) on May 31, 2007 at 10:30 UTC
`use List::Util "min";` Yep, I like and use that. But for just two values a simple ternary is more appropriate, IMHO. Of course, keeping a `$total` is also fine and interesting, and I had thought of it myself. But after all, the `$bad` vs. `$good` approach is more symmetric, and thus I prefer it. That's just me, of course.
Re^6: compare files by words by ysth (Canon) on May 31, 2007 at 19:18 UTC