in reply to Re: redirect output from a command to another command
in thread redirect output from a command to another command
Re^3: redirect output from a command to another command
by ikegami (Patriarch) on Mar 02, 2011 at 17:19 UTC | |
I would think launching diff would be relatively expensive. The disk cache should eliminate all disk wait for small files. Have you considered Algorithm::Diff?
by Allasso (Monk) on Mar 02, 2011 at 20:31 UTC | |
EDIT: Though I must say, the CPAN page on diff is the best I've seen.
by ikegami (Patriarch) on Mar 02, 2011 at 21:02 UTC | |
You'd gain portability.
by Allasso (Monk) on Mar 02, 2011 at 21:31 UTC | |
by ikegami (Patriarch) on Mar 03, 2011 at 00:16 UTC | |
by Allasso (Monk) on Mar 03, 2011 at 03:05 UTC | |
UPDATE: I will soon be posting results from more extensive testing to show what these figures reflect.

I did a speed comparison between using Algorithm::Diff and writing all the data to temp files and calling the system diff. Repeated execution on 105 HTML files.

using system diff: 1.9s average
using Algorithm::Diff: 95s average

Not too impressed with Algorithm::Diff...
by Anonymous Monk on Mar 03, 2011 at 03:51 UTC | |
by Allasso (Monk) on Mar 03, 2011 at 19:13 UTC | |
Some further investigation revealed some interesting results.

First, I will note the conditions of the tests, which follow from the specific needs I have for the algorithm. The main idea is that I want to compare words (strings of characters separated by whitespace) in two files. I am not concerned about changes in whitespace, so all groups of consecutive whitespace are collapsed to single \n characters. This was, of course, the necessary character for preprocessing the input to diffutils diff, and for consistency I left it the same when using the CPAN module.

For the CPAN module method, I used the example code from the CPAN Algorithm::Diff webpage to perform the actual comparison. The files were read into scalars, the substitutions were done, then the modified scalars were split into arrays at the \n's. These arrays are what the example code then uses.

For the diffutils method, the files were read into scalars, the substitutions were made, then the modified scalars were written to temp files, whose names were passed as arguments to the diff command executed from the script.

Ultimately, I want to do a recursive comparison of file hierarchies, but to get clearer data on the two algorithms, I first ran tests comparing the same two files numerous times, then compared the results from each algorithm. This test would yield the closest comparison of strictly the algorithm itself (with one possibly disputable exception, which I will elaborate on below*).

While the results from this test still showed the diffutils method to be quite a bit faster, the difference was not the dramatic 45-fold one that I observed yesterday (more on the order of 3.3 times). However, I still needed to test what I would really be doing, which is a recursive comparison. It was these tests that revealed a 55-fold increase in time using the CPAN module.
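The preprocessing described above, and the temp-file route to diff, can be sketched roughly as follows. This is an illustration only, not the code used in the tests: the sub names `normalize_words`, `word_list` and `external_word_diff` are made up for this sketch, File::Temp is an assumed choice for the temp files, and a `diff` executable is assumed to be on the PATH.

```perl
use strict;
use warnings;
use File::Temp ();

# Collapse every run of whitespace to a single "\n" so that each word
# sits on its own line; leading and trailing separators are dropped.
sub normalize_words {
    my ($text) = @_;
    $text =~ s/\s+/\n/g;
    $text =~ s/\A\n//;
    $text =~ s/\n\z//;
    return $text;
}

# Array form, as would be handed to a Perl-level diff routine.
sub word_list {
    my ($text) = @_;
    return split /\n/, normalize_words($text);
}

# Temp-file form: write both normalized texts out and let diff compare them.
sub external_word_diff {
    my ($old_text, $new_text) = @_;
    my @tmp;
    for my $text ($old_text, $new_text) {
        my $fh = File::Temp->new;    # file is removed when the object is destroyed
        print {$fh} normalize_words($text), "\n";
        close $fh or die "close failed: $!";
        push @tmp, $fh;              # keep the object alive until diff has run
    }
    my @names = map { $_->filename } @tmp;
    open my $out, '-|', 'diff', @names or die "cannot run diff: $!";
    my @lines = <$out>;
    close $out;    # diff exits with status 1 when the inputs differ, so the status is not treated as an error
    return @lines;
}
```

Something like `print external_word_diff($old, $new);` would then print diff's usual output, one word per changed line.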
I do not understand the reason for such disproportionate results. I have carefully laid out my methods and code below.

-----------

Tested on iMac G5 1.8 GHz PPC, 1 GB RAM, OS 10.4.11. Algorithm::Diff version 1.1902

The first test was run comparing the algorithms alone, running the same two files 1000 times. This test was performed 5 times for each algorithm. This was done twice, alternating between the two. The files used were HTML files of approx. 28 kB each in length. They were not identical. Results were:
The second test was run doing a recursive comparison of two directories, each parenting 105 HTML files. About half of the files were not identical. The total was approx. 2.6 MB for each tree. The recursion was iterated 10 times. I ran this test 10 times using the diffutils method, and 2 times using the Algorithm::Diff method. Not being comfortable with my CPU running at the rail for 15 minutes, I then ran the Algorithm::Diff method iterating over the recursion once, giving it a rest, and repeating. I repeated this 8 times. I alternated between the two algorithms. Results were:
Summary of tests:
*I will note that someone may dispute that, in the first set of tests with the Algorithm::Diff method, splitting the text string on every iteration inside the timing loop is not purely testing the algorithm alone. While this may be true, I did it this way so it would be a 1 to 1 comparison in the context of what I was trying to accomplish. That is, I wanted to have the same framework code and just be able to interchange the two methods. However, for the sake of fairness to the algorithm, I moved the split out of the timing loop, and over 5 runs of the test the average time for 1000 iterations went to 26.92 sec. (I refrain from posting all the data on that.) It should be noted, though, that for the second test it was necessary to have the split in the loop, since we are comparing different files every time.

---------------

Here is the code I used in the tests:

code used to run the same file 1000 times:

This is the framework for the recursive comparison of 105 files, in which one of the two code snippets posted directly above was substituted for ## DIFF algorithm here.

I am also posting the full script for each method in which a recursive comparison was done (and which yielded the curiously slow output using the CPAN module), copied and pasted directly after performing the tests for each method. I am doing this to eliminate any question about the posted code not reflecting the actual test:
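As an illustration only, and not the scripts used in the tests above, the footnote's point about keeping the split inside or outside the timing loop can be sketched like this. The helper `time_iterations`, the generated input text and the placeholder "comparison" step are all assumptions made for the sketch; Time::HiRes is a core module.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run $code $iterations times and return the elapsed wall-clock seconds.
sub time_iterations {
    my ($iterations, $code) = @_;
    my $start = time;
    $code->() for 1 .. $iterations;
    return time - $start;
}

# Stand-in input; a real test would read the two HTML files instead.
my $text = join ' ', map { "word$_" } 1 .. 2000;

# Variant 1: the split is repeated on every iteration.
my $with_split = time_iterations(1000, sub {
    my @words = split /\s+/, $text;
    my $count = @words;    # placeholder for the comparison step
});

# Variant 2: the split is done once, outside the timed loop.
my @words = split /\s+/, $text;
my $without_split = time_iterations(1000, sub {
    my $count = @words;    # placeholder for the comparison step
});

printf "split inside loop:  %.3f s\n", $with_split;
printf "split outside loop: %.3f s\n", $without_split;
```

The gap between the two figures gives a rough idea of how much of a measured time belongs to the preprocessing rather than to the comparison itself.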
Re^3: redirect output from a command to another command
by roboticus (Chancellor) on Mar 07, 2011 at 14:09 UTC | |
If you write to the file, and then use and delete it shortly thereafter, it may not even get written to the disk at all. It may simply reside in memory buffers. So don't be afraid of short-term temporary files. They can even be handy debugging tools--just comment out the delete, so you can see what the intermediate results were in an operation.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
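A minimal sketch of that debugging trick, assuming the core File::Temp module is used for the temporary file. The `KEEP_TEMP` environment variable and the sample data are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical switch: set KEEP_TEMP=1 to leave the file behind for inspection.
my $keep = $ENV{KEEP_TEMP} ? 1 : 0;

# With UNLINK true the file is removed when the program exits;
# with UNLINK false it stays on disk.
my ($fh, $path) = tempfile('wordsXXXXXX', TMPDIR => 1, UNLINK => !$keep);

print {$fh} "intermediate\nresults\n";
close $fh or die "close failed: $!";

# Use the file shortly afterwards, as an external command would.
open my $in, '<', $path or die "cannot reopen $path: $!";
my @lines = <$in>;
close $in;

print "kept for inspection: $path\n" if $keep;
```

Flipping one flag plays the same role as commenting out the delete: the intermediate results can be examined after the run.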
by Allasso (Monk) on Mar 09, 2011 at 23:11 UTC | |
Another paradigm that I suspect may be fallacious is that calling a system command is more expensive than using a module. Is there really a (significant) difference between forking a process and executing code that is loaded from a module? If there is a slight cost to forking, it doesn't seem like it would be that significant. Someone enlighten me.

Anyway, looking at the results of my tests, it is hard to convince me that there is anything to be gained by using Algorithm::Diff, as far as speed goes. Portability, perhaps, as some have pointed out.
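One way to put a number on the launch cost is to time it directly. The following is a rough sketch under stated assumptions: the child process is the running perl itself doing nothing (`$^X -e 1`), chosen only so the example does not depend on diff being installed, so it measures process start-up in general rather than diff specifically; the run count and the `noop` sub are arbitrary.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $runs = 20;

# Cost of launching a child process that does no work.
my $t0 = time;
for (1 .. $runs) {
    system($^X, '-e', '1') == 0 or die "child failed: $?";
}
my $per_launch = (time - $t0) / $runs;

# Cost of an in-process call that does no work.
sub noop { return 1 }
$t0 = time;
noop() for 1 .. $runs;
my $per_call = (time - $t0) / $runs;

printf "per process launch:  %.6f s\n", $per_launch;
printf "per in-process call: %.9f s\n", $per_call;
```

The per-launch figure is a fixed overhead paid once per file, so whether it matters depends on how long the comparison itself takes.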