
PitifulProgrammer:

When Richard_K mentioned "shell out to `diff`" he didn't mean for the user to use diff manually, but for your program to do the work of creating the command line and running it for the user to get the desired results. Consider this:

open my $FH, '>', "file_difference_report" or die $!;

my @base_file_names = ( 'file1', 'file2', 'file3', 'file4' );

for my $file_name (@base_file_names) {
    if (! -e "$file_name.xml") {
        print "$file_name.xml: Not present ... not interesting file?\n";
        next;
    }
    if (! -e "$file_name.bak") {
        print "$file_name: no backup, so probably not changed\n";
        next;
    }

    # If we get here, we have a .bak and a .xml file, so make another program
    # compare them for us:
    my $output = `diff $file_name.xml $file_name.bak`;
    print $FH "\n\n===== $file_name changes =====\n";
    print $FH $output;
    print $FH "\n\n";
}

In the line starting "my $output", we shelled out to use the diff command to compare the files and store the result in $output. From there you can do what you want with the results, such as concatenate it to the end of a report, as done here.
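If the file names might contain spaces, or you want to know whether diff actually found differences, the same idea can be sketched a little more defensively. This is only an illustration: the helper name `diff_files` and the demo files are made up, and it assumes a `diff` executable is on the PATH.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): run the external diff on two
# files and return its output together with its exit code.
sub diff_files {
    my ( $old, $new ) = @_;

    # List-form pipe open runs diff directly, without a shell, so spaces or
    # other odd characters in the file names need no quoting.
    open my $pipe, '-|', 'diff', $old, $new
        or die "Cannot run diff: $!";
    my $output = do { local $/; <$pipe> };    # slurp everything diff printed
    close $pipe;                              # fills in $? with diff's status

    # POSIX diff exits with 0 (no differences), 1 (differences), >1 (trouble).
    return ( $output // '', $? >> 8 );
}

# Demo with two throwaway files (names and contents are made up).
my $dir     = tempdir( CLEANUP => 1 );
my %content = ( 'demo.bak' => "one\ntwo\n", 'demo.xml' => "one\nTWO\n" );
for my $name ( keys %content ) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$fh} $content{$name};
    close $fh or die "Cannot close $name: $!";
}

my ( $out, $exit ) = diff_files( "$dir/demo.bak", "$dir/demo.xml" );
print "exit code: $exit\n", $out;
```

The exit code lets the report distinguish "no changes" from "diff could not run", which the plain backtick version cannot do from the output alone.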

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^6: comparing contents of two arrays and output differences
by PitifulProgrammer (Acolyte) on Jan 05, 2015 at 12:17 UTC

    Dear roboticus

    Thanks a mil for clarifying RichardK's example and for providing a code sample. I like the approach and I think this might be the way to go (it might look nicer to the user, although I personally prefer tables).

    Be that as it may, one question cropped up while I was trying the code. In your example the file names in the array are hardcoded. Since the number of files will vary in the application scenario(s), I need to read the file names into an array.

    When using my previous approach with the glob function, the file names do not match, i.e. the script checks for

    file_02_0.xml.xml:
    file_03_0.xml.xml:
    file_04_0.xml.xml:
    file_05_0.xml.xml:

    and with the .bak files, the script checks for:

    file_02_0.xml.bak.xml:
    file_03_0.xml.bak.xml:
    file_04_0.xml.bak.xml:
    file_05_0.xml.bak.xml:

    I would like to turn this piece of code into a subroutine that will be integrated into another script, so I guess I can neither hardcode the file names nor pass them via the command line. Secondly, the xml files might be used for further processing, so I would like to keep them separate.

    I have been wrecking my head over how to get around the issue, but nothing I tried has worked. Moreover, I think I cannot change the file tests for .bak and .xml, since what else would there be to check? Is there any way I could keep the file tests and use glob and/or File::Find::Rule to keep both file types separate while still doing the comparison as shown here?

    I know that I am missing something quite elemental, but I could not figure it out, please excuse my stupidity.

    Thanks a mil for your help, I am really learning a lot more this way than by just going through one book after the other.

    Kind regards

    C.
    #Separating xml and backup files
    my @xml_files = glob( '*xml' );
    #say for @xml_files;
    my @bak_files = glob( '*bak' );
    #say for @bak_files;

    #Show differences between file_01.xml and file_01.xml.bak, etc...
    open my $FH, '>', "file_difference_report" or die $!;

    my @base_file_names = ( @xml_files, @bak_files );
    print Dumper \@base_file_names;
    print "\n\n\n";

    for my $file_name ( @base_file_names ) {
        if ( ! -e "$file_name.xml" ){
            print "$file_name.xml: Not present ... not interesting file?\n";
            next;
        }
        if ( ! -e "$file_name.bak" ){
            print "$file_name: no backup, so probably not changed\n";
            next;
        }
        # If we get here, we have a .bak and a .xml file, so make another
        # program to compare them for us:
        my $output = 'diff $file_name.xml $file_name.bak';
        print $FH "\n\n===== $file_name changes =====\n";
        print $FH $output;
        print $FH "\n\n";
    }

      PitifulProgrammer:

      Yeah, I hardcoded the filenames to simplify things. For your case, I'd probably load up the array with something like:

      my @files = map { s/\.xml$//; $_ } glob('*.xml');

      The map statement simply trims the ".xml" off the end of each name in the list of XML files. Then when checking for the XML and/or BAK files, we glue 'em on as needed.
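To see the trim-and-glue idea in isolation, here is a small sketch that runs without touching the file system. The list `@found` stands in for the result of `glob('*.xml')`, and the file names and the ".xml.bak" backup suffix are assumptions taken from the listings earlier in the thread. It uses the non-destructive `/r` flag (Perl 5.14 or later) rather than the `s///; $_` form above, because on a named array the original form would also change the array's elements.

```perl
use strict;
use warnings;
use 5.014;    # the /r flag on s/// needs Perl 5.14 or later; also enables say

# Stand-in for glob('*.xml'); these names are made up for the example.
my @found = ( 'file_02_0.xml', 'file_03_0.xml' );

# s///r returns the trimmed copy and leaves $_ (and so @found) untouched.
my @base_names = map { s/\.xml$//r } @found;

for my $base (@base_names) {
    # Glue the extensions back on when a full file name is needed.
    my ( $xml, $bak ) = ( "$base.xml", "$base.xml.bak" );
    say "$base: compare $xml against $bak";
}
```

Keeping only the base names in the array means each file test and each comparison builds the exact name it needs, so the doubled ".xml.xml" extension cannot occur.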

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

        Dear roboticus

        That is pretty amazing. I read about map, but at that time I could not think of an application. That is a neat line of code, I'll try to memorize it for the future.

        Thanks a lot for your help, I will go back to the code later and post my result(s).

        Thank you very much for taking the trouble and for your explanations. I should have joined the forum much earlier :)

        Kind regards

        C.

        Dear roboticus,

        I tried the code and made some changes, mostly to see which files are stored where. The script worked in the test run. My next step is to create a subroutine from that script, but before I do that I still have a question, since I really would like to understand what the code does.

        My question is about "trimming" the xml off, as you nicely put it. The file checks do not seem to fail, at least that is what I figure, since the error messages are not printed out.

        How can the script differentiate between .xml and .bak? Would that be a built-in feature of the Text::Diff used in line 45? How exactly does the interpolation of the file extensions work in that particular case?

        Would be grand if you or another monk could shed some light.

        Thanks a mil in advance and kind regards

        C.
        use 5.018;
        use strict;
        use warnings;
        use Data::Dumper;
        use File::Glob;
        use Text::Diff;
        use Text::Diff::Table;

        #Separating xml and backup files
        #my @xml_files = glob( '*xml' );
        #say for @xml_files;
        #my @bak_files = glob( '*bak' );
        #say for @bak_files;

        #Show differences between file_01.xml and file_01.xml.bak, etc...
        open my $FH, '>', "file_difference_report" or die $!;

        my @base_file_names_xml = map { s/\.xml$//; $_ } glob('*.xml');
        print Dumper \@base_file_names_xml;

        my @base_file_names_bak = glob('*.bak');
        print Dumper \@base_file_names_bak;

        #cutting off file extension to use file name only, extension for
        #comparing .xml and .bak added by code below;
        #print Dumper \@base_file_names;
        #print "\n\n\n";

        for my $file_name ( @base_file_names_xml ) {
            if ( ! -e "$file_name.xml" ){
                print "$file_name.xml: Not present ... not interesting file?\n";
                next;
            }
            if ( ! -e "$file_name.xml.bak" ){
                print "$file_name: no backup, so probably not changed\n";
                next;
            }
            # If we get here, we have a .bak and a .xml file, so make another
            # program to compare them for us:
            my $output = diff "$file_name.xml", "$file_name.xml.bak";
            print $FH "\n\n===== $file_name changes =====\n";
            print $FH $output;
            print $FH "\n\n";
        }
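On the interpolation question: as far as the posted script shows, Text::Diff is not what tells the two extensions apart. The two names are already complete strings before `diff` is called, and `diff` simply receives them as two file names. The sketch below shows just that string-building step; the base name is made up.

```perl
use strict;
use warnings;

my $file_name = 'file_02_0';    # made-up base name

# Inside double quotes Perl substitutes the variable and keeps the rest of
# the text as-is. A "." cannot be part of a variable name, so the name ends
# at $file_name and ".xml" is ordinary text.
my $xml = "$file_name.xml";
my $bak = "$file_name.xml.bak";

# The same strings written out with the concatenation operator.
my $xml_concat = $file_name . '.xml';
my $bak_concat = $file_name . '.xml.bak';

print "$xml\n$bak\n";
print "identical\n" if $xml eq $xml_concat && $bak eq $bak_concat;
```

So the `-e` tests and the `diff` call each see a different, fully spelled-out file name, built from the same trimmed base name.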