in reply to Re^2: Perl noob is lost
in thread Perl noob is lost

There are many alternatives. Here is one way that keeps the overall layout of the file:

use strict;
use warnings;
use Data::Dumper;

foreach my $file (glob("*.ATT")) {

    open(my $fh, '<', $file) or die "$file: $!";
    my $content = do { local $/; <$fh> };
    close($fh);

    my ( $page_id ) = $content =~ m/^page_id=(.*)/m;
    my ( $site_code ) = $content =~ m/^site_code=(.*)/m;
    my ( $subject_id ) = $content =~ m/^subject_id=(.*)/m;
    my ( $page_description ) = $content =~ m/^page_description=(.*)/m;

    next if(
        $page_id =~ m/(^\s*$|\?)/
        or $site_code =~ m/(^\s*$|\?)/
        or $subject_id =~ m/(^\s*$|\?)/
    );

    $page_description .= "some text - $page_id";

    $content =~ s/^page_description=.*/page_description=$page_description/m;

    print "$content";
}

In this case, the entire file content is read into a single string (slurped), then pattern matching and substitution are used to extract data and modify the description without changing the overall layout. The result is:

OBJECT=(removed)
page_id=#### (usually 3-5 digits)
page_description=some text - #### (usually 3-5 digits)
product=(removed)
study_number=(removed)
content_provider=(removed)
site_code=###
subject_id=#########
CONTENT=test.pdf
SAVE

Re^4: Perl noob is lost
by sensesfail (Initiate) on Apr 06, 2009 at 01:24 UTC
    Thanks for your help! Is this an appropriate way to make the script recognize that every page 27.## gets page_description = "statement" and every page 22.## gets a different page_description = "statement"? Would it save automatically and move on to the next text file? I've been adding the page descriptions after reading the page number manually for the past few days and it's driving me insane!
    #!/usr/bin/perl

    use strict;
    use warnings;

    use Data::Dumper;

    foreach my $file (glob("*.ATT")) {

        open(my $fh, '<', $file) or die "$file: $!";
        my $content = do { local $/; <$fh> };
        close($fh);

        my ( $page_id ) = $content =~ m/^page_id=(.*)/m;
        my ( $site_code ) = $content =~ m/^site_code=(.*)/m;
        my ( $subject_id ) = $content =~ m/^subject_id=(.*)/m;
        my ( $page_description ) = $content =~ m/^page_description=(.*)/m;

        #if value is non-alphanumeric it skips this file
        next if(
            ($page_id =~ m/[^a-zA-Z0-9]/) ||
            ($site_code =~ m/[^a-zA-Z0-9]/) ||
            ($subject_id =~ m/[^a-zA-Z0-9]/)
        ) {die};
        #all versions of these numbers results in those descriptions.
        if ($page_id = "27.*") {$page_description = "GRAPHICS"};
        if ($page_id = "21.*") {$page_description = "GENERAL COMMENTS"};
        if ($page_id = "24.*") {$page_description = "DATA IS CLEAN"};
        if ($page_id = "22.*") ($page_description = "PICTURES"};

        then;

        $content =~ s/^page_description=.*/page_description=$page_description/m;

        print "$content";

        #saves the updated text file?
        close(OUT);

      Your script doesn't compile as it is. It produces the following warnings/errors:

      Found = in conditional, should be == at ./test.pl line 26.
      Found = in conditional, should be == at ./test.pl line 27.
      Found = in conditional, should be == at ./test.pl line 28.
      syntax error at ./test.pl line 24, near ") {"
      Global symbol "$page_id" requires explicit package name at ./test.pl line 26.
      Global symbol "$page_description" requires explicit package name at ./test.pl line 26.
      Global symbol "$page_id" requires explicit package name at ./test.pl line 27.
      Global symbol "$page_description" requires explicit package name at ./test.pl line 27.
      Global symbol "$page_id" requires explicit package name at ./test.pl line 28.
      Global symbol "$page_description" requires explicit package name at ./test.pl line 28.
      Global symbol "$page_id" requires explicit package name at ./test.pl line 29.
      syntax error at ./test.pl line 29, near ") ("
      Global symbol "$page_description" requires explicit package name at ./test.pl line 29.
      Global symbol "$content" requires explicit package name at ./test.pl line 33.
      Global symbol "$page_description" requires explicit package name at ./test.pl line 33.
      Global symbol "$content" requires explicit package name at ./test.pl line 35.
      Bareword "then" not allowed while "strict subs" in use at ./test.pl line 31.
      Execution of ./test.pl aborted due to compilation errors.

      You are using "strict" and "warnings" which is good, but you have to look at the warnings and errors they produce.

      Starting with "Found = in conditional, should be == at ./test.pl line 26.": you are using the assignment operator (=) in the condition of an if statement, which is usually (but not always) an error. Here it assigns the string "27.*" to $page_id, and because that string is true the condition always succeeds. I am guessing that you want a regular expression pattern match here, something like $page_id =~ m/^27/, and similarly in the following statements. You can read more about regular expressions in perlre and more about the binding operator ( =~ ) in perlop. Both are worth reading completely once or twice, just to get familiar with what's in them. You can then return to them for details when appropriate.
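
      One way to sketch the pattern-match approach is a lookup table keyed on the number before the dot. This assumes page_id values really look like "27.01"; the %description_for hash and the description_for_page sub are hypothetical names made up for this illustration, and the descriptions are the ones from your script.

```perl
use strict;
use warnings;

# Hypothetical table: leading page number => description (values from the question).
my %description_for = (
    27 => 'GRAPHICS',
    21 => 'GENERAL COMMENTS',
    24 => 'DATA IS CLEAN',
    22 => 'PICTURES',
);

# Return the description for a page_id such as "27.03", or undef when the
# leading number is not in the table (or the id does not start with digits).
sub description_for_page {
    my ($page_id) = @_;
    # Capture the digits up to the first dot, or the whole id if it has no dot.
    my ($prefix) = $page_id =~ m/^(\d+)(?:\.|$)/;
    return defined $prefix ? $description_for{$prefix} : undef;
}

for my $id ('27.01', '22.15', '270.1', '99.00') {
    my $desc = description_for_page($id);
    printf "%-6s => %s\n", $id, defined $desc ? $desc : '(unchanged)';
}
```

      Anchoring on the dot means "270.1" is not treated as a 27 page, which a plain m/^27/ would do. Whether that matters depends on what your page numbers actually look like.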

      Also, you have next if(...) {die}. You must choose between if() as a statement modifier and if() as a compound statement - it can't be both at the same time. It may be best to remove the {die} in this case.
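
      To show the two forms side by side, here is a small sketch. The @page_ids list is made-up sample data standing in for values read from your files, and the [^0-9.] check is only an assumption about what a valid page_id looks like.

```perl
use strict;
use warnings;

# Hypothetical sample values standing in for page_id fields read from .ATT files.
my @page_ids = ('27.01', '??', '22.15');

my @kept;
for my $page_id (@page_ids) {
    # Form 1: statement modifier. A single statement, no braces, no block.
    next if $page_id =~ m/[^0-9.]/;

    # Form 2: compound statement. Condition in parentheses, followed by a block.
    if ($page_id =~ m/^27\./) {
        push @kept, "$page_id (graphics)";
    }
    else {
        push @kept, $page_id;
    }
}

print "$_\n" for @kept;
```

      Note that the check in your script, m/[^a-zA-Z0-9]/, would also skip any page_id that contains a dot, so if your ids look like 27.01 you will probably want to allow the dot as above.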

      Otherwise you have some relatively minor typos which you can find and fix by running your script and following up on the errors. If you want to check your script without running it you can use perl -c script.pl, assuming your script is named script.pl.

      The script will not automatically save anything. If you want to update the .ATT files with the revised content, you will have to open them for output and write the new content to them. You should probably rename the originals (e.g. by appending ".bak" to the name) before writing the new content. Otherwise, if something is wrong with your script you might lose all your data. Alternatively, you can write the new content to a file with ".new" appended to the original file name and check these new files carefully before replacing the originals.
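
      A minimal sketch of the backup idea follows. It copies the original to a ".bak" file with File::Copy instead of renaming it, so the original name exists at every step; that is a substitution on my part, not the only way to do it. The save_with_backup sub and the example.ATT file name are hypothetical, and the demonstration runs in a temporary directory so it cannot touch real data.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Copy $file to "$file.bak", then overwrite $file with $content.
# Dies if the backup already exists, so a second run cannot replace a
# good backup with data that has already been modified.
sub save_with_backup {
    my ($file, $content) = @_;
    my $backup = "$file.bak";
    die "$backup already exists\n" if -e $backup;
    copy($file, $backup) or die "copy $file -> $backup: $!";
    open(my $out, '>', $file) or die "$file: $!";
    print {$out} $content;
    close($out) or die "$file: $!";
    return $backup;
}

# Demonstration in a throwaway directory (removed when the script exits).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.ATT";
open(my $fh, '>', $file) or die "$file: $!";
print {$fh} "page_description=\n";
close($fh) or die "$file: $!";

my $backup = save_with_backup($file, "page_description=GRAPHICS\n");
print "original kept in $backup\n";
```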

      Until you are sure all is correct you might do well to do something like the following:

      my $newfile = "$file.new";
      # $out rather than $fh, so it does not mask the input handle declared earlier in the loop
      open(my $out, ">", $newfile) or die "$newfile: $!";
      print $out $content;
      close($out);