in reply to Perl noob is lost

Here is something that might get you started.

use strict;
use warnings;
use Data::Dumper;

foreach my $file (glob("*.ATT")) {
    open(my $fh, '<', $file) or die "$file: $!";

    # Turn each "key=value" line into a hash entry; a line without
    # '=' (such as SAVE) gets an empty value.
    my %params = map {
        chomp($_);
        my ($key, $value) = split(/=/, $_);
        $value="" unless(defined($value));
        ($key, $value)
    } <$fh>;
    close($fh);

    # Skip files where a required field is blank or contains '?'.
    next if(
        $params{'page_id'} =~ m/(^\s*$|\?)/
        or $params{'site_code'} =~ m/(^\s*$|\?)/
        or $params{'subject_id'} =~ m/(^\s*$|\?)/
    );

    $params{'page_description'} .= "some text - $params{'page_id'}";
    print Dumper(\%params);
}

Given a file test.ATT with the content as in your example, this produces:

$VAR1 = {
          'content_provider' => '(removed)',
          'page_id' => '#### (usually 3-5 digits)',
          'OBJECT' => '(removed)',
          'SAVE' => '',
          'study_number' => '(removed)',
          'CONTENT' => 'test.pdf',
          'page_description' => 'some text - #### (usually 3-5 digits)',
          'subject_id' => '#########',
          'site_code' => '###',
          'product' => '(removed)'
        };
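One thing to watch for: split(/=/, $_) without a limit cuts a value at its second '=', so a line like CONTENT=a=b.pdf would come out as 'a'. Here is a minimal sketch of the same parsing with a limit of 2. The parse_params name and the sample data are made up for illustration and do not come from a real .ATT file.

```perl
use strict;
use warnings;

# Parse "key=value" lines into a hash. The third argument to split (2)
# limits the result to two fields, so any '=' inside the value is kept.
sub parse_params {
    my ($text) = @_;
    my %params;
    foreach my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                 # skip blank lines
        my ($key, $value) = split /=/, $line, 2;
        $params{$key} = defined $value ? $value : '';
    }
    return %params;
}

# Hypothetical sample data, for illustration only.
my %params = parse_params("CONTENT=a=b.pdf\nsite_code=123\nSAVE\n");
print "$_ => '$params{$_}'\n" for sort keys %params;
```

A line with no '=' at all, such as SAVE, still ends up in the hash with an empty value, as in the original code.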

To understand how this works, you may find some of the Perl manual pages helpful (perldata, perlop, perlfunc, perlsyn, perlre). For a gentler introduction, you might like to start from Where and how to start learning Perl.

Good luck learning Perl!

update: corrected the update of page_description.

Re^2: Perl noob is lost
by sensesfail (Initiate) on Apr 05, 2009 at 15:48 UTC
    Thank you so much for helping me out! I was thinking it would require the split function, and I had the "=~ m/(^\s*$|\?)/" part down.

    Is there a way to retain the original format of the document? These documents will be read by another program.

    OBJECT=(removed)
    page_id=#### (usually 3-5 digits)
    page_description=GRAPHICS
    product=(removed)
    study_number=(removed)
    content_provider=(removed)
    site_code=###
    subject_id=#########
    CONTENT=test.pdf
    SAVE

      There are many alternatives. Here is one way that keeps the overall layout of the file:

      use strict;
      use warnings;
      use Data::Dumper;

      foreach my $file (glob("*.ATT")) {
          open(my $fh, '<', $file) or die "$file: $!";
          # Slurp the whole file into one string.
          my $content = do { local $/; <$fh> };
          close($fh);

          # With /m, ^ matches at the start of every line in the string.
          my ( $page_id ) = $content =~ m/^page_id=(.*)/m;
          my ( $site_code ) = $content =~ m/^site_code=(.*)/m;
          my ( $subject_id ) = $content =~ m/^subject_id=(.*)/m;
          my ( $page_description ) = $content =~ m/^page_description=(.*)/m;

          # Skip files where a required field is blank or contains '?'.
          next if(
              $page_id =~ m/(^\s*$|\?)/
              or $site_code =~ m/(^\s*$|\?)/
              or $subject_id =~ m/(^\s*$|\?)/
          );

          $page_description .= "some text - $page_id";
          $content =~ s/^page_description=.*/page_description=$page_description/m;
          print "$content";
      }

      In this case, the entire file content is read into a single string (slurped), then pattern matching and substitution are used to extract data and modify the description without changing the overall layout. The result is:

      OBJECT=(removed)
      page_id=#### (usually 3-5 digits)
      page_description=some text - #### (usually 3-5 digits)
      product=(removed)
      study_number=(removed)
      content_provider=(removed)
      site_code=###
      subject_id=#########
      CONTENT=test.pdf
      SAVE
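If a file lacks one of these lines, the list assignment leaves the variable undefined, and the later pattern match warns under "use warnings". Here is a minimal sketch of a small helper that makes the missing case explicit. The name field and the sample content are made up for illustration.

```perl
use strict;
use warnings;

# Return the value of a "name=value" line found at the start of any line
# in $content, or undef if there is no such line.
sub field {
    my ($content, $name) = @_;
    # /m lets ^ match at every line start; \Q...\E quotes the name.
    my ($value) = $content =~ m/^\Q$name\E=(.*)/m;
    return $value;
}

# Hypothetical sample content, for illustration only.
my $content = "OBJECT=x\npage_id=27.01\nsite_code=123\n";

for my $name (qw(page_id site_code subject_id)) {
    my $value = field($content, $name);
    print "$name: ", (defined $value ? $value : '(missing)'), "\n";
}
```

Checking defined($value) before the blank-or-'?' test would let the loop skip incomplete files without warnings.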
        Thanks for your help! Is this an appropriate way to make the script recognize that all 27.## pages get page_description = "statement" and all 22.## pages get page_description = "statement"? Would it save automatically and move on to the next text file? I've been adding the page descriptions by hand after reading the page number for the past few days, and it's driving me insane!
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        foreach my $file (glob("*.ATT")) {
            open(my $fh, '<', $file) or die "$file: $!";
            my $content = do { local $/; <$fh> };
            close($fh);

            my ( $page_id ) = $content =~ m/^page_id=(.*)/m;
            my ( $site_code ) = $content =~ m/^site_code=(.*)/m;
            my ( $subject_id ) = $content =~ m/^subject_id=(.*)/m;
            my ( $page_description ) = $content =~ m/^page_description=(.*)/m;

            #if value is non-alphanumeric it skips this file
            next if(
                ($page_id =~ m/[^a-zA-Z0-9]/) ||
                ($site_code =~ m/[^a-zA-Z0-9]/) ||
                ($subject_id =~ m/[^a-zA-Z0-9]/)
            ) {die};

            #all versions of these numbers results in those descriptions.
            if ($page_id = "27.*") {$page_description = "GRAPHICS"};
            if ($page_id = "21.*") {$page_description = "GENERAL COMMENTS"};
            if ($page_id = "24.*") {$page_description = "DATA IS CLEAN"};
            if ($page_id = "22.*") ($page_description = "PICTURES"};
            then;

            $content =~ s/^page_description=.*/page_description=$page_description/m;
            print "$content";

            #saves the updated text file?
            close(OUT);
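A few things stand in the way of the script above. In Perl, = assigns, so ($page_id = "27.*") overwrites $page_id instead of testing it; a pattern test needs =~. There is no "then" keyword. Nothing is written back, because OUT is never opened. The [^a-zA-Z0-9] check would also skip every 27.## page, since the dot is not alphanumeric. What follows is a rough sketch of one way to do it, not a drop-in replacement: it assumes page ids look like "27.01", leaves out the site_code and subject_id checks for brevity, and uses made-up names (%description_for, update_content).

```perl
use strict;
use warnings;

# Map the part of page_id before the dot to a description. The prefixes
# and texts are the ones from the post above; extend the table as needed.
my %description_for = (
    27 => 'GRAPHICS',
    21 => 'GENERAL COMMENTS',
    24 => 'DATA IS CLEAN',
    22 => 'PICTURES',
);

# Return the updated content, or undef if the file should be skipped.
sub update_content {
    my ($content) = @_;
    my ($page_id) = $content =~ m/^page_id=(.*)/m;
    return undef unless defined $page_id;

    # Assumption: page ids look like "27.01".
    my ($prefix) = $page_id =~ m/^(\d+)\./;
    return undef unless defined $prefix and exists $description_for{$prefix};

    my $description = $description_for{$prefix};
    $content =~ s/^page_description=.*/page_description=$description/m;
    return $content;
}

foreach my $file (glob("*.ATT")) {
    open(my $in, '<', $file) or die "$file: $!";
    my $content = do { local $/; <$in> };
    close($in);

    my $updated = update_content($content);
    next unless defined $updated;

    # This overwrites the original file; work on copies while testing.
    open(my $out, '>', $file) or die "$file: $!";
    print $out $updated;
    close($out) or die "$file: $!";
}
```

Each file is rewritten in place and the loop then moves on to the next one, so there is no separate save step. Files whose page_id has no entry in the table are left untouched.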