
I'm writing code that needs to satisfy the following conditions:

1. Change the existing date formats (mm/dd/yyyy, yyyymmdd and mm-dd-yyyy) into yyyy/mm/dd format.
2. Change all numbers from 1 to 10 into words (e.g. 1 = one). Dates must be excluded from this change.
3. Count the number of times the word 'and' appears in the data (it may be in any case).
4. Capitalize the first letter of each word found after the marker AU:.
5. Add 'Written by: ' at the beginning of the byline information found in the <author> tag.
6. Extract the numeric ID (found in parentheses) from the second occurrence of PP: and display it in ID:. This ID: should be placed before the AU: tag.
7. Use the first paragraph as the headline and display its contents after HL:. Remove the first paragraph to avoid repeated information.
8. Remove the occurrences of the word 'end' in the data (it may appear in any case).

Questions and problems:

- So far no. 1 works; the only problem is that I can't convert JULY to 07.
- For no. 2, the code doesn't work because it also changes the dates.
- I can't think of a way to count the occurrences of 'and' or to add 'Written by'.
- Why can't I extract the number in () with the regular expression? How can I put the extracted number into ID:, and how do I add the ID: line to the file?
- I don't know how to do no. 7.
- The word 'end' or 'END' is not removed.

This is my code:

#!C:\strawberry\perl\bin\perl.exe
use strict;
use warnings;

my $filename = "input.txt";
open my $file, '<', $filename or die "Cannot open $filename: $!";
my @fileinput = <$file>;
close($file);

# change date format
foreach my $line (@fileinput)
{
    my $testdate = $line;
    # mm/dd/yyyy -> yyyy/mm/dd
    if ($testdate =~ s/(\d{2})\/(\d{2})\/(\d{4})/$3\/$1\/$2/g)
    {
        print $testdate;
    }
    # yyyymmdd -> yyyy/mm/dd
    if ($testdate =~ s/(\d{4})(\d{2})(\d{2})/$1\/$2\/$3/)
    {
        print $testdate;
    }
    # "july 11, 2009" -> "2009/july/11"; the month name is copied as-is
    if ($testdate =~ s/(\w{4})\s(\d{2})\,\s(\d{4})/$3\/$1\/$2/)
    {
        print $testdate;
    }
    # mm-dd-yyyy -> yyyy/mm/dd
    if ($testdate =~ s/(\d{2})\-(\d{2})\-(\d{4})/$3\/$1\/$2/)
    {
        print $testdate;
    }
}

# Wrong: this also changes the digits inside the dates
#foreach my $line (@fileinput)
#{
#    my $numToword = $line;
#    $num  = "9";
#    $word = "nine";
#    if ($numToword =~ s/$num/$word/g)
#    {
#        print $numToword;
#    }
#    $num  = "19";
#    $word = "nineteen";
#    if ($numToword =~ s/$num/$word/g)
#    {
#        print $numToword;
#    }
#    $num  = "10";
#    $word = "ten";
#    if ($numToword =~ s/$num/$word/g)
#    {
#        print $numToword;
#    }
#    $num  = "5";
#    $word = "five";
#    if ($numToword =~ s/$num/$word/g)
#    {
#        print $numToword;
#    }
#}

# Capitalize the first character of each word after AU:
foreach my $line (@fileinput)
{
    my $cword = $line;
    if ($cword =~ /^AU:/)
    {
        $cword =~ s/(?<!\S)(\w)/\u$1/g;    # uppercase the first letter of each word
        print $cword;
    }
}

# extract ID in (); note that this pattern expects a space after "("
foreach my $line (@fileinput)
{
    my $extractid = $line;
    if ($extractid =~ m/\( (\d+)\)/g)
    {
        print $extractid;
    }
}

# remove END, end word in file
# (without /x, the spaces around each | in this pattern are literal)
foreach my $line (@fileinput)
{
    my $removeend = $line;
    if ($removeend =~ s/(^END) | (^end) | (END$) |(end$)//g)
    {
        print $removeend;
    }
}
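The third substitution in the date loop turns "july 11, 2009" into "2009/july/11", because the captured month name is copied into the result unchanged (and \w{4} only fits four-letter names such as july or june). One way to get 07 instead is a month-name lookup hash. This is a minimal sketch, assuming full English month names written in any case; normalize_dates, %month_num and $month_re are hypothetical names. It also puts \b around each pattern so, for example, the yyyymmdd rule only fires on a standalone eight-digit number.

```perl
use strict;
use warnings;

# Month-name lookup (assumption: full English names, matched in any case).
my %month_num = (
    january   => 1, february => 2,  march    => 3,  april    => 4,
    may       => 5, june     => 6,  july     => 7,  august   => 8,
    september => 9, october  => 10, november => 11, december => 12,
);
my $month_re = join '|', keys %month_num;

# Rewrite mm/dd/yyyy, mm-dd-yyyy, yyyymmdd and "month d, yyyy" as yyyy/mm/dd.
sub normalize_dates {
    my ($text) = @_;
    $text =~ s{\b(\d{2})/(\d{2})/(\d{4})\b}{$3/$1/$2}g;    # mm/dd/yyyy
    $text =~ s{\b(\d{2})-(\d{2})-(\d{4})\b}{$3/$1/$2}g;    # mm-dd-yyyy
    $text =~ s{\b(\d{4})(\d{2})(\d{2})\b}{$1/$2/$3}g;      # yyyymmdd, 8 digits only
    # month name: look up its number and zero-pad month and day
    $text =~ s{\b($month_re)\s+(\d{1,2}),\s*(\d{4})\b}
              {sprintf('%04d/%02d/%02d', $3, $month_num{lc $1}, $2)}gie;
    return $text;
}

print normalize_dates($_), "\n"
    for 'DD: 11/09/2009', 'DD: july 11, 2009', 'DD: 10-09-2009',
        'PP: Evan = AT89 -> transferred last 20070605 to Journals';
```

Dates that are already in yyyy/mm/dd form, such as 2005/04/04, pass through untouched because none of the patterns match them.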

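For no. 2, the commented-out attempt substitutes digits anywhere in a line, so the 9 in 2009 becomes "200nine". A minimal sketch of one alternative, assuming the dates have already been converted to yyyy/mm/dd: a single substitution whose first alternative matches a whole date and puts it back unchanged, and whose second alternative matches 1 to 10 only as a standalone number (\b on both sides), so codes like DS16 or "page 19" are left alone. numbers_to_words and @words are hypothetical names.

```perl
use strict;
use warnings;

# Index = number, so $words[7] is "seven"; index 0 is never used.
my @words = qw(zero one two three four five six seven eight nine ten);

# Replace standalone numbers 1-10 with words. Assumption: dates are already
# yyyy/mm/dd, so the first alternative consumes a whole date and returns it
# unchanged before any of its digits can be matched on their own.
sub numbers_to_words {
    my ($text) = @_;
    $text =~ s{(\d{4}/\d{2}/\d{2})|\b(10|[1-9])\b}
              {defined $1 ? $1 : $words[$2]}ge;
    return $text;
}

print numbers_to_words($_), "\n"
    for 'DD: 2009/10/09', 'PG: page 9', 'PG: page 19',
        'PP: Even if you are early by 10, 5 or even 1 minute';
```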
The content of 'input.txt':

DD: 11/09/2009
AU: jas dimaano
PP: Employee ID list
PP: (489459) Jas = DS16 -> with SPi since 2005/04/04 AND with ECO ever since
PP: Sam = FT35 -> resigned last 09-03-2008
PP: Evan = AT89 -> transferred last 20070605 to Journals
PP: there's more...
===
DD: july 11, 2009
AU: Jr s. Tolentino, -editor
PG: page 9
PP: Earn points now!
PP: Yes! You heard it right! (635436)
PP: Finish your exercise before the deadline and you'll receive additional points.
PP: Even if you're early by 10, 5 or even 1 minute, we'll give away the corresponding points!
PP: So hurry now and ensure that you'll be able to finish ASAP! END
PP:
===
DD: 10-09-2009
AU: mr. Henderson de la cruz (h.delacruz@yahoo.com)
VL: volume IV.
PG: page 19
PP: This is only a test document.
PP: Did you know that this is just a test doc? See if you're end output (if you'll be able to produce 1) will be the same as the required one.
PP: Don't build you're house on a sandy land. And not too near the shore as well.
PP: Perhaps you'll earn more than 10-50k of money! Why not?! This doesn't make any sense. Right? So I better end this now.
PP: This is the end of this test document.
===
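For nos. 3 and 8, the sample above contains 'and' as AND, and, and And, and 'end' as END and end. Matching the whole word with \b and ignoring case with /i covers all of them. The end-removal loop in the code never matches this data: without the /x flag the spaces around each | are part of the pattern, and ^ and $ only match at the start or end of the line. A minimal sketch, assuming "the word" means a whole word (so sandy, land or ended are not touched); count_and and remove_end are hypothetical names, and remove_end expects a line without its trailing newline.

```perl
use strict;
use warnings;

# Count "and" as a whole word in any case (AND, And, and). Words that only
# contain the letters, such as "sandy" or "land", are not counted.
sub count_and {
    my ($text) = @_;
    my $count = () = $text =~ /\band\b/gi;    # list-assign to count matches
    return $count;
}

# Delete the whole word "end" in any case, then tidy the spaces left behind.
sub remove_end {
    my ($line) = @_;
    $line =~ s/\bend\b//gi;
    $line =~ s/ {2,}/ /g;    # collapse the double space where the word was
    $line =~ s/\s+\z//;      # drop trailing whitespace, e.g. after a final END
    return $line;
}

my @sample = (
    "PP: So hurry now and ensure that you'll be able to finish ASAP! END",
    "PP: Don't build you're house on a sandy land. And not too near the shore as well.",
);
print 'and: ', count_and(join "\n", @sample), "\n";
print remove_end($_), "\n" for @sample;
```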

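For nos. 4 and 5, the sample has no <author> tag, so this sketch assumes the AU: line is the byline. It capitalizes each word first and adds 'Written by: ' afterwards, so "by" stays lowercase. A word is taken to start at a letter or digit preceded by whitespace (or the start of the text), which leaves the e-mail address in parentheses and "-editor" as they are. format_byline is a hypothetical name.

```perl
use strict;
use warnings;

# Capitalize each word after "AU:" and prefix the byline with "Written by: ".
# Assumption: the AU: line plays the role of the <author> byline.
sub format_byline {
    my ($line) = @_;
    my ($byline) = $line =~ /^AU:\s*(.*)$/
        or return $line;                 # not an AU: line, leave it alone
    # Uppercase a word character only when whitespace or nothing precedes it.
    $byline =~ s/(?<!\S)(\w)/\u$1/g;
    return "AU: Written by: $byline";
}

print format_byline($_), "\n"
    for 'AU: jas dimaano', 'AU: Jr s. Tolentino, -editor',
        'AU: mr. Henderson de la cruz (h.delacruz@yahoo.com)', 'PG: page 9';
```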
The changes should be written to output.txt. Thank you for the help!
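Each loop in the code above works on a fresh copy of the line and only prints it, so the changes never accumulate and nothing reaches output.txt. A minimal sketch of the file handling, assuming records are separated by lines consisting of ===: read the input file, collect one record at a time, pass it to a transform callback, and write whatever the callback returns. process_records and its callback are hypothetical names.

```perl
use strict;
use warnings;

# Split $in into records on "===" lines, pass each record (a list of chomped
# lines) to $transform, and write the returned lines plus separators to $out.
sub process_records {
    my ($in, $out, $transform) = @_;
    open my $ifh, '<', $in  or die "Cannot open $in: $!";
    open my $ofh, '>', $out or die "Cannot open $out: $!";
    my @record;
    while (my $line = <$ifh>) {
        chomp $line;
        if ($line =~ /^===\s*$/) {
            print {$ofh} "$_\n" for $transform->(@record), '===';
            @record = ();
        }
        else {
            push @record, $line;
        }
    }
    # A final record that is not followed by "===".
    print {$ofh} "$_\n" for $transform->(@record);
    close $ifh;
    close $ofh or die "Cannot close $out: $!";
}
```

For example, process_records('input.txt', 'output.txt', sub { return @_ }) copies every record unchanged; a real callback would apply the per-line conversions to each element of @_ and the per-record steps to the whole list.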

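Nos. 6 and 7 work on a whole record rather than on single lines. The ID loop in the code finds nothing because \( (\d+)\) requires a space after the opening parenthesis, while the data has (489459) and (635436). In the third sample record, the second PP: line only contains "(if you'll be able to produce 1)", which is not an id. A minimal sketch, assuming a record without an id is left unchanged and that the HL: line simply takes the place of the first PP: line; add_id_line and add_headline are hypothetical names.

```perl
use strict;
use warnings;

# No. 6: take the digits in parentheses from the second PP: line and insert
# "ID: <digits>" just before the AU: line. Records without an id are unchanged.
sub add_id_line {
    my @lines = @_;
    my @pp = grep { /^PP:/ } @lines;
    my ($id) = @pp >= 2 ? $pp[1] =~ /\((\d+)\)/ : ();
    return @lines unless defined $id;
    return map { /^AU:/ ? ("ID: $id", $_) : ($_) } @lines;
}

# No. 7: the first PP: line becomes the HL: line, so its text appears once
# as the headline instead of being repeated as a paragraph.
sub add_headline {
    my @lines = @_;
    for my $line (@lines) {
        last if $line =~ s/^PP:/HL:/;    # $line aliases the array element
    }
    return @lines;
}

# Run add_id_line first: renaming the first PP: line would otherwise change
# which PP: line counts as the second one.
my @record = ('DD: 2009/11/09', 'AU: jas dimaano', 'PP: Employee ID list',
              'PP: (489459) Jas = DS16 -> with SPi since 2005/04/04');
print "$_\n" for add_headline(add_id_line(@record));
```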
In reply to Regular expressions extract, change num to word and capitalize each word after AU: tag and add 'written by' by allison
