jack_64 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I'm trying to complete my homework but I got stuck partway through. This is my 2nd week of learning Perl and I'm not very good with regex. I'm trying to extract some text from a C file and write it to a .doc file.

The C file format is:

Function: stdio.h
Input: None
Output : Performs Standard input output operations
<Some Code here>
Function: math.h
Input: Mathematical operators
Ouput: Performs mathematical operations
<Some Code here>

The output in the .doc file should be like this

Function: stdio.h
Output: Performs Standard input output operations
Input: None
Function: math.h
Output: Performs mathematical operations
Input: Mathematical operators

I have written a snippet that opens MS Word, reads the .c file and extracts the function name. The snippet is below:

use warnings;
use Win32::OLE;
use Win32::OLE qw(in with);
use Win32::OLE::Variant;
use Win32::OLE::Const 'Microsoft Excel';
use Win32::OLE::Const 'Microsoft Word';
use Cwd;
use File::Find;

# MS Word Opening
my $word = CreateObject Win32::OLE 'Word.Application' or die $!;
$word->{'Visible'} = 1;
my $document  = $word->Documents->Add;
my $selection = $word->Selection;

my @scriptfiles;
my $text;
my @array;
our $file;

@scriptfiles = glob('*.c');
foreach $file (glob('*.c')) {
    open(my $fh, $file) or die("Unable to open '$file': $!");
    while (my $line = <$fh>) {
        # Write every "Function: ..." line into the Word document
        if ($line =~ /Function:\s*(.+)/) {
            $selection->TypeText($line);
        }
    }
}

The issue is how to extract the Input and Output from the .c file. I'm extracting the Function name first; then I have to extract the Output and then the Input. I'm currently reading the .c file line by line ($line = <$fh>) and searching for the function name. How do I do the same for Output and Input? Please teach me, thanks.

Replies are listed 'Best First'.
Re: Understanding Regex
by aaron_baugher (Curate) on Jun 10, 2012 at 12:15 UTC

    First of all, you're processing a text file, so leave the MS Word stuff out of it. That's unnecessary and complicating things, as far as I can tell. I can see two ways to do this:

    Process the file line-by-line, watching for lines starting with Function:. When you hit one, print it, then read in the next line (Input:) and save it, then read in the next line (Output:), stick a newline after the first colon, print it, then do the same with the Input: line you saved and print it, then print a new line. Start watching for a Function: line again.
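
    A minimal sketch of this line-by-line approach, assuming each block lists Function, Input and Output on separate lines as in the sample. The helper name reorder_fields is hypothetical, and the patterns deliberately tolerate the "Output :" and "Ouput:" spellings that appear in the question. It collects fields by label rather than by position, which is a small variation on the steps described above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of a C file and returns the
# Function/Output/Input text in the order the assignment asks for.
sub reorder_fields {
    my @lines = @_;
    my $out   = '';
    my %field;    # fields collected for the current Function block

    for my $line (@lines) {
        $line =~ s/\s+\z//;    # drop trailing newline / carriage return

        if ($line =~ /^\s*Function\s*:\s*(.*)/) {
            %field = (Function => $1);    # start a new block
        }
        elsif (%field && $line =~ /^\s*(Input|Out?put)\s*:\s*(.*)/) {
            # "Out?put" matches both "Output" and the "Ouput" typo
            my $label = $1 eq 'Input' ? 'Input' : 'Output';
            $field{$label} = $2;
        }

        # Once both fields are known, emit the block in the new order
        if (exists $field{Input} && exists $field{Output}) {
            $out .= "Function: $field{Function}\n"
                  . "Output: $field{Output}\n"
                  . "Input: $field{Input}\n";
            %field = ();    # go back to watching for the next Function: line
        }
    }
    return $out;
}

my @sample = (
    "Function: stdio.h\n",
    "Input: None\n",
    "Output : Performs Standard input output operations\n",
    "int main(void) { return 0; }\n",
    "Function: math.h\n",
    "Input: Mathematical operators\n",
    "Ouput: Performs mathematical operations\n",
);
print reorder_fields(@sample);
```

    In a real script the lines would come from the file handle (for example reorder_fields(<$fh>)) instead of the hard-coded @sample list.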

    The other way: read the whole thing into a single scalar variable, then write a regex to match the chunks of text that you're looking for, and loop through those chunks, printing out what you find in the format you want. Something like this (assumes a couple typos in your example):

        while ( $str =~ /(Function: .+?)\nInput: (.+?)\nOutput: (.+?)\n/gs ) {
            # $1 = the whole Function: line, $2 = Input text, $3 = Output text
            print "$1\nOutput: $3\nInput: $2\n\n";
        }

    In many languages you'd have no choice but to use the first method, which requires more programming logic. Perl's regexes are so powerful that you can replace a lot of logic with one big pattern match.
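
    For the whole-file approach, the text first has to be read into a single scalar. A hedged sketch of one way to do that, assuming the same Function/Input/Output layout: slurp_file and extract_blocks are hypothetical helper names, and the pattern is a slightly more tolerant variant that also accepts the "Output :" and "Ouput:" spellings from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entire contents of a file as one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Unable to open '$path': $!";
    local $/;    # undefined input record separator: read everything at once
    my $str = <$fh>;
    close $fh;
    return $str;
}

# Hypothetical helper: pull Function/Input/Output triples out of the text.
sub extract_blocks {
    my ($str) = @_;
    my @blocks;
    # /m lets ^ and $ match at line boundaries; /g walks through every block
    while ($str =~ /^Function\s*:\s*(.+?)\s*\nInput\s*:\s*(.+?)\s*\nOut?put\s*:\s*(.+?)\s*$/mg) {
        push @blocks, { function => $1, input => $2, output => $3 };
    }
    return @blocks;
}

# Usage: perl extract.pl file.c
if (@ARGV) {
    for my $blk (extract_blocks(slurp_file($ARGV[0]))) {
        print "Function: $blk->{function}\n",
              "Output: $blk->{output}\n",
              "Input: $blk->{input}\n";
    }
}
```

    Returning the matches as a list of hashes keeps the extraction separate from the printing, so the same data could later be sent to a text file or to Word.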

    Aaron B.
    Available for small or large Perl jobs; see my home node.

Re: Understanding Regex
by pRaNaV (Novice) on Jun 10, 2012 at 12:03 UTC

    Hi, let's say you want to extract the 'output' info. You can write code like this:

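        # "Out?put\s*:" matches both spellings in the sample file ("Output :" and "Ouput:")
        if ($line =~ /^Out?put\s*:/) {
            chomp $line;    # remove the trailing newline so it does not end up in $outputInfo
            my $outputInfo = $line;
            $outputInfo =~ s/^Out?put\s*:\s*//;    # strip the label, keep the description
            print "\n$outputInfo";
        }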
Re: Understanding Regex
by Anonymous Monk on Jun 10, 2012 at 10:33 UTC

    A homework assignment trying to teach regex and Win32::OLE at the same time? Unlikely story