Understanding Regex

jack_64 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I'm trying to complete my homework but I got stuck partway through. This is my second week of learning Perl and I'm not very good with regexes. I'm trying to extract some text from a C file and write it to a .doc file.

The C file format is

Function: stdio.h
Input: None
Output : Performs Standard input output operations

<Some Code here>

Function: math.h
Input: Mathematical operators
Ouput: Performs mathematical operations
<Some Code here>

The output in the .doc file should be like this

Function: stdio.h
Output:
Performs Standard input output operations
Input:
None

Function: math.h
Output:
Performs mathematical operations
Input:
Mathematical operators

I have written a snippet that opens MS Word, reads the .c file, and extracts the function name. The snippet is below:

use strict;
use warnings;
use Win32::OLE qw(in with);
use Win32::OLE::Variant;
use Win32::OLE::Const 'Microsoft Excel';
use Win32::OLE::Const 'Microsoft Word';
use Cwd;
use File::Find;

# Open MS Word and start a new document
my $word = Win32::OLE->new('Word.Application')
    or die "Unable to start Word: ", Win32::OLE->LastError();
$word->{'Visible'} = 1;
my $document  = $word->Documents->Add;
my $selection = $word->Selection;

# Copy every "Function:" line from each .c file into the document
foreach my $file (glob('*.c'))
{
    open(my $fh, '<', $file) or die("Unable to open '$file': $!");
    while (my $line = <$fh>)
    {
        if ($line =~ /Function:\s*(.+)/)
        {
            $selection->TypeText($line);
        }
    }
    close($fh);
}

The issue is how to extract the Input and Output from the .c file. I extract the Function name first, then I have to extract the Output and then the Input. I'm currently reading the .c file line by line ($line = <$fh>) and searching for the function name. How do I do the same for Output and Input? Please teach me, thanks.


Replies are listed 'Best First'.
Re: Understanding Regex by aaron_baugher (Curate) on Jun 10, 2012 at 12:15 UTC
First of all, you're processing a text file, so leave the MS Word stuff out of it. That's unnecessary and complicating things, as far as I can tell. I can see two ways to do this:

Process the file line by line, watching for lines starting with `Function:`. When you hit one, print it, then read in the next line (`Input:`) and save it, then read in the next line (`Output:`), stick a newline after the first colon, and print it. Then do the same with the `Input:` line you saved and print it, then print a blank line. Start watching for a `Function:` line again.

The other way: read the whole thing into a single scalar variable, then write a regex to match the chunks of text that you're looking for, and loop through those chunks, printing out what you find in the format you want. Something like this (assumes a couple of typos in your example are fixed):

while( $str =~ /(Function: .+?)\nInput: (.+?)\nOutput: (.+?)\n/gs ){
    print <<END;
$1
Output:
$3
Input:
$2

END
}

In many languages you'd have no choice but to use the first method, which requires more programming logic. Perl's regexes are so powerful that you can replace a lot of logic with one big pattern match.

Aaron B.
Available for small or large Perl jobs; see my home node.
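The first, line-by-line method described above could be sketched roughly as follows. This is only an illustration, not aaron_baugher's code: the helper name `reformat_headers` is made up here, and the patterns are deliberately loose so that the `Output :` and `Ouput:` spellings from the sample file still match.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): takes the lines of
# one C file and returns the text in Function / Output / Input order.
sub reformat_headers {
    my @lines = @_;
    my $out   = '';
    my %rec;                                   # fields of the current record

    for my $line (@lines) {
        if ($line =~ /^Function\s*:\s*(.+?)\s*$/) {
            %rec = (function => $1);           # start a new record
        }
        elsif (%rec && $line =~ /^Input\s*:\s*(.*?)\s*$/) {
            $rec{input} = $1;                  # save it; it is printed last
        }
        elsif (%rec && $line =~ /^Out?put\s*:\s*(.*?)\s*$/) {
            # Assumption: "Out?put" is used only to accept the "Ouput" typo.
            my $output = $1;
            my $input  = defined $rec{input} ? $rec{input} : '';
            $out .= "Function: $rec{function}\n"
                  . "Output:\n$output\n"
                  . "Input:\n$input\n\n";
            %rec = ();                         # wait for the next Function:
        }
    }
    return $out;
}
```

With a filehandle `$fh` already opened on the .c file, `print reformat_headers(<$fh>);` would write the reformatted text to standard output. The same string could just as well be handed to whatever writes the final document.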
Re: Understanding Regex by pRaNaV (Novice) on Jun 10, 2012 at 12:03 UTC
Hi, let's say you want to extract the 'output' info; then you can write code like this:

if ($line =~ /^Ouput/) {
    chomp $line;    # remove the trailing newline so it does not end up in $outputInfo
    my $outputInfo = $line;
    $outputInfo =~ s/^Ouput: //;
    print "\n$outputInfo";
}
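The match-then-substitute steps above can also be folded into a single pattern with a capture group. A minimal sketch, assuming a made-up helper named `field_value` that is given the label to look for:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the text that follows
# "<label>:" on the line, or undef if the line does not start with that label.
sub field_value {
    my ($label, $line) = @_;
    # \Q...\E quotes the label literally; \s* allows "Output :" as well as
    # "Output:"; the trailing \s*$ drops the newline, so no chomp is needed.
    return $line =~ /^\Q$label\E\s*:\s*(.*?)\s*$/ ? $1 : undef;
}
```

Inside the read loop this might be used as `my $outputInfo = field_value('Ouput', $line);` followed by a `defined` check, and the same call works for the `Input` and `Function` labels.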
Re: Understanding Regex by Anonymous Monk on Jun 10, 2012 at 10:33 UTC
A homework assignment trying to teach regex and Win32::OLE at the same time? Unlikely story