Hello, I'm writing a syntax-conversion script that translates source code from an uncommon OOP language to C++.

Right now the filter mostly works, but I have a couple of issues that I'm hoping someone here can help me with:
1) I can't get it to skip comments
2) I can't get it to skip multiline macros

Here's my general algorithm:
1) slurp in the source code file - into a single string
2) convert syntax
3) write out converted file
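
To make the three steps concrete, here is a minimal sketch of that pipeline. The helper names (slurp_file, convert_syntax, write_file) are made up for illustration, and convert_syntax only carries the class/endclass rules; the real script has more filters.

```perl
use strict;
use warnings;

# Step 1 (hypothetical helper): read the whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    local $/;                       # no record separator: read everything at once
    my $text = <$in>;
    close $in;
    return defined $text ? $text : '';
}

# Step 2 (hypothetical helper): apply the syntax filters to the string.
sub convert_syntax {
    my ($src) = @_;
    $src =~ s/\bclass\s+(\w+)\s*;/class $1 {/g;
    $src =~ s/\bendclass\b/};/g;
    return $src;
}

# Step 3 (hypothetical helper): write the converted text back out.
sub write_file {
    my ($path, $text) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
}

# Only runs when called with an input and an output path.
if (@ARGV == 2) {
    write_file($ARGV[1], convert_syntax(slurp_file($ARGV[0])));
}
```

Because \s+ and \s* also match newlines, the class rule does not care where the line breaks fall, which is the point of slurping.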

Originally I read the file into an array and converted it line by line. This worked pretty well, but it broke on differing coding styles, where one user would write something like this:
class myclass {

and another might write
class myclass
{

So, to get around that, I decided to slurp the entire file into a single string. This lets me search for the language's legal patterns without making any assumptions about newlines, which are allowed pretty much anywhere.

But with the line-by-line approach I could simply skip (using next) any line that started with //, fell between /* ... */, or contained a \ (presumed to be part of a multiline macro). Now that everything is one long string, I can't figure out how to do the same thing.
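
One way the old "next" behaviour might carry over to a single string is to split the string on the regions to skip, keeping them via a capturing group, and run the filters only on the pieces in between. This is just a sketch: convert_outside_comments and its filter argument are hypothetical names, it handles comments only, and it would be fooled by a // or /* inside a string literal.

```perl
use strict;
use warnings;

# Regions the filters should leave alone (assumes C-style comment syntax).
my $protected = qr{
      //[^\n]*          # line comment, up to but not including the newline
    | /\*.*?\*/         # block comment, as short as possible
}xs;

# Hypothetical helper: run $filter on everything that is not a comment.
sub convert_outside_comments {
    my ($src, $filter) = @_;
    # With a capturing group, split also returns the separators:
    # even indexes hold ordinary code, odd indexes hold comments.
    my @chunks = split /($protected)/, $src;
    for my $i (0 .. $#chunks) {
        next if $i % 2;             # the slurped-string version of "next"
        $chunks[$i] = $filter->($chunks[$i]);
    }
    return join '', @chunks;
}
```

Usage would look something like convert_outside_comments($text, sub { my ($code) = @_; $code =~ s/\bendclass\b/};/g; return $code; }).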

Some specific examples:
Example 1:
A class in my language looks like this:
class foo; blah blah; endclass

Which I convert to something like this:
class foo { blah blah; };
No problem there:
s/\bclass\s+(\w+)\s*;/class $1 {/g; s/\bendclass\b/};/g;

A macro in my language looks like this:
`define mymacro (blah blah) \
  blah \
  blah blah \
  blah

I need to convert it to:
#define mymacro (blah blah) \
  blah \
  blah blah \
  blah

Problem: sometimes the macro contains code that triggers other filters.
Example:
`define myclassmacro (blah) \
  class myclass``blah ... \
  blah \
  endclass
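
A possible way to keep a macro body from triggering the class filters is to put the "skip me" patterns first in an alternation, so that a macro or comment is consumed whole and handed back before the class rules ever see its contents. The sketch below assumes a macro ends at the first newline not preceded by a backslash; convert_classes and to_cpp_macro are hypothetical names.

```perl
use strict;
use warnings;

# Text the class filters must not touch.
my $macro   = qr/^[ \t]*`define\b.*?(?<!\\)(?:\n|\z)/ms;   # assumed macro shape
my $comment = qr{//[^\n]*|/\*.*?\*/}s;

# Hypothetical helper: the only change made to a macro is its keyword.
sub to_cpp_macro {
    my ($text) = @_;                # copy first, the argument aliases $1
    $text =~ s/`define\b/#define/;
    return $text;
}

sub convert_classes {
    my ($src) = @_;
    # $1 = macro, $2 = comment, $3 = class name; a match with none of
    # them set can only be the endclass keyword.
    $src =~ s/($macro)|($comment)|\bclass\s+(\w+)\s*;|\bendclass\b/
          defined $1 ? to_cpp_macro($1)
        : defined $2 ? $2
        : defined $3 ? "class $3 {"
        :              '};'
    /ge;
    return $src;
}
```

The order of the alternatives matters: at any position the regex engine tries the macro and comment branches before the class branches.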

So I guess my simple questions are:
1) How do I write a regular expression that can ignore a line based on another regular expression?

2) I want to define a regexp for a multiline macro as: starts with `define and ends at the first non-escaped newline. I tried:
my $multiline_preprocessor_macro = qr/^(.*?)(?!\\)\n/sm;
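
For comparison, here is a sketch of a pattern that seems to fit that description. In the attempt above, (?!\\) is a lookahead, so it tests the character after the current position, which is the newline itself and never a backslash; a lookbehind, (?<!\\), tests the character before the newline instead. The sketch also anchors on `define, which the attempt does not. find_macros is a hypothetical helper, and a backslash followed by trailing spaces is not treated as a continuation here.

```perl
use strict;
use warnings;

# From `define at the start of a line up to and including the first newline
# that is not immediately preceded by a backslash (or the end of the string).
my $multiline_preprocessor_macro = qr/^[ \t]*`define\b.*?(?<!\\)(?:\n|\z)/ms;

# Hypothetical helper: return every macro found in the slurped source.
sub find_macros {
    my ($src) = @_;
    my @found = $src =~ /$multiline_preprocessor_macro/g;
    return @found;
}
```

With no capturing groups in the pattern, a /g match in list context returns the full text of each match.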

Thanks!

"chon"


In reply to Help Creating a Code Filter by chon
