in reply to Re^4: regex subst not DWIM
in thread regex subst not DWIM

Ok, try this. It has the fix to the params regex and the fix to the not_method regex. I also changed the header bit so that it does a regex replace using a positive lookahead, because your split just didn't work quite right.
#!c:/perl/bin/perl.exe
use strict;
use warnings;
use File::Find;

our $copyright = <<'HERE';
#region "Usage Rights"
// Redacted to protect the guilty
// All rights reserved. This document was developed under U.S.
// Government Contract No. pi, and therefore the U.S.
// Government is granted a copyright license to this document for U.S.
// Government purposes.
#endregion
HERE

our $access_modifier = "public|private|protected|internal|protected internal";
our $method_modifier = "virtual|sealed|override|abstract|extern";

our $comment = <<HERE;
\\s*///\\s*<summary>
\\s*///\\s*\\w+
\\s*///\\s*</summary>
\\s*///\\s*<param name = ".+">\\w+</param>\\s*
HERE

# C# keywords that can start a line which is not a method declaration.
# Built with join/qw so that no stray whitespace ends up in the alternation.
our $not_method = join '|', qw(
    as base break case catch checked class const continue
    default delegate do else enum event explicit false finally fixed
    for foreach goto if implicit in interface is lock namespace
    null object operator out params readonly ref return sizeof
    stackalloc struct switch this throw true try typeof unchecked
    unsafe using volatile while
);

=comment

put stuff to glob the files here.

=cut

# Gets input and output filenames while getting Wanted to work
my ($filin, $filout) = (@ARGV);
Wanted($filin, $filout);
exit;

sub Wanted {
    my ($filin, $filout) = (@_);
    my %methods;
    open(FILIN, "<", "$filin") or die $^E;
    my $autogen = 0;
    my $line;
    while ($line = <FILIN>) {
        chomp $line;
        if ($line =~ /^\W*(?:$not_method)/) {
            next;
        }
        if ($autogen < 48) {
            # This block looks for auto-generated files and skips to the
            # next with no alterations at all.
            if ($line =~ /<auto-generated>/) {
                close FILIN;
                return 0;
            }
            ++$autogen;
        }
        if ( $line =~ m`(new)?              # new is an optional element
                        \s+
                        ($access_modifier)  # public, private, protected etc.
                        \s*
                        ($method_modifier)? # static, override, extern, etc. Also optional
                        \s*
                        \w+                 # return type, not optional
                        \s+
                        \w+                 # method name
                        \s*
                        \(
                        .*
                        \)                  # parameter list in parentheses
                       `xo )  # 'x' allows comments and internal whitespace, o says compile the pattern once only
        {
            my $method = $line;
            my $summary = $method;
            $summary =~ s/^\s+//;
            my $params = $summary;
            $params =~ s`.*\(([^\)]*)\)`$1`;
            my @params = split ',', $params;
            my $header = <<HERE;
/// <summary>
/// $summary
/// </summary>
HERE
            for (@params) {
                my ($type, $name) = split ' ', $_;
                $header .= <<HERE;
/// <param name="$name">$type</param>
HERE
            }
            $methods{$method} = $header;
        }
    }
    close FILIN;

    # Parenthesized so that "or die" applies to open, not to the filename.
    open(FILIN, "<", "$filin") or die $!;
    my $file;
    { local $/; $file = <FILIN>; close FILIN; }

    METHOD: for (sort keys %methods) {
        my $method = $_;
        if ($file =~ m`$comment$method`s) {
            next METHOD;
        }
        my $header = $methods{$method};
        $file =~ s/(?=\Q$method\E)/$header/g;
    }
    $file =~ s/#region "Usage Rights".+Government purposes.\n#endregion//gs;
    $file = $copyright . $file;
    open(FILOUT, ">", "$filout") or die $!;
    print FILOUT $file;
    close FILOUT;
}
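
For anyone wondering what the positive-lookahead replace does in isolation, here is a minimal sketch. The sample $file, $method and $header values are made up for illustration and are not from the real input; only the substitution itself is the one used in the script.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the thread's real input.
my $file   = "class C\n{\n    public void Foo(int x)\n    {\n    }\n}\n";
my $method = "    public void Foo(int x)";
my $header = "    /// <summary>\n    /// Foo\n    /// </summary>\n";

# (?=...) matches the empty string just before the declaration, so the
# substitution inserts $header there and leaves the declaration untouched.
# \Q...\E makes the parentheses in $method literal instead of a group.
(my $patched = $file) =~ s/(?=\Q$method\E)/$header/g;

print $patched;
```

Because the match is zero-width, nothing from the original text is consumed, which is why no split and re-join of the file is needed.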

                - Ant
                - Some of my best work - (1 2 3)

Re^6: regex subst not DWIM
by girarde (Hermit) on Dec 20, 2007 at 23:16 UTC
    Works for the small dataset, and I have moved to a live example, in which some of the methods already have headers and we want to leave them alone.

    Input snippet:

    /// <summary>
    /// Checks to see if the data on the clipboard supports
    /// the given data type.
    /// </summary>
    /// <param name="dataType">The data type to check for.</param>
    /// <returns><c>true</c> if the data type is present.</returns>
    public bool ClipboardDataHasType(string dataType)
    {
        return dataType == m_dataType;
    }

    /// <summary>
    /// Gets the clipboard data.
    /// </summary>
    /// <param name="dataType">Type of the data.</param>
    /// <returns>
    /// The data on the clipboard matching the datatype
    /// </returns>
    public object GetClipboardData(string dataType)

    gets turned into output snippet:

    /// <summary>
    /// Checks to see if the data on the clipboard supports
    /// the given data type.
    /// </summary>
    /// <param name="dataType">The data type to check for.</param>
    /// <returns><c>true</c> if the data type is present.</returns>
    /// <summary>
    /// public bool ClipboardDataHasType(string dataType)
    /// </summary>
    /// <param name="dataType">string</param>
    public bool ClipboardDataHasType(string dataType)
    {
        return dataType == m_dataType;
    }

    /// <summary>
    /// Gets the clipboard data.
    /// </summary>
    /// <param name="dataType">Type of the data.</param>
    /// <returns>
    /// The data on the clipboard matching the datatype
    /// </returns>
    /// <summary>
    /// public object GetClipboardData(string dataType)
    /// </summary>
    /// <param name="dataType">string</param>
    public object GetClipboardData(string dataType)

    Basically, the if ($file =~ /$comment$method/) { line never matches, where $method is the declaration line and $comment is:
    \\s*///\\s*<summary>\\s*///.*\\n\\s*///\\s*</summary>\\n(\\s*///.*\\n)*
    This is the case whether I specify s, m, or both (in either sequence) for the matching.
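
One possible reason for the failed match, assuming $method really is interpolated as the raw declaration line: its parentheses become a regex capture group rather than literal characters, so the pattern can never line up with the text. The sketch below uses cut-down, hypothetical stand-ins for $file, $comment and $method to show the difference that \Q...\E makes. It may not be the only thing wrong in the real script.

```perl
use strict;
use warnings;

# Hypothetical, cut-down stand-ins for the thread's $file, $comment and $method.
my $method  = "public bool ClipboardDataHasType(string dataType)";
my $file    = "/// <summary>\n/// Checks the type.\n/// </summary>\n$method\n";
my $comment = '\s*///\s*<summary>\s*///.*\n\s*///\s*</summary>\n(?:\s*///.*\n)*';

# Interpolated raw, "(string dataType)" is a capture group, so the pattern
# wants "...HasTypestring dataType" with no literal parentheses.
my $raw_matches = ($file =~ /$comment$method/) ? 1 : 0;

# \Q...\E escapes the metacharacters in $method so it matches literally.
my $quoted_matches = ($file =~ /$comment\Q$method\E/) ? 1 : 0;

print "raw: $raw_matches, quoted: $quoted_matches\n";
```

The s and m modifiers do not affect this, which would be consistent with the match failing whichever of them is used.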
      Rather than doing the complex regex at the end, I would check for it in the read loop, myself...

      Maybe have a has_header flag that you set when you see a //\s* line and unset once you match the method. If the flag was set when you reach the method, that method already has a header, so skip to the next line.
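
A rough sketch of that idea, in case it helps. The input lines, the @needs_header array and the simplified method regex are all made up for illustration; the real loop would use the fuller pattern from the script earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical input lines; the simplified method regex below is a stand-in
# for the much fuller pattern in the real script.
my @lines = (
    "/// <summary>",
    "/// Already documented.",
    "/// </summary>",
    "public bool HasDocs(string dataType)",
    "{",
    "}",
    "public object NoDocs(string dataType)",
);

my $has_header = 0;   # true while the preceding lines were /// comments
my @needs_header;     # declarations that still need a generated header

for my $line (@lines) {
    if ($line =~ m{^\s*///}) {          # doc-comment line: remember it
        $has_header = 1;
        next;
    }
    if ($line =~ /^\s*(?:public|private|protected|internal)\b.*\(.*\)/) {
        push @needs_header, $line unless $has_header;
    }
    $has_header = 0;                    # any other line ends the comment run
}

print "$_\n" for @needs_header;
```

Note that in this sketch a blank line between the comment and the declaration also resets the flag, which may or may not be what you want for your files.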

                      - Ant