rinkish85 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am looking for a regular expression to match the file names. I have an array that contains file names like:

log4perl.appender1996:06:58:12.General_Information.ErrorLog
log4perl_LogicModules.appender1996:07:56:12CSS.General_Information_help.ErrorLog
log4perl.Edition1998:06:56:12.General_Information.ErrorLog
log4perl_Eventsource.Edition1998:06:56:12CSS.General_Information-collector_1.ErrorLog
log4perl.Advanced1999:06:56:12.General_Information.ErrorLog
log4perl.Advanced1999:06:56:12CSS.General_Information-collector-2.ErrorLog

For example, take the following two files:

log4perl.Advanced1999:06:56:12.General_Information.ErrorLog
log4perl.Advanced1999:06:56:12CSS.General_Information-collector-2.ErrorLog

I need a regular expression that matches the format of my files, removes "CSS" from the string "Advanced1999:06:56:12CSS" in the second file, and keeps both files in the array.

File name format: <(alphanumericString,_)>.<(alphanumericUniqueIdentifier,.)>.<(alphanumericString2,_,-)>.ErrorLog

Sometimes the constant "CSS" is appended to <(alphanumericUniqueIdentifier,.)>.
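One way to express that format is a single pattern with named captures. This is a minimal sketch, assuming the token character sets described above (word characters and - in the outer tokens, word characters and : in the identifier) and assuming the optional suffix is exactly the constant CSS. The identifier capture is non-greedy so that a trailing CSS is left for the optional suffix group; %+ holds the named captures (Perl 5.10 or later).

```perl
use strict;
use warnings;
use feature 'say';

# A few sample names taken from the question.
my @files = (
    'log4perl.appender1996:06:58:12.General_Information.ErrorLog',
    'log4perl_LogicModules.appender1996:07:56:12CSS.General_Information_help.ErrorLog',
    'log4perl.Advanced1999:06:56:12CSS.General_Information-collector-2.ErrorLog',
);

# prefix . id [CSS] . name . ErrorLog, with the CSS suffix captured separately.
my $format = qr/
    ^
    (?<prefix> [\w-]+ )        # e.g. log4perl or log4perl_LogicModules
    \.
    (?<id>     [\w:]+? )       # unique identifier (non-greedy, so CSS is left over)
    (?<suffix> CSS )?          # optional constant suffix
    \.
    (?<name>   [\w-]+ )        # e.g. General_Information-collector-2
    \. ErrorLog
    $
/x;

my @parsed;
for my $file (@files) {
    if ($file =~ $format) {
        push @parsed, { %+ };    # copy only the named captures that matched
        say join ' | ', $+{prefix}, $+{id}, $+{suffix} // '(none)', $+{name};
    }
    else {
        say "no match: $file";
    }
}
```

With the identifier and suffix captured separately, rebuilding a name without CSS is just a join of prefix, id, name and ErrorLog.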

So after removing CSS, I need the following files in the array:

log4perl.Advanced1999:06:56:12.General_Information.ErrorLog
log4perl.Advanced1999:06:56:12.General_Information-collector-2.ErrorLog
Thanks.

Re: Perl file name parsing - Regular expression
by Athanasius (Archbishop) on May 30, 2017 at 03:47 UTC

    Hello rinkish85,

    I think the following does what you want:

    use strict;
    use warnings;
    use feature 'say';

    my @files =
    (
        'log4perl.Advanced1999:06:56:12.General_Information.ErrorLog',
        'log4perl.Advanced1999:06:56:12CSS.General_Information-collector-2.ErrorLog',
        'log4perl.Advanced1999:06:56:12CSS.General_Information-collector-2.INVALID',
    );

    for (@files)
    {
        if (/ ^ [\w_-]+ \. [\w:]+ \. [\w_-]+ \. ErrorLog $ /x)
        {
            s/ ^ ([\w_-]+) \. ([\w:]+) CSS \. /$1.$2./x;
            say;
        }
    }

    Output:

    13:43 >perl 1783_SoPW.pl
    log4perl.Advanced1999:06:56:12.General_Information.ErrorLog
    log4perl.Advanced1999:06:56:12.General_Information-collector-2.ErrorLog
    13:43 >

    See perlrecharclass.
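A quick sketch that checks the character classes used above: \w already includes the underscore, so [\w_-] and [\w-] match exactly the same characters, and neither [\w_-] nor [\w:] matches a dot, which is what keeps each token from running into the next one.

```perl
use strict;
use warnings;

# Characters that occur in the sample file names, plus the dot separator.
my @chars = ('a', 'Z', '0', '_', '-', ':', '.');

my %in_class;
for my $c (@chars) {
    $in_class{$c} = {
        w_underscore_dash => ($c =~ /^[\w_-]$/ ? 1 : 0),
        w_dash            => ($c =~ /^[\w-]$/  ? 1 : 0),
        w_colon           => ($c =~ /^[\w:]$/  ? 1 : 0),
    };
    # Print one row per character: 1 = member of the class, 0 = not.
    printf "%-2s  [\\w_-]:%d  [\\w-]:%d  [\\w:]:%d\n", $c,
        @{ $in_class{$c} }{qw(w_underscore_dash w_dash w_colon)};
}
```

The _ in [\w_-] is therefore redundant but harmless.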

    Hope that helps,

    Athanasius

      Thank you so much.
      for (@files) {
          if (/ ^ [\w_-]+ \. [\w:]+ \. [\w_-]+ \. ErrorLog $ /x) {
              s/ ^ ([\w_-]+) \. ([\w:]+) CSS \. /$1.$2./x;
              say;
          }
      }
      Can I extend the substitution s/ ^ ([\w_-]+) \. ([\w:]+) CSS \. /$1.$2./x; by changing CSS to [A-Z]?
        Yes, you can, but you need to add a quantifier. For example:
        s/ ^ ([\w_-]+) \. ([\w:]+) [A-Z]{3} \. /$1.$2./x;
        or possibly:
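        s/ ^ ([\w_-]+) \. ([\w:]+?) [A-Z]+ \. /$1.$2./x;
        Note the non-greedy +? on the second capture here: with a greedy ([\w:]+), the engine gives back only as many characters as it must, so [A-Z]+ would match just the final S and leave "CS" behind. Also watch out that [A-Z]+ (or [A-Z]{3}) may remove uppercase letters that genuinely belong to the identifier in some other file names.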
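To see why the quantifier on the capture in front of [A-Z]+ matters, here is a minimal sketch comparing a greedy ([\w:]+) with a non-greedy ([\w:]+?) on a sample name from the question. The greedy capture backtracks one character at a time, so the first successful match gives [A-Z]+ only the last capital letter.

```perl
use strict;
use warnings;
use feature 'say';

my $file = 'log4perl.Advanced1999:06:56:12CSS.General_Information-collector-2.ErrorLog';

# Greedy capture: [\w:]+ first swallows "...12CSS", then gives back only "S".
(my $greedy = $file) =~ s/ ^ ([\w_-]+) \. ([\w:]+) [A-Z]+ \. /$1.$2./x;

# Non-greedy capture: [\w:]+? stops as early as possible, so [A-Z]+ gets "CSS".
(my $lazy = $file) =~ s/ ^ ([\w_-]+) \. ([\w:]+?) [A-Z]+ \. /$1.$2./x;

say $greedy;   # log4perl.Advanced1999:06:56:12CS.General_Information-collector-2.ErrorLog
say $lazy;     # log4perl.Advanced1999:06:56:12.General_Information-collector-2.ErrorLog
```

The fixed-width [A-Z]{3} variant does not have this problem, because it always consumes exactly three letters.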
Re: Perl file name parsing - Regular expression
by shmem (Chancellor) on May 30, 2017 at 05:40 UTC

    Are there any strings in your logfile array containing CSS which shouldn't be altered? If not, you could just go with

    for (@filenames) {
        s/CSS//;
    }
    # or written with a statement modifier:
    # s/CSS// for @filenames;

    You could run the above code over your array, check for false positives, and tighten the regular expression if you find any.
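One way to do that check, as a sketch: compare the blunt s/CSS// with a stricter pattern that removes CSS only where it ends the second token, and list the names where the two results differ. The third name below (log4perl_CSSParser...) is hypothetical, invented only to show a false positive.

```perl
use strict;
use warnings;
use feature 'say';

my @filenames = (
    'log4perl.Advanced1999:06:56:12.General_Information.ErrorLog',
    'log4perl.Advanced1999:06:56:12CSS.General_Information-collector-2.ErrorLog',
    'log4perl_CSSParser.Advanced1999:06:56:12.General_Information.ErrorLog',   # hypothetical
);

my @suspects;
for my $name (@filenames) {
    (my $blunt  = $name) =~ s/CSS//;                                    # first CSS anywhere
    (my $strict = $name) =~ s/ ^ ([^.]+) \. ([^.]+?) CSS \. /$1.$2./x;  # CSS ending the 2nd token only
    push @suspects, $name if $blunt ne $strict;
}

say "check: $_" for @suspects;
```

Only names that appear in @suspects need a closer look.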

    Since your strings are tokens joined with dots, the last of which is always ErrorLog, you could anchor your regular expression at the end with $ and remove CSS from the token preceding the last two:

    for (@filenames) {
        s{              # substitute
            CSS         # the string CSS
            (           # and a capture consisting of
              \.        # a literal dot,
              [^\.]+    # one or more non-dots,
              \.        # another literal dot,
              ErrorLog  # and the constant ErrorLog
            )           # (end of capture)
            $           # anchored at the end of the string
        }{$1}x;         # against the capture above
    }

    Or written more concisely with a terse explanation:

    for (@filenames) {
        s/CSS(\.[^\.]+\.ErrorLog)$/$1/;    # remove CSS from end of antepenultimate token
    }
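If the original names should be kept as well, a minimal sketch using the non-destructive /r flag (Perl 5.14 or later) builds a cleaned copy instead of editing @filenames in place. The input is the list of names from the question; @cleaned is a name chosen for this example.

```perl
use strict;
use warnings;
use feature 'say';

my @filenames = (
    'log4perl.appender1996:06:58:12.General_Information.ErrorLog',
    'log4perl_LogicModules.appender1996:07:56:12CSS.General_Information_help.ErrorLog',
    'log4perl.Edition1998:06:56:12.General_Information.ErrorLog',
    'log4perl_Eventsource.Edition1998:06:56:12CSS.General_Information-collector_1.ErrorLog',
    'log4perl.Advanced1999:06:56:12.General_Information.ErrorLog',
    'log4perl.Advanced1999:06:56:12CSS.General_Information-collector-2.ErrorLog',
);

# /r returns the modified string and leaves $_ (and so @filenames) untouched.
my @cleaned = map { s/CSS(\.[^\.]+\.ErrorLog)$/$1/r } @filenames;

say for @cleaned;
```

Names without a CSS suffix pass through map unchanged, so @cleaned has the same length and order as @filenames.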

    Perhaps more efficient, if only a few strings contain CSS: look for CSS and, only if it is found, split the name on dots, remove CSS from the token, and join the pieces again:

    for (@filenames) {
        if (/CSS/) {
            my @t = split /\./;
            $t[-3] =~ s/CSS$//;
            $_ = join '.', @t;
        }
    }
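As a sketch, the split/join version can be wrapped in a small helper and checked against the end-anchored substitution shown earlier in this reply, using sample names from the question. The sub names strip_by_split and strip_by_regex are hypothetical, made up for this example. Note that split takes a regex, so the dot is escaped, while join takes a plain string.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: split on dots, strip CSS from the third-from-last token, rejoin.
sub strip_by_split {
    my ($name) = @_;
    return $name unless $name =~ /CSS/;    # cheap test first; most names are untouched
    my @t = split /\./, $name;
    $t[-3] =~ s/CSS$//;
    return join '.', @t;
}

# Hypothetical helper: the end-anchored substitution.
sub strip_by_regex {
    my ($name) = @_;
    $name =~ s/CSS(\.[^\.]+\.ErrorLog)$/$1/;
    return $name;
}

my @filenames = (
    'log4perl.Edition1998:06:56:12.General_Information.ErrorLog',
    'log4perl_Eventsource.Edition1998:06:56:12CSS.General_Information-collector_1.ErrorLog',
    'log4perl.Advanced1999:06:56:12CSS.General_Information-collector-2.ErrorLog',
);

my @mismatches = grep { strip_by_split($_) ne strip_by_regex($_) } @filenames;
say scalar(@mismatches) ? "differ: @mismatches" : 'both approaches agree';
```

The split version assumes every name has at least three dot-separated tokens; with fewer, $t[-3] would not exist.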
      Thank you
Re: Perl file name parsing - Regular expression
by Anonymous Monk on May 30, 2017 at 05:53 UTC