Sara has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's minimum standard of quality and will not be displayed.

Replies are listed 'Best First'.
Re: parse
by DamnDirtyApe (Curate) on Jul 24, 2002 at 20:03 UTC

    This is quite an elementary problem; I'd like to see what your previous attempts look like before I post mine. If you're stuck for a starting point, try implementing this pseudocode in Perl.

    While there are lines of input
        Skip this line unless it looks like data (not headers)
        Split the line into the two relevant sections
        Extract the filename from the file path.
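One way the pseudocode above might be turned into Perl is sketched here. It is only an illustration, not the code the poster was holding back: the helper name parse_line is hypothetical, the sample lines are inlined instead of read from a file, and "looks like data" is assumed to mean "starts with a number".

```perl
use strict;
use warnings;

# Hypothetical helper: returns (count, filename) for a data line,
# or an empty list for headers, ruler lines and blank lines.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;                          # strip newline and trailing spaces
    return unless $line =~ /^\s*\d+\s/;          # assumption: data lines start with a number
    my ($count, $path) = split ' ', $line, 2;    # the two relevant sections
    my $filename = (split m{/}, $path)[-1];      # last component of the path
    return ($count, $filename);
}

# Sample input inlined for illustration.
my @sample = (
    "NCLOC Filename\n",
    "-------- -------------------------\n",
    "3198 X:/bs/al/src/eilass.pl\n",
);

for my $line (@sample) {
    my ($count, $filename) = parse_line($line) or next;   # skip non-data lines
    print "$count, $filename\n";
}
```

Limiting the split to two fields keeps any spaces inside the path intact, which a plain whitespace split would not.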

    _______________
    D a m n D i r t y A p e
Re: parse
by silent11 (Vicar) on Jul 24, 2002 at 20:09 UTC
Re: parse
by mp (Deacon) on Jul 24, 2002 at 20:02 UTC
    #! /usr/local/bin/perl -w
    #-----------------------------------------------
    use strict;
    use warnings;

    while(<>) {
        chomp;                # Drop the newline so it is not captured in $2
        next unless m/
            ^(\d+)            # Match the number
            \s+               # Match spaces after number
            \S+?              # Minimally match path
            ([^\/]+)$         # Match everything after the last slash
        /x;
        my ($linesOfCode,$filename)=($1,$2); # Access captured text
        print "$linesOfCode, $filename\n";
    }
Re: parse
by neilwatson (Priest) on Jul 24, 2002 at 20:03 UTC
    #!/bin/perl -w
    use strict;
    use warnings;

    my ($filename, $lineOfCode);

    open (FH, "filename.txt")|| die "could not open filename.txt";

    while (<FH>){
        if (m/^(\d+)\s/){
            $lineOfCode = $1;
        }
        if (m/\/(\w+\.?\w*)$/){
            $filename = $1;
        }
    }

    If there are many lines whose values you wish to save, then you'll have to use arrays instead of the scalars $filename and $lineOfCode.
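A minimal sketch of that array variant follows. It makes a few assumptions: the input is inlined instead of being read from filename.txt, the array names @filenames and @linesOfCode are made up for illustration, and both patterns are combined into one match so that the two arrays stay in step.

```perl
use strict;
use warnings;

my (@filenames, @linesOfCode);   # hypothetical names, one entry per data line

# Sample input inlined for illustration; the code above reads filename.txt.
my @input = (
    "NCLOC Filename\n",
    "-------- -------------------------\n",
    "3198 X:/bs/al/src/eilass.pl\n",
);

for (@input) {
    # Require the count and the filename on the same line, so a line
    # that matches only one of them cannot skew the two arrays.
    if (m/^(\d+)\s.*\/(\w+\.?\w*)$/) {
        push @linesOfCode, $1;
        push @filenames,   $2;
    }
}

print "$linesOfCode[$_], $filenames[$_]\n" for 0 .. $#filenames;
```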

    Neil Watson
    watson-wilson.ca

      Actually, there's a finer point of regular expressions working against this Perl code. It is as follows:

      if (m/\/(\w+\.?\w*)$/){ $filename = $1; }

      The assumption here is that the first "\/" means a literal forward slash, then the (\w+\.?\w*)$ means "some word characters, followed by (possibly) a literal period, followed by 0 or more word characters, then the end of line".

      However, the "\w" metacharacter is intended to match only alphanumerics and the underscore character, which leaves out a whole bevy of other characters which may be present in file names, e.g. spaces, hyphens, parentheses, etc.

      Granted, "sensible" UNIX filenames often don't contain those characters because they are also often used as shell metacharacters, but this sample dataset looks suspiciously like DOS/Win32 filenames (X: being a giveaway) and I can't count the number of times I've had to deal with filenames like "Sales Figures - Dec 19 - Dec 26.doc" and the like!

      One possible alternative regex which still gives all characters after the last forward slash to the end of the string is:

      /\/([^\/]+)$/
      Meaning "A literal forward slash, followed by anything that's NOT a forward slash, to the end of string".
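To make the difference concrete, here is a small sketch that runs both patterns against one path. The directory part of the path is invented for illustration; only the file name is taken from the example above.

```perl
use strict;
use warnings;

# Hypothetical path; only the file name comes from the post above.
my $path = 'X:/reports/Sales Figures - Dec 19 - Dec 26.doc';

my ($narrow) = $path =~ m/\/(\w+\.?\w*)$/;   # the \w-based pattern
my ($broad)  = $path =~ m/\/([^\/]+)$/;      # "anything but a slash"

print 'narrow: ', defined $narrow ? $narrow : '(no match)', "\n";
print 'broad:  ', defined $broad  ? $broad  : '(no match)', "\n";
```

The \w-based pattern does not match at all here, because of the spaces and hyphens, while the negated character class returns the whole file name. One caveat: [^\/] also matches a newline, so chomp the line first if it comes straight from a file.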

      Or, as was pointed out in another reply to this post, File::Basename is an alternative if you wish to extract the entire path expression and figure out the $filename from there.
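For reference, a minimal sketch of the File::Basename route, using the sample path from this thread. The suffix pattern passed to fileparse is just one possible choice, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse);

my $path = 'X:/bs/al/src/eilass.pl';   # sample path from this thread

# basename() returns the last component of the path.
my $filename = basename($path);

# fileparse() splits the path into name, directory and (optional) suffix.
my ($name, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);

print "$filename\n";
print "$dir | $name | $suffix\n";
```

File::Basename parses according to the conventions of the operating system it runs on; forward-slash paths like this one should be handled on both UNIX and Win32.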

      Hope that helps,

      Paul

      When there is no wind, row.

Re: parse (K.I.S.S. peoples!)
by gmpassos (Priest) on Jul 25, 2002 at 05:40 UTC
    $data = q`
    NCLOC Filename
    -------- -------------------------
    3198 X:/bs/al/src/eilass.pl
    `;

    ## Avoid binary:
    $data =~ s/\r\n?/\n/gs ;

    ## Split the top and content:
    my ( $top , $content ) = split(/\n--[-\s]+\n/s , $data) ;
    $top = "\n$top\n" ;
    $content = "\n$content\n" ;

    ## The col names:
    my ($col_name_1,$col_name_2) = ( $top =~ /\n(\w+)\s+(\w+)/gs );

    ## The content by cols:
    my ($col_1,$col_2) = ( $content =~ /\n([^\s]+)\s+([^\s]+)/gs ) ;
    print "$col_1,$col_2\n" ;

    ## Now get your variables:
    my ($filename) = ( $col_2 =~ /([^\/]+)\/*$/gi );
    my $lineOfCode = $col_1 ;
    print "$filename\n$lineOfCode\n" ;

    ## Or put inside a while to get all the lines
    ## of the content:
    while( $content =~ /\n([^\s]+)\s+([^\s]+)/gs ) {
        print "$1,$2\n" ;
    }


    See the regular expression doc: http://www.perldoc.com/perl5.6/pod/perlre.html

    "The creativity is the expression of the liberty".