Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

First one: I have to open a log file and pull out JUST the very last segment of each line (which is a username). So I need 1dfgfg and asdasdad.
06/19/2007 10:50 AM <DIR> 1dfgfg
07/20/2007 02:35 PM <DIR> asdasdad
I tried
open(LOG, "home.txt") or die "Error: $!";
my @lines = <LOG>;
close(LOG);
foreach my $line (@lines) {
    $line =~ m/\s+([a-zA-Z0-9])$/;
    print "$1<br>";
}
and it printed out a million blank lines?

The second one is a little tougher, and since I had problems with the first one...

\\STPFS03asd MRAD\abeissel \\stpfs03\E$\CISROOT\Home\123username\ 163.59 132 750
\\STPFS02 serv\USERNAME \\Stpfs02\E$\CISROOT\Home\anotherusername\ 18.37 10 29
In the one above all I need is the username, none of the slashes or anything. Really this is a DIR dump, and I need to see if the username appears in both files or just one. Can someone give me a hand?

Re: Two regex problems (easy)
by grinder (Bishop) on Aug 13, 2007 at 16:51 UTC

    I may be mistaken, but the first output you are processing looks suspiciously like the output of the Windows (DOS) dir command.

    If you're capturing the output of dir merely to read it with Perl, it may be simpler to open the directory and read it yourself. If this is the case, the functions opendir, readdir and closedir will be of interest.

    If you are looking at a large directory tree (that is, dir /s), you will probably find that a module such as File::Find::Rule is more useful.

    The idea is to make the processing a little more self-contained. No need to have a batch file that does some stuff and then calls Perl to do the rest, when Perl could no doubt do the entire job just as easily.
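A minimal sketch of that approach, using only the built-in opendir, readdir and closedir. The helper name list_subdirs and the '.' starting directory are placeholders for illustration, not anything from the original post; it assumes each username is a subdirectory directly under the directory you point it at.

```perl
use strict;
use warnings;

# List the names of the subdirectories directly under $dir,
# skipping the "." and ".." entries that readdir also returns.
# (list_subdirs is a hypothetical helper name.)
sub list_subdirs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = sort grep { $_ ne '.' && $_ ne '..' && -d "$dir/$_" } readdir($dh);
    closedir($dh);
    return @names;
}

# "." is only a placeholder; point this at the real home directory root.
print "$_\n" for list_subdirs('.');
```

This avoids parsing dir output at all, so the date and <DIR> columns never need to be stripped.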

    • another intruder with the mooring in the heart of the Perl

Re: Two regex problems (easy)
by thezip (Vicar) on Aug 13, 2007 at 17:17 UTC
    I would contend that you shouldn't even use a regex to solve this problem, since the problem pertains to decoding the data stored within a fixed-width columnar format.

    For this, the appropriate tool is unpack:

    #!/perl/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $str = '06/19/2007  10:50 AM    <DIR>          1dfgfg';
    my $spec = 'A12A12A15A10'; # adjust this spec to meet your columnar needs
    my @arr = unpack($spec, $str);

    print Dumper(\@arr);
    print "\n";
    print "username is ", $arr[3], "\n";
    print "\n";

    __OUTPUT__
    $VAR1 = [
              '06/19/2007',
              '10:50 AM',
              '<DIR>',
              '1dfgfg'
            ];

    username is 1dfgfg

    Where do you want *them* to go today?
Re: Two regex problems (easy)
by FunkyMonk (Bishop) on Aug 13, 2007 at 16:33 UTC
    Your first regexp was very close. You want to match one or more alphanumerics, so the character class needs a + quantifier.
    $line =~ m/([a-zA-Z0-9]+)$/;

    Two other points:

    • You don't need to check for spaces before the username
    • Always check to make sure the match succeeded
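Both points can be seen in a short sketch. The helper name last_word and the in-memory sample lines are only for illustration; the third, blank line shows why checking the match matters, since an unchecked print of $1 there would emit a bare "<br>" (or a stale capture from an earlier match).

```perl
use strict;
use warnings;

# Return the trailing alphanumeric word of a line, or undef if there is none.
# (last_word is a hypothetical helper name.)
sub last_word {
    my ($line) = @_;
    chomp $line;
    return $line =~ /([a-zA-Z0-9]+)$/ ? $1 : undef;
}

my @lines = (
    "06/19/2007 10:50 AM <DIR> 1dfgfg\n",
    "07/20/2007 02:35 PM <DIR> asdasdad\n",
    "\n",    # a blank line must not produce any output
);

for my $line (@lines) {
    my $name = last_word($line);
    print "$name<br>\n" if defined $name;
}
```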

    For your second problem, you're looking for the text between two backslashes that is followed by a space:

    m{\\([a-zA-Z0-9]+)\\\s}
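As a rough check, here is that pattern applied to the two sample records from the question. The helper name home_user is made up for this sketch, and the records are held in a single-quoted heredoc so the backslashes and the "$" in "E$" stay literal.

```perl
use strict;
use warnings;

# Return the path component that sits between two backslashes and is
# followed by whitespace, or undef if the line has no such component.
# (home_user is a hypothetical helper name.)
sub home_user {
    my ($line) = @_;
    return $line =~ m{\\([a-zA-Z0-9]+)\\\s} ? $1 : undef;
}

# Sample records copied from the question.
my @lines = split /\n/, <<'END';
\\STPFS03asd MRAD\abeissel \\stpfs03\E$\CISROOT\Home\123username\ 163.59 132 750
\\STPFS02 serv\USERNAME \\Stpfs02\E$\CISROOT\Home\anotherusername\ 18.37 10 29
END

for my $line (@lines) {
    my $user = home_user($line);
    print "$user\n" if defined $user;
}
```

Only the last directory of the path qualifies, because it is the only component whose closing backslash is followed by a space.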

Re: Two regex problems (easy)
by moritz (Cardinal) on Aug 13, 2007 at 16:34 UTC
    $line =~ m/\s+([a-zA-Z0-9])$/;

    The char class needs to be quantified:

    $line =~ m/\s+([a-zA-Z0-9]+)$/;

    or

    $line =~ m/\s(\w+)$/;

    For the second problem (or better for both) use split.
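One way the split approach might look, as a sketch rather than a definitive solution. The helper names are hypothetical, and it assumes that fields never contain spaces, that the username is the last field in the first format, and that the home path is the third field in the second format.

```perl
use strict;
use warnings;

# First format: the username is the last whitespace-separated field.
# (user_from_dir_line is a hypothetical helper name.)
sub user_from_dir_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return @fields ? $fields[-1] : undef;
}

# Second format: assumes the home path is the third field. Splitting that
# path on backslashes drops the trailing empty piece, so the last element
# is the final directory name.
# (user_from_share_line is a hypothetical helper name.)
sub user_from_share_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return undef unless @fields >= 3;
    my @parts = split /\\/, $fields[2];
    return @parts ? $parts[-1] : undef;
}

print user_from_dir_line("06/19/2007 10:50 AM <DIR> 1dfgfg\n"), "\n";
```

Because split ' ' discards leading whitespace and splits on runs of whitespace, the same code works whether the columns are separated by one space or many.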

    (Update: added more newlines)

Re: Two regex problems (easy)
by jhourcle (Prior) on Aug 13, 2007 at 16:34 UTC
    $line =~ m/\s+([a-zA-Z0-9])$/;
    print "$1<br>";

    You didn't test if it actually matched. In this case, it won't, unless the directory name is only a single character. You might try:

    my ($file) = ( $line =~ m/\b([a-zA-Z0-9]+)$/);
    print "$file<br>" if defined($file);

    I'm going to assume that once you look into regex quantifiers, you'll have a much easier time with the second one.

Re: Two regex problems (easy)
by jettero (Monsignor) on Aug 13, 2007 at 16:31 UTC
    I bet you'd have better luck with the first part like so:

    open LOG, "home.txt" or die $!;
    while(<LOG>) {
        print "$1<br>" if m/\b([\w\d]+)$/;
    }
    close LOG;

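    For the second, I might try something like this (note that the separators in the sample data are backslashes, which have to be escaped in the pattern):

    print "$1<br>" if m/\S+\s+[^\\]+\\(\S+)/;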

    I'm not sure I understand the problem to be solved in the second part though...
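Going by the original question, the end goal is to see whether each username appears in both files or in just one. Once the names have been extracted, a pair of hashes is one way to do that. This is a sketch only: compare_users is a hypothetical helper, the sample lists reuse names from the question, and lower-casing the names before comparing is an assumption that may not suit your data.

```perl
use strict;
use warnings;

# Classify names by which list they appear in. Names are compared
# case-insensitively (an assumption; drop the lc() calls if case matters).
# (compare_users is a hypothetical helper name.)
sub compare_users {
    my ($first, $second) = @_;
    my %in_first  = map { ( lc($_) => 1 ) } @$first;
    my %in_second = map { ( lc($_) => 1 ) } @$second;

    my %result = ( both => [], first_only => [], second_only => [] );
    for my $name (sort keys %in_first) {
        if ($in_second{$name}) {
            push @{ $result{both} }, $name;
        }
        else {
            push @{ $result{first_only} }, $name;
        }
    }
    for my $name (sort keys %in_second) {
        push @{ $result{second_only} }, $name unless $in_first{$name};
    }
    return \%result;
}

# Illustrative input only; in practice these lists come from the two files.
my $report = compare_users(
    [qw(1dfgfg asdasdad 123username)],
    [qw(123USERNAME anotherusername)],
);
print "$_: @{ $report->{$_} }\n" for qw(both first_only second_only);
```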

    -Paul