Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks! I'm a little new to this and I can't figure out a way to parse off of a back slash? My current output is:

6/21/2001 11:55:04 Object Name: \Device\HarddiskDmVolumes\PhysicalDmVolumes\BlockVolume2\CISER\ArchiveData^ Primary User Name: jch6^

but I want it to look like this:

6/21/2001 11:55:04 \CISER\ArchiveData Primary User Name: jch6

Current code is a mess and I'm sure it could be handled better
#!/user/bin/perl -w
use win32;
#use strict;
use warnings;

my $infile="dump1.txt";
my $outfile="newdump.txt";
my @x=();

open(FILE, $infile) or die "log560.pl can't open $infile for reading: $!";
open(OUTPUT,">$outfile") or die "log560.pl can't open $outfile for writing: $!";

#Look only at the summary lines where $_ == 560
while (defined ($_ = <FILE>)) {
    next unless ($_ =~ /560/); #we only want the files with 560
    $_ =~ s/`/,/g; #this is here to get the user name because ' is after name
    @x=split(/,/);
    if (!($x[16] =~ /Primary User Name: CISERFS1/)) { #dont want the details from the sytem
        my $date=$x[1]; #good
        my $time=$x[2]; #good
        my $disk=$x[12]; #good
        my $user=$x[16]; #good
        print OUTPUT "$date $time $disk $user \n"; #good
    } #end if the one WAY up there the one for ciserf1
} # end while
close FILE;

Replies are listed 'Best First'.
Re: parse a log file
by arturo (Vicar) on Jul 03, 2001 at 17:49 UTC

    A basic item in the toolset you need is the escape character, which (I assume you already know) is the backslash. But it can be used to escape itself, too:

    my @backslash_delimited_parts = split /\\/, $string;

    You didn't directly ask for it, but here's something I find useful in assigning bits of an array to named variables (which it doesn't seem you *need*, although here the names help to document what's being done):

    my ($date, $time, $disk, $user) = @x[1,2,12,16];

    That there syntax is an array slice, by the way.
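A minimal sketch of that slice in action. The 17-element array below is a made-up stand-in for the poster's split record; only the indexes 1, 2, 12 and 16 come from the original script, and the field contents are placeholders.

```perl
use strict;
use warnings;

# Hypothetical record: 17 placeholder fields, then the four we care about.
my @x = map { "field$_" } 0 .. 16;
@x[1, 2, 12, 16] = (
    '6/21/2001',
    '11:55:04',
    'Object Name: \CISER\ArchiveData',
    'Primary User Name: jch6',
);

# An array slice pulls several elements out in a single list assignment.
my ($date, $time, $disk, $user) = @x[1, 2, 12, 16];

print "$date $time $disk $user\n";
```

Note the `@` sigil on the slice: `@x[...]` returns a list of elements, while `$x[...]` returns exactly one.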

    HTH!

    perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); $rose = "smells sweet to degree $n"; *other_name = *rose; print "$other_name\n"'
Re: parse a log file
by davorg (Chancellor) on Jul 03, 2001 at 18:16 UTC

    A few more little suggestions about your code.

    • You use both -w and use warnings. They both do pretty much the same thing so only one is needed.
    • Your while condition can be better written as while (<FILE>).
    • Rather than opening specific input and output files, your script would be more flexible if you read from STDIN and write to STDOUT. You could then call your script using IO redirection.
      myscript.pl < input.dat > output.txt
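One way the filter idea could look, as a sketch rather than a drop-in replacement. The helper name `filter_560` is hypothetical, and the assumption that the event ID sits in comma-separated field 4 is taken from the sample record posted further down the thread. An in-memory handle stands in for STDIN so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: copy only event-560 records from one handle to another.
sub filter_560 {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        my @x = split /,/, $line;
        next unless defined $x[4] && $x[4] eq '560';   # assumed event ID field
        print {$out} $line;
    }
    return;
}

# In a real script the call would be: filter_560(\*STDIN, \*STDOUT);
# Here in-memory handles play the role of the redirected files.
my $input = "SEC,6/21/2001,11:48:01,Security,560,Success\n"
          . "SEC,6/21/2001,11:49:00,Security,562,Success\n";
my $output = '';

open my $in,  '<', \$input  or die "can't open in-memory input: $!";
open my $out, '>', \$output or die "can't open in-memory output: $!";
filter_560($in, $out);
close $out;

print $output;
```

Because the routine only sees filehandles, the same code works whether the data comes from redirection, a named file, or a test string.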
    --
    <http://www.dave.org.uk>

    Perl Training in the UK <http://www.iterative-software.com>

      I would like to, but the admin just wants to type "myscript.pl".
Re: parse a log file
by nysus (Parson) on Jul 03, 2001 at 18:08 UTC
    The only differences I can see between your current output and your desired output are that 1) the directory path is much longer in what you are getting now and 2) there are caret characters after the path and after the user name.

    I would like to help you but it's difficult to see how you might handle problem #1 without more information from you. What part of the path are you trying to chop? Is the path you want to get rid of the same every time?

    As for problem #2, that could be handled with a simple regular expression similar to the one you have already used: s/\^//g; Note the escaped caret, which turns it into a literal instead of letting the Perl RE engine interpret it as a metacharacter.
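A short sketch of the caret removal, assuming a trimmed version of the poster's output line as sample data:

```perl
use strict;
use warnings;

my $line = 'Object Name: \CISER\ArchiveData^ Primary User Name: jch6^';

# Outside a character class, ^ is an anchor, so it is escaped to match literally.
(my $clean = $line) =~ s/\^//g;

print "$clean\n";
```

For deleting a single fixed character, `tr/^//d` would also work; the caret has no special meaning in `tr`.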

    One little note: I prefer to use the while(<FILE>) syntax to grab the lines from a file. It automatically detects the end of the file, so there is no need to test manually whether the line is defined. It's a nifty shortcut built into Perl.

    $PM = "Perl Monk's";
    $MCF = "Most Clueless Friar Abbot Bishop";
    $nysus = $PM . $MCF;

Re: parse a log file
by particle (Vicar) on Jul 03, 2001 at 18:02 UTC
    first off, can you post some data, so we know what you're reading in?

    looking at your script, there are a few things that could be improved upon.
    ~you're using warnings twice, with perl -w, and use warnings. you don't need both.
    ~why is use strict commented out?
    ~this is confusing~

    #Look only at the summary lines where $_ == 560
    while (defined ($_ = <FILE>)) {
        next unless ($_ =~ /560/); #we only want the files with 560
        $_ =~ s/`/,/g; #this is here to get the user name because ' is after name
        @x=split(/,/);
        if (!($x[16] =~ /Primary User Name: CISERFS1/)) { #dont want the details from the sytem
    probably it's better to split first, then search on the filename field, otherwise you may run into a year 2560 bug ;)
    something like
    #Look only at the summary lines where $x[???] contains '560'
    while (<FILE>) {
        @x=split /,|`/;
        next unless ($x[???] =~ /560/); #we only want the files with 560
        unless($x[16] =~ /Primary User Name: CISERFS1/) { #dont want the details from the sytem
    ~also, you are assigning temporary variables, but i don't see a real need, if you're only printing them.
    try print OUTPUT "$x[1] $x[2] $x[12] $x[16] \n";

    ~Particle

      ORIGINAL INPUT:
      SEC,6/21/2001,11:48:01,Security,560,Success,Object Access ,S-1-5-21-583907252-1958367476-682003330-1001,CISERFS1,Object Open:^` Object Server: Security^` Object Type: File^` Object Name: \Device\HarddiskDmVolumes\PhysicalDmVolumes\BlockVolume2\CISER\Tank\compressed\cret\003\ret72.mdse4.gz^` New Handle ID: 2760^` Operation ID: {0 3914260}^` Process ID: 1056^` Primary User Name: CISERFS1$^` Primary Domain: CTC_ITH^` Primary Logon ID: (0x0 0x3E7)^` Client User Name: IUSR_CISERFS1^` Client Domain: CISERFS1^` Client Logon ID: (0x0 0x2E1741)^` Accesses READ_CONTROL ^` SYNCHRONIZE ^` ReadData (or ListDirectory) ^` ReadEA ^` ReadAttributes ^` ^` Privileges -^`
        okay, i won't give it all away, but here's a good start.

        while(<FILE>) {
            my @x = split /,|\^`/;
            next unless ($x[4] =~ /560/); #we only want the files with 560
            print join("\n",@x), "\n"; # debugging print line - remove in production
            unless($x[16] =~ /Primary User Name: CISERFS1/) { #only match this case
                print OUTPUT "$x[1] $x[2] $x[12] $x[16] \n"; # or whatever
            } # unless
        } # while
        by the way, you should get a login, so we know who you are when you come back!

        ~Particle

        Update: i guess i forgot about the '\' parsing, but i'm not sure just what you want to do. you can split on /\\/, and return the fields you want, put together with join "\\", much like i did in the debug print statement.
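A rough sketch of the split-and-join approach from the update. It assumes the poster wants the last two path components, which matches the desired output shown in the question; the sample record has a deeper path, so a real script might need to keep everything from CISER onward instead.

```perl
use strict;
use warnings;

my $path = '\Device\HarddiskDmVolumes\PhysicalDmVolumes\BlockVolume2\CISER\ArchiveData';

# The leading backslash yields an empty first field, so $parts[0] is ''.
my @parts = split /\\/, $path;

# Keep the last two components and restore the leading backslash.
my $short = '\\' . join('\\', @parts[-2, -1]);

print "$short\n";
```

Negative indexes count from the end of the array, so the slice keeps working if the number of leading directories changes.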

Re: parse a log file
by Hofmator (Curate) on Jul 03, 2001 at 18:05 UTC

    I'm more or less guessing here - you did not include the format of your input file and your question is somewhat unspecific.

    First a small remark on your code. The pattern match and substitution operators work on $_ implicitly when you don't specify a scalar variable. So your code simplifies to:

    # just the relevant part
    next unless /560/;
    s/`/,/g;

    Now supposing $string holds 'Object Name: \foo\bar\CISER\interesting' you can cut out the first part with:

    $string =~ s/^.*\\CISER/\\CISER/;

    Note that you have to escape the backslash. This replaces everything from the beginning of the string up to (and including) \CISER with \CISER - effectively leaving the part \CISER\interesting.
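The substitution can be tried on its own with the example string from the paragraph above:

```perl
use strict;
use warnings;

my $string = 'Object Name: \foo\bar\CISER\interesting';

# .* is greedy, so this cuts everything up to the LAST \CISER in the string.
$string =~ s/^.*\\CISER/\\CISER/;

print "$string\n";
```

If a path could contain \CISER more than once and the first occurrence is the one to keep, a non-greedy `.*?` would be the variant to reach for.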

    I hope this answers your question, otherwise post here in this thread a follow-up to clarify what you want to know.

    -- Hofmator

Re: parse a log file
by mikeB (Friar) on Jul 03, 2001 at 18:11 UTC
    You might find it more clear to use regular expressions to parse the input. Properly constructed, they can be more forgiving of input format variations.

    Your problem could be done in a single regex, but I'll break it down here for ease of understanding.

    # grab the first two non-whitespace items, separated by whitespace.
    my ($date, $time) = $s =~ /(\S+)\s+(\S+)/;

    # grab the last two portions of the disk name,
    # which is immediately followed by the first ^ in the file.
    my ($disk) = $s =~ /\\(\w+\\\w+)\^/;

    # grab the user name, which is the string before the last ^,
    # which in turn is followed by some white space and the end of the string.
    # You can take out the \s+ if the space between the ^ and end of string was an artifact of your post.
    my ($user) = $s =~ /(\w+)\^\s+$/;
    Note that in this example, it doesn't matter if the number of elements in the disk path changes - it will always grab the last two. In your example, a change in the path format would break both $disk and $user.

    The same thing is true of the search for the user name. As long as it is the string before a ^ at the end of the line, it will always parse correctly with the regex, even if the format of what comes before it changes.
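To see the three regexes working together, here is a sketch that applies them to the output line from the original question. It assumes the trailing space after the last ^ is really present; the final print format is simply one guess at the desired output.

```perl
use strict;
use warnings;

# The poster's current output line, including the trailing space after the last ^.
my $s = '6/21/2001 11:55:04 Object Name: \Device\HarddiskDmVolumes\PhysicalDmVolumes\BlockVolume2\CISER\ArchiveData^ Primary User Name: jch6^ ';

my ($date, $time) = $s =~ /(\S+)\s+(\S+)/;     # first two whitespace-separated items
my ($disk)        = $s =~ /\\(\w+\\\w+)\^/;    # last two path components before a ^
my ($user)        = $s =~ /(\w+)\^\s+$/;       # word before the final ^

# $disk is captured without its leading backslash, so one is added back here.
print "$date $time \\$disk Primary User Name: $user\n";
```

Each match is done in list context, which is why the captures land directly in the variables on the left.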

    Regular expressions take some work to get used to, but that effort is well rewarded.