PerlMonks  

Regex refresher

by blackadder (Hermit)
on Mar 22, 2007 at 14:48 UTC ([id://606043]: perlquestion)

blackadder has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks

Greetings

I'm feeling a bit rusty, and I wonder if you can help please?

I have data as such
21 10216503 27f14 208 1 vxdmp (VxVM 4.0R_p3.7 (MP1): DMP Drive)
22 78016000 1c8c80 223 1 vxio (VxVM 4.0R_p3.7 (MP1) I/O driver)
24 781bf440 847 225 1 vxspec (VxVM 4.0R_p3.7 (MP1) control/st)
127 7837cb56 bb3 234 1 vxportal (VxFS 4.0_REV-MP1j portal driver +)
128 7837e000 1636fd 16 1 vxfs (VxFS 4.0_REV-MP1j,PID=116688-01)
what I want to achieve is this
21,10216503,27f14,208,1,vxdmp,(VxVM 4.0R_p3.7 (MP1): DMP Drive)
22,78016000,1c8c80,223,1,vxio,(VxVM 4.0R_p3.7 (MP1) I/O driver)
24,781bf440,847,225,1,vxspec,(VxVM 4.0R_p3.7 (MP1) control/st)
127,7837cb56,bb3,234,1,vxportal,(VxFS 4.0_REV-MP1j portal driver)
128,7837e000,1636fd,16,1,vxfs,(VxFS 4.0_REV-MP1j,PID=116688-01)
so I wrote this script
#! c:/perl/bin/perl.exe
use strict;
use Archive::Tar;

system(cls);
my $tar = Archive::Tar->new();
$tar->read("c:\\emcgrab.tar.gz", 1);
@_ = $tar->list_files();
print "\n\n";
for my $modinfo (@_) {
    next unless ($modinfo =~ /modinfo.txt/);
    $tar->extract_file( $modinfo, "c:\\Emcmodinfo.txt");
    open(FILE,"c:\\Emcmodinfo.txt") || die;
    chomp (my @stuff = <FILE>);
    for my $line (@stuff) {
        next unless ($line =~ /vx/);
        $line =~ s/^\s+//;
        print "$line\n";
        $line =~ /^(\d+)\s+(\d+\w+\d+)\s+(\d+\w+)/;
    }
}
and what I get is not what I expected!
21 10216503 27f14 208 1 vxdmp (VxVM 4.0R_p3.7 (MP1): DMP Drive)
21,10216503,27f14
-------------------------------------------------------
22 78016000 1c8c80 223 1 vxio (VxVM 4.0R_p3.7 (MP1) I/O driver)
22,78016000,1c8c80
-------------------------------------------------------
24 781bf440 847 225 1 vxspec (VxVM 4.0R_p3.7 (MP1) control/st)
24,781bf440,847
-------------------------------------------------------
127 7837cb56 bb3 234 1 vxportal (VxFS 4.0_REV-MP1j portal driver +)
,,
-------------------------------------------------------
128 7837e000 1636fd 16 1 vxfs (VxFS 4.0_REV-MP1j,PID=116688-01)
128,7837e000,1636fd
-------------------------------------------------------
C:\Perl>
Could someone please show me where I am going wrong here? Thanks
Blackadder

Replies are listed 'Best First'.
Re: Regex refresher
by saintly (Scribe) on Mar 22, 2007 at 15:15 UTC
    Unless your text is especially irregular (and spaces are missing between some of the elements), it seems to me that you could just say:
    print join(",",split(/\s+/,$line,7)),"\n";
    If you only wanted the first three elements:
    print join(",",+(split(/\s+/,$line,4))[0 .. 2]),"\n";
    There doesn't seem to be a need for more complex regular expressions unless you're attempting to validate the data or the delimiting spaces are missing sometimes.
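If validation is wanted, something along these lines might do. It is only a sketch: it assumes the second and third columns are lowercase hex (as they are in the sample) and that the last column is a parenthesised description. The name `parse_modinfo_line` is made up for illustration. Note that the third column can start with a letter (`bb3` on the 127 line), which a pattern such as `(\d+\w+)` will not accept because it insists on a leading digit.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the seven fields of one line, or an empty
# list when the line does not have the assumed shape.
sub parse_modinfo_line {
    my ($line) = @_;
    $line =~ s/^\s+//;    # drop leading whitespace
    $line =~ s/\s+$//;    # drop trailing whitespace, including any newline
    return $line =~ /^(\d+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\(.*\))$/;
}

my @fields = parse_modinfo_line(
    "127 7837cb56 bb3 234 1 vxportal (VxFS 4.0_REV-MP1j portal driver +)"
);
print join(",", @fields), "\n" if @fields;
```

In list context the match returns its captures, so a failed match simply yields an empty list and the line can be skipped or reported.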
      Another way to do this (since you are asking for a regex solution):

      my $n = 0;
      $line =~ s/(\s+)/$n++ < 6 ? ',' : $1/eg;

      Not very efficient, though.
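The efficiency remark can be measured rather than guessed at. A rough sketch using the core Benchmark module follows; the sub names `via_split` and `via_subst` are hypothetical, and the counter is declared outside the substitution on the assumption that it has to persist from one match to the next.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sample = "21 10216503 27f14 208 1 vxdmp (VxVM 4.0R_p3.7 (MP1): DMP Drive)";

# Hypothetical name: split into at most 7 fields, then join with commas.
sub via_split {
    my ($line) = @_;
    return join ",", split /\s+/, $line, 7;
}

# Hypothetical name: replace only the first 6 runs of whitespace.
sub via_subst {
    my ($line) = @_;
    my $n = 0;    # lives outside the s///e so the count survives between matches
    $line =~ s/(\s+)/$n++ < 6 ? ',' : $1/eg;
    return $line;
}

# Run each candidate 100_000 times and print a comparison table.
cmpthese(100_000, {
    split => sub { via_split($sample) },
    subst => sub { via_subst($sample) },
});
```

The numbers will vary by machine and perl version, so treat the table as a relative comparison only.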
Re: Regex refresher
by GrandFather (Saint) on Mar 22, 2007 at 19:23 UTC

    I'd be inclined to use split with the optional third parameter that tells it when to stop splitting:

    use strict;
    use warnings;

    while (<DATA>) {
        print join ",", split /\s+/, $_, 7;
    }

    __DATA__
    21 10216503 27f14 208 1 vxdmp (VxVM 4.0R_p3.7 (MP1): DMP Drive)
    22 78016000 1c8c80 223 1 vxio (VxVM 4.0R_p3.7 (MP1) I/O driver)
    24 781bf440 847 225 1 vxspec (VxVM 4.0R_p3.7 (MP1) control/st)
    127 7837cb56 bb3 234 1 vxportal (VxFS 4.0_REV-MP1j portal driver +)
    128 7837e000 1636fd 16 1 vxfs (VxFS 4.0_REV-MP1j,PID=116688-01)

    Prints:

    21,10216503,27f14,208,1,vxdmp,(VxVM 4.0R_p3.7 (MP1): DMP Drive)
    22,78016000,1c8c80,223,1,vxio,(VxVM 4.0R_p3.7 (MP1) I/O driver)
    24,781bf440,847,225,1,vxspec,(VxVM 4.0R_p3.7 (MP1) control/st)
    127,7837cb56,bb3,234,1,vxportal,(VxFS 4.0_REV-MP1j portal driver)
    128,7837e000,1636fd,16,1,vxfs,(VxFS 4.0_REV-MP1j,PID=116688-01)

    Update: removed chomp and print \n


    DWIM is Perl's answer to Gödel
Re: Regex refresher
by davorg (Chancellor) on Mar 22, 2007 at 14:54 UTC