roadtest has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am trying to read the username (first column), batchID (second column) and filename (third column). The filename may contain spaces, numbers and letters. I defined the split pattern as two spaces before and after a number, as shown below. It works, but the downside is that I have to run split twice in order to read the batchID. Is there a more efficient way?

Thanks,

use strict;
use warnings;

while (<DATA>) {
    chomp;
    # print "Use character position for name - $NAME1\n";   # $NAME1 is never declared, so this line fails under strict
    my $batchID = (split/[ ]+/)[1];
    my ( $Name,$File) = (split/[ ]+[ ]+[0-9]+[ ]+[ ]+/)[0,1];
    print "$Name has $File and BatchID is $batchID\n\n";
}

#username       BatchID       Filename                  size           Date                     Status         Owner
__DATA__
user1           1718          test2.txt.542             46             11/07/08-15:45           CREMFZ         roadtest
user1           1715          MIGRATE.DLL.698           40960          11/07/08-15:46           CREMFY         roadtest
user1           1712          New Microsoft Word Docu   35840          11/07/08-15:46           CREMFY         roadtest
user2           1707          amanda test 1 .doc.411    24064          11/07/08-15:47           CREMFY         roadtest
user2           1706          amanda test 1 .doc.544    24064          11/07/08-15:47           CREMFY         roadtest
user2           1705          amanda test 1 .doc.55     24064          11/07/08-15:47           CREMFY         roadtest
user3           1702          amanda test 2 .doc.322    24064          11/07/08-15:47           CREMFY         roadtest
user3           1701          amanda test 2 .doc.610    24064          11/07/08-15:47           CREMFY         roadtest
user3           1699          amanda test 2.doc.829     24064          11/07/08-15:47           CREMFY         roadtest

Re: read the filename column
by AnomalousMonk (Archbishop) on Jul 09, 2011 at 17:35 UTC

    Assuming that the 'username', 'BatchID', 'size', 'Date', 'Status' and 'Owner' fields will never have embedded whitespace, then this is perhaps not more 'efficient', but I think more maintainable:

    >perl -wMstrict -le
    "my $rec = 'user2 1707 amanda test 1 .doc.412 4064 '
             . '11/07/08-15:47 CREMFY roadtest'
             ;
     print qq{'$rec'};
     ;;
     my ($username, $batchid, $remainder) = split /\s+/, $rec, 3;
     my $filename = reverse((split /\s+/, reverse($remainder), 5)[-1]);
     ;;
     print qq{'$username' '$batchid' '$filename'};
    "
    'user2 1707 amanda test 1 .doc.412 4064 11/07/08-15:47 CREMFY roadtest'
    'user2' '1707' 'amanda test 1 .doc.412'

    Update: Something like this approach would work well if the data fields are variable-width; if they are fixed-width, as the example data of the OP suggest, the unpack approach of Re: read the filename column would be far better.

      nice trick!:-)
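
The reverse-and-split trick from the reply above can also be written as a small sub in a script file. This is only a sketch: the name `parse_record` is made up for illustration, and it assumes that the record has been chomped, carries no trailing whitespace, and that exactly four whitespace-free fields follow the filename.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns username, batch ID and
# filename from one whitespace-separated record. Assumes only the filename
# may contain whitespace and that exactly four fields follow it.
sub parse_record {
    my ($rec) = @_;

    # Peel the first two fields off the front; the rest stays in one piece.
    my ($username, $batchid, $remainder) = split /\s+/, $rec, 3;

    # Reverse the remainder so the four trailing fields come first, split
    # with a limit of 5, then reverse the last piece back into the filename.
    my $reversed = scalar reverse $remainder;
    my $filename = scalar reverse( (split /\s+/, $reversed, 5)[-1] );

    return ($username, $batchid, $filename);
}

my @fields = parse_record(
    'user3  1699  amanda test 2.doc.829  24064  11/07/08-15:47  CREMFY  roadtest'
);
print join('|', @fields), "\n";    # prints: user3|1699|amanda test 2.doc.829
```

The explicit `scalar` makes sure `reverse` reverses the characters of the string rather than the order of a list.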
Re: read the filename column
by Anonymous Monk on Jul 09, 2011 at 17:01 UTC
    Yes,
    my( $username , $BatchID , $Filename , $size , $Date , $Status , $Owner ) =
        unpack ' A17 A15 A27 A16 A26 A16 A10 ';

      ++ for spotting the fields are fixed-width. Use of unpack is certainly more efficient than split. But I don't see how you arrived at a field-width of 10 for the final 'Owner' field; isn't this field better unpacked with 'A*' to make it width-independent?

        Perhaps you didn't notice, but all the entries are off by one.
      Thanks, nice to know unpack can be used this way:-) cheers,
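
A minimal sketch combining the two points raised in this sub-thread: widths one smaller than in the template above, and `A*` for the final field. The widths (16, 14, 26, 15, 25, 15) are taken from the pattern used in the DBD::AnyData reply in this thread and are an assumption about the real file layout. The sample record is built with `sprintf` so the example does not depend on exact spacing surviving copy and paste.

```perl
use strict;
use warnings;

# Assumed field widths; 'A*' takes whatever is left, so the last field
# does not need a fixed width.
my $template = 'A16 A14 A26 A15 A25 A15 A*';

# Build one fixed-width sample record (left-justified, space-padded).
my $record = sprintf '%-16s%-14s%-26s%-15s%-25s%-15s%s',
    'user2', 1707, 'amanda test 1 .doc.411', 24064,
    '11/07/08-15:47', 'CREMFY', 'roadtest';

# The 'A' format strips trailing spaces from each field when unpacking.
my ($username, $batchid, $filename, $size, $date, $status, $owner)
    = unpack $template, $record;

print "$username has '$filename' and BatchID is $batchid\n";
# prints: user2 has 'amanda test 1 .doc.411' and BatchID is 1707
```

Inside a `while (<DATA>)` loop, `unpack $template, $_` would be applied to each chomped line in the same way.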
Re: read the filename column
by GrandFather (Saint) on Jul 09, 2011 at 23:55 UTC

    If you have more extensive processing than shown it may be worth considering using DBD::AnyData to wrap your file as a database.

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect ('dbi:AnyData(RaiseError=>1):');

    $dbh->func (
        'Docs', 'Fixed', [<DATA>],
        {   col_names => 'name,batch,file,size,date,status,owner',
            pattern   => 'A16 A14 A26 A15 A25 A15 A9'
        },
        'ad_import'
    );

    my $sql = 'SELECT name, batch, file FROM Docs order by name, batch';
    my $sth = $dbh->prepare ($sql);

    $sth->execute ();

    while (my $row = $sth->fetchrow_hashref ()) {
        print "$row->{name} has '$row->{file}' and BatchID is $row->{batch}\n";
    }

    __DATA__

    Given data as shown in the OP prints:

    user1 has 'New Microsoft Word Docu' and BatchID is 1712
    user1 has 'MIGRATE.DLL.698' and BatchID is 1715
    user1 has 'test2.txt.542' and BatchID is 1718
    user2 has 'amanda test 1 .doc.55' and BatchID is 1705
    user2 has 'amanda test 1 .doc.544' and BatchID is 1706
    user2 has 'amanda test 1 .doc.411' and BatchID is 1707
    user3 has 'amanda test 2.doc.829' and BatchID is 1699
    user3 has 'amanda test 2 .doc.610' and BatchID is 1701
    user3 has 'amanda test 2 .doc.322' and BatchID is 1702

    To use a file instead of data provided in __DATA__ replace [<DATA>] with the file name and 'ad_import' with 'ad_catalog'.

    True laziness is hard work