read the filename column

roadtest has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am trying to read the username (first column), batchID (second column) and filename (third column). A filename may contain spaces, digits and letters. I define the split pattern as a number with at least two spaces before and after it, as shown below. It works. The downside is that I have to run split twice in order to read the batchID. Is there a more efficient way?

thanks,

use strict;
use warnings;

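while (<DATA>) {
    chomp;
    # First pass: the batch ID is the second space-separated field.
    my $batchID = (split /[ ]+/)[1];
    # Second pass: split on a number with at least two spaces on each side,
    # which leaves the username in [0] and the filename in [1].
    my ($Name, $File) = (split /[ ]{2,}[0-9]+[ ]{2,}/)[0, 1];
    print "$Name has $File and BatchID is $batchID\n\n";
}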


#username       BatchID       Filename                  size           Date                     Status         Owner
__DATA__
user1           1718          test2.txt.542             46             11/07/08-15:45           CREMFZ         roadtest
user1           1715          MIGRATE.DLL.698           40960          11/07/08-15:46           CREMFY         roadtest
user1           1712          New Microsoft Word Docu   35840          11/07/08-15:46           CREMFY         roadtest
user2           1707          amanda test 1 .doc.411    24064          11/07/08-15:47           CREMFY         roadtest
user2           1706          amanda test 1 .doc.544    24064          11/07/08-15:47           CREMFY         roadtest
user2           1705          amanda test 1 .doc.55     24064          11/07/08-15:47           CREMFY         roadtest
user3           1702          amanda test 2 .doc.322    24064          11/07/08-15:47           CREMFY         roadtest
user3           1701          amanda test 2 .doc.610    24064          11/07/08-15:47           CREMFY         roadtest
user3           1699          amanda test 2.doc.829     24064          11/07/08-15:47           CREMFY         roadtest


Re: read the filename column by AnomalousMonk (Archbishop) on Jul 09, 2011 at 17:35 UTC
Assuming that the 'username', 'BatchID', 'size', 'Date', 'Status' and 'Owner' fields will never have embedded whitespace, then this is perhaps not more 'efficient', but I think more maintainable:

    >perl -wMstrict -le
    "my $rec = 'user2 1707 amanda test 1 .doc.412 4064 '
             . '11/07/08-15:47 CREMFY roadtest'
             ;
     print qq{'$rec'};
     ;;
     my ($username, $batchid, $remainder) = split /\s+/, $rec, 3;
     my $filename = reverse((split /\s+/, reverse($remainder), 5)[-1]);
     ;;
     print qq{'$username' '$batchid' '$filename'};
    "
    'user2 1707 amanda test 1 .doc.412 4064 11/07/08-15:47 CREMFY roadtest'
    'user2' '1707' 'amanda test 1 .doc.412'

Update: Something like this approach would work well if the data fields are variable-width; if they are fixed-width, as the example data of the OP suggest, the `unpack` approach of Re: read the filename column would be far better.
Re^2: read the filename column by roadtest (Sexton) on Jul 09, 2011 at 17:48 UTC
Nice trick! :-)
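For anyone who wants the split/reverse idea as a script rather than a one-liner, here is a minimal sketch. The helper name `parse_record` is hypothetical, and the code assumes, as in the reply above, that only the filename can contain whitespace. It trims trailing whitespace first, because a trailing blank would otherwise become an empty leading field once the string is reversed, pushing the size into the filename.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (username, batch ID, filename) for one record.
# Assumption: only the filename field may contain whitespace.
sub parse_record {
    my ($rec) = @_;
    $rec =~ s/\s+\z//;    # drop trailing blanks and the newline

    # Peel the two leading fields off the front; the limit of 3 keeps the
    # rest of the record intact in $remainder.
    my ($username, $batchid, $remainder) = split /\s+/, $rec, 3;

    # Reverse the remainder so the four trailing fields (owner, status, date,
    # size) come first; the limit of 5 leaves the reversed filename last.
    my $filename = reverse( (split /\s+/, reverse($remainder), 5)[-1] );

    return ($username, $batchid, $filename);
}

my $line = 'user2   1707   amanda test 1 .doc.411   24064   11/07/08-15:47   CREMFY   roadtest ';
print join('|', parse_record($line)), "\n";
# prints: user2|1707|amanda test 1 .doc.411
```

Because the field separators are matched with `\s+`, this does not depend on column positions, which is what makes it suitable for variable-width data.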
Re: read the filename column by Anonymous Monk on Jul 09, 2011 at 17:01 UTC
Yes, `my( $username, $BatchID, $Filename, $size, $Date, $Status, $Owner ) = unpack ' A17 A15 A27 A16 A26 A16 A10 ';`
Re^2: read the filename column by AnomalousMonk (Archbishop) on Jul 09, 2011 at 18:14 UTC
++ for spotting the fields are fixed-width. Use of `unpack` is certainly more efficient than `split`. But I don't see how you arrived at a field-width of 10 for the final 'Owner' field; isn't this field better unpacked with 'A*' to make it width-independent?
Re^3: read the filename column by Anonymous Monk on Jul 09, 2011 at 18:53 UTC
Perhaps you didn't notice, but all the entries are off by one.
Re^4: read the filename column by AnomalousMonk (Archbishop) on Jul 09, 2011 at 20:59 UTC
Re^5: read the filename column by Anonymous Monk on Jul 09, 2011 at 22:01 UTC
Some notes below your chosen depth have not been shown here
Re^2: read the filename column by roadtest (Sexton) on Jul 09, 2011 at 17:38 UTC
Thanks, nice to know unpack can be used this way :-) Cheers,
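To make the fixed-width approach concrete, here is a minimal sketch. It uses the widths that match the sample rows in the original post (16, 14, 26, 15, 25 and 15 characters for the first six columns) and takes the final field with `A*`, as suggested above. The helper name `unpack_record` and the hash keys are hypothetical, and the widths are an assumption that would need checking against the real file. The sample row is rebuilt with `sprintf` so its column widths are exact.

```perl
use strict;
use warnings;

# Widths measured from the sample rows (an assumption about the real data);
# 'A*' takes whatever is left, so the last field is width-independent.
my $TEMPLATE = 'A16 A14 A26 A15 A25 A15 A*';

# Hypothetical helper: unpack one fixed-width line into a hash reference.
sub unpack_record {
    my ($line) = @_;
    my %rec;
    # The 'A' format strips trailing spaces from each field.
    @rec{qw(username batchid filename size date status owner)}
        = unpack $TEMPLATE, $line;
    return \%rec;
}

# Rebuild one sample row with the same column widths as the posted data.
my $line = sprintf '%-16s%-14s%-26s%-15s%-25s%-15s%s',
    'user1', 1712, 'New Microsoft Word Docu', 35840,
    '11/07/08-15:46', 'CREMFY', 'roadtest';

my $rec = unpack_record($line);
print "$rec->{username} has '$rec->{filename}' and BatchID is $rec->{batchid}\n";
# prints: user1 has 'New Microsoft Word Docu' and BatchID is 1712
```

A single `unpack` call yields every column at once, so the second `split` from the original code is no longer needed.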
Re: read the filename column by GrandFather (Saint) on Jul 09, 2011 at 23:55 UTC
If you have more extensive processing than shown, it may be worth considering using DBD::AnyData to wrap your file as a database.

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect ('dbi:AnyData(RaiseError=>1):');

    $dbh->func (
        'Docs', 'Fixed', [<DATA>],
        {
            col_names => 'name,batch,file,size,date,status,owner',
            pattern   => 'A16 A14 A26 A15 A25 A15 A9'
        },
        'ad_import'
    );

    my $sql = 'SELECT name, batch, file FROM Docs order by name, batch';
    my $sth = $dbh->prepare ($sql);

    $sth->execute ();

    while (my $row = $sth->fetchrow_hashref ()) {
        print "$row->{name} has '$row->{file}' and BatchID is $row->{batch}\n";
    }

    __DATA__

Given data as shown in the OP prints:

    user1 has 'New Microsoft Word Docu' and BatchID is 1712
    user1 has 'MIGRATE.DLL.698' and BatchID is 1715
    user1 has 'test2.txt.542' and BatchID is 1718
    user2 has 'amanda test 1 .doc.55' and BatchID is 1705
    user2 has 'amanda test 1 .doc.544' and BatchID is 1706
    user2 has 'amanda test 1 .doc.411' and BatchID is 1707
    user3 has 'amanda test 2.doc.829' and BatchID is 1699
    user3 has 'amanda test 2 .doc.610' and BatchID is 1701
    user3 has 'amanda test 2 .doc.322' and BatchID is 1702

To use a file instead of data provided in __DATA__ replace `[<DATA>]` with the file name and 'ad_import' with 'ad_catalog'.

True laziness is hard work