
Re^5: Index a file with pack for fast access

by BrowserUk (Patriarch)
on Dec 21, 2011 at 17:51 UTC ( [id://944672]=note )


in reply to Re^4: Index a file with pack for fast access
in thread Index a file with pack for fast access

Do I need to recode my index using N* or Z or Z* or A or A*? Thanks!

Dunno! Why do you think that would help?

You shouldn't need to if you used the code I posted, but I cannot see what you are now using.
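A small sketch of why the template should not matter here: each index entry holds a byte offset into the data file, not the line itself, and `'N'` always packs an unsigned 32-bit big-endian integer into 4 bytes. The helper name `offset_entry` is illustrative only and is not part of the code posted in this thread.

```perl
use strict;
use warnings;

# Illustrative helper (not from the thread): encode one byte offset as an
# index entry. 'N' = unsigned 32-bit integer, big-endian, always 4 bytes.
sub offset_entry {
    my( $offset ) = @_;
    return pack 'N', $offset;
}

# Line length and line content (tabs included) never reach the index;
# only the offset does, so every entry has the same width.
for my $offset ( 0, 17, 425, 4_294_967_295 ) {
    my $entry = offset_entry( $offset );
    printf "%u -> %u bytes -> %u\n",
        $offset, length( $entry ), unpack( 'N', $entry );
}
```

The largest value `'N'` can hold is 2**32 - 1, so this layout only becomes a problem for data files of 4 GiB or more.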

I create a data file:

    c:\test>perl -e"printf qq[Line %010d\n], $_ for 1 .. 25" > junk.dat

    c:\test>type junk.dat
    Line 0000000001
    Line 0000000002
    Line 0000000003
    Line 0000000004
    Line 0000000005
    Line 0000000006
    Line 0000000007
    Line 0000000008
    Line 0000000009
    Line 0000000010
    Line 0000000011
    Line 0000000012
    Line 0000000013
    Line 0000000014
    Line 0000000015
    Line 0000000016
    Line 0000000017
    Line 0000000018
    Line 0000000019
    Line 0000000020
    Line 0000000021
    Line 0000000022
    Line 0000000023
    Line 0000000024
    Line 0000000025

I then index it using the code I posted above:

    c:\test>type indexFile.pl
    #! perl -sw
    use strict;

    open INDEX, '>:raw', "$ARGV[ 0 ].idx" or die $!;

    syswrite INDEX, pack( 'N', 0 ), 4;
    syswrite INDEX, pack( 'N', tell *ARGV ), 4 while <>;

    close INDEX;

    c:\test>indexFile junk.dat

    c:\test>dir junk.dat*
    21/12/2011  17:45               425 junk.dat
    21/12/2011  17:46               104 junk.dat.idx

    c:\test>

I then read through the data file via the index:

    c:\test>type readIndexedFile.pl
    #! perl -sw
    use strict;
    use Time::HiRes qw[ time ];

    our $N //= 100;

    open INDEX, '<:raw', "$ARGV[ 0 ].idx" or die $!;
    my $len = -s( INDEX );
    sysread INDEX, my( $idx ), $len;
    close INDEX;

    sub getRecordN {
        my( $fh, $n ) = @_;
        seek $fh, unpack( 'N', substr $idx, ($n-1) * 4, 4 ), 0;
        return scalar <$fh>;
    }

    open DAT, '<', $ARGV[ 0 ] or die $!;

    for my $line ( 1 .. ( length( $idx ) / 4 ) - 1 ) {
        print "Expecting $line; got: ", getRecordN( *DAT, $line );
    }

    c:\test>readIndexedFile junk.dat
    Expecting 1; got: Line 0000000001
    Expecting 2; got: Line 0000000002
    Expecting 3; got: Line 0000000003
    Expecting 4; got: Line 0000000004
    Expecting 5; got: Line 0000000005
    Expecting 6; got: Line 0000000006
    Expecting 7; got: Line 0000000007
    Expecting 8; got: Line 0000000008
    Expecting 9; got: Line 0000000009
    Expecting 10; got: Line 0000000010
    Expecting 11; got: Line 0000000011
    Expecting 12; got: Line 0000000012
    Expecting 13; got: Line 0000000013
    Expecting 14; got: Line 0000000014
    Expecting 15; got: Line 0000000015
    Expecting 16; got: Line 0000000016
    Expecting 17; got: Line 0000000017
    Expecting 18; got: Line 0000000018
    Expecting 19; got: Line 0000000019
    Expecting 20; got: Line 0000000020
    Expecting 21; got: Line 0000000021
    Expecting 22; got: Line 0000000022
    Expecting 23; got: Line 0000000023
    Expecting 24; got: Line 0000000024
    Expecting 25; got: Line 0000000025

And everything works as expected. If yours doesn't, then you will have to work out how your code differs from mine.

Or failing that, you could post your indexing and reading code, and we might be able to help you. But answering your questions without being able to see your current code isn't possible.
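For comparison when debugging, the lookup step can be sketched as a pair of standalone helpers. This is only a sketch under assumptions: `load_index` and `get_record` are illustrative names rather than anything posted in this thread, the index is assumed to have the layout written by indexFile.pl (one 4-byte `'N'` offset per line plus a final end-of-file entry), and passing the index by reference is a choice made here to avoid copying the string on each call.

```perl
use strict;
use warnings;

# Illustrative helper: read the whole index file into one string.
sub load_index {
    my( $index_path ) = @_;
    open my $fh, '<:raw', $index_path or die "Can't read $index_path: $!";
    local $/;                                   # slurp mode
    my $idx = <$fh>;
    close $fh;
    return $idx;
}

# Illustrative helper: return record $n (1-based) from the open data
# handle $fh, or nothing if $n lies outside the indexed range.
sub get_record {
    my( $fh, $idx_ref, $n ) = @_;
    my $count = length( $$idx_ref ) / 4 - 1;    # final entry is the EOF offset
    return if $n < 1 || $n > $count;

    # Entry $n lives at byte ($n - 1) * 4 of the index.
    my $offset = unpack 'N', substr( $$idx_ref, ( $n - 1 ) * 4, 4 );
    seek( $fh, $offset, 0 ) or die "seek failed: $!";
    return scalar <$fh>;
}
```

The range check is an addition; getRecordN above assumes the caller only asks for records that exist.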


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re^6: Index a file with pack for fast access
by Ineffectual (Scribe) on Dec 21, 2011 at 18:56 UTC
    I was thinking it would help to recode it using something else because it seems like what's happening is that the entire line isn't fitting in 4 bytes. Maybe it's the tabs?

    I've uploaded the three files I'm using to gist
      I've uploaded the three files I'm using to gist

      Why there and not here?

      But, this is your error:

      open(IN, $oneper) or die "Can't open file $oneper for reading: $!\n";
      open(INDEX, ">:raw","$file.idx") or die "Can't open $file.idx for read/write: $!\n";
      syswrite INDEX, pack('N',0),4;
      while (<IN>) {
          syswrite INDEX, pack('N', tell INDEX), 4;
      ##.................................^^^^^
      }
      close INDEX;

      You are indexing your index file instead of your datafile.
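      A minimal sketch of the corrected loop, assuming the data file is the one opened on `$oneper`: `tell` is asked about the input handle instead of `INDEX`. The wrapper sub `index_data_file` and the lexical handles are illustrative; in the original snippet only the argument to `tell` needs to change (`tell IN`).

```perl
use strict;
use warnings;

# Illustrative wrapper around the loop above; $oneper and $file keep the
# names used in the quoted snippet.
sub index_data_file {
    my( $oneper, $file ) = @_;

    open( my $in, '<', $oneper )
        or die "Can't open file $oneper for reading: $!\n";
    open( my $index, '>:raw', "$file.idx" )
        or die "Can't open $file.idx for writing: $!\n";

    syswrite $index, pack( 'N', 0 ), 4;             # first line starts at 0
    while ( <$in> ) {
        # Position in the DATA file after the line just read, which is
        # where the next line starts.
        syswrite $index, pack( 'N', tell( $in ) ), 4;
    }

    close $index;
    close $in;
    return;
}
```

      Tabs in the data are irrelevant to this: they only change the offsets that get recorded, not the 4-byte width of each entry.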



        Thanks so much for looking at that. :) It works great now. I um... can never figure out how to link stuff on here, so I figured gist was as good.
