in reply to Re: Adding cols to 3d arrays - syntax
in thread Adding cols to 3d arrays - syntax

Storable is an XS module that (quickly) serializes and unserializes Perl data structures to and from its own binary format. The idea is to build the @PPN and @LPN indexes once and then save those as (presumably much smaller) files alongside the image. Actual usage is to read the index arrays back in full, then open the image file and seek/read/unpack only the data that you need for each analysis from the full image.
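A minimal sketch of that workflow (the array names @PPN and @LPN are from this thread; the index file names, page size, and toy index contents are assumptions for illustration):

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Toy stand-ins for the real indexes; in practice @PPN and @LPN come
# from one full scan of the raw NAND image.
my @PPN = (0 .. 9);
my @LPN = map { $_ * 3 } 0 .. 9;

# Build once, then save the indexes alongside the image:
store \@PPN, 'ppn.idx';
store \@LPN, 'lpn.idx';

# On later runs, load instead of rescanning the whole image:
my $ppn = retrieve 'ppn.idx';    # array reference
my $lpn = retrieve 'lpn.idx';

# Then seek/read/unpack only the pages needed from the image itself:
sub read_page {
    my ($fh, $page, $page_size) = @_;   # $page_size is device-specific
    seek $fh, $page * $page_size, 0 or die "seek: $!";
    read $fh, my ($buf), $page_size or die "read: $!";
    return $buf;
}
```

A real run replaces the toy arrays with one full scan of the image; every run after that starts from retrieve and touches only the pages it needs.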

For efficiency, the controller is likely to batch writes until it has a full erase block and only then "flush the buffers" out to the NAND array, and there may even be structures larger than an erase block that are significant to the FTL. The odd "bank 32" data hints at such a structure. How long is that apparent field?

If the FTL uses a log structure, the "LBA 128" field might be the write sequence number you have been looking for. The nonsensical "LBA" list may simply be garbage, "unused" space that gets written with whatever happened to be in the controller's memory when writing the block. In other words, it may be a list of LPNs, but not LPNs that are relevant to the current state of the NAND array. Or, in C terms, the contents of an uninitialized buffer.

Also, a small note about this site: there is a "reply" link for each post, and your post appears as a child of that post if you use it, instead of appearing at the top-level in the thread. PerlMonks also notifies the author of the post you replied to when a reply is made in this way. Please use it. I will request to have this subthread reparented, but please try to maintain the threaded nature of the discussion. The "reply" link for this post should appear to the right of this paragraph. --->

Re^3: Adding cols to 3d arrays - syntax
by peterrowse (Acolyte) on Sep 21, 2019 at 00:12 UTC

    Sorry about replying with the wrong link - I was originally hitting 'reply', but then my posts seemed to be hidden, so I reverted to the other. I see the reasons to use 'reply' now.

    Anyway, re Storable: I might as well try it tomorrow; it's quick to try, and if it speeds things up it will be helpful. Since you mentioned log-structured file systems I've been reading up and trying to get my head around how they would appear on disk - there are references to this in the OpenSSD source, and it does seem likely (perhaps even inevitable) that this disk uses one. My understanding of them is still thin, although a bit better after reading on the topic. Certainly what I see so far seems to fit with a log structure (to me at least); it's just missing a sequence number, but I think it's likely I simply haven't found that yet.

    What info is available on SSD log-structure design seems to point to something more complex than the 'classroom' version - to be expected, I suppose, with wear-levelling data, write counts, etc. also needing to be stored somewhere. It also seems to me that an SSD would not want to scan too much on start-up, for speed reasons, preferring to cache a single map file of around 60MB somewhere. The LBA area that I am reading would then be a backup in case the cached map is damaged (although the stale page data should also be there in that case). That map file would need to move around the disk a lot, I imagine; otherwise the physical block it is assigned to would wear quickly. I have binary-grepped the disk for some segments of the map file I created but found no match; perhaps I need a more sophisticated search than a simple hex grep. It would certainly be convenient to find such a file.
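    For a search like that, a chunked scan avoids slurping a 256GB image while still catching matches that straddle chunk boundaries. A generic sketch (the file name, needle, and chunk size here are placeholders, not from the thread):

```perl
use strict;
use warnings;

# Binary grep over a large file: read in 1MiB chunks, carrying the last
# length($needle)-1 bytes forward so boundary-spanning matches are found.
sub bin_grep {
    my ($path, $needle) = @_;
    my $keep = length($needle) - 1;
    open my $fh, '<:raw', $path or die "open: $!";
    my ($carry, $base, @hits) = ('', 0);
    while (read $fh, my ($buf), 1 << 20) {
        my $hay = $carry . $buf;
        my $pos = 0;
        while (($pos = index $hay, $needle, $pos) >= 0) {
            push @hits, $base + $pos++;    # absolute file offset of match
        }
        $base += length($hay) - $keep;
        $carry = $keep
            ? (length($hay) >= $keep ? substr($hay, -$keep) : $hay)
            : '';
    }
    return @hits;
}
```

For a more sophisticated search than an exact grep, the same loop can unpack each chunk into 32-bit values and match on the 24-bit payloads instead of raw bytes.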

    The field in bank 32 is short - IIRC 128 bytes of 'active' data, i.e. data which varies across rows. The next couple of hundred bytes contain a pattern, but it's the same across all instances of bank 32, so it can't be significant. Today I spent a little time looking at the data in the field, but not much; I'll hopefully do a little more tomorrow and can say a bit more. What I did see today, looking at it in binary, is that most rows have just a few bits set to 1, or pairs of 1s, with the vast majority being 0s. The numbers range from 0 to very high (IIRC 500M or more), so they're not addresses. And there's far too little variation for it to store much data, except perhaps as a whole (the entire disk's worth of bank 32 data). I'll post a bit of it, and if I can find somewhere to host it I'll put a couple of megabytes up for anyone who has the time to look at it.

    Re the LBA 128 field: although I have not analysed it properly, I remember that simple analysis showed a lot of repetition. That put me off the idea of it being a sequence number, my initial thought. But reading up on the log structure, there simply must be a sequence number somewhere, and it makes no sense not to put it in the block being written (perhaps elsewhere too), so I think I need to hunt more for that. Perhaps repetition is permissible if, for instance, multiple blocks are written in a 'transaction'.

    I'm in the UK, so it's 1am now and I have to knock off, but thanks for the assistance and ideas - I'll hopefully come back with some more info tomorrow.

    Thanks, Pete

      All Storable will do is provide a means for you to recover the @LPN and @PPN arrays quickly, after building them once. If they are as small as I expect, that will give you a large improvement in start-up time, and may allow you to examine less of the main image, thereby keeping more of your working data in the OS caches.

      For a simple log structure, the scan on start-up could be a binary search to find the greatest distance between sequence numbers, but that would require that existing data be "moved out of the way" so that writes always proceed sequentially. A cached map could also live in some other storage with better endurance characteristics, separate from the main NAND array. Or the drive could rely on the host system's POST latency to hide the delay of the start-up scan. Or it could store, in only a few fixed locations, a few pointers sufficient to locate the first data the host will want, keep the "rest" of the log index with more flexibility, and continue the scan while servicing the first few host requests.

      Only a few bits set, with the vast majority clear, strongly suggests some kind of flags field. 128 bytes is 1024 bits, or (if I understand the structures you have found correctly) one bit for each 64KiB region in a group containing a "bank 32" field. If I recall correctly, NAND flash erases to all bits set, and most flash permits some number of write cycles between erases, each only clearing additional bits. Hypothesis: if this is a validity field, only one of each set of duplicate LPNs will have its corresponding bit set in "bank 32".
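      If that hypothesis holds, it is cheap to test. This sketch assumes the 128-byte field is a raw 1024-bit validity bitmap, one bit per page/region; the bit numbering follows Perl's vec() and may not match the controller's actual bit order:

```perl
use strict;
use warnings;

# Under the validity-bitmap hypothesis: NAND erases to all-ones and
# program operations can only clear bits, so a still-set bit would
# plausibly mean "this page's data is current".

sub popcount { unpack '%32b*', $_[0] }   # count of set bits in the field

sub bit_set  { vec $_[0], $_[1], 1 }     # test one bit of the field
```

Counting set bits per "bank 32" instance across the image, and cross-checking set bits against the duplicate-LPN sets, would confirm or kill the hypothesis quickly.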

      Finding the sequence number depends on guessing the drive's write sequence. For a simple log-structured filesystem that moves data to allow writes to always be sequential, this should be easy, since the sequence numbers will monotonically increase with one break somewhere on the media. Unless, that is, the numbers are not a write sequence but something derived from a power-cycle count: the drive could keep its actual write count in the controller's RAM and reconstruct it from the NAND array on start-up. That would explain the repetition, if it is some kind of session ID.
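      Under that simple rotated-log assumption, locating the break takes no more than a linear pass (a binary search would also work, since there is only one rotation point). A sketch, assuming @seq holds one candidate sequence number per physical block in media order:

```perl
use strict;
use warnings;

# Find the log head: the one place where a monotonically increasing,
# wrapped sequence stops increasing. Returns the index of the oldest
# block, i.e. where reading the log should start.
sub find_log_head {
    my @seq = @_;
    for my $i (1 .. $#seq) {
        return $i if $seq[$i] < $seq[$i - 1];    # the single break
    }
    return 0;    # no break found: the log never wrapped
}
```

Any candidate field that yields exactly one break across the whole media is a strong contender for the write sequence number; a field with many breaks is not.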

      How full was the filesystem and did the host use the TRIM command to release blocks? If the NAND array was mostly (or entirely) allocated, I would expect very little variation in an "in-use" or "valid data" field across the disk.

        Hmm, not sure if I did the right thing replying here - I had to click to see your post, and the page I am seeing offers me only a 'comment' link, not a 'reply' one. If someone is going to clean up this thread that's great, but let me know if my settings are wrong or something, since your reply was hidden on the original page.

        Spoiler alert: I successfully mounted the drive today - although it's quite 'damaged', mount accepted it.

        Anyway, re your post: I haven't had time yet to explore the suggestions about flags - I was working this morning (while looking after one of my little ones, so a bit disjointedly) on examining the field we are talking about in a more basic fashion. The repetition is considerable. There are another 4 bytes after the LBA 128 field we discussed, and I wondered if they might be part of the sequence number, but it seems not. I'll have to check what you were saying about the bit fields marking only a single LBA as valid - it's an interesting thought.

        Re the point about writing repeatedly between erases: I think I remember seeing in the datasheet that this is forbidden - I'll have to check to be sure, but I remember thinking it's very restrictive. If it were permitted it could allow a validity field with 0 meaning invalid, which would be very good, but I don't think so; I'll check later.

        I don't quite understand your point about the write count being reconstructed. It sounds interesting, though, and I will read it through this afternoon trying to understand it, but I might need to come back to you on it later.

        As for the filesystem, it was quite full IIRC - about 90%, probably (good, because there are fewer unused and hence stale pages). I don't think TRIM was used on this drive; it was running OSX (on PC hardware - a hackintosh, a stupid experiment I made).

        Now, as for the mounting I mentioned earlier. Sleeping on the log structure details and a bit more reading made me wonder about something I saw in the drive some time ago. I made a bitmap image of sector average values to visualise the areas that might contain addresses (since they are 24-bit values occupying 32-bit space). I saw an interesting darker horizontal band, about 10% of the drive's capacity in size, occupying the space between about 80% and 90% of the drive - i.e. towards the bottom. Looking into it did not show anything very significant, and I put it aside.

        But a simple log-structured file system can, I think, simply treat the whole drive as a log - at least that was my interpretation - and just keep writing to the head of the log, which would of course move along and wrap around to the bottom of the drive as it was written. I wondered if that dark band in some way marked this, since I don't see why else it's there; the OS would not be aware of physical locations, so could not be responsible. Anyway, I decided to try writing out the map file starting from an offset of around 85% and wrapping via offset 0% back to 85%. Loading that map file into the kernel module allowed me to mount the image once I had provided the offset of the partition's starting block (found with HFS rescue).
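        That wraparound ordering is easy to express; the 85% head position is only the guess suggested by the dark band, to be tuned by trial:

```perl
use strict;
use warnings;

# Rotate the physical block order so the log is read starting at the
# assumed head and wrapping past the end of the media back to it.
# $head_frac is a fraction of the drive (0.85 was the guess used here).
sub rotated_order {
    my ($n_blocks, $head_frac) = @_;
    my $head = int($n_blocks * $head_frac);
    return ($head .. $n_blocks - 1, 0 .. $head - 1);
}
```

Emitting the map file in this order, instead of straight media order, is all the "wrap via 0%" step amounts to.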

        Now, although this creates almost as many questions as it answers, it's an interesting turn. Why is the area darker, for instance? I originally thought it should be light (i.e. all 0xFF), but of course if it was not yet erased - because the drive did not know it was unused - it would not be. Perhaps OSX writes 0x00 to the whole drive when it formats, although this install was several months old, so I imagine I had turned over the whole 256 gig by then.

        If it's possible to upload images here I can upload the bitmap file. The dark band has fuzzy, ill-defined edges, but it is certainly there.

        Many of the folders in the root directory are now accessible, although I haven't yet checked how many files are readable. Some folders yield an 'Input/output error', and as you drill down through working folders you hit further inaccessible ones. Still, this is far further than I have got before, so it's significant. It might, I guess, just be chance that one or two significant blocks holding top-level directory information have now been selected correctly, rather than a large proportion of blocks - I don't know; my understanding of the FS structure is too hazy.

        I think next I should investigate your idea about the bit fields and the duplicate LBA correlation, but if anything I have mentioned gives you any further ideas, they would be much appreciated.

        Thanks, Pete