in reply to Re^7: Adding cols to 3d arrays - syntax
in thread Adding cols to 3d arrays - syntax
While inversion is common in flash interfaces, you have also seen large blocks of 0xFF, so I doubt that is going on, unless some pages are inverted and others not. My guess would be that the drive writes all bits clear before erasing as part of wear-leveling for some reason?
Re^9: Adding cols to 3d arrays - syntax
by peterrowse (Acolyte) on Sep 22, 2019 at 09:54 UTC
Yes, I guess that's the only likely reason. Perhaps checking for bad blocks: the datasheet says blocks will start showing errors as time goes on. According to the datasheet the chips only recognise errors in cells that are written to 0 (understandably), so writing a page of 0s checks cell integrity before each block is reused (this will happen transparently to the controller chip).

The datasheet states that 'the number of consecutive partial page programming operation within the same page without an intervening erase operation must not exceed 1 time for the page', which I take to mean the page can only be programmed once, but it does not say whether this is enforced by the chips themselves. If you didn't care about the integrity of the data you could write 0s to the whole block over the previous data and save some cell transitions (meaning lifespan, I would guess).

The smoother grey sections of each column before the dark sections are curious and I need to investigate them further, but quick checks just now show data that looks fairly normal (not addressing, quite random). Must be something, though.

Re the log structure of a log-structured disk: initially I was thinking this would be far too inefficient a way to use the disk, because blocks that didn't need to be copy-written (because they had not changed) would require as much moving as the rest of the disk. But is it the case that such a compromise is indeed made with this method? It seems to prioritise wear levelling over total lifespan. A more complicated system could surely do a better job in terms of total write cycles per block needed (I imagine, without having thought about it too deeply). Would a log structure really be this simple, a pure ring buffer (per each of the 32 chips)? It's certainly about as simple a way as there is to achieve 100% perfect wear levelling, but the cost seems very high.

Perhaps I can accurately determine the exact point where the 'new data' starts for each column (they are not even), write a map file to reflect this, and see if the file system is in better shape, as a simple exercise.
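Not the drive's actual format, just a minimal sketch of that exercise in Perl, assuming each of the 32 columns has been dumped to its own raw image (the column_NN.img and column_map.txt names are made up) and that the 'new data' boundary can be approximated as the first fully erased (all-0xFF) 16 KB block; swap in whatever test actually marks the transition seen in the plots.

```perl
#!/usr/bin/perl
# Sketch: for each per-chip column dump, record the offset of the first
# 16 KB block that is entirely 0xFF, as a guess at where the erased
# "new data" region starts.
use strict;
use warnings;

my $BLOCK  = 16 * 1024;            # page size mentioned in the thread
my $erased = "\xFF" x $BLOCK;      # an all-erased block for comparison

open my $map, '>', 'column_map.txt' or die "column_map.txt: $!";

for my $col (0 .. 31) {                            # 32 chips/columns assumed
    my $file = sprintf 'column_%02d.img', $col;    # hypothetical file names
    open my $fh, '<:raw', $file or die "$file: $!";
    my ($offset, $pos) = (-1, 0);                  # -1: no erased block found
    while (read($fh, my $buf, $BLOCK) == $BLOCK) {
        if ($buf eq $erased) { $offset = $pos; last }
        $pos += $BLOCK;
    }
    close $fh;
    printf $map "%02d %d\n", $col, $offset;
}
close $map;
```

The resulting column_map.txt is just column number and byte offset per line, which could then be fed back into whatever rebuilds the image.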
by jcb (Parson) on Sep 22, 2019 at 21:25 UTC
No, if the datasheet says the block must be erased after one page write, issuing another page write before an erase is a good way to destroy some flash cells. A double write does not "save some cell transitions" but rather gives a good chance that that cell will not erase properly. This rules out any kind of incremental writes on this device, unless the firmware is really badly written. So "bank 32" is probably not (simple) validity flags.

Those "smoother grey sections" might be the validity/log-state data. Presumably the controller maintains tables in RAM and flushes them to the NAND array at shutdown, possibly with some kind of checkpointing scheme ("bank 32"? the "extra LPNs"?) embedded in the map pages with normal writes. That this is not exactly robust against power failure does not rule out its use in this drive: we already know that the drive is not robust against power failure!

Recopying live blocks as the "rewrite zone" approaches is what the early log-structured filesystems did, if I understand correctly. Trading a theoretical total life span for better wear leveling is not as absurd as it may sound: the SSD is dead as soon as it loses the last "extra" block anyway, no matter how many write cycles may remain on other blocks that are storing live data, since it can no longer provide its stated capacity and has no way to tell the host that the total space is dwindling, nor can most PC-ish (including Macs) filesystems handle storage devices that slowly dwindle away as SSDs do.
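To make the recopy cost concrete, here is a toy model only (page granularity, one ring, nothing to do with this drive's firmware) of a pure ring-buffer log: every host write goes to the head, and any live page sitting where the head is about to write gets copied forward first. The slot count, the %map hash and the workload at the bottom are invented for illustration.

```perl
#!/usr/bin/perl
# Toy ring-buffer log: live pages in the path of the write pointer are
# copied forward before their slot is reused, mirroring the extra moves
# a pure log structure would incur.
use strict;
use warnings;

my $NSLOTS = 8;                        # tiny log; real drives over-provision
my @slot   = (undef) x $NSLOTS;        # physical slot -> LPN written there
my %map;                               # LPN -> slot holding its live copy
my $head   = 0;                        # next slot the log will write
my $copies = 0;                        # recopy overhead counter

sub slot_is_live {
    my ($pos) = @_;
    my $lpn = $slot[$pos];
    return defined $lpn && defined $map{$lpn} && $map{$lpn} == $pos;
}

sub relocate_if_live {                 # push a live page out of the way
    my ($pos) = @_;
    return unless slot_is_live($pos);
    my $next = ($pos + 1) % $NSLOTS;
    relocate_if_live($next);           # may cascade until a stale slot is hit
    $slot[$next]        = $slot[$pos];
    $map{ $slot[$pos] } = $next;
    $slot[$pos]         = undef;
    $copies++;
}

sub log_write {                        # host writes (or rewrites) an LPN
    my ($lpn) = @_;
    relocate_if_live($head);
    $slot[$head] = $lpn;
    $map{$lpn}   = $head;
    $head = ($head + 1) % $NSLOTS;
}

log_write($_) for 0 .. 5;              # six live pages in an eight-slot log
for (1 .. 10) { log_write(5); log_write(4) }   # keep rewriting a small hot set
print "extra copies caused by recopying live pages: $copies\n";
```

Even though only two LPNs are being rewritten, the counter keeps climbing because the cold pages sit in the write pointer's path, which is exactly the cost being weighed above; real firmware reduces it with over-provisioning and by cleaning whole erase blocks at a time.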
by peterrowse (Acolyte) on Sep 26, 2019 at 13:41 UTC
Well, I've been busy hacking away at it for the last few days, but no progress. I tried using Storable, which worked well for arrays, but when I tried to put the larger hashes into it (or reload them, I can't remember which) I kept getting out-of-memory errors and extreme slowdowns (needed REISUB once).

A couple of weeks ago I found I had originally read the pages out in a slightly wrong order; changing that led to the larger text pages I was able to read. But I realised I had scanned the second LBA area using the old addressing scheme (since I was looking for physical addresses rather than LPAs, this would have broken it). Running the scan again once more froze my machine repeatedly, due to the extremely large hashes I suppose. Whether I coded it wrong or not I don't know, but I gave up using perl for this and wrote it in C with mmap, which took a while of course, but it executes quite fast so it was worth doing.

However, running the scan again yielded nothing. I am looking here at whether the second LBA-like array in each LBA block corresponds to any of the other rows sharing the same LBAs. So I find two references to a particular LBA in two different parts of the image. I then look in the second LBA-like area of the LBA block for the physical address at which I found the other reference to this particular LBA. And I find too few matches for them to be anything but chance. This might be because the way I assign a physical address to each block that I read is wrong (i.e. the drive thinks of physical addresses differently to how I see it), or because there simply is no match. I just can't see what this second field is for, though.

I've done some more checking and have a decent description of the LBA block contents (each is for 127 data blocks of 16k). Word refers to a 32 bit word. Each 16k block which is last in an erase-size block of 128 blocks consists of:

In the OpenSSD source they state that, since the controller can't access the chip spare area, they store LPNs in the last block in each erase block, but I recently noticed they say this is for GC, since the GC can compare the LPN with its in-memory value and, if it does not match, it knows the block can be erased. This would be a fast and convenient method, so perhaps this is why the LPN is stored where it is. In the OpenSSD source, however, the structure they store is a simple single array; the second LBA area is not there. So I am still puzzled as to what it might be for.

So I wonder whether this LPN data (the first LPN field) is purely for GC, or whether it might serve a second purpose for crash recovery. In the first case it seems there will be no sequence number or way of determining sequence, since none is needed; in the second case a sequence is needed. It would make no sense whatsoever not to record this number in the erase block, since you are writing it anyway and there are many kB free. But the only place I can find that is a possible match for a sequence number is word 133, and it does not look like one, unless, as you mention, it is not a simple sequence but some kind of derivative of one.

Still, it seems there should be a map block somewhere on the disk for loading at boot time. It's only 64MB or so needed and there are bags of spare unused space. I'm looking around for that; I did find a couple of interesting areas with LBA or PBA range numbers, but they are peppered with the odd number an order of magnitude larger. I am wondering whether these could be sequence numbers or similar in blocks of subsets of map data, i.e. if the drive is writing log-style it is only concentrating on one region of its address space at once, so it only writes updates reflecting that (diffs in essence, addressed to one part of the map). If map files, or fragments thereof, are written out, I would imagine there are many of them, scattered around the disk. The blocks I am looking at do seem scattered around the 'smooth grey area' I mentioned. As you say, perhaps this is an area set aside for such use. And it could also include spare blocks, so that as the NAND blocks fail it replaces them, and once it is used up the drive fails fully. There are lots of fully 0x00 erase-size blocks in this area; why 0x00 rather than 0xffffffff, I wonder (better to store cells in the 0x00 state?).

Anyway, that's where I am up to. Not much to say, but I thought I would update. Any ideas appreciated.
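One way to chase the duplicate-LBA question without the giant in-memory hashes is to tie the LPN index to an on-disk DB_File hash. This is only a sketch under assumptions pulled from the thread (erase block = 128 x 16 KB pages, with the last page carrying 127 LPNs as 32-bit words); the little-endian 'V' unpack, the 0xFFFFFFFF skip, the single flat image file, the 'erase-block:page' notion of a physical address and the lpn_index.db name are all guesses to be adjusted.

```perl
#!/usr/bin/perl
# Sketch: build an LPN -> physical-page index from the last page of each
# erase block, keeping the index in an on-disk hash (DB_File) rather than RAM.
use strict;
use warnings;
use Fcntl;
use DB_File;

my $PAGE   = 16 * 1024;
my $EBLOCK = 128 * $PAGE;              # assumed erase-block size

my $image = shift @ARGV or die "usage: $0 image-file\n";
open my $fh, '<:raw', $image or die "$image: $!";
my $size = -s $fh;

tie my %by_lpn, 'DB_File', 'lpn_index.db', O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "lpn_index.db: $!";

for (my $eb = 0; ($eb + 1) * $EBLOCK <= $size; $eb++) {
    # Read the last 16 KB page of this erase block.
    seek $fh, $eb * $EBLOCK + 127 * $PAGE, 0 or die "seek: $!";
    read $fh, my $page, $PAGE or last;
    my @lpn = unpack 'V127', $page;    # first 127 words, assumed little-endian
    for my $i (0 .. $#lpn) {
        next if $lpn[$i] == 0xFFFFFFFF;          # skip erased entries
        my $phys = sprintf '%d:%d', $eb, $i;     # one notion of a physical address
        $by_lpn{ $lpn[$i] } = defined $by_lpn{ $lpn[$i] }
            ? "$by_lpn{ $lpn[$i] } $phys"
            : $phys;
    }
}
close $fh;

# LPNs recorded in more than one place are candidate old/new pairs.
while (my ($lpn, $where) = each %by_lpn) {
    print "$lpn $where\n" if $where =~ / /;
}
untie %by_lpn;
```

Anything printed at the end is an LPN with more than one recorded physical copy, i.e. a candidate pair of old and new versions to test the second LBA-like field against; per-chip images could be indexed separately if even the tied hash gets unwieldy.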