in reply to Re^2: Converting tesseract box data into 2d grid
in thread Converting tesseract box data into 2d grid
I'm guessing the magic numbers can be found by averaging the dimensions from all the bounding boxes.
Actually no. I tried that, but the average x-dimension comes out at 8.5882352941176470588235294117647, which was no good at all.
I'm afraid I cheated a little. The Y dimension was obvious as all the Ys come out as 12.
For the X, I inspected the data and guessed at 10, but that put too many extra spaces in:
C:\test>1114833
{
  1 => { 1 => "T", 4 => "S", 7 => "E", 9 => "C", 11 => "I", 14 => "E", 16 => "N", 19 => "E", 21 => "O", 24 => "T", 26 => "Y", 29 => "U", 32 => "E" },
  3 => { 1 => " ", 4 => "I", 6 => "I", 7 => " ", 9 => "L", 11 => " ", 12 => "V", 14 => "E", 16 => " ", 17 => "V", 19 => "N", 21 => "U", 24 => "Y", 26 => "Z", 29 => "L", 32 => " " },
  5 => { 1 => " ", 4 => "I", 6 => "O", 7 => " ", 9 => "L", 11 => "I", 14 => "E", 16 => "D", 19 => "P", 21 => " ", 22 => "B", 24 => "H", 26 => "O", 29 => "S", 32 => " " },
  7 => { 1 => " ", 4 => "B", 6 => "I", 7 => " ", 9 => "C", 11 => "C", 14 => "Y", 16 => "D", 19 => "I", 21 => "T", 24 => "Z", 26 => "I", 29 => " ", 32 => " " },
  9 => { 1 => " ", 4 => " ", 6 => "N", 7 => " ", 9 => " ", 11 => " ", 12 => "K", 14 => "A", 16 => " ", 19 => " ", 21 => " ", 24 => "L", 26 => " ", 29 => " ", 32 => " " },
}
So then I tried 11 but it still put one extra in the middle row:
C:\test>1114833
{
  1 => { 1 => "T", 4 => "S", 6 => "E", 8 => "C", 10 => "I", 13 => "E", 15 => "N", 17 => "E", 19 => "O", 22 => "T", 24 => "Y", 26 => "U", 29 => "E" },
  3 => { 1 => " ", 4 => "I", 6 => "I", 8 => "L", 10 => "V", 13 => "E", 15 => "V", 17 => "N", 19 => "U", 22 => "Y", 24 => "Z", 26 => "L", 29 => " " },
  5 => { 1 => " ", 4 => "I", 6 => "O", 8 => "L", 10 => "I", 13 => "E", 15 => "D", 17 => "P", 19 => " ", 20 => "B", 22 => "H", 24 => "O", 26 => "S", 29 => " " },
  7 => { 1 => " ", 4 => "B", 6 => "I", 8 => "C", 10 => "C", 13 => "Y", 15 => "D", 17 => "I", 19 => "T", 22 => "Z", 24 => "I", 26 => " ", 29 => " " },
  9 => { 1 => " ", 4 => " ", 6 => "N", 8 => " ", 10 => "K", 13 => "A", 15 => " ", 17 => " ", 19 => " ", 22 => "L", 24 => " ", 26 => " ", 29 => " " },
}
So then I tried 10.5 and voilà!
I also tried dividing the overall width of the longest line by the number of chars: (327 - 18) / 13 = 23.769...
And just now I tried taking the differences of all the x1s in the longest row:
T  18 15  28 27 0
S  45 15  54 27 0 -> 27
E  70 15  77 27 0 -> 25
C  95 15 103 27 0 -> 25
I 119 15 128 27 0 -> 24
E 146 15 153 27 0 -> 27
N 168 15 179 27 0 -> 22
E 196 15 203 27 0 -> 28
O 218 15 229 27 0 -> 22
T 244 15 254 27 0 -> 26
Y 269 15 278 27 0 -> 25
U 295 15 305 27 0 -> 26
E 320 15 327 27 0 -> 25
Then averaging those: (27 + 25 + 25 + 24 + 27 + 22 + 28 + 22 + 26 + 25 + 26 + 25) / 12 = 302 / 12 = 25.166666666666666666666666666667
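For anyone who wants to repeat the calculation, here is a minimal sketch of the averaging step. It is written in Python purely for illustration and is not the script used above; the function name `mean_pitch` is made up, and the x1 values are copied from the longest row in the listing.

```python
# Left edges (x1) of the 13 boxes in the longest row, copied from the listing.
x1s = [18, 45, 70, 95, 119, 146, 168, 196, 218, 244, 269, 295, 320]

def mean_pitch(xs):
    """Average distance between consecutive left edges (hypothetical helper)."""
    xs = sorted(xs)
    diffs = [b - a for a, b in zip(xs, xs[1:])]  # 12 gaps for 13 boxes
    return sum(diffs) / len(diffs)

pitch = mean_pitch(x1s)
print(pitch)
```

Note that the differences telescope, so the result is simply (320 - 18) / 12: last x1 minus first x1, divided by the number of gaps rather than the number of characters.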
An' wadda ya know. It works perfectly!
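As a rough illustration of why that pitch lines things up, the sketch below snaps each x1 in the longest row to a column index. This is Python, not the original script, and the rounding rule (offset from the leftmost x1, then round to nearest) is an assumption on my part; `column` and `PITCH` are illustrative names only.

```python
# (char, x1) pairs for the longest row, copied from the listing.
boxes = [("T", 18), ("S", 45), ("E", 70), ("C", 95), ("I", 119),
         ("E", 146), ("N", 168), ("E", 196), ("O", 218), ("T", 244),
         ("Y", 269), ("U", 295), ("E", 320)]

PITCH = 302 / 12  # mean gap between consecutive x1 values

def column(x1, origin, pitch):
    """Map a left edge to a grid column (assumed rule: round to nearest)."""
    return round((x1 - origin) / pitch)

origin = min(x for _, x in boxes)
row = {column(x, origin, PITCH): ch for ch, x in boxes}

# Fill any column that received no box with a space.
line = "".join(row.get(c, " ") for c in range(max(row) + 1))
print(line)
```

With this pitch every character in the row lands in its own consecutive column, with no spurious gaps.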