Re: image clean up and alignment
by BrowserUk (Patriarch) on Apr 16, 2003 at 04:07 UTC
|
There are several potential problems both with what you are trying to do, and with trying to do it with perl.
The first is that you mentioned that the slides are in .JPG form (although the linked example was a .GIF). Unless the .JPG was saved without compression, you have already either lost information or muddied the information that is there, as .JPG compression is 'lossy': some of the information present in the original data has been discarded in order to reduce the range of values present and improve the compression ratio. This shows up on close inspection of the sample slide in several ways, the most fundamental of which is that the borders around the spots are not a single consistent colour, but instead contain blotches of different shades. This presents a fundamental problem for edge detection. If you then attempt to apply image rotation to correct the skew, you are going to further degrade the integrity of the image.
Assume that the spots are 16 pixels square, the inter-spot borders are 5 pixels, and the outer frame and inter-block borders are 10 pixels. If the slides are two blocks deep, with 20x20 spots/block as in the linked example, and the run-out from top to bottom is 10 pixels, then the angle of rotation required to correct this would be less than 1 degree! If one edge is longer than two blocks, then the angle gets progressively smaller, and the problem that much harder. Apart from the fact that I do not know of any low-end image manipulation libraries or packages that will perform rotations of a fraction of a degree, the process of rotation would further distort the edges between the borders and the spots. If the slides are available in a non-lossy or uncompressed form, then they would make a much better starting point, but even then, using the generic image manipulation of something like ImageMagick is likely to result in considerable loss of information.
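To put a number on that, here is a rough sketch of the arithmetic. The layout is an assumption on my part: 19 inter-spot borders inside each block, one inter-block border between the two blocks, and the outer frame at top and bottom. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Math::Trig qw(rad2deg);

# Assumed geometry, taken from the figures quoted above (all in pixels).
my ($spot, $gap, $frame)      = (16, 5, 10);
my ($spots_per_block, $blocks) = (20, 2);
my $run_out = 10;   # sideways drift between top and bottom of the slide

# One block: 20 spots plus the 19 borders between them.
my $block_height = $spots_per_block * $spot + ($spots_per_block - 1) * $gap;

# Whole slide: frame top and bottom, the blocks, and the border between blocks.
my $height = 2 * $frame + $blocks * $block_height + ($blocks - 1) * $frame;

my $angle = rad2deg(atan2($run_out, $height));
printf "height %d px, skew %.2f degrees\n", $height, $angle;
```

With these assumed dimensions the slide is 860 pixels tall and the skew comes out at roughly two thirds of a degree, which is why a sub-degree rotation would be needed.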
If you could say that the frame was definitively black and that the spots were not black, then scanning first horizontally to determine the difference in the width of the left-hand border, and doing a little math to determine how many pixels to pad or trim that edge of each row, is reasonably trivial. You then apply the same technique processing the image vertically, again trimming or padding one edge to realign things vertically, and you should have corrected the skew. From what I saw of the linked slide, the problem is that there is no definitive color for the borders, as I mentioned earlier, so edge detection then becomes a process of determining a threshold. Anything below this value is black and therefore border; anything above is color and therefore spot. Again, looking at the linked frame, some spots have no color at all and so are indistinguishable from border. In some places there are blotches in the border that are brighter than the center of some of the spots. You might be tempted to try to use contrast enhancement or spot removal to clean up the borders, but until you have detected them, you can only apply the algorithms to the whole slide and thereby affect the color of the spots as well. I assume that this would fundamentally affect the nature of the expressions you are trying to detect and categorise?
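A minimal sketch of the horizontal pass, assuming the image has already been loaded as an array of rows of grey values (0..255) and that a single threshold really does separate border from spot, which, as noted, may not hold for real slides. The function names, the threshold of 50 and the toy data are all made up for illustration.

```perl
use strict;
use warnings;

# Count leading pixels darker than $threshold: the apparent border width.
sub left_border_width {
    my ($row, $threshold) = @_;
    my $w = 0;
    $w++ while $w < @$row && $row->[$w] < $threshold;
    return $w;
}

# Round to the nearest whole pixel (halves away from zero).
sub round_px {
    my ($x) = @_;
    return $x < 0 ? -int(-$x + 0.5) : int($x + 0.5);
}

# Measure the border on two reference rows and interpolate a whole-pixel
# shift for every row of the image.
sub row_shifts {
    my ($image, $threshold, $top, $bottom) = @_;
    my $w_top = left_border_width($image->[$top],    $threshold);
    my $w_bot = left_border_width($image->[$bottom], $threshold);
    return map {
        my $t = ($_ - $top) / ($bottom - $top);
        round_px($t * ($w_bot - $w_top));
    } 0 .. $#$image;
}

# Move a row left ($shift > 0) or right ($shift < 0) by whole pixels,
# padding the vacated edge with $fill so the row length is unchanged.
sub shift_row {
    my ($row, $shift, $fill) = @_;
    my @r = @$row;
    if ($shift > 0) {
        splice @r, 0, $shift;
        push @r, ($fill) x $shift;
    }
    elsif ($shift < 0) {
        splice @r, $shift;               # negative offset: drop from the right
        unshift @r, ($fill) x -$shift;
    }
    return \@r;
}

# Toy image: the left border grows from 2 to 4 pixels down the slide.
my @widths = (2, 3, 3, 4, 4);
my @image  = map { [ (0) x $_, (200) x (8 - $_) ] } @widths;

my @shifts = row_shifts(\@image, 50, 0, $#image);
my @fixed  = map { shift_row($image[$_], $shifts[$_], 0) } 0 .. $#image;
print join(' ', @$_), "\n" for @fixed;
```

The two reference rows need to pass through coloured spots. A row that runs along an inter-spot border, or through spots with no color, would look like border from edge to edge and give a meaningless width. The vertical pass is the same idea applied to columns.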
I guess the upshot of what I am saying is that whilst it would be possible to manually isolate the border using threshold filters, construct a mask of the borders, apply this back to the original image and then extract the spots using general purpose photoimaging techniques, trying to automate the process using those techniques is going to be extremely difficult--requiring many passes and stepwise refinements applied to each image--if not impossible.
To stand any realistic chance of automating this without altering the nature of the data itself would require the use (or construction) of a library of much lower-level and more highly tailored filters than are generally available in photoimaging libraries and packages.
It would also require specific knowledge of the nature of the information that you need to derive from the spots. E.g., it would be somewhat simpler if you only needed to determine the absolute color of the 'brightest' pixel in each spot than if you needed to determine an average (mean or median) of each spot, or the relative density of the colors in the spots.
The other problem with trying to do this using Perl, unless you can find/obtain an existing library with the required facilities that is written in C, FORTRAN or similar and has a Perl-callable interface, is that processing large, 2-dimensional arrays of numeric data is just about the weakest aspect of Perl. The very nature of Perl's dynamic array structures actively works against manipulating data that is essentially static in nature. You can drop into Inline::C or XS, but unless you manipulate the data in packed scalars (or blocks of memory allocated at the C level)--in which case you would probably be better off using C for the entire process--all the pointer chasing that makes Perl's arrays such a joy for most uses works completely against you in this case.
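For what it's worth, this is roughly what "packed scalars" means in pure Perl: the whole image lives in one string, one byte per pixel, and vec() reads or writes individual bytes. The 640x480 size and the helper names are assumptions for the sake of the sketch, and a single 8-bit channel is assumed.

```perl
use strict;
use warnings;

my ($width, $height) = (640, 480);

# One byte per pixel, row-major, all black to begin with.
my $pixels = "\0" x ($width * $height);

# vec() treats the string as a vector of 8-bit elements.
sub get_pixel {
    my ($x, $y) = @_;
    return vec($pixels, $y * $width + $x, 8);
}

sub set_pixel {
    my ($x, $y, $value) = @_;
    vec($pixels, $y * $width + $x, 8) = $value;
}

set_pixel(10, 20, 255);
print get_pixel(10, 20), "\n";

# A whole row comes out with a single substr; it is only expanded into
# separate Perl scalars if and when you unpack it.
my @row = unpack 'C*', substr($pixels, 20 * $width, $width);
```

The same string can be handed to Inline::C or XS as a plain buffer, which is the point: there is no array of arrays to walk.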
If no other monks come along with better options than those I've mentioned and I haven't completely put you off, I'd love to see the answers to the questions I've posed above. I did do some playing around with writing a module to allow direct manipulation of packed image data (in .BMP format) which I would gladly pass along if you think it would be helpful. It doesn't go very far but it might help.
Good luck.
Examine what is said, not who speaks.
1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.
|
|
First off, thank you for the very extensive answer; your help is much appreciated.
Now to clear up a few things. First, the sample image was only provided so that those who have never seen a microarray slide could better understand what I mean by "slide", "spot", etc. The images I am working with are indeed jpegs, but they are much larger and actually quite a bit cleaner (I wish I had a place to upload an actual sample... I'll see if I can find a way). Most importantly, I am not doing any sort of analysis on the spots themselves. The analysis is done on the original TIFF files, using software written by many people much smarter than I over the course of many years :) The TIFFs are much too large to reasonably store for this application, which is why I am using the jpegs; I am not sure how compressed they are, but they are still of very high quality. All I need is to get the spot given the coordinates. It's just another visual clue for the user in the final report, so the spot colour doesn't need to be anywhere near as precise as what is used for the actual analysis. It had not in fact occurred to me that, rather than rotating the actual image, it would be much easier to determine the distances to the border for each row/column and adjust the cropping accordingly; most likely this is what I will end up doing. In fact, just doing this for each of the four corner spots, which have a very high contrast with the background for just this purpose, should be sufficient to figure out the rest of them. Incidentally, speed and memory requirements are not an issue at all here: I would most likely do this once per image to pick out all the spots and store them individually, and we are only talking about a few hundred of these over the course of several years.
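A sketch of how the four corner spots could be used, assuming their centres have already been located and that the spots within a block are evenly spaced (so this would be applied per block rather than across inter-block borders). The names spot_centre and spot_box and the corner coordinates are invented for the example.

```perl
use strict;
use warnings;

# Bilinear interpolation of a spot centre from the four measured corner
# spot centres. $c holds [x, y] pairs under the keys tl, tr, bl, br.
sub spot_centre {
    my ($c, $rows, $cols, $r, $k) = @_;
    my $u = $cols > 1 ? $k / ($cols - 1) : 0;   # 0 at left edge, 1 at right
    my $v = $rows > 1 ? $r / ($rows - 1) : 0;   # 0 at top edge, 1 at bottom
    my @p;
    for my $i (0, 1) {                          # 0 = x, 1 = y
        my $top = $c->{tl}[$i] + $u * ($c->{tr}[$i] - $c->{tl}[$i]);
        my $bot = $c->{bl}[$i] + $u * ($c->{br}[$i] - $c->{bl}[$i]);
        push @p, int($top + $v * ($bot - $top) + 0.5);
    }
    return @p;
}

# Crop box (left, top, width, height) for a square spot of side $size.
sub spot_box {
    my ($cx, $cy, $size) = @_;
    return ($cx - int($size / 2), $cy - int($size / 2), $size, $size);
}

# Hypothetical corner centres of a slightly skewed 20x20 block.
my %corners = (
    tl => [ 18,  18], tr => [417,  22],
    bl => [ 14, 417], br => [413, 421],
);

my @centre = spot_centre(\%corners, 20, 20, 7, 12);   # row 7, column 12
my @box    = spot_box(@centre, 16);
print "centre (@centre), crop box (@box)\n";
```

The crop box can then be passed to whatever does the actual cropping, so the image itself never needs to be rotated.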
Re: image clean up and alignment
by halley (Prior) on Apr 16, 2003 at 02:43 UTC
|
Is there a module to do it? Not that I know of.
Looking for tips on how to do it?
Once you've loaded the image into memory (see PerlMagick or GD or some other image-buffer module), it's time to start doing some math. The PDL may help here, but I haven't used it much yet. This is only general advice; I've done exactly this sort of image processing in a past life, but not in Perl and not in years.
- Don't use resampling methods which will blur your data. If you have to skew, move each row or column by full pixel distances. If you have to stretch, favor stretching bigger, and only by full pixel distances (i.e., if you had 50 black then 50 white pixels, if you need to stretch out three pixels, it should end up 52 black then 51 white, and never introduce or "invent" new shades of gray).
- Decide on a tolerance, where pixels dimmer than X are considered "black" for the purposes of image alignment.
- One approach to vertical alignment would be to find the centroid of the top row of cells, and the left column of cells, then skew the image in each direction to square them up. There are a couple of approaches to finding the centroids, depending on just how far out of whack your samples might be originally.
- If your samples aren't squared up (the bottom row might be significantly longer or shorter than the top row), then you'll have to correct for this as well. Skew the image for the first two axes, then stretch rows to form a constant height, then stretch columns to form a constant width.
- Lastly, once the image is rectangular, it should be easy to scan and trim any excess pixels around the border.
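As an illustration of the first point in the list, here is a small sketch of stretching a row by whole pixels only. It uses plain nearest-neighbour sampling, which is my choice for the example rather than anything prescribed above; stretch_row is a made-up name.

```perl
use strict;
use warnings;

# Stretch (or shrink) a row to $new_len pixels by repeating or dropping
# whole source pixels. Every output value is copied from the input, so no
# new shades are invented.
sub stretch_row {
    my ($row, $new_len) = @_;
    my $old_len = @$row;
    return [ map { $row->[ int($_ * $old_len / $new_len) ] } 0 .. $new_len - 1 ];
}

# 50 black then 50 white pixels, stretched by three pixels.
my @row       = ((0) x 50, (255) x 50);
my $stretched = stretch_row(\@row, 103);

my $black = grep { $_ == 0 }   @$stretched;
my $white = grep { $_ == 255 } @$stretched;
print "$black black, $white white\n";   # prints "52 black, 51 white"
```

Stretching columns works the same way on a list of rows: repeat or drop whole rows instead of whole pixels.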
-- [ e d @ h a l l e y . c c ]
Re: image clean up and alignment
by toma (Vicar) on Apr 16, 2003 at 04:18 UTC
|
You might try PDL::Image2D or
Tk::PhotoRotate. They both claim to
rotate images by arbitrary angles. The PDL
module can also crop regions of your image.
I think you'll find that PDL is a good thing to
learn for your application. Many of its routines
are based on fast C libraries. PDL is intended
for the type of work that you are doing.
It should work perfectly the first time! - toma
Re: image clean up and alignment
by Improv (Pilgrim) on Apr 16, 2003 at 01:44 UTC
|
Hey,
I don't have any advice on the bigger problem, but for
image manipulation, PerlMagick isn't a bad choice. It provides a decent API to crop and
rotate, along with lots of other good stuff. I suspect
there's no particularly elegant solution to the second
part, and that you'll just need to do a lot of custom code.
Perhaps the other monks will prove me wrong :) I hope this
helps!
Re: image clean up and alignment
by jonadab (Parson) on Apr 16, 2003 at 11:45 UTC
|
Haven't done image manipulation in Perl myself, but I was thinking, does the Perl interface to the Gimp give you access to the magic wand tool? If so, then you probably just need to hit anywhere within the spot.
for(unpack("C*",'GGGG?GGGG?O__\?WccW?{GCw?Wcc{?Wcc~?Wcc{?~cc'
.'W?')){$j=$_-63;++$a;for$p(0..7){$h[$p][$a]=$j%2;$j/=2}}for$
p(0..7){for$a(1..45){$_=($h[$p-1][$a])?'#':' ';print}print$/}
Re: image clean up and alignment
by feloniousMonk (Pilgrim) on Apr 16, 2003 at 15:14 UTC
|
Maybe you should check over at bioperl.org.
But just an FYI - I work in a very Perl-heavy bioinformatics lab and have yet to see a perl solution for microarray image reading. Data processing, sure, but not from the level of the images.
-felonious
|
|
I haven't seen anything relevant from the bioperl folks (and let's face it, I spend half my time in bioperl code). Keep in mind that while the data is very bioinformatics specific, what I am trying to do with it is not in the slightest - I am not trying to read or analyze the spots, just crop a specified one, more or less precisely.
Re: image clean up and alignment
by Anonymous Monk on Apr 17, 2003 at 05:21 UTC
|
Others have pointed out that once you determine the slide's skew, you
may be better off having the spot grabber compensate, rather than
trying to rotate the entire image.
One option for accessing the pixels which no-one has mentioned is
simply to use substr and unpack. Convert the jpg to ppm, which is
just a row-major sequence of pixels, 3 bytes (rgb) per pixel. Just
slurp it in as a string. The arithmetic for converting from pixel
coordinates to string offset is trivial. Substr 3 bytes and unpack.
Two lines of code. It is quite "fast" (in perl, rather than C terms).
I wouldn't suggest touching every pixel with it, but it's great for
sampling. And to grab the hypothetical 16x16 pixel spot, one can just
concatenate 16 substr's, slap "P6 16 16 255 " on the front, and you
have the spot ppm image. And with Inline::C, it is not difficult to
convert parts of this to C, should the need perhaps someday arise.
Perl is quite good at doing _simple_ things with images. It is only
when you get slightly more complex that you can get bogged down in the
zoo of partial solutions.
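A rough sketch of the substr/unpack approach, assuming a binary (P6) ppm with a maxval of 255 and no comment lines in the header. The helper names are made up, and a tiny synthetic image stands in for the slurped file.

```perl
use strict;
use warnings;

# Split a P6 ppm string into width, height and the raw rgb bytes.
# Assumes maxval 255 and no '#' comment lines in the header.
sub parse_ppm {
    my ($ppm) = @_;
    $ppm =~ /\AP6\s+(\d+)\s+(\d+)\s+255\s/
        or die "not a plain 8-bit P6 file";
    my ($w, $h) = ($1, $2);
    return ($w, $h, substr($ppm, $+[0]));   # $+[0] is where the header ends
}

# The two lines in question: substr 3 bytes, then unpack.
sub pixel {
    my ($data, $w, $x, $y) = @_;
    return unpack 'C3', substr($data, 3 * ($y * $w + $x), 3);
}

# Concatenate one substr per row and put a new header on the front.
sub crop_ppm {
    my ($data, $w, $x, $y, $cw, $ch) = @_;
    my $out = "P6 $cw $ch 255\n";
    $out .= substr($data, 3 * (($y + $_) * $w + $x), 3 * $cw) for 0 .. $ch - 1;
    return $out;
}

# Synthetic 4x3 image standing in for a slurped file.
my ($w, $h) = (4, 3);
my $ppm = "P6 $w $h 255\n";
for my $y (0 .. $h - 1) {
    $ppm .= pack 'C3', 10 * $_, 10 * $y, 10 * ($_ + $y) for 0 .. $w - 1;
}

my ($pw, $ph, $data) = parse_ppm($ppm);
my @rgb  = pixel($data, $pw, 2, 1);
my $spot = crop_ppm($data, $pw, 1, 1, 2, 2);
print "pixel (2,1) = @rgb, cropped image is ", length($spot), " bytes\n";
```

For a real file, open it with the ':raw' layer and slurp it into $ppm before calling parse_ppm; for the hypothetical 16x16 spot, the last two arguments to crop_ppm would both be 16.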