in reply to image clean up and alignment

There are several potential problems, both with what you are trying to do and with trying to do it in Perl.

The first is that you mentioned that the slides are in .JPG form (although the linked example was a .GIF). Unless the .JPG was saved without compression, you have already either lost information or muddied the information that is there, as .JPG compression is 'lossy'. This means that some of the information present in the original data has been discarded in order to reduce the range of values present and improve the compression ratio. This shows up on close inspection of the sample slide in several ways, the most fundamental of which is that the borders around the spots are not a single consistent colour, but instead contain blotches of different shades. This presents a fundamental problem for edge detection. If you then attempt to apply image rotation to correct the skew, you are going to further degrade the integrity of the image.

Assume that the spots are 16 pixels square, the inter-spot borders are 5 pixels, and the outer frame and inter-block borders are 10 pixels. If the slides are two blocks deep, with 20x20 spots per block as in the linked example, and the run-out from top to bottom is 10 pixels, then the angle of rotation required to correct it would be less than 1 degree! If one edge is longer than two blocks, the angle gets progressively smaller and the problem that much harder. Apart from the fact that I do not know of any low-end image manipulation libraries or packages that will perform rotations by fractions of a degree, the process of rotation would further distort the edges between the borders and the spots. If the slides are available in a non-lossy or uncompressed form, they would make a much better starting point, but even then, using the generic image manipulation of something like ImageMagick is likely to result in considerable loss of information.
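To put a number on that, here is a rough sketch of the arithmetic (in Python rather than Perl, purely for brevity). The dimensions are the guesses quoted above, not measurements, and the function names are my own.

```python
import math

# Assumed layout, taken from the guessed figures above (all in pixels).
SPOT = 16             # spot edge length
GAP = 5               # border between adjacent spots
FRAME = 10            # outer frame and inter-block border
SPOTS_PER_BLOCK = 20  # spots along one edge of a block


def slide_extent(blocks: int) -> int:
    """Length in pixels of a slide edge that spans `blocks` blocks."""
    block = SPOTS_PER_BLOCK * SPOT + (SPOTS_PER_BLOCK - 1) * GAP
    # One frame at each end plus one border between each pair of blocks.
    return blocks * block + (blocks + 1) * FRAME


def skew_degrees(run_out: float, blocks: int) -> float:
    """Rotation implied by a run-out of `run_out` pixels along the edge."""
    return math.degrees(math.atan2(run_out, slide_extent(blocks)))


if __name__ == "__main__":
    for blocks in (2, 4, 8):
        print(blocks, slide_extent(blocks), round(skew_degrees(10, blocks), 3))
```

With these assumptions a two-block edge is 860 pixels long, so a 10-pixel run-out corresponds to roughly two thirds of a degree, and the angle shrinks as the edge gets longer.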

If you could say that the frame was definitively black and that spots were not black, then scanning first horizontally to determine the difference in the width of the left-hand border, and doing a little math to determine how many pixels to pad or trim from that edge of each row, would be reasonably trivial. You then apply the same technique processing the image vertically, again trimming or padding one edge to realign things, and you should correct the skew. From what I saw of the linked slide, the problem is that there is no definitive colour for the borders, as I mentioned earlier, so edge detection becomes a process of determining a threshold. Anything below this value is black and therefore border; anything above is colour and therefore spot. Again, looking at the linked frame, some spots have no colour at all and so are indistinguishable from border. In some places there are blotches in the border that are brighter than the centre of some of the spots. You might be tempted to use contrast enhancement or spot removal to clean up the borders, but until you have detected them, you can only apply the algorithms to the whole slide and thereby affect the colour of the spots as well. I assume that this would fundamentally affect the nature of the expressions you are trying to detect and categorise?
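For the idealised case only (a clean dark frame and no dark spots touching it), the pad-or-trim idea might look something like the sketch below. It is in Python rather than Perl, treats the image as a list of rows of grey values, and the function names and the `threshold` parameter are my own assumptions. It measures the left border on the first and last rows, assumes the skew is linear in between, and shifts each row accordingly.

```python
def border_width(row, threshold):
    """Count leading pixels at or below `threshold` (treated as border)."""
    width = 0
    for value in row:
        if value > threshold:
            break
        width += 1
    return width


def shift_row(row, offset, fill=0):
    """Shift a row right (pad left) or left (trim left), keeping its length."""
    offset = max(-len(row), min(len(row), offset))
    if offset > 0:
        return [fill] * offset + row[:len(row) - offset]
    if offset < 0:
        return row[-offset:] + [fill] * (-offset)
    return list(row)


def deskew_rows(image, threshold=0):
    """Realign rows so the left border has the same width on every row.

    Only the first and last rows are measured; the drift in between is
    assumed to be linear, which is what a small rotation looks like.
    """
    top = border_width(image[0], threshold)
    bottom = border_width(image[-1], threshold)
    last = max(len(image) - 1, 1)
    result = []
    for y, row in enumerate(image):
        expected = top + (bottom - top) * y / last
        result.append(shift_row(row, top - round(expected)))
    return result
```

The vertical pass would be the same function applied to the transposed image, e.g. `[list(col) for col in zip(*image)]`. On the real slides the hard part is choosing `threshold`, for exactly the reasons given above.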

I guess the upshot of what I am saying is this: it would be possible to manually isolate the border using threshold filters, construct a mask of the borders, apply this back to the original image and then extract the spots using general-purpose photoimaging techniques. Trying to automate the process using those techniques, however, is going to be extremely difficult, if not impossible, requiring many passes and stepwise refinements applied to each image.

To stand any realistic chance of automating this without altering the nature of the data itself would require the use (or construction) of a library of much lower-level and more highly tailored filters than are generally available in photoimaging libraries and packages.

It would also require specific knowledge of the nature of the information that you need to derive from the spots. E.g. it would be somewhat simpler if you only needed to determine the absolute colour of the 'brightest' pixel in each spot than if you needed to determine an average (mean or median) of each spot, or the relative density of the colours in the spots.
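To illustrate why the choice matters, here is a small sketch (Python, single grey values per pixel, names of my own choosing) that computes the three candidate measures for one square spot. A real slide would presumably need this per colour channel.

```python
from statistics import mean, median


def spot_pixels(image, left, top, size):
    """Collect the pixel values of a square spot from a list-of-rows image."""
    return [image[y][x]
            for y in range(top, top + size)
            for x in range(left, left + size)]


def summarise_spot(image, left, top, size):
    """Return the brightest, mean and median pixel values of one spot."""
    pixels = spot_pixels(image, left, top, size)
    return {
        "brightest": max(pixels),
        "mean": mean(pixels),
        "median": median(pixels),
    }
```

A single bright blotch inside a spot decides the 'brightest' value outright and pulls the mean towards it, while the median hardly moves, so the three measures can disagree noticeably on compressed images.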

The other problem with trying to do this using Perl, unless you can find or obtain an existing library with the required facilities that is written in C, FORTRAN or similar and has a Perl-callable interface, is that processing large, 2-dimensional arrays of numeric data is just about the weakest aspect of Perl. The very nature of Perl's dynamic array structures actively works against manipulating data that is essentially static in nature. You can drop into Inline::C or XS, but unless you manipulate the data in packed scalars (or blocks of memory allocated at the C level), in which case you would probably be better off using C for the entire process, all the pointer chasing that makes Perl's arrays such a joy for most uses works completely against you in this case.

If no other monks come along with better options than those I've mentioned and I haven't completely put you off, I'd love to see the answers to the questions I've posed above. I did do some playing around with writing a module to allow direct manipulation of packed image data (in .BMP format) which I would gladly pass along if you think it would be helpful. It doesn't go very far but it might help.

Good luck.


Examine what is said, not who speaks.
1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

Re: Re: image clean up and alignment
by glwtta (Hermit) on Apr 16, 2003 at 15:18 UTC
    First off, thank you for the very extensive answer, your help is much appreciated.

    Now to clear up a few things. First, the sample image was only provided so that those who have never seen a microarray slide could better understand what I mean by "slide", "spot", etc. The images I am working with are indeed jpegs, but they are much larger and actually quite a bit cleaner (I wish I had a place to upload an actual sample... I'll see if I can find a way).

    Most importantly - I am not doing any sort of analysis on the spots themselves. The analysis is done on the original TIFF files, using software written by many people much smarter than I over the course of many years :) The TIFFs are much too large to reasonably store for this application, which is why I am using the jpegs; I am not sure how compressed they are, but they are still of very high quality. All I need is to get the spot given the coordinates - it's just another visual clue for the user in the final report, so the spot colour doesn't need to be anywhere near as precise as what is used for the actual analysis.

    It had not in fact occurred to me that, rather than rotating the actual image, it would be much easier to determine the distances to the border for each row/column and adjust the cropping accordingly; most likely this is what I will end up doing. In fact, just doing this for each of the four corner spots, which have a very high contrast with the background for just this purpose, should be sufficient to figure out the rest of them.

    Incidentally, speed and memory requirements are not an issue at all here - I would most likely do this once per image to pick out all the spots and store them individually, and we are only talking about a few hundred of these over the course of several years.