in reply to A data selection problem(in3D).

Maybe you could try something like this. First, this code creates a hash containing an entry for each unique color in an image. Then the colors are sorted by Luma value and broken into 17 groups of colors with similar Luma. Then, an average RGB triple is calculated for each group, weighted by the pixel count of each color in the input image. This produces a coarse gradient of 17 colors with ascending luma. A linear interpolation is then performed between these reference colors to get a gradient of 256 colors.

I was lazy about two things in this code: The grouping of 17 colors is proportional to unique color count, rather than evenly distributing Luma differences. Also, a polynomial interpolation might be better than a linear one, although the linear result looks ok to me.
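For the first of those shortcuts, here is a minimal sketch of grouping by equal-width luma bands instead of equal counts of unique colors. GroupByLumaBands is a hypothetical helper, not part of the posted code; it assumes the same { rgb, y, count } records that GetReferenceColors builds internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the code in this post.
# Input records look like { rgb => [r, g, b], y => luma, count => pixels }.
# Returns one pixel-weighted average [r, g, b] per non-empty luma band.
sub GroupByLumaBands {
    my ($colors, $bands) = @_;

    # Find the luma range actually present in the input
    my ($min, $max) = ($colors->[0]{y}, $colors->[0]{y});
    for my $c (@$colors) {
        $min = $c->{y} if $c->{y} < $min;
        $max = $c->{y} if $c->{y} > $max;
    }
    my $span = ($max - $min) || 1;    # avoid dividing by zero for a flat image

    # Accumulate weighted RGB sums per band
    my @acc = map { +{ w => 0, sum => [0, 0, 0] } } 1 .. $bands;
    for my $c (@$colors) {
        my $band = int(($c->{y} - $min) / $span * $bands);
        $band = $bands - 1 if $band >= $bands;    # brightest color goes in the last band
        $acc[$band]{w} += $c->{count};
        $acc[$band]{sum}[$_] += $c->{rgb}[$_] * $c->{count} for 0 .. 2;
    }

    # Average each non-empty band; empty bands are skipped, not guessed
    my @ref;
    for my $slot (@acc) {
        next unless $slot->{w};
        push @ref, [ map { int($_ / $slot->{w} + 0.5) } @{ $slot->{sum} } ];
    }
    return \@ref;
}

# Made-up sample data: two dark colors and two light ones
my @colors = (
    { rgb => [ 10,  10,  10], y =>  10, count => 1 },
    { rgb => [ 30,  30,  30], y =>  30, count => 3 },
    { rgb => [200, 200, 200], y => 200, count => 2 },
    { rgb => [210, 210, 210], y => 210, count => 2 },
);
my $ref = GroupByLumaBands(\@colors, 2);
print join(',', @$_), "\n" for @$ref;    # prints 25,25,25 then 205,205,205
```

Because empty bands are skipped, this can return fewer than the requested number of reference colors, so the fixed 16-steps-per-segment interpolation would need adjusting to still produce 256 entries.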

Here is the gradient generated from the jpeg you linked:

https://image.ibb.co/gMY085/grid.png

I hope this helps.

use strict;
use warnings;
use GD;

# GetReferenceColors returns a coarse gradient of colors typical
# at several Luma levels
sub GetReferenceColors {
    my ($img, $count) = @_;

    # build %indexes hash where the keys are the indexes of each unique color
    # and the values are the number of pixels counted for the color index
    my ($width, $height) = $img->getBounds;
    my %indexes;
    for my $y (0..$height-1) {
        for my $x (0..$width-1) {
            my $idx = $img->getPixel($x, $y);
            ++$indexes{$idx};
        }
    }

    # Build @colors array with one entry for each color
    # contains the color's RGB triple, its Luma (Y value) and pixel count
    my @colors;
    for my $idx (keys %indexes) {
        my @rgb = $img->rgb($idx);
        my $y = $rgb[0]*0.299 + $rgb[1]*0.587 + $rgb[2]*0.114;
        my $pixel_count = $indexes{$idx};
        push(@colors, { rgb => [@rgb], y => $y, count => $pixel_count });
    }

    # Sort @colors by ascending Luma value
    @colors = sort { $a->{y} <=> $b->{y} } @colors;

    # split @colors into $count groups, which overlap by one entry
    # calculate each group's average RGB value, weighted by pixel count
    # add each group's average [r,g,b] triple to @ref_colors.
    my @ref_colors;
    my $step = @colors / $count;
    for my $i (0..$count - 1) {
        my $start = int($i * $step);
        my $end   = int(($i + 1) * $step);
        # the last group would otherwise run one entry past the end of @colors
        $end = $#colors if $end > $#colors;
        my $wsum = 0;
        my @csum = (0, 0, 0);
        for my $j ($start .. $end) {
            my $color = $colors[$j];
            my $weight = $color->{count};
            $wsum += $weight;
            for my $ci (0..2) {
                $csum[$ci] += $color->{rgb}->[$ci] * $weight;
            }
        }
        for my $ci (0..2) {
            $csum[$ci] = int($csum[$ci] / $wsum + 0.5);
        }
        push(@ref_colors, \@csum);
    }
    return \@ref_colors;
}

# InterpolateColors interpolates between two [r,g,b] triples
# by a weight factor between 0 and 1
sub InterpolateColors {
    my ($ca, $cb, $pb) = @_;
    my @rgb;
    for my $i (0..2) {
        push(@rgb, int($ca->[$i] * (1 - $pb) + $cb->[$i] * $pb + 0.5));
    }
    return \@rgb;
}

# Builds an interpolated set of colors based on an image's
# most commonly occurring colors at a series of brightness levels
sub InterpolatePalette {
    my ($img) = @_;

    # Build a 256-entry @gradient by linearly interpolating between
    # a set of 17 colors returned by GetReferenceColors
    # (16 segments of 16 steps; the last reference color itself is never emitted)
    my @gradient;
    my $ref_colors = GetReferenceColors($img, 17);
    for my $i (1..@$ref_colors-1) {
        my $c0 = $ref_colors->[$i-1];
        my $c1 = $ref_colors->[$i];
        for my $j (0..15) {
            my $p = $j/16;
            push(@gradient, InterpolateColors($c0, $c1, $p));
        }
    }
    return \@gradient;
}

# Read the input image from a file
my $file = $ARGV[0] // '07_AH_Esfahan Gold 65-ab.jpg';
my $img = GD::Image->newFromJpeg($file)
    or die "Cannot read '$file' as a JPEG\n";

# Calculate an interpolated gradient from the image's dominant colors
my $r = InterpolatePalette($img);

# Create a new image for displaying the color gradient on a grid
my $len = 20;
my $width = 16*$len+1;
my $grid = GD::Image->new($width, $width, 1);

# Draw the grid
my $background = $grid->colorResolve(0, 0, 0);
$grid->filledRectangle(0, 0, $width, $width, $background);
my $loc = 0;
for my $rgb (@$r) {
    my $x = ($loc & 15) * $len;
    my $y = ($loc >> 4) * $len;
    ++$loc;
    my $color = $grid->colorResolve(@$rgb);
    $grid->filledRectangle($x+1, $y+1, $x+$len-1, $y+$len-1, $color);
}

# Save the result
open(my $fh, '>:raw', 'grid.png') or die $!;
print $fh $grid->png;
close($fh) or die $!;

Re^2: A data selection problem(in3D).
by BrowserUk (Patriarch) on Apr 08, 2017 at 01:58 UTC
    First, this code creates a hash containing an entry for each unique color in an image. Then the colors are sorted by Luma value and broken into 17 groups of colors with similar Luma. Then, an average RGB triple is calculated for each group, weighted by the pixel count of each color in the input image. This produces a coarse gradient of 17 colors with ascending luma. A linear interpolation is then performed between these reference colors to get a gradient of 256 colors.

    First: thank you for your response and code.

    However, there are problems with that approach.

    • Weighting the colors chosen by their pixel counts biases the selection according to the amount of light and shade, and the balance between them, within the source picture.

      The source picture is used to discover the range of tones and tints reflecting from the chosen material/surface, not their proportions.

      Once you apply the gradient to models, the proportions of light and shade are (need to be) dictated by the shape and lighting angles of the target model, not the source image.

    • Once you take away the weighting in the choice of the interpolation points, what you've effectively got is a straightforward linear interpolation through the colors present in the source image.

      The problem is that this produces too wide a band of dark (near black) and light (near white) shades; and thus throws away too much of the primary shades that will dominate most(*) models.

    • Finally, by picking an average (weighted or otherwise) in the first and last groups, you guarantee to discard the darkest and lightest shades.

      That could be addressed by making 15 groups and then end-stopping with the darkest and lightest colors found; but that will tend to emphasise the final issue.

    • By interpolating (whether through rgb or hsv) between values present in the source range, you are likely to populate the gradient with colors that never appear in the source input, which unfortunately produces models that don't look right.

      This might be addressed by interpolating the between-chosen-points values and then going back to the dataset to find the 'nearest' value that exists there; but in my attempts, defining 'nearest' in a 3D space is fraught with problems, and inevitably results in uneven jumps in the gradient that stand out like sore thumbs when applied to a model.
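To make the "go back to the dataset" step concrete, here is a minimal sketch of snapping an interpolated color to the nearest color that exists in the source. NearestColor is a hypothetical helper, and it assumes "nearest" means smallest squared Euclidean distance in RGB, which is only one of several possible definitions and not necessarily the one tried above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the palette entry closest to $target.
# ASSUMPTION: "closest" = smallest squared Euclidean distance in RGB.
# Other metrics (luma-weighted, HSV, Lab) would pick different entries.
sub NearestColor {
    my ($target, $palette) = @_;
    my ($best, $best_d);
    for my $c (@$palette) {
        my $d = 0;
        $d += ($c->[$_] - $target->[$_]) ** 2 for 0 .. 2;
        ($best, $best_d) = ($c, $d) if !defined $best_d || $d < $best_d;
    }
    return $best;
}

# Made-up sample data standing in for colors found in the source image
my @source = ([120, 90, 30], [200, 160, 60], [240, 220, 150]);
my $interpolated = [170, 130, 50];    # an in-between color not present in @source
my $snapped = NearestColor($interpolated, \@source);
print join(',', @$snapped), "\n";     # prints 200,160,60
```

Ties show one way the jumps can become uneven: the exact midpoint of the first two sample colors, [160, 125, 45], is equidistant from both, so which one wins depends only on their order in the list.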


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
      It is true that the end points lose the darkest and lightest shades -- this is another point I was too lazy to address, but I assumed you could rectify it. That source image only has 200 unique colors in it, and a lot of them are non-gold shades of gray -- this image does not contain a rich color selection -- so interpolation into absent colors is necessary. And your first point (undesired patterns of brightness density) would be addressed by grouping the "reference" colors by brightness level, instead of by the density of unique colors at a similar brightness (which I was also too lazy to do here).
        That source image only has 200 unique colors in it,

        Are we talking about the same "source image"?

        Because this is the one (linked from the root node) that I was referring to, and I find 63889 unique colors in it.

        I then discarded 1119 outliers to concentrate on the 62770 in the upper half of this image.


      On a different note: have you thought about an approach using clustering (like k-means)? This is what I initially thought of when I saw this problem, but then I had a negative experience with a module called Image::DominantColors.
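For reference, a minimal sketch of what a k-means pass over RGB triples could look like in plain Perl. KMeans and Dist2 are hypothetical helpers, not the Image::DominantColors API, and the evenly spaced seeding is an assumption made so the result is reproducible (random seeding is more common).

```perl
use strict;
use warnings;

# Squared Euclidean distance between two [r,g,b] triples
sub Dist2 {
    my ($p, $q) = @_;
    my $d = 0;
    $d += ($p->[$_] - $q->[$_]) ** 2 for 0 .. 2;
    return $d;
}

# Hypothetical helper: plain k-means over [r,g,b] triples.
# ASSUMPTION: centroids are seeded from evenly spaced input points.
sub KMeans {
    my ($points, $k, $iterations) = @_;
    my @centroids = map { [ @{ $points->[ int($_ * @$points / $k) ] } ] } 0 .. $k - 1;

    for (1 .. $iterations) {
        # Assignment step: add each point to the sums of its nearest centroid
        my @sum = map { [0, 0, 0] } 1 .. $k;
        my @n   = (0) x $k;
        for my $p (@$points) {
            my $best = 0;
            for my $i (1 .. $k - 1) {
                $best = $i if Dist2($p, $centroids[$i]) < Dist2($p, $centroids[$best]);
            }
            $sum[$best][$_] += $p->[$_] for 0 .. 2;
            $n[$best]++;
        }
        # Update step: move each non-empty centroid to the mean of its points
        for my $i (0 .. $k - 1) {
            next unless $n[$i];
            $centroids[$i] = [ map { $_ / $n[$i] } @{ $sum[$i] } ];
        }
    }
    return \@centroids;
}

# Made-up sample data: a dark cluster and a gold-ish cluster
my @pixels = (
    [10, 10, 10],   [12, 12, 12],   [14, 14, 14],
    [200, 180, 60], [202, 182, 62], [204, 184, 64],
);
my $c = KMeans(\@pixels, 2, 5);
printf "%d,%d,%d\n", @$_ for @$c;    # prints 12,12,12 then 202,182,62
```

Note that centroids are averages, so they too can be colors that never occur in the source image; whether clustering actually suits the gradient problem is left open here.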