in reply to Re: RFC - Documentation Review
in thread Please review documentation of my AI::Embedding module

Thank you hv for your valued input.

Typos: compatator; tyhe; chargable (should be chargeable 1); "will random" => "will be random"; ACKNOWLEDGEMENTS section misses a trailing full stop.

All corrected. I use Grammarly for all the writing I do. But it doesn't work in my text editor so doesn't correct errors in POD. Perhaps I need to copy POD into something Grammarly does check before uploading it.

returning the HTTP::Tiny response object on failure of various methods means...

Good point!
It was probably laziness on my part which could do with revisiting. It's on the ToDo List.

it seems strange to have the comparator be built in to the object.

I feel the term 'comparator' is unclear. But I cannot think of a better one!
When the compare method is called with two parameters, both are processed to convert them into hashrefs. If one feeds the same parameter to compare many times, this processing can add up. So the comparator method does the conversion just once and stores the resulting hashref, to be compared with the single parameter passed to compare.

If you can suggest a better method name, that would be great.

Use of the word "homogeneous" is odd

I mean that one would not be interested in the discrete values of the array, only the array as a whole. Because the array needs to be stored as a whole and not as parts, it makes sense (to me at least) to have it as a "homogeneous" string of values. This is easy to store in a database.
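To illustrate the storage point, here is a minimal sketch of collapsing an embedding into a single string and restoring it. The values and variable names are hypothetical, and the comma separator is an assumption based on the CSV embeddings mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical embedding values; a real embedding has far more dimensions.
my @values = (0.0123, -0.4567, 0.8910);

# Collapse the array into one string, suitable for a single text column.
my $csv = join ',', @values;

# Split it again only when the individual numbers are needed.
my @restored = split /,/, $csv;

print "$csv\n";
print scalar(@restored), " values restored\n";
```

The string is treated as one opaque value for storage, and it is only split back into numbers at comparison time.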

Hope this helps. :)

Tremendously, thank you :)

Re^3: RFC - Documentation Review
by hv (Prior) on Jun 03, 2023 at 01:38 UTC

    I feel the term 'comparator' is unclear. But I cannot think of a better one!

    I have no problem with the name, only with the interface - it doesn't make sense (as far as I can see) to embed the comparator within the object. Apart from anything else that makes it harder to have multiple comparators, for no obvious benefit.

    I'm imagining a simple curry like:

    sub comparator {
        my($self, $embed) = @_;
        return sub { $self->compare($embed, @_); };
    }

    .. and documentation like:

    comparator

    my $comparator = $embedding->comparator($csv_embedding1);
    ...
    my $comparison = $comparator->($csv_embedding2);

    Returns a subroutine reference that can be used for repeated compare calls for the given vector against different secondary vectors, that returns the same type of result as compare.

    Update: I forgot to include the extra work for comparator, it should probably look more like:

    sub comparator {
        my($self, $embed) = @_;
        my $vector1 = $self->_make_vector($embed);
        return sub {
            my($embed2) = @_;
            my $vector2 = $self->_make_vector($embed2);
            return $self->_compare_vector($vector1, $vector2);
        };
    }

    .. where _compare_vector would be factored out of the last 9 lines of compare.
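    For anyone wanting to try the idea in isolation, the following is a self-contained sketch of the closure-based comparator. Toy::Embedding is a hypothetical stand-in for the real class, the bodies of _make_vector and _compare_vector are assumptions (CSV strings in, cosine similarity out), and none of it is taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real class: only the pieces needed to
# demonstrate the closure-based comparator.
package Toy::Embedding;

sub new { return bless {}, shift }

# Assumed helper: turn "0.1,0.2,..." into an array reference of numbers.
sub _make_vector {
    my ($self, $csv) = @_;
    return [ split /,/, $csv ];
}

# Assumed helper: cosine similarity of two equal-length vectors.
sub _compare_vector {
    my ($self, $v1, $v2) = @_;
    my ($dot, $norm1, $norm2) = (0, 0, 0);
    for my $i (0 .. $#$v1) {
        $dot   += $v1->[$i] * $v2->[$i];
        $norm1 += $v1->[$i] ** 2;
        $norm2 += $v2->[$i] ** 2;
    }
    return 0 unless $norm1 && $norm2;
    return $dot / sqrt($norm1 * $norm2);
}

# Two-argument compare: converts both embeddings on every call.
sub compare {
    my ($self, $embed1, $embed2) = @_;
    return $self->_compare_vector(
        $self->_make_vector($embed1),
        $self->_make_vector($embed2),
    );
}

# The first embedding is converted once and captured by the closure;
# nothing is stored in the object itself.
sub comparator {
    my ($self, $embed) = @_;
    my $vector1 = $self->_make_vector($embed);
    return sub {
        my ($embed2) = @_;
        return $self->_compare_vector($vector1, $self->_make_vector($embed2));
    };
}

package main;

my $embedding = Toy::Embedding->new;
my $cmp = $embedding->comparator('1,0');
printf "%.3f\n", $cmp->('1,0');
printf "%.3f\n", $cmp->('0,1');
```

    Because each closure holds its own converted vector, any number of comparators can coexist on the same object.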

      I have no problem with the name, only with the interface...

      Sorry...I don't understand. Either that or I haven't properly explained what the comparator does.

      The comparator method doesn't do any work other than to set a value to be compared with the compare method. So these two are exactly equivalent:

      my $difference = $embedding->compare($embed2, $embed1);

      $embedding->comparator($embed1);
      my $difference = $embedding->compare($embed2);
      The only time it makes any sense to use the comparator method is when there are lots of values to compare to the same thing:
      $embedding->comparator($embed1);
      my $diff2 = $embedding->compare($embed2); # Compares $embed2 to $embed1
      my $diff3 = $embedding->compare($embed3); # Compares $embed3 to $embed1
      my $diff4 = $embedding->compare($embed4); # Compares $embed4 to $embed1
      my $diff5 = $embedding->compare($embed5); # Compares $embed5 to $embed1
      my $diff6 = $embedding->compare($embed6); # Compares $embed6 to $embed1
      my $diff7 = $embedding->compare($embed7); # Compares $embed7 to $embed1
      # ... etc ...

        Let me rephrase the relevant bit: it doesn't make sense (as far as I can see) to store the value to be compared inside the object. Apart from anything else that makes it harder to have multiple comparators, for no obvious benefit.

        In my suggested interface your example would look like:

        my $cmp1 = $embedding->comparator($embed1);
        my $diff2 = $cmp1->($embed2); # Compares $embed2 to $embed1
        my $diff3 = $cmp1->($embed3); # Compares $embed3 to $embed1
        my $diff4 = $cmp1->($embed4); # Compares $embed4 to $embed1
        # ... etc ...

        But you could also simultaneously have a second comparator, without needing a whole second object for it:

        my $cmp2 = $embedding->comparator($embed2);
        # classify target embeddings according to which they are most similar to
        for my $next_embed (@targets) {
            my $diff1 = $cmp1->($next_embed);
            my $diff2 = $cmp2->($next_embed);
            if ($diff1 > $diff2) {
                push @closer_to_embed1, $next_embed;
            } else {
                push @closer_to_embed2, $next_embed;
            }
        }

        Generally it is preferable for instances of a class to represent one thing: in this case an instance represents "the interface to a given model via an API". At the point you store a comparison vector, it represents "the interface to a given model via an API _and_ something that knows how to compare against one particular embedding" - that's a mash-up of two very different things.

        My suggested approach is the smallest change I could see that avoids the mash-up. Another approach would be to expose make_vector and compare_vector methods; then there is no need for the object to have special mechanisms to cache the vector. Yet another approach would be for embedding itself to return an object of a new type, which could tell you the string, array or vector representations of the embedding as needed. (But at that point you'd really want the existing class to be called AI::Interface or AI::Embedder so that this new object can be an actual AI::Embedding.)
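        As a rough sketch of the second approach, the conversion and comparison steps could be exposed so that the caller decides which vectors to keep. The helpers below are written as plain functions for brevity; their names follow the suggestion above, but their bodies (CSV input, cosine similarity) are assumptions rather than the module's real API.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a CSV embedding string into an array
# reference of numbers.
sub make_vector {
    my ($csv) = @_;
    return [ split /,/, $csv ];
}

# Hypothetical helper: cosine similarity between two vectors of equal
# length (the similarity measure is an assumption).
sub compare_vector {
    my ($v1, $v2) = @_;
    my ($dot, $n1, $n2) = (0, 0, 0);
    for my $i (0 .. $#$v1) {
        $dot += $v1->[$i] * $v2->[$i];
        $n1  += $v1->[$i] ** 2;
        $n2  += $v2->[$i] ** 2;
    }
    return 0 unless $n1 && $n2;
    return $dot / sqrt($n1 * $n2);
}

# The caller converts the reference once and reuses it; no caching
# mechanism is needed inside any object.
my $reference = make_vector('1,0,0');
my @targets   = ('1,0,0', '0,1,0', '1,1,0');

for my $csv (@targets) {
    printf "%s => %.3f\n", $csv, compare_vector($reference, make_vector($csv));
}
```

        The trade-off is that callers must handle vectors directly, whereas the closure version keeps them hidden.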

        I don't understand. Either that or I haven't properly explained what the comparator does.

        The comparator method doesn't do any work other than to set a value to be compared with the compare method. So these two are exactly equivalent:

        my $difference = $embedding->compare($embed2, $embed1);

        $embedding->comparator($embed1);
        my $difference = $embedding->compare($embed2);

        The only time it makes any sense to use the comparator method is when there are lots of values to compare to the same thing:

        In my understanding, a comparator is a device that compares things. I'm thinking of something like an LM393, or the optional block or function name passed to sort.

        What your comparator() method seems to do is to set a reference value for some kind of comparator. So it should be named setReference() or similar.

        Alexander


        Update: repeated typo fixed, found by kcott.
        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)