
Re: RFC - Documentation Review

by hv (Prior)
on Jun 02, 2023 at 22:21 UTC ( [id://11152621] )


in reply to Please review documentation of my AI::Embedding module

Typos: compatator; tyhe; chargable (should be chargeable [1]); "will random" => "will be random"; ACKNOWLEDGEMENTS section misses a trailing full stop.

Interface: returning the HTTP::Tiny response object on failure of various methods means you always need to call success (or error) to know how to look at the response. If I were using this, I'd rather it returned something more easily testable such as undef - the HTTP::Tiny response could be included in the error() return value, or be provided by a separate http_error method.
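A minimal sketch of the "undef on failure" shape suggested here. Everything in it is hypothetical: `My::Embedder`, its `embedding` method and the injected `fetch` callback are stand-ins for illustration, not the AI::Embedding implementation. The callback is assumed to return a hashref with the same keys as an HTTP::Tiny response (`success`, `status`, `reason`, `content`).

```perl
use strict;
use warnings;

# Hypothetical class: illustrates "return undef on failure and keep the
# details elsewhere". It is not the AI::Embedding code.
package My::Embedder;

sub new {
    my ($class, %args) = @_;
    # 'fetch' is an assumed stand-in for the real HTTP::Tiny request.
    return bless { fetch => $args{fetch}, error => '', http_error => undef }, $class;
}

sub embedding {
    my ($self, $text) = @_;
    my $response = $self->{fetch}->($text);
    if ($response->{success}) {
        @$self{qw(error http_error)} = ('', undef);   # clear any stale error
        return $response->{content};
    }
    # Failure: remember the details, return something trivially testable.
    $self->{error}      = "$response->{status} $response->{reason}";
    $self->{http_error} = $response;
    return;    # undef in scalar context
}

sub error      { return $_[0]{error} }
sub http_error { return $_[0]{http_error} }

package main;

my $ok = My::Embedder->new(
    fetch => sub { return +{ success => 1, status => 200, reason => 'OK', content => '0.1,0.2' } },
);
my $bad = My::Embedder->new(
    fetch => sub { return +{ success => 0, status => 401, reason => 'Unauthorized', content => '' } },
);

for my $embedder ($ok, $bad) {
    my $csv = $embedder->embedding('some text');
    if (defined $csv) { print "got: $csv\n" }
    else              { print "failed: ", $embedder->error, "\n" }
}
```

With this shape the caller only needs `defined` to tell success from failure, and can still reach the full response through `http_error` when it wants the details.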

Interface: it seems strange to have the comparator be built in to the object. To me it would make more sense to have the comparator method return a subref (and remove it as a new option).

Text: "It requires the 'key' parameter." is not needed: the following sentence already says this (and it is reiterated in the individual parameter listing). Use of the word "homogeneous" is odd: I'm not sure in what way the string form is more "homogeneous" than the array form. The text generally seems to assume that the only use for this interface is to rank search matches; I'm not sure how appropriate that is. More generally, I'm unsure about calling the class "Embedding": if I correctly understand what I've read, an instance of this class is not an embedding but an embedder - it is something capable of providing embeddings.

Hope this helps. :)

[1] I use the phrase "city centre; cat cut cot" to remind me that in almost all English words, a 'c' or 'g' is hard when followed by 'i' or 'e', but soft when followed by 'a', 'u' or 'o'. Thus "chargable" can't be right - it would have to be pronounced with a hard 'g', so it gets an extra 'e' inserted to make it soft. Similarly "manageable" and "enforceable"; but "reproducible" is fine.

Replies are listed 'Best First'.
Re^2: RFC - Documentation Review
by Bod (Parson) on Jun 02, 2023 at 23:05 UTC

    Thank you hv for your valued input.

    Typos: compatator; tyhe; chargable (should be chargeable [1]); "will random" => "will be random"; ACKNOWLEDGEMENTS section misses a trailing full stop.

    All corrected. I use Grammarly for all the writing I do. But it doesn't work in my text editor so doesn't correct errors in POD. Perhaps I need to copy POD into something Grammarly does check before uploading it.

    returning the HTTP::Tiny response object on failure of various methods means...

    Good point!
    It was probably laziness on my part which could do with revisiting. It's on the ToDo List.

    it seems strange to have the comparator be built in to the object.

    I feel the term 'comparator' is unclear. But I cannot think of a better one!
    When the compare method is called with two parameters, there is some processing of both to convert them into hashrefs. If one is feeding the same parameter to compare many times, this processing can add up. So the comparator method does the conversion just once and stores the hashref to be compared to the single parameter fed to compare.

    If you can suggest a better method name, that would be great.

    Use of the word "homogeneous" is odd

    I mean that one would not be interested in the discrete values of the array, only the array as a whole. Because the array needs to be stored as a whole and not as parts, it makes sense (to me at least) to have it as a "homogeneous" string of values. This is easy to store in a database.
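A small illustration of the "one string per embedding" idea, assuming the string form is a plain comma-separated list of numbers. The sample values and variable names are made up for the example.

```perl
use strict;
use warnings;

# Made-up three-element vector; real embeddings are much longer.
my @vector = (0.0123, -0.4567, 0.89);

# One string for the whole embedding, e.g. for a single text column.
my $csv = join ',', @vector;

# Turn it back into a list of numbers when the values are needed.
my @restored = split /,/, $csv;

print "$csv\n";
print scalar(@restored), " values restored\n";
```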

    Hope this helps. :)

    Tremendously, thank you :)

      I feel the term 'comparator' is unclear. But I cannot think of a better one!

      I have no problem with the name, only with the interface - it doesn't make sense (as far as I can see) to embed the comparator within the object. Apart from anything else that makes it harder to have multiple comparators, for no obvious benefit.

      I'm imagining a simple curry like:

      sub comparator {
          my($self, $embed) = @_;
          return sub { $self->compare($embed, @_); };
      }

      .. and documentation like:

      comparator

      my $comparator = $embedding->comparator($csv_embedding1);
      ...
      my $comparison = $comparator->($csv_embedding2);

      Returns a subroutine reference that can be used for repeated compare calls for the given vector against different secondary vectors. The subroutine returns the same type of result as compare.

      Update: I forgot to include the extra work for comparator, it should probably look more like:

      sub comparator {
          my($self, $embed) = @_;
          my $vector1 = $self->_make_vector($embed);
          return sub {
              my($embed2) = @_;
              my $vector2 = $self->_make_vector($embed2);
              return $self->_compare_vector($vector1, $vector2);
          };
      }

      .. where _compare_vector would be factored out of the last 9 lines of compare.
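To make the closure idea concrete, here is a self-contained sketch under stated assumptions. `My::Embedder` is a hypothetical stand-in, `_make_vector` is assumed to parse a comma-separated string, and `_compare_vector` is assumed to compute cosine similarity; the real module's internals and metric may differ.

```perl
use strict;
use warnings;

package My::Embedder;    # hypothetical stand-in, not AI::Embedding itself

sub new { return bless {}, shift }

# Assumed input format: a comma-separated string of numbers.
sub _make_vector {
    my ($self, $embed) = @_;
    return [ split /,/, $embed ];
}

# Assumed metric: cosine similarity of two equal-length vectors.
sub _compare_vector {
    my ($self, $v1, $v2) = @_;
    die "vectors differ in length\n" unless @$v1 == @$v2;
    my ($dot, $norm1, $norm2) = (0, 0, 0);
    for my $i (0 .. $#$v1) {
        $dot   += $v1->[$i] * $v2->[$i];
        $norm1 += $v1->[$i] ** 2;
        $norm2 += $v2->[$i] ** 2;
    }
    return 0 unless $norm1 && $norm2;    # avoid dividing by zero
    return $dot / sqrt($norm1 * $norm2);
}

sub compare {
    my ($self, $embed1, $embed2) = @_;
    return $self->_compare_vector(map { $self->_make_vector($_) } $embed1, $embed2);
}

# Returns a closure; the first vector is parsed once and captured.
sub comparator {
    my ($self, $embed) = @_;
    my $vector1 = $self->_make_vector($embed);
    return sub {
        my ($embed2) = @_;
        return $self->_compare_vector($vector1, $self->_make_vector($embed2));
    };
}

package main;

my $embedder = My::Embedder->new;

# Two independent comparators from the same object.
my $vs_x = $embedder->comparator('1,0');
my $vs_y = $embedder->comparator('0,1');

printf "%.3f %.3f\n", $vs_x->('1,1'), $vs_y->('1,1');
```

Because each closure captures its own `$vector1`, any number of comparators can coexist without the object having to remember which one is "current".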

        I have no problem with the name, only with the interface...

        Sorry...I don't understand. Either that or I haven't properly explained what the comparator does.

        The comparator method doesn't do any work other than to set a value to be compared with the compare method. So these two are exactly equivalent:

        my $difference = $embedding->compare($embed2, $embed1);

        $embedding->comparator($embed1);
        my $difference = $embedding->compare($embed2);

        The only time it makes any sense to use the comparator method is when there are lots of values to compare to the same thing:

        $embedding->comparator($embed1);
        my $diff2 = $embedding->compare($embed2); # Compares $embed2 to $embed1
        my $diff3 = $embedding->compare($embed3); # Compares $embed3 to $embed1
        my $diff4 = $embedding->compare($embed4); # Compares $embed4 to $embed1
        my $diff5 = $embedding->compare($embed5); # Compares $embed5 to $embed1
        my $diff6 = $embedding->compare($embed6); # Compares $embed6 to $embed1
        my $diff7 = $embedding->compare($embed7); # Compares $embed7 to $embed1
        # ... etc ...

Homogeneous (was: Re^2: RFC - Documentation Review)
by Bod (Parson) on Jun 11, 2023 at 13:12 UTC
    Use of the word "homogeneous" is odd

    I've been doing some weekend reading on PyTorch - not because I want to use it but because I want a high-level understanding of what it does and what it is used for...

    In the Wikipedia article, they too use the word "homogeneous" to describe the multi-dimensional arrays.

    "PyTorch defines a class called Tensor (torch.Tensor) to store and operate on homogeneous multidimensional rectangular arrays of numbers."

      I'm not familiar with PyTorch so this is purely a guess.

      In the examples, all of the array elements are floating point numbers: that's probably what homogeneous means in this context.

      By contrast, arrays with a mix of floats, integers, strings, objects, and so on, would be heterogeneous.

      — Ken
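If that guess is right, the distinction can be illustrated in Perl terms. Perl arrays are untyped, so this is only an analogy: `is_numeric_array` is a hypothetical helper that treats an array as "homogeneous" when every element looks like a number, using `looks_like_number` from Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: true when every element looks like a number.
# An empty array counts as homogeneous.
sub is_numeric_array {
    my ($array) = @_;
    return !grep { !looks_like_number($_) } @$array;
}

my @embedding = (0.12, -0.5, 3e-4);           # all numeric
my @mixed     = (0.12, 'abc', { key => 1 });  # floats, strings, references

print "embedding: ", (is_numeric_array(\@embedding) ? "homogeneous" : "heterogeneous"), "\n";
print "mixed:     ", (is_numeric_array(\@mixed)     ? "homogeneous" : "heterogeneous"), "\n";
```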
