Just my 5c - I've programmed X11 clipboards using raw Xlib, and I remember that the idea (back then) was that a selection is defined in terms of the program that owns it, not in terms of text sitting in a clipboard. The selection usually (i.e. for many X11 clients, again back then) materializes as a chunk of text only once middle click is pressed. From that point of view, 1 and 3 are ruled out, because by the time middle click is pressed, the text extraction procedure would take the current selection, "def", not the one before it (technically speaking, an X11 client can ask for an older selection, but I think no program implements that).
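To make the ownership idea concrete, here is a rough toy model in Python. It is not Xlib: `ToyServer`, `Client` and their methods are made-up names, and the earlier selection "abc" is just an assumed example. The methods only loosely stand in for what `XSetSelectionOwner`, `XConvertSelection` and the `SelectionRequest` event do in a real client.

```python
class Client:
    """A toy X11 client holding some text it can hand out on request."""

    def __init__(self, name, text):
        self.name = name
        self.text = text

    def convert_selection(self):
        # Loosely models answering a SelectionRequest event:
        # the text is produced only at this point.
        return self.text


class ToyServer:
    """Tracks who owns the selection; never stores the selected text."""

    def __init__(self):
        self.owner = None

    def set_selection_owner(self, client):
        # Loosely models XSetSelectionOwner: the previous owner
        # simply stops being the owner.
        self.owner = client

    def paste(self):
        # Loosely models XConvertSelection on middle click:
        # ask whoever owns the selection right now.
        if self.owner is None:
            return None
        return self.owner.convert_selection()


server = ToyServer()
editor = Client("editor", "abc")      # assumed earlier selection
terminal = Client("terminal", "def")  # the current selection

server.set_selection_owner(editor)    # user selects "abc"
server.set_selection_owner(terminal)  # user then selects "def"

pasted = server.paste()               # middle click
print(pasted)
```

In this sketch the server remembers only the current owner, so once "def" is selected there is nothing left to ask about "abc" unless a client chooses to keep its own history.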
Personally, I find #2 the most logical but least useful, and #4 the most annoying, because it takes the text cursor position into account, not the mouse position. I think that #3 is the best way to go.