in reply to extract ids
perl -lne 'print for /molecule_idref="([^"]+)/g' xmlfile
I've used the 'g' modifier to catch every id in case more than one occurs on a line.