US 12,093,310 B2
Multi-modal image search
Flora Ponjou Tasse, London (GB); and Ghislain Fouodji Tasse, London (GB)
Assigned to STREEM, LLC, Portland, OR (US)
Appl. No. 16/492,062
Filed by SELERIO LIMITED, London (GB)
PCT Filed Mar. 7, 2018, PCT No. PCT/GB2018/050574
§ 371(c)(1), (2) Date Sep. 6, 2019,
PCT Pub. No. WO2018/162896, PCT Pub. Date Sep. 13, 2018.
Claims priority of application No. 1703602 (GB), filed on Mar. 7, 2017.
Prior Publication US 2020/0104318 A1, Apr. 2, 2020
Int. Cl. G06F 16/58 (2019.01); G06F 16/53 (2019.01)
CPC G06F 16/5866 (2019.01) [G06F 16/53 (2019.01)]
16 Claims
 
1. A method for combining image data, one or more 3D shapes, and tag data into a unified representation, comprising the steps of:
determining a vector representation for the image data, the vector representation for the image data found within a fixed vector space of words;
determining a vector representation for the one or more 3D shapes, the vector representation for the one or more 3D shapes found within the vector space of words, by:
computing rendered views for each of the one or more 3D shapes from multiple viewpoints,
computing a descriptor for each view, and
averaging the descriptors for each view by computing a weighted average from the rendered views, the weight of each rendered view determined by a proportion of information of the 3D shape captured by the rendered view;
determining a vector representation for the tag data, the vector representation for the tag data found within the vector space of words; and
combining the vector representations into a unified representation by:
embedding the vector representation for the image data, the one or more 3D shapes, and the vector representation for the tag data into the vector space of words, and
determining linear combinations of the vector representations within the vector space of words to compute the unified representation, wherein image data and tag data are weighted to specify their respective influence.
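
The two arithmetic steps recited in claim 1, the weighted averaging of per-view descriptors and the linear combination of the three modality vectors, can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that every vector has already been embedded in the same fixed-dimension space, that each view's weight is a precomputed number standing in for the proportion of the 3D shape's information the view captures, and that the shape vector receives its own weight in the combination. The function names (weighted_average, linear_combination, shape_vector, unified_representation) and all numeric values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def _check(vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Validate the inputs and return the common vector dimension."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need at least one vector and exactly one weight per vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must lie in the same vector space")
    return dim


def linear_combination(vectors: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> Vector:
    """Return sum_i weights[i] * vectors[i], computed component by component."""
    dim = _check(vectors, weights)
    return [sum(w * v[k] for v, w in zip(vectors, weights)) for k in range(dim)]


def weighted_average(vectors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> Vector:
    """Return the linear combination divided by the sum of the weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [x / total for x in linear_combination(vectors, weights)]


def shape_vector(view_descriptors: Sequence[Sequence[float]],
                 view_information: Sequence[float]) -> Vector:
    """Average per-view descriptors, weighting each view by a hypothetical
    score for how much of the 3D shape that rendered view captures."""
    return weighted_average(view_descriptors, view_information)


def unified_representation(image_vec: Sequence[float],
                           shape_vec: Sequence[float],
                           tag_vec: Sequence[float],
                           w_image: float,
                           w_shape: float,
                           w_tag: float) -> Vector:
    """Linearly combine the three modality vectors. The weights set each
    modality's influence; weighting the shape vector is an assumption here."""
    return linear_combination([image_vec, shape_vec, tag_vec],
                              [w_image, w_shape, w_tag])


if __name__ == "__main__":
    # Two rendered views of one shape; the first is assumed to capture 75% of it.
    shape = shape_vector([[1.0, 0.0], [0.0, 1.0]], [0.75, 0.25])
    print(shape)  # [0.75, 0.25]

    # Illustrative image and tag vectors in the same 2-dimensional space.
    unified = unified_representation([1.0, 0.0], shape, [0.0, 1.0],
                                     w_image=0.5, w_shape=0.25, w_tag=0.25)
    print(unified)  # [0.6875, 0.3125]
```

Raising w_image relative to w_tag moves the unified vector toward the image vector, which is one way to read the claim's statement that the weights specify each input's respective influence.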