US 12,361,202 B2
Method and system of generating an editable document from a non-editable document
Gaurav Tendolkar, San Jose, CA (US); Akshay Mallipeddi, Cupertino, CA (US); Gongjie Qi, Sunnyvale, CA (US); Sumithra Bhakthavatsalam, Kirkland, WA (US); and Tapan Bohra, Sunnyvale, CA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Nov. 18, 2022, as Appl. No. 17/990,419.
Prior Publication US 2024/0169143 A1, May 23, 2024
Int. Cl. G06F 40/166 (2020.01); G06F 40/103 (2020.01); G06F 40/106 (2020.01); G06F 40/109 (2020.01); G06V 30/412 (2022.01)
CPC G06F 40/166 (2020.01) [G06F 40/103 (2020.01); G06F 40/106 (2020.01); G06F 40/109 (2020.01); G06V 30/412 (2022.01)]
19 Claims
 
1. A data processing system comprising:
a processor; and
a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the data processing system to perform functions of:
accessing a non-editable document, the non-editable document including a plurality of objects;
automatically identifying a layout for one or more of the plurality of objects;
determining that the plurality of objects includes a text object via a machine-learning model;
upon determining that the plurality of objects includes the text object, identifying a font for the text object via a trained font recommendation machine-learning model having an offline phase and an online phase;
selecting a color scheme for one or more of the plurality of objects, the color scheme corresponding to one or more color values associated with the plurality of objects;
automatically generating an editable document in accordance with at least one of the identified layout, identified font and selected color scheme;
receiving a request to resize the editable document to a desired aspect ratio;
identifying a plurality of style invariants between each two objects of the plurality of objects in the editable document to provide an identified plurality of style invariants, each style invariant identifying a positional relationship between two objects;
creating a list of style invariants for the plurality of objects based on the identified plurality of style invariants;
resizing a canvas of the editable document to the desired aspect ratio;
placing each object of the plurality of objects in the resized canvas in accordance with an associated style invariant from the list of style invariants;
cropping image objects to their bounding boxes; and
correcting overlaps by changing text size of the text object to prevent the overlaps of the text objects placed in the resized canvas;
wherein the offline phase of the trained font recommendation machine-learning model comprises a font library, an image generating engine to generate images of fonts in the font library, a pretrained vision model to produce font embeddings from the images of fonts and a similarity generation engine to generate a similarity matrix from the font embeddings; and
wherein the online phase of the trained font recommendation machine-learning model comprises the pretrained vision model to receive an input text image of the text object and output a font embedding for the input text image, and a matching unit to output the font for the text object based on the similarity matrix and the font embedding for the input text image.
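
The offline and online phases recited in the last two clauses of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The embeddings below are hand-written toy vectors standing in for the output of the pretrained vision model, the font names are hypothetical, and the use of cosine similarity is an assumption. The claim also does not say how the matching unit combines the similarity matrix with the input embedding; here the sketch assumes it picks the nearest library font and then reads that font's closest neighbours from the matrix.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_similarity_matrix(font_embeddings):
    """Offline phase: pairwise similarity between every two library fonts."""
    names = list(font_embeddings)
    return {
        a: {b: cosine(font_embeddings[a], font_embeddings[b]) for b in names}
        for a in names
    }

def match_font(query_embedding, font_embeddings, similarity, top_k=2):
    """Online phase: nearest library font, plus its closest neighbours."""
    best = max(
        font_embeddings,
        key=lambda name: cosine(query_embedding, font_embeddings[name]),
    )
    neighbours = sorted(
        (name for name in similarity[best] if name != best),
        key=lambda name: similarity[best][name],
        reverse=True,
    )
    return best, neighbours[:top_k]

# Toy embeddings (hypothetical); a real system would obtain these by
# rendering each library font to an image and running the vision model.
FONT_EMBEDDINGS = {
    "SerifA": [0.9, 0.1, 0.0],
    "SerifB": [0.8, 0.2, 0.1],
    "SansA": [0.1, 0.9, 0.2],
    "MonoA": [0.0, 0.2, 0.9],
}
SIMILARITY = build_similarity_matrix(FONT_EMBEDDINGS)

# Embedding of the input text image (again a toy vector).
best, alternatives = match_font([0.95, 0.05, 0.0], FONT_EMBEDDINGS, SIMILARITY)
```

Because the similarity matrix is computed once offline, the online step only needs one embedding of the input text image and a lookup, which is one plausible reason for splitting the model into two phases.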
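
The resizing steps of claim 1 (identifying style invariants, resizing the canvas, placing objects, and correcting overlaps by changing text size) can be sketched as follows. This is only an illustration under stated assumptions: the `Box` type, the relation names ("above", "left_of", and so on), the rule that images scale uniformly while text keeps its size until it collides, and the 10% shrink step are all hypothetical choices that the claim does not specify. Cropping image objects to their bounding boxes is omitted.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Box:
    name: str
    x: float
    y: float
    w: float
    h: float
    is_text: bool = False

def overlaps(a, b):
    """True when two axis-aligned boxes intersect with non-zero area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)

def invariants(boxes):
    """List one positional relation per pair of objects (assumed vocabulary)."""
    result = []
    for a, b in combinations(boxes, 2):
        if a.y + a.h <= b.y:
            rel = "above"
        elif b.y + b.h <= a.y:
            rel = "below"
        elif a.x + a.w <= b.x:
            rel = "left_of"
        elif b.x + b.w <= a.x:
            rel = "right_of"
        else:
            rel = "overlaps"
        result.append((a.name, rel, b.name))
    return result

def resize(boxes, old_size, new_size, step=0.9, min_scale=0.3):
    """Re-place objects on the new canvas, then shrink colliding text."""
    sx, sy = new_size[0] / old_size[0], new_size[1] / old_size[1]
    uniform = min(sx, sy)  # keeps image proportions intact
    placed = []
    for b in boxes:
        k = 1.0 if b.is_text else uniform
        placed.append(Box(b.name, b.x * sx, b.y * sy, b.w * k, b.h * k, b.is_text))
    for t in (p for p in placed if p.is_text):
        scale = 1.0
        # Reduce the text size until it no longer overlaps any other object.
        while scale > min_scale and any(overlaps(t, o) for o in placed if o is not t):
            scale *= step
            t.w, t.h = t.w * step, t.h * step
    return placed

# A title above a photo on a square canvas, resized to a 2:1 canvas.
original = [Box("title", 100, 100, 800, 200, is_text=True),
            Box("photo", 100, 400, 800, 500)]
before = invariants(original)
resized = resize(original, (1000, 1000), (1000, 500))
after = invariants(resized)
```

In this example the title initially collides with the photo on the shorter canvas, and shrinking the text restores the "above" relation recorded before the resize, so `before` and `after` hold the same list.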