US 12,283,284 B2
Method and system for real-time and low latency synthesis of audio using neural networks and differentiable digital signal processors
Lamtharn Hantrakul, Singapore (SG); David Trevelyan, London (GB); Haonan Chen, Beijing (CN); Matthew David Avent, London (GB); and Janne Jayne Harm Renée Spijkervet, London (GB)
Assigned to LEMON INC., Grand Cayman (KY)
Filed by Lemon Inc., Cayman Islands (KY)
Filed on May 19, 2022, as Appl. No. 17/748,882.
Prior Publication US 2023/0377591 A1, Nov. 23, 2023
Int. Cl. G10L 19/00 (2013.01); G10L 21/0264 (2013.01); G10L 21/04 (2013.01); G10L 25/18 (2013.01); G10L 25/90 (2013.01)

CPC G10L 21/0264 (2013.01) [G10L 21/04 (2013.01); G10L 25/18 (2013.01); G10L 19/00 (2013.01); G10L 25/90 (2013.01)]

17 Claims

1. A method of audio processing comprising:

generating a frame by sampling audio input in increments, which are based on a first buffer size associated with an input/output buffer of a host device, until a threshold buffer size, corresponding to a frame size used to train a machine learning model, is reached, wherein the first buffer size does not match the threshold buffer size;

extracting, from the frame, amplitude information, pitch information, and pitch status information;

determining, by the machine learning model, control information for audio reproduction based on the amplitude information, the pitch information, and the pitch status information, the control information including pitch control information and noise magnitude control information;

generating filtered noise information by inverting the noise magnitude control information using an overlap and add technique, including:

receiving the noise magnitude control information according to the frame size from the machine learning model;

rendering the filtered noise information in a block size not equal to the frame size;

writing, via the overlap and add technique, the filtered noise information to a circular buffer; and

reading, in the first buffer size, the filtered noise information from the circular buffer;

generating, based on the pitch control information, additive harmonic information by combining a plurality of scaled wavetables; and

rendering audio output based on the filtered noise information and the additive harmonic information.
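The frame-generation step of claim 1 regroups audio that arrives in host-sized buffers into frames of the size the model was trained on, even when one size does not divide the other. The sketch below illustrates that step only and is not the patented implementation. The class name `FrameAccumulator`, the choice of non-overlapping frames, and the example sizes (128-sample host buffers, 480-sample frames) are assumptions made for illustration.

```python
from typing import List


class FrameAccumulator:
    """Hypothetical helper that regroups host-sized buffers into model-sized frames.

    The host delivers len(host_buffer) samples per callback, while the model
    expects exactly `frame_size` samples per frame (assumed non-overlapping).
    The two sizes need not divide one another; leftover samples are carried
    over to the next frame.
    """

    def __init__(self, frame_size: int) -> None:
        if frame_size <= 0:
            raise ValueError("frame_size must be positive")
        self.frame_size = frame_size
        self.pending: List[float] = []

    def push(self, host_buffer: List[float]) -> List[List[float]]:
        """Append one host buffer and return every frame that is now complete."""
        self.pending.extend(host_buffer)
        frames: List[List[float]] = []
        while len(self.pending) >= self.frame_size:
            frames.append(self.pending[: self.frame_size])
            # Keep the remainder; it starts the next frame.
            self.pending = self.pending[self.frame_size :]
        return frames


# Usage: 10 host callbacks of 128 samples (1280 samples) against 480-sample frames.
acc = FrameAccumulator(frame_size=480)
frame_count = sum(len(acc.push([0.0] * 128)) for _ in range(10))
print(frame_count, len(acc.pending))  # 2 320
```

Because a frame is emitted only once the threshold is reached, the model runs on some host callbacks and not on others; the downstream buffering in the noise and harmonic steps absorbs that irregularity.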
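Claim 1 extracts amplitude, pitch, and pitch-status information from each frame but does not name the estimators. As a minimal sketch, assuming an RMS level in dB for amplitude and a plain normalized-autocorrelation pitch tracker (a swapped-in textbook technique, not necessarily the one used by the inventors), the three values could be computed as below. The function name `extract_features`, the 50 Hz to 1000 Hz search range, and the 0.5 voicing threshold are hypothetical.

```python
import math
from typing import List, Tuple


def extract_features(
    frame: List[float],
    sample_rate: int,
    f_min: float = 50.0,
    f_max: float = 1000.0,
    voicing_threshold: float = 0.5,
) -> Tuple[float, float, bool]:
    """Return (amplitude_db, pitch_hz, is_voiced) for one frame.

    Amplitude: RMS level in dB. Pitch: lag of the strongest autocorrelation
    peak (normalized by frame energy) within [f_min, f_max]. Pitch status:
    whether that peak clears `voicing_threshold`. All choices are illustrative.
    """
    n = len(frame)
    energy = sum(x * x for x in frame)
    rms = math.sqrt(energy / n)
    amplitude_db = 20.0 * math.log10(max(rms, 1e-9))  # floor avoids log10(0)
    if energy == 0.0:
        return amplitude_db, 0.0, False

    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(n - 1, int(sample_rate / f_min))
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    # Unvoiced frames report 0 Hz; best_lag > 0 guards against division by zero.
    is_voiced = best_lag > 0 and best_corr >= voicing_threshold
    pitch_hz = sample_rate / best_lag if is_voiced else 0.0
    return amplitude_db, pitch_hz, is_voiced


# Usage: a 200 Hz tone at 16 kHz has a period of exactly 80 samples.
sr = 16000
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * i / sr) for i in range(1024)]
amp_db, pitch, voiced = extract_features(tone, sr)  # pitch == 200.0, voiced is True
```

A production tracker would typically add parabolic peak interpolation and octave-error handling; the binary voicing flag here stands in for whatever pitch-status signal the model actually consumes.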
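The filtered-noise step receives noise magnitudes once per model frame, renders noise in blocks of a different size, overlap-adds those blocks into a circular buffer, and drains the buffer in host-sized chunks. The sketch below reads "inverting the noise magnitude control information" as an inverse real DFT that turns per-band magnitudes into a zero-phase, Hann-windowed FIR filter. That reading, along with the names `magnitudes_to_impulse_response`, `render_filtered_noise`, and `OverlapAddRing`, the band count, and all sizes, are assumptions for illustration rather than the claimed implementation.

```python
import math
import random
from typing import List


def magnitudes_to_impulse_response(magnitudes: List[float]) -> List[float]:
    """Inverse real DFT of a zero-phase magnitude spectrum, centered and Hann-windowed."""
    num_bands = len(magnitudes)  # bins 0..Nyquist; requires num_bands >= 2
    n = 2 * (num_bands - 1)
    ir = []
    for k in range(n):
        acc = magnitudes[0] + magnitudes[-1] * math.cos(math.pi * k)
        for b in range(1, num_bands - 1):
            acc += 2.0 * magnitudes[b] * math.cos(2.0 * math.pi * b * k / n)
        ir.append(acc / n)
    ir = ir[n // 2 :] + ir[: n // 2]  # rotate the peak to the center (causal filter)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]
    return [h * w for h, w in zip(ir, window)]


def render_filtered_noise(magnitudes: List[float], block_size: int, rng: random.Random) -> List[float]:
    """Filter one block of white noise; the output keeps the convolution tail."""
    ir = magnitudes_to_impulse_response(magnitudes)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(block_size)]
    out = [0.0] * (block_size + len(ir) - 1)  # full linear convolution
    for i, x in enumerate(noise):
        for j, h in enumerate(ir):
            out[i + j] += x * h
    return out


class OverlapAddRing:
    """Circular buffer: blocks are overlap-added in, host-sized chunks are read out.

    Capacity must exceed the longest block plus the unread backlog, otherwise
    new writes wrap onto samples that have not been read yet.
    """

    def __init__(self, capacity: int) -> None:
        self.buf = [0.0] * capacity
        self.write_pos = 0  # start index of the next block
        self.read_pos = 0

    def overlap_add(self, block: List[float], hop: int) -> None:
        for i, x in enumerate(block):
            self.buf[(self.write_pos + i) % len(self.buf)] += x  # add, do not overwrite
        self.write_pos = (self.write_pos + hop) % len(self.buf)

    def read(self, count: int) -> List[float]:
        out = []
        for _ in range(count):
            out.append(self.buf[self.read_pos])
            self.buf[self.read_pos] = 0.0  # clear the slot for future overlap-adds
            self.read_pos = (self.read_pos + 1) % len(self.buf)
        return out


# Usage: 256-sample render blocks, drained in 128-sample host buffers.
rng = random.Random(0)
mags = [1.0] * 33  # hypothetical 33-band magnitudes from one model frame
ring = OverlapAddRing(capacity=2048)
for _ in range(2):
    ring.overlap_add(render_filtered_noise(mags, 256, rng), hop=256)
host_chunk = ring.read(128)
```

Advancing the write position by the block size while each block carries a convolution tail is the classic overlap-add method, so consecutive blocks join without discontinuities even though render, frame, and host sizes all differ.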
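The harmonic step combines a plurality of scaled wavetables under control of the pitch information. One plausible reading, sketched below under that assumption, keeps one single-cycle table per harmonic, sums them with per-harmonic amplitudes into one table (dropping any harmonic at or above Nyquist), and plays the result with a phase accumulator at the fundamental frequency. The function and class names, table sizes, and amplitude values are hypothetical.

```python
import math
from typing import List, Sequence


def harmonic_tables(num_harmonics: int, table_size: int) -> List[List[float]]:
    """Table k-1 stores exactly k sine cycles, i.e. harmonic k of the fundamental."""
    return [
        [math.sin(2.0 * math.pi * k * n / table_size) for n in range(table_size)]
        for k in range(1, num_harmonics + 1)
    ]


def combine_tables(
    tables: List[List[float]],
    amplitudes: Sequence[float],
    f0_hz: float,
    sample_rate: float,
) -> List[float]:
    """Sum amplitude-scaled harmonic tables, skipping harmonics at or above Nyquist."""
    mixed = [0.0] * len(tables[0])
    for k, (table, amp) in enumerate(zip(tables, amplitudes), start=1):
        if k * f0_hz >= sample_rate / 2.0:
            break  # this and every higher harmonic would alias
        for n, value in enumerate(table):
            mixed[n] += amp * value
    return mixed


class WavetableOscillator:
    """Phase-accumulating table reader with linear interpolation.

    The phase persists across calls so consecutive host buffers stay continuous.
    """

    def __init__(self, sample_rate: float) -> None:
        self.sample_rate = sample_rate
        self.phase = 0.0  # position in cycles, kept in [0, 1)

    def render(self, table: List[float], f0_hz: float, num_samples: int) -> List[float]:
        size = len(table)
        step = f0_hz / self.sample_rate
        out = []
        for _ in range(num_samples):
            pos = self.phase * size
            i = int(pos)
            frac = pos - i
            a, b = table[i % size], table[(i + 1) % size]
            out.append(a + frac * (b - a))
            self.phase = (self.phase + step) % 1.0
        return out


# Usage: four decaying harmonics of a 220 Hz fundamental at 16 kHz.
sr = 16000.0
tables = harmonic_tables(num_harmonics=8, table_size=2048)
amps = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]  # hypothetical model output
mixed_table = combine_tables(tables, amps, f0_hz=220.0, sample_rate=sr)
osc = WavetableOscillator(sr)
harmonic_block = osc.render(mixed_table, f0_hz=220.0, num_samples=128)
```

Mixing the scaled tables once per control frame means the per-sample cost is a single interpolated lookup regardless of how many harmonics are active; the rendered harmonic block can then be summed with the filtered noise read from the circular buffer to produce the output.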