US 12,333,543 B2
Voice-based payment system
Gang Liu, Seattle, WA (US); Qingen Zhao, Hangzhou (CN); and Guangxing Liu, Hangzhou (CN)
Assigned to ALIBABA GROUP HOLDING LIMITED, Grand Cayman (KY)
Filed by ALIBABA GROUP HOLDING LIMITED, Grand Cayman (KY)
Filed on Dec. 20, 2022, as Appl. No. 18/085,334.
Application 18/085,334 is a continuation of application No. 16/008,765, filed on Jun. 14, 2018, granted, now Pat. No. 11,551,219.
Claims priority of application No. 201710457967.6 (CN), filed on Jun. 16, 2017.
Prior Publication US 2023/0127314 A1, Apr. 27, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06Q 20/40 (2012.01); G06F 18/213 (2023.01); G06F 18/22 (2023.01); G06Q 20/10 (2012.01); G06Q 20/42 (2012.01); G06V 40/10 (2022.01); G10L 15/08 (2006.01); G10L 15/22 (2006.01); G10L 17/00 (2013.01); G10L 17/02 (2013.01); G10L 17/22 (2013.01); G10L 17/24 (2013.01)
CPC G06Q 20/4014 (2013.01) [G06F 18/213 (2023.01); G06F 18/22 (2023.01); G06Q 20/102 (2013.01); G06Q 20/40145 (2013.01); G06Q 20/42 (2013.01); G06V 40/10 (2022.01); G10L 15/22 (2013.01); G10L 17/00 (2013.01); G10L 17/22 (2013.01); G10L 17/24 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01); G10L 17/02 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A computer-implemented method, comprising:
receiving, by a device, a spoken payment instruction from a user for purchasing a product through a voice user interface of the device at a time point;
determining, by the device, product information of the product based on audio being played, wherein the determining comprises:
capturing voice input from an audio source being played on a media channel at the time point when the spoken payment instruction is received;
determining media channel information based on the voice input;
obtaining a program list of the media channel based on the media channel information; and
determining the product information based on the time point of the received spoken payment instruction and the program list;
generating audio information according to the spoken payment instruction;
generating, according to the audio information, a feature matrix comprising at least one feature of audio data in the audio information, the at least one feature of the audio data comprising at least one of frequency data or amplitude data;
inputting the feature matrix and multiple feature dimensions for a voice feature vector of the audio information into a neural network;
obtaining, from the neural network, multiple dimension values representing the multiple feature dimensions based on the feature matrix;
generating the voice feature vector of the audio information based on the multiple dimension values;
authenticating an identity of the user by performing matching between the voice feature vector and a pre-stored user feature vector; and
sending the product information and personal information associated with the user feature vector to a server that performs a payment operation for the product information.
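For readers tracing the claimed steps, the sketch below walks the same pipeline in Python: mapping the instruction's time point onto a media channel's program list to obtain product information, building a feature matrix of frequency and amplitude data from the instruction audio, producing multiple dimension values (the voice feature vector) from a network, matching against a pre-stored user feature vector, and assembling the payload sent for payment. Every name, parameter, and threshold here (ToyEmbeddingNetwork, FRAME_LEN, EMBED_DIM, MATCH_THRESHOLD, the program-list schema) is a hypothetical placeholder for illustration only, not the implementation disclosed in the specification.

```python
"""Minimal illustrative sketch of the claim-1 flow; all names and values are assumed."""
import numpy as np

FRAME_LEN = 400          # samples per analysis frame (assumed)
EMBED_DIM = 128          # number of feature dimensions in the voice feature vector (assumed)
MATCH_THRESHOLD = 0.8    # similarity threshold for authenticating the user (assumed)


def lookup_product(program_list: list[dict], time_point: float) -> dict | None:
    """Determine product information from the time point and the channel's program list."""
    for entry in program_list:
        if entry["start"] <= time_point < entry["end"]:
            return entry["product"]
    return None


def build_feature_matrix(audio: np.ndarray) -> np.ndarray:
    """Frame the instruction audio and extract frequency and amplitude features per frame."""
    n_frames = len(audio) // FRAME_LEN
    frames = audio[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
    spectra = np.abs(np.fft.rfft(frames, axis=1))                # frequency data
    amplitudes = np.max(np.abs(frames), axis=1, keepdims=True)   # amplitude data
    return np.hstack([spectra, amplitudes])                      # one row per frame


class ToyEmbeddingNetwork:
    """Stand-in for the neural network that maps a feature matrix to EMBED_DIM
    dimension values; a real system would use a trained speaker-embedding model."""

    def __init__(self, feature_dim: int, rng_seed: int = 0):
        rng = np.random.default_rng(rng_seed)
        self.weights = rng.standard_normal((feature_dim, EMBED_DIM))

    def forward(self, feature_matrix: np.ndarray) -> np.ndarray:
        pooled = feature_matrix.mean(axis=0)                 # summarize frames
        vector = pooled @ self.weights                       # EMBED_DIM dimension values
        return vector / (np.linalg.norm(vector) + 1e-9)      # voice feature vector


def authenticate(voice_vector: np.ndarray, stored_vector: np.ndarray) -> bool:
    """Match the computed voice feature vector against the pre-stored user feature vector."""
    return float(np.dot(voice_vector, stored_vector)) >= MATCH_THRESHOLD


if __name__ == "__main__":
    # Hypothetical program list: the identified channel runs a rice-cooker ad
    # from t = 100 s to t = 130 s, and the instruction arrives at t = 112.5 s.
    program_list = [{"start": 100.0, "end": 130.0,
                     "product": {"name": "rice cooker", "price": 59.0}}]
    product = lookup_product(program_list, time_point=112.5)

    instruction_audio = np.random.default_rng(1).standard_normal(16000)  # stand-in waveform
    features = build_feature_matrix(instruction_audio)
    net = ToyEmbeddingNetwork(feature_dim=features.shape[1])
    voice_vec = net.forward(features)

    stored_vec = voice_vec.copy()   # enrolled template, assumed pre-stored for this user
    if product is not None and authenticate(voice_vec, stored_vec):
        payload = {"product": product, "user": "enrolled-user-id"}  # sent to the payment server
        print("payment request:", payload)
```

The toy network is a fixed random projection purely so the example runs end to end; the claim only requires that a neural network produce the dimension values from the feature matrix, and the cosine-style matching and threshold are one common way to realize the recited matching step, not the only one.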