US 12,424,211 B2
Method and device for compressing finite-state transducers data
Zhenxing Liang, Guangdong (CN)
Assigned to GUANGZHOU ZIIPIN NETWORK TECHNOLOGY CO., LTD, Guangdong (CN)
Appl. No. 17/782,152
Filed by GUANGZHOU ZIIPIN NETWORK TECHNOLOGY CO., LTD, Guangdong (CN)
PCT Filed Mar. 3, 2021, PCT No. PCT/CN2021/078808
§ 371(c)(1), (2) Date Jun. 2, 2022,
PCT Pub. No. WO2022/021876, PCT Pub. Date Feb. 3, 2022.
Claims priority of application No. 202010737012.8 (CN), filed on Jul. 28, 2020.
Prior Publication US 2023/0005474 A1, Jan. 5, 2023
Int. Cl. G10L 15/19 (2013.01); G10L 15/193 (2013.01); H03M 7/30 (2006.01)
CPC G10L 15/193 (2013.01) [H03M 7/30 (2013.01)] 9 Claims
 
1. A method for compressing finite-state transducer (FST) data to reduce memory usage in a computing device, comprising:
acquiring to-be-compressed FST data, wherein the FST data comprises state transition data and state data, and wherein the FST data is used in at least one of text retrieval, search engine, natural language processing, machine translation, speech recognition, signal processing and automated control;
decomposing the state transition data based on first data categories to acquire first decomposition data, comprising:
decomposing the state transition data based on data categories of signal label, weight and next state identifier, to acquire signal label decomposition data, weight decomposition data and next state identifier decomposition data;
after decomposing the state transition data based on the first data categories to acquire the first decomposition data, removing output signal label decomposition data from the signal label decomposition data in a case that information presented by the FST data is suitable to be presented by finite-state automaton (FSA) data; and removing the weight decomposition data in a case that the information presented by the FST data is suitable to be presented by Trie data;
decomposing the state data based on second data categories to acquire second decomposition data;
sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category;
alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data;
performing classification statistics on the first arrangement data and the second arrangement data to acquire index data; and
combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data, wherein the compressed FST data is stored in a memory of the computing device and reduces memory resource consumption during the at least one of text retrieval, search engine, natural language processing, machine translation, speech recognition, signal processing and automated control.
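The decomposition steps of claim 1 can be illustrated with a minimal sketch of column-wise storage with an offset index. This is one possible reading and not the claimed arrangement itself. The claim does not define how the data is alternately arranged or how the "classification statistics" are computed, so the sketch substitutes a plain per-state arc offset table for the index data. The names `decompose` and `arcs_of`, the input layout, and the two suitability tests (output label equal to input label on every arc for the FSA case, all arc weights zero for the Trie case) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout (not from the patent): one (final_weight, arcs)
# pair per state, each arc being (input_label, output_label, weight, next_state).
Arc = Tuple[int, int, float, int]
State = Tuple[float, List[Arc]]


def decompose(states: List[State]) -> Dict[str, list]:
    """Split FST data into one sequential column per data category."""
    ilabels, olabels, weights, nexts = [], [], [], []
    finals, offsets = [], [0]
    for final_weight, arcs in states:
        finals.append(final_weight)          # state data column
        for ilabel, olabel, weight, nxt in arcs:
            ilabels.append(ilabel)           # transition data columns
            olabels.append(olabel)
            weights.append(weight)
            nexts.append(nxt)
        offsets.append(len(ilabels))         # index: where the next state's arcs start

    columns = {
        "ilabel": ilabels,
        "next_state": nexts,
        "final": finals,
        "arc_offset": offsets,
    }
    # Assumed FSA test: keep output labels only if they differ from input labels.
    if olabels != ilabels:
        columns["olabel"] = olabels
    # Assumed Trie test: keep arc weights only if at least one is non-zero.
    if any(w != 0.0 for w in weights):
        columns["weight"] = weights
    return columns


def arcs_of(columns: Dict[str, list], state: int) -> List[Arc]:
    """Rebuild the arcs of one state from the column layout."""
    lo = columns["arc_offset"][state]
    hi = columns["arc_offset"][state + 1]
    arcs = []
    for i in range(lo, hi):
        ilabel = columns["ilabel"][i]
        olabel = columns["olabel"][i] if "olabel" in columns else ilabel
        weight = columns["weight"][i] if "weight" in columns else 0.0
        arcs.append((ilabel, olabel, weight, columns["next_state"][i]))
    return arcs
```

In this layout, a removed column costs no storage at all, and the arcs of any state can still be recovered through the offset table without unpacking the whole structure.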