| CPC G06F 40/263 (2020.01) [G06F 16/116 (2019.01); G06F 16/148 (2019.01)] | 19 Claims |

|
1. A tangible, non-transitory, computer-readable medium, comprising computer-readable instructions that, when executed by one or more processors of an electronic device, cause the electronic device to:
prior to determining an encoding scheme of a file, identify a language of the file, wherein the file is encoded in a language-specific encoding scheme, by:
identifying, for each potential language of the file, a language score of one or more bit sequences of the file using a term frequency (TF) indicating a frequency of the one or more bit sequences within a training document of the potential language and an inverse document frequency (IDF) based upon a frequency of the one or more bit sequences in a collection of training documents of a plurality of the potential languages; and
selecting a language associated with a highest one of the language scores as the language of the file;
associate the language of the file with the file; and
decode the file using a decoding scheme that corresponds to the language associated with the file.
|
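By way of illustration only, and not as part of the claim language, the scoring recited in claim 1 might be sketched in Python as follows. The sketch rests on several assumptions that the claim does not state: bit sequences are taken as overlapping 2-byte sequences, the IDF is smoothed, the language score is the sum of TF times IDF over the sequences of the file, and the language-to-codec mapping is hypothetical. All function and variable names are illustrative.

```python
import math
from collections import Counter


def ngrams(data, n=2):
    """Yield overlapping n-byte sequences from raw, undecoded bytes."""
    return (data[i:i + n] for i in range(len(data) - n + 1))


def train(training_docs, n=2):
    """Build per-language TF tables and one IDF table shared by all languages.

    training_docs maps a language name to one training document (bytes)
    already stored in that language's encoding scheme.
    """
    counts = {lang: Counter(ngrams(doc, n)) for lang, doc in training_docs.items()}
    tf = {}
    for lang, c in counts.items():
        total = sum(c.values()) or 1
        # TF: relative frequency of the sequence within this language's document.
        tf[lang] = {g: k / total for g, k in c.items()}
    # Document frequency: number of training documents containing the sequence.
    doc_freq = Counter(g for c in counts.values() for g in c)
    num_docs = len(training_docs)
    # Smoothed IDF (an assumption of this sketch, not taken from the claim).
    idf = {g: math.log((1 + num_docs) / (1 + df)) + 1.0 for g, df in doc_freq.items()}
    return tf, idf


def identify_language(data, tf, idf, n=2):
    """Score the file against each potential language and pick the highest score."""
    scores = {
        lang: sum(table.get(g, 0.0) * idf.get(g, 0.0) for g in ngrams(data, n))
        for lang, table in tf.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical mapping from language to a language-specific decoding scheme.
CODEC_FOR_LANGUAGE = {"russian": "koi8_r", "greek": "iso8859_7"}


def decode_file(data, tf, idf):
    """Identify the language first, then decode with the matching codec."""
    lang, _ = identify_language(data, tf, idf)
    return lang, data.decode(CODEC_FOR_LANGUAGE[lang], errors="replace")


# Toy training collection: one short document per potential language.
TRAINING_DOCS = {
    "russian": "быстрая коричневая лиса прыгает через ленивую собаку".encode("koi8_r"),
    "greek": "η γρήγορη καφέ αλεπού πηδάει πάνω από τον τεμπέλικο σκύλο".encode("iso8859_7"),
}

tf_tables, idf_table = train(TRAINING_DOCS)
print(decode_file("ленивая лиса".encode("koi8_r"), tf_tables, idf_table))
```

Because the scoring operates on raw byte sequences, the language is identified before any encoding scheme is determined, and the codec is chosen only afterwards from the identified language. A realistic system would presumably train on far larger documents than the two toy sentences used here.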