| CPC G06F 7/20 (2013.01) [G06F 16/2455 (2019.01); G06F 16/3347 (2019.01); G06F 40/205 (2020.01); G06F 40/279 (2020.01)] | 13 Claims |

|
1. A computer-implemented method for identifying a level of similarity between a first data item and a data item within a set of data documents, the method comprising:
clustering, by a reference map generator executing on a first computing device, in a two-dimensional metric space, a set of data documents selected according to at least one criterion and associated with a medical diagnosis, generating a semantic map;
associating, by the semantic map, a coordinate pair with each of the set of data documents;
generating, by a parser executing on the first computing device, an enumeration of terms occurring in the set of data documents;
determining, by a representation generator executing on the first computing device, for each term in the enumeration, occurrence information including: (i) a number of data documents in which the term occurs, (ii) a number of occurrences of the term in each data document, and (iii) the coordinate pair associated with each data document in which the term occurs;
generating, by the representation generator, for each term in the enumeration, a sparse distributed representation (SDR) using the occurrence information, resulting in a plurality of generated SDRs;
storing, by a processor on a second computing device, in each of a plurality of memory cells on the second computing device, one of the plurality of generated SDRs, each of the plurality of memory cells including a bitwise comparison circuit;
receiving, by a filtering module executing on a second computing device and in communication with the first computing device, from a third computing device, a filtering criterion;
generating, by the representation generator, at least one SDR for the filtering criterion, wherein generating further comprises:
determining, by the representation generator, that the filtering criterion is not an SDR stored in the SDR database; and
generating, by the representation generator, the at least one SDR for the filtering criterion, based upon the determining that the filtering criterion is not an SDR;
storing, by the representation generator, the at least one SDR for the filtering criterion in at least one of the plurality of memory cells;
receiving, by the filtering module, a first plurality of streamed documents from a third-party data source specified by a user of the third computing device;
generating, by the representation generator, for a first document in the first plurality of streamed documents, a compound SDR for the first document, before receiving, by the filtering module, a second document in the first plurality of streamed documents;
providing, by the processor on the second computing device, via a data bus, to each of the plurality of memory cells, the compound SDR;
determining, by each of the plurality of bitwise comparison circuits, a level of overlap between the compound SDR and the generated SDR stored in the memory cell associated with the bitwise comparison circuit;
determining, by each of the plurality of bitwise comparison circuits, whether the level of overlap satisfies a threshold provided by the processor; and
acting, by the filtering module, on the first document based upon determination by at least one of the plurality of bitwise comparison circuits, that the level of overlap satisfies the threshold, wherein acting further comprises:
adding the first document in the first plurality of streamed documents to a sub-stream of streamed documents stored in a database accessible by a client agent executing on the third computing device; and
responding to a polling request from the client agent by transmitting the sub-stream to the client agent.
|
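The SDR generation, compound-SDR and overlap-threshold steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that an SDR is an integer bitmask with one bit per cell of the two-dimensional semantic map, that a term's bit is set wherever a document containing that term is located (per-document occurrence counts are ignored), and that a compound SDR is the bitwise OR of its terms' SDRs. The names `build_term_sdrs`, `compound_sdr`, `overlap`, `filter_stream`, `GRID_W` and `GRID_H` are hypothetical, and the hardware bitwise comparison circuits are emulated in software.

```python
from collections import defaultdict

# Hypothetical semantic-map dimensions; a real map would be far larger.
GRID_W, GRID_H = 4, 4


def build_term_sdrs(docs, coords):
    """Map each term to a bitmask with one bit per map cell where the term occurs.

    docs:   {doc_id: text}
    coords: {doc_id: (x, y)}, the coordinate pair assumed to come from clustering
    """
    sdrs = defaultdict(int)
    for doc_id, text in docs.items():
        x, y = coords[doc_id]
        assert 0 <= x < GRID_W and 0 <= y < GRID_H
        bit = 1 << (y * GRID_W + x)
        for term in set(text.lower().split()):
            sdrs[term] |= bit
    return dict(sdrs)


def compound_sdr(text, term_sdrs):
    """Combine the SDRs of a document's known terms (assumption: bitwise OR)."""
    result = 0
    for term in text.lower().split():
        result |= term_sdrs.get(term, 0)
    return result


def overlap(a, b):
    """Count the bit positions set in both SDRs."""
    return bin(a & b).count("1")


def filter_stream(stream, term_sdrs, cells, threshold):
    """Keep each streamed document whose compound SDR overlaps any stored SDR
    by at least `threshold` bits; the result stands in for the sub-stream."""
    sub_stream = []
    for doc in stream:
        sdr = compound_sdr(doc, term_sdrs)
        if any(overlap(sdr, cell) >= threshold for cell in cells):
            sub_stream.append(doc)
    return sub_stream


# Toy reference corpus and map coordinates (illustrative data only).
docs = {"d1": "fever cough", "d2": "cough fatigue", "d3": "fracture cast"}
coords = {"d1": (0, 0), "d2": (1, 0), "d3": (3, 3)}
sdrs = build_term_sdrs(docs, coords)

# One "memory cell" holding the SDR of the filtering criterion "cough".
cells = [sdrs["cough"]]
sub_stream = filter_stream(["fever fatigue", "fracture cast"], sdrs, cells, threshold=2)
```

In this sketch each streamed document is compared against every stored SDR in turn, whereas the claim recites one comparison circuit per memory cell operating on a compound SDR delivered over a data bus. The loop over `cells` stands in for that parallel comparison.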