US 12,461,742 B2
Secure code clustering through LLM-based semantic analysis
Damian Monea, Slatina (RO); Paul Sumedrea, Bucharest (RO); Mihaela-Petruta Gaman, Bucharest (RO); and Alexandru Dinu, Bucharest (RO)
Assigned to CrowdStrike, Inc., Sunnyvale, CA (US)
Filed by CrowdStrike, Inc., Sunnyvale, CA (US)
Filed on Oct. 27, 2023, as Appl. No. 18/496,722.
Prior Publication US 2025/0138819 A1, May 1, 2025
Int. Cl. G06F 9/50 (2006.01); G06F 8/40 (2018.01); G06F 8/75 (2018.01); G06F 9/445 (2018.01); G06F 9/455 (2018.01); G06F 16/2457 (2019.01); G06F 21/56 (2013.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01)
CPC G06F 8/75 (2013.01) [G06F 21/563 (2013.01); G06F 40/30 (2020.01)]
20 Claims
 
1. A method comprising:
providing a plurality of source code samples to an artificial intelligence model (AIM) trained to describe source code based on performing semantic analysis on the source code;
producing, by a processing device using the AIM, a plurality of semantic descriptions that describe the plurality of source code samples;
converting the plurality of semantic descriptions into a plurality of semantic embeddings; and
creating a plurality of clusters from the plurality of semantic embeddings, wherein each one of the plurality of clusters corresponds to two or more of the plurality of source code samples.
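
The four steps of claim 1 (provide samples, produce descriptions, convert to embeddings, create clusters) can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it is hypothetical: `describe` is a keyword-based stand-in for the trained AIM, `embed` uses a simple bag-of-words vector in place of a learned embedding model, and `cluster` uses a greedy cosine-similarity threshold because the claim does not name a clustering algorithm.

```python
import math
import re
from collections import Counter


def describe(source: str) -> str:
    """Hypothetical stand-in for the AIM; a real system would query a trained model."""
    tags = []
    if re.search(r"\b(socket|connect|urlopen|requests)\b", source):
        tags.append("opens network connection")
    if re.search(r"\b(open|write|unlink|remove)\b", source):
        tags.append("modifies local files")
    if re.search(r"\b(exec|eval|subprocess|system)\b", source):
        tags.append("executes dynamic commands")
    return "; ".join(tags) or "no notable behavior"


def embed(description: str) -> Counter:
    """Assumed embedding: bag-of-words counts over the description text."""
    return Counter(re.findall(r"[a-z]+", description.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(embeddings: list, threshold: float = 0.8) -> list:
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    groups = []
    for i, emb in enumerate(embeddings):
        for group in groups:
            if cosine(emb, embeddings[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    # Keep only clusters that cover two or more samples, mirroring the claim wording.
    return [g for g in groups if len(g) >= 2]


def cluster_samples(samples: list) -> list:
    """Run the full pipeline: describe, embed, then cluster. Returns sample indices."""
    descriptions = [describe(s) for s in samples]
    embeddings = [embed(d) for d in descriptions]
    return cluster(embeddings)


# Illustrative inputs only; these are not taken from the patent.
SAMPLES = [
    "import socket\ns = socket.socket()\ns.connect((host, 80))",
    "from urllib.request import urlopen\nurlopen(url)",
    "import os\nos.remove(path)",
    "f = open(path, 'w')\nf.write(data)",
    "print('hello')",
]
```

Calling `cluster_samples(SAMPLES)` groups the two networking snippets together and the two file-handling snippets together, and drops the lone `print` sample because its cluster would contain only one member. The clustering operates on the descriptions rather than on the raw source text, so samples that share no tokens can still land in the same cluster.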