Using AI to identify cybercrime masterminds - Source: news.sophos.com

Analyzing dark web forums to identify key experts on e-crime

Online criminal forums, both on the public internet and on the “dark web” of Tor .onion sites, are a rich resource for threat intelligence researchers. The Sophos Counter Threat Unit (CTU) has a team of dark web researchers who collect intelligence from and interact with these forums, but combing through the posts is time-consuming and resource-intensive, and it’s always possible that things are missed.

As we strive to make better use of AI and data analysis, Sophos AI researcher Francois Labreche, working with Estelle Ruellan of Flare and the Université de Montréal and Masarah Paquet-Clouston of the Université de Montréal, set out to see if they could approach the problem of identifying key actors on the dark web in a more automated way. Their work, originally presented at the 2024 APWG Symposium on Electronic Crime Research, has recently been published as a paper.

The approach

The research team combined social-network analysis with a modified version of a framework that criminologists Martin Bouchard and Holly Nguyen developed to separate professional criminals from amateurs in the criminal cannabis industry. This let them connect accounts posting in the forums to exploits of recent Common Vulnerabilities and Exposures (CVEs), either because a post named the CVE or by matching the post to the CVEs’ corresponding Common Attack Pattern Enumeration and Classification (CAPEC) entries, as defined by MITRE.
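To make the CVE-naming route concrete, here is a minimal Python sketch that extracts CVE identifiers from a post with a regular expression and collects the CAPECs associated with each one. The `CVE_TO_CAPEC` table, the `capecs_for_post` helper, and every ID in the table are hypothetical placeholders, not MITRE data, and the second route (matching a post's content to CAPECs) is not shown.

```python
import re

# CVE IDs have the form CVE-<4-digit year>-<sequence of 4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

# Hypothetical lookup table with placeholder IDs (not real MITRE mappings).
CVE_TO_CAPEC = {
    "CVE-2099-0001": {"CAPEC-A", "CAPEC-B"},
    "CVE-2099-0002": {"CAPEC-C"},
}


def capecs_for_post(text: str) -> tuple[set[str], set[str]]:
    """Return (CVEs named in the post, CAPECs those CVEs map to)."""
    # Normalize case so "cve-..." and "CVE-..." count as the same ID.
    cves = {match.upper() for match in CVE_PATTERN.findall(text)}
    capecs: set[str] = set()
    for cve in cves:
        # A CVE with no known CAPEC simply contributes nothing.
        capecs |= CVE_TO_CAPEC.get(cve, set())
    return cves, capecs


cves, capecs = capecs_for_post("Selling PoC for cve-2099-0001 and CVE-2099-0003")
print(sorted(cves), sorted(capecs))
```

Here both CVEs are recognized, but only the first contributes CAPECs, which is the situation the filtering step below deals with.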

Using the Flare threat research search engine, they gathered 11,558 posts made by 4,441 individuals between January 2015 and July 2023 on 124 different e-crime forums. The posts mentioned 6,232 different CVEs. The researchers used the data to build a bimodal social network linking individual actors to CAPECs based on the contents of the actors’ posts. In this initial stage, they filtered the dataset, removing, for instance, CVEs with no assigned CAPECs, overly general attack patterns that many threat actors use, and posters who only discussed those general-purpose CVEs. This filtering ultimately reduced the dataset to 2,321 actors and 263 CAPECs.
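A minimal sketch of how such a bimodal network might be assembled and filtered, assuming each post has already been reduced to an (actor, set of CAPECs) pair. The post does not say how "overly general" attack patterns were defined, so the `max_actor_share` cutoff below (drop any CAPEC linked to more than half of all actors) is a hypothetical stand-in, as is the `build_actor_capec_network` name.

```python
from collections import defaultdict


def build_actor_capec_network(posts, max_actor_share=0.5):
    """posts: iterable of (actor, set_of_capecs) pairs.

    Returns the bimodal network as a dict: actor -> set of linked CAPECs.
    """
    actor_to_capecs = defaultdict(set)
    for actor, capecs in posts:
        actor_to_capecs[actor] |= set(capecs)

    # Drop actors whose posts only mentioned CVEs with no assigned CAPEC.
    actor_to_capecs = {a: c for a, c in actor_to_capecs.items() if c}

    # Degree of each CAPEC node: how many actors it is linked to.
    capec_degree = defaultdict(int)
    for capecs in actor_to_capecs.values():
        for capec in capecs:
            capec_degree[capec] += 1

    # Hypothetical "too general" rule: linked to more than max_actor_share of actors.
    n_actors = len(actor_to_capecs)
    general = {c for c, d in capec_degree.items() if d / n_actors > max_actor_share}

    # Remove general CAPECs, then actors who only discussed general patterns.
    filtered = {a: c - general for a, c in actor_to_capecs.items()}
    return {a: c for a, c in filtered.items() if c}


posts = [
    ("alice", {"CAPEC-A", "CAPEC-G"}),
    ("bob", {"CAPEC-G"}),
    ("carol", {"CAPEC-B", "CAPEC-G"}),
    ("dave", set()),  # only mentioned CVEs with no CAPEC
]
network = build_actor_capec_network(posts)
print(network)
```

In this toy data, CAPEC-G is linked to every actor and is dropped as too general, which in turn removes bob, who discussed nothing else.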

The research team then used the Leiden community detection algorithm to cluster the actors into communities (“Communities of Interest”) that shared an interest in particular attack patterns. At this stage, eight communities stood out as relatively distinct. On average, each actor was connected to 13 different CAPECs, and each CAPEC was linked to 118 actors.
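The two averages above are the mean degrees of the two node types in the bimodal network. The sketch below computes them from an actor-to-CAPECs mapping; it does not implement Leiden itself (libraries such as python-igraph provide `Graph.community_leiden` for that), and the `mean_degrees` helper and toy data are hypothetical.

```python
def mean_degrees(actor_to_capecs):
    """Return (mean CAPECs per actor, mean actors per CAPEC)."""
    # Every actor-CAPEC link is one edge, counted once from the actor side.
    n_edges = sum(len(capecs) for capecs in actor_to_capecs.values())
    all_capecs = set().union(*actor_to_capecs.values())
    return n_edges / len(actor_to_capecs), n_edges / len(all_capecs)


toy = {"a1": {"C1", "C2"}, "a2": {"C1", "C2"}, "a3": {"C1", "C2"}}
print(mean_degrees(toy))  # (2.0, 3.0)
```

Because both averages divide the same edge count, actors × mean actor degree must equal CAPECs × mean CAPEC degree. With the reported figures, 2,321 × 13 ≈ 30,200 and 263 × 118 ≈ 31,000, which agree once the rounding of the averages is allowed for.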

Figure 1: Bimodal actor-CAPEC networks, colored according to Communities of Interest; the CAPECs are shown in red for clarity

Pinpointing the key actors

Next, key actors were identified based on the expertise they exhibited in each community. Three factors were used to measure level of expertise:

1) Skill Level: This was based on the skill level MITRE assesses as required to use each CAPEC: ‘Low,’ ‘Medium,’ or ‘High.’ Where an attack pattern has several scenarios, the researchers used the highest skill level among them, to avoid underestimating actors’ skills. This was done for every CAPEC associated with the actor. To establish a representative skill level, the researchers then took the 70th percentile of each actor’s list of CAPEC skill levels. (For example, if John Doe discussed 8 CVEs that MITRE maps to 10 CAPECs, with 5 rated High by MITRE, 4 rated Medium, and 1 rated Low, his representative skill level would be High.) Choosing this percentile ensured that only actors with over 30 percent of their values at “High” would be classified as highly skilled.

OVERALL DISTRIBUTION OF SKILL LEVEL VALUES

Skill Level Value	Number of CAPECs (% of 263)	% of all Skill Level values across actors’ lists
Low	118 (44.87%)	57.71%
Medium	66 (25.09%)	24.14%
High	79 (30.04%)	18.14%

SKILL LEVEL VALUES PROPORTION STATISTICS

Skill Level Value	Mean share of values in an actor’s list	Median	75th percentile	Std
High	29.07%	23.08%	50.00%	30.76%
Medium	36.12%	30.77%	50.00%	32.41%
Low	33.74%	33.33%	66.66%	31.72%

Figure 2: A breakdown of the skill-level assessments of the actors analyzed in the research

2) Commitment Level: This was measured as the proportion of an actor’s posts that were ‘in-interest,’ meaning they related to the set of CAPECs associated with the actor’s Community of Interest. Actors with three or fewer posts were excluded, reducing the set to be evaluated to 359 actors.

3) Activity Rate: The researchers added this element to the Bouchard/Nguyen framework to quantify each actor’s activity level in the forums. It was calculated by dividing the number of an actor’s posts that mention a CVE with a corresponding CAPEC by the number of days the actor was active on the relevant forums. Activity rate turns out to be inversely related to skill level: more highly skilled actors have been on the forums for a long time, so their activity rate is much lower even though they have posted a significant amount.

DESCRIPTIVE STATISTICS OF SAMPLE

Measure	Mean	Std	Min	Median	75th percentile	Max
Length of Skill Level values list	99.42	255.76	4	25	85	3449
Skill Level (70th percentile value)	2.19	0.64	1	2	3	3
Number of posts (CVE with CAPEC)	14.55	31.37	4	6	10	375
% commitment	36.68	29.61	0	25	50	100
Activity time (days)	449.07	545.02	1	227	690	2669
Activity rate (posts per day)	0.72	1.90	0.002	0.04	0.20	14.00

Figure 3: A breakdown of the skill, commitment, and activity rate scores for the sample group
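The three measurements can be sketched in Python as below. This is a minimal illustration resting on assumptions the post does not state: the 70th percentile uses the nearest-rank method (an interpolating percentile could return values between levels), a post counts as in-interest if any of its CAPECs belongs to the actor’s Community of Interest, and activity time is the inclusive number of days between an actor’s first and last post. The function names, input shapes, and CAPEC IDs are hypothetical, and the posts are made up.

```python
import math
from datetime import date

SKILL_SCORES = {"Low": 1, "Medium": 2, "High": 3}
MIN_POSTS = 4  # actors with three or fewer posts are excluded


def representative_skill(levels, percentile=70):
    """Nearest-rank percentile (an assumption) of an actor's CAPEC skill levels."""
    scores = sorted(SKILL_SCORES[level] for level in levels)
    rank = math.ceil(percentile * len(scores) / 100)  # 1-based rank
    return scores[rank - 1]


def commitment(post_capecs, interest_capecs):
    """Percentage of posts whose CAPECs overlap the Community of Interest."""
    in_interest = sum(1 for capecs in post_capecs if capecs & interest_capecs)
    return 100 * in_interest / len(post_capecs)


def activity_rate(post_dates):
    """Posts per day; activity time assumed to be the inclusive first-to-last span."""
    days = (max(post_dates) - min(post_dates)).days + 1
    return len(post_dates) / days


def score_actor(skill_levels, posts, interest_capecs):
    """posts: list of (date, set of CAPECs) pairs.

    Returns (skill, commitment %, activity rate), or None if the actor is excluded.
    """
    if len(posts) < MIN_POSTS:
        return None
    dates = [d for d, _ in posts]
    capecs = [c for _, c in posts]
    return (representative_skill(skill_levels),
            commitment(capecs, interest_capecs),
            activity_rate(dates))


# John Doe's skill levels from the example above, plus four made-up posts.
john_doe_levels = ["High"] * 5 + ["Medium"] * 4 + ["Low"]
john_doe_posts = [
    (date(2023, 1, 1), {"CAPEC-A"}),
    (date(2023, 1, 5), {"CAPEC-A", "CAPEC-B"}),
    (date(2023, 1, 8), {"CAPEC-C"}),
    (date(2023, 1, 10), {"CAPEC-D"}),
]
print(score_actor(john_doe_levels, john_doe_posts, {"CAPEC-A", "CAPEC-B"}))  # (3, 50.0, 0.4)
```

With the nearest-rank rule, 4 High values out of 10 are enough to reach the 70th percentile, while 3 out of 10 (exactly 30 percent) are not, which matches the “over 30 percent” threshold described for the skill level.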

As shown above, the sample used to identify key actors consisted of 359 actors. On average, an actor had 36.68% of their posts in their Community of Interest, a skill level of 2.19 (roughly ‘Medium’), and an activity rate of 0.72 posts per day, although the much lower median activity rate of 0.04 shows that a small number of very active actors pull that mean up.

COMMUNITIES OF INTEREST (COI) OVERVIEW

Community	Community of Interest	Nodes (CAPECs + actors)	CAPECs	Actors	% one-timers	Mean out-degree per actor	Std (out-degree)	Mean number of specialized posts	Std (posts)
0	Privilege escalation	544	19	525	65.14	4	7.11	2	4.76
1	Web-based	497	26	471	71.97	5	12.98	3	18.33
2	General / Diverse	431	103	328	56.10	14	33.15	7	24.89
3	XSS	319	10	309	71.52	2	1.18	1	1.46
4	Recon	298	55	243	51.44	61	9.04	3	6.99
5	Impersonation	296	25	271	54.61	12	7.88	3	5.49
6	Persistence	116	22	94	41.49	26	25.76	5	7.96
7	OIVMM	83	3	80	85.00	1	0.31	1	1.62

Figure 4: The relative scores of actors grouped into each Community of Interest

14 needles in a haystack
Finally, to identify the truly key actors, meaning those whose skill level, commitment, and activity rate mark them as experts in their domains, the researchers used the K-means clustering algorithm. Using the three measurements computed for each actor, they grouped the 359 actors into eight clusters of actors with similar values on all three measurements.
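The clustering step can be illustrated with a small, dependency-free implementation of Lloyd’s k-means. Several choices here are assumptions rather than details from the research: the three features are z-score standardized first (skill runs from 1 to 3 while commitment runs from 0 to 100, so unscaled distances would be dominated by commitment), centroids are seeded deterministically with farthest-point initialization, and the made-up data uses k=2 rather than the eight clusters used in the research.

```python
import math


def standardize(rows):
    """Z-score each column so all three measurements are on comparable scales."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up [skill, commitment %, activity rate] rows for six actors.
actors = [
    [3.0, 95.0, 0.30], [3.0, 90.0, 0.25], [2.9, 88.0, 0.30],
    [1.0, 20.0, 0.05], [1.2, 25.0, 0.04], [1.0, 22.0, 0.06],
]
labels, _ = kmeans(standardize(actors), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice a library implementation such as scikit-learn’s `KMeans` (which uses k-means++ seeding by default) would usually be preferred; the hand-rolled version just makes the assign-then-update loop visible.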

OVERVIEW OF CLUSTERS

Cluster	Bouchard & Nguyen framework *	Centroid [Skill; Commitment; Activity]	Number of actors	% of sample population
0	Amateurs	[2.00; 22.47; 0.11] [Mid; Low; Discrete]	143	39.83
1	Pro-Amateurs	[2.81; 97.62; 5.14] [High; High; Short-lived]	21	5.85
2	Professionals	[2.96; 90.37; 0.28] [High; High; Active]	14	3.90
3	Pro-Amateurs	[2.96; 25.32; 0.12] [High; Low; Discrete]	86	23.96
4	Amateurs	[1.05; 24.32; 0.05] [Low; Low; Discrete]	43	11.98
5	Average Career Criminals	[1.86; 84.81; 0.50] [Low; High; Active]	36	10.02
6	Pro-Amateurs	[2.38; 18.46; 10.67] [Mid; Low; Hyperactive]	5	1.39
7	Amateurs	[1.95; 24.51; 4.14] [Mid; Low; Hyperactive]	11	3.06

Figure 5: The eight clusters, labeled according to the framework developed by criminologists Martin Bouchard and Holly Nguyen, with activity rate added as a modification to that framework as described above. Note how few actors qualify as truly professional, even within the sample of 359.

One cluster of 14 actors was graded as “Professionals”: the key individuals, the best in their field. They combined high skill and commitment with an activity rate of 0.28 posts per day, or about one post every 3-4 days, over an average of 159 days on the forums. They focused on very specific Communities of Interest and posted little beyond them, with a commitment level of 90.37%.

The approach has inherent limitations, primarily its reliance on MITRE’s CVE-to-CAPEC mapping and on the skill levels MITRE assigns to each CAPEC.

Conclusion

Research often starts by defining a problem and testing how different structured approaches might yield greater insight. Derivatives of the approach described here could help threat intelligence teams identify e-crime masterminds in a less biased way, and Sophos CTU will now start looking at the outputs of this data to see whether they can shape or improve our existing human-led research in this area.

Original Post URL: https://news.sophos.com/en-us/2025/06/30/using-ai-to-identify-cybercrime-masterminds/

Category & Tags: AI Research, Threat Research, AI, cybercrime, Dark Web, featured, threat activity cluster, threat actors