Source: news.sophos.com – Author: gallagherseanm
Analyzing dark web forums to identify key experts on e-crime
Online criminal forums, both on the public internet and on the “dark web” of Tor .onion sites, are a rich resource for threat intelligence researchers. The Sophos Counter Threat Unit (CTU) has a team of dark web researchers who collect intelligence from and interact with these forums, but combing through the posts is time-consuming and resource-intensive, and it is always possible that things are missed.
As we strive to make better use of AI and data analysis, Sophos AI researcher Francois Labreche, working with Estelle Ruellan of Flare and the Université de Montréal and Masarah Paquet-Clouston of the Université de Montréal, set out to see if they could approach the problem of identifying key actors on the dark web in a more automated way. Their work, originally presented at the 2024 APWG Symposium on Electronic Crime Research, has recently been published as a paper.
The approach
The research team adapted a framework that criminologists Martin Bouchard and Holly Nguyen developed to separate professional criminals from amateurs in an analysis of the criminal cannabis industry, and combined it with social-network analysis. With this, they were able to connect accounts posting in forums to exploits of recent Common Vulnerabilities and Exposures (CVEs), either by the name of the CVE or by matching the post to the CVE’s corresponding Common Attack Pattern Enumerations and Classifications (CAPECs) defined by MITRE.
Using the Flare threat research search engine, they gathered 11,558 posts made by 4,441 individuals between January 2015 and July 2023 on 124 different e-crime forums. The posts mentioned 6,232 different CVEs. The researchers used the data to create a bimodal (two-mode) social network that connected CAPECs to individual actors based on the contents of the actors’ posts. In this initial stage, they narrowed the dataset by removing, for instance, CVEs that have no assigned CAPECs and overly general attack methods that many threat actors use (along with the posters who discussed only those general-purpose CVEs). This filtering ultimately reduced the dataset to 2,321 actors and 263 CAPECs.
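To make the network-building step concrete, here is a minimal Python sketch of how posts could be turned into actor-to-CAPEC links with the kind of filtering described above. It is an illustration only, not the researchers' code: the function name, the input shapes, and the placeholder CVE and CAPEC identifiers are all assumptions made for the example.

```python
from collections import defaultdict


def build_actor_capec_network(posts, cve_to_capecs, general_capecs=frozenset()):
    """Return {actor: set of CAPEC ids}, i.e. the edges of a two-mode network.

    posts          -- iterable of (actor, [CVE ids mentioned in the post])
    cve_to_capecs  -- hypothetical lookup from a CVE id to its CAPEC ids
    general_capecs -- attack patterns judged too general to be informative
    """
    actor_to_capecs = defaultdict(set)
    for actor, cves in posts:
        for cve in cves:
            # A CVE with no assigned CAPEC contributes no edges.
            for capec in cve_to_capecs.get(cve, ()):
                if capec not in general_capecs:
                    actor_to_capecs[actor].add(capec)
    # Actors who only discussed filtered-out CVEs never received a key,
    # so they are dropped from the network automatically.
    return dict(actor_to_capecs)


# Placeholder data (not real identifiers or real forum posts).
posts = [
    ("actor_a", ["CVE-X-1", "CVE-X-2"]),
    ("actor_b", ["CVE-X-3"]),  # CVE-X-3 has no CAPEC mapping
    ("actor_c", ["CVE-X-2"]),
]
cve_to_capecs = {
    "CVE-X-1": ["CAPEC-A"],
    "CVE-X-2": ["CAPEC-B", "CAPEC-GENERAL"],
}
network = build_actor_capec_network(posts, cve_to_capecs, {"CAPEC-GENERAL"})
```

In this toy run, actor_b disappears because its only CVE has no CAPEC, and the general-purpose pattern is removed from every actor's links.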
The research team then used the Leiden community detection algorithm to cluster the actors into communities (“Communities of Interest”) with a shared interest in particular attack patterns. At this stage, eight communities stood out as relatively distinct. On average, individual actors were connected to 13 different CAPECs, while CAPECs were linked with 118 actors.
Figure 1: Bimodal actor-CAPEC networks, colored according to Communities of Interest; the CAPECs are shown in red for clarity
Pinpointing the key actors
Next, key actors were identified based on the expertise they exhibited in each community. Three factors were used to measure level of expertise:
1) Skill Level: This was based on the measurement of skill required to use a CAPEC, as assessed by MITRE: ‘Low,’ ‘Medium,’ or ‘High,’ using the highest skill level among all the scenarios related to the attack pattern, to prevent underestimating actors’ skills. This was done for every CAPEC associated with the actor. To establish a representative skill level, the researchers used the 70th percentile value from each actor’s list of CAPECs and their associated skill levels. (For example, if John Doe discussed 8 CVEs that MITRE maps to 10 CAPECs – 5 rated High by MITRE, 4 rated Medium, and one rated Low – his representative skill level would be considered High.) Choosing this percentile value ensured that only actors with over 30 percent of their values equivalent to “High” would be classified as actually highly skilled.
OVERALL DISTRIBUTION OF SKILL LEVEL VALUES
Skill Level Value | CAPECs | % of Skill Level Values among all values in actors’ list |
Low | 118 (44.87%) | 57.71% |
Medium | 66 (25.09%) | 24.14% |
High | 79 (30.04%) | 18.14% |
SKILL LEVEL VALUES PROPORTION STATISTICS
Skill Level Value | Average proportion of members in the list of actors | Median | 75th percentile | Std |
High | 29.07% | 23.08% | 50.00% | 30.76% |
Medium | 36.12% | 30.77% | 50.00% | 32.41% |
Low | 33.74% | 33.33% | 66.66% | 31.72% |
Figure 2: A breakdown of the skill-level assessments of the actors analyzed in the research
2) Commitment Level: This was quantified by the proportion of ‘in-interest’ posts (posts relating to a set of related CAPECs based on similar Communities of Interest) relative to an actor’s total posts. Actors who had three or fewer posts were disregarded, reducing the set to be evaluated to 359 actors.
3) Activity Rate: The researchers added this element to the Bouchard/Nguyen framework to quantify each actor’s activity level in forums. It was measured by dividing the number of posts with a CVE and corresponding CAPEC by the number of days of the actor’s activity on the relevant forums. Activity rate turned out to be inversely related to the skill level at which threat actors operate: more highly skilled actors have been on the forums for a long time, so their relative activity rate is much lower despite their significant numbers of posts.
DESCRIPTIVE STATISTICS OF SAMPLE
Measure | Mean | Std | Min | Median | 75th percentile | Max |
Length of Skill Level values list | 99.42 | 255.76 | 4 | 25 | 85 | 3449 |
Skill Level (70th percentile value) | 2.19 | 0.64 | 1 | 2 | 3 | 3 |
Number of posts (CVE with CAPEC) | 14.55 | 31.37 | 4 | 6 | 10 | 375 |
% commitment | 36.68 | 29.61 | 0 | 25 | 50 | 100 |
Activity time (days) | 449.07 | 545.02 | 1 | 227.00 | 690.00 | 2669.00 |
Activity rate | 0.72 | 1.90 | 0.002 | 0.04 | 0.20 | 14.00 |
Figure 3: A breakdown of the skill, commitment, and activity rate scores for the sample group
As shown above, the sample for the identification of key actors consisted of 359 actors. The average actor had 36.68% of posts committed to their Community of Interest and had a skill level of 2.19 (‘Medium’). The average activity rate was 0.72.
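The three expertise measures can be sketched in a few lines of Python. This is a rough illustration rather than the researchers' implementation: the function names are invented for the example, and because the article does not say which percentile definition was used, the sketch assumes the simple nearest-rank method.

```python
SKILL_SCORE = {"Low": 1, "Medium": 2, "High": 3}
SKILL_LABEL = {score: label for label, score in SKILL_SCORE.items()}


def representative_skill(skill_labels, percentile=70):
    """Nearest-rank percentile (an assumed definition) of an actor's skill list."""
    scores = sorted(SKILL_SCORE[label] for label in skill_labels)
    # Integer ceiling of percentile/100 * n, avoiding floating-point rounding.
    rank = (percentile * len(scores) + 99) // 100
    return scores[max(rank, 1) - 1]


def commitment_pct(in_interest_posts, total_posts):
    """Share of an actor's posts that fall within their Community of Interest."""
    return 100.0 * in_interest_posts / total_posts


def activity_rate(posts_with_cve_and_capec, active_days):
    """Posts per day of activity on the relevant forums."""
    if active_days <= 0:
        raise ValueError("active_days must be positive")
    return posts_with_cve_and_capec / active_days


# The John Doe example: 10 CAPECs, 5 High, 4 Medium, 1 Low.
john_doe = ["High"] * 5 + ["Medium"] * 4 + ["Low"]
john_doe_skill = SKILL_LABEL[representative_skill(john_doe)]  # "High"

# With exactly 30% High values, the 70th percentile is still Medium.
borderline = ["High"] * 3 + ["Medium"] * 7
borderline_skill = SKILL_LABEL[representative_skill(borderline)]  # "Medium"
```

Under the nearest-rank assumption, an actor needs more than 30 percent of their values at “High” before the representative level becomes High, which matches the behaviour the article describes.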
COMMUNITIES OF INTEREST (COI) OVERVIEW
Community | Community of Interest | Nodes | CAPEC | Actors | % one timers | Mean out-degree per actor | Std (out-degree) | Mean number of specialized posts | Std (posts) |
0 | Privilege escalation | 544 | 19 | 525 | 65.14 | 4 | 7.11 | 2 | 4.76 |
1 | Web-based | 497 | 26 | 471 | 71.97 | 5 | 12.98 | 3 | 18.33 |
2 | General / Diverse | 431 | 103 | 328 | 56.10 | 14 | 33.15 | 7 | 24.89 |
3 | XSS | 319 | 10 | 309 | 71.52 | 2 | 1.18 | 1 | 1.46 |
4 | Recon | 298 | 55 | 243 | 51.44 | 61 | 9.04 | 3 | 6.99 |
5 | Impersonation | 296 | 25 | 271 | 54.61 | 12 | 7.88 | 3 | 5.49 |
6 | Persistence | 116 | 22 | 94 | 41.49 | 26 | 25.76 | 5 | 7.96 |
7 | OIVMM | 83 | 3 | 80 | 85.00 | 1 | 0.31 | 1 | 1.62 |
Figure 4: The relative scores of actors grouped into each Community of Interest
14 needles in a haystack
Finally, to identify the truly key actors — those with high enough skill level and commitment and activity rate to identify them as experts in their domains — the researchers used the K-means clustering algorithm. Using the three measurements created for each actor’s relationship with CAPECs, the 359 actors were clustered into eight clusters with similar levels of all three measurements.
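For readers unfamiliar with K-means, the following Python sketch shows the basic idea on a handful of made-up actors. Several things here are assumptions rather than details from the research: the z-score scaling of the three measures, the deterministic farthest-point initialisation (chosen so the toy example is reproducible), the invented actor values, and the use of two clusters instead of eight. A real analysis would more likely rely on a library implementation.

```python
import statistics


def standardize(rows):
    """Z-score each column so skill, commitment and activity share a scale."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
            for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iterations=100):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels


# Invented actors: (skill score, % commitment, activity rate).
actors = [
    (3, 90.0, 0.30), (3, 95.0, 0.25), (3, 88.0, 0.30),
    (2, 20.0, 0.10), (2, 25.0, 0.12), (1, 22.0, 0.05),
]
labels = k_means(standardize(actors), k=2)
```

On this toy data the first three actors (high skill, high commitment) end up in one cluster and the last three in the other. Scaling matters in a setup like this because commitment is expressed on a 0 to 100 scale while skill only ranges from 1 to 3.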
OVERVIEW OF CLUSTERS
Cluster | Bouchard & Nguyen framework * | Centroid [Skill; Commitment; Activity] | Number | % of sample population |
0 | Amateurs | [2.00; 22.47; 0.11] [Mid; Low; Discrete] | 143 | 39.83 |
1 | Pro-Amateurs | [2.81; 97.62; 5.14] [High; High; Short-lived] | 21 | 5.85 |
2 | Professionals | [2.96; 90.37; 0.28] [High; High; Active] | 14 | 3.90 |
3 | Pro-Amateurs | [2.96; 25.32; 0.12] [High; Low; Discrete] | 86 | 23.96 |
4 | Amateurs | [1.05; 24.32; 0.05] [Low; Low; Discrete] | 43 | 11.98 |
5 | Average Career Criminals | [1.86; 84.81; 0.50] [Low; High; Active] | 36 | 10.02 |
6 | Pro-Amateurs | [2.38; 18.46; 10.67] [Mid; Low; Hyperactive] | 5 | 1.39 |
7 | Amateurs | [1.95; 24.51; 4.14] [Mid; Low; Hyperactive] | 11 | 3.06 |
Figure 5: An analysis of the eight clusters with scoring based on the methodology from the framework developed from the work of criminologists Martin Bouchard and Holly Nguyen; as described above, activity rate was added as a modification to that framework. Note the low number of truly professional actors, even among the dataset of 359
One cluster of 14 actors was graded as “Professionals”: key individuals who are the best in their field, with high skill and commitment and a low activity rate, again because of the length of their involvement with the forums (an average of 159 days) and a post rate that averaged about one post every 3-4 days (the cluster’s centroid activity rate is 0.28 posts per day). They focused on very specific communities of interest and did not post much beyond them, with a commitment level of 90.37%. The analysis approach in this research has inherent limitations, primarily because it relies on MITRE’s mapping of CVEs to CAPECs and on the skill levels assigned by MITRE.
Conclusion
The research process includes defining problems and seeing how various structured approaches might lead to greater insight. Derivatives of the approach described in this research could be used by threat intelligence teams to develop a less biased approach to identifying e-crime masterminds, and Sophos CTU will now start looking at the outputs of this data to see if it can shape or improve our existing human-led research in this area.
Original Post URL: https://news.sophos.com/en-us/2025/06/30/using-ai-to-identify-cybercrime-masterminds/
Category & Tags: AI Research,Threat Research,AI,cybercrime,Dark Web,featured,threat activity cluster,threat actors