02 Mar 2026

Scraping the Darknet

Structure and Defenses of Underground Forums

We examine underground forums where stolen data is traded and network access is sold: their access model, commerce mechanisms, anti-scraping defenses, and what it takes to collect data from them at scale.

These underground forums attract initial access brokers, ransomware affiliates, fraud operators, and data brokers. Most run on popular forum software such as MyBB, XenForo, or Invision Community, and that shared foundation means a single extraction approach can cover many targets with minimal adaptation. What sets them apart is the architecture layered on top.


Surface Level

At first glance, the common structure of an underground forum looks familiar: categorized sections, thread listings, user profiles. The sections are commonly gated by account tiers: public (unauthenticated), authenticated, premium (paid), and manually vetted.

Some underground forums allow guest browsing with no account required, so the public layer is visible to anyone with a Tor browser. But even within visible threads, individual posts may hide their payload behind a credit cost, a reply requirement, or an access level. Sellers and brokers advertise in the open and mostly transact behind the paywall. What any given tier exposes varies by site: some have substantial public content, while others gate nearly everything.


Marketplace and Actors

FIG 1. MARKETPLACE & ACTORS

Most underground forums have no formal marketplace (no cart, no order system). Actors sell through threads and private messages, posting samples (record counts, a handful of rows, redacted excerpts) as proof, and gating the full dump behind credits or private negotiation.

Some platforms go further with a database index where users can browse available dumps and spend credits to unlock them directly. Capturing listings is time-sensitive: they can be edited or deleted without notice.

Usernames can change over time, which makes tracking difficult. A username change that goes unrecorded leaves a gap in the attribution chain. Some underground forums allow abandoned handles to be claimed by other accounts, which compounds the problem: the same username may represent different actors a month apart.
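The handle-reuse problem above can be illustrated with a small lookup that resolves a username to whichever account held it at a given time. This is a minimal sketch, not a description of any production system. It assumes the forum exposes some account identifier that survives renames, and the names HandleObservation and owner_at are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class HandleObservation:
    """One sighting of a handle (hypothetical record shape)."""
    account_id: str    # assumed: an identifier that survives renames
    handle: str        # display name at the time of the sighting
    seen_at: datetime  # when the collector saw it


def owner_at(observations: Iterable[HandleObservation],
             handle: str,
             when: datetime) -> Optional[str]:
    """Return the account most recently seen using `handle` at or before `when`."""
    candidates = [o for o in observations
                  if o.handle == handle and o.seen_at <= when]
    if not candidates:
        return None  # no sighting yet: a gap in the attribution chain
    return max(candidates, key=lambda o: o.seen_at).account_id


sightings = [
    HandleObservation("101", "ghost", datetime(2026, 1, 5)),
    HandleObservation("101", "ghost_v2", datetime(2026, 1, 20)),  # rename
    HandleObservation("202", "ghost", datetime(2026, 2, 3)),      # handle reclaimed
]
```

The lookup can only be as precise as the spacing between sightings, which is why a missed rename shows up as an unexplained change of owner rather than as a recorded event.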

Beyond threads and posts, which are mostly static, these platforms also have an ephemeral layer: chatbox messages that commonly expire within 24 hours. This is one of the narrowest capture windows in the system.


Defense Mechanisms

The underlying engine is common forum software. What sits on top of it is hardened.

FIG 2. DEFENSE STACK

Although some underground forums operate on the clearnet, many run as Tor hidden services where domains shift across multiple .onion mirrors that rotate, go down, or are chosen by user polls. All mirrors of a forum need to be tracked and treated as a single source. New mirrors have been observed masquerading as seized or defunct ones, so data sourced from mirrors must be treated as mutable and open to correction during analysis.
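One way to keep mirror attribution correctable is to store the mirror host on each captured record and resolve it to a logical source only at analysis time. The sketch below assumes that design. MirrorRegistry, its methods, and the hostnames are hypothetical placeholders.

```python
from typing import Dict, List, Optional


class MirrorRegistry:
    """Maps mirror hostnames to a logical source id (hypothetical design)."""

    def __init__(self) -> None:
        self._source_of: Dict[str, str] = {}

    def assign(self, mirror: str, source_id: str) -> None:
        # Re-assigning is allowed on purpose: attribution is mutable.
        self._source_of[mirror.lower()] = source_id

    def resolve(self, mirror: str) -> Optional[str]:
        return self._source_of.get(mirror.lower())

    def mirrors_of(self, source_id: str) -> List[str]:
        return sorted(m for m, s in self._source_of.items() if s == source_id)


registry = MirrorRegistry()
registry.assign("examplemirror1.onion", "forum-a")  # placeholder hostnames
registry.assign("examplemirror2.onion", "forum-a")

# Later analysis decides the second mirror was an impostor.
registry.assign("examplemirror2.onion", "forum-a-impostor")
```

Because records keep the raw hostname rather than the resolved source, re-assigning a mirror changes how earlier captures are grouped without rewriting them.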

Beyond technical defenses, access itself is gated. Some underground forums require vouching: an existing member stakes their reputation on the newcomer’s behavior. If the newcomer burns trust, the voucher takes the hit. Others gate sections behind minimum post counts, forcing new accounts to contribute before reaching anything valuable. Premium tiers are paid in cryptocurrency. Staff also manually cull accounts that exhibit suspicious patterns (automated browsing behavior, dormant registrations, or anomalous request volumes).

Some users implement their own countermeasures. One tool we observed uses homoglyph obfuscation (replacing Latin characters with visually identical Cyrillic ones). To a human reader the text looks normal. To a scraper or keyword search, it’s unusable. The technique is borrowed from phishing and repurposed as a direct counter by users who know they are being scraped.

FIG 3. HOMOGLYPH OBFUSCATION
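On the collection side, this kind of obfuscation can be partly undone by mapping lookalike Cyrillic code points back to Latin. The sketch below is illustrative only. The mapping table is a small hand-picked subset, not a complete confusables list, and the function names are hypothetical. It rewrites only tokens that mix Latin and Cyrillic letters, so that genuine Cyrillic-language text is left untouched.

```python
import re
import unicodedata

# Partial, hand-picked table of Cyrillic letters that resemble Latin ones.
CYRILLIC_TO_LATIN = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0445": "x", "\u0443": "y", "\u0456": "i",
    "\u0455": "s", "\u0458": "j",
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u041a": "K",
    "\u041c": "M", "\u041d": "H", "\u041e": "O", "\u0420": "P",
    "\u0421": "C", "\u0422": "T", "\u0425": "X",
}
_TABLE = str.maketrans(CYRILLIC_TO_LATIN)


def has_mixed_script(token: str) -> bool:
    """True if the token contains both Latin and Cyrillic letters."""
    scripts = set()
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("CYRILLIC"):
                scripts.add("CYRILLIC")
            elif name.startswith("LATIN"):
                scripts.add("LATIN")
    return len(scripts) > 1


def normalize_homoglyphs(text: str) -> str:
    """Map lookalike Cyrillic letters to Latin, in mixed-script tokens only."""
    parts = re.split(r"(\s+)", text)  # keep whitespace so the text round-trips
    return "".join(p.translate(_TABLE) if has_mixed_script(p) else p
                   for p in parts)


obfuscated = "p\u0430ssw\u043erd dump"   # Cyrillic a and o inside "password"
russian = "\u043f\u0440\u0438\u0432\u0435\u0442"  # a genuine Russian word
```

A word written entirely in lookalike Cyrillic letters would not be flagged by the mixed-script check, so a real pipeline would likely need additional heuristics. Keeping the raw text alongside the normalized form preserves the evidence that obfuscation was used.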

What Collection Requires

FIG 4. COLLECTION INFRASTRUCTURE

Collecting at scale requires:

  • Account management across access tiers, each with a tailored persona (specific timezone, browsing pattern, activity profile).
  • Session handling and cookie persistence, scoped per account and Tor circuit. Each session cookie must stay bound to the Tor circuit and exit IP it was issued on. Mixing cookies across Tor circuits or accounts triggers recaptcha events.
  • Proxy orchestration: Tor circuit pinning to maintain session continuity. Request patterns must appear organic (varying intervals, non-sequential page access) because underground forum operators can and do inspect their logs.
  • Scheduling and throttling: aggressive on public content, conservative on premium accounts. Some underground forums have a large volume of public content worth collecting.
  • Text normalization: homoglyph detection, leetspeak decoding, multilingual content.
  • Timestamped storage for everything we capture.

Every record is versioned. Mutations between collection passes are tracked as events, not overwritten. The history of changes is itself data.
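The versioning described above can be sketched as an append-only event log keyed by record. The names VersionedStore and ChangeEvent, and the event kinds, are hypothetical. A content value of None is assumed here to mean that the record was absent on that collection pass.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    record_id: str
    kind: str                # "created", "edited", "deleted", or "restored"
    observed_at: datetime    # when the collector saw this state
    content: Optional[str]   # None when the record was absent
    digest: Optional[str]    # SHA-256 of content, used to detect changes


class VersionedStore:
    """Append-only history: states are added as events, never overwritten."""

    def __init__(self) -> None:
        self._events: Dict[str, List[ChangeEvent]] = {}

    def observe(self, record_id: str, content: Optional[str],
                observed_at: datetime) -> Optional[ChangeEvent]:
        history = self._events.get(record_id, [])
        digest = (hashlib.sha256(content.encode("utf-8")).hexdigest()
                  if content is not None else None)
        last = history[-1] if history else None
        if last is None:
            if content is None:
                return None          # never seen and still absent
            kind = "created"
        elif last.digest == digest:
            return None              # unchanged since the last pass
        elif content is None:
            kind = "deleted"
        elif last.kind == "deleted":
            kind = "restored"
        else:
            kind = "edited"
        event = ChangeEvent(record_id, kind, observed_at, content, digest)
        self._events.setdefault(record_id, []).append(event)
        return event

    def history(self, record_id: str) -> List[ChangeEvent]:
        return list(self._events.get(record_id, []))


store = VersionedStore()
store.observe("post-1", "selling 10k rows", datetime(2026, 3, 1, 9))
store.observe("post-1", "selling 10k rows", datetime(2026, 3, 1, 12))  # no event
store.observe("post-1", "selling 50k rows", datetime(2026, 3, 2, 9))
store.observe("post-1", None, datetime(2026, 3, 3, 9))
```

Each event keeps the full content rather than a diff, which costs storage but means an edited or deleted listing can be read back exactly as it appeared on any pass.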


Conclusion

Underground forum operators know they’re being scraped, and they adapt accordingly. Defenses gain another layer. New interstitials appear. Tor-circuit binding tightens. More log-checking and account culling.

FIG 5. ESCALATION CYCLE

The asymmetry between attacker and defender (where one side needs a single gap while the other must cover every surface) is usually cited in favor of threat actors targeting organizations. In the collection domain, that asymmetry inverts. The underground forum must defend every vector. The collector only needs one that works.

But gaining access is only the first problem. Infrastructure shifts, countermeasures evolve, content expires: maintaining collection through all of it is continuous work. One mistake can burn an account that took months of persona-building and paid access to establish.

The analysis that follows (tracing aliases, detecting deletions) depends on what was captured and when. Every gap weakens the attribution chain. The infrastructure to close those gaps must be tended for as long as the forums run.
