Anonymous Messaging via DHTs Doesn't Work

Everything turns into bitmessage

I ran across discussion of chitchatter on HN and felt an obligation to write up an unpublished "Negative Result" from my misspent PhD days working on optimizations for Distributed Hash Tables.

First note that this isn't an attack piece on the authors of chitchatter or similar chat-on-dht tools. Webtorrent makes this sort of thing relatively easy to do now (you had to run a kademlia client before this). It is a breakdown of why almost any method that actually leverages the DHT by definition can't be anonymous.

Note that "Anonymous Messaging '' is solved (it could stand for a re-design, but principles are settled) in bitmessage. I don't actually recommend folks use that software, but the basic principle of it is:

I want to minimize information gain about the destination of my messages over time

The bitmessage system achieves this by sending all messages to all participants. This only reveals the information that the intended recipient will use bit-message at some point in the future.

A "meet on the dht" model of anonymity only works when the nodes serving that DHT are not tracking your requests. This leads to a core property of highly successful p2p systems: "Make sure the most effective way to spy on your system is to support it". DHTs and TOR leverage this feature. The primary means of spying (or even attack/DOS) is to provide nodes that at least seem to be "good citizens". So to "de-anonymize" a user of chitchatter or a similar app, one needs to host the dht node they are sharing. Then an attacker can at best know when (and how large) messages are sent to a group. By default the infrastructure of WebTorrent supports scraping, so records are effectively public. A malicious node can effectively choose any hash it wants when joining the DHT (yes you can make it arbitrarily hard via proof of work but that doesn't buy much without breaking nonmalicious use cases) and so can ensure it is the primary "meet in the middle" server for any given hash.

WebRTC servers are even worse, because they can MITM attack any connections. Even if it can't read messages, it can gain total knowledge of what messages are sent (along with origin and destination) by providing bogus NAT-busting and acting as a fallback intermediary.

"But I can do something clever with changing our meet-in-the-middle hash over time to prevent attackers from knowing where to look". This is a good idea (that I chased for a while) but the solution degrades into "I'll send messages to X% of the addresses on the DHT and mask our meet-in-the-middle in obfuscating traffic". This DOES WORK, until your chat app actually gets popular and regular piracy traffic ceases to obfuscate your chat messages. If you want to scale independently of web-piracy, your system degrades into bitmessage or tor as you increase redundancy of messaging to obfuscate sender and recipient.

A motivated attacker can still probably sybil and eclipse an entire DHT trivially if they want. The reason this doesn't happen often in the wild (or if it does, they don't disrupt things) isn't because these systems are resistant to it, it is because there isn't a motivation to break the system when it provides such a potent surveillance device while functioning as intended. Things like I know what you download are obvious applications of this.

It is worth reiterating, because it basically dooms anonymity in p2p systems. All successful p2p systems are those that provide effective surveillance in exchange for being supported by malicious nodes. Sybil/Eclipse attack vulnerability is a feature to ensure plentiful nodes.