CoinJoin entropy becomes computationally infeasible to calculate exactly for large transactions, but lower-bound estimates still provide rigorous privacy guarantees grounded in information theory.
The Ancient Problem of Impossible Counting
In the third century BC, Archimedes set out to prove something that most people thought was nonsense: that the number of grains of sand needed to fill the entire universe was finite and expressible. His treatise “The Sand Reckoner” didn’t just solve the problem; it invented new mathematical notation to handle numbers far beyond what Greek mathematics could previously express.
Archimedes arrived at 10^63 grains, a number so vast it exceeds human intuition but remains mathematically tractable. The key insight wasn’t knowing the exact count but rather establishing rigorous bounds.
Today, Bitcoin privacy researchers face a strikingly similar problem. When analyzing a WabiSabi CoinJoin transaction with 300 inputs and 300 outputs, they must answer: how many valid interpretations of this transaction exist? And the answer, much like Archimedes’ sand, turns out to be a number so large it exceeds not just human intuition but computational feasibility: somewhere around 2^600 possible mappings.
But here’s the twist that makes this a feature rather than a bug: unlike Archimedes, we don’t need to know the exact count. We only need to establish lower bounds. And information theory guarantees that those lower bounds translate directly into cryptographic security.
What Is Boltzmann Entropy and Why Does It Matter?
In 2015, a researcher named LaurentMT introduced the Boltzmann algorithm for analyzing Bitcoin transactions. The name isn’t accidental. It borrows directly from Ludwig Boltzmann’s statistical mechanics, where entropy measures the number of microscopic arrangements (microstates) that produce the same observable outcome (macrostate).
For a Bitcoin transaction, the analogy works like this:
The macrostate is what any observer can see on the blockchain: the list of inputs, the list of outputs, and their respective amounts. The microstates are all the different ways those inputs could have actually paid those outputs while still producing the same observable transaction.
A simple payment with one input and two outputs (payment + change) typically has exactly one valid interpretation: the input paid both outputs. Entropy equals zero. But a CoinJoin deliberately creates ambiguity. If five people each contribute one input and receive one output, and all outputs are the same denomination, which input paid which output? Without additional information, there are 5! = 120 equally valid interpretations.
The entropy is simply the logarithm (base 2) of the number of valid interpretations: E = log₂(N). For 120 interpretations, that’s roughly 6.9 bits of entropy.
This number has a direct, concrete meaning: an analyst trying to determine the true mapping has, at best, a 1-in-120 chance of guessing correctly. Each bit of entropy roughly doubles their work.
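The relationship between interpretation count, entropy, and guessing odds can be checked directly. A minimal sketch:

```python
import math

def mapping_entropy(n_interpretations: int) -> float:
    """Entropy in bits of a uniform distribution over valid mappings."""
    return math.log2(n_interpretations)

# Five participants, one input and one equal-valued output each:
# any of the 5! bijections is a valid interpretation.
n = math.factorial(5)         # 120 valid interpretations
bits = mapping_entropy(n)     # ≈ 6.9 bits
guess_probability = 1 / n     # best single-guess success: 1 in 120
```

Doubling the number of interpretations adds exactly one bit, which is why entropy is the natural unit for “how much work does the analyst face.”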
The Computational Wall
The Boltzmann tool works by exhaustively enumerating all valid input-output mappings. For each potential grouping of inputs, it checks whether any subset of outputs sums to the same value. If so, that’s a valid partial mapping, and the algorithm continues recursively.
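As a toy illustration of that first enumeration step (not LaurentMT’s actual implementation), the following counts (input-subset, output-subset) pairs whose sums agree, i.e. candidate ways to split a transaction into two independent sub-transactions:

```python
from itertools import combinations

def subset_sums(values):
    """Map each achievable sum to the index-subsets that produce it."""
    sums = {}
    for r in range(1, len(values)):
        for combo in combinations(range(len(values)), r):
            sums.setdefault(sum(values[i] for i in combo), []).append(combo)
    return sums

def count_matching_splits(inputs, outputs):
    """Count (input-subset, output-subset) pairs with equal sums.
    Each two-way split is found from both sides, so complements
    appear twice; the real algorithm recurses and deduplicates."""
    out_sums = subset_sums(outputs)
    count = 0
    for r in range(1, len(inputs)):
        for combo in combinations(range(len(inputs)), r):
            count += len(out_sums.get(sum(inputs[i] for i in combo), []))
    return count
```

Two equal inputs paying two equal outputs already yield four matching pairs; every additional coincidence of sums multiplies the work, which is exactly the explosion described next.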
The problem is that this enumeration grows combinatorially. The number of ways to partition n items is given by the Bell number B_n, which grows faster than exponentially:
- B_10 = 115,975
- B_15 = 1,382,958,545
- B_20 ≈ 5.17 × 10^13
And that’s just the input partitions. Each partition must then be matched against output partitions, which have their own Bell number growth. The full complexity is approximately O((n/ln(n))^n × (m/ln(m))^m) for n inputs and m outputs.
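The Bell numbers quoted above can be reproduced with the Bell triangle recurrence; a short sketch:

```python
def bell_numbers(n_max: int) -> list[int]:
    """Compute Bell numbers B_0..B_n_max via the Bell triangle:
    each row starts with the previous row's last entry, and each
    subsequent entry adds the element above to the one on the left."""
    row = [1]
    bells = [1]                      # B_0 = 1
    for _ in range(n_max):
        nxt = [row[-1]]
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
        bells.append(row[0])
    return bells

b = bell_numbers(20)
# b[10] == 115_975, b[15] == 1_382_958_545, b[20] ≈ 5.17e13
```

The recurrence itself is cheap; it is enumerating the partitions the numbers count, not computing the numbers, that is intractable.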
LaurentMT, with characteristic self-deprecation, describes his implementation as “the worst implementation in the world” because it’s fundamentally brute-force. The tool defaults to a maximum of 12 inputs or outputs (maxnbtxos=12) and times out after 600 seconds. Even with optimizations like input merging and constraint propagation, transactions beyond roughly 15-20 UTXOs become computationally intractable.
This isn’t a software limitation. The underlying problem, counting solutions to a subset-sum variant, is #P-complete. This complexity class covers counting problems whose individual solutions can be verified in polynomial time, but whose totals are believed to require exponential resources to compute. No algorithmic breakthrough will make exact Boltzmann calculation tractable for large transactions because the problem itself is fundamentally hard.
WabiSabi’s Deliberate Design
The WabiSabi protocol, deployed in Wasabi Wallet 2.0, doesn’t try to avoid this computational wall. It deliberately runs straight into it.
Earlier CoinJoin protocols like Whirlpool used fixed denominations: everyone contributes the same amount and receives the same amount. This makes entropy calculation trivial. If k outputs share the same denomination, the anonymity set is k and entropy is log₂(k). Simple, calculable, and limited.
WabiSabi takes a different approach. Using keyed-verification anonymous credentials and Pedersen commitments, it allows participants to contribute arbitrary amounts and receive multiple outputs of various denominations. The “Dynadenomination” algorithm decomposes inputs into 79 standard denominations (powers of 2, powers of 3, 1-2-5 series) specifically chosen to maximize overlap between participants.
A typical WabiSabi round might include:
- 300 inputs from dozens of participants
- 300 outputs across multiple denomination tiers
- Variable change handling that eliminates deterministic links
- Minimum participation threshold of just 5,000 sats
The mapping space for such a transaction is approximately 2^n × 2^m, where n and m are the input and output counts. For n=300 and m=300, that’s around 2^600 possible interpretations, a number that exceeds the estimated atoms in the observable universe (roughly 10^80) by a factor of 10^100.
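A back-of-envelope check of that comparison, working in base-10 logarithms to avoid materializing the huge numbers:

```python
import math

n_inputs, n_outputs = 300, 300
mapping_bits = n_inputs + n_outputs              # 2^600 mappings (per the 2^n × 2^m estimate)

mappings_log10 = mapping_bits * math.log10(2)    # ≈ 180.6, i.e. ~10^180 mappings
atoms_log10 = 80                                 # ~10^80 atoms in the observable universe
excess_log10 = mappings_log10 - atoms_log10      # ≈ 100.6: a factor of ~10^100
```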
No computer that could ever exist, not even one consuming the energy output of every star in every galaxy, could enumerate all these mappings before the heat death of the universe. The combinatorial explosion is the privacy mechanism.
Why Lower Bounds Are Enough
Here’s where critics sometimes object. If we can’t calculate exact entropy, how do we know the transaction is actually private? Isn’t this just “trust me bro” security?
The answer lies in information theory, specifically in the relationship between entropy and guessing probability.
The min-entropy of a distribution is defined as:
H_min(X) = -log₂(max p_i) = log₂(1/p_max)
If we can prove that entropy is at least k bits, then the adversary’s probability of guessing the correct mapping in a single attempt is at most 2^(-k). This isn’t hand-waving; it’s the same mathematical foundation that underlies all modern cryptography.
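The definition translates directly into code; a minimal sketch:

```python
import math

def min_entropy(probabilities) -> float:
    """Min-entropy in bits: -log2 of the most likely outcome."""
    return -math.log2(max(probabilities))

# Uniform over 120 mappings: min-entropy equals Shannon entropy.
uniform = [1 / 120] * 120
h_min = min_entropy(uniform)          # ≈ 6.9 bits

# A skewed distribution: one mapping at 25% dominates, and
# min-entropy collapses to exactly 2 bits regardless of the rest.
skewed = [0.25] + [0.75 / 119] * 119
h_min_skewed = min_entropy(skewed)    # 2.0 bits
```

Min-entropy is the pessimistic measure: it only credits the adversary’s best single guess, which is precisely why it is the right quantity for security claims.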
Consider what different entropy levels mean:
| Entropy (bits) | Single-Guess Success | Guesses for 50% (with replacement) |
|---|---|---|
| 10 | ~0.001 (1 in 1,024) | ~700 |
| 20 | ~10^-6 | ~700,000 |
| 40 | ~10^-12 | ~7 × 10^11 |
| 80 | ~10^-24 | ~8 × 10^23 |
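The table values follow from two small formulas, assuming independent guesses with replacement:

```python
import math

def single_guess_success(entropy_bits: float) -> float:
    """Best-case probability of guessing the true mapping in one try."""
    return 2.0 ** -entropy_bits

def guesses_for_half(entropy_bits: float) -> float:
    """Independent guesses (with replacement) for a 50% success chance:
    solve 1 - (1 - p)^k = 0.5, giving k ≈ ln(2) / p for small p."""
    return math.log(2) / single_guess_success(entropy_bits)

table = {bits: (single_guess_success(bits), guesses_for_half(bits))
         for bits in (10, 20, 40, 80)}
# e.g. table[10] ≈ (0.001, 710)
```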
A transaction with just 40 bits of proven minimum entropy gives an adversary worse than one-in-a-trillion odds. And that’s the conservative lower bound, not the true entropy.
The connection between Shannon entropy and min-entropy deserves care: H_min ≤ H_Shannon for any distribution, so a bound on Shannon entropy alone does not bound guessing probability. But when every valid mapping is a priori equally likely, as in a CoinJoin where no interpretation is observably preferred, the distribution is uniform and the two coincide: H_min = H_Shannon = log₂(N). A conservative lower bound on N therefore bounds both, and with it the guessing probability.
How Wasabi Calculates Anonymity Scores
Wasabi Wallet displays an “anonymity score” for each coin, but this score is explicitly designed to be a conservative underestimate. The calculation:
- Counts only outputs with exactly matching denominations
- Applies reduction factors when the same user has multiple outputs of the same denomination
- Uses weighted averaging for inherited anonymity from input coins
- Deliberately ignores cross-denomination ambiguity
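Those rules might be sketched roughly as follows. This is a hypothetical toy model for illustration, not Wasabi’s actual code; every name and formula here is an assumption:

```python
def conservative_anonscore(my_value: int,
                           round_output_values: list[int],
                           my_equal_outputs: int,
                           input_scores: list[float],
                           input_amounts: list[float]) -> float:
    """Toy sketch (NOT Wasabi's implementation) of a conservative score:
    count only exactly-matching denominations, discount the user's own
    duplicates, and inherit prior anonymity as an amount-weighted average."""
    # Anonymity gained this round: outputs with the exact same value,
    # minus the user's own extra copies (they add no real ambiguity).
    equal = round_output_values.count(my_value)
    gained = max(equal - (my_equal_outputs - 1), 1)
    # Anonymity inherited from the coins that funded this round.
    total = sum(input_amounts)
    inherited = sum(s * a for s, a in zip(input_scores, input_amounts)) / total
    return inherited + gained - 1
```

Note what the sketch ignores, just as the real score does: any ambiguity contributed by outputs of *other* denominations. That omission is what makes the result a floor.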
The documentation is explicit about this philosophy: “We are trying to do lower estimations, rather than higher ones.”
A typical “Maximize Privacy” target might be an anonymity score of 27-76. The documentation suggests 50 is “sufficient to evade low-level blockchain forensics.” But these numbers represent only the most easily countable form of ambiguity, the number of same-denomination outputs.
The true anonymity includes additional layers:
Tier 1 (Verifiable): The displayed score counts identical outputs. This is the minimum guarantee.
Tier 2 (Probabilistic): Multiple valid sub-mappings exist across different denomination tiers. These can’t be precisely enumerated but provide genuine additional protection.
Tier 3 (Computational): Full analysis requires 2^600+ operations. Even an adversary who knew the exact algorithm couldn’t complete it.
The Cryptographic Parallel
This approach mirrors how we evaluate cryptographic security generally. No one demands a proof that AES-256 is literally impossible to break. Instead, we rely on the fact that the best known key-recovery attacks still require on the order of 2^254 operations, which exceeds the computational capacity of any conceivable adversary.
The security claim isn’t “this can never be broken” but rather “breaking this requires resources that don’t exist and won’t exist.”
Similarly, WabiSabi’s privacy claim isn’t “we’ve calculated exactly how many interpretations exist.” It’s “we’ve proven the minimum number of interpretations exceeds any adversary’s ability to enumerate, and even the conservative lower bounds translate to astronomically low success probabilities.”
Formal Guarantees from Approximation Theory
For those who want even stronger assurance, approximation theory provides rigorous frameworks for estimating #P-hard counting problems.
Chernoff bounds establish that when estimating a mean μ from N independent samples in [0, 1], the probability that the empirical mean deviates from μ by more than εμ is at most 2e^(−Nμε²/3). This means randomized sampling can provide arbitrarily tight estimates with provable confidence intervals.
Fully Polynomial Randomized Approximation Schemes (FPRAS) can achieve (1+ε)-approximation with probability at least (1-δ) in time polynomial in input size, 1/ε, and log(1/δ). Even for #P-hard problems, we can get arbitrarily good estimates with provable guarantees.
Massey’s guessing entropy inequality provides another bound: G(X) ≥ 2^(H(X)-2) + 1, meaning the expected number of guesses to identify the correct interpretation grows exponentially with entropy.
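Massey’s bound can be sanity-checked against an optimal guesser. A small sketch:

```python
import math

def shannon_entropy(probs) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_guesses(probs) -> float:
    """Expected number of guesses under the optimal strategy:
    try outcomes in decreasing order of probability."""
    ordered = sorted(probs, reverse=True)
    return sum((i + 1) * p for i, p in enumerate(ordered))

def massey_lower_bound(probs) -> float:
    """Massey's inequality: G(X) >= 2^(H(X)-2) + 1 (for H(X) >= 2 bits)."""
    return 2 ** (shannon_entropy(probs) - 2) + 1

uniform = [1 / 1024] * 1024   # 10 bits of entropy
# expected_guesses(uniform) == 512.5, massey_lower_bound(uniform) == 257.0
```

For the uniform case the true expected guess count ((N+1)/2) comfortably exceeds Massey’s floor, and both grow exponentially in the entropy.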
These aren’t ad-hoc arguments. They’re the same mathematical tools used throughout theoretical computer science and cryptography to reason about problems we can’t solve exactly.
The Information-Theoretic Intuition
Perhaps the most intuitive way to understand why lower bounds suffice is through information theory’s basic premise: each bit of entropy represents one binary question an adversary must answer correctly.
If a transaction has 40 bits of entropy, an adversary needs to obtain 40 bits of auxiliary information to deanonymize it. That might mean:
- Learning which IP address submitted which input (but this is hidden behind Tor)
- Timing analysis across multiple transactions (but participants’ timing is randomized)
- Cross-referencing with exchange data (but this only helps if the user cashes out through KYC)
Each additional bit of entropy doubles the adversary’s information requirements. The privacy doesn’t depend on them being unable to calculate the exact entropy; it depends on them lacking the information to identify the correct interpretation among the vast possibility space.
What This Means in Practice
For WabiSabi CoinJoin users, the practical implications are:
The displayed anonymity score is a floor, not a ceiling. Your true privacy is higher than the number shown, potentially by orders of magnitude.
Computational intractability is your friend. The fact that exact entropy can’t be calculated isn’t a limitation; it’s a manifestation of the same mathematical barriers that protect you from adversaries.
Conservative estimates are honest estimates. A system that claims “at least 50” rather than “exactly 847” is being transparent about what can be rigorously proven versus what is likely but unverifiable.
The security model is standard cryptography. You’re trusting the same type of computational hardness assumptions that protect your Bitcoin private keys, your TLS connections, and your encrypted messages.
Conclusion: Counting Beyond the Countable
Archimedes proved that even the apparently uncountable (every grain of sand in the universe) could be bounded and reasoned about mathematically. He didn’t need to count each grain individually. He needed to establish upper bounds that demonstrated the finite nature of a seemingly infinite quantity.
WabiSabi inverts this approach. Rather than proving a large number is actually finite, it proves that the number of transaction interpretations is so astronomically large that exact calculation is impossible. The privacy doesn’t come from uncertainty about whether the number is big; it comes from mathematical proof that it’s big enough to exceed any adversary’s computational capacity.
The “trust me bro” critique misunderstands both the information-theoretic foundations and the practical reality. We don’t need to trust anyone’s claim about exact entropy. We can verify that:
- The lower-bound estimation method is conservative
- The lower bounds translate to concrete probability bounds via information theory
- The full enumeration problem is provably #P-hard
- The scale of the combinatorial explosion exceeds all computational resources
This is how modern cryptographic security works. Not through certainty about exact quantities, but through mathematical proofs that whatever the exact quantities are, they’re sufficient for the security guarantees we need.
Archimedes would have appreciated the elegance. You don’t need to count grains of sand if you can prove there are more grains than any adversary could ever examine.