Organizations use batch payments to simplify operations: a single transaction pays all salaries, providers, … But in doing so, they reveal more than they would probably like to. Some simple techniques can make things much better.
The Problem: One Output Per Payee
A typical batch payment looks like this on-chain:
Inputs: [funding UTXO(s)]
Outputs:
0.08 BTC → Alice
0.15 BTC → Bob
0.12 BTC → Carol
0.08 BTC → Dave
0.21 BTC → Eve
... → (95 more)
Each output is a direct, permanent association between an address and an amount. The transaction is already public. The adversary does not need to hack anything; they can just read it.
What leaks immediately:
- The full payment amount for every recipient in the batch
- A clustering signal: all these recipients were paid by the same entity at the same time
What leaks on the next spend:
This is the subtler problem. Suppose Alice receives 0.08 BTC and later pays a contractor 0.05 BTC from that UTXO. The contractor now knows Alice had exactly 0.08 BTC from this source, because the entire UTXO was hers. Anyone watching can trace 0.05 BTC → “came from 0.08 BTC output of batch transaction X.” The contractor learns Alice’s income from the batch.
One UTXO per payee creates a perfect mapping from address to amount, which hurts recipients’ privacy both immediately and in future spending.
The Design Space
Several approaches exist to address this, each with different tradeoffs:
CoinJoin would be the best option in terms of privacy. Several CoinJoin transactions would be needed to handcraft each payment’s amount from equal outputs, ideally through multiple rounds. This process is not automated and would be very tedious if the organization uses multisig. Done properly, it would offer the strongest privacy of the options listed here.
Lightning Network could also improve privacy by moving the payments off-chain. The challenges are inbound liquidity for the payees and limits on really big payments. There are a few additional considerations, such as forcing multi-path payments for improved privacy, and implementing this with multisig would still be a challenge. For medium to large payments, fees would also be considerably higher.
Now for the low-hanging fruit:
GCD mixing decomposes all outputs into multiples of their greatest common divisor, producing many identical small outputs. It creates significant ambiguity but generates enormous transaction sizes and is practically unusable at scale with non-uniform amounts.
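To make the size problem concrete, here is a minimal sketch that applies GCD decomposition to the five amounts shown in the example batch above. Amounts are expressed in satoshis to keep the arithmetic exact, and the function name gcd_decompose is hypothetical.

```python
from functools import reduce
from math import gcd

def gcd_decompose(amounts_sats):
    """Return (unit, counts): payee i receives counts[i] outputs of `unit` satoshis."""
    unit = reduce(gcd, amounts_sats)
    return unit, [amount // unit for amount in amounts_sats]

# Alice, Bob, Carol, Dave, Eve from the example batch, in satoshis.
amounts = [8_000_000, 15_000_000, 12_000_000, 8_000_000, 21_000_000]
unit, counts = gcd_decompose(amounts)
print(f"unit = {unit} sats, outputs per payee = {counts}, total outputs = {sum(counts)}")
```

For these five payees the common divisor is 0.01 BTC, so 5 outputs become 64. A single amount that is not a round multiple of the others would shrink the divisor and inflate the count much further.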
Naive splitting lets each payee receive multiple outputs that sum to their amount, chosen independently. Better than a single output, but without coordination the splits are likely unique and remain traceable via amount correlation.
Knapsack mixing is the approach that could actually work.
Knapsack Mixing with Variable Output Count
The core idea comes from a 2017 paper on anonymous CoinJoin transactions (Maurer et al., RWTH Aachen) [1], later explored further by nopara73 [2]. The insight is that privacy in a transaction is about how many ways an adversary can assign outputs to payers while keeping all sums consistent.
Formally: given a set of outputs O and a set of payment amounts P, every valid partition is an assignment of subsets of O to elements of P such that each subset sums to the correct amount. The adversary’s uncertainty about who received what grows with the number of valid partitions. The goal of knapsack mixing is to maximize this count.
And this assumes the attacker knows the payment amounts to begin with. With one output per payee that is trivial, but with split outputs it no longer is.
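The partition count defined above can be computed by brute force for toy cases. The sketch below is an illustration only: it runs in exponential time, treats every output as a distinct UTXO, and uses the hypothetical name count_assignments. Amounts are integers in units of 0.01 BTC.

```python
def count_assignments(outputs, payments):
    """Count the ways to give every output to exactly one payee so that each
    payee's outputs sum to their payment amount."""
    remaining = list(payments)

    def place(i):
        if i == len(outputs):
            # Valid only if every payee has been paid in full.
            return 1 if all(r == 0 for r in remaining) else 0
        total = 0
        for payee in range(len(remaining)):
            if remaining[payee] >= outputs[i]:
                remaining[payee] -= outputs[i]
                total += place(i + 1)
                remaining[payee] += outputs[i]  # backtrack
        return total

    return place(0)

# Two payees owed 0.08 and 0.13 BTC.
print(count_assignments([8, 13], [8, 13]))            # one output per payee
print(count_assignments([4, 4, 4, 4, 5], [8, 13]))    # 0.08 = 4+4, 0.13 = 4+4+5
```

With one output per payee there is exactly one consistent assignment. With the split version there are six, because any two of the four 0.04 outputs could belong to the first payee.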
Denomination Design
The first step is choosing a denomination set D, a small set of output values from which all payment amounts can be constructed as sums. The knapsack heuristic for selecting D is:
- Compute all pairwise differences between payment amounts
- The values that appear most frequently as differences are structural denominators; use them as the core of D
- Verify that every payment amount can be expressed as a sum of k or fewer values from D
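The first two steps of the heuristic can be sketched as a frequency count over pairwise differences. This is one possible reading of the procedure, not a reference implementation; the function name candidate_denominations, the top_n parameter, and the toy amounts (integers in units of 0.01 BTC) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def candidate_denominations(amounts, top_n=4):
    """Return the top_n most frequent non-zero pairwise differences."""
    diffs = Counter(
        abs(a - b) for a, b in combinations(amounts, 2) if a != b
    )
    return [value for value, _count in diffs.most_common(top_n)]

# Hypothetical toy batch, not the 100-payee batch discussed in the text.
toy_amounts = [4, 8, 9, 13, 17]
print(candidate_denominations(toy_amounts, top_n=3))
```

In this toy batch the difference 4 occurs three times and the differences 5 and 9 occur twice each, so those three values are returned as candidates. The coverage check in the third step still has to be run on the result.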
For a batch of 100 payments ranging from 0.04 BTC to 0.25 BTC, pairwise differences cluster around 0.04, 0.05, 0.08, and 0.09 BTC. A denomination set of {0.04, 0.05, 0.08, 0.09} covers the large majority of amounts with at most 3 outputs each:
0.25 BTC = 0.08 + 0.08 + 0.09 (×38 recipients at this tier)
0.17 BTC = 0.04 + 0.04 + 0.09
0.13 BTC = 0.04 + 0.04 + 0.05
0.08 BTC = 0.04 + 0.04 (k=2 optimal here)
0.05 BTC = 0.05 (k=1 optimal here)
This produces a pool where, for 100 payees, the denomination 0.04 BTC appears roughly 90 times, 0.08 BTC appears 70 times, and 0.09 BTC appears 50 times. An adversary looking at the output pool now faces a combinatorial assignment problem rather than a lookup.
Variable Output Count (variable k)
A natural instinct is to fix k=3 outputs per payee for all amounts. The data shows this is suboptimal in both directions:
For small amounts, fixed k=3 destroys privacy. A payee receiving 0.05 BTC forced into 3 outputs might get (0.01 + 0.02 + 0.02) — denominations that appear rarely in the pool and are highly identifiable. With variable k, the same payee takes a single output of 0.05 BTC and becomes one of ~50 identical outputs in the pool. Their partition count goes from ~50 (forced k=3) to ~600+ (k=1 or k=2 chosen optimally).
For large amounts, variable k adds nothing — and this is a feature. A payee receiving 0.25 BTC optimally split as (0.08 + 0.08 + 0.09) gets ~120,000 valid partitions from the pool. No k=1 or k=2 subset of the pool sums to 0.25 BTC, because the denomination design deliberately excludes 0.25 as a primitive. So variable k is structurally a no-op for large payees: they always use exactly 3 outputs, the adversary learns nothing from the output count, and the partition count is already large.
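The claims about which output counts are available for each amount can be checked by enumerating splits. The sketch below assumes the denomination set {0.04, 0.05, 0.08, 0.09} from the text, written as integers in units of 0.01 BTC; the function name splits_by_k is hypothetical.

```python
from itertools import combinations_with_replacement

def splits_by_k(amount, denoms, max_k=3):
    """Map each feasible output count k to the splits of `amount` using k values from denoms."""
    result = {}
    for k in range(1, max_k + 1):
        found = [
            combo
            for combo in combinations_with_replacement(sorted(denoms), k)
            if sum(combo) == amount
        ]
        if found:
            result[k] = found
    return result

D = {4, 5, 8, 9}
for amount in (5, 8, 17, 25):
    print(amount, splits_by_k(amount, D))
```

For 0.25 BTC the only entry is k=3 with (0.08, 0.08, 0.09), which matches the observation that variable k is a no-op at that tier. For 0.05 BTC the only entry is k=1.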
Stochastic Denomination Selection
This is the final and most important piece. If every payee at a given amount receives the same deterministic split — say, 0.17 BTC always becomes (0.04, 0.04, 0.09) — then an adversary who knows or guesses the splitting algorithm can reverse it (or at least gain some more information about it). The denomination splits are public by construction (they appear in the transaction), and the algorithm can be published or inferred.
The fix is to randomize over all valid splits. For a 0.17 BTC payee, valid k=3 splits from D include:
- (0.04, 0.04, 0.09)
- (0.04, 0.05, 0.08)
- (0.05, 0.05, 0.07) — if 0.07 is in D
- and so on
Randomly selecting among these at signing time means that even two payees with identical amounts receive different output configurations. An adversary cannot use the “all 0.17 BTC payees look like X” heuristic, because they don’t. Each 0.17 BTC payee is a different draw from the split distribution.
More precisely, splits should be sampled with weights that reflect how much they increase pool collision density, preferring splits that use denominations already well-represented in the pool. This can be computed greedily as each payee’s outputs are assigned.
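A greedy version of this sampling could look like the sketch below. The weighting formula (one plus the number of outputs already in the pool that share a denomination with the split) is an assumption chosen for simplicity, not something the text prescribes, and the names valid_splits and assign_outputs are hypothetical. Amounts are integers in units of 0.01 BTC.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement

def valid_splits(amount, denoms, max_k=3):
    """All ways to write `amount` as a sum of 1..max_k values from denoms."""
    ordered = sorted(denoms)
    return [
        combo
        for k in range(1, max_k + 1)
        for combo in combinations_with_replacement(ordered, k)
        if sum(combo) == amount
    ]

def assign_outputs(amounts, denoms, max_k=3, rng=None):
    """Pick one random split per payee, favouring denominations already in the pool."""
    rng = rng or random.Random()
    pool = Counter()        # denomination -> number of outputs assigned so far
    assignment = []
    for amount in amounts:
        options = valid_splits(amount, denoms, max_k)
        if not options:
            raise ValueError(f"{amount} cannot be built from {sorted(denoms)}")
        # Assumed weighting: +1 keeps every option possible when the pool is empty.
        weights = [1 + sum(pool[d] for d in split) for split in options]
        split = rng.choices(options, weights=weights, k=1)[0]
        pool.update(split)
        assignment.append(split)
    return assignment, pool

assignment, pool = assign_outputs([17, 17, 25, 5, 13], {4, 5, 8, 9})
print(assignment)
print(dict(pool))
```

Running it repeatedly gives different splits for the two 0.17 BTC payees on some runs, which is the property the scheme relies on. A production implementation would need a cryptographically secure random source rather than the random module.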
What the Numbers Look Like
For a batch payment of 100 payees with amounts drawn from the tiers above, using denomination set {0.04, 0.05, 0.08, 0.09} with variable k and stochastic selection, the per-payee partition count (the adversary’s true uncertainty about who received what) increases by four to five orders of magnitude compared to a naive one-output-per-payee batch.
Spending Privacy
With this scheme, when a recipient spends one of their outputs:
- The output value is a denomination (e.g., 0.08 BTC) shared by ~70 other outputs in the pool
- Any counterparty who sees the spend learns only that this person held at least 0.08 BTC from the batch
- They cannot determine whether the recipient was paid 0.08 BTC, 0.17 BTC, 0.25 BTC, or any other amount that includes 0.08 as a component
If a recipient later consolidates all their outputs — effectively revealing their total — they expose only their own amount. The partition counts for all other recipients are reduced by at most a few percent (one fewer 0.08 and 0.09 in the pool), and the combinatorial ambiguity protecting everyone else remains sufficient.
Implementation Notes
A few practical considerations for anyone building this:
Transaction size. Moving from 1 output per payee to ~2.5 on average increases transaction weight by roughly 2.3×, which means proportionally higher fees.
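The size increase can be estimated with a back-of-the-envelope calculation. The sketch below assumes P2WPKH inputs and outputs with commonly quoted approximate sizes (about 68 vbytes per input, 31 vbytes per output, about 11 vbytes of fixed overhead) and a hypothetical batch funded by 5 inputs. None of these parameters come from the text, and other script types will give different numbers.

```python
def estimate_vbytes(n_inputs, n_outputs, in_vb=68.0, out_vb=31.0, overhead_vb=11.0):
    """Rough virtual size of a transaction, assuming P2WPKH inputs and outputs."""
    return overhead_vb + n_inputs * in_vb + n_outputs * out_vb

N_INPUTS = 5                                  # assumption: number of funding UTXOs
naive = estimate_vbytes(N_INPUTS, 100)        # one output per payee
split = estimate_vbytes(N_INPUTS, 250)        # ~2.5 outputs per payee on average
ratio = split / naive
print(f"naive ≈ {naive:.0f} vB, split ≈ {split:.0f} vB, ratio ≈ {ratio:.2f}x")
```

Because the input and overhead cost is shared, the ratio lands somewhat below 2.5× and moves closer to it as the number of funding inputs shrinks.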
Coordination. Unlike CoinJoin, this scheme requires no coordination with recipients. The sender constructs the entire output set unilaterally. Recipients simply receive multiple UTXOs instead of one and can handle them like any other outputs.
Change outputs. If the funding input does not match the sum of payments exactly, the change output will likely be a unique value and is trivially identifiable. This is a separate problem from recipient privacy, although the change can still benefit slightly from being split into the same denominations.
Denomination set choice. The optimal denomination set is specific to each payment batch — it depends on the distribution of amounts. Running the pairwise-difference analysis takes milliseconds and should be done fresh per transaction rather than using a global constant.
Summary
Batch Bitcoin payments leak the full payment schedule to anyone reading the chain. The naive one-output-per-payee structure creates a perfect amount-to-address mapping that persists permanently and compounds with every subsequent spend.
Knapsack mixing with variable output count and stochastic denomination selection addresses this. It decomposes each payment into 1–3 outputs drawn from a small denomination set chosen to maximize cross-payee collision, randomizes the specific split per payee, and produces a transaction where the adversary faces a combinatorial assignment problem with millions of valid solutions rather than a trivial lookup. Recipients gain both on-chain privacy (ambiguous assignment) and future spending privacy (denomination outputs don’t reveal the total received).
The construction requires no changes to Bitcoin consensus rules, no cooperation from recipients, and no cryptographic overhead beyond standard UTXO management. It is deployable today.