In federated learning (FL), $K$ clients jointly train a model without sharing raw data.
Because each participant invests data and compute, clients need mechanisms to later prove
the provenance of a jointly trained model.
Model watermarking embeds a hidden signal in the weights, but naive approaches either do
not scale to many clients, because per-client watermarks dilute as $K$ grows, or give any
individual client the ability to verify and potentially remove the watermark.
We introduce $(t,K)$-threshold watermarking: clients collaboratively
embed a shared watermark during training, while only coalitions of at least $t$ clients
can reconstruct the watermark key and verify a suspect model. To enforce this, we secret-share the
watermark key $\tau$ among the clients so that coalitions of fewer than $t$ clients cannot
reconstruct it, and verification can be performed without revealing $\tau$ in the clear.
We instantiate our protocol in the white-box setting and evaluate on image
classification. Our watermark remains detectable at scale ($K=128$) with minimal
accuracy loss, and its detection score stays above the detection threshold ($z \geq 4$)
under attacks including adaptive fine-tuning using up to 20% of the training data.
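
To make the $(t,K)$-threshold property concrete, the following is a minimal sketch of how a key such as $\tau$ could be secret-shared among $K$ clients. It assumes Shamir secret sharing over a prime field, which is one standard way to obtain threshold reconstruction; the abstract does not state which scheme the protocol uses. The names `PRIME`, `split_secret`, and `reconstruct_secret` are hypothetical, and the key is modeled as a single integer. Unlike the protocol described above, this sketch reconstructs the key in the clear and does not cover verification without revealing $\tau$.

```python
import secrets

# Assumption: an illustrative prime field modulus (2**127 - 1 is a Mersenne prime).
PRIME = 2**127 - 1


def split_secret(secret, t, k):
    """Split `secret` into k shares so that any t of them reconstruct it."""
    if not 1 <= t <= k:
        raise ValueError("require 1 <= t <= k")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    # Random polynomial of degree t - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, k + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct_secret(shares):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbelow(PRIME)          # stand-in for the watermark key
    shares = split_secret(key, t=3, k=5)    # one share per client
    assert reconstruct_secret(shares[:3]) == key
    assert reconstruct_secret(shares[1:4]) == key
```

In this sketch any three of the five shares recover the key, while interpolating from only two shares yields a value that is independent of the key, which mirrors the requirement that coalitions of fewer than $t$ clients cannot reconstruct $\tau$.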