Privacy in distributed machine learning

As machine learning moves away from data-center architectures toward learning on edge devices (such as smartphones and IoT sensors), new paradigms such as federated learning are emerging. In this setting, data remains on the edge devices rather than being sent to a central server, and a collaborative model is built across the distributed devices through interactive communication. This necessitates efficient mechanisms that provide privacy guarantees for user data.

We considered distributed model learning within the federated learning (FL) framework. We combined new private, communication-efficient mean-estimation algorithms with privacy amplification opportunities inherent to FL. These include client sampling and data sampling at each client (through stochastic gradient descent), as well as the recently developed anonymization framework in which the server only sees responses that are randomly shuffled with respect to the clients (a.k.a. the shuffled model). Building on these, we proved that one can achieve the same privacy and optimization-performance operating points as schemes that use full-precision communication, but at a much lower communication cost, i.e., effectively getting communication efficiency for ``free''.
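
To make the flow concrete, the following is a minimal sketch (not the actual scheme from this work) of one round of communication-efficient private mean estimation with the amplification steps above: each sampled client sends a single stochastically quantized bit passed through randomized response, and the server debiases and averages the shuffled reports. The privacy level $\epsilon=1$, the 1-bit quantizer, and the population and sample sizes are illustrative assumptions.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)


def ldp_1bit_report(x, epsilon):
    """One client's report for mean estimation of x in [-1, 1]:
    (i) unbiased stochastic 1-bit quantization, then
    (ii) binary randomized response (epsilon-LDP).
    Communication cost is a single bit per client."""
    b = 1.0 if rng.random() < (1.0 + x) / 2.0 else -1.0
    keep = rng.random() < np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return b if keep else -b


def server_mean(reports, epsilon):
    """Debiased average of the (shuffled) 1-bit reports."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))  # keep probability
    return np.mean(reports) / (2.0 * p - 1.0)


# One toy round: subsample clients (amplification by sampling),
# collect 1-bit reports, and shuffle them before the server sees
# them (amplification by shuffling / anonymization).
values = rng.uniform(-1.0, 1.0, size=10_000)             # one value per client
sampled = rng.choice(values, size=2_000, replace=False)  # client sampling
reports = [ldp_1bit_report(x, epsilon=1.0) for x in sampled]
rng.shuffle(reports)                                      # shuffled model
print("private estimate:", server_mean(reports, epsilon=1.0))
print("mean of sampled clients:", sampled.mean())
\end{verbatim}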

We also explored a new question in privacy: how to provide multiple levels of privacy when there are multiple levels of trust. We studied this through the lens of local differential privacy (LDP), where one does not trust the data collector and therefore randomizes the responses so as to \emph{simultaneously} provide multiple privacy guarantees. For distribution estimation, we gave (order-wise) tight characterizations of the privacy-utility-randomness trade-off in several cases, including the standard LDP setting under a randomness constraint, and demonstrated them by designing non-trivial mechanisms for multi-level privacy.
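
As a toy illustration of multi-level privacy (not one of the mechanisms characterized in this work), a cascade of two binary randomized responses can serve two trust levels with a single randomization budget: the more-trusted party observes an $\epsilon_1$-LDP report, and the less-trusted party observes a post-processed $\epsilon_2$-LDP report with $\epsilon_2 < \epsilon_1$. The values $\epsilon_1 = 2$ and $\epsilon_2 = 0.5$ below are assumptions chosen only for illustration.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)


def flip_prob(epsilon):
    """Flip probability of binary randomized response at LDP level epsilon."""
    return 1.0 / (1.0 + np.exp(epsilon))


def rr(bit, q):
    """Flip a {-1, +1} bit with probability q."""
    return -bit if rng.random() < q else bit


# Two trust levels: the more-trusted party sees y1 (eps1-LDP); the
# less-trusted party sees y2, obtained by post-processing y1, so it
# satisfies the weaker eps2-LDP guarantee (eps2 < eps1).
eps1, eps2 = 2.0, 0.5                 # illustrative privacy levels
q1 = flip_prob(eps1)                  # first-stage flip probability
p2 = 1.0 - flip_prob(eps2)            # target keep probability at level 2
# Composed keep probability is (1 - q1)(1 - q2) + q1 * q2; solve for q2.
q2 = (1.0 - q1 - p2) / (1.0 - 2.0 * q1)

x = +1                                # client's private bit
y1 = rr(x, q1)                        # report seen at trust level 1
y2 = rr(y1, q2)                       # report seen at trust level 2
print(f"q1={q1:.3f}, q2={q2:.3f}, y1={y1:+d}, y2={y2:+d}")
\end{verbatim}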