Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020); pages 865--872; New York City, New York, USA, February 7-12, 2020.
Machine learning models are increasingly being deployed in practice. Machine Learning as a Service (MLaaS) exposes such models to queries by end users through an application programming interface (API). Prior work has developed model extraction attacks, in which an adversary extracts an approximation of an MLaaS model by making black-box queries to it. We design ActiveThief, a model extraction framework for deep neural networks (DNNs) that uses active learning and large public datasets to perform model extraction and does not expect strong domain knowledge on the part of the attacker. We demonstrate that ActiveThief can steal deep classifiers trained on a variety of datasets from the image and text domains while evading detection by the state-of-the-art model extraction defense, PRADA. By querying a model via black-box access for its top prediction, ActiveThief improves performance over a uniform noise baseline by an average of 4.70x for image tasks and 2.11x for text tasks, while using only 30% (30,000 samples) of the public dataset at its disposal.
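The general idea of extraction through active learning can be illustrated with a toy sketch. This is not the ActiveThief implementation: the linear "secret" model, the softmax-regression substitute, the entropy-based selection rule, and every name below (`extract`, `query_top1`, `train_substitute`) are assumptions made for illustration. The sketch only mirrors the setup described in the abstract: the attacker sees top-1 labels from a black-box model and spends a query budget equal to 30% of a public pool.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_substitute(x, y, n_classes, epochs=200, lr=0.5):
    """Fit a softmax-regression substitute on the labels collected so far."""
    w = np.zeros((x.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        grad = x.T @ (softmax(x @ w) - onehot) / len(x)
        w -= lr * grad
    return w

def extract(query_top1, pool, n_classes, budget, rounds=5, seed=0):
    """Hypothetical extraction loop using uncertainty (entropy) sampling."""
    rng = np.random.default_rng(seed)
    per_round = budget // rounds
    unlabeled = np.arange(len(pool))
    # Round 0: no substitute exists yet, so pick the first batch at random.
    chosen = rng.choice(unlabeled, size=per_round, replace=False)
    labeled_idx, labels = [], []
    w = None
    for _ in range(rounds):
        # Query the black-box model; only top-1 labels come back.
        labeled_idx.extend(chosen.tolist())
        labels.extend(query_top1(pool[chosen]).tolist())
        unlabeled = np.setdiff1d(unlabeled, chosen)
        # Retrain the substitute on everything labeled so far.
        w = train_substitute(pool[labeled_idx], np.array(labels), n_classes)
        # Select the pool samples the substitute is least certain about.
        probs = softmax(pool[unlabeled] @ w)
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        chosen = unlabeled[np.argsort(entropy)[-per_round:]]
    return w, len(labeled_idx)

# Toy stand-ins (assumptions): a hidden linear classifier and random "public" data.
rng = np.random.default_rng(1)
secret_w = rng.normal(size=(10, 3))

def query_top1(x):
    return (x @ secret_w).argmax(axis=1)

pool = rng.normal(size=(1000, 10))
w, n_queries = extract(query_top1, pool, n_classes=3, budget=300)  # 30% of the pool

# Agreement: fraction of held-out inputs where substitute and secret model match.
test_x = rng.normal(size=(500, 10))
agreement = (query_top1(test_x) == (test_x @ w).argmax(axis=1)).mean()
print(f"queries used: {n_queries}, agreement: {agreement:.2f}")
```

Entropy sampling is used here only because it is one of the simplest active learning criteria; other selection strategies would slot into the same loop by replacing the line that computes `chosen`.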
Supplement: [ PDF ]
Slides: [ PDF ]
DOI: [ 10.1609/aaai.v34i01.5432 ]