Published in Computer Graphics Forum, Volume 38 (2019), Number 7 (Pacific Graphics 2019)

Interactive Curation of Datasets
for Training and Refining Generative Models

¹Tsinghua University
²Microsoft Research Asia
³College of William & Mary

Examples of Generative Adversarial Networks (GANs) learned from user-curated training sets of (a) rough stone textures, (b) rusted metal textures, (c) images of bedrooms with twin beds, and (d) portraits of happy faces. Each of these GANs is trained on a dataset curated using our interactive system in approximately 12 (stone) to 35 (face) minutes, starting from a larger, more general dataset of stones, metals, bedrooms, and faces, respectively.

Abstract

We present a novel interactive learning-based method for curating datasets using user-defined criteria for training and refining Generative Adversarial Networks. We employ a novel batch-mode active learning strategy to progressively select small batches of candidate exemplars for which the user is asked to indicate whether they match the (possibly subjective) selection criteria. After each batch, a classifier that models the user’s intent is refined and subsequently used to select the next batch of candidates. After the selection process ends, the final classifier, trained with limited but adaptively selected training data, is used to sift through the large collection of input exemplars to extract a sufficiently large subset for training or refining the generative model that matches the user’s selection criteria. A key distinguishing feature of our system is that we do not assume that the user can always make a firm binary decision (i.e., “meets” or “does not meet” the selection criteria) for each candidate exemplar, and we allow the user to label an exemplar as “undecided”. We rely on a non-binary query-by-committee strategy to distinguish between the user’s uncertainty and the trained classifier’s uncertainty, and develop a novel disagreement distance metric to encourage a diverse candidate set. In addition, a number of optimization strategies are employed to achieve an interactive experience. We demonstrate our interactive curation system on several applications related to training or refining generative models: training a Generative Adversarial Network that meets user-defined criteria, adjusting the output distribution of an existing generative model, and removing unwanted samples from a generative model.
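
To make the batch-selection idea concrete, the following is a minimal sketch of one round of query-by-committee with three labels ("meets", "does not meet", "undecided"). It is not the paper's implementation: the nearest-centroid committee members, the vote-entropy score, and the greedy "disagreement times distance" rule are simple stand-ins assumed here for illustration, and every name (`MEETS`, `REJECT`, `UNDECIDED`, `train_member`, `select_batch`, and so on) is hypothetical. The paper's own disagreement distance metric and classifier are not reproduced.

```python
import math
import random
from collections import Counter

# Hypothetical label encoding for the three user responses.
MEETS, REJECT, UNDECIDED = 0, 1, 2


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_member(X, y, rng):
    """Fit one committee member: a nearest-centroid classifier trained
    on a bootstrap resample of the labeled exemplars (assumed stand-in)."""
    idx = [rng.randrange(len(X)) for _ in X]
    groups = {}
    for i in idx:
        groups.setdefault(y[i], []).append(X[i])
    # One centroid per label that appears in the resample.
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(member, x):
    """Return the label whose centroid is closest to x."""
    return min(member, key=lambda label: dist(x, member[label]))


def vote_entropy(votes):
    """Entropy of the committee's votes; zero when all members agree."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def select_batch(pool, committee, batch_size):
    """Greedily pick candidates the committee disagrees on, while
    favoring candidates far from those already picked (diversity)."""
    scores = [vote_entropy([predict(m, x) for m in committee]) for x in pool]
    chosen = [max(range(len(pool)), key=scores.__getitem__)]
    while len(chosen) < min(batch_size, len(pool)):
        def gain(i):
            spread = min(dist(pool[i], pool[j]) for j in chosen)
            return (scores[i] + 1e-6) * spread
        rest = [i for i in range(len(pool)) if i not in chosen]
        chosen.append(max(rest, key=gain))
    return chosen
```

In an interactive loop, the indices returned by `select_batch` would be shown to the user, the new labels appended to the labeled set, and the committee retrained before the next batch. Treating "undecided" as its own class is one simple way to keep user uncertainty separate from committee disagreement in such a sketch.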

Keywords

Interactive systems and tools

Paper and video

Paper .pdf | 24.0 MB

Trained model and code

GitHub Repo Code and model

Acknowledgements

We thank the reviewers for their insightful feedback. Pieter Peers was partially supported by NSF grant IIS-1350323 and a gift from Nvidia.