Data Exploration and Privacy Preservation Over Hidden Web Databases

Wednesday, January 26, 2011
6:00 PM
Free and open to the public

Many online databases are hidden behind form-like web interfaces that let users run search queries by specifying the desired values (or ranges of values) for attributes of the sought-after tuples. We consider the problem of approximate query processing over such hidden databases, and propose novel sampling-based techniques that issue a small number of queries through a hidden database's web interface to produce unbiased, low-variance estimates of aggregates. We also explain the threat that such sampling-based techniques pose to sensitive aggregate information in hidden databases. Protecting sensitive aggregates stands in contrast to the traditional privacy problem, in which individual tuples must be protected while access to aggregate information is preserved. We propose privacy-preserving techniques that thwart bots from sampling a hidden database to infer aggregate information.
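As a rough illustration of how a handful of form queries can yield an unbiased aggregate estimate, the sketch below estimates the size of a simulated hidden database by random drill-down. It is not taken from the talk: it assumes Boolean attributes, distinct tuples, and an interface that returns at most k tuples plus an overflow flag, and every name in it (`make_interface`, `drill_down_estimate`, `estimate_size`) is hypothetical. It makes no attempt at the variance reduction the abstract mentions.

```python
import random


def make_interface(tuples, k):
    """Simulate a top-k form interface over distinct Boolean tuples (assumed model)."""
    def query(prefix):
        # A query fixes the first len(prefix) attributes to the given values.
        matches = [t for t in tuples if t[:len(prefix)] == prefix]
        overflow = len(matches) > k  # the interface hides results beyond the top k
        return matches[:k], overflow
    return query


def drill_down_estimate(query, num_attrs, rng):
    """One random drill-down; returns an unbiased estimate of the database size."""
    prefix = ()
    prob = 1.0  # probability of reaching the current query
    while True:
        rows, overflow = query(prefix)
        if not overflow:
            # All matching tuples are visible: weight the count by 1 / prob.
            return len(rows) / prob
        if len(prefix) == num_attrs:
            raise ValueError("fully specified query overflows; tuples must be distinct")
        # Overflow: fix the next attribute to a uniformly random value.
        prefix = prefix + (rng.randint(0, 1),)
        prob *= 0.5


def estimate_size(query, num_attrs, walks, seed=0):
    """Average several independent drill-downs to reduce variance."""
    rng = random.Random(seed)
    total = sum(drill_down_estimate(query, num_attrs, rng) for _ in range(walks))
    return total / walks
```

Each drill-down stops at the first query that does not overflow. Those stopping queries partition the tuples, so weighting the visible count by the inverse of the probability of reaching the query gives an estimate whose expectation equals the true size. A single walk can be far off (it may land on an empty query and return 0), which is why several walks are averaged.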

* Collaborative work with Xin Jin of George Washington University; Arjun Dasgupta, Bradley Jewell, Anirban Maiti, and Dr. Gautam Das of the University of Texas at Arlington; and Dr. Surajit Chaudhuri of Microsoft Research.

Nan Zhang

Assistant Professor
George Washington University