On the Bias of Traceroute Sampling
(or: Power-Law Degree Distributions in Regular Graphs)
David Kempe
Department of Computer Science
University of Southern California
Wednesday, March 14, 2007
4:30 - 5:30 PM
Terman Engineering Center, Room 453
Abstract:
Understanding the structure of the Internet graph is a crucial step
for building accurate network models and designing efficient
algorithms for Internet applications. Yet, obtaining its graph
structure is a surprisingly difficult task, as edges cannot be
explicitly queried. Instead, empirical studies rely on traceroutes to
build what are essentially single-source, all-destinations,
shortest-path trees. These trees only sample a fraction of the
network's edges, and a recent paper by Lakhina et al. found
empirically that the resuting sample is intrinsically biased. For
instance, the observed degree distribution under traceroute sampling
exhibits a power law even when the underlying degree distribution is
Poisson.
In this talk, we study the bias of traceroute sampling systematically,
and, for a very general class of underlying degree distributions,
calculate the likely observed distributions explicitly. To do this, we
use a continuous-time realization of the process of exposing the BFS
tree of a random graph with a given degree distribution, calculate the
expected degree distribution of the tree, and show that it is sharply
concentrated. As example applications of our machinery, we show how
traceroute sampling finds power-law degree distributions in both
d-regular and Poisson-distributed random graphs.
Thus, our work puts the observations of Lakhina et al. on a rigorous footing,
and extends them to nearly arbitrary degree distributions.
(Joint work with Dimitris Achlioptas, Aaron Clauset, and Cristopher Moore.)
Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html