Professor David Tse, Stanford University
The Science of Information: From Pushing Bits Over the Air to Assembling the World's Largest Jigsaw Puzzles
For General Audience
Monday, November 2, 3:30pm | Reception to follow | RSVP here
W Campus, Paul G. Allen Center, CSE Atrium
Map and Directions
Information theory is the science behind the engineering of all modern-day communication systems. Before information theory, the design of communication systems was ad hoc and tied to the specific source and specific physical medium of communication. By focusing instead on an abstract but quantifiable notion of information, information theory provides a unified basis for the design of all communication systems, identifies fundamental tradeoffs and introduces new ways of communicating that upend decades of engineering intuition. Although originally invented in the context of communication, this way of thinking can be broadened to other fields as well. In this talk, we give success stories of applying the theory in two fields: wireless communication and computational biology. In the field of wireless communication, information theory has played a key role in the orders-of-magnitude increase in the efficiency of spectrum utilization. In biology and medicine, high-throughput sequencing has revolutionized how science is done over the past decade. High-throughput sequencing generates hundreds of millions of short fragments called reads, and a key computational problem is the assembly of these reads to reconstruct the underlying DNA or RNA sequence. We describe an information-theoretic framework for this problem and how it led to the design of an RNA assembler that is significantly more accurate than prior art. In these stories, a curious theme recurs: solutions that are information-theoretically optimal can often be achieved in a computationally efficient way as well.
Haplotype Phasing, Convolutional Codes and Community Detection
Tuesday, November 3, 10:30am
Electrical Engineering Building, Room 105
Map and Directions
The era of high-throughput sequencing, in which large amounts of DNA and RNA sequence data are generated at ever lower cost, presents interesting algorithmic problems that have connections to multiple fields. In this talk, we will present one such problem. Humans have 23 pairs of homologous chromosomes, which are identical except at certain positions called single nucleotide polymorphisms (SNPs). A haplotype of an individual is the pair of sequences of SNPs on the two homologous chromosomes. Knowing the haplotypes of individuals can lead to a better understanding of the interplay of genetic variation and disease as well as better inference of human demographic history. We discuss the problem of inferring haplotypes from high-throughput sequencing data in the form of short fragments called reads. We give a simple formula for the number of reads needed to accurately reassemble a haplotype. The analysis leverages connections between this problem and decoding convolutional codes, a well-studied problem in communication theory. Finally, we will discuss an interesting connection with the problem of community detection, where communities have to be inferred based on the friendship graph of users.
David Tse, professor of electrical engineering at Stanford University, received his B.A.Sc. in systems design engineering from the University of Waterloo and his M.S. and Ph.D. in electrical engineering from MIT. He is coauthor, with Pramod Viswanath, of the text “Fundamentals of Wireless Communication.” Tse is also the inventor of the proportional-fair scheduling algorithm used in all third- and fourth-generation cellular systems. He was a postdoctoral member of technical staff at AT&T Bell Laboratories from 1994 to 1995 and was on the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley from 1995 to 2014. He has received an NSERC graduate fellowship from the government of Canada, an NSF CAREER award, the Erlang Prize, numerous best paper awards and several teaching awards. His research interests are in information theory and its applications in various fields, including wireless communication, energy and computational biology.