Tel-Aviv University, School of Computer Science

Leveraging Big Data, Fall 2013/2014 : Schedule

Week Date Instructor Topics Slides Notes
1 14/10/2013 Edith Cohen Synopsis structures
Data Streams
Misra Gries Summaries and frequent elements
Morris approximate counters
Distinct counting: Introduction
PPT PDF see updated slides 22/10
2 21/10/2013 Edith Cohen
Approximate Distinct counting:
Min-Hash sketches (k-mins, bottom-k, k-partition)
Estimators and applying concepts from statistics:
Maximum Likelihood Estimator
Sufficient statistic and Rao-Blackwell Theorem
Cramer-Rao Lower Bound
3 28/10/2013 Edith Cohen
Min-Hash sketches as random samples
Subset-size queries from samples
Modeling documents as sets of features:
Applications of finding similar documents
Broder's shingling technique
Jaccard and Cosine Similarity
Estimating similarity from Min-Hash sketches
Inverse-probability distinct count estimators
PPT PDF Updated October 29
4 04/11/2013 Amos Fiat Clustering:
Distance functions
k-center, k-means, k-medians
Generational models
Mixture of Gaussians
EM (Expectation Maximization)
Facility location
PPT PDF Updated November 5
5 11/11/2013 Haim Kaplan Dimensionablity reduction via projections
Locality Sensitive Hashing
Johnson-Lindenstrauss Lemma
The AMS sketch and estimating the second frequency moment
PPT PDF Updated November 20
6 18/11/2013 Haim Kaplan Application of dimensionality reduction: Approximate period of a time series
The Locality Sensitive Hashing (LSH) formulation
PPT PDF Lecture included last (application) part of Lecture 5; Updated Nov 20
7 25/11/2013 Amos Fiat
Principle Components Analysis (PCA)
Singular Value Decomposition (SVD)
PPT PDF see combined lecture8
8 02/12/2013 Amos Fiat Streaming SVD PPT PDF lectures 7,8 combined, Dec 2
9 09/12/2013 Eran Rom (IBM) Distributed Hash Tables (DHT): Chord
Consistency: The CAP theorem
Cloud storage systems
PPT PDF (Consistency)
Guest lecture
10 16/12/2013 Paula Ta-Shma (IBM)
Tova Milo
MapReduce PPT (part1) PDF (part1)
PDF (part2) PDF (part3)
Guest lecture.
Part2 slides are by Paul Kryzanowski
Part3 slides are by Dan Suciu
11 23/12/2013 Edith Cohen Graph datasets
Mining social and other graphs: properties
Some techniques/algorithms for very large graphs:
Min-Hash sketches of reachability sets
All-Distances sketches (ADS)
PPT PDF preliminary PDF posted 22/12
12 30/12/2013 Edith Cohen Computing All-Distances Sketches:
Pruned Dijkstra and Dynamic Programming methods Applications of All-Distances Sketches:
Closeness centrality and similarity estimation
Approximate shortest-path distance queries (distance oracles)
PPT PDF Updated 31/12 (more examples, corrections)
13 06/01/2014 Edith Cohen Back to Linear sketches:
counting (Exactly1? sketch)
Sampling (Sample1 sketch)
Sketching adjacency vectors of nodes in a graph (Ahn Guha McGregor 2012):
Connected Components in sketch space
More on random sampling:
Reservoir sampling,
weighted sampling: Poisson PPS
Bottom-k (order,priority) sampling
PPT PDF Updated 07/01
14 13/01/2014 Haim Kaplan I/O Efficient Algorithms
Merge sort
Cache oblivious sorting: Funnel sort
Lower bound
PPT (Funnel sort)
PDF (Funnel sort)
PPT (Lower bound)
PDF (Lower bound)
Updated 14/01.
Lower bound slides from Lars Arge.

Last modified: Tue Jan 14 14:58:24 IST 2014