#### Tel-Aviv University, School of Computer Science

### Leveraging Big Data, Fall 2013/2014 : Schedule

Week | Date | Instructor | Topics | Slides | Notes |
---|---|---|---|---|---|

1 | 14/10/2013 | Edith Cohen |
Synopsis structures Data Streams Misra Gries Summaries and frequent elements Morris approximate counters Distinct counting: Introduction | PPT PDF | see updated slides 22/10 |

2 | 21/10/2013 | Edith Cohen |
Approximate Distinct counting: Min-Hash sketches (k-mins, bottom-k, k-partition) Estimators and applying concepts from statistics: Maximum Likelihood Estimator Sufficient statistic and Rao-Blackwell Theorem Cramer-Rao Lower Bound | PPT PDF | |

3 | 28/10/2013 | Edith Cohen |
Min-Hash sketches as random samples Subset-size queries from samples Modeling documents as sets of features: Applications of finding similar documents Broder's shingling technique Jaccard and Cosine Similarity Estimating similarity from Min-Hash sketches Inverse-probability distinct count estimators | PPT PDF | Updated October 29 |

4 | 04/11/2013 | Amos Fiat |
Clustering: Distance functions k-center, k-means, k-medians Generational models Mixture of Gaussians EM (Expectation Maximization) Facility location | PPT PDF | Updated November 5 |

5 | 11/11/2013 | Haim Kaplan |
Dimensionablity reduction via projections Locality Sensitive Hashing Johnson-Lindenstrauss Lemma The AMS sketch and estimating the second frequency moment Applications | PPT PDF | Updated November 20 |

6 | 18/11/2013 | Haim Kaplan |
Application of dimensionality reduction: Approximate period of
a time series The Locality Sensitive Hashing (LSH) formulation | PPT PDF | Lecture included last (application) part of Lecture 5; Updated Nov 20 |

7 | 25/11/2013 | Amos Fiat |
Principle Components Analysis (PCA) Singular Value Decomposition (SVD) Eigenvectors Streaming | PPT PDF | see combined lecture8 |

8 | 02/12/2013 | Amos Fiat | Streaming SVD | PPT PDF | lectures 7,8 combined, Dec 2 |

9 | 09/12/2013 | Eran Rom (IBM) |
Distributed Hash Tables (DHT): Chord Consistency: The CAP theorem Cloud storage systems |
PPT
PDF (DHTs)
PPT PDF (Consistency) |
Guest lecture |

10 | 16/12/2013 | Paula Ta-Shma (IBM) Tova Milo |
MapReduce |
PPT (part1)
PDF (part1)
PDF (part2) PDF (part3) |
Guest lecture. Part2 slides are by Paul Kryzanowski Part3 slides are by Dan Suciu |

11 | 23/12/2013 | Edith Cohen | Graph datasets
Mining social and other graphs: properties Some techniques/algorithms for very large graphs: Min-Hash sketches of reachability sets All-Distances sketches (ADS) | PPT PDF | preliminary PDF posted 22/12 |

12 | 30/12/2013 | Edith Cohen |
Computing All-Distances Sketches: Pruned Dijkstra and Dynamic Programming methods Applications of All-Distances Sketches: Closeness centrality and similarity estimation Approximate shortest-path distance queries (distance oracles) | PPT PDF | Updated 31/12 (more examples, corrections) |

13 | 06/01/2014 | Edith Cohen |
Back to Linear sketches: counting (Exactly1? sketch) Sampling (Sample1 sketch) Sketching adjacency vectors of nodes in a graph (Ahn Guha McGregor 2012): Connected Components in sketch space More on random sampling: Reservoir sampling, weighted sampling: Poisson PPS Bottom-k (order,priority) sampling | PPT PDF | Updated 07/01 |

14 | 13/01/2014 | Haim Kaplan |
I/O Efficient Algorithms Merge sort Cache oblivious sorting: Funnel sort Lower bound |
PPT (Funnel sort) PDF (Funnel sort) PPT (Lower bound) PDF (Lower bound) |
Updated 14/01. Lower bound slides from Lars Arge. |