Tel-Aviv University, School of Computer Science

Leveraging Big Data, Fall 2013/2014

Time & Location

Mondays 16:00-19:00, Sherman Building Room 105 (Life Sciences building)

Course Outline

The purpose of this class is to introduce some key tools and techniques that are used to leverage massive data and are currently not covered in the basic curriculum.

The class meets once a week for a 3-hour lecture. Pointers and notes/slides for the material covered in each lecture will be posted on this Web page.


Prerequisites

Completion of all first-year courses (linear algebra, calculus, probability), programming, and data structures and algorithms. The class is open to graduate students and third-year undergraduates.


Office hours:

By appointment { edco, fiat, haimk, milo } AT

Course Workload and Grading

  • 2-4 Problem sets/projects (some require programming) 50%

  • Final exam 50%

Problem Sets

Submission Instructions:
Please submit in pdf format to the email address taubigdata (at)
The email title should include HWx and your name, where x is the homework number (1, 2, or 3).
File name: firstname_lastname_HWx.pdf
Your full name and ID should be part of the title.
  • Homework1 Solutions: Asaf Ezra, Tal Saiag

    Posted October 31, 2013. Due November 17, 2013.
    EXTENSION: Due to multiple requests, due date is now on or before Thursday, November 21.
    Work submitted by November 17 will receive 8 points of extra credit.
    If you have already handed in the assignment and want to use the extra time to work on it more, you can of course resubmit, but you will lose the bonus points.

  • Homework2 Solutions: Guy Lev

    Posted December 4, 2013. Due December 29, 2013.

  • Homework3 Solutions: Barak Cohen, Asaf Ezra

    Posted January 8, 2014. Due January 27, 2014.
    EXTENSION: Due to multiple requests, due date is now on or before 18:00 (6pm) January 31.

Practice Final Exam

Mock-up final


Initial login is with your TAU user name and password, then register to create an account.
Please post questions/inquiries that are of general interest to students.

Schedule and slides

Topics (tentative)

  • Data Streams, Synopsis structures, Min-Hash sketches/samples (2-3 sessions): Approximate distinct counting, frequent items/heavy hitters, similarity estimation, sliding windows and time-decay on streams
  • Mining large graphs (2-3 sessions), with focus on social networks and web graphs. Centrality, similarity, all-distances sketches, community detection, link analysis, spectral techniques.
  • Multi-dimensional data and more data mining (4-6 sessions): dimensionality reduction, locality-sensitive hashing, spectral techniques (Latent Semantic Analysis), Principal components analysis, collaborative filtering and recommendation systems, clustering, summarization (sketching and sampling), mining association rules ("correlation" rules).
  • Map-reduce, Pig Latin, and NoSQL (2 sessions)
  • Data Sharing Incentives and Markets (1-2 sessions)
  • Privacy Issues, Differential privacy (1 session)

Last modified: Mon Feb 3 05:43:01 PST 2014