Publications of Edith Cohen

Papers/Presentations by topic (with links to chronological lists)

Chronological lists (with links to topics)

Papers and presentations sorted by topic.

(provided in postscript/compressed postcript/pdf/HTML/ppt/pdf formats)

Copyright Notice: Since most of these papers are published, the copyright has been transferred to the respective publishers. Therefore, the papers cannot be duplicated for commercial purposes. The following is ACM's copyright notice; other publishers have similar ones.

Copyright © 199x by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that new copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.

List of Topics: Items that fit in multiple categories are listed under the most relevant one.

Random samples as synopses of datasets

Algorithmic Game Theory

WWW performance

Networking: Routing, Path selection, Measurements, Packet filtering

Content Distribution: P2P overlays, search, replication

Data: XML, Data streams, Spatial aggregation, time and spatial decay

Data Mining / Machine Learning

Algorithms for graphs and networks

Optimization / Linear Programming

Misc

Random samples as synopses of datasets

Random samples are flexible general-purpose synopses of data sets that are too massive to store in full, manipulate, or transmit. Answers to queries posed over the original data can be quickly estimated from the sample, and the same sample can be used for many types of queries. My work on sampling aims to facilitate a more effective use of sampling as synopses by designing sampling schemes and estimators for fundamental classes of queries.
Scalable design of sampling algorithms should depend on how the data is presented, to avoid the time and cost of aggregation. We distinguish between data sets presented as key value pairs (which can be streamed or distributed), data sets where there is a vector of values associated with each key and these values are dispersed across servers or time, and finally unaggregated data sets, where the data contains multiple elements with the same key and the value of the key is the sum of element weights.
The value of our sample hinges on the performance of our estimators. The sampling scheme should be geared to the statistics (queries) we would like supported and estimators should optimally use the information in the sample. I am also interested in understanding the limits of a sampling scheme: Characterize the queries we can estimate well and provide optimal estimators for a given query.

Sampling schemes for data presented as key value pairs

Multi-objective sampling

Common queries over key-value data sets are segment f-statistics, which are the sum over keys in a selected segment of a function f applied to the value. This formulation includes segment sum, counts, moments, thresholds, and capping statistics. A weighted sample computed with respect to a particular f provides estimaes with statistical guarantees for f-statistics. This is achieved by including each key with probability roughly proportional to f applied to its value. But estimation quality of g-statistics from a weighted sample computed with respect to f deteriorates when g is very different than f (for example, when f and g are sum and count, or are threshold or capping functions with very different parameters.) We study here multi-objective samples which provide estimates with statistical guarantees on quality for a set of different functions. In particular, we show how to construct efficiently a small sample which provides estimates with statistical guarantees for all monotone non-decreasing functions f.

Multi-objective weighted sampling
Edith Cohen.

HotWeb 2015 | cite | arXiv:1509.074451325 | Talk (animated pdf)

Structure-aware sampling

Data is typically structured and queries conform to the structure. Sampling schemes, however, are oblivious to the structure. We propose sampling schemes that are structure-aware and allow for more accurate approximate range queries while retaining the benefits of sample-based synopses. Part of the work builds on the flexibility within the VarOpt family of distributions to gain structure-awareness.

Structure-aware sampling: Flexible and accurate summarization.
Edith Cohen, Graham Cormode, Nick Duffield.

PVLDB 2011 | cite| pdf | pptx (by Graham Cormode)
Structure-aware sampling on data streams.
Edith Cohen, Graham Cormode, Nick Duffield.

SIGMETRICS 2011 | cite| pdf

VarOpt sampling

With classic Poisson Probability Proportional to Size (PPS) samples, the inclusion probability of each key is proportional to its weight and inclusions of different keys are independent. VarOpt samples have distinct advantages over Poisson sampling. They have PPS inclusion probabilities but also maintain a fixed-size sample and variance-optimal subset-sum estimators. We present efficient VarOpt stream-sampling algorithms. We also characterize a VarOpt family of sampling distribution, showing they all satisfy some important properties including Chernoff-Hoeffding bounds.

Efficient Stream Sampling for Variance-Optimal Estimation of Subset Sums.
Edith Cohen, Nick Duffield, Haim Kaplan , Carsten Lund , and Mikkel Thorup.

SODA 2009 | cite
SIAM J. Comput. 2011 | cite | pdf

Bottom-k (order) sampling

Bottom-k sampling (in Statistics literature also known as order sampling) is performed by assigning keys random ranks that may depend on their value and including the k keys with minimum rank. By appropriately choosing rank distributions, bottom-k includes successive weighted sampling without replacement and Sequential Poisson (Priority) sampling. Bottom-k samples, like Poisson samples, (and as it seems, unlike VarOpt) can be coordinated . The sample has fixed size k, which is an advantage over Poisson samples and over with-replacement samples. We also explore the application of bottom-k sampling to the analysis of large graphs, expanding on my size-estimation work. We study data structures, sampling algorithms, and subset-sum estimators for bottom-k samples.

Tighter estimation using bottom-k sketches.
Edith Cohen and Haim Kaplan .
VLDB 2008 | cite| pdf
SIGMETRICS 2007 (early poster of PODC 2007 and VLDB 2008 papers)| cite | pdf
Summarizing data using bottom-k sketches.
Edith Cohen and Haim Kaplan .
PODC 2007| cite | pdf

Sampling dispersed data: Coordinated sampling, Monotone sampling, Estimation

Data sets, such as request and traffic logs and sensor measurements, are repeatedly collected in multiple instances: time periods, locations, or snapshots. The data can be viewed as a matrix, with rows corresponding to different keys (users, sensors, cookies) and columns to different instances. Each entry is the value of a key in a given instance. Queries which span multiple instances, such as distinct counts and distance measures are used for planning, management, and anomaly or change detection. To scalably summarize such data, the sampling scheme of one instance should not depend on values in other instances. We would like, however, to use the same sample to estimate different statistics, which may depend on multiple instances. Since we are estimating a function from samples which provide only partial information on the set of values, the estimation problem is challenging. We study sampling schemes and estimation when instances are sampled independently and when the samples are coordinated.
Coordinated sampling is a method of sampling different instances so that the sampling of each instance is classic Poisson or bottom-k but samples of different instances are more similar when the instances are more similar. Statisticians proposed and used coordination as a method of controlling overlap in repeated surveys. Computer Scientists, myself included, (re-)discovered coordination for data analysis from samples: aggregates such as distinct counts and Jaccard Similarity can be better estimated when samples are coordinated. Coordination generalizes the technique popularly known as MinHash sketches. Sample coordination is a form of Locality Sensitive Hashing (LSH). I first came across coordination when observing that coordinated samples of all neighborhoods (or reachability sets) in a graph can be computed much more efficiently than independent samples. With time, I recognized the potential of coordination for large scale data analysis and opted to better understand it. We presented tighter estimators for basic queries, and more recently (see ), proposed a model of monotone sampling, which generalizes coordinated sampling, and provided a full characterization for the queries we can successfully answer along with efficient and practical optimal estimators.
In particular, since in general there may not exist one estimator with minimum variance for all possible data, we seek admissibility (Pareto variance optimality), meaning that estimators can not be improved for one data set in the domain without increasing variance for another. We also propose the stricter order-based optimality, where an order is specified over the data domain and we look for estimators which can not be improved for one data without increasing variance for preceding data.

Get the most out of your sample: Optimal unbiased estimators using partial information.
Edith Cohen and Haim Kaplan .
PODS 2011 | cite| pdf | arXiv:1109.1325
Talk (pdf) (pptx) given at TAU, Stanford, MSR SV
Independent sampling was widely believed to be useless for basic queries, including distinct values counts: Information theoretic lower bounds show that we need to sample almost the full data set in order to obtain meaningful estimates. We show that when sampling with "reproducible randomization," these lower bounds do not apply and we obtain good estimators when only a small fraction of the data is sampled for: distinct counts, their generalization of L1 norm of the coordinate-wise maximum, and in follow-up work for Lp difference. We also propose a principled method of deriving order-optimal estimators.
What you can do with Coordinated Samples
Edith Cohen and Haim Kaplan .
RANDOM-APPROX 2013 | cite| arXiv:1206.5637 | conference talk | (pdf) (pptx) .

For coordinated sampling, which has simpler structure than independent sampling, we were able to precisely characterize the queries for which good estimators exist. We introduce a notion of variance competitiveness, meaning that for any data, the expectation of the square of the estimate is bounded by the minimum possible for the data. Moreover, we show that O(1) variance competitiveness is always attainable when an estimator with finite variances exists.
Estimation for Monotone Sampling: Competitiveness and Customization
Edith Cohen.
arXiv:1212.0243 | PODC 2014 | cite | PODC talk slides (pptx) (pdf)

We define a model for monotone sampling and monotone estimation: Suppose we are interested in estimating a nonnegative function f of the data from a "sample" of the data. We define a "sampling scheme" here as a function of the data and a (uniform at random) seed value from [0,1]. Monotonicity of the sampling scheme means that the output provides more information on the data when the seed value is lower. Our goal is to obtain a good estimator for f. We are interested in estimators that are unbiased (because for applications we sum aggregate many estimation problems) and nonnegative (because the function f is). We also seek admissible and variance competitive estimators. This formulation generalizes coordinated sampling. Interestingly, the characterization we provided here for this special case extends to monotone sampling.
We make several contributions aimed at both a better theoretical understanding of monotone sampling and for providing practical and effective estimators to be used as tools for data mining and analysis. We explore the space of admissible and variance competitive estimators. We show how to derive an order-optimal estimator for any specified order on the data domain. This allows us to customize the estimator by choosing an order which reflects patterns present in our data. We present the L* estimator, which has a very compelling and natural set of properties: The L* estimator is defined (and can be constructed) for any monotone estimation problem for which an estimator with finite variances exists. It is guaranteed to be (at most) 4-competitive. Moreover, it is the unique admissible monotone estimator, meaning that the estimate we obtain is non-decreasing with the information we have on the data. In that, it improves over the classic Horvitz-Thompson estimator. Lastly, we explore and tighten the gap between the worst-case upper and lower bounds of variance competitiveness.
Distance Queries from Sampled Data: Accurate and Efficient
Edith Cohen.
arXiv:1203.4903 | KDD 2014 | KDD talk slides (pptx) (pdf) | cite

A particularly important class of queries are Lp difference queries. The query specifies a domain via a selection predicate over keys and we are interested in the Lp difference on the values assumed by these keys in the different instances. When the data is sampled, we are interested in approximating the Lp difference from the samples. We derive Lp difference estimators that are applied to sampled instances. We consider both independent and coordinated sampling. For independent sampling extending our technique and for coordinated sampling using our monotone estimation framework. We experimentally study these estimators to data sets with different properties. We demonstrate accurate estimates with only a small fraction of items sampled. For coordinated samples, we consider the L* estimator, which is optimized for small differences, the U* estimator, which is optimized for large differences, and the optimally competitive estimator, which minimizes the worst ratio over the data domain. We demonstrate the benefit of an informed choice of estimators when we have information on data patterns.
Coordinated weighted sampling for estimating aggregates over multiple weight assignments.
Edith Cohen, Haim Kaplan , and Shubhabrata Sen .
VLDB 2009 | cite| pdf | arXiv:0906.4560v2
Leveraging discarded samples for tighter estimation of multiple-set aggregates.
Edith Cohen and Haim Kaplan .
Sigmetrics 2009 | cite| pdf | ppt

Sampling unaggregated data: Streamed and distributed

Many data sets occur in an unaggregated form, where multiple data points are associated with each key. In the aggregated view of the data, the weight of a key is the sum of the weights of data points associated with the key and queries (such as heavy hitters, sum statistics, distinct counts) are posed over the aggregate view. Examples of unaggreagated data sets are IP packet streams where keys are flow keys, individual requests to resources where key is resource ID, interactions of users with Web services, where keys are cookies, and distributed streams of events registered by sensor networks, where keys are event types. Since data points are scattered in time or across multiple servers, aggregation is subject to resource limitation on storage and transmission. Therefore, we aim for efficient processing of queries and computing sample-based summaries directly over the unaggregated data. We consider both a model where the algorithm can "see" the complete data set (stream) and another model, where the data is subjected to per-entry fixed-rate sampling before it is made accessible to the summarization algorithm. This second model is motivated by sampled NetFlow, which was deployed in high speed IP routers. Also belongs in this category (but listed elsewhere) is the application to approximate distinct counting of my HIP estimators.

Stream Sampling for Frequency Cap Statistics.
Edith Cohen.

KDD 2015 | cite| talk ( animated pdf , handouts pdf ) | arXiv 1502.05955

Two common statistics over unaggreagated data are distinct keys, which is the number of active keys in a specified segment, and sum, which is the sum of the frequencies of keys in the segment. These are two special cases of frequency cap statistics, defined as the sum of frequencies capped by a parameter T, which are popular in online advertising platforms. We propose the first solution for scalable and accurate estimation of general frequency cap statistics. Our framework brings the benefits of approximate distinct counters to general frequency cap statistics.
Don't let the negatives bring you down: Sampling from streams of signed updates.
Edith Cohen, Graham Cormode, and Nick Duffield.

SIGMETRICS 2012 | cite| pdf
Composable, scalable, and accurate weight summarization of unaggregated data sets.
Edith Cohen, Nick Duffield, Haim Kaplan , Carsten Lund , and Mikkel Thorup

PVLDB 2009 | cite| pdf
Confident estimation for multistage measurement sampling and aggregation.
Edith Cohen, Nick Duffield, Carsten Lund , and Mikkel Thorup.

SIGMETRICS 2008 | cite| pdf
Sketching unaggregated data streams for subpopulation-size queries.
Edith Cohen, Nick Duffield , Haim Kaplan , Carsten Lund , and Mikkel Thorup
PODS 2007| cite| pdf
Algorithms and estimators for accurate summarization of Internet traffic.
Edith Cohen, Nick Duffield , Haim Kaplan , Carsten Lund , and Mikkel Thorup
IMC 2007| cite | pdf
Algorithms and estimators for summarization of unaggregated data streams.
Edith Cohen, Nick Duffield , Haim Kaplan , Carsten Lund , and Mikkel Thorup
JCSS 2014 | pdf : Full version based on IMC 2007 and PODS 2007 papers.
Processing Top-k queries from samples.
Edith Cohen, Nadav Grossaug, and Haim Kaplan .
CONEXT 2006| cite
Computer Networks 2008| cite | pdf

More with sampling

Average distance queries through weighted samples in graphs and metric spaces: High scalability with tight statistical guarantees
S. Chechik, E. Cohen, and H. Kaplan
Random-Approx 2015| cite| arXiv 1503.08528.

The average distance from a node to all other nodes in a graph, or from a query point in a metric space to a set of points, is a fundamental quantity in data analysis. The inverse of the average distance, known as the (classic) closeness centrality of a node, is a popular importance measure in the study of social networks. We develop novel structural insights on the sparsifiability of the distance relation via weighted sampling. Based on that, we present novel algorithms with strong statistical guarantees for estimating average distances from a query point/node or between all pairs of points/nodes. Our results significantly improve over previous work, which was mostly based on uniform sampling, in terms of computation and estimation quality tradeoffs.

Algorithmic Game Theory: Envy Freeness

Truth, Envy, and Truthful Market Clearing Bundle Pricing
Edith Cohen, Michal Feldman , Amos Fiat , Haim Kaplan , Svetlana Olonetsky.

WINE 2011 | cite| pdf
Envy-free makespan approximation.
Edith Cohen, Michal Feldman , Amos Fiat , Haim Kaplan , Svetlana Olonetsky.

EC 2010 | cite
SIAM J. Comput. 2012 | cite| pdf

WWW performance

Freshness/Aging issues in the performance of Web caches

Refreshment policies for Web content caches.
Edith Cohen and Haim Kaplan.

Infocom 2001 | cite | talk (ppt) | proceedings (ps) | (HTML)
Computer Networks 2002 | cite | full (ps)
The age penalty and its effect on cache performance.
Edith Cohen and Haim Kaplan.

USITS 2001 | cite | proceedings (ps) (HTML) | talk ppt
Aging through cascaded caches: performance issues in the distribution of Web content.
Edith Cohen and Haim Kaplan.

SIGCOMM 2001| cite | ps (proc)
ps (full) | pps , ppt: 1hr presentation at Stanford Networking Seminar (October 25, 2001)
Performance aspects of distributed caches using TTL-based consistency.
Edith Cohen, Eran Halperin and Haim Kaplan.

ICALP 2001 | cite | ps (proc)
Theoretical Computer Science 2001 | cite | ps (full)

Improved Web user-perceived latency through reduced communication-establishment overhead

Prefetching the means for document transfer: A new approach for reducing Web latency.
Edith Cohen and Haim Kaplan.

Infocom 2000 | cite | ps (proc)
Computer Networks 2002 | cite | ps (full)
Proactive caching of DNS records: Addressing a performance bottleneck.
Edith Cohen and Haim Kaplan.

SAINT 2001 | cite | ps (proc) | Talk (ppt)
Computer Networks 2003 | cite | ps
Talk (ppt): TAU CS Colloquium (April 9, 2000) and Stanford Networking Seminar (May 10, 2000)

Improved Web performance by enhanced end-to-end communication

Improving end-to-end performance of the Web using server volumes and proxy filters
Edith Cohen and Balachander Krishnamurthy and Jennifer Rexford.

SIGCOMM '98 | cite | ps.gz | Talk (ps) by Bala Krishnamurthy
Talk (ps) 1 hour by Jennifer Rexford
Efficient algorithms for predicting requests to Web servers.
Edith Cohen, Balachander Krishnamurthy, and Jennifer Rexford.

Infocom 1999| cite | ps.gz

Cache replacement for Web pages: theory and experiments

Evaluating server-assisted cache replacement in the Web
Edith Cohen, Balachander Krishnamurthy, and Jennifer Rexford.

ESA 1998| cite| ps | Talk (ps)
Exploiting regularities in Web traffic patterns for cache replacement.
Edith Cohen and Haim Kaplan.

STOC 1999| cite| proceedings (ps.gz) (ps) || Talk (ps.gz) by Haim Kaplan
Algorithmica 2002 | cite | paper (ps.gz) (ps)
Caching documents with varying sizes and fetching costs: An LP-Based approach.
Edith Cohen and Haim Kaplan.

SODA 1999 | cite
Algorithmica 2002 | cite | ps
Competitive analysis of the LRFU paging algorithm.
Edith Cohen, Haim Kaplan, and Uri Zwick.

WADS 2001| cite
Algorithmica 2002 | cite | ps | HTML | Talk (ppt) by Haim Kaplan

Connection Caching: theory and experiments

Connection caching.
Edith Cohen, Haim Kaplan, and Uri Zwick.
STOC 1999| cite| ps
JCSS 2003 | cite| ps | Talk (ps)
Connection caching under various models of communication.
Edith Cohen, Haim Kaplan, and Uri Zwick.
SPAA 2000| cite| ps | Talk (ps) at SPAA
Managing TCP connections under persistent HTTP.
Edith Cohen, Haim Kaplan , and Jeffrey Oldham.
WWW 1999| cite| HTML
Computer Networks 1999| cite | ps

Networking: Routing, Path selection, Measurements, Packet filtering

Probe Scheduling for Efficient Detection of Silent Failures
Edith Cohen, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Danny Raz and Yoav Tzur.
arXiv:1302.0792 | Performance | cite | 2014.
Packet Classification in Large ISPs: Design and Evaluation of Decision Tree Classifiers.
Edith Cohen and Carsten Lund.

Sigmetrics 2005| cite | (ps) (pdf)
Making Intra-Domain Routing Robust to Changing and Uncertain Traffic Demands: Understanding Fundamental Tradeoffs.
David Applegate and Edith Cohen.

SIGCOMM 2003 | cite | ps | Talk (ppt)
TON 2006 | cite
Coping with Network Failures: Routing Strategies for Optimal Demand Oblivious Restoration.
David Applegate, Lee Breslau, and Edith Cohen.

Sigmetrics/Performance 2004 | cite| ps
Optimal oblivious routing in polynomial time.
Yossi Azar, Edith Cohen, Amos Fiat, Harald Räcke, and Haim Kaplan.

STOC 2003| cite | ps (proceedings) | Talk (ppt)
JCSS 2004 | cite
Predicting and bypassing end-to-end Internet service degradations.
Anat Bremler-Barr, Edith Cohen, Haim Kaplan , and Yishay Mansour.
IMW 2002| cite | proceedings (ps) | IMW02 Talk (ppt) by Anat Bremler-Barr
JSAC 2003 | cite | full (ps)
Restoration by path concatenation: Fast recovery of MPLS paths.
Anat Bremler-Barr, Yehuda Afek, Edith Cohen, Haim Kaplan , and Michael Merritt.
SIGMETRICS 2001 (poster)
PODC 2001| cite | proceedings (ps)
Distributed Computing 2002 | cite | full (ps)

Content Distribution

Peer-to-Peer networks; replication and search in distributed datasets

Replication Strategies in Unstructured Peer-to-Peer Networks.
Edith Cohen and Scott Shenker.

SIGCOMM 2002| cite | proceedings (ps)| Talk (ppt)
A Case for Associative Peer to Peer Overlays.
Edith Cohen, Amos Fiat, and Haim Kaplan.

HotNets-I 2002 (position paper) | cite | proceedings (ps)
Associative Search in Peer to Peer Networks: Harnessing Latent Semantics.
Edith Cohen, Haim Kaplan, and Amos Fiat.

Infocom 2003| cite | proceedings (ps)
Talk (ppt) at Stanford Networking Seminar (CS 548) (May 15, 2002)
Computer Networks 2007 | cite | full (pdf)
Search and replication in unstructured peer-to-peer networks.
Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker.

Sigmetrics 2002 (poster) | 2-page abstract (pdf)
ICS 2002| cite | proceedings (pdf)
full (pdf)
Balanced-replication algorithms for distribution trees.
Edith Cohen and Haim Kaplan.

ESA 2002| cite | proceedings (ps)
Siam J. Computing 2005 | cite| full (ps)
Efficient Sequences of Trials.
Edith Cohen and Amos Fiat and Haim Kaplan.

SODA 2003 | cite | proceedings (ps)

Data: XML, Data streams, Spatial aggregation

Labeling Dynamic XML Trees.
Edith Cohen, Tova Milo, and Haim Kaplan.

PODS 2002| cite | proceedings (ps)
Siam J. Computing 2010 | cite | full (pdf)
Maintaining Time-Decaying Stream Aggregates.
Edith Cohen and Martin Strauss.

PODS 2003 | cite | proceedings (ps)
J. Algorithms 2006 | cite | full (ps) (pdf)
Talk (ppt)
Efficient Estimation Algorithms for Neighborhood Variance and Other Moments.
Edith Cohen and Haim Kaplan .

SODA 2004| cite | proceedings (ps)
full (ps) (pdf)
Spatially-Decaying Aggregation over a Network: Model and Algorithms.
Edith Cohen and Haim Kaplan .

SIGMOD 2004| cite | (ps)
JCSS 2007 | cite | full (ps) (pdf)

Data Mining / Machine Learning

Learning noisy perceptrons by a perceptron in polynomial time.

FOCS 1997| cite | proceedings (ps) | full (ps) | Talk (ps)
Approximating matrix multiplication for pattern recognition tasks.
Edith Cohen and David Lewis.
SODA 1998 | cite
J. Algorithms 1999 | cite | paper (ps.gz) (ps) | paper without plots (ps) | Talk (ps)
Finding Interesting Associations without Support Pruning.
Edith Cohen, Mayur Datar, Shinji Fujiware, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Jeffrey D. Ullman, and Cheng Yang.

ICDE 2000 | cite | proceedings (ps)
TKDE 2001 | cite | full (ps)
A Short Walk in The Blogistan.
Edith Cohen, Balachander Krishnamurthy

Computer Networks 2006 | cite | (pdf)

Algorithms for graphs and networks

Scalable Analysis of Massive Graphs

Graphs with billions of edges, such as social and Web graphs, are common and are a powerful source of information. The graph structure, alone or combined with meta-data, a static snapshot or evolving, encodes information on centralities, interests, similarities, influence, and communities. To effectively mine this information, algorithms must be highly scalable and models should accurately capture the real-world properties we are after.
I use a combination of algorithmic techniques, sketching and estimation techniques, modeling, and data sets, to develop effective tools for mining massive graphs.

Greedy Maximization Framework for Graph-based Influence Functions
E. Cohen
HotWeb 2016 | cite | arXiv 1608.04036 | HotWeb presentation slides (pptx) (pdf)

Influence functions in a graph are defined for subsets of nodes, according to graph structure. The aim is to find subsets of seed nodes with (approximately) optimal tradeoff of size and influence. We define a rich class of graph-based influence functions which unifies and extends previous work. Influence is defined through pairwise utility values derived from relations such as reachability, distance, reverse rank. The utility of a seed set to a node is then defined as a submodular aggregate of individual seed utilities. Finally, the influence of the seed set is the sum over nodes of its utility to the node. Previous work focused on maximum aggregation only. More general submodular aggregation captures fault tolerance of the seed set and more generally allows value from additional seeds. We present a meta-algorithm for approximate greedy maximization with strong approximation quality guarantees and worst-case near-linear computation for all functions in our class. Our meta-algorithm generalizes recent designs specific for maximum aggregation and specific utility functions.
Reverse Ranking by Graph Structure: Model and Scalable Algorithms.
E. Buchnik and E. Cohen
SIGMETRICS 2016 | cite | arXiv 1506.02386 | code (by Eliav Buchnik).

Distances in a network capture relations between nodes and are the basis of centrality, similarity, and influence measures. Often, however, the relevance of a node u to a node v is more precisely measured not by the magnitude of the distance, but by the number of nodes that are closer to v than u. That is, by the rank of u in an ordering of nodes by increasing distance from v. We identify and address fundamental challenges in rank-based graph mining. We first consider single-source computation of reverse-ranks and design a ``Dijkstra-like'' algorithm which computes nodes in order of increasing approximate reverse rank while only traversing edges adjacent to returned nodes. We then define reverse-rank influence and present a near-linear algorithms for greedy approximate reverse-rank influence maximization. The design relies on our single-source algorithm and on the SKIM sketch-based approximate greedy maximization framework. Our algorithms utilize near-linear preprocessing of the network to compute all-distance sketches. As a contribution of independent interest, we present a novel algorithm for computing these sketches, which have many other applications, on multi-core architectures. We complete this work by establishing the hardness of computing exact reverse-ranks for a single source and exact reverse-rank influence, which show that with near-linear computation, the small relative errors we obtain are the best we can currently hope for. Finally, we conduct an experimental evaluation on graphs with tens of millions of edges, demonstrating both scalability and accuracy. This project is Eliav's M.Sc. thesis (Tel-Aviv university).
Distance-Based Influence in Networks: Computation and Maximization
Edith Cohen, Daniel Delling , Thomas Pajor, and Renato F. Werneck.
arXiv 1410.6976

A premise at a heart of network analysis is that entities in a network derive utilities from their connections. The influence of a seed set S of nodes is defined as the sum over nodes j of the utility of S to j. Distance-based utility, which is a decreasing function of the distance from S to j, was explored in several successful research threads from social network analysis and economics: Network formation games [Bloch and Jackson 2007], Reachability-based influence [Richardson and Domingos 2002; Kempe et al. 2003]; ``threshold'' influence [Gomez-Rodriguez et al. 2011]; and {\em closeness centrality} [Bavelas 1948]. We formulate a model that unifies and extends this previous work and address the two fundamental computational problems in this domain: Influence oracles and influence maximization (IM). An oracle performs some preprocessing, after which influence queries for arbitrary seed sets can be efficiently computed. With IM, we seek a set of nodes of a given size with maximum influence. Since the IM problem is computationally hard, we instead seek a greedy sequence of nodes, with each prefix having influence that is at least 1-1/e of that of the optimal seed set of the same size. We present the first highly scalable algorithms for both problems, providing statistical guarantees on approximation quality and near-linear worst-case bounds on the computation. We perform an experimental evaluation which demonstrates the effectiveness of our designs on networks with hundreds of millions of edges.
Sketch-based Influence Maximization and Computation: Scaling up with Guarantees
Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F. Werneck.
ACM CIKM 2014 | CIKM slides by Thomas Pajor (pdf) | Microsoft Research TR 2014-110 | arXiv 1408.6268 | cite | SKIM code and documentation (by Thomas Pajor, copyright held by Microsoft Corporation)

Propagation of contagion is a fundamental process in social, biological, and physical networks. It is used to model the spread of information, influence, or a viral infection. Diffusion patterns can be specified by a probabilistic model, such as Independent Cascade (IC), or captured by a set of representative traces. Basic computational problems in the study of diffusion are influence queries (determining the potency of a specified {\em seed set} of nodes) and Influence Maximization (identifying the most influential seed set of a given size). Answering each influence query involves many edge traversals, and does not scale when there are many queries on very large graphs. The gold standard for Influence Maximization is the greedy algorithm, which iteratively adds to the seed set a node maximizing the marginal gain in influence. Greedy has a guaranteed approximation ratio of at least (1-1/e) and produces a sequence of nodes, with each prefix having approximation guarantee with respect to the same-size optimum. Since Greedy does not scale well beyond a few million edges, for larger inputs one must currently use either heuristics or alternative algorithms designed for a pre-specified small seed set size. We propose a novel sketch-based design for influence computation. Our greedy Sketch-based Influence Maximization (SKIM) algorithm scales to graphs with billions of edges, with one to two orders of magnitude speedup over the best greedy methods. It still has a guaranteed approximation ratio, and in practice its quality nearly matches that of exact greedy. We also present influence oracles , which use linear-time preprocessing to generate a small sketch for each node, allowing the influence of any seed set to be quickly answered from the sketches of its nodes.
In terms of techniques, this work utilizes my 1994 reachability sketching technique for directed graphs and efficient estimators. We extend reachability sketches to be across multiple graphs or a probabilistic model, retaining the same storage requirement and approximation guarantees. For influence maximization, our efficient approximate greedy algorithm SKIM only computes sketches to the point needed to determine the approximate maximum marginal gain with sufficient confidence. It also updates the sketches after each selection to be with respect to the residual problem, so that no work is wasted.
All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis
Edith Cohen.
PODS 2014 | PODS slides (pptx) (pdf) | arXiv:1306.3284 | cite

The All-Distances Sketches (ADS), which I introduced back in 1994, turn out to be a very effective tool for mining massive graphs. A small-size ADS can be efficiently computed for each node in the graph. In my original work I used the ADS of each node to estimate neighborhood cardinalities (number of nodes within a certain distance) and also showed how sketches of different nodes can be used to estimate neighborhood relations. We provide a unified view of several ADS definitions which evolved over time and propose the novel HIP estimators. The HIP estimators obtain tighter estimates of neighborhood cardinalities and closeness centralities from the sketch of a node. They also facilitate good estimates for a powerful natural class of queries. We also show how the HIP estimators can be applied to approximate distinct counting on data streams, improving over state-of-the-art practical algorithms.
Computing Classic Closeness Centrality, at Scale
Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F. Werneck.
ACM COSN 2014 (best paper) | Microsoft Research TR 2014-71 | arXiv:1409.0035 | talk slides (pptx) (pdf) | cite

Closeness centrality is an importance measure of nodes in social and massive graphs which is based on the distances from the node to all other nodes. The classic definition, proposed by Bavelas (1950), Beauchamp (1965), and Sabidussi (1966), is (the inverse of) the average distance to all other nodes. We propose the first highly scalable (near linear-time processing and linear space overhead) algorithm for estimating, within a small relative error, the classic closeness centralities of all nodes in the graph. Our algorithm provides strong probabilistic guarantees on the approximation quality for all nodes of any undirected graph, as well as for centrality computed with respect to round-trip distances in directed graphs. For directed graphs, we also propose algorithms that approximate generalizations of classic closeness centrality to outbound and inbound centralities. This algorithm does not provide worst-case theoretical approximation guarantees, but performs well on the networks we tested. We perform extensive experiments on large networks, demonstrating high scalability and accuracy.
Scalable Similarity Estimation in Social Networks: Closeness, Node Labels, and Random Edge Lengths
Edith Cohen, Daniel Delling, Fabian Fuchs, Andrew V. Goldberg, Moises Goldszmidt, and Renato F. Werneck
ACM COSN 2013 | Microsoft Research TR 201773 | cite

Similarity estimation between nodes based on structural properties of graphs is a basic building block in the analysis of massive networks, used for link prediction, product recommendations, advertisement, attribute completion, and more. Similarity measures that are based on global properties have better recall than local ones but are harder to compute. We make several contributions aiming for both accuracy and scalability: First, we define closeness similarity, a natural measure that compares two nodes based on the similarity of their relations to all other nodes. Second, we show how the all-distances sketch (ADS) node labels, which are efficient to compute, can support the estimation of closeness similarity in logarithmic query time. ADSs can also be used as SP distance oracles, matching the performance of dedicated techniques. Third, we propose the randomized edge lengths (REL) technique and define the corresponding REL distance, which captures both path length and path multiplicity and therefore improves over the SP distance as a similarity measure. The REL distance can also be the basis of closeness similarity and can be estimated using SP computation or using the ADS labels.

Size-Estimation Scheme and Applications

My size-estimation scheme is based on a simple but powerful technique: the least-element (LE) method. This method is better known today as min-hash: Suppose you have subset from a universe of items. You (implicitly) assign a random permutation (or random values) to all items. We refer to the permutation rank or the random value assigned to an item as its rank. Now, the LE of a subset is the minimum rank of its members. The LE is in expectation smaller when there are more elements in the set. Therefore, it is possible to estimate the size of the set from its LE rank. When LE ranks of different subsets are obtained using the same random permutation, they are coordinated. Coordination facilitates many applications. In particular, it means that the LE ranks are mergeable: The LE of the union of subsets is the minimum of the LEs of its members. The LE ranks can also be used to estimate similarity of sets. For example, the probability that two sets have the same LE is proportional to their Jaccard similarity, which is the ratio of the intersection size to the union size. Therefore, the Jaccard similarity of the sets can be estimated from the LE similarity.
The accuracy of these size or similarity estimates can be enhanced by repeating this k times (using k permutations), by taking the k smallest ranks in a single permutation, or by hashing elements randomly to k buckets and taking the LE of each of the k buckets. I refer to these sketch flavors, respectively, as k-mins, bottom-k, or k-partition sketches. These three flavors have different advantages, depending on the tradeoffs we want to achieve. Asymptotically, the estimators with all three have the same behavior but bottom-k sketches carry the most information. In this early work (1994-1997), I studied mostly k-mins estimators (with mention of bottom-k sketches) and studied cardinality and similarity estimation.
The first application of the LE technique for size estimation I am aware of is for approximate distinct counting [Flajolet and Martin 1985]. As for similarity estimation, LE sketches can be viewed as a special case of coordinated samples [Brewer, Early, Joice 1972]. A very well known application of the LE technique is for near-duplicate detection of Web pages [Broder 1997].
I first applied the LE technique in a graph setting (see below): It turned out that (coordinated) sketches of all neighborhoods, and all reachability sets of all nodes in a graph can be very efficiently computed, with processing that is nearly-linear in graph size. This resulted in a powerful technique for analyzing very large graphs. Other early applications I explored (see below) are determining the optimal order to perform a chain multiplication of sparse matrices and sketch-based 1 tracking of the state of distributed processes for roll-back recovery.
In later papers, I use the term All-Distances sketch for the LE values with respect to all neighborhoods ("balls") of a node in the graph. We consider spatially-decaying agrregation , neighborhood variances , use of bottom-k sketches, and deriving even better estimators.

Size-estimation framework with applications to transitive closure and reachability.
Edith Cohen.
FOCS 1994 | cite
JCSS 1997 | cite| paper (ps) (pdf)
Talk (ps) (pdf) (25th Columbia Theory Day, Columbia University, April 7, 1995)
Structure Prediction and Computation of Sparse Matrix Products.
Edith Cohen.
IPCO 1996 | cite | Talk (ps)
J. Combinatorial Optimization 1997 | cite | paper (ps.gz) (ps) (pdf)
(Experimental study applying the estimation framework to optimize sparse matrix multiplications.)
When Piecewise determinism is almost true.
Edith Cohen, Yi-Min Wang, and Gaurav Suri.
PCICFT 1995 | cite | paper (ps.gz) (ps)

Distance and Reachability Labels

A labeling approach to incremental cycle detection.
Edith Cohen, Amos Fiat, Haim Kaplan, and Liam Roditty.
arXiv:1310.8381 2013.
Reachability and distance queries via 2-hop labels.
Edith Cohen, Eran Halperin, Haim Kaplan, and Uri Zwick.
SODA 2002| cite | proceedings (ps)
Siam J. Comput. 2003 | cite | full (ps)

Shortest Paths: Trading Accuracy and Time

All pairs small stretch paths.
Edith Cohen and Uri Zwick .
SODA 1997 | cite
J. Algorithms 2001 | cite | paper (ps.gz) (ps)
Fast algorithms for t-spanners and stretch-t paths.

FOCS 1993 | cite
Siam J. Computing 2002 | cite | paper (ps.gz) (ps)

Parallel Algorithms: Shortest Paths and Maximum Flow

These algorithms, which I developed in the early 1990's, consider the idealized PRAM model (Parallel Random Access Machine) of parallel computation. The PRAM model assumes we have as many parallel processors as we can use, and considers the tradeoff between processing time and the total "work" we use (which is the product of time and number of processors). The principled exploration of the fundamental chains of dependencies, however, is relevant for the GPUs, multi-core, and map-reduce architecture which dominate massive computation today. Some of the structures I used also turn out to be relevant for the design of sketching, streaming, and dynamic sequential algorithms. Finally, the basic ideas are rather simple, not (only) aiming for improved worst-case bounds but also for simplicity and implementability.

Polylog-time and near-linear work approximation scheme for undirected shortest-paths.
Edith Cohen

STOC 1994 | cite | Talk (pdf)
J. ACM 2000 | cite | paper (ps) (pdf)
Single-source shortest-paths (SSSP) computation seemed not to parallelize efficiently. We show that for undirected graphs, if we allow a slight fractional stretch (ratio of approximate distance to actual distance), we can perform s approximate SSSP computations in polylogarithmic time using work that is only a little more than performing s sequential applications of Dijkstra's SSSP algorithm. Our chief instrument, which is of independent interest, are efficient constructions of Hopsets. A hopset for a fraction e and integer d is a small set of auxiliary edges E' such that the shortest d-hop path in E+E' between any pair of nodes has length that is at most (1+e) times the distance between the nodes. Hopsets can also be used as sparse spanners with fractional stretch and small additive error.
Efficient parallel shortest-paths in digraphs with a separator decomposition.
Edith Cohen

SPAA 1993| cite
J. Alg 1994 | cite | paper (ps.gz) (ps)
Using selective path-doubling for parallel shortest-path computations.
Edith Cohen

ISTCS 1993| cite
J. Alg 1997 | cite| paper (ps.gz) (ps)
Approximate max-flow on small depth networks.
Edith Cohen

FOCS 1992 | cite
Siam J. Computing 1995 | cite | paper (ps)

Combinatorial Optimization / Linear Programming

On the Tradeoff between Stability and Fit.
Edith Cohen, Graham Cormode, Nick Duffield, and Carsten Lund.
arXiv:1302.2137 | ACM Tran. Algorithms 2016 | cite | 2016.
Scheduling Subset Tests: One-Time, Continuous, and How They Relate.
Edith Cohen, Haim Kaplan, and Yishay Mansour.
paper (pdf) | RANDOM-APPROX 2013 | cite | conference talk | (pdf) (pptx) |
Improved algorithms for linear inequalities with two variables per inequality.
Edith Cohen and Nimrod Megiddo.
STOC 1991 | cite
Siam J. Comput. 1994 | cite| paper (ps)
New algorithms for generalized network flows.
Edith Cohen and Nimrod Megiddo.
ISTCS 1992 | conference citation
Math. Prog. 1994 | cite| paper (ps)
Complexity analysis and algorithms for some flow problems.
Edith Cohen and Nimrod Megiddo.
SODA 1991 | cite
Algorithmica 1994 | cite | paper (ps)
Strongly polynomial time and NC algorithms for detecting cycles in periodic graphs.
Edith Cohen and Nimrod Megiddo.
STOC 1989 | cite
J. ACM 1993 | cite | paper (ps)
Recognizing properties of periodic graphs.
Edith Cohen and Nimrod Megiddo.
Applied Geometry and Discrete Math 1991 | cite| paper (ps)
Maximizing concave functions in fixed dimension.
Edith Cohen and Nimrod Megiddo.
Complexity in Numerical Optimization 1993 | cite | paper (ps)

Misc.

Design of a CDMA decoder

Multi-rate detection for the IS-95 CDMA forward traffic channel.
Edith Cohen and Hui-Ling Lou.
GLOBECOM 1995 | cite | paper (ps)
Multi-rate detection for the IS-95A CDMA forward traffic channel using the 13kbs speach coder.
Edith Cohen and Hui-Ling Lou.
ICC 1996 | cite | paper (ps) | multimedia recording of a talk by Hui-Ling Lou

Software tools

Improvise: Interactive Multimedia Process Visualization Environment.
Naser Barghouti, Edith Cohen, and Lefty Koutsofios.
ESEC 1995 | cite | paper (ps.gz) (ps)

Publications in reverse chronological order

Conference Papers

Ph.D. Thesis, Books

Journal Papers

Other work (not published in a peer reviewed venue)

Other work (not yet peer reviewed):

Distance-based Influence: Computation and Maximization.
E. Cohen, D. Delling, T. Pajor, and R. F. Werneck
arXiv 1410.6976. details
Variance competitiveness for monotone estimation: Tightening the bounds.
E. Cohen
arXiv:1406.6490
A labeling approach to incremental cycle detection
E. Cohen, A. Fiat, H. Kaplan, L. Roditty
arXiv:1310.8381

Conference Papers:

Greedy Maximization Framework for Graph-based Influence Function
E. Cohen.
IEEE HotWeb 2016. details
Reverse Ranking by Graph Structure: Model and Scalable Algorithms.
E. Buchnik and E. Cohen
ACM SIGMETRICS 2016. details
Multi-objective weighted sampling.
E. Cohen.
IEEE HotWeb 2015. details
Average distance queries through weighted samples in graphs and metric spaces: High scalability with tight statistical guarantees.
S. Chechik, E. Cohen, and H. Kaplan
RANDOM-APPROX 2015. details
Stream sampling for frequency cap statistics.
E. Cohen.
ACM KDD 2015. details
Sketch-based Influence Maximization and Computation: Scaling up with Guarantees.
E. Cohen, D. Delling, T. Pajor, and R. F. Werneck
ACM CIKM 2014. details
Computing Classic Closeness Centrality, at Scale.
E. Cohen, D. Delling, T. Pajor, and R. F. Werneck
ACM COSN 2014. (best paper award!) details
Probe Scheduling for Efficient Detection of Silent Failures.
E. Cohen, A. Hassidim, H. Kaplan, Y. Mansour, D. Raz, and Y. Tzur,
Performance 2014. details
Distance Queries from Sampled Data: Accurate and Efficient.
E. Cohen,
ACM KDD 2014. details
Estimation for Monotone Sampling: Competitiveness and Customization.
E. Cohen,
ACM PODC 2014. details
All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis.
E. Cohen,
ACM PODS 2014. details
Scalable Similarity Estimation in Social Networks: Closeness, Node Labels, and Random Edge Lengths.
E. Cohen, D. Delling, F. Fuchs, A. V. Goldberg, M. Goldszmidt, and R. F. Werneck
COSN 2013. details
Scheduling Subset Tests: One-Time, Continuous, and How They Relate.
E. Cohen, H. Kaplan, and Y. Mansour
RANDOM-APPROX 2013. details
What you can do with coordinated samples.
E. Cohen and H. Kaplan
RANDOM-APPROX 2013. details
Don't let the negatives bring you down: Sampling from streams of signed updates.
E. Cohen, G. Cormode, N. G. Duffield.
SIGMETRICS 2012. details
Get the most out of your sample: Optimal unbiased estimators using partial information.
E. Cohen, H. Kaplan.
PODS 2011: 13-24. details
Structure-aware sampling on data streams.
E. Cohen, G. Cormode, N. G. Duffield.
SIGMETRICS 2011: 197-208. details
Truth, envy, and truthful market clearing bundle pricing.
E. Cohen, M. Feldman, A. Fiat, H. Kaplan, S. Olonetsky.
WINE 2011: 97-108. details
Structure-aware sampling: Flexible and accurate summarization.
E. Cohen, G. Cormode, N. G. Duffield.
PVLDB 4(11): 819-830. 2011. details
Envy-free makespan approximation.
E. Cohen, M. Feldman, A. Fiat, H. Kaplan, S. Olonetsky.
EC 2010: 159-166. 2010. details
Composable, scalable, and accurate weight summarization of unaggregated data sets.
E. Cohen, N. Duffield, H. Kaplan, C. Lund, M. Thorup.
PVLDB 2(1): 431-442. 2009.
Coordinated weighted sampling for estimating aggregates over multiple weight assignments.
E. Cohen, H. Kaplan, S. Sen.
PVLDB 2(1): 646-657. 2009.
Leveraging discarded samples for tighter estimation of multiple-set aggregates.
E. Cohen, H. Kaplan.
SIGMETRICS 2009: 251-262. 2009.
Stream sampling for variance-optimal estimation of subset sums.
E. Cohen, N. G. Duffield, H. Kaplan, C. Lund, M. Thorup. SODA 2009: 1255-1264. 2009.
Estimating Aggregates over Multiple Sets.
E. Cohen, H. Kaplan.
ICDM 2008: 761-766. 2008.
Confident estimation for multistage measurement sampling and aggregation.
E. Cohen, N. G. Duffield, C. Lund, M. Thorup.
SIGMETRICS 2008: 109-120. 2008.
Tighter estimation using bottom-k sketches.
E. Cohen, H. Kaplan.
VLDB 2008. 2008.
Algorithms and estimators for accurate summarization of Internet traffic.
E. Cohen, N. Duffield, H. Kaplan, C. Lund and M. Thorup.
In Proceedings of the 2007 ACM/SIGCOMM Internet Measurement Conference (IMC 2007). 2007.
Summarizing data using bottom-k sketches.
E. Cohen and H. Kaplan.
In Proceedings Twenty-Sixth Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC 2007). 2007.
Sketching unaggregated data streams for subpopulation-size queries.
E. Cohen, N. Duffield, H. Kaplan, C. Lund and M. Thorup.
In Proceedings 26th ACM SIGMOD-SIGACT0SIGART Symposium on Principles of Database Systems (PODS 2007). 2007.
Bottom-k sketches: better and more efficient estimation of aggregates.
E. Cohen and H. Kaplan.
In Proceedings of the ACM Sigmetrics/Performance Joint International Conference on Measurements and Modeling of Computer Systems (poster). 2007.
Processing Top-k Queries from Samples.
E. Cohen, N. Grossaug, and H. Kaplan.
In Proceedings of the 2nd conference on Future Networking Technologies (CoNext). 2006.
Packet Classification in large ISPs: Design and evaluation of decision tree classifiers.
E. Cohen and C. Lund.
In Proceedings of the ACM Sigmetrics/Performance Joint International Conference on Measurements and Modeling of Computer Systems. 2005.
Spatially-Decaying Aggregation over a Network: Model and Algorithms.
E. Cohen and H. Kaplan.
In Proceedings of the ACM SIGMOD International Conference on Management of Data. 2004.
Coping with Network Failures: Routing Strategies for Optimal Demand Oblivious Restoration.
D. Applegate, L. Breslau, and E. Cohen.
In Proceedings of the ACM Sigmetrics/Performance Joint International Conference on Measurements and Modeling of Computer Systems. 2004.
Efficient Estimation Algorithms for Neighborhood Variance and Other Moments.
E. Cohen and H. Kaplan
In Proceedings of the ACM-SIAM SODA'04 Conference . 2004.
Making Intra-Domain Routing Robust to Changing and Uncertain Traffic Demands: Understanding Fundamental Tradeoffs.
D. Applegate and E. Cohen.
In Proceedings of the ACM SIGCOMM'03 Conference . 2003.
Maintaining Time-Decaying Stream Aggregates
E. Cohen and M. Strauss.
In Proc. 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 2003.
Optimal oblivious routing in polynomial time.
Y. Azar, E. Cohen, A. Fiat, H. Kaplan, and H. Räcke.
In Proc. 35th Annual ACM Symposium on Theory of Computing. ACM, 2003.
Associative Search in Peer to Peer Networks: Harnessing Latent Semantics.
E. Cohen, A. Fiat, and H. Kaplan.
In Proceedings of the IEEE INFOCOM'03 Conference. 2003.
Efficient Sequences of Trials.
E. Cohen, A. Fiat, and H. Kaplan
In Proceedings of the ACM-SIAM SODA'03 Conference . 2003.
A Case for Associative Peer to Peer Overlays.
E. Cohen, A. Fiat, and H. Kaplan
In Proceedings of HotNets-I . 2002.
Replication strategies in unstructured Peer-to-Peer networks.
E. Cohen and S. Shenker
In Proceedings of the ACM SIGCOMM'02 Conference . 2002.
Predicting and bypassing end-to-end Internet service degradations.
A. Bremler-Barr, E. Cohen, H. Kaplan, and Y. Mansour.
In Proceedings of the 2nd Sigcomm Internet Measurements Workshop. ACM 2002.
Balanced-replication algorithms for distribution trees.
E. Cohen and H. Kaplan.
In Proceedings of the 10th European Symposium on Algorithms . LLNCS vol.xxxx, 2002.
Search and replication in unstructured peer-to-peer networks.
Q. Lv, P. Cao, E. Cohen, K. Li, S. Shenker.
In Proceedings of the 16th annual ACM International Conference on supercomputing (ICS) 2002; In Proc. 2002 ACM SIGMETRICS conference (2-page poster). 2002.
Labeling dynamic XML trees
E. Cohen, H. Kaplan, T. Milo.
In Proc. 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 2002.
Reachability and distance queries via 2-hop labels.
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick.
In Proc. 13th ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, 2002.
Aging through cascaded caches: performance issues in the distribution of Web content.
E. Cohen, and H. Kaplan
In Proceedings of the ACM SIGCOMM'01 Conference . 2001.
Competitive analysis of the LRFU paging algorithm.
E. Cohen, H. Kaplan, and U. Zwick
In Proceedings of the 7th International Workshop on Algorithms and Data Structures . 2001.
Performance aspects of distributed caches using TTL-based consistency.
E. Cohen, E. Halperin, and H. Kaplan.
In Proceedings of the ICALP'01 conference, Springer-Verlag, LNCS. 2001.
Restoration by Path Concatenation: Fast Recovery of MPLS Paths.
A. Bremler-Barr, Y. Afek, E. Cohen, H. Kaplan, M. Merritt.
In Proceedings of the ACM PODC '01 conference; In Proceedings of the ACM SIGMETRICS '01 conference (2-page Poster). 2001.
Refreshment policies for Web content caches.
E. Cohen and H. Kaplan.
In Proceedings of the IEEE INFOCOM'01 Conference. 2001.
The age penalty and its effect on cache performance.
E. Cohen and H. Kaplan.
In Proceedings of the 3rd USENIX Symposium on Internet Technologies and Systems (USITS). 2001.
Proactive caching of DNS records: Addressing a performance bottleneck.
E. Cohen and H. Kaplan.
In Proc. of the 2001 Symposium on Applications and the Internet (SAINT). IEEE, 2001.
Connection caching under various models of communication.
E. Cohen, H. Kaplan, and U. Zwick,
In Proc. 12th Annual ACM Symposium on Parallel Algorithms and Architectures. ACM, 2000.
Prefetching the means for document transfer: A new approach for reducing Web latency
E. Cohen and H. Kaplan.
In Proceedings of the IEEE INFOCOM'00 Conference. 2000.
Finding interesting associations without support pruning,
R. Motwani, E. Cohen, M. Datar, S. Fujiware, A. Gionis, P. Indyk, J. Ullman, and C. Yang.
In: Proc. of the 16th International conference on Data engineering. IEEE, 2000.
Policies for managing TCP connections under persistent HTTP.
E. Cohen, H. Kaplan, and J. D. Oldham.
In Proc. of the 8th World Wide Web conference. 1999.
Connection caching.
E. Cohen, H. Kaplan, and U. Zwick,
In Proc. 31st Annual ACM Symposium on Theory of Computing. ACM, 1999.
Exploiting Regularities in Web Traffic Patterns for Cache Replacement.
E. Cohen and H. Kaplan,
In Proc. 31st Annual ACM Symposium on Theory of Computing. ACM, 1999.
Efficient Algorithms for Predicting Requests to Web Servers
E. Cohen, B. Krishnamurthy, and J. Rexford.
In Proceedings of the IEEE INFOCOM'99 Conference . 1999.
LP-Based Analysis of Greedy-Dual-Size.
E. Cohen and H. Kaplan.
In Proc. 10th ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, 1999.
Improving end-to-end performance of the Web using server volumes and proxy filters.
E. Cohen, B. Krishnamurthy, and J. Rexford
In Proceedings of the ACM SIGCOMM'98 Conference . 1998.
Evaluating server-assisted cache replacement in the Web.
E. Cohen, B. Krishnamurthy, and J. Rexford
In Proceedings of the 6th European Symposium on Algorithms . LLNCS vol.1461, 1998.
Learning noisy perceptrons by a perceptron in polynomial time.
E. Cohen.
In Proc. 38th IEEE Annual Symposium on Foundations of Computer Science. IEEE, 1997.
All pairs small stretch paths.
E. Cohen and U. Zwick.
In Proc. 8th ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, pages 93--102, 1997.
Approximating matrix multiplication for pattern recognition tasks.
E. Cohen and D. Lewis.
In Proc. 8th ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, pages 682--691, 1997.
Optimizing multiplications of sparse matrices.
E. Cohen.
In W. H. Cunningham, S. T. McCormick, and M. Queyranne, editors, Proc. of the 5th International Conference on Integer Programming and Combinatorial Optimization, pages 219--233. Springer-Verlag, Lecture Notes in Computer Science Vol. 1084, 1996.
Multi-rate detection for the IS-95A CDMA forward traffic channel using the 13kbs speach coder.
E. Cohen and H.-L. Lou.
In Proc. IEEE International Communications Conference. IEEE, 1996.
When piecewise determinism is almost true.
E. Cohen, Y.-M. Wang, and G. Suri.
In Proc. Pacific Rim International Symposium on Fault-Tolerant Systems, pages 66--71, December 1995.
Multi-rate detection for the IS-95 CDMA forward traffic channel.
E. Cohen and H.-L. Lou.
In Proc. IEEE Global Telecommunications Conference, pages 1789--1793. IEEE, 1995.
Improvise: Interactive Multimedia Process Visualization Environment.
N. S. Barghouti, E. Koutsofios, and E. Cohen.
In Wilhelm Schäfer, editor, Proceedings of the Fifth European Software Engineering Conference (ESEC'95). Springer-Verlag, September 1995.
Estimating the size of the transitive closure in linear time.
E. Cohen.
In Proc. 35th IEEE Annual Symposium on Foundations of Computer Science, pages 190--200. IEEE, 1994.
Polylog-time and near-linear work approximation scheme for undirected shortest-paths.
E. Cohen.
In Proc. 26th Annual ACM Symposium on Theory of Computing, pages 16--26. ACM, 1994.
Efficient parallel shortest-paths in digraphs with a separator decomposition.
E. Cohen.
In Proc. 5th annual ACM Symposium on Parallel Algorithms and Architectures, pages 57--67. ACM, 1993.
Using selective path-doubling for parallel shortest-path computations.
E. Cohen.
In Proc. of the 2nd Israeli Symposium on the Theory of Computing and Systems, pages 78--87. IEEE, 1993.
Fast algorithms for t-spanners and stretch-t paths.
E. Cohen.
In Proc. 34th IEEE Annual Symposium on Foundations of Computer Science, pages 648--658. IEEE, 1993.
Approximate max-flow on small depth networks.
E. Cohen.
In Proc. 33rd IEEE Annual Symposium on Foundations of Computer Science, pages 648--658. IEEE, 1992.
New algorithms for generalized network flows.
E. Cohen and N. Megiddo.
In D. Dolev, Z. Galil, and M. Rodeh, editors, Proc. of the 1st Israeli Symposium on the Theory of Computing and Systems, pages 103--114. Springer-Verlag, Lecture Notes in Computer Science Vol. 601, 1992.
Improved algorithms for linear inequalities with two variables per inequality.
E. Cohen and N. Megiddo.
In Proc. 23rd Annual ACM Symposium on Theory of Computing, pages 145--155. ACM, 1991.
Complexity analysis and algorithms for some flow problems.
E. Cohen and N. Megiddo.
In Proc. 2nd ACM-SIAM Symposium on Discrete Algorithms, pages 120--130. ACM-SIAM, 1991.
Strongly polynomial and NC algorithms for detecting cycles in dynamic graphs.
E. Cohen and N. Megiddo.
In Proc. 21st Annual ACM Symposium on Theory of Computing, pages 523--534. ACM, 1989.

Ph.D. Thesis:

Combinatorial Algorithms for Optimization Problems. E. Cohen.
PhD thesis, Department of Computer Science, Stanford University, Stanford, Ca., 1991.

Book chapters:

The Time-to-Live based consistency mechanism. E. Cohen and H. Kaplan.
Web Content Delivery, Tang, X. and Xu, J. and Chanson, S. T., (Eds.), Springer, New York, 2005.
Shortest Paths. E. Cohen.
Handbook of applied optimization, P.M. Pardalos and M.G.C. Resende, (Eds.), Oxford University Press, New York, 2002.

Journal Papers:

On the Tradeoff between Stability and Fit.
E. Cohen, G. Cormode, N. Duffield, and C. Lund
ACM Trans. Algorithms. 13(1) (2016). details
All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis.
E. Cohen.
IEEE Trans. Knowl. Data Eng. 27(9):2320-2334 (2015)
Algorithms and estimators for summarization of unaggregated data streams.
E. Cohen, N. Duffield, H. Kaplan, C. Lund, M. Thorup.
J. of Computer and System Sciences. (2014)
Envy-Free Makespan Approximation.
E. Cohen, M. Feldman, A. Fiat, H. Kaplan, S. Olonetsky.
SIAM J. Comput. 41(1): 12-25 (2012)
Structure-Aware Sampling: Flexible and Accurate Summarization.
E. Cohen, G. Cormode, N. G. Duffield.
PVLDB 4(11): 819-830 (2011)
Efficient Stream Sampling for Variance-Optimal Estimation of Subset Sums.
E. Cohen, N. G. Duffield, H. Kaplan, C. Lund, M. Thorup.
SIAM J. Comput. 40(5): 1402-1431 (2011)
Labeling Dynamic XML Trees.
E. Cohen, H. Kaplan, T. Milo. SIAM J. Comput. 39(5): 2048-2074 (2010)
Composable, Scalable, and Accurate Weight Summarization of Unaggregated Data Sets.
E. Cohen, N. Duffield, H. Kaplan, C. Lund, M. Thorup.
PVLDB 2(1): 431-442 (2009)
Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments.
E. Cohen, H. Kaplan, S. Sen.
PVLDB 2(1): 646-657 (2009)
Processing top-k queries from samples.
E. Cohen, N. Grossaug, H. Kaplan.
Computer Networks. 52(14):2605-2622, 2008.
Tighter estimation using bottom-k sketches.
E. Cohen, H. Kaplan. PVLDB . 1(1):213-229, 2008.
Associative Search in Peer to Peer Networks: Harnessing Latent Semantics.
E. Cohen, A. Fiat, and H. Kaplan. Computer Networks 51(8):1861--1881, 2007.
Spatially-Decaying Aggregation over a Network.
E. Cohen and H. Kaplan. JCSS. 73(3):265-288, 2007. Special issue of selected SIGMOD 2004 papers.
Making Routing Robust to Changing Traffic Demands: Algorithms and Evaluation.
D. Applegate and E. Cohen. IEEE/ACM Transactions on Networking. 16(6):1193-1206, 2006. (IEEE ComSoc 2007 William R. Bennett Prize!)
A Short Walk in The Blogistan
E. Cohen and B. Krishnamurthy.
Computer Networks 50(5):615-630 2006.
Maintaining Time-Decaying Stream Aggregates
E. Cohen and M. Strauss.
J. Algorithms , 59(1):19-36, 2006.
Optimal oblivious routing in polynomial time.
Y. Azar, E. Cohen, A. Fiat, H. Kaplan, and H. Räcke. JCSS 69(3):383-394 2004. Special issue of selected STOC 2003 papers.
Performance Aspects of Distributed Caches Using TTL-based Consistency.
E. Cohen, E. Halperin, and H. Kaplan. Theoretical Computer Science. 331(1):73-96, 2005. Special issue of selected ICALP 2001 papers.
Making Intra-Domain Routing Robust to Changing and Uncertain Traffic Demands: Understanding Fundamental Tradeoffs.
D. Applegate and E. Cohen. Computer Communication Review. 33(4), 2003.
Balanced-replication algorithms for distribution trees.
E. Cohen and H. Kaplan.
SIAM J. Comput. 34(1):227-247, 2005.
Reachability and distance queries via 2-hop labels.
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick.
SIAM J. Comput. 32(5):1338-1355 , 2003.
Predicting and bypassing end-to-end Internet service degradations.
A. Bremler-Barr, E. Cohen, H. Kaplan and Y. Mansour.
IEEE J. on Selected Areas in Communication (JSAC special issue in Internet and WWW Measurement, Mapping, and Modeling). 21(6):961-978, 2003.
Proactive caching of DNS records: Addressing a performance bottleneck.
E. Cohen and H. Kaplan.
Computer Networks 41(6):707-726, 2003.
Connection caching: model and algorithms.
E. Cohen, H. Kaplan, and U. Zwick.
J. Comput. System Sci. 67(1):92-126, 2003.
A Case for Associative Peer to Peer Overlays.
E. Cohen, H. Kaplan, and A. Fiat. Computer Communication Review. 33(1):95-100, 2003.
Replication strategies in unstructured Peer-to-Peer networks.
E. Cohen and S. Shenker. Computer Communication Review. 32(4):177-190, 2002.
Restoration by Path Concatenation: Fast recovery of MPLS paths.
A. Bremler-Barr, Y. Afek, E. Cohen, H. Kaplan, M. Merritt.
Distributed Computing. 15:273-283. 2002. (Special issue of selected PODC '01 papers)
Competitive Analysis of the LRFU Paging Algorithm.
E. Cohen, H. Kaplan, and U. Zwick.
Algorithmica. 33:511-516, 2002.
Prefetching the means for document transfer: a new approach for reducing Web latency.
E. Cohen and H. Kaplan.
Computer Networks. 39(4): 437-455, 2002.
Exploiting Regularities in Web Traffic Patterns for Cache Replacement.
E. Cohen and H. Kaplan.
Algorithmica 33(3):300-334, 2002. (dedicated issue on Internet Algorithmics.)
Refreshment policies for Web content caches.
E. Cohen and H. Kaplan.
Computer Networks, 38(6):795-808, 2002.
Caching documents with varying sizes and fetching costs: an LP-based approach.
E. Cohen and H. Kaplan.
Algorithmica , 32(3):459-466, 2002.
Aging through cascaded caches: performance issues in the distribution of Web content.
E. Cohen and H. Kaplan
Computer Communication Review. 31(4):41-54, 2001.
Finding interesting associations without support pruning.
R. Motwani, E. Cohen, M. Datar, S. Fujiware, A. Gionis, P. Indyk, J. Ullman, and C. Yang.
IEEE Transactions on Knowledge and Data Engineering (special issue), 13:64-78 , 2001.
All pairs small stretch paths.
E. Cohen and U. Zwick.
J. Algorithms , 38:335-353. 2001.
Polylog-time and near-linear work approximation scheme for undirected shortest-paths.
E. Cohen.
J. Assoc. Comput. Mach., 47:132-166, 2000.
Managing TCP Connections under Persistent HTTP.
E. Cohen, H. Kaplan, and J. D. Oldham.
Computer Networks. 31:1709--1723, 1999.
Approximating matrix multiplication for pattern recognition tasks.
E. Cohen and D. Lewis.
J. Alg. 30:211-252, 1999. (Special issue of selected papers from SODA'97)
Structure prediction and computation of sparse matrix products.
E. Cohen.
J. Combinatorial Optimization, 2:307-332, 1999.
Fast algorithms for t-spanners and stretch-t paths.
E. Cohen.
SIAM J. Comput., 28:210--236, 1999.
Improving end-to-end performance of the Web using server volumes and proxy filters.
E. Cohen, B. Krishnamurthy, and J. Rexford
Computer Communication Review. 28(4):241-253. 1998.
Size-estimation framework with applications to transitive closure and reachability.
E. Cohen.
J. Comput. System Sci. 55:441--453, 1997. (Special issue of selected papers from FOCS'94).
Using selective path-doubling for parallel shortest-path computations.
E. Cohen.
J. Alg., 22:30--56, 1997.
Efficient parallel shortest-paths in digraphs with a separator decomposition.
E. Cohen.
J. Alg., 21:331--357, 1996.
Approximate max-flow on small depth networks.
E. Cohen.
SIAM J. Comput., 24:579--597, 1995.
Complexity analysis and algorithms for some flow problems.
E. Cohen and N. Megiddo.
Algorithmica, 11:320--340, 1994.
Improved algorithms for linear inequalities with two variables per inequality.
E. Cohen and N. Megiddo.
SIAM J. Comput., 23:1313--1347, 1994.
New algorithms for generalized network flows.
E. Cohen and N. Megiddo.
Math. Prog., 64:325--336, 1994.
Strongly polynomial time and NC algorithms for detecting cycles in periodic graphs.
E. Cohen and N. Megiddo.
J. Assoc. Comput. Mach., 40:791--830, 1993.
Maximizing concave functions in fixed dimension.
E. Cohen and N. Megiddo.
In P. M. Pardalos, editor, Complexity in Numerical Optimization, pages 74--87. World Scientific, 1993.
Recognizing properties of periodic graphs.
E. Cohen and N. Megiddo.
In P. Gritzmann and Sturmfels B., editors, Applied Geometry and Discrete Math. The "Victor Klee Festschrift", volume 4, pages 135--146. ACM AMS, 1991.
NP-completeness of graph decomposition problems.
E. Cohen and M. Tarsi.
J. Complexity, 7:200--212, 1991.

August, 2015
edith (at) cohenwang (period) com