Weighted sampling is an extremely useful technique for processing streaming or distributed data. In particular, for computing statistics or aggregates that depend on the frequencies of keys, a sample can be used to efficiently compute estimates without storing the entire table of keys and frequencies in memory. To obtain low-error estimates from a small sample, the sampling method should be weighted and tailored to the specific aggregates of interest. As a rule of thumb, the sampling weight of each key should be proportional to its contribution to the aggregate. For aggregates that sum a function f applied to the frequency of each key (e.g., frequency moments), the sampling weight of a key x with frequency w_x should be f(w_x). A central question, and the focus of this work, is the design of weighted sampling schemes for broad families of functions that can be computed efficiently over streaming or distributed data.

First, we consider the family of concave sublinear functions, which includes x^p for p ≤ 1, ln(1+x), capping functions min{x, T} for a constant T, their compositions, and more. We design a sampling sketch that can be tailored to any concave sublinear function, with near-optimal guarantees on the variance of the estimates. We complement the theoretical analysis with an experimental study, which shows that in practice the variance of our estimates is even lower than the theoretical bound and is in fact close to that of the optimal benchmark.

For some functions of frequencies (for example, x^p for p > 2), hardness results show that, in the streaming setting, weighted sampling according to these functions requires polynomial space. In the second part, we therefore consider two scenarios that go beyond worst-case analysis. In the first scenario, the input is still adversarial, but we additionally assume some noisy knowledge of the input distribution.
In the second scenario, we assume that the input comes from a more restricted class of frequency distributions, one that is better aligned with what is observed in practice. In both scenarios, we show how to leverage these assumptions to sample according to functions that are "hard" on worst-case inputs.

In the final part, we propose a method to post-process weighted samples so that they satisfy element-level differential privacy while retaining the sample's utility to the maximum extent possible. Our method maximizes the reporting probabilities of keys and the estimation quality for a broad family of statistics.
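To make the rule of thumb above concrete, the following is a minimal illustrative sketch (not the scheme developed in this work) of Poisson sampling with inclusion probabilities proportional to f(w_x), for an example concave sublinear function f(w) = min{ln(1+w), T}, together with the standard inverse-probability (Horvitz-Thompson) estimator for the aggregate sum_x f(w_x). The function `f`, the threshold parameter `tau`, and all helper names are our own choices for illustration.

```python
import math
import random


def f(w, T=10.0):
    # Example concave sublinear function: a capped logarithm, min(ln(1+w), T).
    return min(math.log1p(w), T)


def pps_sample(freqs, tau, rng):
    """Poisson sampling: include key x independently with
    probability p_x = min(1, f(w_x)/tau). Stores (frequency,
    inclusion probability) for each sampled key."""
    sample = {}
    for key, w in freqs.items():
        p = min(1.0, f(w) / tau)
        if rng.random() < p:
            sample[key] = (w, p)
    return sample


def estimate_sum_f(sample):
    """Horvitz-Thompson estimate of sum_x f(w_x): each sampled
    key contributes f(w_x) scaled by 1/p_x, making the estimate
    unbiased over the sampling randomness."""
    return sum(f(w) / p for (w, p) in sample.values())
```

The point of matching the inclusion probability to f(w_x) is that each sampled key's scaled contribution f(w_x)/p_x stays bounded by tau, which controls the per-key variance. In the streaming setting the difficulty, addressed by the sketch designed in this work, is that the frequencies w_x are not known up front and cannot all be stored.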