A continuing mystery in the empirical success of deep neural networks is their ability to achieve zero training error and yet generalize well, even when the training data is noisy and there are many more parameters than data points. Following the information-theoretic tradition of seeking understanding, this talk will share our three-part approach to shedding light on this phenomenon. First, following the tradition of such distilled toy models as the BSC and AWGN channels, the Gaussian source, or scalar linear control systems, we zoom in on the classical linear regression problem in the underdetermined setting with more parameters than training data. Here, the solutions that minimize training error interpolate the data, including the noise. Second, following the tradition of converse bounds, we give a genie-aided bound on how well such interpolative solutions can generalize to fresh test data, and show that this bound generically decays to zero with the number of extra features, thus characterizing an explicit benefit of overparameterization. Third, we discuss what it takes to achieve such harmless interpolation in appropriately overparameterized limits. For appropriately sparse linear models, we provide a hybrid interpolating scheme (combining classical sparse recovery with harmless noise-fitting) that achieves generalization error close to the bound for interpolative solutions. Along the way, we highlight two key concepts, "signal bleed" and "crab-pot regularization," that help explain what is required to achieve harmless interpolation in general.
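The underdetermined setting described above can be sketched in a few lines of NumPy: with more features than data points, the minimum-l2-norm solution fits the noisy training labels exactly, and its behavior on fresh data is what the talk's bounds characterize. This is only an illustration of the setup (the dimensions, noise level, and sparse ground truth below are made-up values), not the authors' hybrid scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 20, 200          # n training points, d >> n features (overparameterized)
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[0] = 1.0          # illustrative sparse ground truth: one active feature
y = X @ w_true + 0.5 * rng.standard_normal(n)   # noisy labels

# Minimum-l2-norm interpolator: w = pinv(X) @ y fits every training point,
# noise included.
w_hat = np.linalg.pinv(X) @ y
train_err = np.mean((X @ w_hat - y) ** 2)

# Fresh test data measures whether interpolating the noise was harmless.
X_test = rng.standard_normal((1000, d))
y_test = X_test @ w_true
test_err = np.mean((X_test @ w_hat - y_test) ** 2)

print(f"train MSE: {train_err:.2e}")   # essentially zero: interpolation
print(f"test  MSE: {test_err:.3f}")
```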
We first describe successive cancellation (SC) decoding on a binary-input symmetric memoryless channel using the Plotkin (u, u+v) construction. Here we also briefly address the complexity and performance characteristics of SC decoding and turn to its polarization properties. Any synthetic channel is then represented as some ensemble {qi, pi} of binary symmetric channels BSC(pi) that occur in the decoding process with different probabilities qi and have different crossover probabilities pi. We then map each BSC(pi) onto a planar unit vector (sin ai, cos ai), where sin ai = 2√(pi(1−pi)). The polarization process then becomes a pairwise angle-modifying transformation in the ensemble {qi, ai} of unit vectors that occur with different probabilities qi and have different angles ai. In the upgrading process, two ordered vectors that have angles a and b produce a unit vector that has a smaller angle arcsin(sin a sin b). Similarly, a pairwise degrading of two BSC channels yields a unit vector with a larger angle arccos(cos a cos b). We study the properties of the angles ai and estimate how they change in the upgrading-degrading process. In particular, any two input angles produce two output angles that have a larger mutual separation. We also study some other quantities associated with pairwise angle transformations. As a result, the angles polarize and tend to 0 or π/2 with probability 1. This also proves the polarizing behavior of synthetic channels.
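A minimal Monte Carlo sketch of the angle dynamics is below. It simplifies the full ensemble evolution by pairing each channel with an independent identical copy at every step, choosing the upgrading map arcsin(sin a · sin b) or the degrading map arccos(cos a · cos b) with equal probability; the starting crossover probability and step count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Start from BSC(p): angle a with sin(a) = 2*sqrt(p*(1-p)).
p = 0.11
a0 = np.arcsin(2 * np.sqrt(p * (1 - p)))

trials, steps = 5000, 40
a = np.full(trials, a0)
for _ in range(steps):
    up = rng.random(trials) < 0.5
    a = np.where(up,
                 np.arcsin(np.sin(a) ** 2),   # upgrading: smaller angle
                 np.arccos(np.cos(a) ** 2))   # degrading: larger angle

# Polarization: almost every trajectory ends near 0 or pi/2.
frac_extreme = np.mean((a < 0.1) | (a > np.pi / 2 - 0.1))
print(f"fraction polarized after {steps} steps: {frac_extreme:.3f}")
```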
An all-wireless vision connecting billions of devices is finally in sight with the Internet-of-Everything paradigm. This vision entails large networks with dynamic connectivity, ad hoc formation and heterogeneous nodes. Central to integrating all our lives into this massive virtual domain are the security and privacy of the information that flows through it. Current wireless systems treat information security as an add-on to the design and rely on application-layer protocols, which have worked well for the scale and the resources of the systems to date. Going forward, however, with the massive-scale formation of networks of asymmetric resources, these protocols, which involve key exchanges and shared randomness for security, may prove to be less than practical.
Securing information at the foundation of system design can alleviate these issues by replacing or strengthening present cryptographic solutions. This foundational design approach brings us to information theoretic security for the all-connected world. In this talk, we will provide an overview of this approach, which relies on local randomness and produces information theoretic security guarantees, e.g., for confidentiality and authentication, by utilizing the properties of the transmission medium. We will review the insights that have emerged when information security is included as a design primitive and survey state-of-the-art directions towards realizing the potential of this approach. We will also introduce models where this approach can be integrated with encryption for nodes with local memory storage.
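A classical example of a confidentiality guarantee derived from the transmission medium is the degraded Gaussian wiretap channel, whose secrecy capacity is the clipped difference of the two channel capacities, Cs = [½log2(1+SNR_main) − ½log2(1+SNR_eve)]^+ (Leung-Yan-Cheong and Hellman). The SNR values below are arbitrary illustrations; the talk's scope is of course much broader than this single formula.

```python
import numpy as np

def secrecy_capacity(snr_main, snr_eve):
    """Secrecy capacity (bits/channel use) of the degraded Gaussian wiretap
    channel: positive part of the capacity gap between the legitimate
    receiver's channel and the eavesdropper's channel."""
    cs = 0.5 * np.log2(1 + snr_main) - 0.5 * np.log2(1 + snr_eve)
    return max(cs, 0.0)

print(secrecy_capacity(15.0, 3.0))  # legitimate channel stronger: 1.0 bit/use
print(secrecy_capacity(3.0, 15.0))  # eavesdropper stronger: 0.0
```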
We will overview distributed optimization algorithms, starting with the basic underlying idea illustrated on a prototype problem in machine learning. In particular, we will focus on the convex minimization problem where the objective function is given as the sum of convex functions, each of which is known only to one agent in a network. The agents communicate over the network with the task of jointly determining a minimum of the sum of their objective functions. The communication network can vary over time, which is modeled through a sequence of graphs over a static set of nodes (representing the agents in the system). In this setting, we will discuss distributed first-order methods that make use of an agreement protocol, a mechanism that replaces the role of a central coordinator. We will discuss some refinements of the basic method and conclude with more recent developments of fast methods that can match the performance of centralized methods (up to a logarithmic factor).
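A minimal sketch of the basic method described above is decentralized gradient descent on a fixed 4-agent ring (the talk also covers time-varying graphs): each agent averages its neighbors' iterates through a doubly stochastic mixing matrix (the agreement step) and then takes a local gradient step. The quadratic objectives and the Metropolis-style weights below are illustrative choices.

```python
import numpy as np

# Agent i privately holds f_i(x) = 0.5*(x - b[i])^2, so the network-wide
# objective sum_i f_i is minimized at x* = mean(b). (Illustrative values.)
b = np.array([1.0, 3.0, 5.0, 7.0])

# Doubly stochastic mixing matrix for a 4-cycle: the agreement protocol.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

x = np.zeros(4)        # each agent's local iterate
alpha = 0.05           # constant step size
for _ in range(2000):
    grad = x - b                 # each agent's local gradient
    x = W @ x - alpha * grad     # agreement step + local gradient step

# Iterates land near the global minimizer mean(b) = 4.0; a constant step
# size leaves a small residual disagreement of order alpha.
print(x)
```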
Hypergraph partitioning is an important problem in machine learning, computer vision, VLSI design and network analytics. A widely used method for hypergraph partitioning relies on minimizing a normalized sum of the costs of placing hyperedges across clusters. Algorithmic solutions for this approach assume that all partitions of a hyperedge incur the same cost. However, this assumption fails to address settings in which different subsets of vertices in the same hyperedge provide different contributions to the underlying higher-order relations. To accommodate nonuniform partitioning costs, we introduce the notions of inhomogeneous spectral hypergraph partitioning and submodular hypergraphs. Inhomogeneous spectral partitioning produces a quadratic approximation to the optimal solution if the costs satisfy submodularity constraints. We illustrate the advantages of inhomogeneous over classical hypergraph partitioning on applications as diverse as structure learning of rankings, subspace segmentation and motif clustering.
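The contrast between uniform and nonuniform hyperedge costs, and the submodularity constraint mentioned above, can be made concrete on a single 3-vertex hyperedge. The cost values below are made up for illustration; the check simply verifies w(S) + w(T) >= w(S∪T) + w(S∩T) for all pairs of boundary subsets.

```python
from itertools import combinations

def subsets(ground):
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

ground = frozenset({0, 1, 2})   # one hyperedge on three vertices

# Homogeneous cost: every proper split of the hyperedge costs the same.
homog = {S: (1.0 if 0 < len(S) < len(ground) else 0.0)
         for S in subsets(ground)}

# Inhomogeneous cost (illustrative values): splitting off vertex 0 is
# cheap, while separating vertices 1 and 2 from each other is expensive.
inhom = {frozenset(): 0.0, ground: 0.0,
         frozenset({0}): 1.0, frozenset({1, 2}): 1.0,
         frozenset({1}): 2.0, frozenset({0, 2}): 2.0,
         frozenset({2}): 2.0, frozenset({0, 1}): 2.0}

def is_submodular(w):
    """Check w(S) + w(T) >= w(S | T) + w(S & T) for all subset pairs."""
    keys = list(w)
    return all(w[S] + w[T] >= w[S | T] + w[S & T] - 1e-12
               for S in keys for T in keys)

print(is_submodular(homog), is_submodular(inhom))  # both satisfy the constraint
print(inhom[frozenset({0})], inhom[frozenset({1})])  # different split costs
```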
The modern era of big data increasingly requires outsourcing of large data sets to remote servers (the cloud). Remote storage gives rise to new challenges such as user privacy and information security in addition to the concerns about data integrity, communication efficiency and computation complexity. The goal of Private Information Retrieval (PIR) is to allow users to efficiently access desired records from remotely stored datasets without revealing to the servers which records are desired. Recent information theoretic advances have led to capacity characterizations of PIR as well as several of its variants, and generated excitement for the future prospects of this rich research avenue. This is especially significant because PIR is a point of convergence of complementary perspectives. Indeed, PIR is the lens through which information theoretic analysis may be applied to a number of interconnected issues -- ranging from practical concerns such as privacy, security, integrity and computation complexity of distributed storage to broader theoretical ideas such as the structure of information and interference alignment. It is well known that PIR shares intimate connections to prominent problems in theoretical computer science and cryptography, communication and information theory, and coding and signal processing. As such, discoveries in PIR have the potential for a ripple effect in their impact on a number of related problems. The talk will highlight such connections, with special focus on privacy, security, locality, cross-subspace alignment and secure distributed matrix multiplication.
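A concrete instance of the retrieval guarantee described above is the classical two-server scheme of Chor, Goldreich, Kushilevitz and Sudan for a replicated one-bit-per-record database: each server sees only a uniformly random subset of indices, so neither learns which record is desired, yet the XOR of the two answers recovers it. The database size and target index below are arbitrary.

```python
import secrets

K = 16
db = [secrets.randbelow(2) for _ in range(K)]   # database, replicated on both servers
i = 9                                           # index the user privately wants

def answer(query):
    """A server XORs together the database bits named in the query set."""
    out = 0
    for j in query:
        out ^= db[j]
    return out

# User: a uniformly random subset S for server 1, and S with index i
# toggled (symmetric difference) for server 2.
S = {j for j in range(K) if secrets.randbelow(2)}
S2 = S ^ {i}

# Each server alone sees a uniformly random subset and learns nothing
# about i; the user XORs the two one-bit answers.
bit = answer(S) ^ answer(S2)
print(bit == db[i])  # True: everything cancels except index i
```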