Generalized Hypergraph Cuts: from Theoretical Foundations to Applications
Nate Veldt (Cornell University)
Networks provide an excellent way to model systems of interconnected data with pairwise relationships. However, there has been a growing realization that many complex datasets and systems in the real world are better characterized by multiway relationships. For example, social interactions often occur in groups, consumers purchase multiple products during a shopping trip, and chemical interactions typically involve more than two molecules. This realization has led to a surge of interest in algorithms for mining information from network data with generalized mathematical structures that encode higher-order relationships. One of the most flexible models for higher-order relationships is a hypergraph, which can encode multiway relationships of arbitrary size.
This talk will present our recent innovations in models and algorithms for common data science applications – including community detection, partitioning, sparsification, and semi-supervised learning – involving hypergraphs: specifically, hypergraph cut problems. We introduce a very general notion of a hypergraph cut function motivated by data science applications, one that hasn’t yet been considered despite decades of research into hypergraphs, and then consider algorithms and hardness results for minimizing special sub-classes of this function in practice. We apply these results to explore and tease out new insights from various real-world datasets, ranging from large retail product datasets to food webs. Scalability in terms of hyperedge size and number of hyperedges is often an issue with hypergraph-based generalizations and we show localization techniques enable us to scale to data with millions of hyperedges of large size.