date 2021-12-29
tags tech, haskell
Equivalence classes in haskell
how to split a graph, functionally
editor’s note: this is based on some heavily edited notes, originally published on Dec 9th, 2017.
Let’s say you have a graph \( G \) with verticies \( V \), and you wish to calculate and group nodes together based on some equivalence relation \( R \).The original motivation for this was as an optimization pass in a machine-learning compiler: deduplication. Every node in the graph represented a bundle of code that had to be compiled, and many notes were the same (up to α‑equivalence) so deduplication improved compile time. That is:
$$ \forall v \in V. \ E(v) $$ $$ E(v) = \{ e \in V |\ R(v,e) \} $$
A union find structure is capable of representing this fact: the set of nodes is partitioned into unique set of non-overlapping subsets, which fits the above definition precisely.This is only true since \( R \) is an equality relation. However, there’s a quick and easy way to do this in Haskell while remaining pure and simple with almost no dependencies.This pure version is not as efficient as union find. In fact there is no known pure implementation of any union find-like algorithm that matches the asymptotic and practical efficiency of a non-pure implementation.
As a note, when you partition a set \( V \) on an equivalence class \( R \), you get something called the quotient space of \( R \) on \( V \), also written \( V/R \).
In Haskell terms, the above equations can basically be seen like so:
-- | equivalence class of @a@ with respect to some set
classes :: Eq a => a -> [a] -> [a]
classes v xs = filter (== v) xs
-- | calculate every equivalence class for an input list
quotient :: Eq a => [a] -> [[a]]
quotient xs = ... map (\v -> classes v xs) xs ...
However this implementation has an immediate, glaring issue: the quotient
combinator has \( O(n^2) \) complexity for the set of verticies \( V \), as scanning a single vertex then requires you to also scan every other one as well.
This basic problem exists in Haskell already with the nub
combinator: deduplicating a list in anything less than \( O(n^2) \) requires a stronger set of constraints than equality.There are a variety of proposals to add Ord-powered nub to base and other libraries, though you can find various implementations (using Data.Set internally, normally) hanging around. You need an actual concrete ordering among set elements in order to do better.
So solving this without exponential complexity is relatively easy if we have some kind of Ord
constraint on the set of verticies. More specifically there needs to be some total ordering on the set of nodes in the graph; it can be arbitrary but it must be there.In many cases a good old fashioned deriving Ord
is probably good enough!
Given all this, there’s a simple combinator that can wrap all of this up with a bow on top, assuming that you have an Ord
instance:
-- | Construct the set of equivalence classes, AKA the /quotient set/, for a
-- given input list, given an ordering relation between values.
quotientBy
:: (a -> a -> Ordering)
-- ^ The equivalence relation \(R\).
-> [a]
-- ^ The input set \(X\), represented as a simple list.
-> [[a]]
-- ^ The resulting quotient set \(X/R\), defined as a list of lists, where
-- every item in the resulting list is another set of items grouped by \(R\).
quotientBy k
= groupBy (fmap (== EQ) . k)
. sortBy k
-- | Construct the set of equivalence classes for a given list of items, using
-- @'(==)'@ as the equivalence/ordering relation.
quotient :: Ord a => [a] -> [[a]]
quotient = quotientBy compare
Partitioning the graph is simply a case of using quotient
. Of course, I don’t think this combinator is important or prevalent enough to pass the Fairbairn threshold, which would justify its name and home somewhere in the base
libraries. I would however love an improved nub
function!