(Mostly) Perfect Matchings

Disrupting a beloved college tradition with mathematical optimization

MentorMatch

As of late, I've been working on a project called MentorMatch, a service designed to pair mentees to mentors according to a set of preferences. This was primarily motivated by "Matching Day", a semiannual tradition in Greek life in which new inductees are paired with existing members of the organization. Not being a member of a fraternity myself, I asked a friend of mine for more details. Here's what I gathered:

A list of all new members and eligible existing members is made public.
After a series of events where the two groups mingle, each new member generates of list of mentors they like and vice versa.
???
The list of matchings is published, with great pomp and circumstance.

I was told there exists a preference committee elected each year whose sole responsiblity is generating this pairing. Unsurprisingly, the actual matching procedure is deliberately opaque. From what I gathered, the process is not unlike a papal conclave- the committee is gathered into one room, isolated from the rest of the world, and asked not to emerge until everyone has been matched. Given the nature of sealed-door conversations, it is impossible for members to interrogate the decisions made by the committee. And so, like many other engineers, I thought to myself:

"There has to be an easier way of doing this..."

Perfect Matchings

Fortunately, such matching problems have been a well-studied area of applied mathematics for decades. In particular, we're looking for something called a perfect matching, in which every person is assigned exactly one partner who happened to be on their preference list. Unfortunately, this might prove impossible, so we instead set our sights on a less lofty goal: maximizing global happiness¹.

Let's formalize exactly what we're aiming for here. Suppose we have $n$ mentors and $n$ mentees, with each person having an unordered list of preferences². We can create a preference matrix $C$ as follows: $$C_{ij} = \mathbf{1}(\text{mentor } i \text{ prefers mentee } j) + \mathbf{1}(\text{mentee } j \text{ prefers mentor } i)$$ where $\mathbf{1}(x)$ is the indicator function. Our goal is to pick one entry from each row and column of $C$, such that the sum of entries is maximized. If we let $(i, m_i)$ represent the assigned mentee-mentor pairs, this admits the objective function: $$ \argmax_{(m_1 \dots m_n)} \sum_{i=1}^n C_{im_i} \qquad \text{s.t. } m_i \text{ all distinct } $$ or using the language of linear algebra: $$ \argmax_{P} || P \odot C ||_F \qquad \text{s.t. } P \text{ a permutation matrix } $$

Thankfully, there exists an abundance of libraries dedicated towards solving this problem (I used SciPy's linear_sum_assignment). This returns the optimal row/column pairs, from which we can recover our final happiness score by summing the relevant entries of $C_{ij}$ and dividing by the maximum happiness $2n$.

Breaking Ties

Depending on how the preference data shakes out, our matching may not be unique. As a somewhat trivial example, consider an amiable group of mentors and mentees, in which each person prefers all members of the opposite class. Our preference matrix $C$ would contain only twos and all $n!$ possible matchings would attain maximum happiness. The algorithm we use is deterministic (i.e. we always return the same solution for every run, even if multiple exist). Does this matter? Mathematically, it doesn't- all optimal solutions give the same happiness, so just pick one and call it a day. From a psychological perspective, however, I would argue this isn't the case.

Over the course of daily life, we are faced with countless decisions- how to commute to work, what to cook for dinner, what time to go to the gym, and so forth. We weigh the benefits of each option and, using the information we have on hand, eventually come to a conclusion. However, we sometimes are paralyzed by indecision and must seek outside help. For many, the choice is made by random chance, namely the flip of a coin. For picking between two equal outcomes, a coin flip offers an impartial judgement. But, if you're anything like me, you watch the coin turn end over end in midair and think:

"I sure hope it comes up tails..."

We profess that we hold both options as equal, while secretly harboring a preference. It is not until the decision is surrendered to outside forces that this preference is brought to the surface. I strongly suspect that this phenomenon is present in our matching problem. Though there are likely several matchings that are equal according to whatever criteria is used, implicit biases will move the needle, thereby allowing the committee to converge upon the best solution.

Adding Noise

Attempting to model subconscious behavior is a fool's errand, so we will have to make do with a proxy. We suggest the following modification: if multiple optimal solutions exist, randomly select one to return. The idea is that if the user is dissatisfied with a given pairing, the algorithm can be rerolled to get a new matching. This process can be repeated ad infinitum until the user receives a matching he finds agreeable.

Currently, our set of matchings $(M, \prec)$ exhibits a strict partial order, where $m_1 \prec m_2$ if $m_1$ has a lower matching score than $m_2$. Note that we cannot compare two optimal solutions in this framework, which is problematic. We want to create a new ordering $\preceq$ on $M$ that satisfies two criteria:

If $m_1 \prec m_2$, then $m_1 \preceq m_2$
For all $m_1, m_2 \in M$, either $m_1 \preceq m_2$ or $m_2 \preceq m_1$

Essentially, we want a way to compare any two elements, while preserving the ordering of existing pairs. In combinatorics, this process is referred to as a linear extension³. We can accomplish this via augmenting our preference matrix $C$. Specifically, we run our matching algorithm on a new matrix $C'$, defined as: $$ C_{ij}' = C_{ij} + \varepsilon_{ij} $$

where $\varepsilon_{ij}$ are iid random variables drawn from a uniform distribution. To abide by the first constraint, our $\varepsilon_{ij}$ terms should be sufficiently small, as to prevent the noise from overpowering the actual data. It turns out that $\varepsilon_{ij} \sim U(0, \frac{1}{n})$ suffices, which I arrived at after spending a signficant amount of time attempting to reproduce a nondeterministic bug caused by a careless choice of parameter.

To give a concrete example, let's consider a six person match with the following preference matrix: $$ C = \begin{pmatrix} 2 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 0 & 2 \\ \end{pmatrix} $$ which has two optimal solutions: $\{(0,0), (1,1), (2,2)\}$ and $\{(0,1), (1,0), (2, 2)\}$. Now, consider two plausible matrices $C_1$ and $C_2$, obtained by augmenting $C$ with random noise: $$ C_1 = \begin{pmatrix} 2.2563 & 2.0389 & 1.0486\\ 2.2235 & 2.3014 & 1.1923 \\ 1.192 & 0.2541 & 2.0134 \\ \end{pmatrix} \qquad C_2 = \begin{pmatrix} 2.0933 & 2.2486 & 1.2232 \\ 2.3290 & 2.0182 & 1.2924 \\ 1.2123 & 0.0198 & 2.0603 \end{pmatrix} $$

In both cases, the ties disappear, yielding a unique solution. More importantly, the answers differ and correspond to the original two solutions!

Reflection

With the noise functionality settled, our matching algorithm is complete. MentorMatch is the best of both worlds- offering the objectivity of mathematical optimization while appeasing human psychology. As such, I propose the following new-and-improved Matching Day:

A list of all new members and eligible existing members is made public.
After a series of events where the two groups mingle, each new member generates of list of mentors they like and vice versa.
Preference data is uploaded to MentorMatch and an initial optimal matching is generated
The powers that be rerun the algorithm to their heart's content, stopping whenever satisfied
The list of matchings is published and the creator of MentorMatch is thanked profusely for his efforts

which should hopefully streamline the whole process. This has the added benefit of being completely transparent, allowing the committee to use MentorMatch as a shield against any criticism. Unfortunately, MentorMatch also obsoletes the entire committee, save the person responsible for clicking the refresh button. Alas, such a sea change may prove to be mildly unpopular, so I expect my proposal to fall on deaf ears. Nevertheless, MentorMatch was an enjoyable foray into the world of matching problems, so I'll call this valuable experience- even if the project never sees the light of day.

□