Data Could Decide the Harvard Admissions Trial. Here’s How and Why

The lawsuit and trial brought mountains of once-confidential Harvard admissions data to light. Both the University and SFFA hired statistical experts to analyze the data and testify about their findings in court. So, who’s right?

By Iris M. Lewis and Sanjana L. Narayanan

By Simon S. Sun

Numbers lie at the heart of the Harvard admissions trial.

The trial, which kicked off Oct. 15 in a Boston courthouse and lasted for three weeks, is the product of a four-year-old lawsuit brought by anti-affirmative action advocacy group Students for Fair Admissions. Now that things have wrapped up in court, the judge in the case — Allison D. Burroughs — has a high-stakes question to answer: does Harvard, as SFFA alleges, reject qualified Asian-American applicants in order to achieve illegal racial balancing?

Burroughs is expected to render a verdict in early 2019. As she decides, she’ll ponder thousands of pages of internal Harvard documents as well as the evidence and testimony presented at trial.

And she’ll consider numbers. Lots and lots of numbers.

SFFA paid Duke economics professor Peter S. Arcidiacono to create a model of the College’s admissions process. He claims his model proves Harvard does discriminate against Asian Americans. Harvard, though, paid University of California, Berkeley economics professor David E. Card to create his own model of the admissions process. He claims his model proves the College does not discriminate.

So, who’s right?

Economics professors and statisticians across the country say Arcidiacono and Card built their models in almost the same way. Both used a technique called “regression,” which estimates how an outcome (here, whether an applicant is admitted) relates to several variables at once while holding the others constant. In the trial, these variables represent factors that admissions officers weigh when evaluating Harvard hopefuls.

Still, Card’s and Arcidiacono’s regressions differ on a few meaningful points.

In his analysis, Card included the personal rating — a score that measures applicants’ character — and so-called “ALDC” applicants, meaning students who are athletes, legacies, linked to top Harvard donors, or children of faculty members. Arcidiacono left out both. Card also ran his analyses year-by-year for the six years’ worth of Harvard admissions data made available as part of the trial; Arcidiacono pooled the data together.

The disparities between the models could prove crucial in deciding the fate of the case. The Crimson breaks down what these differences mean — and what you need to know about each model.

THE PROBLEM OF THE PERSONAL RATING

When admissions officers sit down to assign a personal rating, they consider students’ interviews with alumni, application essays, and teacher recommendations. Then they rate applicants on a scale of 1 to 4, with 1 the highest score and 4 the lowest.

Experts on both sides of the trial say Card’s inclusion and Arcidiacono’s exclusion of the personal rating may mark the most important difference between the economists’ models. SFFA supporter and economics professor Michael Keane called the personal rating the “single biggest factor” in proving or disproving alleged discrimination.

Those on Harvard’s side argue that Arcidiacono’s omission of the personal rating makes his model less representative of the admissions process.

Harry J. Holzer ’78, an economics professor and Harvard supporter, said it is a good idea to consider all the variables that admissions officers weigh.

“The whole trick,” Holzer said, “is to find measures that on the one hand are objective, that aren’t themselves contaminated by bias, but at the same time that are fairly complete.”

Tyler J. VanderWeele, a professor at the Harvard School of Public Health, also said he believes Card was correct to include the personal rating. VanderWeele wrote in an email that, given that Harvard’s admissions process has many facets, “it seems the coefficient in Card’s model is the more reasonable one.”

Arcidiacono and SFFA supporters, however, argue that including the personal rating allows Card to obscure bias inherent in Harvard’s admissions process. Asian-American applicants to the College typically earn lower personal ratings than do applicants of other races with similar academic accomplishments, a gap SFFA attributes to prejudice and Harvard attributes to weaker recommendations from high school teachers.

By including the personal rating in his model, Card is treating the personal score as a valid measure of a candidate’s worth — and failing to acknowledge that admissions officers’ biases may have influenced the metric, SFFA supporters say.

“How can we use these subjective personal ratings, which may be biased against Asian applicants, to investigate the biased admissions decisions against Asians?” pro-SFFA economics professor Yingyao Hu asked in an email.

Pro-SFFA economist Matthew Shum agreed.

“The personal rating is itself a function of Asian-American status,” Shum said. “When you include both these things, it’s hard to estimate the effect of Asian-American status directly on admissions.”
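Shum’s point can be seen in a toy simulation. This is a minimal sketch, not either expert’s model: the applicants are synthetic, a plain linear regression stands in for the experts’ own specifications, and race has, by construction, no direct effect on a hypothetical admission score. The one assumption doing the work is that a hypothetical personal rating (a continuous score where higher is better, unlike Harvard’s 1-to-4 scale) runs half a point lower for Asian-American applicants. The code estimates the race coefficient twice, once without the rating and once with it.

```python
import random

def ols(rows, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for r, yi in zip(rows, y):
        for i in range(k):
            xty[i] += r[i] * yi
            for j in range(k):
                xtx[i][j] += r[i] * r[j]
    # Gaussian elimination with partial pivoting on the augmented matrix [X'X | X'y].
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, k):
            f = a[row][col] / a[col][col]
            for c in range(col, k + 1):
                a[row][c] -= f * a[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

def simulate(n=20000, seed=0, rating_gap=-0.5):
    """Synthetic applicants. Every number here is hypothetical, chosen for illustration."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        asian = 1.0 if rng.random() < 0.5 else 0.0
        academic = rng.gauss(0, 1)
        # Assumption: the personal rating runs lower for one group, for whatever reason.
        personal = rating_gap * asian + rng.gauss(0, 1)
        # By construction, race enters the score ONLY through the personal rating.
        score = academic + personal + rng.gauss(0, 1)
        data.append((asian, academic, personal, score))
    return data

data = simulate()
y = [d[3] for d in data]
# Columns: intercept, race indicator, academic score (and personal rating in the second run).
without_rating = ols([(1.0, d[0], d[1]) for d in data], y)
with_rating = ols([(1.0, d[0], d[1], d[2]) for d in data], y)
print(f"race coefficient, rating left out: {without_rating[1]:+.2f}")
print(f"race coefficient, rating included: {with_rating[1]:+.2f}")
```

Left out, the rating gap shows up in the race coefficient at roughly -0.5; included, the rating absorbs it and the race coefficient drops to roughly zero. Neither run is wrong as arithmetic. Which number answers the court’s question depends on whether the rating gap reflects real differences between applicants or bias in the rating itself.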

The problem of the personal rating, economists on both sides agree, boils down to burden of proof. Who has to show what?

Harvard says SFFA needs to prove that the disparity in ratings assigned to applicants of different races stems from intentional discrimination. SFFA, though, says Harvard needs to prove that this discrimination does not exist.

Pro-Harvard economist Jesse Rothstein framed the issue as more of a “legal question” than a “statistical” one.

“I think if you’re trying to allege that there’s discrimination, you have to do more than just show differences in outcomes,” Rothstein said. “You have to go further than that.”

Keane, who supports SFFA, made the opposite point.

“You’re making quite a dramatic claim, if you’re saying that Asians have systematically inferior personal qualities to the other groups,” Keane said. “And to convince me that’s true, you’re gonna have to offer me some very strong evidence. Because my prior is that people are basically equal.”

Near the end of the trial, Burroughs offered her own assessment of the different variables populating Card’s and Arcidiacono’s models. She compared the factors added in and left out to an ever-changing “a la carte menu.”

Card, standing in the courthouse, laughed.

“You start with my recipe,” he told Burroughs. “Then [in Arcidiacono’s] you take out some ingredients.”

ALDC APPLICANTS AND A YEAR-BY-YEAR BREAKDOWN

The personal rating is not the only difference between Card’s and Arcidiacono’s analyses.

Card’s inclusion of ALDC applicants in his model has proven controversial, too. Applicants in these categories are admitted at far higher rates than applicants who fall into none of them.

Pro-Harvard economics professor Cecilia E. Rouse said it does not make sense to omit ALDC applicants from any model of the Harvard admissions system.

“The question is, why would one do that, if they are competing and are part of the same admission?” Rouse said.

But Hanming Fang, another economist supporting SFFA, had an answer for Rouse: he pointed to ALDC applicants’ dramatically higher acceptance rate as proof that they go through a fundamentally different admissions process.

“Pooling two applicant pools with different admission processes is not a sound modeling choice in my view,” Fang wrote in an email.
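Fang’s objection can be illustrated with a little arithmetic. In a simplified linear model (not either expert’s specification) that gives each applicant pool its own baseline admission rate but one shared race effect, the estimated race effect works out to a weighted average of the gaps inside each pool, weighted by each pool’s size times the share of applicants in the group times the share outside it. The numbers below are hypothetical, not from the trial: 9,000 non-ALDC and 1,000 ALDC applicants, a quarter of each pool in the group of interest, a one-point admission gap among non-ALDC applicants and none among ALDC applicants.

```python
def pooled_gap(pools):
    """Race coefficient from a linear probability model with one dummy per pool.

    Each pool is (n, share_in_group, admit_rate_in_group, admit_rate_others).
    With a separate baseline per pool and a binary group indicator, the OLS
    coefficient equals the within-pool gaps averaged with weights
    n * share * (1 - share).
    """
    num = den = 0.0
    for n, share, rate_group, rate_others in pools:
        weight = n * share * (1 - share)
        num += weight * (rate_group - rate_others)
        den += weight
    return num / den

# Hypothetical pools, chosen only for illustration.
non_aldc = (9000, 0.25, 0.04, 0.05)  # one-point gap
aldc = (1000, 0.25, 0.40, 0.40)      # no gap

alone = pooled_gap([non_aldc])
together = pooled_gap([non_aldc, aldc])
print(f"non-ALDC applicants only: {alone:+.4f}")
print(f"both pools together:      {together:+.4f}")
```

Pooling pulls the estimate from -0.0100 to -0.0090, a figure that describes neither pool exactly. Whether that blending distorts the picture, as Fang argues, or faithfully reflects applicants competing for the same seats, as Rouse argues, is the disagreement itself.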

Card also chose to run his model year-by-year. Arcidiacono, though, pooled the data.

Experts were again divided over the merits of each approach. Harvard supporters argued that, in the real world, admissions cycles run year by year, so Card’s model is the better approximation. SFFA supporters, by contrast, claimed that combining the years strengthens the statistical analysis by giving it more data to work with.
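The statistical case for pooling comes down to sample size. As a rough sketch with hypothetical inputs (5,000 applicants per group per year and a 6 percent admission rate, not figures from the trial), the standard error of a simple difference in admission rates shrinks with the square root of the number of applicants, so six equal years of data narrow it by a factor of the square root of six.

```python
import math

def se_of_difference(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical inputs, not from the trial record:
# 5,000 applicants in each of two groups per year, each admitted at 6 percent.
PER_YEAR, RATE, YEARS = 5000, 0.06, 6

one_year = se_of_difference(RATE, PER_YEAR, RATE, PER_YEAR)
pooled = se_of_difference(RATE, YEARS * PER_YEAR, RATE, YEARS * PER_YEAR)

# A 95 percent margin of error is about 1.96 standard errors.
print(f"one year:   +/- {1.96 * one_year:.4f}")
print(f"six pooled: +/- {1.96 * pooled:.4f}")
print(f"ratio: {one_year / pooled:.3f}")  # prints ratio: 2.449, i.e. sqrt(6)
```

The gain assumes the effect being measured is the same every year. If it shifts from one cycle to the next, pooling averages over those shifts, which is one reason to prefer year-by-year estimates.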

Analysts said it is unclear how much these smaller differences between the two economists’ models affected their conclusions.

WHY BOTH MODELS COULD BE WRONG

Both paid experts may be wrong.

Some outside statisticians said Arcidiacono and Card may have oversimplified the realities of the Harvard admissions process.

Statistics professor Jun S. Liu argued that the regression analysis both experts performed is not sophisticated enough to establish whether a variable such as race causally affects an applicant’s likelihood of admission to Harvard.

Liu wrote in an email that, though he has not studied the analyses in depth, it seems to him that the regression technique is “fundamentally flawed.”

“A more appropriate causal inference analysis needs to be done to make any statistically convincing statement/judgement,” he wrote.

VanderWeele also recommended employing more nuanced methods. He suggested relying on machine learning, a form of artificial intelligence in which computers build and refine a statistical model as they “train” on more data, and on cross-validation, which involves splitting data into subsets, fitting a model on some of them, and testing how well it predicts the rest.
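Cross-validation in its simplest form fits in a few lines. The example below is a hypothetical illustration, not a method either expert or VanderWeele used: it builds synthetic data in which an outcome `y` shifts only when two made-up binary factors, `group` and `z`, are both 1, then uses 5-fold cross-validation (fit on four-fifths of the data, score predictions on the held-out fifth, rotate) to compare a simpler model that predicts from `z` alone with a richer one that allows for every combination of `group` and `z`.

```python
import random
from collections import defaultdict

def fit_cell_means(rows, key):
    """A simple 'model': predict y as the average y within each cell key(row)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[key(r)] += r["y"]
        counts[key(r)] += 1
    overall = sum(r["y"] for r in rows) / len(rows)
    means = {k: sums[k] / counts[k] for k in sums}
    # Fall back to the overall mean for a cell never seen in training.
    return lambda r: means.get(key(r), overall)

def k_fold_mse(rows, key, k=5, seed=0):
    """Average squared prediction error on held-out folds (k-fold cross-validation)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total, count = 0.0, 0
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        predict = fit_cell_means(train, key)
        for i in fold:
            total += (rows[i]["y"] - predict(rows[i])) ** 2
            count += 1
    return total / count

def simulate(n=2000, seed=1):
    """Synthetic data; all variables are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group, z = rng.randint(0, 1), rng.randint(0, 1)
        # The outcome shifts only when BOTH factors are 1 (an interaction).
        y = 1.0 * group * z + rng.gauss(0, 0.5)
        rows.append({"group": group, "z": z, "y": y})
    return rows

rows = simulate()
mse_simple = k_fold_mse(rows, key=lambda r: r["z"])
mse_richer = k_fold_mse(rows, key=lambda r: (r["group"], r["z"]))
print(f"held-out error, z alone:         {mse_simple:.3f}")
print(f"held-out error, group-by-z cells: {mse_richer:.3f}")
```

The richer model has the lower held-out error, so cross-validation would pick it. That is the sense in which such methods let the data choose the “set of interactions”; they do not decide which covariates belong in the model in the first place, which is why VanderWeele suggested the two sides first agree on those.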

He said both approaches can be completed using software that is readily available and “easy-to-use.”

“There are methods now that estimate differences, standardized by a given set of covariates, but that use machine learning and cross-validation to choose the appropriate functional form and set of interactions,” VanderWeele said. “I would be inclined to think the most unbiased analysis would be one in which both analysts agreed on the set of covariates and then used this type of modeling approach.”

‘IT CAN’T TELL A STORY’

As Burroughs inches toward her final decision, she faces a perhaps intractable problem. According to multiple economists, neither Arcidiacono’s nor Card’s model is clearly better than the other.

“The one thing that’s important for you to understand is that you can’t prove things with statistics,” Keane said. “It’s not like doing pure math, where you just get the right answer. There’s not really a right answer.”

Whether or not Harvard intentionally discriminates against Asian-American applicants is a question statistics has largely failed to answer. So come April, Burroughs will ultimately be forced to make what several economists dubbed a “judgment call.”

“What statistics is doing, basically, is just looking for relationships in the data,” Shum said. “It can’t tell a story.”

— Staff writer Iris M. Lewis can be reached at iris.lewis@thecrimson.com

— Staff writer Sanjana L. Narayanan can be reached at sanjana.narayanan@thecrimson.com