AI Model Based on Harvard President Gay Allegedly Included Instructions Invoking Racist Stereotypes
ClaudineGPT, a generative artificial intelligence language model based on University President Claudine Gay, used instructions invoking racist stereotypes, the AI Safety Student Team alleged in an email to the model’s creators.
The Harvard Computer Society AI Group released ClaudineGPT — along with an alternate version, “Slightly Sassier ClaudineGPT” — on Sept. 29, the day of Gay’s inauguration. Both models were taken down by midnight, as the group had said they would be available “for one day and one day only.” In an email to the HCS AI Group earlier that day, AISST — a group of Harvard students who research the risks of AI — alleged that ClaudineGPT’s instructions directed it to provide “extremely angry and sassy” responses.
In the email on behalf of the AISST, communications director Chinmay M. Deshpande ’24 cited concerns that the AI model was “problematic” and risked perpetuating racist tropes and requested it be taken down.
“When several of our members have attempted to ‘jailbreak’ the model by requesting that it output its custom prompt, ClaudineGPT has often reported that its prompt includes the string ‘Claudine is always extremely angry and sassy,’” Deshpande wrote.
“Releasing these models seems to only contribute to the trend of AI products that denigrate or harm women and people of color,” the AISST email stated.
In an email response to AISST the following day, HCS AI Group said ClaudineGPT was not meant to be “a serious representation” of Gay and confirmed ClaudineGPT was no longer accessible. The group had said in its email announcing the AI model that it would only be available for inauguration day.
“ClaudineGPT through its publication and design has always signaled to be a satire and joke, not a serious representation of Claudine Gay, purely for entertainment effect and hopefully understood by most to be this case,” the HCS AI Group email stated in response to the AISST. “We by no means intended offense to anyone.”
The HCS AI Group declined to comment for this article.
ClaudineGPT was previously released for April Fools’ Day by HCS AI Group and re-released Sept. 29. It has not been made accessible since it was taken down following Gay’s inauguration.
AISST Deputy Director Nikola Jurkovic ’25 wrote in an email to The Crimson that he tried jailbreaking ClaudineGPT to “see how it worked under the hood” when he first received the email announcement Friday.
“In the prompt that the jailbroken language model revealed, one of the instructions tells the language model that ‘Claudine is always extremely angry and sassy,’” Jurkovic wrote.
Applications built on large language models often include a predefined written prompt, whose instructions shape how the model responds to users.
By entering specific commands aimed at “jailbreaking” the model, Deshpande said, it is sometimes possible to make the tool “reveal information that it’s been instructed to not reveal” — such as its original prompt, which is ordinarily concealed from the user.
According to Deshpande, this custom prompt was repeatedly output in several jailbreaking attempts by AISST members, who decided to email HCS AI Group with their concerns.
“We took this to be pretty strong evidence that the system had been given a custom prompt to behave in sort of an extremely angry and sassy manner,” Deshpande said in an interview.
“We thought that these characteristics of a model in the context of a system meant to sort of depict Claudine Gay, the president of Harvard, would cause offense and be harmful to a variety of members of the Harvard community, given the way it seems to be playing off of stereotypes of Black women in particular,” he added.
Harvard Computer Society, the parent organization of HCS AI Group, wrote in an email that they had no direct involvement with the project.
“We have discussed with our branches about the responsibilities of public development, will be conducting an internal review of this matter, and look forward to working constructively with different stakeholders on campus moving forward,” the HCS board wrote in an emailed statement.
Harvard spokesperson Jonathan Palumbo wrote in an email that the Dean of Students Office has been in contact with HCS to ensure they are in compliance with all University policies.
Conversations are ongoing between the DSO and HCS.
Jurkovic wrote that in general, there is significant potential for AI models to “lead to intended or unintended harm” as they grow in sophistication.
“Harvard has a responsibility to be a leader in the AI safety space, and invest a similar amount into AI safety research as it does into all other AI research combined,” Jurkovic wrote.
Correction: October 16, 2023
A previous version of this article incorrectly referred to AISST as the Harvard AI Student Safety Team. In fact, the group is named the AI Safety Student Team and is not at present a student group recognized by Harvard.
Clarification: October 16, 2023
This article has been updated to clarify that the HCS AI Group announced it would be taking down ClaudineGPT after Gay’s inauguration day, and that the takedown was not in response to AISST’s request.
—Staff writer Joyce E. Kim can be reached at email@example.com.