
A campus investigation finds broad student use of large language models and growing faculty frustration as detection proves unreliable; instructors are relying on rewrites and version histories rather than formal adjudication.
A campus investigation found widespread use of large language models (LLMs) among Harvard students and rising faculty frustration about how to enforce academic standards, according to reporting cited in national coverage. Instructors say routine coursework is increasingly assisted or generated by generative models, creating a practical enforcement problem for courses and degree programs. That dynamic is already reshaping responses to suspected misuse: rather than pursuing formal cases, some faculty are shifting to effort-based remedies and manual checks.
Faculty describe a range of classroom experiences: many students now rely on LLMs for assignments, and some have learned tactics to avoid detection, leaving instructors uncertain how to prove when AI was used. Professors have experimented with technical countermeasures such as embedding hidden text markers in prompts and requiring Google Doc version histories for submitted work to show a student’s drafting process. One instructor’s syllabus included a blunt warning: “If your submission reads like it might be AI work, I’ll have you redo the assignment in its entirety. I am uninterested in proving whether you did or did not use AI.”
Colleagues and observers have leaned on informal heuristics to spot AI-generated writing: earlier chatbot generations were said to overuse em dashes, and critics have pointed to a characteristic ‘evenhandedness’ (a tendency toward balanced, double-sided phrasing) as an AI tell. Those signals have fostered widespread online confidence in so-called “AI-dar,” and many instructors initially relied on such patterns when evaluating student work.
But the same regularities that allowed quick judgments also make those heuristics fragile. Students and other users can prompt LLMs to add stylistic quirks (extra adverbs, longer or more complex sentences, deliberate typos) or instruct models to adopt a more distinctive voice. Platform features and user-facing controls further let people shape an AI’s communication to personal preferences, meaning superficial ‘tells’ can be erased or mimicked with little technical effort.
The practical consequence on campus has been a retreat from formal adjudication in some cases. Several instructors report they have stopped forwarding suspected incidents to the university honor council because there is no reliable way to prove AI use; others say they will require rewrites in a demonstrably personal voice rather than pursue formal sanctions. Enforcement has thus tilted toward subjective judgments, evidentiary checks like document version histories, and requests for in-person demonstrations of work.
For builders and toolmakers, the investigation underscores a technical implication: detection strategies rooted in stylistic heuristics or surface “tells” are likely to be brittle when users can prompt and tune outputs to evade them. The findings point to an ongoing challenge for detection methods, which must contend with easy prompt-based manipulation and platform features that intentionally alter an AI’s voice while still supporting legitimate uses.
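To make that brittleness concrete, here is a minimal Python sketch of a surface-tell scorer and of how easily its signal disappears. Nothing in it comes from the investigation: the phrase list, the threshold of 5 tells per 100 words, and the names `tell_score`, `looks_ai`, and `scrub` are all assumptions chosen for illustration, and `scrub` only imitates what a one-line style instruction to a model might achieve.

```python
import re

EM_DASH = "\u2014"  # the em dash character, written as an escape

# Hypothetical list of "evenhanded" phrases; chosen only for illustration.
BALANCE_PHRASES = ("on the one hand", "on the other hand", "however", "that said")


def tell_score(text: str) -> float:
    """Return surface 'tells' per 100 words (a toy heuristic, not a real detector)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    lowered = text.lower()
    hits = text.count(EM_DASH) + sum(lowered.count(p) for p in BALANCE_PHRASES)
    return 100.0 * hits / len(words)


def looks_ai(text: str, threshold: float = 5.0) -> bool:
    """Flag text whose tell rate meets an assumed, arbitrary threshold."""
    return tell_score(text) >= threshold


def scrub(text: str) -> str:
    """Imitate a simple style instruction: drop em dashes and balancing phrases."""
    out = text.replace(EM_DASH, ", ")
    for phrase in BALANCE_PHRASES:
        out = re.sub(re.escape(phrase) + r",?\s*", "", out, flags=re.IGNORECASE)
    return out


sample = (
    "On the one hand, the policy is strict" + EM_DASH + "however, "
    "on the other hand, it is fair" + EM_DASH + "and widely accepted."
)

print(looks_ai(sample))         # flagged by the toy heuristic
print(looks_ai(scrub(sample)))  # same claims, tells removed, no longer flagged
```

The scrubbed text makes the same claims as the original but scores zero, which is the failure mode described above: a detector keyed to surface style measures the style, not the authorship.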