Building TRUST in Text: Linguistically Motivated Language Model Detection
Many people are already aware that Large Language Models (LLMs) give wrong answers or misinformation “by accident”. What has recently become clear is that they can also be poisoned by outsiders with the goal of promoting specific agendas. Imagine, for instance, a doctor using a “poisoned” LLM that subtly nudges them towards prescribing one company’s medication over a competitor’s. I hypothesise that subtle linguistic signals in generated text can reveal that a model has been poisoned in this way. My aim is to develop new knowledge and methods that can both detect poisoning and make LLMs more robust and trustworthy. As LLMs are being deployed across society, this is becoming increasingly important.
I’ve had a lifelong interest in languages, which spurred me to start studying linguistics after high school. I chose to study at Stockholm University, which at the time was one of the few places that offered a specialisation in computational linguistics. Wanting to learn more, I landed a PhD position in Natural Language Processing (NLP), just as a methodology known as neural networks started to gain popularity. Today, this is the technology that drives modern artificial intelligence, including LLMs. While many AI researchers overlook linguistic insights, I believe these insights are the key to scientific breakthroughs in NLP. My love for languages is part of what inspires me to always take a multilingual perspective in my work, and in this project that perspective allows us to work on making LLMs safe and secure across language communities worldwide.
Detecting poisoned text in LLMs means tackling a few key hurdles. First, we must assemble realistic examples of tampering – across topics and languages. Next, the linguistic “fingerprints” of an attack are likely incredibly subtle, meaning that we need methods that can pick out small linguistic shifts. Our solutions also have to run fast and at scale, since deployed models serve millions of users – and are already incredibly costly. And, because bad actors will tweak their methods over time, our detection techniques have to keep evolving.
As LLMs are being deployed across society, e.g. in healthcare, education, and finance, the potential for attacks is increasing steadily. My project allows us to stay at the forefront of ensuring that LLMs remain secure, putting us a step ahead of would-be attackers. On the one hand, the project aims to make concrete technological contributions that can ensure secure LLMs, thereby removing one of the obstacles to LLM implementation in society. On the other hand, one of my goals is to inform the public of this relatively unknown risk. While many of us are aware that LLMs can “make mistakes”, few are aware that someone might poison an LLM with the distinct goal of misleading a specific user in a specific way. Awareness of this fact can go a long way towards mitigating risks while we’re building our defences.
I am incredibly honoured to have been awarded the Sapere Aude: DFF-Research Leader grant. It is a validation of the fundamental research I have carried out since completing my PhD, and a solid booster rocket for my continued growth as an independent research leader. In this project, I’ll be able to focus on an entirely new interdisciplinary research direction, together with excellent external partners at Stockholm University and NVIDIA. The two PhD students and the postdoc joining the project will be among the first to explore the emerging field of LLM security with a linguistic twist, which gives them a uniquely strong position for their future careers after the project. Furthermore, the grant contributes significantly to ensuring that the research environment I have started to build up over the past few years can continue to thrive.
Aalborg University
Natural Language Processing (NLP)
Furesø
Hamar Katedralskole
I’m originally from Norway, and I have settled down in Farum with my Danish wife and our three small children. Much of my time is spent navigating the joyful chaos of family life. My love for languages has always been both a personal and professional passion. When not working on AI and language models, I enjoy cooking elaborate meals (as time allows), travelling with my family, and honing my skills in crafting dad jokes for them to enjoy (or not). Balancing research and family life is a challenge, and at the same time a constant source of perspective and inspiration.