Johannes Bjerring Bjerva

Research leader

Professor

Project title

Building TRUST in Text: Linguistically Motivated Language Model Detection

What is your project about?

Many people are already aware that Large Language Models (LLMs) give wrong answers or misinformation “by accident”. What has recently become clear is that they can also be poisoned by outsiders with the goal of promoting specific agendas. Imagine, for instance, a doctor using a poisoned LLM that subtly nudges them towards prescribing one company’s medication over a competitor’s. I hypothesise that subtle linguistic signals in generated text can reveal that a model has been poisoned in this way. My aim is to develop new knowledge and methods that can both detect poisoning and make LLMs more robust and trustworthy. As LLMs are being deployed across society, this is becoming increasingly important.

How did you become interested in your particular field of research?

I’ve had a lifelong interest in languages, which spurred me to start studying linguistics after high school. I chose to study at Stockholm University, which at the time was one of the few places that offered a specialisation in computational linguistics. Wanting to learn more, I landed a PhD position in Natural Language Processing (NLP), just as a methodology known as neural networks started to gain popularity. Today, this is the technology that drives modern artificial intelligence, including LLMs. While many AI researchers overlook linguistic insights, I believe these insights are the key to scientific breakthroughs in NLP. My love for languages is part of what inspires me to always take a multilingual perspective in my work. In this project, that perspective allows us to work on making LLMs safe and secure across language communities worldwide.

What are the scientific challenges and perspectives in your project?

Detecting poisoned text in LLMs means tackling a few key hurdles. First, we must assemble realistic examples of tampering – across topics and languages. Next, the linguistic “fingerprints” of an attack are likely incredibly subtle, meaning that we need methods that can pick out small linguistic shifts. Our solutions also have to run fast and at scale, since deployed models serve millions of users – and are already incredibly costly. And, because bad actors will tweak their methods over time, our detection techniques have to keep evolving.

What impact do you estimate your project may have on society in the long term?

As LLMs are being deployed across society, e.g. in healthcare, education, and finance, the potential for attacks is increasing steadily. My project allows us to stay at the forefront of ensuring that LLMs remain secure, putting us a step ahead of would-be attackers. On the one hand, the project aims to make concrete technological contributions that can ensure secure LLMs, thereby removing one of the obstacles to LLM implementation in society. At the same time, one of my goals is to inform the public of this relatively unknown risk. While many of us are aware that LLMs can "make mistakes", few are aware that someone might poison an LLM with the distinct goal of misleading a specific user in a specific way. Awareness of this fact can go a long way towards mitigating risks while we're building our defences.

What impact do you expect the Sapere Aude programme will have on your career as a researcher?

I am incredibly honoured to have been awarded the Sapere Aude: DFF-Research Leader-grant. It is a validation of the fundamental research I have carried out since completing my PhD, and a solid booster rocket for my continued growth as an independent research leader. In this project, I’ll be able to focus on an entirely new interdisciplinary research direction, together with excellent external partners at Stockholm University and NVIDIA. The two PhD students and the postdoc joining the project will be among the first to explore the emerging field of LLM security with a linguistic twist, which gives them a uniquely strong position for their future careers after the project. Furthermore, the grant contributes significantly to ensuring that the research environment I have started to build up over the past few years can continue to thrive.
