Algorithmic Challenges of Big Data
Much of the recent innovation in the sciences is driven by our ability to perform complex data analysis tasks. Today, many data sets are no longer small samples generated in a lab but are pooled together from sources all over the world. This raises the issue of scalability: How do we store the data? How do we extract information from it? How do we exchange that information? Such questions have been a strong focus of computer science and machine learning. And while this very active field has many success stories, we have likely only scratched the surface. Many classic problems remain unsolved, while new issues such as algorithmic fairness, privacy, and determinism have only recently received attention. This project deals with the challenges of performing modern data analysis at scale.
I have always liked math, and as a child I loved Asimov's Foundation series. I did not consciously enter this field because of either, but there must be a connection, because I now work in a discipline that is all about developing mathematical models and tools for analyzing human behavior. In other words, I get to be part of making the Foundation a reality. Don't bother with the TV series.
Some of the problems I look at are just really hard. It takes a mixture of perseverance, wits, and a bit of luck to slowly chip away at them. Whether the mathematical tools we need already exist or whether new insights are required remains to be seen. For other problems, such as those arising from ethical concerns, we face a different challenge: we do not really know which problems to look at. For example, almost everyone will agree that "fairness" in an algorithm sounds desirable, but it is much less clear what the exact fairness notion should be. Here the challenge is not just the usual difficulty of solving a tough problem, but understanding and synthesizing ideas from very different fields about how this line of research should proceed.
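To make the ambiguity concrete, here is a minimal sketch in Python, with hypothetical toy data I made up for illustration, of two common formal notions, demographic parity and equal opportunity. On the very same predictions, one can hold while the other is violated.

# Hypothetical toy data: group membership, true labels, classifier decisions.
group  = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

def rate(values):
    return sum(values) / len(values)

# Demographic parity: both groups receive positive decisions at the same rate.
pos_rate = [rate([p for g, p in zip(group, y_pred) if g == a]) for a in (0, 1)]

# Equal opportunity: among the truly positive individuals, both groups have
# the same chance of receiving a positive decision.
tpr = [rate([p for g, t, p in zip(group, y_true, y_pred) if g == a and t == 1])
       for a in (0, 1)]

print("positive rate per group:", pos_rate)  # [0.5, 0.5]    -> parity holds
print("true positive rate per group:", tpr)  # [1.0, 0.667]  -> violated

Which of these notions (or of the many others that have been proposed) is the right one depends on the application, and some of them are provably impossible to satisfy simultaneously.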
To a degree, theoretical work on algorithms is a trickle-down science. Luckily for me, this trickling has already largely taken place. The importance of data analysis is not going anywhere but up, and some of the algorithms we are trying to understand better are already widely used, so improving them, or even just understanding them better, has immediate impact. In the long run, I hope that our work on making data analysis scalable will become a standard component of every machine learning library. Before I retire, I'd really like someone to find a provably optimal coreset algorithm for Euclidean k-means. I'd like that someone to be me, but at this point, I'm no longer picky.
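For the curious: a coreset is a small weighted subset of the data on which the k-means cost of any candidate set of centers approximates the cost on the full data set. Below is a minimal sketch, in Python with NumPy, of one simple published construction, the lightweight coreset sampling of Bachem, Lucic, and Krause (KDD 2018); it illustrates the concept and is not the optimal algorithm alluded to above.

import numpy as np

def lightweight_coreset(X, m, rng=np.random.default_rng(0)):
    # Sample m points with probability half uniform, half proportional to
    # squared distance from the overall mean, then reweight so the weighted
    # cost of the sample is an unbiased estimate of the full cost.
    n = len(X)
    sq_dists = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    q = 0.5 / n + 0.5 * sq_dists / sq_dists.sum()
    idx = rng.choice(n, size=m, replace=True, p=q)
    return X[idx], 1.0 / (m * q[idx])

def kmeans_cost(X, centers, weights=None):
    # (Weighted) sum of squared distances to the nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return (d2 if weights is None else weights * d2).sum()

# Toy check: the coreset cost tracks the full cost for fixed centers.
X = np.random.default_rng(1).normal(size=(10000, 2))
C, w = lightweight_coreset(X, m=500)
centers = np.array([[1.0, 1.0], [-1.0, -1.0]])
print(kmeans_cost(X, centers), kmeans_cost(C, centers, w))

Any k-means solver can then be run on the 500 weighted points instead of the 10,000 originals; the guarantee here is a small multiplicative error plus an additive term, the slightly weaker notion that makes this particular construction so simple.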
In the short to mid-term, Sapere Aude will allow me to attract brilliant students and postdocs who will help me solve the most tantalizing open problems in big data analysis. In the long run, I see Sapere Aude as laying the foundation of a strong machine learning group at Aarhus University.
I grew up in the States before spending the rest of my childhood in Germany. After my PhD, I spent most of my time at Sapienza University in Rome, which was a marvelous experience. Even so, I was very happy to come to Aarhus when the opportunity presented itself, and here I am now. Aside from algorithms, I really like running, Italian food, the NBA, and heavy metal.
Aarhus University
Algorithms and machine learning
Aarhus
Friedrich Harkort Schule (Germany)