People of ACM - Anne-Marie Kermarrec

January 5, 2017

Why has there been such an increase in research recently at the intersection of scalability and distributed computing?

Distributed systems used to represent a rather small area of computer science, focused mostly on systems designed to run large (and computing-resource-greedy) applications. Today, however, pretty much every system out there is distributed. This is due to the combination of several factors: the democratization of the Internet, the explosion of mobile devices, and the advent of cloud computing and the software-as-a-service (SaaS) model. Today, pretty much any application requires the ability to access data remotely and perform some form of synchronization, and is therefore widely distributed.

Not only are almost all computer systems now distributed, but they also require more scalability than before. The volume of data out there is exploding, powerful machine learning algorithms assist us in pretty much every daily task, and data analytics is at the core of most businesses today. Clearly, such systems must be able to gracefully handle an ever-increasing amount of data to process, and this requires extreme scalability.

This situation keeps evolving: more and more data is produced every second, along with more and more devices (such as those in the Internet of Things), and there is no reversal of this trend with respect to scalability and distribution. These are important research areas in which there is still a lot of exciting work to be done.

The algorithm employed by Mediego processes big data in real time to create personalization profiles. How is Mediego’s approach to personalization new, and why is instantaneous processing important?

I come from a distributed systems background—more specifically, I have been very active in research on peer-to-peer systems. Now, a peer-to-peer system is inherently scalable in the sense that, because clients also act as servers, the number of servers increases linearly with the number of clients. However, P2P systems also significantly complicate algorithm design: any operation that runs on a P2P system requires each entity to perform part of the computation based on its local data, in such a way that the P2P system as a whole eventually delivers the service. In other words, a well-designed P2P algorithm is scalable by default. This is precisely the novelty behind our algorithms: they were designed with a P2P mindset and are therefore inherently scalable.
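
To illustrate that design principle (a minimal, hypothetical Python sketch, not Mediego's actual algorithm), consider gossip-based averaging, a classic fully decentralized computation: each peer holds only a local value, repeatedly exchanges it with a randomly chosen peer, and both keep the average. With no central coordinator, every peer converges to the global mean, and the work per peer stays constant as the system grows.

```python
import random

def gossip_average(values, rounds=100):
    """Push-pull gossip averaging: each peer holds only its local value,
    yet all peers converge to the global mean with no central server."""
    peers = list(values)  # local state held by each peer
    n = len(peers)
    for _ in range(rounds):
        for i in range(n):
            j = random.randrange(n)            # pick a random partner
            avg = (peers[i] + peers[j]) / 2.0  # pairwise averaging step
            peers[i] = peers[j] = avg          # both adopt the average
    return peers

# Example: 10 peers with arbitrary local values converge toward the mean.
print(gossip_average([float(v) for v in range(10)]))  # all entries near 4.5
```

In a real deployment each peer would run this exchange over the network with a few neighbors rather than in a shared list, but the key property is the same: computation emerges from purely local interactions.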

Why is instantaneous processing so important? Because we are living in an instantaneous world: content is highly dynamic (for example, 350,000 tweets are generated each minute and 300 million photos are uploaded to Facebook every day) and users are more impatient than ever. On average, a Netflix user, even though she is relaxed in movie-watching mode, will only wait 60 seconds to be satisfied before switching platforms. On a standard web page, patience is more on the order of a few seconds. Therefore, any online service today needs to be provided at very low latency. To process a huge volume of data for millions of users within 50 milliseconds, our algorithms need to be scalable as well as instantaneous!

In addition to deftly targeting consumers, what are some other applications in which this approach to personalization infrastructures could be used?

Today, pretty much any provider (content provider, online media, e-commerce, insurance, etc.) faces the same problem: a huge volume of available products or items to display and/or sell, and a tiny space (typically the size of a mobile phone screen) and time slot (typically a few seconds of patience) in which to hook their customers. So it is crucial to make the most of all marketing efforts (on the web, in emails, in direct mail, etc.). Personalization will be a crucial feature for addressing this challenging context in many areas.

Does the climate of computing research and tech startups in France have certain unique opportunities and challenges?

France is a great country for launching a startup. We have a lot of support from local and national organizations, research institutes, universities and the government (tax-wise, there is also a lot of support for startups), and seed money is relatively easy to obtain. So in many ways France is a great place to start a new venture. However, there are a few challenges we have to face in France. In the business-to-business arena, my impression is that French companies do not trust startups as much as they trust big companies. Penetrating the market as a young and small startup can therefore be very challenging. Also, the amount of money raised from venture capital firms is usually much lower here than what seems possible in the US.

I strongly believe that France has a lot of potential for innovation: great creativity, strong technologies, and strong support from the ecosystem. There will be even more opportunities if the research and business worlds form stronger connections.

Anne-Marie Kermarrec is a Research Director at Inria (Institut national de recherche en informatique et en automatique) in Rennes, where she led a research group on large-scale distributed systems from 2006 to 2015.

She is also CEO of Mediego, the startup she co-founded in 2015, which provides online personalized predictive services that directly leverage her recent research. She works on large-scale distributed systems and recommender systems. Her research was pioneering in peer-to-peer (P2P) and gossip-based algorithms, as well as in large-scale personalization infrastructures. Kermarrec was recently named a 2016 ACM Fellow for contributions to large-scale distributed computing.