Senior researcher at Inria Saclay
Part-time professor at École Polytechnique
4A318, Place Marguerite Perey
F-91120 Palaiseau, France
Born on 1976/10/20, Paris
French Citizen, 2 children

Bio

Since 2023, I have been a senior researcher at Inria Saclay and a part-time professor at École Polytechnique in France. My research focuses on virtualization, operating systems, concurrency, and language runtimes. I am particularly interested in improving the performance, design, and safety of these systems. I lead the Benagil team, a joint team between Inria and Telecom SudParis/IP Paris. I chaired the French chapter of ACM SIGOPS from 2011 to 2014, then served as its treasurer from 2014 to 2016. I received my PhD from UPMC Sorbonne Université in 2005 and my "Habilitation à diriger les recherches", also from UPMC Sorbonne Université, in 2012. From 2006 to 2014, I was an associate professor at UPMC Sorbonne Université in the LIP6 laboratory, and from 2014 to 2023, I was a full professor at Telecom SudParis/IP Paris. In 2005, I performed postdoctoral research at the Université de Grenoble "Joseph Fourier".

Education and experience

2023 – today Senior researcher at Inria Saclay
2023 – today Part-time professor at École Polytechnique
2014 – 2023 Professor at Telecom SudParis
2006 – 2014 Assistant Professor (maître de conférences) at UPMC Sorbonne Université, Regal Team – INRIA/LIP6
Habilitation à diriger les recherches (HdR) in 2012
2005 – 2006 PostDoc at Université Joseph Fourier (Grenoble/France). Adele Team – LSR (today LIG)
2001 – 2005 Ph.D. thesis under the direction of Prof. B. Folliot, Université Pierre et Marie Curie (UPMC). SRC Team – LIP6
2004 – 2005: Teaching assistant (ATER) – UPMC
2001 – 2004: Teaching Assistant (Moniteur) – UPMC
1999 – 2001 Master's degree in Computer Science at UPMC
Magistère d’Informatique Appliquée d’Ile de France (MIAIF)
Maîtrise d'Informatique/DEA Systèmes Informatiques Répartis
1997 – 1999 Bachelor's/First year of master's degree (M1) in Math. Sciences at UPMC
1994 – 1997 Bachelor's degree in Physical Sciences at UPMC

Research

Selected publications

  • J-NVM: Off-heap Persistent Objects in Java. Anatole Lefort, Yohan Pipereau, Kwabena Amponsem, Pierre Sutra and Gaël Thomas. In Proceedings of the Symposium on Operating Systems Principles, SOSP'21, pages 16. 2021.

    This paper presents J-NVM, a framework to access efficiently Non-Volatile Main Memory (NVMM) in Java. J-NVM offers a fully-fledged interface to persist plain Java objects using failure-atomic blocks. This interface relies internally on proxy objects that intermediate direct off-heap access to NVMM. The framework also provides a library of highly-optimized persistent data types that resist reboots and power failures. We evaluate J-NVM by implementing a persistent backend for the Infinispan data store. Our experimental results, obtained with a TPC-B like benchmark and YCSB, show that J-NVM is consistently faster than other approaches at accessing NVMM in Java.
    @inproceedings{sosp:21:lefort:jnvm,
      author = { Lefort, Anatole and Pipereau, Yohan and Amponsem, Kwabena and Sutra, Pierre and Thomas, Gaël },
      title = {J-NVM: Off-heap Persistent Objects in Java},
      booktitle = {Proceedings of the Symposium on Operating Systems Principles, SOSP'21},
      publisher = {ACM},
      year = {2021},
      pages = {16}
    }
  • When eXtended Para-Virtualization (XPV) Meets NUMA. Bao Bui, Djob Mvondo, Boris Teabe, Kevin Jiokeng, Lavoisier Wapet, Alain Tchana, Gaël Thomas, Daniel Hagimont, Gilles Muller and Noel De Palma. In Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'19, pages 15. 2019.

    This paper addresses the problem of efficiently virtualizing NUMA architectures. The major challenge comes from the fact that the hypervisor regularly reconfigures the placement of a virtual machine (VM) over the NUMA topology. However, neither guest operating systems (OSes) nor system runtime libraries (e.g., Hotspot) are designed to consider NUMA topology changes at runtime, leading end user applications to unpredictable performance. This paper presents eXtended Para-Virtualization (XPV), a new principle to efficiently virtualize a NUMA architecture. XPV consists in revisiting the interface between the hypervisor and the guest OS, and between the guest OS and system runtime libraries (SRL) so that they can dynamically take into account NUMA topology changes. The paper presents a methodology for systematically adapting legacy hypervisors, OSes, and SRLs. We have applied our approach with fewer than 2k lines of code in two legacy hypervisors (Xen and KVM), two legacy guest OSes (Linux and FreeBSD), and three legacy SRLs (Hotspot, TCMalloc, and jemalloc). The evaluation results showed that XPV outperforms all existing solutions by up to 304%.
    @inproceedings{eurosys:19:bui:xpv,
      author = { Bui, Bao and Mvondo, Djob and Teabe, Boris and Jiokeng, Kevin and Wapet, Lavoisier and Tchana, Alain and Thomas, Gaël and Hagimont, Daniel and Muller, Gilles and De Palma, Noel },
      title = {When eXtended Para-Virtualization (XPV) Meets NUMA},
      booktitle = {Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'19},
      publisher = {ACM},
      year = {2019},
      pages = {15}
    }
  • An interface to implement NUMA policies in the Xen hypervisor. Gauthier Voron, Gaël Thomas, Vivien Quéma and Pierre Sens. In Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'17, pages 14. 2017.

    While virtualization only introduces a small overhead on machines with few cores, this is not the case on larger ones. Most of the overhead on the latter machines is caused by the Non-Uniform Memory Access (NUMA) architecture they are using. In order to reduce this overhead, this paper shows how NUMA placement heuristics can be implemented inside Xen. With an evaluation of 29 applications on a 48-core machine, we show that the NUMA placement heuristics can multiply the performance of 9 applications by more than 2.
    @inproceedings{eurosys:17:voron:xen-numa,
      author = {Voron, Gauthier and Thomas, Gaël and Quéma, Vivien and Sens, Pierre},
      title = {An interface to implement NUMA policies in the Xen hypervisor},
      booktitle = {Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'17},
      publisher = {ACM},
      year = {2017},
      pages = {14}
    }
  • Fast and Portable Locking for Multicore Architectures. Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. ACM Transactions on Computer Systems (TOCS). Vol. 33(4), pages 13:1-13:62. 2016.

    The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking technique, Remote Core Locking (RCL), that aims to accelerate the execution of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server hardware thread. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock, because such data can typically remain in the server’s cache. Other contributions presented in this article include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX lock acquisitions into RCL locks.

    Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. By using RCL instead of Linux POSIX locks, performance is improved by up to 2.5 times on Memcached, and up to 11.6 times on Berkeley DB with the TPC-C client. On a SPARC machine with two Sun Ultrasparc T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to Solaris POSIX locks on Memcached, and up to 7.9 times on Berkeley DB with the TPC-C client.
    @article{tocs:16:lozi:rcl,
      author = {Lozi, Jean-Pierre and David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Fast and Portable Locking for Multicore Architectures},
      journal = {ACM Transactions on Computer Systems (TOCS)},
      publisher = {ACM},
      year = {2016},
      volume = {33},
      number = {4},
      pages = {13:1--13:62}
    }
  • NumaGiC: a garbage collector for big data on big NUMA machines. Lokesh Gidra, Gaël Thomas, Julien Sopena, Marc Shapiro and Nhan Nguyen. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'15, pages 14. 2015.

    On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160GB to 350GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6x over NAPS (up to 5.4x over Parallel Scavenge).
    @inproceedings{asplos:15:gidra:numagic,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc and Nguyen, Nhan},
      title = {NumaGiC: a garbage collector for big data on big NUMA machines},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'15},
      publisher = {ACM},
      year = {2015},
      pages = {14}
    }
  • Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler. Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14, pages 14. 2014.

    Today, Java is regularly used to implement large multi-threaded server-class applications that use locks to protect access to shared data. However, understanding the impact of locks on the performance of a system is complex, and thus the use of locks can impede the progress of threads on configurations that were not anticipated by the developer, during specific phases of the execution. In this paper, we propose Free Lunch, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock. Free Lunch is designed around a new metric, critical section pressure (CSP), which directly correlates the progress of the threads to each of the locks. Using Free Lunch, we have identified phases of high CSP, which were hidden with other lock profilers, in the distributed Cassandra NoSQL database and in several applications from the DaCapo 9.12, the SPECjvm2008 and the SPECjbb2005 benchmark suites. Our evaluation of Free Lunch shows that its overhead is never greater than 6%, making it suitable for in-vivo use.
    @inproceedings{oopsla:14:david:free-lunch,
      author = {David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler},
      booktitle = {Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14},
      publisher = {ACM},
      year = {2014},
      pages = {14}
    }
  • Faults in Linux 2.6. Nicolas Palix, Gaël Thomas, Suman Saha, Christophe Calvès, Gilles Muller and Julia Lawall. ACM Transactions on Computer Systems (TOCS). Vol. 32(2), pages 4:1-4:40. 2014.

    In August 2011, Linux entered its third decade. Ten years before, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired numerous efforts on improving the reliability of driver code. Today, Linux is used in a wider range of environments, provides a wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality?

    To answer this question, we have transported Chou et al.'s experiments to all versions of Linux 2.6, released between 2003 and 2011. We find that Linux has more than doubled in size during this period, but the number of faults per line of code has been decreasing. Moreover, the fault rate of drivers is now below that of other directories, such as arch. These results can guide further development and research efforts for the decade to come. To allow updating these results as Linux evolves, we define our experimental protocol and make our checkers available.
    @article{tocs:14:palix:faults,
      author = {Palix, Nicolas and Thomas, Gaël and Saha, Suman and Calvès, Christophe and Muller, Gilles and Lawall, Julia},
      title = {Faults in Linux 2.6},
      journal = {ACM Transactions on Computer Systems (TOCS)},
      publisher = {ACM},
      year = {2014},
      volume = {32},
      number = {2},
      pages = {4:1--4:40}
    }
  • A study of the scalability of stop-the-world garbage collectors on multicores. Lokesh Gidra, Gaël Thomas, Julien Sopena and Marc Shapiro. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13, pages 229-240. 2013.

    Large-scale multicore architectures are problematic for garbage collection (GC). In particular, throughput-oriented stop-the-world algorithms demonstrate excellent performance with a small number of cores, but have been shown to degrade badly beyond approximately 20 cores on OpenJDK 7. This negative result raises the question whether the stop-the-world design has intrinsic limitations that would require a radically different approach. Our study suggests that the answer is no, and that there is no compelling scalability reason to discard the existing highly-optimised throughput-oriented GC code on contemporary hardware. This paper studies the default throughput-oriented garbage collector of OpenJDK 7, called Parallel Scavenge. We identify its bottlenecks, and show how to eliminate them using well-established parallel programming techniques. On the SPECjbb2005, SPECjvm2008 and DaCapo 9.12 benchmarks, the improved GC matches the performance of Parallel Scavenge at low core count, but scales well, up to 48 cores.
    @inproceedings{asplos:13:gidra:naps,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc},
      title = {A study of the scalability of stop-the-world garbage collectors on multicores},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13},
      publisher = {ACM},
      year = {2013},
      pages = {229--240}
    }
  • Remote Core Locking: migrating critical-section execution to improve the performance of multithreaded applications. Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, pages 65-76. 2012.

    The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. In this paper, we propose a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the core acquiring the lock because such data can typically remain in the server core's cache.

    We have developed a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX locks into RCL locks. We have evaluated our approach on 18 applications: Memcached, Berkeley DB, the 9 applications of the SPLASH-2 benchmark suite and the 7 applications of the Phoenix2 benchmark suite. 10 of these applications, including Memcached and Berkeley DB, are unable to scale because of locks, and benefit from RCL. Using RCL locks, we get performance improvements of up to 2.6 times with respect to POSIX locks on Memcached, and up to 14 times with respect to Berkeley DB.
    @inproceedings{usenix-atc:12:lozi:rcl,
      author = {Lozi, Jean-Pierre and David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Remote Core Locking: migrating critical-section execution to improve the performance of multithreaded applications},
      booktitle = {Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12},
      publisher = {USENIX Association},
      year = {2012},
      pages = {65--76}
    }
  • Faults in Linux: ten years later. Nicolas Palix, Gaël Thomas, Suman Saha, Christophe Calvès, Julia Lawall and Gilles Muller. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11, pages 305-318. 2011.

    In 2001, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired a number of development and research efforts on improving the reliability of driver code. Today Linux is used in a much wider range of environments, provides a much wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality? Are drivers still a major problem?

    To answer these questions, we have transported the experiments of Chou et al. to Linux versions 2.6.0 to 2.6.33, released between late 2003 and early 2010. We find that Linux has more than doubled in size during this period, but that the number of faults per line of code has been decreasing. And, even though drivers still accounts for a large part of the kernel code and contains the most faults, its fault rate is now below that of other directories, such as arch (HAL) and fs (file systems). These results can guide further development and research efforts. To enable others to continually update these results as Linux evolves, we define our experimental protocol and make our checkers and results available in a public archive.
    @inproceedings{asplos:11:palix:faults,
      author = {Palix, Nicolas and Thomas, Gaël and Saha, Suman and Calvès, Christophe and Lawall, Julia and Muller, Gilles},
      title = {Faults in Linux: ten years later},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11},
      publisher = {ACM},
      year = {2011},
      pages = {305--318}
    }

Habilitation à Diriger les Recherches and PhD thesis

HDR thesis in 2012, Improving the design and the performance of managed runtime environments, with the following committee:

  • Emery Berger, Associate Professor, University of Massachusetts, Amherst (rapporteur)
  • Albert Cohen, Senior researcher, INRIA Saclay
  • Bertil Folliot, Professor, UPMC Sorbonne Université
  • Gilles Muller, Senior researcher, INRIA Rocquencourt
  • Wolfgang Schröder-Preikschat, Professor, Erlangen-Nürnberg University
  • Jan Vitek, Professor, Purdue University (rapporteur)
  • Willy Zwaenepoel, Professor, École Polytechnique Fédérale de Lausanne (rapporteur)

PhD thesis in 2005 under the direction of Professor Bertil Folliot, Applications actives : Construction dynamique d'environnements flexibles homogènes, with the following committee:

  • Bertil Folliot, Professor, UPMC Sorbonne Université
  • Gilles Grimaud, Associate Professor (HDR), Université de Lille
  • Jacques Malenfant, Professor, UPMC Sorbonne Université
  • Gilles Muller, Professor, École des Mines de Nantes (rapporteur)
  • Jean-Bernard Stefani, Senior researcher, INRIA Grenoble (rapporteur)
  • Willy Zwaenepoel, Professor, École Polytechnique Fédérale de Lausanne

Student Supervisions

If you are interested in a PhD, send me a note. You will need to demonstrate real scientific curiosity, and interest in research topics such as concurrent programming, virtualization, operating systems, language runtimes, etc.

Ongoing

Defended

  • Subashiny Tanigassalame (2018-2024). Privagic: confidential computing made practical with secure typing. Currently postdoctoral researcher at Inria (France).

    For more than twenty years, several tools have been proposed to automatically partition an application between a secure memory zone and a non-secure memory zone. These tools analyze the data flow of the application in order to identify the memory locations that may contain sensitive values. Most of these tools behave incorrectly in the presence of pointers. When they are correct, they are unable to handle threads because of the difficulty of tracking pointers in a multi-threaded application. The current tools are also unable to split an application into more than two partitions. This is caused by over-approximation, which leads to memory locations falsely shared between the two partitions.

    In this thesis, instead of starting from data flow analysis, we propose to start from a more accurate technique: language typing. We introduce secure typing, which consists in embedding a partition identifier in the type system of a language. Based on secure typing, we designed a language-agnostic compiler based on LLVM. The compiler takes a legacy application enriched with secure types as input, and generates multiple partitions for Intel SGX. Our evaluation with micro- and macro-applications shows that (i) secure typing can handle pointers, multiple threads and more than two partitions, (ii) adding secure types in a legacy application is easy, (iii) secure typing reduces the trusted computing base, and is more efficient than embedding a full application inside an enclave.

  • Remi Dulong, co-advised with Pascal Felber at 50% (2018-2023). Towards New Memory Paradigms: Integrating Non-Volatile Main Memory and Remote Direct Memory Access in Modern Systems. Currently research engineer at Michelin (Switzerland).

    Modern computers are built around two main parts: their Central Processing Unit (CPU), and their volatile main memory, or Random Access Memory (RAM). The basis of this architecture has its roots in the first computers of the 1970s. Since then, this principle has been constantly upgraded to provide more functionality and performance.

    In this thesis, we study two memory paradigms that drastically change the way we can interact with memory in modern systems: non-volatile memory and remote memory access. We implement software tools that leverage them in order to make them compatible and exploit their performance with concrete applications. We also analyze the impact of the technologies underlying these new memory media, and the perspectives of their evolution in the coming years.

    Because main memory performance is key to unlocking the full potential of a CPU, non-volatility has historically been abandoned in the race for performance. Even if the first computers were designed with non-volatile forms of memory, computer architects started to use volatile RAM for its incomparable performance compared to durable storage, and never questioned this decision for years. However, in 2019 Intel released a new component called Optane DC Persistent Memory (DCPMM), a device that made it possible to use Non-Volatile Main Memory (NVMM). That product, by its capabilities, provides a new way of thinking about data persistence. Yet, it also challenges the hardware architecture used in our current machines and the way we program them.

    With this new form of memory, we implemented NVCACHE, a cache designed for non-volatile memory that helps boost interactions with slower persistent storage media, such as Solid State Drives (SSDs). We find NVCACHE to be quite performant for workloads that require a high granularity of persistence guarantees, while being as easy to use as the traditional POSIX interface. Compared to file systems designed for NVMM, NVCACHE can reach similar or higher throughput when the non-volatile memory is used. In addition, NVCACHE allows the code to exploit NVMM performance while not being limited by the amount of NVMM installed in the machine.

    Another major change in the computing landscape has been the popularity of distributed systems. As individual machines tend to reach performance limitations, using several machines and sharing workloads became the new way to build powerful computers. While this mode of computation allows the software to scale up the number of CPUs used simultaneously, it requires fast interconnection between the computing nodes. For that reason, several communication protocols implemented Remote Direct Memory Access (RDMA), a way to read or write directly into a distant machine’s memory. RDMA provides low latencies and high throughput, bypassing many steps of the traditional network stack.

    However, RDMA remains limited in its native features. For instance, there is no advanced multicast equivalent for the most efficient RDMA functions. Thanks to a programmable switch (the Intel Tofino), we implemented a special mode for RDMA that allows a client to read or write in multiple servers at the same time, with no performance penalty. Our system, called Byp4ss, makes the switch participate in transfers by duplicating RDMA packets. On top of Byp4ss, we implement a consensus protocol named DISMU, which shows the typical use of Byp4ss features and its impact on performance. By design, DISMU is optimal in terms of latency and throughput, as it reduces to the minimum the number of packets exchanged through the network to reach a consensus.

    Finally, by using these two technologies, we notice that future generations of hardware may require a new interface for memories of all kinds, in order to ease the interoperability in systems that tend to get more and more heterogeneous and complex.

  • Yohan Pipereau, co-advised with Mathieu Bacou at 50% (2019-2023). Improving memory usage in virtual machines. Currently postdoc at Telecom SudParis (France).

    Data centers rely on virtual machines (VMs) to offer isolation between deployments. While the use of VMs enables better resource usage compared to running one service per bare-metal machine, it achieves poorer resource usage than multiprocess solutions. This is caused by two phenomena. At VM allocation time, VMs are submitted as resource requests to a VM scheduler, which performs virtual machine allocations across a set of servers. Finding an optimal solution to this scheduling problem is NP-hard, leading to the adoption of heuristic-based allocation that leaves a good percentage of memory unallocated on each server, known as 'stranded memory'. At VM runtime, VM memory is consumed on demand, and the difference between memory allocation and usage results in a sizable portion of 'allocated unused memory' that is currently impractical to use.

    First, we propose a transparent solution for applications running inside VMs to remotely access stranded memory in remote machines with fine-grained reservation of remote resources. Second, we review current techniques that try to fit allocated memory to used memory. We show that all these techniques are managed by the hypervisor and introduce performance degradation in VMs and, more importantly, high response times, which make resource sharing impractical. Instead, we propose an abstraction to perform VM-initiated memory provisioning and we present early results on fast adaptation of VM memory.

  • Anatole Lefort, co-advised with Pierre Sutra at 50% (2018-2022). A Support for Persistent Memory in Java. Currently postdoc at Technical University of Munich (Germany).

    Recently released non-volatile main memory (NVMM), as fast and durable memory, dramatically increases storage performance over traditional media (SSD, hard disk). A substantial and unique property of NVMM is byte-addressability: complex memory data structures, maintained with regular load/store instructions, can now resist machine power-cycles, software faults or system crashes. However, correctly managing persistence with the fine grain of memory instructions is laborious, with increased risk of compromising data integrity and recovery at any misstep. Programming abstractions from software libraries and support from language runtime and compilers are necessary to avoid memory bugs that are exacerbated with persistence. In this thesis, we address the challenges of supporting persistent memory in managed language environments by introducing J-NVM, a framework to efficiently access NVMM in Java. With J-NVM, we demonstrate how to design an efficient, simple and complete interface to weave NVMM-devised persistence into object-oriented programming, while remaining unobtrusive to the language runtime itself. In detail, J-NVM offers a fully-fledged interface to persist plain Java objects using failure-atomic sections. This interface relies internally on proxy objects that intermediate direct off-heap access to NVMM. The framework also provides a library of highly-optimized persistent data types that resist reboots and power failures. We evaluate J-NVM by implementing a persistent backend for Infinispan, an industrial-grade data store. Our experimental results, obtained with a TPC-B like benchmark and YCSB, show that J-NVM is consistently faster than other approaches at accessing NVMM in Java.

  • Alexis Lescouet, co-advised with Élisabeth Brunet at 50% (2017-2021). Memory management for operating systems and runtimes. Currently engineer at Nutanix (United Kingdom). [Abstract]

    During the last decade, the need for computational power has increased due to the emergence and fast evolution of fields such as data analysis or artificial intelligence. This tendency is also reinforced by the growing number of services and end-user devices. Due to physical constraints, the trend for new hardware has shifted from an increase in processor frequency to an increase in the number of cores per machine.

    This new paradigm requires software to adapt, making the ability to manage such parallelism the cornerstone of many parts of the software stack.

    Directly concerned by this change, operating systems have evolved to include complex rules, each pertaining to different hardware configurations. However, more often than not, resource management units are responsible for one specific resource and make decisions in isolation. Moreover, because of the complexity and fast evolution rate of hardware, operating systems that were not designed to use a generic approach have trouble keeping up. Given the advance of virtualization technology, we propose a new approach to resource management in complex topologies, using virtualization to add a small software layer dedicated to resource placement between the hardware and a standard operating system.

    Similarly, in user space applications, parallelism is an important lever to attain high performance, which is why high performance computing runtimes, such as MPI, are built to increase parallelism in applications. The recent changes in modern architectures combined with fast networks have made overlapping CPU-bound computation and network communication a key part of parallel applications. While some degree of overlap might be attained manually, this is often a complex and error-prone procedure. Our proposal automatically transforms blocking communications into nonblocking ones to increase the overlapping potential. To this end, we use a separate communication thread responsible for handling communications and a memory protection mechanism to track memory accesses in communication buffers. This guarantees both progress for these communications and the largest window during which communication and computation can be processed in parallel.

  • Gauthier Voron, co-advised with Pierre Sens at 70% (2014-2018). Virtualisation efficace d'architectures NUMA. Currently postdoc at EPFL (Switzerland). [Abstract]

    Virtualization technology and the NUMA architecture evolved independently to tackle different issues: reducing hardware usage cost for the former, producing more powerful hardware for the latter. Nonetheless, nowadays, the hardware used in cloud data centers relies on NUMA architectures and thus, virtual machines are executed atop such hardware. Virtualization software has, however, not been designed for NUMA architectures. Because of this poor integration, the applications executed inside a virtual machine running atop a NUMA architecture may have low performance. As the combined use of NUMA architectures and virtualization is relatively recent, because of the emergence of cloud computing, only a few works address this performance issue.

    My PhD thesis addresses the challenge of efficiently virtualizing a NUMA architecture in a cloud infrastructure. In detail, my research is twofold. On the first side, my research has the goal of measuring how virtualization behaves on a NUMA architecture, and how and why a NUMA architecture changes the performance of virtualized applications.

  • Mohamed Said Mosli Bouksiaa, co-advised with François Trahay at 30% (2014-2018). Performance variation considered helpful. Currently engineer at Claap (France). [Abstract]

    Understanding the performance of a multi-threaded application is difficult. The threads interfere when they access the same hardware resource or the same synchronization primitive, which slows down their execution. Unfortunately, current profiling tools report the hardware components or the synchronization primitives that saturate, but they cannot tell if the saturation is the cause of a performance bottleneck.

    In this PhD thesis, I propose a holistic metric able to pinpoint the blocks of code that suffer interference the most, regardless of the interference cause. The metric relies on differential execution, but instead of comparing previously identified inefficient runs with efficient ones, I consider performance variation as a universal indicator of interference problems. With an evaluation of 27 applications, I show that the metric can identify interference problems caused by 6 different kinds of interactions in 9 applications.

  • Lokesh Gidra, co-advised with Marc Shapiro at 70% and Julien Sopena (2011-2015). Garbage Collector for memory intensive applications on NUMA architectures. Currently engineer at Google (US). [Abstract]

    Large-scale multicore architectures create new challenges for garbage collectors (GCs). On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. In this thesis, we address this problem with NumaGiC, a GC with a mostly-distributed design.

    In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism and increase memory access imbalance, by allowing threads to steal from other nodes when they are idle. NumaGiC strives to find a perfect balance between local access, memory access balance, and parallelism.

    In this work, we compare NumaGiC with Parallel Scavenge and some of its incrementally improved variants on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160 GB to 350 GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 94% over Parallel Scavenge, and increases the performance of the collector itself by up to 5.4× over Parallel Scavenge. In terms of scalability of GC throughput with increasing number of NUMA nodes, NumaGiC scales substantially better than Parallel Scavenge for all the applications. In fact, in the case of SPECjbb2005, where inter-node object references are the least among all, NumaGiC scales almost linearly.

  • Florian David, co-advised with Gilles Muller at 50% (2011-2015). Continuous and Efficient Lock Profiling for Java on Multicore Architectures. Currently engineer at Datadog (France). [Abstract]

    Today, Java is regularly used to implement large multithreaded server-class applications that use locks to protect access to shared data. However, understanding the impact of locks on the performance of a system is complex, and thus the use of locks can impede the progress of threads on configurations that were not anticipated by the developer, during specific phases of the execution. In this paper, we propose Free Lunch, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock. Free Lunch is designed around a new metric, critical section pressure (CSP), which directly correlates the progress of the threads to each of the locks. Using Free Lunch, we have identified phases of high CSP, which were hidden with other lock profilers, in the distributed Cassandra NoSQL database and in several applications from the DaCapo 9.12, the SPECjvm2008 and the SPECjbb2005 benchmark suites. Our evaluation of Free Lunch shows that its overhead is never greater than 6%, making it suitable for in-vivo use.

  • Jean-Pierre Lozi, co-advised with Gilles Muller at 50% (2010–2014). Towards more scalable mutual exclusion for multicore architectures. Currently young researcher (chargé de recherche) at INRIA Paris (France). [Abstract]

    The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this thesis is a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated hardware thread, which is referred to as the server. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock because such data can typically remain in the server’s cache.

    Other contributions presented in this thesis include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool developed with Julia Lawall that transforms POSIX locks into RCL locks. Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. Using RCL locks, performance is improved by up to 2.5 times with respect to POSIX locks on Memcached, and up to 11.6 times with respect to Berkeley DB with the TPC-C client. On a SPARC machine with two Sun Ultrasparc T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to POSIX locks on Memcached, and up to 7.9 times with respect to Berkeley DB with the TPC-C client.

  • Koutheir Attouchi, co-advised with Gilles Muller at 50% (2011-2014). Managing resource sharing conflicts in an open embedded software environment. Currently engineer at MicroDoc (Canada). [Abstract]

    Our homes are becoming smart thanks to the numerous devices, sensors and actuators available in them, providing services, e.g., entertainment, home security, energy efficiency and health care. Various service providers want to take advantage of the smart home opportunity by rapidly developing services to be hosted by an embedded smart home gateway. The gateway is open to applications developed by untrusted service providers, controlling numerous devices, and possibly containing bugs or malicious code. Thus, the gateway should be highly available and robust enough to handle software problems without restarting abruptly. Sharing the constrained resources of the gateway between service providers allows them to provide richer services. However, resource sharing conflicts happen when an application uses resources “unreasonably” or abusively. This thesis addresses the problem of resource sharing conflicts in the smart home gateway, investigating prevention approaches when possible, and considering detection and resolution approaches when prevention is out of reach.

    Our first contribution, called Jasmin, aims at preventing resource sharing conflicts by isolating applications. Jasmin is a middleware for development, deployment and isolation of native, component-based and service-oriented applications targeted at embedded systems. Jasmin enables fast and easy cross-application communication, and uses Linux containers for lightweight isolation. Our second contribution, called Incinerator, is a subsystem in the Java Virtual Machine (JVM) aiming to resolve the problem of Java stale references, i.e., references to objects that should no longer be used. Stale references can cause significant memory leaks in an OSGi-based smart home gateway, hence decreasing the amount of available memory, which increases the risks of memory sharing conflicts. With less than 4% overhead, Incinerator not only detects stale references, making them visible to developers, but also eliminates them, hence lowering the risks of resource sharing conflicts. Even in Java, memory sharing conflicts happen. Thus, in order to detect them, we propose our third contribution: a memory monitoring subsystem integrated into the JVM. Our subsystem is mostly transparent to application developers and also aware of the component model composing smart home applications. The system accurately accounts for resources consumed during cross-application interactions, and provides on-demand snapshots of memory usage statistics for the different service providers sharing the gateway.

  • Thomas Preud’Homme, co-advised with Bertil Folliot and Julien Sopena at 70% (2008–2013). Optimized inter-core communication protocol for stream-oriented parallelism. Currently engineer at ARM (United Kingdom).
  • Nicolas Geoffray, co-advised with Bertil Folliot at 80% (2005-2009). Fostering System Research with VMKit. Currently engineer at Google (United Kingdom). [Abstract]

    Many systems research projects now target managed runtime environments (MRE) because they provide better productivity and safety compared to native environments. Still, developing and optimizing an MRE is a tedious task that requires many years of development. Although MREs share some common functionalities, such as a Just In Time Compiler or a Garbage Collector, this opportunity for sharing has not yet been exploited in implementing MREs. This thesis describes and evaluates VMKit, a first attempt to build a common substrate that eases the development and experimentation of high-level MREs and systems mechanisms. VMKit has been successfully used to build two MREs, a Java Virtual Machine and a Common Language Runtime, as well as a new system mechanism that provides better security in the context of service-oriented architectures.

    We describe the lessons learnt in implementing such a common infrastructure from a performance and an ease of development standpoint. The performance of VMKit is reasonable compared to industrial MREs, and the high-level MREs are only 20,000 lines of code. Our new system mechanism only requires the addition of 600 lines of code in VMKit, and is a significant step towards more dependable systems.

  • Charles Clément, co-advised with Bertil Folliot at 40% (2004-2009). Isolation of operating system extensions with a managed runtime environment. Currently engineer at Amazon (US).

Engineers

  • Harris Bakiras (2011 – 2013): engineer on the VMKit project (INRIA ADT project). Currently engineer at Microsoft (France).

Grants and Contracts

  • 2023 – 2030: DiVA (PEPR Cloud, 868k€), principal investigator.
  • 2023 – 2030: Archi-CESAM (PEPR Cloud, 580k€), member.
  • 2023 – 2030: TrusInSec (PEPR Cloud, 139k€), member.
  • 2024 – 2028: FrugalDinet (ANR PRC, 171k€), scientific leader for Telecom SudParis.
  • 2021 – 2025: Maplurinum (ANR PRC, 184k€), principal investigator.
  • 2020 – 2023: CIFRE PhD funding of Damien Thenot with Vates (45k€).
  • 2019 – 2023: Scalevisor (ANR PRCE, 226k€), scientific leader for Telecom SudParis.
  • 2020 – 2022: Firn (DIM-RFSI, 10k€), member.
  • 2019 – 2022: H2020 CloudButton (423k€), member.
  • 2019 – 2022: Pythia (ANR JCJC, 180k€), member.
  • 2020 – 2022: Idiom (FUI, 120k€), member.
  • 2018 – 2021: Primate (ANR PRCI, 147k€), principal investigator for the French side.
  • 2013 – 2015: Richelieu (FUI, 147k€), scientific leader for UPMC.
  • 2011 – 2014: Infra-JVM (ANR Infra, 193k€), principal investigator.
  • 2011 – 2014: CIFRE PhD funding of Koutheir Attouchi with Orange Labs (30k€).
  • 2009 – 2012: ABL (ANR Blanc, 275k€): A Bug Life, coordinator of Task 1.

Talks

  • J-NVM: Off-heap Persistent Objects in Java (10/2022, Université de Neuchâtel, Switzerland – 04/2022, invited talk at the "Journée du GDR RSD")
  • Quelques expériences autour des architectures à mémoire non uniformes (06/2019, keynote at COMPAS 2019)
  • Scalevisor: a CPU/memory driver for large multicore architectures (10/2019, University of Neuchâtel, Switzerland, 07/2018, IRCICA, France)
  • System techniques to mitigate the NUMA effect (03/2017, RSD-ASF Winter School, France)
  • A study of Garbage Collector Scalability on Multicore Hardware (02/2014, LIAFA, France – 09/2013, EPFL, Switzerland – 07/2013, Verimag, France – 03/2013, IRISA, France – 01/2013, Epita, France)
  • Système, Virtualisation, Nuage - Petit tour d’horizon de la recherche en système en France (May 2012, CNRS, France – Nov. 2011, UPMC, France)
  • VMKit: a substrate for Managed Runtime Environment (Feb. 2012, IRILL, France – Sep. 2011, Purdue University, US – Apr. 2011, Journées Compilation, Dinard, France – Mar. 2011, University of Utah, Salt-Lake City, US)
  • VMKit : un substrat de Machine Virtuelle (Oct. 2010, LaBRI, Bordeaux, France – Apr. 2010, Groupe de Travail Programmation, Lip6, Paris, France)
  • AutoVM : repousser les frontières de la généricité, Séminaire Performance et Généricité (May 2009, Epita, Paris, France)
  • Application d’une approche exo-noyau à la construction d’une machine virtuelle Java : la JnJVM (Apr. 2006, LIFL, Lille, France – Mar. 2006, IRISA, Rennes, France)
  • Applications Actives : Construction dynamique d’environnements d’exécution flexibles homogènes (Jul. 2005, LIG, Grenoble, France)

PhD and HDR thesis committees

  • Reviewer (rapporteur) of theses: Yasmine Djebrouni (02/2024), Ana Khorguani (11/2023), Gil Utard (HDR, 07/2023), Saalik Hatia (06/2023), Benoît Martin (04/2023), Stella Bitchebe (02/2023), Paulo Ricardo Rodrigues de Souza Junior (11/2022), Sébastien Vaucher (10/2022, Switzerland), Andrea Segalini (11/2021), Idriss Daoudi (09/2021), Thomas Baumela (02/2021), Amin Mohtasham (12/2019, Portugal), Rafael Pires (12/2019, Switzerland), Léo Grange (10/2019), Quentin Bergougnoux (06/2019), Davide Frey (HDR, 06/2019), Vikas Jaiman (04/2019), Hugo Brunie (01/2019), Hugo Guiroux (12/2018), Vinícius Garcia Pinto (10/2018), Antoine Faravelon (10/2018), Maxime France-Pillois (09/2018), Mohamad Jaafar Nehme (12/2017), Boris Teabe (10/2017), Clément Béra (09/2017), Bo Zhang (12/2016), Julien Pagès (12/2016), Fabien André (11/2016), Nassim Halli (10/2016), François Serman (09/2016), Etienne Brodu (06/2016), Thomas Calmant (10/2015), José Simão (03/2015, Portugal), Joaquim Perchat (01/2015), Victor Lomüller (11/2014), Camillo Bruni (05/2014), Yufang Dan (05/2014), François Goichon (12/2013), Quentin Sabah (12/2013), Konstantinos Kloudas (03/2013), Geoffroy Cogniaux (12/2012).
  • Examiner (examinateur) of theses: Julien Mirval (03/2024), Pierre Sutra (HDR, 01/2024), Zhiyuan Yao (05/2023), Maxime Belair (12/2021), François Trahay (HDR, 06/2021), Pierre-Marie Bajan (07/2019), Fotis Nikolaidis (05/2019), Pierre Louis Roman (12/2018), Johann Bourcier (HDR, 12/2018), Mickaël Salaün (03/2018), Alain Tchana (HDR, 12/2017), Antoine Capra (12/2015), Inti Gonzalez Herrera (12/2015), Aurèle Maheo (09/2015), Marion Guthmuller (04/2015), Pierre Olivier (12/2014), Baptiste Lepers (01/2014), Sylvain Cotard (12/2013), Jean-Yves Vet (11/2013), Preston Francisco Rodrigues (05/2013), Sarni Toufik (10/2012), Rémy Pottier (09/2012), Kiev Santos de Gama (10/2011), Christophe Deleray (10/2006).

Software

  • Leader and main contributor of the "infra-web-cours" project. Infra-web-cours is a toolkit that relies on a domain specific language (DSL) to quickly develop courses in HTML. Many courses at Telecom SudParis and Institut Polytechnique de Paris now use this toolkit. The project provides documentation of the toolkit and its DSL. The source code is hosted on our GitLab at https://gitlab.inf.telecom-sudparis.eu/infra-web/infra-web-cours. Around 2000 lines of code. Started in 2016. No longer involved in the project since 2023.
  • Leader and contributor of the VMKit project (LLVM Licence). VMKit is a toolkit to help developers and researchers to experiment with new ideas in managed runtime environments. Was integrated in the Linux Ubuntu distribution. Web site: http://vmkit.llvm.org. Around 50000 lines of code. Started in 2007, retired in 2014.
  • Leader and main contributor of the JnJVM project. JnJVM is an adaptable Java virtual machine written in Scheme. Web site: http://vvm.lip6.fr/projects_realizations/jnjvm/. Around 50000 lines of code. Started in 2002, retired in 2007.
  • Contributor of the CVM project. CVM (Component Virtual Machine) is a managed runtime environment specialized for executing and adapting components. Web site: http://vvm.lip6.fr/projects_realizations/cvm/. Around 70000 lines of code. Started in 2004, retired in 2005.

Teaching

Currently, I'm the head of the Parallel and Distributed Systems track of the master degree in Computer Science of IP Paris. I teach mainly in the domains of systems and languages. Since 2001, I have taught around 3400 hours. Between 2020 and 2023, I was the head of the master degree in computer science of IP Paris (∼120 students), and between 2016 and 2021, I was the coordinator of the computer science curricula of Telecom SudParis (∼40 courses, 800 students). I'm now involved in the following courses:

Course conceptions

  • 2024: Object-oriented programming in C++ (École Polytechnique, Bachelor 2, ∼100 students)
  • 2019: Operating systems (Telecom SudParis, Master 1, ∼50 students)
  • 2019: Architecture and compilation (Telecom SudParis, Master 1, ∼20 students)
  • 2017: Initiation to the Java programming language (Telecom SudParis, Bachelor 3, ∼200 students)
  • 2016: Advanced Programming of Multicore Architectures (IP Paris and UPSaclay, Master 2, ∼40 students)
  • 2016: System programming (Telecom SudParis, Master 1, ∼100 students)
  • 2015: Initiation to systems with bash (Telecom SudParis, Bachelor 3, ∼200 students)
  • 2009: Design of language runtimes (UPMC, Bachelor 2, ∼100 students)
  • 2009: Multicore programming (UPMC, Master 2, ∼20 students - Polytech'Paris, Master 2, ∼20 students)
  • 2007: Component oriented middlewares (UPMC, Master 2, ∼100 students)

Course responsibilities

  • 2014 – today: Object-oriented programming in C++ (École Polytechnique, Bachelor 2, ∼100 students)
  • 2016 – today: Advanced Programming of Multicore Architectures (IP Paris and UPSaclay, Master 2, ∼40 students)
  • 2017 – 2023: Initiation to the Java programming language (Telecom SudParis, Bachelor 3, ∼200 students)
  • 2015 – 2020: Initiation to systems with bash (Telecom SudParis, Bachelor 3, ∼200 students)
  • 2016 – 2018: System programming (University Paris-Saclay, Master 1, ∼30 students)
  • 2009 – 2014: Multicore programming (UPMC, Master 2, ∼30 students - Polytech'Paris, Master 2, ∼20 students)
  • 2013 – 2014: Initiation to operating system (UPMC, Bachelor 2, ∼200 students)
  • 2010 – 2014: Research group in systems (UPMC, Master 2, ∼30 students)
  • 2006 – 2010: Client/Server oriented Distributed Systems (UPMC, Master 1, ∼70 students)
  • 2006 – 2010: Distributed systems and client/server (UPMC, Master 2, ∼10 students)
  • 2006 – 2009: Component oriented middlewares (UPMC, Master 2, ∼100 students)

Teaching summary

Degree       Teaching unit name                                 Years        Hours  University
Bachelors 1  Functional programming with Scheme                 2001 – 2004  192    UPMC
Bachelors 1  Initiation to the C language                       2012 – 2013  60     UPMC
Bachelors 2  Initiation to managed runtime environments         2009 – 2014  107    UPMC
Bachelors 2  Initiation to operating system                     2012 – 2014  100    UPMC
Bachelors 2  Architecture of microprocessors                    2012 – 2013  40     UPMC
Bachelors 2  Computer Architecture                              2023 – 2024  48     École Polytechnique
Bachelors 2  Object oriented programming in C++                 2024 – 2025  39     École Polytechnique
Bachelors 3  Operating system principles                        2004 – 2011  180    UPMC
Bachelors 3  Introduction to architectures and systems          2014 – 2015  33     Telecom SudParis
Bachelors 3  Initiation to the Java programming language        2015 – 2023  120    Telecom SudParis
Bachelors 3  Initiation to operating systems                    2015 – 2022  131    Telecom SudParis
Masters 1    Operating system principles                        2002 – 2014  594    UPMC
Masters 1    System Projects                                    2004 – 2014  27     UPMC
Masters 1    Parallel programming                               2004 – 2005  20     UPMC
Masters 1    Client/server oriented distributed systems         2006 – 2011  172    UPMC
Masters 1    Components                                         2007 – 2008  2      UPMC
Masters 1    Operating systems principle                        2008 – 2010  16     Polytech'Paris
Masters 1    Object oriented programming                        2014 – 2016  54     Telecom SudParis
Masters 1    Design and implementation of centralized systems   2014 – 2017  75     Telecom SudParis
Masters 1    System programming                                 2016 – 2022  132    Telecom SudParis
Masters 1    System programming                                 2016 – 2018  90     UPSaclay
Masters 1    Compilation: from the algorithm to the logic gate  2018 – 2024  63     Telecom SudParis
Masters 1    Operating system principles                        2018 – 2023  225    Telecom SudParis
Masters 2    Distributed applications and systems               2005 – 2006  10     Polytech'Grenoble
Masters 2    Component oriented middlewares                     2006 – 2009  84     UPMC
Masters 2    Distributed systems and client/server              2006 – 2010  112    UPMC
Masters 2    Advanced system frameworks                         2008 – 2014  62     UPMC
Masters 2    Multicore systems and virtualization               2009 – 2015  92     UPMC
Masters 2    Research group in system                           2009 – 2014  116    UPMC
Masters 2    Multicore systems                                  2009 – 2011  40     Polytech'Paris
Masters 2    Cloud computing                                    2015 – 2020  15     UPSaclay
Masters 2    Cloud computing                                    2015 – 2018  9      UPSaclay
Masters 2    High performance systems                           2015 – 2024  45     Telecom SudParis
Masters 2    Advanced Programming of Multicore Architectures    2016 – 2025  315    UPSaclay
Masters 2    Cloud infrastructure                               2017 – 2018  6      Telecom SudParis

Professional activities

Member of program committees and organizations of events

I have been a member of program committees of rank A conferences.

  • Member of the Usenix ATC 2025 program committee
  • Member of the Eurosys 2025 program committee
  • Member of the SoCC 2024 program committee
  • Member of the Usenix ATC 2023 program committee
  • Member of the Eurosys 2023 program committee
  • Member of the SoCC 2023 program committee
  • Member of the Eurosys 2022 program committee
  • PC Chair of the Eurosys 2022 Shadow program committee
  • Member of the Middleware 2021 program committee
  • Member of the SRDS 2019 program committee
  • Member of the ICOOOLPS 2019 program committee (Workshop)
  • Member of the Middleware 2018 program committee
  • Member of the SRDS 2018 program committee
  • Member of the Eurosys 2018 program committee
  • Member of the MoreVM 2018 program committee (Workshop)
  • Member of the Compas 2018 program committee (French)
  • PC Chair of Compas 2017 (French)
  • Member of the Eurosys 2016 program committee
  • Member of the Compas 2016 program committee (French)
  • Member of the VEE 2015 program committee
  • Member of the ICOOOLPS 2015 program committee (Workshop)
  • Poster co-chair of the Eurosys 2015 conference
  • Member of the ComPAS 2015 program committee (French)
  • Treasurer and sponsorship co-chair of the Middleware 2014 conference
  • Member of the ComPAS/CFSE 2014 program committee (French)
  • Member of the PLOS 2013 program committee (Workshop)
  • Member of the ComPAS/CFSE 2013 program committee (French)
  • Member of the DAIS 2012 program committee
  • Member of the DAIS 2011 program committee
  • Program chair of the NOTERE 2011 Workshops (French)
  • Member of the CFSE 2011 program committee (French)

Other responsibilities

  • 2023 – today: chair of the steering committee of Compas (French conference on systems, parallelism and architecture)
  • 2019 – today: head of the Parallel and Distributed Systems track of the master degree in Computer Science of IP Paris (∼15 students)
  • 2020 – 2023: head of the master degree in computer science of IP Paris (∼100 students)
  • 2016 – 2021: coordinator of the computer science curricula at Telecom SudParis (∼800 students, ∼40 courses)
  • 2014 – 2016: Treasurer of the French chapter of the ACM SIGOPS (ASF)
  • 2011 – 2014: Chair of the French chapter of the ACM SIGOPS (ASF)
  • 2011 – 2014: Elected member of the LIP6 laboratory board (conseil de laboratoire)
  • 2011 – 2014: Founding member of the organizing committee of the "colloquium d’informatique de l’UPMC Sorbonne Université"
  • Hiring committees: MdC U. Picardie 2023 - MdC ENSEEIHT 2023, 2013 - MdC Sorbonne Université 2023 - CRCN/ISFP INRIA LNE 2022 - MdC Telecom SudParis 2022, 2021, 2020, 2019, 2018 - MdC ENSIIE 2022, 2021 - MdC Université de Lille 2021 - MdC IUT de Toulouse 2021 - MdC UGE 2021 - MdC Lyon1 2020 - MdC Ensimag 2019 - MdC UJF 2010 - MdC Chaire Université Rennes1/INRIA 2010

Publications

For the ranking of venues, I use the Australian Ranking of ICT Conferences (http://www.core.edu.au), which ranks conferences and journals in computer science with A*, A, B and C. I use the ranking of the year of publication. If the ranking changes after the publication, I only update it if the change takes place less than four years after the publication. If the ranking is not given, it means that the venue is not ranked.

In the systems field of computer science, major conferences often have a higher status and a greater impact than journals (cf. https://homes.cs.washington.edu/~mernst/advice/conferences-vs-journals.html).

Ranked Publications

International conferences

  1. Privagic: automatic code partitioning with explicit secure typing. Subashiny Tanigassalame, Yohan Pipereau, Adam Chader, Jana Toljaga and Gaël Thomas. In Proceedings of the International Conference on Middleware, Middleware'24, pages 12. 2024. [Abstract] [BibTeX] Partitioning a multi-threaded application between a secure and a non-secure memory zone remains a challenge. The current tools rely on data flow analysis techniques, which are unable to handle multi-threaded C or C++ applications. To avoid this limitation, we propose to trade the ease-of-use of data flow analysis for another language construct: explicit secure typing. With secure typing, as with data flow analysis, the developer annotates memory locations that contain sensitive values. However, instead of analyzing how the sensitive values flow, we propose to use these annotations to only check typing rules, such as ensuring that the code never stores a sensitive value in an unsafe memory location. By avoiding data flow analysis, the developer has to annotate more memory locations, but the partitioning tool can handle multi-threaded C and C++ applications. We implemented our explicit secure typing principle in a compiler named Privagic. Privagic takes a legacy application enriched with secure types as input. It outputs an application partitioned for Intel SGX. Our evaluation with micro- and macro-applications shows that (i) explicit secure typing can handle multi-threaded C and C++ applications, (ii) adding explicit secure types requires a modest engineering effort of less than 10 modified lines of code in our use cases, (iii) using explicit secure typing is more efficient than embedding a complete application in an enclave both in terms of performance and security in our use cases.
    @inproceedings{middleware:24:tanigassalame:privagic,
      author = { Tanigassalame, Subashiny and Pipereau, Yohan and Chader, Adam and Toljaga, Jana and Thomas, Gaël },
      title = { Privagic: automatic code partitioning with explicit secure typing },
      booktitle = {Proceedings of the International Conference on Middleware, Middleware'24},
      publisher = {ACM},
      year = {2024},
      pages = {12}
    }
  2. P4CE: Consensus over RDMA at line speed. Rémi Dulong, Nathan Felber, Pascal Felber, Gilles Hopin, Baptiste Lepers, Valerio Schiavoni, Gaël Thomas and Sébastien Vaucher. In Proceedings of the International Conference on Distributed Computing Systems, ICDCS'24, pages 12. 2024. [Abstract] [BibTeX] [.pdf] P4CE is the first replication protocol that exhibits the same latency and requires the same network capacity as sending data to a single server. P4CE builds upon previous RDMA-based consensus protocols. They achieve consensus with a single network round-trip, but with a reduced network throughput. P4CE also achieves consensus with a single round-trip, but without degrading throughput by decoupling the consensus decisions from the RDMA communications. The decision part of the consensus protocol runs on a commodity server, but the communication part of P4CE is fully implemented on a programmable switch, which replicates data and aggregates the acknowledgements in the network, avoiding the throughput bottleneck at the leader. Although simple in its principle, the implementation of P4CE raises many challenging issues, notably caused by the complexity of RDMA and the underlying network protocols, the intricacies of packet rewriting during replication and aggregation, and the restricted set of operations that can be implemented at wire speed in the programmable switch. We implemented P4CE and deployed it on a commercially-available Intel Tofino switch, achieving up to 4× better throughput and better latency than state-of-the-art consensus protocols.
    @inproceedings{icdcs:24:dulong:p4ce,
      author = { Dulong, Rémi and Felber, Nathan and Felber, Pascal and Hopin, Gilles and Lepers, Baptiste and Schiavoni, Valerio and Thomas, Gaël and Vaucher, Sébastien },
      title = {P4CE: Consensus over RDMA at line speed},
      booktitle = {Proceedings of the International Conference on Distributed Computing Systems, ICDCS'24},
      publisher = {IEEE Computer Society},
      year = {2024},
      pages = {12}
    }
  3. FastSGX: A Message-passing based Runtime for SGX. Subashiny Tanigassalame, Yohan Pipereau, Adam Chader, Jana Toljaga and Gaël Thomas. In Proceedings of the International Conference on Advanced Information Networking and Applications, AINA'24, pages 12.  2024. [Abstract] [BibTeX] [.pdf] Designing an efficient privacy-preserving application with Intel SGX is difficult. The problem comes from the prohibitive cost of switching the processor from the non-secure mode to the secure mode. To avoid this cost, we propose to design an SGX application as a distributed system with worker threads that communicate by exchanging messages. We implemented FastSGX, a runtime that exposes this programming model to the developer, and evaluated it with several data structures. Our evaluation with different workloads shows that the applications designed with FastSGX consistently outperform the equivalent applications designed with the software development kit provided by Intel to use SGX.
    @inproceedings{aina:24:tanigassalme:fastsgx,
      author = { Tanigassalame, Subashiny and Pipereau, Yohan and Chader, Adam and Toljaga, Jana and Thomas, Gaël },
      title = {FastSGX: A Message-passing based Runtime for SGX},
      booktitle = {Proceedings of the International Conference on Advanced Information Networking and Applications, AINA'24},
      publisher = {Springer},
      year = {2024},
      pages = {12}
    }
  4. SecV: Secure Code Partitioning via Multi-Language Secure Values. Peterson Yuhala, Pascal Felber, Hugo Guiroux, Jean-Pierre Lozi, Alain Tchana, Valerio Schiavoni and Gaël Thomas. In Proceedings of the International Conference on Middleware, Middleware'23, pages 13.  2023. [Abstract] [BibTeX] [.pdf] Trusted execution environments like Intel SGX provide enclaves, which offer strong security guarantees for applications. Running entire applications inside enclaves is possible, but this approach leads to a large trusted computing base (TCB). As such, various tools have been developed to partition programs written in languages such as C or Java into trusted and untrusted parts, which are run in and out of enclaves respectively. However, those tools depend on language-specific taint-analysis and partitioning techniques. They cannot be reused for other languages and there is thus a need for tools that transcend this language barrier. We address this challenge by proposing a multi-language technique to specify sensitive code or data, as well as a multi-language tool to analyse and partition the resulting programs for trusted execution environments like Intel SGX. We leverage GraalVM’s Truffle framework, which provides a language-agnostic abstract syntax tree (AST) representation for programs, to provide special AST nodes called secure nodes that encapsulate sensitive program information. Secure nodes can easily be embedded into the ASTs of a wide range of languages via Truffle’s polyglot API. Our technique includes a multi-language dynamic taint tracking tool to analyse and partition applications based on our generic secure nodes. Our extensive evaluation with micro- and macro-benchmarks shows that we can use our technique for two languages (Javascript and Python), and that partitioned programs can obtain up to 14.5% performance improvement as compared to unpartitioned versions.
    @inproceedings{middleware:23:yuhala:secv,
      author = { Yuhala, Peterson and Felber, Pascal and Guiroux, Hugo and Lozi, Jean-Pierre and Tchana, Alain and Schiavoni, Valerio and Thomas, Gaël },
      title = {SecV: Secure Code Partitioning via Multi-Language Secure Values},
      booktitle = {Proceedings of the International Conference on Middleware, Middleware'23},
      publisher = {ACM},
      year = {2023},
      pages = {13}
    }
  5. FastXenBlk: high-performance virtualized disk IOs without compromising isolation (industry track). Damien Thenot, Jean-Pierre Lozi and Gaël Thomas. In Proceedings of the International Conference on Middleware, Middleware'23, pages 7.  2023. [Abstract] [BibTeX] [.pdf] Optimizing IO in a type I hypervisor such as Xen is difficult because of the cost of exchanging data between a VM and the driver. We address this challenge by proposing FastXenBlk, a new IO driver for Xen. FastXenBlk uses three mechanisms to improve IO performance. First, it uses several threads that poll multiple virtual IO queues that are exposed to a guest in order to execute IOs in parallel. Second, it batches requests in order to minimize the number of hypercalls to Xen. And third, it uses kernel bypass in order to avoid system calls during IOs. We evaluate FastXenBlk using the FIO benchmark with different access patterns and IO sizes. Our evaluation shows that FastXenBlk consistently improves the latency and the throughput for all workloads as compared to tapdisk, the driver currently used in production, by a factor of up to 3×.
    @inproceedings{middleware:23:thenot:fastexenblk,
      author = { Thenot, Damien and Lozi, Jean-Pierre and Thomas, Gaël },
      title = {FastXenBlk: high-performance virtualized disk IOs without compromising isolation (industry track)},
      booktitle = {Proceedings of the International Conference on Middleware, Middleware'23},
      publisher = {ACM},
      year = {2023},
      pages = {7}
    }
  6. J-NVM: Off-heap Persistent Objects in Java. Anatole Lefort, Yohan Pipereau, Kwabena Amponsem, Pierre Sutra and Gaël Thomas. In Proceedings of the Symposium on Operating Systems Principles, SOSP'21, pages 16.  2021. . [Abstract] [BibTeX] [.pdf] This paper presents J-NVM, a framework to access efficiently Non-Volatile Main Memory (NVMM) in Java. J-NVM offers a fully-fledged interface to persist plain Java objects using failure-atomic blocks. This interface relies internally on proxy objects that intermediate direct off-heap access to NVMM. The framework also provides a library of highly-optimized persistent data types that resist reboots and power failures. We evaluate J-NVM by implementing a persistent backend for the Infinispan data store. Our experimental results, obtained with a TPC-B like benchmark and YCSB, show that J-NVM is consistently faster than other approaches at accessing NVMM in Java.
    @inproceedings{sosp:21:lefort:jnvm,
      author = { Lefort, Anatole and Pipereau, Yohan and Amponsem, Kwabena and Sutra, Pierre and Thomas, Gaël },
      title = {J-NVM: Off-heap Persistent Objects in Java},
      booktitle = {Proceedings of the Symposium on Operating Systems Principles, SOSP'21},
      publisher = {ACM},
      year = {2021},
      pages = {16}
    }
  7. Montsalvat: Intel SGX Shielding for GraalVM Native Images. Peterson Yuhala, Jämes Ménétrey, Pascal Felber, Valerio Schiavoni, Alain Tchana, Gaël Thomas, Hugo Guiroux and Jean-Pierre Lozi. In Proceedings of the International Conference on Middleware, Middleware'21, pages 13.  2021. [Abstract] [BibTeX] [.pdf] The popularity of the Java programming language has led to its wide adoption in cloud computing infrastructures. However, Java applications running in untrusted clouds are vulnerable to various forms of privileged attacks. The emergence of trusted execution environments (TEEs) such as Intel SGX mitigates this problem. TEEs protect code and data in secure enclaves inaccessible to untrusted software, including the kernel and hypervisors. To efficiently use TEEs, developers must manually partition their applications into trusted and untrusted parts, in order to reduce the size of the trusted computing base (TCB) and minimise the risks of security vulnerabilities. However, partitioning applications poses two important challenges: (i) ensuring efficient object communication between the partitioned components, and (ii) ensuring the consistency of garbage collection between the parts, especially with memory-managed languages such as Java. We present Montsalvat, a tool which provides a practical and intuitive annotation-based partitioning approach for Java applications destined for secure enclaves. Montsalvat provides an RMI-like mechanism to ensure inter-object communication, as well as consistent garbage collection across the partitioned components. We implement Montsalvat with GraalVM native-image, a tool for compiling Java applications ahead-of-time into standalone native executables that do not require a JVM at runtime. We perform extensive evaluations of Montsalvat using micro- and macro-benchmarks, and we show that our partitioning approach can improve performance in real-world applications up to 6.6× (PalDB) and 2.2× (GraphChi) as compared to solutions that naively include the entire applications in the enclave.
    @inproceedings{middleware:21:yuhala:montsalvat,
      author = { Yuhala, Peterson and Ménétrey, Jämes and Felber, Pascal and Schiavoni, Valerio and Tchana, Alain and Thomas, Gaël and Guiroux, Hugo and Lozi, Jean-Pierre },
      title = {Montsalvat: Intel SGX Shielding for GraalVM Native Images},
      booktitle = {Proceedings of the International Conference on Middleware, Middleware'21},
      publisher = {ACM},
      year = {2021},
      pages = {13}
    }
  8. NVCache: A Plug-and-Play NVMM-based I/O Booster for Legacy Systems. Rémi Dulong, Rafael Pires, Andreia Correia, Valerio Schiavoni, Pedro Ramalhete, Pascal Felber and Gaël Thomas. In Proceedings of the international conference on Dependable Systems and Networks, DSN'21, pages 13.  2021. . [Abstract] [BibTeX] [.pdf] This paper introduces NVCACHE, an approach that uses a non-volatile main memory (NVMM) as a write cache to improve the write performance of legacy applications. We compare NVCACHE against file systems tailored for NVMM (Ext4-DAX and NOVA) and with I/O-heavy applications (SQLite, RocksDB). Our evaluation shows that NVCACHE reaches the performance level of the existing state-of-the-art systems for NVMM, but without their limitations: NVCACHE does not limit the size of the stored data to the size of the NVMM, and works transparently with unmodified legacy applications, providing additional persistence guarantees even when their source code is not available.
    @inproceedings{dsn:21:dulong:nvcache,
      author = {Dulong, Rémi and Pires, Rafael and Correia, Andreia and Schiavoni, Valerio and Ramalhete, Pedro and Felber, Pascal and Thomas, Gaël },
      title = {NVCache: A Plug-and-Play NVMM-based I/O Booster for Legacy Systems},
      booktitle = {Proceedings of the international conference on Dependable Systems and Networks, DSN'21},
      publisher = {IEEE Computer Society},
      year = {2021},
      pages = {13}
    }
  9. When eXtended Para-Virtualization (XPV) Meets NUMA. Bao Bui, Djob Mvondo, Boris Teabe, Kevin Jiokeng, Lavoisier Wapet, Alain Tchana, Gaël Thomas, Daniel Hagimont, Gilles Muller and Noel De Palma. In Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'19, pages 15.  2019. [Abstract] [BibTeX] [.pdf] This paper addresses the problem of efficiently virtualizing NUMA architectures. The major challenge comes from the fact that the hypervisor regularly reconfigures the placement of a virtual machine (VM) over the NUMA topology. However, neither guest operating systems (OSes) nor system runtime libraries (e.g., Hotspot) are designed to consider NUMA topology changes at runtime, leading end user applications to unpredictable performance. This paper presents eXtended Para-Virtualization (XPV), a new principle to efficiently virtualize a NUMA architecture. XPV consists in revisiting the interface between the hypervisor and the guest OS, and between the guest OS and system runtime libraries (SRL) so that they can dynamically take into account NUMA topology changes. The paper presents a methodology for systematically adapting legacy hypervisors, OSes, and SRLs. We have applied our approach with fewer than 2k lines of code in two legacy hypervisors (Xen and KVM), two legacy guest OSes (Linux and FreeBSD), and three legacy SRLs (Hotspot, TCMalloc, and jemalloc). The evaluation results showed that XPV outperforms all existing solutions by up to 304%.
    @inproceedings{eurosys:19:bui:xpv,
      author = { Bui, Bao and Mvondo, Djob and Teabe, Boris and Jiokeng, Kevin and Wapet, Lavoisier and Tchana, Alain and Thomas, Gaël and Hagimont, Daniel and Muller, Gilles and De Palma, Noel },
      title = {When eXtended Para-Virtualization (XPV) Meets NUMA},
      booktitle = {Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'19},
      publisher = {ACM},
      year = {2019},
      pages = {15}
    }
  10. EActors: Fast and flexible trusted computing using SGX. Vasily A. Sartakov, Stefan Brenner, Sonia Ben Mokhtar, Sara Bouchenak, Gaël Thomas and Rüdiger Kapitza. In Proceedings of the International Conference on Middleware, Middleware'18, pages 12.  2018. [Abstract] [BibTeX] [.pdf] Novel trusted execution support, as offered by Intel’s Software Guard eXtensions (SGX), embeds seamlessly into user space applications by establishing regions of encrypted memory, called enclaves. Enclaves comprise code and data that is executed under special protection of the CPU and can only be accessed via an enclave defined interface. To facilitate the usability of this new system abstraction, Intel offers a software development kit (SGX SDK). While the SDK eases the use of SGX, it misses appropriate programming support for inter-enclave interaction, and demands to hardcode the exact use of trusted execution into applications, which restricts flexibility.

    This paper proposes EActors, an actor framework that is tailored to SGX and offers a more seamless, flexible and efficient use of trusted execution – especially for applications demanding multiple enclaves. EActors disentangles the interaction with enclaves and, among them, from costly execution mode transitions. It features lightweight fine-grained parallelism based on the concept of actors, thereby avoiding costly SGX SDK provided synchronisation constructs. Finally, EActors offers a high degree of freedom to execute actors, either untrusted or trusted, depending on security requirements and performance demands. We implemented two use cases on top of EActors: (i) a secure instant messaging service, and (ii) a secure multi-party computation service. Both illustrate the ability of EActors to seamlessly and effectively build secure applications. Furthermore, our performance evaluation results show that securing the messaging service with EActors improves performance compared to the vanilla versions of JabberD2 and ejabberd by up to 40×.
    @inproceedings{middleware:18:saratov:eactors,
      author = {Sartakov, Vasily A. and Brenner, Stefan and Ben Mokhtar, Sonia and Bouchenak, Sara and Thomas, Gaël and Kapitza, Rüdiger},
      title = {EActors: Fast and flexible trusted computing using SGX},
      booktitle = {Proceedings of the International Conference on Middleware, Middleware'18},
      publisher = {ACM},
      year = {2018},
      pages = {12}
    }
  11. Towards an Efficient Pauseless Java GC with Selective HTM-Based Access Barriers. Maria Carpen-Amarie, Yaroslav Hayduk, Pascal Felber, Christof Fetzer, Gaël Thomas and David Dice. In Proceedings of the international conference on Managed Languages and Runtimes (formerly PPPJ), ManLang'17, pages 7.  2017. [Abstract] [BibTeX] [.pdf] The garbage collector (GC) is a critical component of any managed runtime environment (MRE), such as the Java virtual machine. While the main goal of the GC is to simplify and automate memory management, it may have a negative impact on the application performance, especially on multi-core systems. This is typically due to stop-the-world pauses, i.e., intervals for which the application threads are blocked during the collection. Existing approaches to concurrent GCs allow the application threads to perform at the same time as the GC at the expense of throughput and simplicity. In this paper we build upon an existing pauseless transactional GC algorithm and design an important optimization that would significantly increase its throughput. More precisely, we devise selective access barriers, that define multiple paths based on the state of the garbage collector. Preliminary evaluation of the selective barriers shows up to 93% improvement over the initial transactional barriers in the worst case scenario. We estimate the performance of a pauseless GC having selective transactional barriers and find it to be on par with Java’s concurrent collector.
    @inproceedings{manlang:17:carpen-amarie:gc-htm,
      author = {Carpen-Amarie, Maria and Hayduk, Yaroslav and Felber, Pascal and Fetzer, Christof and Thomas, Gaël and Dice, David},
      title = {Towards an Efficient Pauseless Java GC with Selective HTM-Based Access Barriers},
      booktitle = {Proceedings of the international conference on Managed Languages and Runtimes (formerly PPPJ), ManLang'17},
      publisher = {ACM},
      year = {2017},
      pages = {7}
    }
  12. An interface to implement NUMA policies in the Xen hypervisor. Gauthier Voron, Gaël Thomas, Vivien Quéma and Pierre Sens. In Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'17, pages 14.  2017. . [Abstract] [BibTeX] [.pdf] While virtualization only introduces a small overhead on machines with few cores, this is not the case on larger ones. Most of the overhead on the latter machines is caused by the Non-Uniform Memory Access (NUMA) architecture they are using. In order to reduce this overhead, this paper shows how NUMA placement heuristics can be implemented inside Xen. With an evaluation of 29 applications on a 48-core machine, we show that the NUMA placement heuristics can multiply the performance of 9 applications by more than 2.
    @inproceedings{eurosys:17:voron:xen-numa,
      author = {Voron, Gauthier and Thomas, Gaël and Quéma, Vivien and Sens, Pierre},
      title = {An interface to implement NUMA policies in the Xen hypervisor},
      booktitle = {Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'17},
      publisher = {ACM},
      year = {2017},
      pages = {14}
    }
  13. Transactional Pointers: Experiences with HTM-Based Reference Counting in C++. Maria Carpen-Amarie, Dave Dice, Gaël Thomas and Pascal Felber. In Proceedings of the International Conference on Networked Systems, NETYS'16, pages 15.  2016. [Abstract] [BibTeX] [.pdf] The most popular programming languages, such as C++ or Java, have libraries and data structures designed to automatically address concurrency hazards in order to run on multiple threads. In particular, this trend has also been adopted in the memory management domain. However, automatic concurrent memory management also comes at a price, leading sometimes to noticeable overhead. In this paper, we experiment with C++ smart pointers and their automatic memory-management technique based on reference counting. More precisely, we study how we can use hardware transactional memory (HTM) to avoid costly and sometimes unnecessary atomic operations. Our results suggest that replacing the systematic counting strategy with HTM could improve application performance in certain scenarios, such as concurrent linked-list traversal.
    @inproceedings{netys:16:carpen-amarie:transactional-pointers,
      author = {Carpen-Amarie, Maria and Dice, Dave and Thomas, Gaël and Felber, Pascal},
      title = {Transactional Pointers: Experiences with HTM-Based Reference Counting in C++},
      booktitle = {Proceedings of the International Conference on Networked Systems, NETYS'16},
      publisher = {Springer},
      year = {2016},
      pages = {15}
    }
  14. Evaluating HTM for pauseless garbage collectors in Java. Maria Carpen-Amarie, Dave Dice, Patrick Marlier, Gaël Thomas and Pascal Felber. In Proceedings of the International Symposium on Parallel and Distributed Processing with Applications, ISPA'15, pages 8.  2015. . [Abstract] [BibTeX] [.pdf] While garbage collectors (GCs) significantly simplify programmers’ tasks by transparently handling memory management, they also introduce various overheads and sources of unpredictability. Most importantly, GCs typically block the application while reclaiming free memory, which makes them unfit for environments where responsiveness is crucial, such as real-time systems. There have been several approaches for developing concurrent GCs that can exploit the processing capabilities of multi-core architectures, but at the expense of a synchronization overhead between the application and the collector. In this paper, we investigate a novel approach to implementing pauseless moving garbage collection using hardware transactional memory (HTM). We describe the design of a moving GC algorithm that can operate concurrently with the application threads. We study the overheads resulting from using transactional barriers in the Java virtual machine (JVM) and discuss various optimizations. Our findings show that, while the cost of these barriers can be minimized by carefully restricting them to volatile accesses when executing within the interpreter, the actual performance degradation becomes unacceptably high with the just-in-time compiler. The results tend to indicate that current HTM mechanisms cannot be readily used to implement a pauseless GC in Java that can compete with state-of-the-art concurrent GCs.
    @inproceedings{ispa:15:carpen-amarie:stmgc,
      author = {Carpen-Amarie, Maria and Dice, Dave and Marlier, Patrick and Thomas, Gaël and Felber, Pascal},
      title = {Evaluating HTM for pauseless garbage collectors in Java},
      booktitle = {Proceedings of the International Symposium on Parallel and Distributed Processing with Applications, ISPA'15},
      year = {2015},
      pages = {8}
    }
  15. Automatic OpenCL code generation for multi-device heterogeneous architectures. Pei Li, Elisabeth Brunet, François Trahay, Christian Parrot, Gaël Thomas and Raymond Namyst. In Proceedings of the International Conference on Parallel Processing, ICPP'15, pages 10.  2015. [Abstract] [BibTeX] [.pdf] Using multiple accelerators, such as GPUs or Xeon Phis, is attractive to improve the performance of large data parallel applications and to increase the size of their workloads. However, writing an application for multiple accelerators remains today challenging because going from a single accelerator to multiple ones indeed requires to deal with potentially non-uniform domain decomposition, inter-accelerator data movements, and dynamic load balancing. Writing such code manually is time consuming and error-prone. In this paper, we propose a new programming tool called STEPOCL along with a new domain specific language designed to simplify the development of an application for multiple accelerators. We evaluate both the performance and the usefulness of STEPOCL with three applications and show that: (i) the performance of an application written with STEPOCL scales linearly with the number of accelerators, (ii) the performance of an application written using STEPOCL competes with a handwritten version, (iii) larger workloads run on multiple devices that do not fit in the memory of a single device, (iv) thanks to STEPOCL, the number of lines of code required to write an application for multiple accelerators is roughly divided by ten.
    @inproceedings{icpp:15:li:stepocl,
      author = {Li, Pei and Brunet, Elisabeth and Trahay, François and Parrot, Christian and Thomas, Gaël and Namyst, Raymond},
      title = {Automatic OpenCL code generation for multi-device heterogeneous architectures},
      booktitle = {Proceedings of the International Conference on Parallel Processing, ICPP'15},
      year = {2015},
      pages = {10}
    }
  16. Incinerator - Eliminating Stale References in Dynamic OSGi Applications. Koutheir Attouchi, Gaël Thomas, Gilles Muller, Julia Lawall and André Bottaro. In Proceedings of the international conference on Dependable Systems and Networks, DSN'15, pages 11.  2015. . [Abstract] [BibTeX] [.pdf] Java class loaders are commonly used in application servers to load, unload and update a set of classes as a unit. However, unloading or updating a class loader can introduce stale references to the objects of the outdated class loader. A stale reference leads to a memory leak and, for an update, to an inconsistency between the outdated classes and their replacements. To detect and eliminate stale references, we propose Incinerator, a Java virtual machine extension that introduces the notion of an outdated class loader. Incinerator detects stale references and sets them to null during a garbage collection cycle. We evaluate Incinerator in the context of the OSGi framework and show that Incinerator correctly detects and eliminates stale references, including a bug in Knopflerfish. We also evaluate the performance of Incinerator with the DaCapo benchmark on VMKit and show that Incinerator has an overhead of at most 3.3%.
    @inproceedings{dsn:15:attouchi:incinerator,
      author = {Attouchi, Koutheir and Thomas, Gaël and Muller, Gilles and Lawall, Julia and Bottaro, André},
      title = {Incinerator - Eliminating Stale References in Dynamic OSGi Applications},
      booktitle = {Proceedings of the international conference on Dependable Systems and Networks, DSN'15},
      publisher = {IEEE Computer Society},
      year = {2015},
      pages = {11}
    }
  17. NumaGiC: a garbage collector for big data on big NUMA machines. Lokesh Gidra, Gaël Thomas, Julien Sopena, Marc Shapiro and Nhan Nguyen. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'15, pages 14.  2015. . [Abstract] [BibTeX] [.pdf] On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160GB to 350GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6x over NAPS (up to 5.4x over Parallel Scavenge).
    @inproceedings{asplos:15:gidra:numagic,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc and Nguyen, Nhan},
      title = {NumaGiC: a garbage collector for big data on big NUMA machines},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'15},
      publisher = {ACM},
      year = {2015},
      pages = {14}
    }
  18. Memory Monitoring on a multi-tenant OSGi execution environment. Koutheir Attouchi, Gaël Thomas, André Bottaro and Gilles Muller. In Proceedings of the international symposium on Component-Based Software Engineering, CBSE'14, pages 107-116.  2014. . [Abstract] [BibTeX] [.pdf] Smart Home market players aim to deploy component-based and service-oriented applications from untrusted third party providers on a single OSGi execution environment. This creates the risk of resource abuse by buggy and malicious applications, which raises the need for resource monitoring mechanisms. Existing resource monitoring solutions either are too intrusive or fail to identify the relevant resource consumer in numerous multi-tenant situations. This paper proposes a system to monitor the memory consumed by each tenant, while allowing them to continue communicating directly to render services. We propose a solution based on a list of configurable resource accounting rules between tenants, which is far less intrusive than existing OSGi monitoring systems. We modified an experimental Java Virtual Machine in order to provide the memory monitoring features for the multi-tenant OSGi environment. Our evaluation of the memory monitoring mechanism on the DaCapo benchmarks shows an overhead below 46%.
    @inproceedings{cbse:14:attouchi:monitoring,
      author = {Attouchi, Koutheir and Thomas, Gaël and Bottaro, André and Muller, Gilles},
      title = {Memory Monitoring on a multi-tenant OSGi execution environment},
      booktitle = {Proceedings of the international symposium on Component-Based Software Engineering, CBSE'14},
      publisher = {ACM},
      year = {2014},
      pages = {107--116}
    }
  19. Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler. Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14, pages 14.  2014. . [Abstract] [BibTeX] [.pdf] Today, Java is regularly used to implement large multi-threaded server-class applications that use locks to protect access to shared data. However, understanding the impact of locks on the performance of a system is complex, and thus the use of locks can impede the progress of threads on configurations that were not anticipated by the developer, during specific phases of the execution. In this paper, we propose Free Lunch, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock. Free Lunch is designed around a new metric, critical section pressure (CSP), which directly correlates the progress of the threads to each of the locks. Using Free Lunch, we have identified phases of high CSP, which were hidden with other lock profilers, in the distributed Cassandra NoSQL database and in several applications from the DaCapo 9.12, the SPECjvm2008 and the SPECjbb2005 benchmark suites. Our evaluation of Free Lunch shows that its overhead is never greater than 6%, making it suitable for in-vivo use.
    @inproceedings{oopsla:14:david:free-lunch,
      author = {David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler},
      booktitle = {Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14},
      publisher = {ACM},
      year = {2014},
      pages = {14}
    }
  20. A study of the scalability of stop-the-world garbage collectors on multicores. Lokesh Gidra, Gaël Thomas, Julien Sopena and Marc Shapiro. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13, pages 229-240.  2013. . [Abstract] [BibTeX] [.pdf] Large-scale multicore architectures are problematic for garbage collection (GC). In particular, throughput-oriented stop-the-world algorithms demonstrate excellent performance with a small number of cores, but have been shown to degrade badly beyond approximately 20 cores on OpenJDK 7. This negative result raises the question whether the stop-the-world design has intrinsic limitations that would require a radically different approach. Our study suggests that the answer is no, and that there is no compelling scalability reason to discard the existing highly-optimised throughput-oriented GC code on contemporary hardware. This paper studies the default throughput-oriented garbage collector of OpenJDK 7, called Parallel Scavenge. We identify its bottlenecks, and show how to eliminate them using well-established parallel programming techniques. On the SPECjbb2005, SPECjvm2008 and DaCapo 9.12 benchmarks, the improved GC matches the performance of Parallel Scavenge at low core count, but scales well, up to 48 cores.
    @inproceedings{asplos:13:gidra:naps,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc},
      title = {A study of the scalability of stop-the-world garbage collectors on multicores},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13},
      publisher = {ACM},
      year = {2013},
      pages = {229--240}
    }
  21. Hector: Detecting Resource-Release Omission Faults in Error-Handling Code for Systems Software. Suman Saha, Jean-Pierre Lozi, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the international conference on Dependable Systems and Networks, DSN'13, pages 12.  2013. Best paper award. . [Abstract] [BibTeX] [.pdf] Omitting resource-release operations in systems error handling code can lead to memory leaks, crashes, and deadlocks. Finding omission faults is challenging due to the difficulty of reproducing system errors, the diversity of system resources, and the lack of appropriate abstractions in the C language. To address these issues, numerous approaches have been proposed that globally scan a code base for common resource-release operations. Such macroscopic approaches are notorious for their many false positives, while also leaving many faults undetected.

    We propose a novel microscopic approach to finding resource-release omission faults in systems software. Rather than generalizing from the entire source code, our approach focuses on the error-handling code of each function. Using our tool, Hector, we have found over 370 faults in six systems software projects, including Linux, with a 23% false positive rate. Some of these faults allow an unprivileged malicious user to crash the entire system.
    @inproceedings{dsn:13:saha:ehctor,
      author = {Saha, Suman and Lozi, Jean-Pierre and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Hector: Detecting Resource-Release Omission Faults in Error-Handling Code for Systems Software},
      booktitle = {Proceedings of the international conference on Dependable Systems and Networks, DSN'13},
      publisher = {IEEE Computer Society},
      year = {2013},
      pages = {12}
    }
  22. An improvement of OpenMP pipeline parallelism with the BatchQueue algorithm. Thomas Preud'homme, Julien Sopena, Gaël Thomas and Bertil Folliot. In Proceedings of the International Conference on Parallel and Distributed Systems, ICPADS'12, pages 8.  2012. . [Abstract] [BibTeX] [.pdf] In the context of multicore programming, pipeline parallelism is a solution to easily transform a sequential program into a parallel one without requiring a complete rewrite of the code. The OpenMP stream-computing extension presented by Pop and Cohen proposes an extension of OpenMP to handle pipeline parallelism. However, their communication algorithm relies on multiple producer multiple consumer queues, while pipelined applications mostly deal with linear chains of communication, i.e., with only a single producer and a single consumer.

    To improve the communication performance of the OpenMP stream-extension, we propose to use, when possible, a more specialized single producer single consumer communication algorithm called BatchQueue. Our evaluation shows that BatchQueue is then able to improve the throughput by up to 30% for real applications and by up to 200% for an example application that is a fully parallelizable, communication-intensive micro-benchmark. Our study therefore shows that using specialized and efficient communication algorithms can have a significant impact on the overall performance of pipelined applications.
    @inproceedings{icpads:12:preudhomme:batchqueue,
      author = {Preud'homme, Thomas and Sopena, Julien and Thomas, Gaël and Folliot, Bertil},
      title = {An improvement of OpenMP pipeline parallelism with the BatchQueue algorithm},
      booktitle = {Proceedings of the International Conference on Parallel and Distributed Systems, ICPADS'12},
      publisher = {IEEE Computer Society},
      year = {2012},
      pages = {8}
    }
  23. Remote Core Locking: migrating critical-section execution to improve the performance of multithreaded applications. Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. In Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, pages 65-76.  2012. . [Abstract] [BibTeX] [.pdf] The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. In this paper, we propose a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the core acquiring the lock because such data can typically remain in the server core's cache.

    We have developed a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX locks into RCL locks. We have evaluated our approach on 18 applications: Memcached, Berkeley DB, the 9 applications of the SPLASH-2 benchmark suite and the 7 applications of the Phoenix2 benchmark suite. 10 of these applications, including Memcached and Berkeley DB, are unable to scale because of locks, and benefit from RCL. Using RCL locks, we get performance improvements over POSIX locks of up to 2.6 times on Memcached, and up to 14 times on Berkeley DB.
    @inproceedings{usenix-atc:12:lozi:rcl,
      author = {Lozi, Jean-Pierre and David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Remote Core Locking: migrating critical-section execution to improve the performance of multithreaded applications},
      booktitle = {Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12},
      publisher = {USENIX Association},
      year = {2012},
      pages = {65--76}
    }
  24. Faults in Linux: ten years later. Nicolas Palix, Gaël Thomas, Suman Saha, Christophe Calvès, Julia Lawall and Gilles Muller. In Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11, pages 305-318.  2011. . [Abstract] [BibTeX] [.pdf] In 2001, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired a number of development and research efforts on improving the reliability of driver code. Today Linux is used in a much wider range of environments, provides a much wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality? Are drivers still a major problem?

    To answer these questions, we have transported the experiments of Chou et al. to Linux versions 2.6.0 to 2.6.33, released between late 2003 and early 2010. We find that Linux has more than doubled in size during this period, but that the number of faults per line of code has been decreasing. And, even though drivers still accounts for a large part of the kernel code and contains the most faults, its fault rate is now below that of other directories, such as arch (HAL) and fs (file systems). These results can guide further development and research efforts. To enable others to continually update these results as Linux evolves, we define our experimental protocol and make our checkers and results available in a public archive.
    @inproceedings{asplos:11:palix:faults,
      author = {Palix, Nicolas and Thomas, Gaël and Saha, Suman and Calvès, Christophe and Lawall, Julia and Muller, Gilles},
      title = {Faults in Linux: ten years later},
      booktitle = {Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11},
      publisher = {ACM},
      year = {2011},
      pages = {305--318}
    }
  25. Blue banana: resilience to avatar mobility in distributed MMOGs. Sergey Legtchenko, Sébastien Monnet and Gaël Thomas. In Proceedings of the international conference on Dependable Systems and Networks, DSN'10, pages 171-180.  2010. . [Abstract] [BibTeX] [.pdf] Massively Multiplayer Online Games (MMOGs) recently emerged as a popular class of applications with millions of users. To offer an acceptable gaming experience, such applications need to render the virtual world surrounding the player with very low latency. However, current state-of-the-art MMOGs based on peer-to-peer overlays fail to satisfy these requirements. This happens because avatar mobility implies many data exchanges through the overlay. As state-of-the-art overlays do not anticipate this mobility, the needed data is not delivered on time, which leads to transient failures at the application level. To solve this problem, we propose Blue Banana, a mechanism that models and predicts avatar movement, allowing the overlay to adapt itself in advance to the MMOG's needs. Our evaluation is based on large-scale traces derived from Second Life. It shows that our anticipation mechanism decreases the number of transient failures by 20% with a network overhead of only 2%.
    @inproceedings{dsn:10:legtchenko:bluebanana,
      author = {Legtchenko, Sergey and Monnet, Sébastien and Thomas, Gaël},
      title = {Blue banana: resilience to avatar mobility in distributed MMOGs},
      booktitle = {Proceedings of the international conference on Dependable Systems and Networks, DSN'10},
      publisher = {IEEE Computer Society},
      year = {2010},
      pages = {171--180}
    }
  26. BatchQueue: fast and memory-thrifty core to core communication. Thomas Preud'Homme, Julien Sopena, Gaël Thomas and Bertil Folliot. In Proceedings of the international Symposium on Computer Architecture and High Performance Computing, SBAC-PAD'10, pages 215-222.  2010. . [Abstract] [BibTeX] [.pdf] Sequential applications can take advantage of multi-core systems by way of pipeline parallelism to improve their performance. In such parallelism, core to core communication overhead is the main limit on speedup. This paper presents BatchQueue, a fast and memory-thrifty core to core communication system based on batch processing of whole cache lines. BatchQueue is able to send a 32-bit word of data in just 12.5 ns on a Xeon X5472 and only needs 2 full cache lines plus 3 byte-sized variables -- each on a different cache line for optimal performance -- to work. The characteristics of BatchQueue -- high throughput and increased latency resulting from its batch processing -- make it well suited for highly communicative tasks with no real-time requirements, such as monitoring.
    @inproceedings{sbac-pad:10:preudhomme:batchqueue,
      author = {Preud'Homme, Thomas and Sopena, Julien and Thomas, Gaël and Folliot, Bertil},
      title = {BatchQueue: fast and memory-thrifty core to core communication},
      booktitle = {Proceedings of the international Symposium on Computer Architecture and High Performance Computing, SBAC-PAD'10},
      publisher = {IEEE Computer Society},
      year = {2010},
      pages = {215--222}
    }
  27. VMKit: a substrate for managed runtime environments. Nicolas Geoffray, Gaël Thomas, Julia Lawall, Gilles Muller and Bertil Folliot. In Proceedings of the international conference on Virtual Execution Environments, VEE'10, pages 51-62.  2010. . [Abstract] [BibTeX] [.pdf] Managed Runtime Environments (MREs), such as the JVM and the CLI, form an attractive environment for program execution, by providing portability and safety, via the use of a bytecode language and automatic memory management, as well as good performance, via just-in-time (JIT) compilation. Nevertheless, developing a fully featured MRE, including e.g. a garbage collector and JIT compiler, is a herculean task. As a result, new languages cannot easily take advantage of the benefits of MREs, and it is difficult to experiment with extensions of existing MRE based languages.

    This paper describes and evaluates VMKit, a first attempt to build a common substrate that eases the development of high-level MREs. We have successfully used VMKit to build two MREs: a Java Virtual Machine and a Common Language Runtime. We provide an extensive study of the lessons learned in developing this infrastructure, and assess the ease of implementing new MREs or MRE extensions and the resulting performance. In particular, it took one of the authors only one month to develop a Common Language Runtime using VMKit. VMKit furthermore has performance comparable to the well established open source MREs Cacao, Apache Harmony and Mono, and is 1.2 to 3 times slower than JikesRVM on most of the DaCapo benchmarks.
    @inproceedings{vee:10:geoffray:vmkit,
      author = {Geoffray, Nicolas and Thomas, Gaël and Lawall, Julia and Muller, Gilles and Folliot, Bertil},
      title = {VMKit: a substrate for managed runtime environments},
      booktitle = {Proceedings of the international conference on Virtual Execution Environments, VEE'10},
      publisher = {ACM},
      year = {2010},
      pages = {51--62}
    }
  28. I-JVM: a Java virtual machine for component isolation in OSGi. Nicolas Geoffray, Gaël Thomas, Gilles Muller, Pierre Parrend, Stéphane Frénot and Bertil Folliot. In Proceedings of the international conference on Dependable Systems and Networks, DSN'09, pages 544-553.  2009. . [Abstract] [BibTeX] [.pdf] The OSGi framework is a Java-based, centralized, component oriented platform. It is being widely adopted as an execution environment for the development of extensible applications. However, current Java Virtual Machines are unable to isolate components from each other. For instance, a malicious component can freeze the complete platform by allocating too much memory or alter the behavior of other components by modifying shared variables. This paper presents I-JVM, a Java Virtual Machine that provides a lightweight approach to isolation while preserving compatibility with legacy OSGi applications. Our evaluation of I-JVM shows that it solves the 8 known OSGi vulnerabilities that are due to the Java Virtual Machine and that the overhead of I-JVM compared to the JVM on which it is based is below 20%.
    @inproceedings{dsn:09:geoffray:ijvm,
      author = {Geoffray, Nicolas and Thomas, Gaël and Muller, Gilles and Parrend, Pierre and Frénot, Stéphane and Folliot, Bertil},
      title = {I-JVM: a Java virtual machine for component isolation in OSGi},
      booktitle = {Proceedings of the international conference on Dependable Systems and Networks, DSN'09},
      publisher = {IEEE Computer Society},
      year = {2009},
      pages = {544--553}
    }
  29. A lazy developer approach: building a JVM with third party software. Nicolas Geoffray, Gaël Thomas, Charles Clément and Bertil Folliot. In Proceedings of the international symposium on Principles and Practice of Programming in Java, PPPJ'08, pages 73-82.  2008. [Abstract] [BibTeX] [.pdf] The development of a complete Java Virtual Machine (JVM) implementation is a tedious process which involves knowledge in different areas: garbage collection, just in time compilation, interpretation, file parsing, data structures, etc. As a result, developing one's own virtual machine requires a considerable number of man-years. In this paper we show that one can implement a JVM with third party software and with performance comparable to industrial and top open-source JVMs. Our proof-of-concept implementation uses existing versions of a garbage collector, a just in time compiler, and the base library, and is robust enough to execute complex Java applications such as the OSGi Felix implementation and the Tomcat servlet container.
    @inproceedings{pppj:08:geoffray:ladyvm,
      author = {Geoffray, Nicolas and Thomas, Gaël and Clément, Charles and Folliot, Bertil},
      title = {A lazy developer approach: building a JVM with third party software},
      booktitle = {Proceedings of the international symposium on Principles and Practice of Programming in Java, PPPJ'08},
      publisher = {ACM},
      year = {2008},
      pages = {73--82}
    }
  30. A distributed service-oriented mediation tool. Colombe Herault, Gaël Thomas and Philippe Lalanda. In Proceedings of the international Conference on Services Computing, SCC'07, pages 403-409.  2007. . [Abstract] [BibTeX] [.pdf] Integration of heterogeneous information is once again becoming a requirement with the emergence of large-scale distributed applications such as Web-Services based Applications. Enterprise Service Buses (ESB) deal with distribution and communication, but they still do not fix all the mediation issues such as design, deployment and administration of mediators. It turns out however that current solutions are technology-oriented and beyond the scope of most programmers. In this paper, we present an approach that clearly separates the specification of the mediation operations, based on a service component model, from their execution on a distributed ESB. Model and ESB are independent of the targeted middleware used by applications. This work was carried out within the European-funded S4ALL project (Services For All).
    @inproceedings{scc:07:herault:mediation,
      author = {Herault, Colombe and Thomas, Gaël and Lalanda, Philippe},
      title = {A distributed service-oriented mediation tool},
      booktitle = {Proceedings of the international Conference on Services Computing, SCC'07},
      publisher = {IEEE Computer Society},
      year = {2007},
      pages = {403--409}
    }
  31. Transparent and dynamic code offloading for Java Application. Nicolas Geoffray, Gaël Thomas and Bertil Folliot. In Proceedings of the international conference on Distributed Objects and Applications, DOA'06, pages 1790-1806.  2006. [Abstract] [BibTeX] [.pdf] Code offloading is a promising effort for embedded systems and load-balancing. Embedded systems will be able to offload computation to nearby computers and large-scale applications will be able to load-balance computation during high load. This paper presents a runtime infrastructure that transparently distributes computation between interconnected workstations. Application source code is not modified: instead, dynamic aspect weaving within an extended virtual machine makes it possible to monitor and distribute entities dynamically. Runtime policies for distribution can be dynamically adapted depending on the environment. A first evaluation of the system shows that our technique increases the transaction rate of a Web server during high load by 73%.
    @inproceedings{doa:06:geoffray:offloading,
      author = {Geoffray, Nicolas and Thomas, Gaël and Folliot, Bertil},
      title = {Transparent and dynamic code offloading for Java Application},
      booktitle = {Proceedings of the international conference on Distributed Objects and Applications, DOA'06},
      publisher = {LNCS},
      year = {2006},
      pages = {1790--1806}
    }
  32. A generic language for dynamic adaptation. Assia Hachichi, Gaël Thomas, Cyril Martin, Simon Patarin and Bertil Folliot. In Proceedings of the European conference on Parallel processing, EuroPar'05, pages 40-49.  2005. . [Abstract] [BibTeX] [.pdf] Today, component oriented middlewares are used to design, develop and deploy distributed applications easily. They ensure the heterogeneity, interoperability, and reuse of software modules. Several standards address this issue: CCM (CORBA Component Model), EJB (Enterprise Java Beans) and .Net. However, they offer a limited and fixed number of system services, and their deployment and configuration mechanisms cannot be used dynamically by any language or API. As a solution, we present a generic high-level language to adapt system services dynamically in existing middlewares. This solution is based on a highly adaptable platform which enforces adaptive behaviours, and offers a means to specify and adapt system services dynamically. A first prototype was built for the OpenCCM platform, and good performance was obtained.
    @inproceedings{europar:05:hachichi:cvm,
      author = {Hachichi, Assia and Thomas, Gaël and Martin, Cyril and Patarin, Simon and Folliot, Bertil},
      title = {A generic language for dynamic adaptation},
      booktitle = {Proceedings of the European conference on Parallel processing, EuroPar'05},
      publisher = {LNCS},
      year = {2005},
      pages = {40--49}
    }
  33. Support efficient dynamic aspects through reflection and dynamic compilation. Frédéric Ogel, Gaël Thomas and Bertil Folliot. In Proceedings of the Symposium on Applied Computing, SAC'05, pages 1351-1356.  2005. . [Abstract] [BibTeX] [.pdf] As systems grow more and more complex, raising severe evolution and management difficulties, computational reflection and aspect-orientation have proven to enforce separation of concerns principles and thus to address those issues. However, most of the existing solutions rely either on static source code manipulation or on the introduction of extra code (and overhead) to support dynamic adaptation. As those approaches represent the extremes of a spectrum, developers are left with a rigid trade-off between performance and dynamism. A first step toward a solution was the introduction of specialized virtual machines to support dynamic aspects in the core of the execution engine. However, using such dedicated runtimes limits applications' portability and interoperability. In order to reconcile dynamism and performance without introducing portability and interoperability issues, we propose a dynamic reflexive runtime that uses reflection and dynamic compilation to allow application-specific dynamic weaving strategies, without introducing extra overhead compared to static monolithic weavers.
    @inproceedings{sac:05:ogel:efficientaspect,
      author = {Ogel, Frédéric and Thomas, Gaël and Folliot, Bertil},
      title = {Support efficient dynamic aspects through reflection and dynamic compilation},
      booktitle = {Proceedings of the Symposium on Applied Computing, SAC'05},
      publisher = {ACM},
      year = {2005},
      pages = {1351--1356}
    }

International journals

  1. Using differential execution analysis to identify thread interference. Mohamed Said Mosli, François Trahay, Alexis Lescouet, Gauthier Voron, Rémi Dulong, Amina Guermouche, Élisabeth Brunet and Gaël Thomas. IEEE Transactions on Parallel and Distributed Systems (TPDS). Vol. 30(12), pages 13.  2019. . [Abstract] [BibTeX] [.pdf] Understanding the performance of a multi-threaded application is difficult. The threads interfere when they access the same shared resource, which slows down their execution. Unfortunately, current profiling tools report the hardware components or the synchronization primitives that saturate, but they cannot tell if the saturation is the cause of a performance bottleneck. In this paper, we propose a holistic metric able to pinpoint the blocks of code that suffer interference the most, regardless of the interference cause. Our metric uses performance variation as a universal indicator of interference problems. With an evaluation of 27 applications we show that our metric can identify interference problems caused by 6 different kinds of interference in 9 applications. We are able to easily remove 7 of the bottlenecks, which leads to a performance improvement of up to 9 times.
    @article{tpds:19:mosli:ispot,
      author = {Mosli, Mohamed Said and Trahay, François and Lescouet, Alexis and Voron, Gauthier and Dulong, Rémi and Guermouche, Amina and Brunet, Élisabeth and Thomas, Gaël},
      title = {Using differential execution analysis to identify thread interference},
      journal = {IEEE Transactions on Parallel and Distributed Systems (TPDS)},
      publisher = {IEEE Computer Society},
      year = {2019},
      volume = {30},
      number = {12},
      pages = {13}
    }
  2. Fast and Portable Locking for Multicore Architectures. Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall and Gilles Muller. ACM Transactions on Computer Systems (TOCS). Vol. 33(4), pages 13:1-13:62.  2016. . [Abstract] [BibTeX] [.pdf] The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking technique, Remote Core Locking (RCL), that aims to accelerate the execution of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server hardware thread. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock, because such data can typically remain in the server’s cache. Other contributions presented in this article include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX lock acquisitions into RCL locks.

    Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. By using RCL instead of Linux POSIX locks, performance is improved by up to 2.5 times on Memcached, and up to 11.6 times on Berkeley DB with the TPC-C client. On a SPARC machine with two Sun Ultrasparc T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to Solaris POSIX locks on Memcached, and up to 7.9 times on Berkeley DB with the TPC-C client.
    @article{tocs:16:lozi:rcl,
      author = {Lozi, Jean-Pierre and David, Florian and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Fast and Portable Locking for Multicore Architectures},
      journal = {ACM Transactions on Computer Systems (TOCS)},
      publisher = {ACM},
      year = {2016},
      volume = {33},
      number = {4},
      pages = {13:1--13:62}
    }
  3. Faults in Linux 2.6. Nicolas Palix, Gaël Thomas, Suman Saha, Christophe Calvès, Gilles Muller and Julia Lawall. ACM Transactions on Computer Systems (TOCS). Vol. 32(2), pages 4:1-4:40.  2014. . [Abstract] [BibTeX] [.pdf] In August 2011, Linux entered its third decade. Ten years before, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired numerous efforts on improving the reliability of driver code. Today, Linux is used in a wider range of environments, provides a wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality?

    To answer this question, we have transported Chou et al.'s experiments to all versions of Linux 2.6, released between 2003 and 2011. We find that Linux has more than doubled in size during this period, but the number of faults per line of code has been decreasing. Moreover, the fault rate of drivers is now below that of other directories, such as arch. These results can guide further development and research efforts for the decade to come. To allow updating these results as Linux evolves, we define our experimental protocol and make our checkers available.
    @article{tocs:14:palix:faults,
      author = {Palix, Nicolas and Thomas, Gaël and Saha, Suman and Calvès, Christophe and Muller, Gilles and Lawall, Julia},
      title = {Faults in Linux 2.6},
      journal = {ACM Transactions on Computer Systems (TOCS)},
      publisher = {ACM},
      year = {2014},
      volume = {32},
      number = {2},
      pages = {4:1--4:40}
    }
  4. Designing highly flexible virtual machines: the JnJVM experience. Gaël Thomas, Nicolas Geoffray, Charles Clément and Bertil Folliot. Software - Practice & Experience (SP&E). Vol. 38(15), pages 1643-1675.  2008. . [Abstract] [BibTeX] [.pdf] Dynamic flexibility is a major challenge in modern system design to react to evolutions in context or application requirements. Adapting behaviors may impose substantial code modification across the whole system, in the field, without service interruption, and without state loss. This paper presents the JnJVM, a full Java virtual machine (JVM) that satisfies these needs by using dynamic aspect weaving techniques and a component architecture. It supports adding or replacing its own code, while it is running, with no overhead on unmodified code execution. Our measurements reveal similar performance when compared to the monolithic JVM Kaffe. Three illustrative examples show different extension scenarios: (i) modifying the JVM's behavior; (ii) adding capabilities to the JVM; and (iii) modifying applications' behavior.
    @article{spe:08:thomas:jnjvm,
      author = {Thomas, Gaël and Geoffray, Nicolas and Clément, Charles and Folliot, Bertil},
      title = {Designing highly flexible virtual machines: the JnJVM experience},
      journal = {Software - Practice \& Experience (SP\&E)},
      publisher = {John Wiley \& Sons, Ltd.},
      year = {2008},
      volume = {38},
      number = {15},
      pages = {1643--1675}
    }

International workshops and short papers

  1. J-NVM: Off-heap Persistent Objects in Java. Anatole Lefort, Yohan Pipereau, Kwabena Amponsem, Pierre Sutra and Gaël Thomas. In Proceedings of the Non-Volatile Memories Workshop, NVMW'22, pages 2.  2022. [Abstract] [BibTeX] [.pdf] This paper presents J-NVM, a framework to efficiently access Non-Volatile Main Memory (NVMM) in Java. J-NVM offers a fully-fledged interface to persist plain Java objects using failure-atomic blocks. This interface relies internally on proxy objects that intermediate direct off-heap access to NVMM. The framework also provides a library of highly-optimized persistent data types that resist reboots and power failures. We evaluate J-NVM by implementing a persistent backend for the Infinispan data store. Our experimental results, obtained with a TPC-B like benchmark and YCSB, show that J-NVM is consistently faster than other approaches at accessing NVMM in Java.
    @inproceedings{nvmw:22:lafort:jnvm,
      author = {Lefort, Anatole and Pipereau, Yohan and Amponsem, Kwabena and Sutra, Pierre and Thomas, Gaël},
      title = {J-NVM: Off-heap Persistent Objects in Java},
      booktitle = {Proceedings of the Non-Volatile Memories Workshop, NVMW'22},
      year = {2022},
      pages = {2}
    }
  2. Transparent Overlapping of Blocking Communication in MPI Applications. Alexis Lescouet, Élisabeth Brunet, François Trahay and Gaël Thomas. In Proceedings of the international conference on High Performance Computing and Communications, HPCC'20, pages 6.  2020. Short paper. . [Abstract] [BibTeX] [.pdf] With the growing number of cores and fast networks like InfiniBand, one of the keys to performance improvement in MPI applications is the ability to overlap CPU-bound computation with network communications. While this can be done manually, it is often a complex and error-prone procedure. We propose an approach that allows MPI blocking communication to act as nonblocking communication until data are needed, increasing the potential for communication and computation overlapping.

    Our approach, COMMMAMA, uses a separate communication thread to which communications are offloaded and a memory protection mechanism to track memory accesses in communication buffers. This guarantees both progress for these communications and the largest window during which communication and computation can be processed in parallel. This approach also significantly reduces the hassle for programmers to design MPI applications as it reduces the need to forecast when nonblocking communication should be waited for.
    @inproceedings{hpcc:20:lescouet:commmama,
      author = {Lescouet, Alexis and Brunet, Élisabeth and Trahay, François and Thomas, Gaël },
      title = {Transparent Overlapping of Blocking Communication in MPI Applications},
      booktitle = {Proceedings of the international conference on High Performance Computing and Communications, HPCC'20},
      year = {2020},
      pages = {6}
    }
  3. ScalOMP: Analyzing the Scalability of OpenMP Applications. Anton Daumen, Patrick Carribault, François Trahay and Gaël Thomas. In Proceedings of the International Workshop on OpenMP, IWOMP'19, pages 14.  2019. [Abstract] [BibTeX] [.pdf] Achieving good scalability from parallel codes is becoming increasingly difficult due to the hardware becoming more and more complex. Performance tools help developers but their use is sometimes complicated and very iterative. In this paper we propose a simple methodology for assessing the scalability and for detecting performance problems in an OpenMP application. This methodology is implemented in a performance analysis tool named ScalOMP that relies on the capabilities of OMPT for analyzing OpenMP applications. ScalOMP reports the code regions with scalability issues and suggests optimization strategies for those issues. The evaluation shows that ScalOMP incurs low overhead and that its suggestions lead to significant performance improvement of several OpenMP applications.
    @inproceedings{iwomp:19:daumen:scalomp,
      author = { Daumen, Anton and Carribault, Patrick and Trahay, François and Thomas, Gaël },
      title = {ScalOMP: Analyzing the Scalability of OpenMP Applications},
      booktitle = {Proceedings of the International Workshop on OpenMP, IWOMP'19},
      year = {2019},
      pages = {14}
    }
  4. A Performance Study of Java Garbage Collectors on Multicore Architectures. Maria Carpen-Amarie, Patrick Marlier, Pascal Felber and Gaël Thomas. In Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM'15, pages 10.  2015. [Abstract] [BibTeX] [.pdf] In the last few years, managed runtime environments such as the Java Virtual Machine (JVM) have been increasingly used on large-scale multicore servers. The garbage collector (GC) represents a critical component of the JVM and has a significant influence on the overall performance and efficiency of the running application. We perform a study on all available Java GCs, both in an academic environment (set of benchmarks), as well as in a simulated real-life situation (client-server application). We mainly focus on the three most widely used collectors: ParallelOld, ConcurrentMarkSweep and G1. We find that they exhibit different behaviours in the two tested environments. In particular, the default Java GC, ParallelOld, proves to be stable and adequate in the first situation, while in the real-life scenario its use results in unacceptable pauses for the application threads. We believe that this is partly due to the memory requirements of the multicore server. G1 GC performs notably badly on the benchmarks when forced to have a full collection between the iterations of the application. Moreover, even though G1 and ConcurrentMarkSweep GCs introduce significantly lower pauses than ParallelOld in the client-server environment, they can still seriously impact the response time on the client. Pauses of around 3 seconds can make a real-time system unusable and may disrupt the communication between nodes in the case of large-scale distributed systems.
    @inproceedings{pmam:15:carpen-amarie:gcanalysis,
      author = {Carpen-Amarie, Maria and Marlier, Patrick and Felber, Pascal and Thomas, Gaël},
      title = {A Performance Study of Java Garbage Collectors on Multicore Architectures},
      booktitle = {Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM'15},
      publisher = {ACM},
      year = {2015},
      pages = {10}
    }
  5. EZ: towards efficient asynchronous protocol gateway construction. Yérom-David Bromberg, Floréal Morandat, Laurent Réveillère and Gaël Thomas. In Proceedings of the conference on Distributed Applications and Interoperable Systems, DAIS'13, pages 169-174.  2013. Short paper. [Abstract] [BibTeX] [.pdf] Over the past decade, we have witnessed the emergence of a bulk set of devices, from very different application domains, interconnected via the Internet to form what is commonly named the Internet of Things (IoT). The IoT vision is grounded in the belief that all devices are able to interact seamlessly with each other anytime, anyplace, anywhere. However, devices communicate via a multitude of incompatible protocols, which drastically slows down the adoption of the IoT vision. Gateways that are able to translate one protocol to another appear to be a key enabler of the future of IoT but present a cumbersome challenge for many developers. In this paper, we provide a framework called EZ that generates gateways for either the C or the Java platform without requiring from developers any substantial understanding of either the relevant protocols or low-level network programming.
    @inproceedings{dais:13:bromberg:ez,
      author = {Bromberg, Yérom-David and Morandat, Floréal and Réveillère, Laurent and Thomas, Gaël},
      title = {EZ: towards efficient asynchronous protocol gateway construction},
      booktitle = {Proceedings of the conference on Distributed Applications and Interoperable Systems, DAIS'13},
      publisher = {Springer},
      year = {2013},
      pages = {169--174}
    }
  6. Assessing the scalability of garbage collectors on many cores. Lokesh Gidra, Gaël Thomas, Julien Sopena and Marc Shapiro. In Proceedings of the SOSP Workshop on Programming Languages and Operating Systems, PLOS'11, pages 1-5.  2011. Best paper award. [Abstract] [BibTeX] [.pdf] Managed Runtime Environments (MRE) are increasingly used for application servers that use large multi-core hardware. We find that the garbage collector is critical for overall performance in this setting. We explore the costs and scalability of the garbage collectors on a contemporary 48-core multiprocessor machine. We present experimental evaluation of the parallel and concurrent garbage collectors present in OpenJDK, a widely-used Java virtual machine. We show that garbage collection represents a substantial amount of an application's execution time, and does not scale well as the number of cores increases. We attempt to identify some critical scalability bottlenecks for garbage collectors.
    @inproceedings{plos:11:gidra:gc,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc},
      title = {Assessing the scalability of garbage collectors on many cores},
      booktitle = {Proceedings of the SOSP Workshop on Programming Languages and Operating Systems, PLOS'11},
      publisher = {ACM},
      year = {2011},
      pages = {1--5}
    }
  7. How often do experts make mistakes? Nicolas Palix, Julia Lawall, Gaël Thomas and Gilles Muller. In Proceedings of the workshop on Aspects, Components, and Patterns for Infrastructure Software, ACP4IS'10, pages 9-16.  2010. [Abstract] [BibTeX] [.pdf] Large open-source software projects involve developers with a wide variety of backgrounds and expertise. Such software projects furthermore include many internal APIs that developers must understand and use properly. According to the intended purpose of these APIs, they are more or less frequently used, and used by developers with more or less expertise. In this paper, we study the impact of usage patterns and developer expertise on the rate of defects occurring in the use of internal APIs. For this preliminary study, we focus on memory management APIs in the Linux kernel, as the use of these has been shown to be highly error-prone in previous work. We study defect rates and developer expertise, to consider, e.g., whether widely used APIs are more defect prone because they are used by less experienced developers, or whether defects in widely used APIs are more likely to be fixed.
    @inproceedings{acp4is:10:palix:bugs,
      author = {Palix, Nicolas and Lawall, Julia and Thomas, Gaël and Muller, Gilles},
      title = {How often do experts make mistakes?},
      booktitle = {Proceedings of the workshop on Aspects, Components, and Patterns for Infrastructure Software, ACP4IS'10},
      year = {2010},
      pages = {9--16}
    }
  8. Partition participant detector with dynamic paths in mobile networks. Luciana Arantes, Pierre Sens, Gaël Thomas, Denis Conan and Leon Lim. In Proceedings of the international symposium on Network Computing and Applications, NCA'10, pages 224-228.  2010. Short paper. [Abstract] [BibTeX] [.pdf] Mobile ad-hoc networks (MANETs) are self-organized and very dynamic systems where processes have no global knowledge of the system. In this paper, we propose a model that characterizes the dynamics of MANETs in the sense that it considers that paths between nodes are dynamically built and the system can have infinitely many processes but the network may present finite stable partitions. We also propose an algorithm that implements an eventually perfect partition participant detector PD which eventually detects the participant nodes of stable partitions.
    @inproceedings{nca:10:arantes:manet,
      author = {Arantes, Luciana and Sens, Pierre and Thomas, Gaël and Conan, Denis and Lim, Leon},
      title = {Partition participant detector with dynamic paths in mobile networks},
      booktitle = {Proceedings of the international symposium on Network Computing and Applications, NCA'10},
      publisher = {IEEE Computer Society},
      year = {2010},
      pages = {224--228}
    }
  9. Towards a new isolation abstraction for OSGi. Nicolas Geoffray, Gaël Thomas, Charles Clément and Bertil Folliot. In Proceedings of the workshop on Isolation and Integration in Embedded Systems, IIES'08, pages 41-45.  2008. [Abstract] [BibTeX] [.pdf] The OSGi specification defines a dynamic Java-based service-oriented architecture for networked environments such as home service gateways. To provide isolation between different services, it relies on the Java class loading mechanism. While class loaders have many advantages besides isolation, they are poor at protecting the system against malicious or buggy services. In this paper, we propose a new approach for service isolation. It is based on the Java isolate technology, without a task-oriented architecture. Our approach is more tailored to service-oriented architectures and in particular offers a complete isolation abstraction to the OSGi platform.
    @inproceedings{iies:08:geoffray:ijvm,
      author = {Geoffray, Nicolas and Thomas, Gaël and Clément, Charles and Folliot, Bertil},
      title = {Towards a new isolation abstraction for OSGi},
      booktitle = {Proceedings of the workshop on Isolation and Integration in Embedded Systems, IIES'08},
      year = {2008},
      pages = {41--45}
    }
  10. Live and heterogeneous migration of execution environments. Nicolas Geoffray, Gaël Thomas and Bertil Folliot. In Proceedings of the international workshop on Pervasive Systems, PerSys'06, pages 1254-1263.  2006. [Abstract] [BibTeX] [.pdf] Application migration and heterogeneity are inherent issues of pervasive systems. Each implementation of a pervasive system must provide its own migration framework which hides heterogeneity of the different resources. This leads to the development of many frameworks that perform the same functionality. We propose a minimal execution environment, the micro virtual machine, that factorizes process migration implementation and offers heterogeneity, transparency and performance. Systems implemented on top of this micro virtual machine, such as our own Java virtual machine, will therefore automatically inherit process migration capabilities.
    @inproceedings{persys:06:geoffray:migration,
      author = {Geoffray, Nicolas and Thomas, Gaël and Folliot, Bertil},
      title = {Live and heterogeneous migration of execution environments},
      booktitle = {Proceedings of the international workshop on Pervasive Systems, PerSys'06},
      year = {2006},
      pages = {1254--1263}
    }
  11. Mediation and enterprise service bus -- A position paper. Colombe Hérault, Gaël Thomas and Philippe Lalanda. In Proceedings of the international workshop on Mediation in Semantic Web Services, Mediate'05, pages 1-13.  2005. [Abstract] [BibTeX] [.pdf] Enterprise Service Buses (ESB) are becoming standard to allow communication between Web Services. Different techniques and tools have been proposed to implement and to deploy mediators within ESBs. It turns out however that current solutions are very technology-oriented and beyond the scope of most programmers. In this position paper, we present an approach that clearly separates the specification of the mediation operations and their execution on an ESB. This work is made within the European-funded S4ALL project (Services For All).
    @inproceedings{mediate:05:herault:mediation,
      author = {Hérault, Colombe and Thomas, Gaël and Lalanda, Philippe},
      title = {Mediation and enterprise service bus -- A position paper},
      booktitle = {Proceedings of the international workshop on Mediation in Semantic Web Services, Mediate'05},
      year = {2005},
      pages = {1--13}
    }
  12. A step toward ubiquitous computing: an efficient flexible micro-ORB. Frédéric Ogel, Bertil Folliot and Gaël Thomas. In Proceedings of the 2004 ACM SIGOPS European Workshop, pages 176-181.  2004. [Abstract] [BibTeX] [.pdf] Smart devices, such as personal assistants, mobile phones or smart cards, continuously spread and thus challenge every aspect of our lives. However, such environments exhibit specific constraints, such as mobility, high-level of dynamism and most often restricted resources. Traditional middlewares were not designed for such constraints and, because of their monolithic, static and rigid architectures, are not likely to become a fit.

    In response, we propose a flexible micro-ORB, called FlexORB, that supports on demand export of services as well as their dynamic deployment and reconfiguration. FlexORB supports mobile code through an intermediate code representation. It is built on top of Nevermind, a flexible minimal execution environment, which uses a reflexive dynamic compiler as a central common language substrate upon which to achieve interoperability.

    Preliminary performance measurements show that, while being relatively small (120 KB) and dynamically adaptable, FlexORB outperforms traditional middlewares such as RPC, CORBA and Java RMI.
    @inproceedings{sigopsew:04:ogel:micro-orb,
      author = {Ogel, Frédéric and Folliot, Bertil and Thomas, Gaël},
      title = {A step toward ubiquitous computing: an efficient flexible micro-ORB},
      booktitle = {Proceedings of the 2004 ACM SIGOPS European Workshop},
      publisher = {ACM},
      year = {2004},
      pages = {176--181}
    }

Book chapters

  1. Peer-to-Peer storage. Olivier Marin, Sébastien Monnet and Gaël Thomas. In Distributed Systems: Design and Algorithms, pages 59-80.  2011. [Abstract] [BibTeX] [.pdf] Peer-to-peer storage applications are the main current implementations of large-scale distributed software. A peer-to-peer storage application offers five main operations: a lookup operation to find a file, a read operation to read a file, a write operation to modify a file, an add operation to inject a new file and a remove operation to delete a file. However, most current peer-to-peer storage applications are limited to file sharing: they do not implement the write and the remove operations.
    @incollection{book:11:marin:peer-to-peer-storage,
      author = {Marin, Olivier and Monnet, Sébastien and Thomas, Gaël},
      title = {Peer-to-Peer storage},
      booktitle = {Distributed Systems: Design and Algorithms},
      publisher = {John Wiley \& Sons, Ltd.},
      year = {2011},
      pages = {59--80}
    }
  2. Large-Scale peer-to-peer game applications. Sébastien Monnet and Gaël Thomas. In Distributed Systems: Design and Algorithms, pages 81-103.  2011. [Abstract] [BibTeX] [.pdf] Massively multiplayer online games (MMOG) recently emerged as a popular class of applications with up to millions of users, spread over the world, connected through the Internet to play together. Most of these games provide a virtual environment in which players evolve, and interact with each other. When a player moves, moves an object, or performs any operation that has an impact on the virtual environment, players around him can see his actions.
    @incollection{book:11:monnet:large-scale-game,
      author = {Monnet, Sébastien and Thomas, Gaël},
      title = {Large-Scale peer-to-peer game applications},
      booktitle = {Distributed Systems: Design and Algorithms},
      publisher = {John Wiley \& Sons, Ltd.},
      year = {2011},
      pages = {81--103}
    }
  3. Virtualisation logicielle : de la machine réelle à la machine virtuelle abstraite. Bertil Folliot and Gaël Thomas. In Techniques de l'Ingénieur, pages 1-15.  2009. [Abstract] [BibTeX] [.pdf] Hiding heterogeneity is one of the great challenges of modern computing: the number of hardware configurations is colossal, and it is impossible to develop an application for each specific configuration. Software virtualization addresses this problem by making access to hardware uniform, whether to peripherals or to the central processor. Two fields of computing deal with virtualization: operating systems, which hide only the heterogeneity of peripherals, and virtual machines, which hide the heterogeneity of central processors.
    @incollection{book:09:thomas:virtualization,
      author = {Folliot, Bertil and Thomas, Gaël},
      title = {Virtualisation logicielle : de la machine réelle à la machine virtuelle abstraite},
      booktitle = {Techniques de l'Ingénieur},
      publisher = {Hermes},
      year = {2009},
      pages = {1--15}
    }
  4. Applications pair-à-pair de partage de données. Emmanuel Saint-James and Gaël Thomas. In Systèmes répartis en action : de l'embarqué aux systèmes à large échelle, pages 223-256.  2008. [Abstract] [BibTeX] [.pdf] Very-large-scale distributed applications have had their main success in file-sharing applications. A file-sharing application is fundamentally a minimal file system with a single directory, in which data durability is not a goal, meaning that files can be lost. A file-sharing application can therefore be reduced to two functions: a read function and an add (or write) function.
    @incollection{book:08:thomas:p2p,
      author = {Saint-James, Emmanuel and Thomas, Gaël},
      title = {Applications pair-à-pair de partage de données},
      booktitle = {Systèmes répartis en action~: de l'embarqué aux systèmes à large échelle},
      publisher = {Hermes},
      year = {2008},
      pages = {223--256}
    }
  5. Towards active applications: the virtual virtual machine approach. Frédéric Ogel, Gaël Thomas, Ian Piumarta, Antoine Galland, Bertil Folliot and Carine Baillarguet. In New Trends in Computer Science and Engineering, pages 1-21.  2003. [Abstract] [BibTeX] [.pdf] With the wide acceptance of distributed computing a rapidly growing number of application domains are emerging, leading to a growing number of ad hoc solutions that are rigid and poorly interoperable. Our response to this situation is a platform for building flexible and interoperable execution environments called the Virtual Virtual Machine. This article presents our approach, the architecture of the VVM and some of its primary applications.
    @incollection{book:03:ogel:vvm,
      author = {Ogel, Frédéric and Thomas, Gaël and Piumarta, Ian and Galland, Antoine and Folliot, Bertil and Baillarguet, Carine},
      title = {Towards active applications: the virtual virtual machine approach},
      booktitle = {New Trends in Computer Science and Engineering},
      publisher = {A92 Publishing House},
      year = {2003},
      pages = {1--21},
      edition = {POLIROM Press}
    }

French publications

  1. An actor based language for trusted execution environments. Subashiny Tanigassalame and Gaël Thomas. Fast abstract at COMPAS'19.  2019. [Abstract] [BibTeX] A TEE (Trusted Execution Environment) capable processor enforces privacy by ensuring that data deployed by a user in the cloud can never leak from the processor. Technically, a TEE capable processor encrypts the user's data when the data leaves the processor to go to memory, which ensures that neither the administrator nor privileged software like the operating system or hypervisor can access the data. The protected code is executed inside an enclave, which is a region of encrypted memory. Entering and exiting these enclaves is very costly in terms of CPU cycles (around 8000 to 9000 CPU cycles). Today, despite software development kits provided by the main CPU manufacturers, programming a TEE remains difficult, not to mention multi-enclave usage. My PhD thesis concentrates on simplifying the use of TEEs for the C language. An actor based programming model, EActors, is proposed in order to efficiently use SGX (Software Guard Extensions), a TEE provided by Intel, and facilitate multi-enclave programming. Based on EActors, we will propose a user-friendly domain specific language (DSL).

    The new DSL consists of generating the low-level details of the actors from annotated C language code. Code annotation is the key for the developer to indicate data sensitivity. The DSL automatically isolates annotated legacy code into actors, before placing these actors in different enclaves according to privacy aware data partitioning. Dividing legacy code into actors also helps in minimising the trusted code base, therefore reducing the attack surface of the code. We modify the Clang-LLVM compiler to provide a complete abstraction of generated actor models and SGX specific instructions. We use tainting methods to propagate annotations, aiming to detect anomalies jeopardising privacy. Alongside the traditional C code warnings generated by the compiler, new warnings will be generated in case of contradictory annotations, with the aim of avoiding unintended data leakage.
    @misc{fast-compas:19:tanigassalame:dsl,
      author = {Tanigassalame, Subashiny and Thomas, Gaël},
      title = {An actor based language for trusted execution environments},
      year = {2019}
    }
  2. Scalevisor : un pilote CPU et mémoire pour les gros multicœurs. Alexis Lescouet, Nicolas Derumigny and Gaël Thomas. In Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'18, pages 7.  2018. [Abstract] [BibTeX] [.pdf] In recent years, the need for computing power has led to the emergence of new complex architectures that use parallelism to gain performance. However, these machines deliver only mediocre performance if the management of their resources (memory, CPUs) does not take advantage of parallelism. Unfortunately, introducing new memory management heuristics into existing kernels is a complex task that requires modifying many parts of the code. Rather than deeply modifying the kernel, we propose to implement a device driver dedicated to managing these resources and to use virtualization techniques to make this driver transparent to the kernel. This driver will enable the implementation of new heuristics that can be adapted to the specifics of the hardware and the applications.
    @inproceedings{compas:18:lescouet:scalevisor,
      author = {Lescouet, Alexis and Derumigny, Nicolas and Thomas, Gaël},
      title = {Scalevisor : un pilote CPU et mémoire pour les gros multicœurs},
      booktitle = {Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'18},
      year = {2018},
      pages = {7}
    }
  3. Détection automatique d'interférences entre threads. Mohamed Said Mosli Bouksiaa, François Trahay and Gaël Thomas. In Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'16, pages 7.  2016. [Abstract] [BibTeX] [.pdf] Understanding the performance of multi-threaded applications can be difficult because of interference between threads. While some interference is predictable (for example, acquiring a lock), other kinds are more subtle (for example, false sharing) and complex to detect. In this article, we propose a methodology and a metric for detecting interference between threads and quantifying its impact on the overall performance of the application. This methodology consists of studying the variation in the execution time of the code. We applied this methodology to a set of micro-benchmarks and applications. The results show that this methodology does detect interference between the threads of an application.
    @inproceedings{compas:16:mosli:rdam,
      author = {Mosli Bouksiaa, Mohamed Said and Trahay, François and Thomas, Gaël},
      title = {Détection automatique d'interférences entre threads},
      booktitle = {Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'16},
      year = {2016},
      pages = {7}
    }
  4. Détection automatique d'anomalies de performance. Mohamed Said Mosli Bouksiaa, François Trahay and Gaël Thomas. In Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'15, pages 10.  2015. [Abstract] [BibTeX] [.pdf] Debugging large-scale distributed applications or HPC applications is difficult. The task is even more complicated when it comes to performance anomalies. The tools widely used to detect these anomalies do not make it possible to find their causes.

    In this article, we present an approach based on the analysis of execution traces of distributed programs. Our approach detects recurring patterns in execution traces and exploits them to isolate performance anomalies. The anomalies are then used to find their causes. Preliminary results show that our algorithms automatically detect many anomalies and associate them with their causes.
    @inproceedings{compas:15:mosli:perf-analysis,
      author = {Mosli Bouksiaa, Mohamed Said and Trahay, François and Thomas, Gaël},
      title = {Détection automatique d'anomalies de performance},
      booktitle = {Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'15},
      year = {2015},
      pages = {10}
    }
  5. Optimisation mémoire dans une architecture NUMA : comparaison des gains entre natif et virtualisé. Gauthier Voron, Gaël Thomas, Pierre Sens and Vivien Quema. In Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'15, pages 10.  2015. Best paper award. [Abstract] [BibTeX] [.pdf] Running applications on a NUMA architecture requires suitable policies to use the available hardware resources efficiently. Various techniques that allow an operating system to ensure good memory latency on such machines have already been studied. However, in the cloud, these operating systems run in virtual machines under the responsibility of a hypervisor, which is subject to its own constraints. In this article, we examine these constraints and how they affect existing NUMA policies. To do so, we study the effects of a known memory optimization technique in a virtualized system and compare them with those obtained in an operating system.
    @inproceedings{compas:15:voron:xen-numa,
      author = {Voron, Gauthier and Thomas, Gaël and Sens, Pierre and Quema, Vivien},
      title = {Optimisation mémoire dans une architecture NUMA~: comparaison des gains entre natif et virtualisé},
      booktitle = {Proceedings of the Conférence en Parallélisme, Architecture et Système, COMPAS'15},
      year = {2015},
      pages = {10}
    }
  6. BatchQueue : file producteur / consommateur optimisée pour les multi-coeurs. Thomas Preud'Homme, Julien Sopena, Gaël Thomas and Bertil Folliot. In Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'11, pages 1-12.  2011. [Abstract] [BibTeX] [.pdf] Sequential applications can take advantage of multicore systems by using pipeline parallelism to increase their performance. In such a parallelism scheme, the achievable speedup is limited by the overhead of core-to-core communication. This paper presents the BatchQueue algorithm, a fast communication system designed to optimize the use of the hardware cache, particularly with regard to prefetching. BatchQueue improves performance by a factor of 2: it can send a data word in 3.5 nanoseconds on a 64-bit system, which represents a throughput of 2 GiB/s.
    @inproceedings{cfse:11:preudhomme:batchqueue,
      author = {Preud'Homme, Thomas and Sopena, Julien and Thomas, Gaël and Folliot, Bertil},
      title = {BatchQueue : file producteur / consommateur optimisée pour les multi-coeurs},
      booktitle = {Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'11},
      year = {2011},
      pages = {1--12}
    }
  7. I-JVM: une machine virtuelle Java pour l'isolation de composants dans OSGi. Nicolas Geoffray, Gaël Thomas, Gilles Muller, Pierre Parrend, Stéphane Frénot and Bertil Folliot. In Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'09, pages 1-12.  2009. [Abstract] [BibTeX] [.pdf] OSGi is a component-oriented platform implemented in Java that is increasingly used to develop extensible applications. However, existing Java virtual machines are not able to isolate components from one another. For example, a malicious component can block the execution of the platform by allocating too much memory, or alter the behavior of other components by modifying global variables. We present I-JVM, a Java virtual machine that provides lightweight isolation between components while preserving compatibility with existing OSGi applications. I-JVM resolves the 8 known vulnerabilities of the OSGi platform related to the virtual machine, and reduces application performance by only 20% compared with the virtual machine on which it is implemented.
    @inproceedings{cfse:09:geoffray:ijvm,
      author = {Geoffray, Nicolas and Thomas, Gaël and Muller, Gilles and Parrend, Pierre and Frénot, Stéphane and Folliot, Bertil},
      title = {I-JVM: une machine virtuelle Java pour l'isolation de composants dans OSGi},
      booktitle = {Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'09},
      year = {2009},
      pages = {1--12}
    }
  8. Distribution transparente et dynamique de code pour applications Java. Nicolas Geoffray, Gaël Thomas and Bertil Folliot. In Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'06, pages 85-96. 2006. Delegating code execution is a promising mechanism for mobile systems and load balancing. Mobile systems could take advantage of nearby resources by sending them code to execute, and Web servers could spread their load during a load peak. This paper presents an infrastructure able to distribute an existing Java application across several machines transparently. The application source code requires no modification: we use dynamic aspect weaving to analyze (monitor) and distribute the application at run time. The infrastructure consists of an extensible and adaptable execution environment and JnJVM, a flexible Java virtual machine extended with a dynamic aspect weaver. Our system is loaded at run time when the application needs it, without restarting the application. A first evaluation shows that our system increases the number of transactions per second of a Web server by 73%.
    @inproceedings{cfse:06:geoffray:distribution,
      author = {Geoffray, Nicolas and Thomas, Gaël and Folliot, Bertil},
      title = {Distribution transparente et dynamique de code pour applications Java},
      booktitle = {Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'06},
      year = {2006},
      pages = {85--96}
    }
  9. Propagation d'événements entre passerelles OSGi. Didier Donsez and Gaël Thomas. In Proceedings of the 2006 Atelier de travail OSGi, pages 1-5. 2006. The OSGi event communication service provides a standard framework for communication between co-located services. We propose to propagate these events outside the gateway using bridges that require no modification to the senders and receivers or to the event service. Several bridges can then coexist within the same gateway, which hides middleware heterogeneity from the application and lets these middleware systems interoperate. Our infrastructure enables the construction of new applications that rely on message-oriented middleware and need the flexibility provided by OSGi. Three bridge implementations are presented in this article and validate our approach.
    @inproceedings{osgiw:06:donsez:event,
      author = {Donsez, Didier and Thomas, Gaël},
      title = {Propagation d'événements entre passerelles OSGi},
      booktitle = {Proceedings of the 2006 Atelier de travail OSGi},
      year = {2006},
      pages = {1--5}
    }
  10. MVV : une plate-forme à composants dynamiquement reconfigurables -- La machine virtuelle virtuelle. Frédéric Ogel, Gaël Thomas, Antoine Galland and Bertil Folliot. Technique et Science Informatiques (TSI). Vol. 23(10/2004), pages 1269-1299. 2004. The ever-growing number of emerging application domains leads to a growing number of ad hoc, rigid, and poorly interoperable solutions. Our answer to this situation is a platform for building flexible and interoperable applications and execution environments, called the virtual virtual machine. This article presents our approach, the architecture of the platform, and the first applications.
    @article{book:04:ogel:tsi,
      author = {Ogel, Frédéric and Thomas, Gaël and Galland, Antoine and Folliot, Bertil},
      title = {MVV : une plate-forme à composants dynamiquement reconfigurables -- La machine virtuelle virtuelle},
      journal = {Technique et Science Informatiques (TSI)},
      publisher = {Hermes},
      year = {2004},
      volume = {23},
      number = {10/2004},
      pages = {1269--1299}
    }
  11. Reconfigurations dynamiques de services dans un intergiciel à composants CORBA CCM. Assia Hachichi, Cyril Martin, Gaël Thomas, Simon Patarin and Bertil Folliot. In Proceedings of the conférence francophone sur le Déploiement et la (Re)configuration de logiciels, DECOR'04, pages 159-170. 2004. Nowadays, component-based middleware is used to design, develop, and deploy distributed applications easily, and to support heterogeneity, interoperability, reuse of software modules, and the separation between business code encapsulated in components and system code managed by containers. Many standards fit this definition, such as CCM (CORBA Component Model), EJB (Enterprise JavaBeans), and .NET. However, these standards offer a limited and fixed set of system services, which rules out adding system services or dynamically reconfiguring the middleware. Our work proposes mechanisms for dynamically adding and adapting system services, based on a reconfiguration language that can itself be adapted dynamically to the needs of the reconfiguration, and on a dynamic reconfiguration tool. A prototype has been built for the OpenCCM platform, an implementation of the OMG's CCM specification.
    @inproceedings{decor:04:hachichi:reconf,
      author = {Hachichi, Assia and Martin, Cyril and Thomas, Gaël and Patarin, Simon and Folliot, Bertil},
      title = {Reconfigurations dynamiques de services dans un intergiciel à composants CORBA CCM},
      booktitle = {Proceedings of the conférence francophone sur le Déploiement et la (Re)configuration de logiciels, DECOR'04},
      year = {2004},
      pages = {159--170}
    }
  12. Jnjvm : une plateforme Java adaptable pour applications actives. Gaël Thomas, Bertil Folliot and Frédéric Ogel. In Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'03, pages 1-12. 2003. The number of Java virtual machines dedicated to particular application domains keeps growing. Each of these virtual machines modifies or enriches the semantics of Sun's standard virtual machine to implement dedicated mechanisms. These mechanisms do not fundamentally change the structure of that virtual machine (garbage collector, JIT, loader, etc.).

    In this article, we propose an alternative solution: an open Java virtual machine that lets an application specify its execution environment precisely. The functional part of the application remains written in Java, and the non-functional mechanisms make it possible to build, on the fly, a Java virtual machine adapted to the application.
    @inproceedings{cfse:03:thomas:jnjvm,
      author = {Thomas, Gaël and Folliot, Bertil and Ogel, Frédéric},
      title = {Jnjvm : une plateforme Java adaptable pour applications actives},
      booktitle = {Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'03},
      year = {2003},
      pages = {1--12}
    }
  13. Les Documents actifs basés sur une machine virtuelle. Gaël Thomas, Bertil Folliot and Ian Piumarta. In Proceedings of the 2002 Atelier journées des Jeunes Chercheurs en Systèmes, chapitre français de l'ACM-SIGOPS, pages 441-447. 2002. Digital documents and networks improve the quality and dissemination of documents, but this progress brings new problems: multimedia, consistency between replicas, QoS, copyright... These difficulties can be solved by standards and protocols for generic cases, but not for specific cases. In this article, we present active documents, which embed executable code in documents so that each document can choose the solutions suited to its needs.
    @inproceedings{asf:02:thomas:docactif,
      author = {Thomas, Gaël and Folliot, Bertil and Piumarta, Ian},
      title = {Les Documents actifs basés sur une machine virtuelle},
      booktitle = {Proceedings of the 2002 Atelier journées des Jeunes Chercheurs en Systèmes, chapitre français de l'ACM-SIGOPS},
      year = {2002},
      pages = {441--447}
    }
  14. Protocole de membership hautement extensible : conception et expérimentations. Bertil Folliot and Gaël Thomas. In Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'01, pages 25-36. 2001. Membership management (knowing which machines are active and detecting faulty ones) is an essential component of most servers based on clusters of machines. Most existing membership protocols scale poorly with the number of nodes, which limits the scalability of the service. This article presents the design of a membership protocol based on a ring structure with two levels of multicast, scalable to more than 1000 nodes. Experiments on 70 machines simulating clusters of up to 1024 nodes demonstrate the efficiency of this protocol in many cases (cold start, faulty nodes, network partitioning).
    @inproceedings{cfse:01:folliot:membership,
      author = {Folliot, Bertil and Thomas, Gaël},
      title = {Protocole de membership hautement extensible : conception et expérimentations},
      booktitle = {Proceedings of the Conférence Française en Systèmes d'Exploitation, CFSE'01},
      year = {2001},
      pages = {25--36}
    }

PhD and HDR

  1. Improving the design and the performance of managed runtime environments. Gaël Thomas. School: UPMC Sorbonne Université. 2012. With the advent of the Web and the need to protect users against malicious applications, Managed Runtime Environments (MREs), such as Java or .NET virtual machines, have become the norm for executing programs. Over the last few years, my research contributions have targeted three aspects of MREs: their design, their safety, and their performance on multicore hardware. My first contribution is VMKit, a library that eases the development of new efficient MREs by hiding their complexity in a set of reusable components. My second contribution is I-JVM, a Java virtual machine that eliminates the eight known vulnerabilities that a component of the OSGi framework was able to exploit. My third contribution targets the performance of MREs on multicore hardware, focusing on the efficiency of locks and garbage collectors: a new locking mechanism that outperforms all other known locking mechanisms when the number of cores increases, and a study of the bottlenecks incurred by garbage collectors on multicore hardware. My research has been carried out in collaboration with seven PhD students, two of whom have already defended. Building on these contributions, in future work I propose to explore the design of the next generation of MREs, which will have to adapt the application at runtime to the actual multicore hardware on which it is executed.
    @phdthesis{hdr:12:thomas,
      author = {Thomas, Gaël},
      title = {Improving the design and the performance of managed runtime environments},
      school = {UPMC Sorbonne Université},
      year = {2012}
    }
  2. Applications actives : construction dynamique d'environnements d'exécution flexibles homogène. Gaël Thomas. School: Université Pierre et Marie Curie. 2005. The emergence of new computing domains creates new needs in terms of system mechanisms that traditional environments do not cover. Currently, there is no solution for introducing these mechanisms without introducing heterogeneity between execution platforms. To solve this problem, we propose to place the specialized code in the application and to run the application, which becomes active, in a generic and standard environment.

    This architecture relies on a highly adaptable platform developed during this work, the micro virtual machine. It was tested with a reflective and adaptable Java virtual machine called JnJVM. To validate our approach, three specializations of JnJVM were implemented. They build JVMs dedicated to aspect weaving, thread migration, and escape analysis.
    @phdthesis{thesis:05:thomas,
      author = {Thomas, Gaël},
      title = {Applications actives : construction dynamique d'environnements d'exécution flexibles homogène},
      school = {Université Pierre et Marie Curie},
      year = {2005}
    }

Other

  1. Assessing the scalability of garbage collectors on many cores. Lokesh Gidra, Gaël Thomas, Julien Sopena and Marc Shapiro. Best papers from PLOS '11, ACM SIGOPS Operating Systems Review (OSR). Vol. 45(3), pages 15-19. 2011. Managed Runtime Environments (MREs) are increasingly used for application servers that run on large multi-core hardware. We find that the garbage collector is critical for overall performance in this setting. We explore the costs and scalability of the garbage collectors on a contemporary 48-core multiprocessor machine. We present an experimental evaluation of the parallel and concurrent garbage collectors present in OpenJDK, a widely used Java virtual machine. We show that garbage collection represents a substantial share of an application's execution time, and does not scale well as the number of cores increases. We attempt to identify some critical scalability bottlenecks for garbage collectors.
    @article{osr:11:gidra:gc,
      author = {Gidra, Lokesh and Thomas, Gaël and Sopena, Julien and Shapiro, Marc},
      title = {Assessing the scalability of garbage collectors on many cores},
      journal = {Best papers from PLOS~'11, ACM SIGOPS Operating Systems Review (OSR)},
      publisher = {ACM},
      year = {2011},
      volume = {45},
      number = {3},
      pages = {15--19}
    }
  2. Remote Core Locking (RCL): migration of critical section execution to improve performance. Jean-Pierre Lozi, Gaël Thomas, Julia Lawall and Gilles Muller. Poster at the EuroSys European Conference on Computer Systems, EuroSys '11. 2011.
    @misc{poster-eurosys:11:lozi:rcl,
      author = {Lozi, Jean-Pierre and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Remote Core Locking (RCL): migration of critical section execution to improve performance},
      year = {2011}
    }
  3. Remote Core Locking: Migrating critical section execution to improve the performance of multithreaded applications. Jean-Pierre Lozi, Gaël Thomas, Julia Lawall and Gilles Muller. Work in progress at the Symposium on Operating Systems Principles, SOSP '11. 2011.
    @misc{wip-sosp:11:lozi:rcl,
      author = {Lozi, Jean-Pierre and Thomas, Gaël and Lawall, Julia and Muller, Gilles},
      title = {Remote Core Locking: Migrating critical section execution to improve the performance of multithreaded applications},
      year = {2011}
    }
  4. VMKit: a substrate for virtual machines. Nicolas Geoffray, Gaël Thomas, Charles Clément, Bertil Folliot and Gilles Muller. Poster at the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '09. 2009.
    @misc{poster-asplos:09:geoffray:vmkit,
      author = {Geoffray, Nicolas and Thomas, Gaël and Clément, Charles and Folliot, Bertil and Muller, Gilles},
      title = {VMKit: a substrate for virtual machines},
      year = {2009}
    }

Others