• Architecture/Technology. After several decades of steady progress in processor performance, energy constraints are forcing computer architectures towards heterogeneous multi-cores, composed of a mix of cores and accelerators. Accelerators can be highly energy-efficient circuits because they are specialized. The obvious downside is that their application scope is limited, so finding accelerators that achieve the best possible tradeoff between energy efficiency and application scope has become a key micro-architecture issue. Unfortunately, another technology constraint is rapidly growing in importance: transient and permanent faults. So we also need these accelerators to be fault-tolerant.
  • Applications. High-performance applications themselves have been largely redefined in the past decade, with many high-performance and embedded applications now based largely on machine-learning algorithms (e.g., see Intel's call for a focus on Recognition, Mining and Synthesis applications).
  • Machine-Learning. Remarkably enough, at the same time, machine-learning algorithms have evolved considerably since 2006 or so, with Deep Learning, i.e., Deep Neural Networks, becoming the state-of-the-art algorithms across a broad set of applications.
  • And more (technology). I keep being amazed at how well old and new technologies that we have been struggling to apply to traditional architectures, such as analog circuits, 3D stacking or memristors, are suited to hardware neural networks.
  • Putting it all together. These remarkably simultaneous trends provide an almost unique opportunity: designing accelerators that can be highly energy efficient because they need to execute only a very limited set of algorithms (ASIC-like), but that would still be useful for a broad set of applications. On top of that, the inherent robustness of neural networks can make such accelerators fault-tolerant.
    I gave a keynote on the topic at ISCA 2010, entitled "The Rebirth of Neural Networks"; I annotated the slides with comments roughly corresponding to what I said, though the content may be a bit outdated by now.

How to proceed? Our "roadmap" consists of progressively improving the application scope, machine-learning accuracy, energy efficiency, performance and fault tolerance of neural network accelerators. Along the way, we create different designs that illustrate our progress. We implement all accelerators at least up to the layout stage and, whenever funding allows, we tape them out.

And so far? We have shown the following:
  • ASIC-like energy efficiency on both a digital CMOS design and an analog design.
  • Tolerance to permanent faults on both GPUs and a custom design.
  • Tolerance to transient faults.
  • About half of the PARSEC benchmarks could benefit from an NN accelerator.
  • A small-footprint, high-throughput accelerator for enabling state-of-the-art machine learning in data centers or embedded systems (see our ASPLOS 2014 article, which just received the best paper award).
  • We are taping out a 3D stacked NN to show that 3D stacking might be a particularly suitable scalability path for neuromorphic architectures (see our CASES 2014 article).
  • And we have quite a few things in the works, so stay tuned...

NN accelerator for signal processing

Neural networks again? While hyped in the 1980s and 1990s, hardware neural networks fell out of favor, largely brought down by three factors: better machine-learning algorithms (e.g., SVMs), a small application scope in the era of scientific computing, and the killer micro. Interestingly, on all three counts, the situation has changed drastically: NN algorithms are state-of-the-art again with Deep NNs, the application landscape has changed as mentioned above, and clock frequency has stalled, killing the killer micro.
Quite a few companies have taken note, as shown below.

Too far-fetched for industry? Think again. There is no commercial product yet, but several major companies have recently started to work on (hardware) neural networks.

Does this have anything to do with neuroscience? Partly, though probably not the way you think. The goal is not to develop hardware tools to run neuroscience models faster. The goal is to run existing machine-learning applications more efficiently on hardware. Still, we monitor the relationship between machine-learning and neuroscience models, and we take from neuroscience models anything that can help us implement machine-learning tasks more efficiently, (cynically) discarding what appears to be functionally useless.
Consider the simple example on the right of a 1-layer spiking neural network learning MNIST digits (each cell is the receptive field of a neuron). This is not news; others have done the same. But we have shed bio-realism and obtained a very dense hardware implementation of this functionality, better than those obtained using machine-learning algorithms, at the expense of only a moderate loss of accuracy. This is where neuroscience comes in for us.
Finally, note that this application/efficiency-driven filtering process might turn out to be useful for neuroscience in the end.

Projects and Collaborations. I am blessed with awesome collaborators, spread across various institutions around the world; I mention just a few below, but there are many others (see the publications list).
  • ICT, Beijing, China: an intensive collaboration with Prof. Chengyong Wu and Prof. Yunji Chen; a few facts:
    • Set up a joint lab on accelerators for emerging high-performance applications; the official signature ceremony was held in Beijing in December 2012 (photos);
    • Got the 1000 Talent award;
    • Got two Senior Visiting Research Scientist Awards from the Chinese Academy of Sciences in 2009 and 2011 for the collaboration;
    • Zheng Li, who defended his PhD in December 2010, has obtained the Chinese government Award for Excellence in Research Outside China, awarded for PhDs conducted outside China; congratulations to Zheng!
  • Intel Collaborative Research Institute on Computational Intelligence: invited to participate in 2013; co-organized the Brain-Inspired Computing (BIC) workshop at ISCA 2013 with Daniel Ben-Dayan Rubin (Intel).
  • Google Faculty Research Award on "Exploring Neuro-Inspired Hardware Accelerators for Mobile and Data Center Processing" in 2013.
  • BenchNN/NNlib: a joint effort with Univ. Wisconsin, CEA (France), ICT (China) to develop a set of benchmarks based on NNs.
  • MHANN (with IMS Bordeaux, CNRS UMphi, Thales TRT): a memristor-based neural network accelerator (hybrid tape-out in 2014).
  • NEMESIS (with CEA LETI, Univ. Bourgogne, Univ. Toulouse): a 3D stacked neural network coupled with a vision sensor (tape-out in 2014).
  • Arch2Neu (with CEA LETI): an accelerator for signal processing using analog neurons (tape-outs in 2012 and 2013); co-organized the workshop on "Neuro-Inspired Accelerators for Computing" (NIAC) at HiPEAC 2013 together with Rodolphe Heliot; Rodolphe also gave a keynote on Arch2Neu at Computing Frontiers (CF) in 2011.