Architecture/Technology. After several decades of steady progression in processor performance, energy constraints are forcing computer architectures towards heterogeneous multi-cores, composed of a mix of cores and accelerators. Accelerators can be highly energy-efficient circuits because they are specialized; the obvious downside is that their application scope is limited. So finding accelerators which realize the best possible tradeoff between energy efficiency and application scope has become a key micro-architecture issue. Unfortunately, another technology constraint is rapidly mounting: transient and permanent faults. So we also need these accelerators to be fault-tolerant.
Applications. But high-performance applications themselves have been largely redefined in the past decade, with many high-performance and embedded applications now based on machine-learning algorithms (e.g., see Intel's call for a focus on Recognition, Mining and Synthesis applications).
Machine-Learning. And remarkably enough, at the same time, machine-learning algorithms have been evolving considerably since 2006 or so, with Deep Learning, i.e., deep neural networks, becoming the state-of-the-art algorithms across a broad set of applications.
And more (technology). I keep being amazed at how well old and new technologies that we have struggled to apply to traditional architectures, such as analog circuits, 3D stacking or memristors, are suited to hardware neural networks.
Putting it all together. These remarkably simultaneous trends provide an almost unique opportunity: designing accelerators which can be highly energy efficient because they need to execute only a very limited set of algorithms (ASIC-like), yet remain useful for a broad set of applications. And on top of that, the inherent robustness of neural networks can make such accelerators fault-tolerant.
I gave a keynote on this topic at ISCA 2010, entitled The Rebirth of Neural Networks; I added comments to the slides roughly corresponding to what I said, though the content may be a bit outdated.
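As a toy illustration of why that inherent robustness arises (my own sketch with invented names, not one of the designs discussed here): in a population of redundant neurons whose outputs are averaged, a permanent fault in one neuron only perturbs the result by that neuron's small share, instead of crashing the computation.

```python
import numpy as np

def redundant_output(x, weights, faulty=()):
    """Output of a toy population of redundant neurons: each neuron computes
    a weighted sum of the input, and the population output is their mean.
    Neurons listed in `faulty` are stuck at zero (a permanent fault)."""
    y = weights @ x                # one pre-activation per neuron
    y[list(faulty)] = 0.0          # knock out the faulty neurons
    return y.mean()

rng = np.random.default_rng(0)
x = rng.random(8)
w = np.tile(rng.random(8), (16, 1))   # 16 redundant copies of one neuron
healthy = redundant_output(x, w)
degraded = redundant_output(x, w, faulty=[3])
# losing 1 neuron out of 16 shifts the output by only its 1/16 share
```

The same graceful-degradation argument scales: with N-fold redundancy, a single stuck-at-zero neuron changes the averaged output by at most 1/N of its contribution, rather than producing an arbitrarily wrong result.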
How to proceed? Our "roadmap" consists of progressively improving the application scope, machine-learning accuracy, energy efficiency, performance and fault tolerance of neural network accelerators. Along the way, we create different designs which illustrate our progress. We implement all accelerators at least down to the layout stage and, whenever funding permits, we tape them out.
And so far? We have shown the following:
- ASIC-like energy efficiency on a digital CMOS and an analog design.
- Tolerance to permanent faults on both GPUs and a custom design.
- Tolerance to transient faults.
- That about half of the PARSEC benchmarks could benefit from an NN-based accelerator.
- A small-footprint, high-throughput accelerator enabling state-of-the-art machine learning in data centers or embedded systems (see our ASPLOS 2014 article, which just received the Best Paper Award);
- We are taping out a 3D-stacked NN to show that 3D stacking may be a particularly suitable scalability path for neuromorphic architectures (see our CASES 2014 article).
- And we have quite a few things in the works, so stay tuned...
(Figure: NN accelerator for signal processing.)
Neural Networks again? While hyped in the 1980s and 1990s, hardware neural networks fell out of favor, largely brought down by three factors: better machine-learning algorithms (e.g., SVMs), a small application scope in the era of scientific computing, and the "killer micro". Interestingly, the situation has drastically changed on all three counts: NN algorithms are state-of-the-art again with deep NNs, the application landscape has shifted as mentioned above, and clock frequency has stalled, killing the killer micro. Quite a few companies have taken note.
Too far-fetched for industry? Think again. There is no commercial product yet, but several major companies have recently started to work on (hardware) neural networks.
Does this have anything to do with neuroscience? Partly, though probably not in the way you think. The goal is not to develop hardware tools to run neuroscience models faster; the goal is to run existing machine-learning applications more efficiently in hardware. Still, we monitor the relationship between machine-learning and neuroscience models, and we borrow from neuroscience models anything that can help us implement machine-learning tasks more efficiently, (cynically) discarding whatever appears to be functionally useless.
Consider the simple example on the right: a 1-layer spiking neural network learning MNIST digits (each cell shows the receptive field of one neuron). This is not news; others have done the same. But we have shed the bio-realism and obtained a very dense hardware implementation of this functionality, denser than those obtained using machine-learning algorithms, at the expense of only a moderate loss of accuracy. This is where neuroscience comes in for us. Finally, note that this application/efficiency-driven filtering process might, in the end, turn out to be useful for neuroscience as well.
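For the curious, here is a minimal sketch of the kind of learning at play (my own toy Python code with invented names, not our hardware design nor the exact MNIST setup above): a 1-layer winner-take-all network with a simplified, STDP-flavored update, in which the firing neuron's weights are pulled toward the current input so that each weight row converges to a receptive field.

```python
import numpy as np

def train_receptive_fields(patterns, n_neurons, lr=0.1, epochs=100, seed=0):
    """Toy 1-layer winner-take-all network with a simplified STDP-like rule:
    on each input, the most activated neuron "fires" and its weights are
    pulled toward the input, so each row of `w` becomes a receptive field."""
    rng = np.random.default_rng(seed)
    w = rng.random((n_neurons, patterns.shape[1]))
    w /= np.linalg.norm(w, axis=1, keepdims=True)   # start on the unit sphere
    for _ in range(epochs):
        for x in patterns:
            winner = np.argmax(w @ x)               # first neuron to spike
            w[winner] += lr * (x - w[winner])       # potentiate toward input
    return w
```

With real MNIST images, plotting each row of the returned matrix as a 28x28 grid would show digit-like receptive fields; with several neurons competing, some homeostasis or lateral inhibition is usually needed to keep a single neuron from winning every input.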
Projects and Collaborations. I am blessed with awesome collaborators, spread across various institutions around the world; I mention just a few below, but there are many others; see the publications.
ICT, Beijing, China: an intensive collaboration with Prof. Chengyong Wu and Prof. Yunji Chen; a few facts:
- Set up a joint lab on accelerators for emerging high-performance applications; the official signature ceremony was held in Beijing in December 2012 (photos);
- Got the 1000 Talent award;
- Got two Senior Visiting Research Scientist Awards from the Chinese Academy of Sciences, in 2009 and 2011, for the collaboration;
- Zheng Li, who defended his PhD in December 2010, obtained the Chinese government Award for Excellence in Research Outside China, attributed to PhDs conducted outside China; congratulations to Zheng!
Intel Collaborative Research Institute on Computational Intelligence: invited to participate in 2013; co-organized the Brain-Inspired Computing (BIC) workshop at ISCA 2013 with Daniel Ben-Dayan Rubin.
Google Faculty Research Award on "Exploring Neuro-Inspired Hardware Accelerators for Mobile and Data Center Processing".
BenchNN/NNlib: a joint effort with Univ. Wisconsin, CEA (France) and ICT (China) to develop a set of benchmarks based on NNs.
MHANN (with IMS Bordeaux, CNRS UMphi, Thales TRT): a memristor-based neural network accelerator (hybrid tape-out in 2014).
NEMESIS (with CEA LETI, Univ. Bourgogne, Univ. Toulouse): a 3D stacked neural network coupled with a vision sensor.
Arch2Neu (with CEA LETI): an accelerator for signal processing using analog neurons (tape-outs in 2012 and 2013); co-organized the workshop on "Neuro-Inspired Accelerators for Computing" (NIAC) at HiPEAC 2013 together with Rodolphe Heliot; Rodolphe also gave a keynote on Arch2Neu at Computing Frontiers (CF) 2011.