

Hardware for deep neural network (DNN) inference often suffers from insufficient on-chip memory, thus requiring accesses to separate memory-only chips. Such off-chip memory accesses incur considerable costs in terms of energy and execution time. Fitting entire DNNs in on-chip memory is challenging due, in particular, to the physical size of the technology. Here, we report a DNN inference system, termed Illusion, that consists of networked computing chips, each of which contains a certain minimal amount of local on-chip memory and mechanisms for quick wakeup and shutdown. An eight-chip Illusion hardware system achieves energy and execution times within 3.5% and 2.5%, respectively, of an ideal single chip with no off-chip memory. Illusion is flexible and configurable, achieving near-ideal energy and execution times for a wide variety of DNN types and sizes. Our approach is tailored for on-chip non-volatile memory with resilience to permanent write failures, but is applicable to several memory technologies. Detailed simulations also show that our hardware results could be scaled to 64-chip Illusion systems.
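
To make the partitioning idea concrete, the following is a minimal, hypothetical Python sketch and not code or data from the work itself: the per-layer weight sizes, per-chip memory capacity and energy constants are all assumed for illustration. It shows how a network's weights could be packed onto several chips so that each chip's share fits in local on-chip memory, with idle chips powered down, and compares a rough energy estimate against a single chip that must spill to off-chip memory.

```python
# Hypothetical sketch of the Illusion idea: partition a DNN's weights across
# several chips so each chip's share fits in local on-chip memory, and power
# chips down when their layers are idle. All constants are illustrative,
# not values reported in the paper.

from typing import List

ON_CHIP_CAPACITY_MB = 4.0        # assumed per-chip on-chip memory
E_ON_CHIP_PJ_PER_BYTE = 1.0      # assumed on-chip access energy
E_OFF_CHIP_PJ_PER_BYTE = 100.0   # assumed off-chip (DRAM) access energy
E_WAKEUP_SHUTDOWN_UJ = 0.5       # assumed per-chip wakeup/shutdown cost

def partition_layers(layer_weight_mb: List[float], capacity_mb: float) -> List[List[int]]:
    """Greedily pack consecutive layers onto chips without exceeding capacity."""
    chips, current, used = [], [], 0.0
    for i, size in enumerate(layer_weight_mb):
        if used + size > capacity_mb and current:
            chips.append(current)
            current, used = [], 0.0
        current.append(i)
        used += size
    if current:
        chips.append(current)
    return chips

def illusion_energy_uj(layer_weight_mb: List[float], chips: List[List[int]]) -> float:
    """Weights are read from local on-chip memory; idle chips stay powered off."""
    bytes_total = sum(layer_weight_mb) * 1e6
    access_uj = bytes_total * E_ON_CHIP_PJ_PER_BYTE * 1e-6  # pJ -> uJ
    overhead_uj = len(chips) * E_WAKEUP_SHUTDOWN_UJ
    return access_uj + overhead_uj

def single_chip_offchip_energy_uj(layer_weight_mb: List[float]) -> float:
    """Baseline: one chip whose weights spill to separate off-chip memory."""
    bytes_total = sum(layer_weight_mb) * 1e6
    return bytes_total * E_OFF_CHIP_PJ_PER_BYTE * 1e-6

layers = [1.5, 2.0, 3.0, 3.5, 2.5, 1.0, 4.0, 2.0]  # hypothetical per-layer weights (MB)
chips = partition_layers(layers, ON_CHIP_CAPACITY_MB)
print(f"{len(chips)} chips:", chips)
print(f"Illusion-style energy estimate: {illusion_energy_uj(layers, chips):.1f} uJ")
print(f"Off-chip baseline estimate:     {single_chip_offchip_energy_uj(layers):.1f} uJ")
```

Under these assumed constants the multi-chip estimate stays close to the pure on-chip access energy, because the only added cost is the per-chip wakeup/shutdown overhead; the real hardware accounts for much more (inter-chip communication, scheduling and write-failure resilience) than this sketch captures.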
