PhD Candidacy Exam

State-of-the-Art in AI Hardware Acceleration

V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2295–2329, 2017.
N. P. Jouppi, C. Young, N. Patil, and D. Patterson, “A domain-specific architecture for deep neural networks,” Commun. ACM, vol. 61, no. 9, pp. 50–59, Aug. 2018.
Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, “ShiDianNao: Shifting vision processing closer to the sensor,” in 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), 2015.

Compute-Centric Multitenancy Techniques

Y. Choi and M. Rhu, “PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units,” in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020, pp. 220–233.
S. Ghodrati, B. H. Ahn, J. Kyung Kim, S. Kinzer, B. R. Yatham, N. Alla, H. Sharma, M. Alian, E. Ebrahimi, N. S. Kim, C. Young, and H. Esmaeilzadeh, “Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks,” in Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, pp. 681–697.
H. Kwon, L. Lai, M. Pellauer, T. Krishna, Y.-H. Chen, and V. Chandra, “Heterogeneous Dataflow Accelerators for Multi-DNN Workloads,” in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 71–83.
F. G. Blanco, E. Russo, M. Palesi, D. Patti, G. Ascia, and V. Catania, “A Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems,” in Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC), 2024.
S. Kim, H. Kwon, J. Song, J. Jo, Y.-H. Chen, L. Lai, and V. Chandra, “DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads,” in Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023, pp. 73–86.
H. Fan, S. I. Venieris, A. Kouris, and N. Lane, “Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023, pp. 353–366.
P. Subedi, J. Hao, I. K. Kim, and L. Ramaswamy, “AI Multi-Tenancy on Edge: Concurrent Deep Learning Model Executions and Dynamic Model Placements on Edge Devices,” in 2021 IEEE 14th International Conference on Cloud Computing (CLOUD), 2021, pp. 31–42.
M. Odema, L. Chen, H. Kwon, and M. A. Al Faruque, “SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators,” in Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2024, pp. 565–579.
Q. Wang, W. Fang, L. Qian, Y. Chen, and N. N. Xiong, “An Intelligent Co-Scheduling Framework for Efficient Super-Resolution on Edge Platforms With Heterogeneous Processors,” IEEE Internet of Things Journal, vol. 11, no. 10, pp. 17651–17662, 2024.
J. Choi, Y. Ha, J. Lee, S. Lee, J. Lee, H. Jang, and Y. Kim, “Enabling Fine-Grained Spatial Multitasking on Systolic-Array NPUs Using Dataflow Mirroring,” IEEE Transactions on Computers, vol. 72, no. 12, pp. 3383–3398, 2023.

Memory-Centric Multitenancy Techniques

E. Baek, D. Kwon, and J. Kim, “A multi-neural network acceleration architecture,” in Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020, pp. 940–953.
Y. H. Oh, S. Kim, Y. Jin, S. Son, J. Bae, J. Lee, Y. Park, D. U. Kim, T. J. Ham, and J. W. Lee, “Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling,” in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 584–597.
S.-C. Kao and T. Krishna, “MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores,” in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2022, pp. 814–830.
Z. Liu, J. Leng, Z. Zhang, Q. Chen, C. Li, and M. Guo, “VELTAIR: towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling,” in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022, pp. 388–401.
S. Kim, H. Genc, V. V. Nikiforov, K. Asanović, B. Nikolić, and Y. S. Shao, “MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks,” in 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023, pp. 828–841.
S. Zeng, G. Dai, N. Zhang, X. Yang, H. Zhang, Z. Zhu, H. Yang, and Y. Wang, “Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective,” IEEE Transactions on Computers, vol. 72, no. 5, pp. 1314–1328, 2023.
Q. Liang, W. A. Hanafy, N. Bashir, A. Ali-Eldin, D. Irwin, and P. Shenoy, “Delen: Enabling Flexible and Adaptive Model-serving for Multi-tenant Edge AI,” in Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), 2023, pp. 209–221.

Communication-Aware Multitenancy Techniques

J. S. Jeong, J. Lee, D. Kim, C. Jeon, C. Jeong, Y. Lee, and B.-G. Chun, “Band: coordinated multi-DNN inference on heterogeneous mobile processors,” in Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (MobiSys), 2022, pp. 235–247.
S. Kim, J. Zhao, K. Asanović, B. Nikolić, and Y. S. Shao, “Aurora: Virtualized accelerator orchestration for multi-tenant workloads,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023, pp. 62–76.
J. Davis and M. E. Belviranli, “Context-aware Multi-Model Object Detection for Diversely Heterogeneous Compute Systems,” in 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024, pp. 1–6.
X. Zhang, C. Hao, P. Zhou, A. Jones, and J. Hu, “H2H: heterogeneous model to heterogeneous system mapping with computation and communication awareness,” in Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC), 2022.
J. Zhang, X. Wang, Y. Ye, D. Lyu, G. Xiong, N. Xu, Y. Lian, and G. He, “M2M: A Fine-Grained Mapping Framework to Accelerate Multiple DNNs on a Multi-Chiplet Architecture,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 10, pp. 1864–1877, 2024.
