PhD Candidacy Exam
- Title: Architectural Strategies and Scheduling Algorithms for Enhanced Accelerator Utilization in Multitenant AI Workloads
- Committee: Luca Carloni, Kenneth Shepard, Martha Kim
- Time and date: 1:00pm-3:00pm, Thursday, January 30, 2025
- Location: CSB 453 (CS Conference Room)
State-of-the-Art in AI Hardware Acceleration
- V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2295–2329, 2017.
- N. P. Jouppi, C. Young, N. Patil, and D. Patterson, “A domain-specific architecture for deep neural networks,” Commun. ACM, vol. 61, no. 9, pp. 50–59, Aug. 2018.
- Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, “Shidiannao: Shifting vision processing closer to the sensor,” in 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), 2015.
Compute-Centric Multitenancy Techniques
- Y. Choi and M. Rhu, “PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units,” in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020, pp. 220–233.
- S. Ghodrati, B. H. Ahn, J. Kyung Kim, S. Kinzer, B. R. Yatham, N. Alla, H. Sharma, M. Alian, E. Ebrahimi, N. S. Kim, C. Young, and H. Esmaeilzadeh, “Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks,” in Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, pp. 681–697.
- H. Kwon, L. Lai, M. Pellauer, T. Krishna, Y.-H. Chen, and V. Chandra, “Heterogeneous Dataflow Accelerators for Multi-DNN Workloads,” in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 71–83.
- F. G. Blanco, E. Russo, M. Palesi, D. Patti, G. Ascia, and V. Catania, “A Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems,” in Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC), 2024.
- S. Kim, H. Kwon, J. Song, J. Jo, Y.-H. Chen, L. Lai, and V. Chandra, “DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads,” in Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023, pp. 73–86.
- H. Fan, S. I. Venieris, A. Kouris, and N. Lane, “Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023, pp. 353–366.
- P. Subedi, J. Hao, I. K. Kim, and L. Ramaswamy, “AI Multi-Tenancy on Edge: Concurrent Deep Learning Model Executions and Dynamic Model Placements on Edge Devices,” in 2021 IEEE 14th International Conference on Cloud Computing (CLOUD), 2021, pp. 31–42.
- M. Odema, L. Chen, H. Kwon, and M. A. Al Faruque, “SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators,” in Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2024, pp. 565–579.
- Q. Wang, W. Fang, L. Qian, Y. Chen, and N. N. Xiong, “An Intelligent Co-Scheduling Framework for Efficient Super-Resolution on Edge Platforms With Heterogeneous Processors,” IEEE Internet of Things Journal, vol. 11, no. 10, pp. 17651–17662, 2024.
- J. Choi, Y. Ha, J. Lee, S. Lee, J. Lee, H. Jang, and Y. Kim, “Enabling Fine-Grained Spatial Multitasking on Systolic-Array NPUs Using Dataflow Mirroring,” IEEE Transactions on Computers, vol. 72, no. 12, pp. 3383–3398, 2023.
Memory-Centric Multitenancy Techniques
- E. Baek, D. Kwon, and J. Kim, “A multi-neural network acceleration architecture,” in Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020, pp. 940–953.
- Y. H. Oh, S. Kim, Y. Jin, S. Son, J. Bae, J. Lee, Y. Park, D. U. Kim, T. J. Ham, and J. W. Lee, “Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling,” in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 584–597.
- S.-C. Kao and T. Krishna, “MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores,” in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2022, pp. 814–830.
- Z. Liu, J. Leng, Z. Zhang, Q. Chen, C. Li, and M. Guo, “VELTAIR: towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling,” in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022, pp. 388–401.
- S. Kim, H. Genc, V. V. Nikiforov, K. Asanović, B. Nikolić, and Y. S. Shao, “MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks,” in 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023, pp. 828–841.
- S. Zeng, G. Dai, N. Zhang, X. Yang, H. Zhang, Z. Zhu, H. Yang, and Y. Wang, “Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective,” IEEE Transactions on Computers, vol. 72, no. 5, pp. 1314–1328, 2023.
- Q. Liang, W. A. Hanafy, N. Bashir, A. Ali-Eldin, D. Irwin, and P. Shenoy, “Delen: Enabling Flexible and Adaptive Model-serving for Multi-tenant Edge AI,” in Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation, 2023, pp. 209–221.
Communication-Aware Multitenancy Techniques
- J. S. Jeong, J. Lee, D. Kim, C. Jeon, C. Jeong, Y. Lee, and B.-G. Chun, “Band: coordinated multi-DNN inference on heterogeneous mobile processors,” in Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (MobiSys), 2022, pp. 235–247.
- S. Kim, J. Zhao, K. Asanović, B. Nikolić, and Y. S. Shao, “Aurora: Virtualized accelerator orchestration for multi-tenant workloads,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023, pp. 62–76.
- J. Davis and M. E. Belviranli, “Context-aware Multi-Model Object Detection for Diversely Heterogeneous Compute Systems,” in 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024, pp. 1–6.
- X. Zhang, C. Hao, P. Zhou, A. Jones, and J. Hu, “H2H: heterogeneous model to heterogeneous system mapping with computation and communication awareness,” in Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC), 2022.
- J. Zhang, X. Wang, Y. Ye, D. Lyu, G. Xiong, N. Xu, Y. Lian, and G. He, “M2M: A Fine-Grained Mapping Framework to Accelerate Multiple DNNs on a Multi-Chiplet Architecture,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 10, pp. 1864–1877, 2024.