HYPERSCALE INFRASTRUCTURE
COMS E6998, Dept of Computer Science, Columbia University

LECTURES
A tentative set of papers that we will cover is listed below, though the list may change based on the interests of the class. All students are required to read the papers before they are presented and will be graded based on apparent understanding of the material in the papers and contributions to class discussions on the papers. Students will be asked to explain various aspects of the papers during class as part of the discussions.


September 5 - Course Overview and Background

September 12 - Virtualization
September 19 - Orchestration

  • Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, and John Wilkes, "Large-scale Cluster Management at Google with Borg", Proceedings of the 10th European Conference on Computer Systems (EuroSys), Bordeaux, France, April 2015.

  • Chunqiang Tang, Kenny Yu, Kaushik Veeraraghavan, Jonathan Kaldor, Scott Michelson, Thawan Kooburat, Aravind Anbudurai, Matthew Clark, Kabir Gogia, Long Cheng, Ben Christensen, Alex Gartrell, Maxim Khutornenko, Sachin Kulkarni, Marcin Pawlowski, Tuomas Pelkonen, Andre Rodrigues, Rounak Tibrewal, Vaishnavi Venkatesan, and Peter Zhang, "Twine: A Unified Cluster Management System for Shared Infrastructure", Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Virtual, November 2020.
September 26 - Storage

  • Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel, "Finding a Needle in Haystack: Facebook's Photo Storage", Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Vancouver, BC Canada, October 2010.

  • Brad Calder, Ju Wang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, Jiesheng Wu, Huseyin Simitci, Jaidev Haridas, Chakravarthy Uddaraju, Hemal Khatri, Andrew Edwards, Vaman Bedekar, Shane Mainali, Rafay Abbasi, Arpit Agarwal, Mian Fahim ul Haq, Muhammad Ikram ul Haq, Deepali Bhardwaj, Sowmya Dayanand, Anitha Adusumilli, Marvin McNett, Sriram Sankaran, Kavitha Manivannan, and Leonidas Rigas, "Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency", Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, October 2011.
October 3 - File Systems

  • Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System", Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), Bolton Landing, NY USA, October 2003.

  • Satadru Pan, Theano Stavrinos, Yunqiao Zhang, Atul Sikaria, Pavel Zakharov, Abhinav Sharma, Shiva Shankar P, Mike Shuey, Richard Wareing, Monika Gangapuram, Guanglei Cao, Christian Preseau, Pratap Singh, Kestutis Patiejunas, JR Tipton, Ethan Katz-Bassett, and Wyatt Lloyd, "Facebook's Tectonic Filesystem: Efficiency from Exascale", Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST), Virtual, February 2021.
October 10 - Caching

  • Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani, "Scaling Memcache at Facebook", Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Lombard, IL USA, April 2013.

  • Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, and Venkat Venkataramani, "TAO: Facebook's Distributed Data Store for the Social Graph", Proceedings of the 2013 USENIX Annual Technical Conference (USENIX ATC), San Jose, CA USA, June 2013.
October 17 - Storage Engines
October 24 - Key-Value Stores

  • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels, "Dynamo: Amazon's Highly Available Key-value Store", Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP), Stevenson, WA USA, October 2007.

  • Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, "Bigtable: A Distributed Storage System for Structured Data", Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Seattle, WA USA, November 2006.
October 31 - Midterm Project Presentations
November 7 - Databases

  • James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford, "Spanner: Google's Globally-Distributed Database", Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Hollywood, CA USA, October 2012.

  • Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, and Xiaofeng Bao, "Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases", Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD), Chicago, IL USA, May 2017.
November 14 - Analytics and Logging

  • Yutian "James" Sun, Tim Meehan, Rebecca Schlussel, Wenlei Xie, Masha Basmanova, Orri Erling, Andrii Rosa, Shixuan Fan, Rongrong Zhong, Arun Thirupathi, Nikhil Collooru, Ke Wang, Sameer Agarwal, Arjun Gupta, Dionysios Logothetis, Kostas Xirogiannopoulos, Bin Fan, Amit Dutta, Varun Gajjala, Rohit Jain, Ajay Palakuzhy, Prithvi Pandian, Sergey Pershin, Abhisek Saikia, Pranjal Shankhdhar, Neerad Somanchi, Swapnil Tailor, Jialiang Tan, Sreeni Viswanadha, Zac Wen, Deepak Majeti, Aditi Pandit, and Biswapesh Chattopadhyay, "Presto: A Decade of SQL Analytics at Meta", Proceedings of the 2023 ACM International Conference on Management of Data (SIGMOD), Seattle, WA USA, June 2023.

  • Tuomas Pelkonen, Scott Franklin, Paul Cavallaro, Qi Huang, Justin Meza, Justin Teller, and Kaushik Veeraraghavan, "Gorilla: A Fast, Scalable, In-Memory Time Series Database", Proceedings of the VLDB Endowment (VLDB), Kohala Coast, HI USA, September 2015.
November 21 - Edge Computing

  • Petros Gigis, Matt Calder, Lefteris Manassakis, George Nomikos, Vasileios Kotronis, Xenofontas Dimitropoulos, Ethan Katz-Bassett, and Georgios Smaragdakis, "Seven Years in the Life of Hypergiants' Off-Nets", Proceedings of the ACM SIGCOMM 2021 Conference (SIGCOMM), Virtual, August 2021.
November 28 - No class
December 5 - Final Project Presentations