Imagine some product team inside Google wants 100,000 CPU cores + RAM + flash + accelerators + disk in a couple of months. We need to decide where to put them, when; whether to deploy new machines, or re-purpose/reconfigure old ones; ensure we have enough power, cooling, networking, physical racks, data centers and (over a longer time-frame) wind power; cope with variances in delivery times from supply logistics hiccups; do multi-year cost-optimal placement+decisions in the face of literally thousands of different machine configurations; keep track of parts; schedule repairs, upgrades, and installations; and generally make all this happen behind the scenes at minimum cost.
And then after breakfast, we get to dynamically allocate resources (on the small-minutes timescale) to the product groups that need them most urgently, accurately reflecting the cost (opex/capex) of all the machines and infrastructure we just deployed, and monitoring and controlling the datacenter power and cooling systems to achieve minimum overheads – even as we replace all of these on the fly.
This talk will highlight some of the exciting problems we’re working on inside Google to ensure we can supply the needs of an organization that is experiencing (literally) exponential growth in computing capacity.
John Wilkes has been at Google since 2008, where he is working on automation for building warehouse scale computers, with a current focus on delivering network capacity. Before this, he worked on cluster management for Google’s compute infrastructure (Borg, Omega, Kubernetes). He is interested in far too many aspects of distributed systems, but a recurring theme has been technologies that allow systems to manage themselves.
He received a PhD in computer science from the University of Cambridge, joined HP Labs in 1982, and was elected an HP Fellow and an ACM Fellow in 2002 for his work on storage system design. Along the way, he’s been program committee chair for SOSP, FAST, EuroSys and HotCloud, and has served on the steering committees for EuroSys, FAST, SoCC and HotCloud. He’s listed as an inventor on 50+ US patents, and has an adjunct faculty appointment at Carnegie Mellon University. In his spare time he continues, stubbornly, trying to learn how to blow glass.
In this talk, we discuss the storage order guarantees in modern IO stack design. Preserving the storage order has long been the holy grail in storage system design. Most applications require that certain blocks are made durable in order. Journaling filesystems require that the journal transactions are made durable in order. Soft-update requires that the metadata blocks are made durable in an order satisfying certain ordering dependencies among the metadata blocks.
Filesystems have been using the highly expensive transfer-and-flush paradigm to ensure the storage order among a set of blocks. One of the most critical issues in modern IO stack design is how to mitigate the overhead of the transfer-and-flush based order preserving mechanism, whose importance cannot be emphasized enough.
Over the past few decades, a fair amount of work has been proposed to mitigate or eliminate the overhead of the storage order guarantee, in both hardware and software design. The work includes using SuperCap based writeback caches for SSDs, using the NO_BARRIER filesystem option, including a checksum in the journal commit block, and more. Recent proposals for a multi-queue block device interface, e.g., NVMe and Zoned Namespaces, are aimed at addressing the transfer-and-flush overhead. We will briefly examine each of these techniques and discuss their pros and cons. Finally, we set forth a few research issues for mitigating the transfer-and-flush overhead in modern ULL (ultra low latency) storage devices.
Prof. Youjip Won is ICT Endowed Chair Professor at the School of Electrical Engineering, KAIST. He is known for his work on Android IO stack optimization and on filesystem and block layer design for SSD and NVRAM. His research interests include operating systems, distributed systems, storage systems, and software support for byte-addressable NVRAM.
13 years ago, seL4 became the first OS kernel with a proof of implementation correctness, followed by proofs extending the correctness to the binary code, as well as proofs of security enforcement. This triggered much research activity on the application of formal methods to systems code, including proofs of safety properties of file systems and communication protocols. Formal methods are now also heavily used in industry.
However, to the best of my knowledge, there still is no non-trivial system that is trustworthy in a strong sense, in that its complete trusted computing base (TCB) – at least the software part of it – is verified and proved to enforce a security policy. In the talk I will give an overview of seL4 and its verification story and look at some existing deployments in critical systems. I will then discuss our current activities for extending trustworthiness to the rest of the TCB. This covers verifiable OS designs, as well as research on reducing the verification effort of TCB components.
Day 1: Monday, June 13
09:00 Welcome and registration
09:40 Opening session
10:00 Keynote #1: Building warehouse-scale computers, John Wilkes, Principal Software Engineer, Google
11:20 Session A: Performance
- FaaS in the Age of (sub-)μs I/O: A Performance Analysis of Snapshotting
- Eliminate the Overhead of Interrupt Checking in Full-System Dynamic Binary Translator
- Highlight: Unikraft: Fast, Specialized Unikernels the Easy Way (EuroSys 2021)
14:00 Session B: Applied Scheduling
- Overflowing Emerging Neural Network Inference Tasks from the GPU to the CPU on Heterogeneous Servers
- Efficient Sharing of Linked DMA Channels on Multi-Sensor Devices by LDMA Task Scheduler
- Highlight: PaSh: Light-touch Data-Parallel Shell Processing (EuroSys 2021)
15:35 Session C: Accelerators
- Bulk JPEG Decoding on In-Memory Processors
- TACC: A Secure Accelerator Enclave for AI Workloads
- Highlight: FlexDriver: A Network Driver for Your Accelerator (ASPLOS 2022)
16:50 Poster session and reception
19:00 End of day
Day 2: Tuesday, June 14
09:00 Welcome and registration
09:30 Keynote #2: Bring an order to the chaos: Preserving the order in the modern IO stack, Youjip Won, Korea Advanced Institute of Science and Technology (KAIST)
10:50 Session D: Storage Internals
- O-AFA: Order Preserving All Flash Array
- Fantastic SSD Internals and How to Learn and Use Them
- Highlight: FragPicker: A New Defragmentation Tool for Modern Storage Devices (SOSP 2021)
- Instant Data Sanitization on Multi-Level-Cell NAND Flash Memory
12:30 Boxed Lunch
13:00 Social event: Tzipori Tour
18:00 End of day
Day 3: Wednesday, June 15
09:00 Welcome and registration
09:30 Keynote #3: Can we make trustworthy systems a reality?, Gernot Heiser, UNSW Sydney
10:50 Session E: Storage “Externals”
- I/O Interface Independence with xNVMe
- Understanding Modern Storage APIs: A systematic study of libaio, SPDK, and io_uring
- Highlight: PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy (OSDI 2020)
12:25 Session F: Data Redundancy
- Dedup-for-Speed: Storing Duplications in Fast Flash Mode for Enhanced Read Performance
- Highlight: The what, The from, and The to: The Migration Games in Deduplicated Systems (FAST 2022)
13:15 Closing remarks
14:30 Meetup welcome and registration
15:00 Meetup: Specialized Systems: Blending Hardware and Software
Meetup co-located with SYSTOR 2022. Talks by Prof. Mark Silberstein (Technion) and Edward Bortnikov, Ph.D (VP Technology for Pliops).
Participation is free.
Registration and more info: here
17:00 End of day
Session A: Performance
Session Chair: Roy Friedman (Technion)
FaaS in the Age of (sub-)μs I/O: A Performance Analysis of Snapshotting
Christos Katsakioris, Chloe Alverti (National Technical University of Athens), Vasileios Karakostas (University of Athens), Konstantinos Nikas, Georgios Goumas, Nectarios Koziris (National Technical University of Athens)
Eliminate the Overhead of Interrupt Checking in Full-System Dynamic Binary Translator
Gen Niu, Fuxin Zhang, Xinyu Li (Institute of Computing Technology, Chinese Academy Of Sciences)
Highlight: Unikraft: Fast, Specialized Unikernels the Easy Way (EuroSys 2021)
Simon Kuenzer (NEC Laboratories Europe GmbH), Vlad-Andrei Bădoiu (University Politehnica of Bucharest), Hugo Lefeuvre (The University of Manchester), Sharan Santhanam (NEC Laboratories Europe GmbH), Alexander Jung (Lancaster University), Gaulthier Gain (University of Liège), Cyril Soldani (University of Liège), Costin Lupu (University Politehnica of Bucharest), Stefan Teodorescu (University Politehnica of Bucharest), Costi Răducanu (University Politehnica of Bucharest), Cristian Banu (University Politehnica of Bucharest), Laurent Mathy (University of Liège), Răzvan Deaconescu (University Politehnica of Bucharest), Costin Raiciu (University Politehnica of Bucharest), Felipe Huici (NEC Laboratories Europe GmbH)
Session B: Applied Scheduling
Session Chair: Orna Agmon Ben-Yehuda (CRI, University of Haifa)
Overflowing Emerging Neural Network Inference Tasks from the GPU to the CPU on Heterogeneous Servers
Adithya Kumar, Anand Sivasubramaniam, Timothy Zhu (The Pennsylvania State University)
Efficient Sharing of Linked DMA Channels on Multi-Sensor Devices by LDMA Task Scheduler
You Ren Shen, Bo Yan Huang, Chang Lin Shih, Pai H. Chou (National Tsing Hua University)
Highlight: PaSh: Light-touch Data-Parallel Shell Processing (EuroSys 2021)
Nikos Vasilakis (MIT), Konstantinos Kallas (University of Pennsylvania), Konstantinos Mamouras (Rice University), Achilles Benetopoulos (UC Santa Cruz), Lazar Cvetković (ETH Zurich)
Session C: Accelerators
Session Chair: Gala Yadgar (Technion)
Bulk JPEG Decoding on In-Memory Processors
Joel Nider, Jackson Dagger, Niloo Gharavi, Daniel Ng, Alexandra (Sasha) Fedorova (University of British Columbia)
TACC: A Secure Accelerator Enclave for AI Workloads
Jianping Zhu, Rui Hou, Dan Meng (Institute of Information Engineering, Chinese Academy of Sciences)
Highlight: FlexDriver: A Network Driver for Your Accelerator (ASPLOS 2022)
Haggai Eran (NVIDIA & Technion), Maxim Fudim (NVIDIA), Gabi Malka (Technion), Gal Shalom (NVIDIA & Technion), Noam Cohen (NVIDIA), Amit Hermony (NVIDIA), Dotan Levi (NVIDIA), Liran Liss (NVIDIA), Mark Silberstein (Technion)
Session D: Storage Internals
Session Chair: Aviad Zuck
O-AFA: Order Preserving All Flash Array
Seung Won Yoo, Joontaek Oh, Youjip Won (Korea Advanced Institute of Science and Technology (KAIST))
Fantastic SSD Internals and How to Learn and Use Them
Nanqinqin Li (University of Chicago and Princeton University); Mingzhe Hao (University of Chicago); Huaicheng Li (University of Chicago and Carnegie Mellon University); Xing Lin, Tim Emami (NetApp); Haryadi S. Gunawi (University of Chicago)
Highlight: FragPicker: A New Defragmentation Tool for Modern Storage Devices (SOSP 2021)
Jonggyu Park (Sungkyunkwan University), Young Ik Eom (Dept. of Electrical and Computer Engineering / College of Computing and Informatics, Sungkyunkwan University)
Instant Data Sanitization on Multi-Level-Cell NAND Flash Memory
Md Raquibuzzaman, Matchima Buddhanoy, Aleksandar Milenkovic, Biswajit Ray (The University of Alabama in Huntsville)
Session E: Storage “Externals”
Session Chair: Youjip Won (Korea Advanced Institute of Science and Technology (KAIST))
I/O Interface Independence with xNVMe
Simon A. F. Lund (Samsung), Philippe Bonnet (IT University of Copenhagen), Klaus Jensen, Javier Gonzalez (Samsung)
Understanding Modern Storage APIs: A systematic study of libaio, SPDK, and io_uring
Diego Didona, Jonas Pfefferle, Nikolas Ioannou, Bernard Metzler (IBM Research Zurich); Animesh Trivedi (VU Amsterdam)
Highlight: PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy (OSDI 2020)
Saurabh Kadekodi, Francisco Maturana, Suhas Jayaram Subramanya, Juncheng Yang, K. V. Rashmi, Gregory R. Ganger (CMU)
Session F: Data Redundancy
Session Chair: Danny Harnik (IBM Research)
Dedup-for-Speed: Storing Duplications in Fast Flash Mode for Enhanced Read Performance
Jaeyong Bae, Jaehyung Park, Yuhun Jun, Euiseong Seo (Sungkyunkwan University)
Highlight: The what, The from, and The to: The Migration Games in Deduplicated Systems (FAST 2022)
Roei Kisous (Technion – Israel Institute of Technology), Ariel Kolikant (Technion – Israel Institute of Technology), Abhinav Duggal (DELL EMC), Sarai Sheinvald (ORT Braude College of Engineering), Gala Yadgar (Technion – Israel Institute of Technology)