Accepted Posters

We are delighted to announce the posters that have been accepted to ACM SYSTOR’22.

Integrity Verification in Cloud Key-Value Stores

Grisha Weintraub, Leonid Rise, Alon Kadosh (IBM Security)

Database-as-a-Service (DBaaS) is a common approach for storing data in the cloud. However, this approach raises concerns about data integrity: since the client does not maintain the data, its completeness and correctness might be compromised by the cloud provider or by a malicious entity that has penetrated the cloud. We introduce a novel method for verifying data integrity in cloud databases, with a focus on key-value stores.
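The abstract does not detail the poster's verification scheme. Purely as background, a minimal illustration of client-side *correctness* checking against an untrusted key-value store is a per-entry MAC computed by the client (all names here are illustrative; detecting omitted results, i.e. completeness, needs more machinery, such as Merkle trees):

```python
import hmac, hashlib

class VerifyingClient:
    """Toy client that detects tampering with values held in an untrusted KV store."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.store = {}  # stands in for the remote cloud KV store

    def _tag(self, key: str, value: bytes) -> bytes:
        # Bind the MAC to both key and value so values cannot be swapped between keys.
        return hmac.new(self.secret, key.encode() + b"\x00" + value, hashlib.sha256).digest()

    def put(self, key: str, value: bytes) -> None:
        self.store[key] = (value, self._tag(key, value))

    def get(self, key: str) -> bytes:
        value, tag = self.store[key]
        if not hmac.compare_digest(tag, self._tag(key, value)):
            raise ValueError("integrity check failed for key %r" % key)
        return value
```

Because the MAC key never leaves the client, neither the provider nor an intruder can forge a valid tag for a modified value.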

Selective scrubbing based on algorithmic randomness

Rahul Vishwakarma, Peter Gatsby, Jinha Hwang (California State University Long Beach), Bing Liu (Dell Technologies)

Disk scrubbing is a background process that fixes read errors by reading the disks (e.g., hard disk drives and solid-state drives). The best practice is to intelligently schedule the scrubbing process for all the disks in the storage array. However, scrubbing every disk can significantly increase the system load and degrade performance when incoming IO is high. Deciding “which disk to scrub”, complemented by “when to scrub”, can significantly improve the overall reliability of the storage system while saving resources and power in the data center. We present a solution, evaluated on an open-source SMART dataset, that performs selective scrubbing and derives a scrub frequency from the scrub cycle. The method leverages an algorithmic randomness framework to quantify the health of the drives in question and ranks them for selective scrubbing.

An End-to-end Framework for Privacy Risk Assessment of AI Models

Abigail Goldsteen, Shlomit Shachor, Natalia Raznikov (IBM Research)

We present a first-of-its-kind end-to-end framework for running privacy risk assessments of AI models. It supports models from multiple ML frameworks and a variety of low-level privacy attacks and metrics. The tool automatically selects which attacks and metrics to run based on the answers to a set of questions, runs the attacks, and summarizes and visualizes the results in an easy-to-consume manner.

Implementing secure, policy-driven data access in a cloud environment

Eliot Salant (IBM Research), Klaas Baert (VRT Innovatie)

The last several years have seen an increased emphasis on the protection of personal digital data. Businesses realize the value their data holds – if they can provide the right set of data, for a specified purpose, to a specified user entity, according to policy. For example, the European Union’s General Data Protection Regulation (GDPR), which went into effect in 2018, strictly defines rules on the access, storage and transfer of personal data.

Based on a use case from the world of video production, we demonstrate how we extended the open-source Fybrik platform (https://fybrik.io/) to create a secure workflow between an authenticated data requester and a backend data store, allowing on-the-fly data redaction based on policy rules and runtime evaluation of requester attributes and context.

pmAddr – a Persistent Memory Centric Computing Architecture

Shai Taharlev, Amit Golander, Yigal Korman (TogaNetworks/Huawei)

Memory-centric architectures provide extremely low latency to a shared memory pool. They should also be elastic, reliable, load-balanced, cheaper than DRAM, thinly provisioned, and support multi-tenancy. We present pmAddr, a persistent-memory-centric solution with the above features that serves random-access reads and writes at 4µs and 10µs, respectively. This is 2-3 orders of magnitude lower latency than modern storage, and proof that PM-centric processing is possible even today, using 2021 off-the-shelf hardware.

Sharp Behavioral Changes in Preemptible Instance Pricing

Danielle Movsowitz Davidow, Orna Agmon Ben-Yehuda, Orr Dunkelman  (University of Haifa)

Alibaba Cloud was the second cloud provider to offer preemptible (spot) instances, yet its price traces have never been analyzed. We analyzed thousands of price traces collected over more than 3 years and found sharp, coordinated behavioral changes in the pricing.

Smart Network Metrics Derivation from Flow Logs

Kalman Meth, Eran Raichstein, Ronen Schaffer (IBM Research), Mario Macias, Joel Takvorian (Red Hat), Katherine Barabash (IBM Research)

Flow-Logs Pipeline (FLP) is an observability tool that consumes raw network flow logs and transforms them from their original format (e.g., NetFlow or IPFIX) into a numeric metrics format. FLP lets users define data transformations that generate condensed metrics encapsulating network domain knowledge.
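FLP itself is configured declaratively, and its actual transformation syntax is not shown in the abstract. Purely as an illustration of the underlying idea, condensing raw flow records into numeric metrics can be sketched as a grouped aggregation (the field names `src`, `dst`, and `bytes` are illustrative, not FLP's schema):

```python
from collections import defaultdict

# Raw flow records, as a pipeline might see them after decoding NetFlow/IPFIX.
flows = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 1200},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 800},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "bytes": 500},
]

def aggregate(flows, group_by=("src",)):
    """Condense raw flows into per-group byte counters (a numeric metric)."""
    metrics = defaultdict(int)
    for flow in flows:
        key = tuple(flow[field] for field in group_by)
        metrics[key] += flow["bytes"]
    return dict(metrics)

# aggregate(flows) → {("10.0.0.1",): 2000, ("10.0.0.3",): 500}
```

The condensed per-group counters are what a metrics backend (e.g., a time-series database) would then ingest, instead of every raw flow record.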

Auctioning Cluster Resources

Lee-or Alon, Sigal Oren (Ben-Gurion University), Orna Agmon Ben-Yehuda (University of Haifa)

Organizational clusters are often shared among users who compete over resources, while the organization’s goal is to maximize the aggregate utility drawn from the cluster. To address this problem, we enhanced SLURM with an auctioning system and evaluated our work on several real cluster traces. Our auctioning scheduler reduces the weight of the queued jobs by 3X–11X compared with backfilling.

WeRLman: To Tackle Whale (Transactions), Go Deep (RL)

Roi Bar-Zur, Ameer Abu-Hanna, Ittay Eyal, Aviv Tamar (Technion)

Blockchain technology is responsible for the emergence of cryptocurrencies, such as Bitcoin and Ethereum. The security of a blockchain protocol relies on the incentives of its participants. Selfish mining is a form of deviation from the protocol where a participant can gain more than her fair share. Previous analyses of selfish mining make simplifying, unrealistic assumptions. We introduce a more realistic model that takes transaction fees into consideration; however, this comes at the cost of an intractable state space. To solve the complex model, we use deep reinforcement learning (deep RL). Our method can then serve to analyze more realistic models or additional blockchain protocols, leading to the design of more secure blockchains in the future.

Evaluating Compressed Indexes in DBMS

Oz Anani (Tel-Aviv University), Gal Lushi (Tel-Aviv University), Moshik Hershcovitch (IBM Research/Tel-Aviv University), Adam Morrison (Tel-Aviv University)

In-memory database management systems (DBMSs) are an essential part of real-world applications.

Memory footprint is a critical resource in such systems, and database indexes consume a large portion of it – up to 50% of total memory consumption.

The B+tree is the most popular default index in these systems, and there has been intensive work on compressed B+tree indexes that lower the memory footprint.

This work evaluates two state-of-the-art compressed B+tree implementations: a Hybrid Index and a Blind Trie implementation called SeqTree. We evaluate these compressed indexes at the system level, integrating them into H-Store and evaluating them with the TPC-C workload.

In our evaluation, SeqTree is the more attractive option, consuming 30% less space than the Hybrid Index at the same throughput. We confirmed the potential of compressed B+tree indexes: they reduce memory consumption by 60%-75% compared with a vanilla B+tree, at the cost of a 10%-15% performance degradation.

Attribute-Based Access Control (ABAC) has started to be more broadly adopted by enterprises around the world, providing dynamic controls over data access that Role-Based Access Control (RBAC) cannot [1]. The challenge is that RBAC is still widely used inside the same enterprises, and ABAC has several limitations that reduce its benefits. This poster looks at a hybrid implementation that overcomes the disadvantages of ABAC while leveraging and integrating with RBAC-managed systems.

System-level Crash Safe Sorting on Persistent Memory

Omri Arad, Yoav Ben Shimon, Ron Zadicario (Tel-Aviv University), Daniel Waddington (IBM Research), Moshik Hershcovitch (IBM Research/Tel-Aviv University), Adam Morrison (Tel-Aviv University)

Sorting is a fundamental operation in software systems – for example, as a preprocessing phase before executing analytics operations.

Persistent memory (PM) is a nonvolatile device with low latency and byte-addressable access. Our experiments used Intel’s first commercially available PM device, Intel Optane DC.

In this work, we evaluate system-level crash-safe sorting trade-offs on PM. We chose the merge-sort algorithm, which is widely used to organize and search for information, and integrated it into MCAS (Memory Centric Active Storage), a high-performance client-server key-value store explicitly designed for PM.

Our evaluation shows that sorting entirely on PM is only 1.58× slower than sorting on DRAM while providing persistence. In addition, the time spent persisting the data is the bottleneck in the crash-safe implementation, which indicates a potential trade-off between performance and persistence.

SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation

Rana Shahout, Roy Friedman (Technion), Ran Ben Basat (University College London)

Stream monitoring is fundamental in many data stream applications, such as financial data trackers, security, anomaly detection, and load balancing. In that respect, quantiles are of particular interest, as they often capture the user’s utility. For example, if a video connection has high tail latency, the perceived quality will suffer, even if the average and median latencies are low. In this work, we consider the problem of approximating the per-item quantiles. Elements in our stream are (ID, latency) tuples, and we wish to track the latency quantiles for each ID. Existing quantile sketches are designed for a single number stream (e.g., containing just the latency). While one could allocate a separate sketch instance for each ID, this may require an infeasible amount of memory. Instead, we consider tracking the quantiles for the heavy hitters (most frequent items), which are often considered particularly important, without knowing them beforehand. We first present a simple sampling algorithm that serves as a benchmark. Then, we design an algorithm that augments a quantile sketch within each entry of a heavy hitter algorithm, resulting in similar space complexity but with a deterministic error guarantee. Finally, we present SQUAD, a method that combines sampling and sketching while improving the asymptotic space complexity. Intuitively, SQUAD uses a background sampling process to capture the behaviour of the latencies of an item before it is allocated with a sketch, thereby allowing us to use fewer samples and sketches. Our solutions are rigorously analyzed, and we demonstrate the superiority of our approach using extensive simulations.
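SQUAD's actual construction and analysis are in the paper; as a rough illustration only (not the authors' algorithm), the "sample first, sketch the heavy items" idea can be sketched as follows, with a plain list standing in for a real quantile sketch and all thresholds hypothetical:

```python
import random
from collections import Counter

class PerItemQuantiles:
    """Toy per-item quantile tracker: sample latencies of unpromoted items,
    and give items that become frequent their own (stand-in) sketch."""

    def __init__(self, promote_at=10, sample_prob=0.2, seed=0):
        self.counts = Counter()
        self.samples = {}          # item -> sampled latencies (pre-promotion)
        self.sketches = {}         # item -> all latencies (stand-in for a real sketch)
        self.promote_at = promote_at
        self.sample_prob = sample_prob
        self.rng = random.Random(seed)

    def add(self, item, latency):
        self.counts[item] += 1
        if item in self.sketches:
            self.sketches[item].append(latency)
        else:
            if self.rng.random() < self.sample_prob:
                self.samples.setdefault(item, []).append(latency)
            if self.counts[item] >= self.promote_at:
                # Promote to a sketch, seeded with the samples gathered so far.
                self.sketches[item] = self.samples.pop(item, [])

    def quantile(self, item, q):
        data = sorted(self.sketches.get(item, self.samples.get(item, [])))
        if not data:
            return None
        return data[min(int(q * len(data)), len(data) - 1)]
```

The point of the two-tier design is memory: only items that prove heavy pay for a sketch, while the long tail of rare items is covered by cheap sampling.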

Model-Based Simulation for SMT Cores

Idan Raz, Maxim Barsky, Ella Sheory, Idan Yaniv, Dan Tsafrir (Technion — Israel Institute of Technology)

Studies that evaluate new architectural designs of virtual memory typically employ a “model-based’’ methodology that relies on simulations of the translation lookaside buffer (TLB) coupled with empirical performance models. We observe that this methodology is limited in that each simulated thread of execution has its own dedicated TLB, whereas modern processors share a single TLB among multiple threads through simultaneous multithreading (SMT). Existing model-based research is thus unable to explore virtual memory designs in SMT context. We address this problem: (1) by showing that the behavior of different multiprogrammed thread combinations varies over time nontrivially, and by introducing a systematic approach for measuring this behavior with bounded error; (2) by developing a TLB simulator capable of realistically combining multiple memory-reference streams of the SMT threads into one; (3) by validating the simulator’s accuracy against real (Intel) processors to ensure the correctness of our approach, which required us to reverse engineer their TLB eviction policy; and (4) by showing how to build empirical models that predict runtimes of different SMT workloads from their combined simulated TLB miss rate. We demonstrate our methodology’s usefulness by evaluating a new TLB partitioning mechanism for SMT processor cores.
