Accepted Posters

We are delighted to announce the posters that have been accepted to ACM SYSTOR’25.

Compute-based Fault Tolerance for DNN

Adi Molkho (Huawei Cloud), Amit Golander (Tel-Aviv University) and Oded Schwartz (The Hebrew University)

Deep neural network (DNN) systems use many GPUs, which can fail—making fault tolerance (FT) essential to avoid cluster restarts. Traditional FT relies on frequent checkpointing, incurring high bandwidth and memory costs. We propose an alternative strategy using GPU redundancy, introducing
uniform and heterogeneous encoding approaches. We analyze their costs and recommend usage scenarios, especially for emerging rack-scale AI computers.
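As a loose illustration of redundancy by encoding (not the paper's actual scheme), a spare GPU that computes a linear layer on the sum of two workers' inputs can reconstruct either worker's lost output by linearity, a standard coded-computation trick:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))          # shared layer weights
x1, x2 = rng.standard_normal(4), rng.standard_normal(4)

# Each worker GPU computes its own output; the parity GPU works on x1 + x2.
y1, y2 = W @ x1, W @ x2
y_parity = W @ (x1 + x2)

# If worker 2 fails, its output is recovered without recomputation:
y2_recovered = y_parity - y1
assert np.allclose(y2_recovered, y2)
```

The recovery step follows directly from linearity (W(x1 + x2) = Wx1 + Wx2); nonlinear layers and the uniform-vs-heterogeneous trade-off are where the poster's contribution lies.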

From Alert Fatigue to Trusted Insights: A Simple Non-ML Anomaly Detection Experience

Avi Illouz, Leonid Rise, Michal Lazarovitz, Eli Shemesh and Grisha Weintraub (IBM)

Distinguishing actionable anomalies from noise in monitoring metrics is a critical operational challenge. In our experience, static thresholds often resulted in overwhelming alert fatigue, while pilot projects utilizing ML-based models proved to be uninterpretable and produced unreliable alerts. We implemented an interpretable system based on simple rules co-designed with our support team, focusing on sustained, meaningful deviations from a seasonal baseline. The result was a dramatic shift in operator trust and efficiency, with feedback indicating that “all alerts are good”.
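As a rough sketch of this rule style (names, thresholds, and windowing are illustrative, not the authors' implementation), a "sustained deviation from a seasonal baseline" check could look like:

```python
from collections import deque

def make_sustained_deviation_alert(baseline, tolerance=0.3, window=5):
    """Alert only when a metric stays outside a seasonal baseline band
    for `window` consecutive samples. `baseline` maps a time slot
    (e.g., hour of week) to the expected value for that slot.
    All parameters here are illustrative placeholders."""
    recent = deque(maxlen=window)

    def check(slot, value):
        expected = baseline(slot)
        deviated = abs(value - expected) > tolerance * expected
        recent.append(deviated)
        # Sustained deviation: every sample in the window is out of band.
        return len(recent) == window and all(recent)

    return check

# Flat baseline of 100 requests/s for every slot, for demonstration.
check = make_sustained_deviation_alert(lambda slot: 100.0)
alerts = [check(t, v) for t, v in enumerate([100, 200, 210, 220, 215, 230])]
# A brief excursion never alerts; only a full out-of-band window does.
```

Rules of this shape stay interpretable to operators: each fired alert can be explained as "N consecutive samples more than X% away from the seasonal norm".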

MictlanX: Elastic code-defined object storage system

Ignacio Castillo-Barrios, Josa Compean and Ivan Lopez-Arevalo (CINVESTAV Research)

Modern cloud-native applications demand storage that is not only elastic and scalable but also programmable, with dynamic, policy-driven behavior at runtime. Existing object stores (e.g., S3, Swift) limit programmability to coarse configuration and separate deployment-time provisioning from execution-time logic. We present MictlanX, a code-defined object storage architecture that unifies infrastructure-as-code (IaC) and programmable data flows. Our design introduces (i) a Responsive Deployment Model enabling declarative, policy-driven instantiation and auto-scaling of storage pools, and (ii) an Adaptive Data Placement Model where linked “buckets” and in-transit filters (e.g., encryption, replication, compression) are specified in code and applied dynamically.
In our prototype, on a 16-node cluster, MictlanX’s elastic replication policy sustains over 20 MB/s under bursty loads (IAT = 0.01s) and reduces 90% of queue delays to under 0.1s, outperforming fixed-replica modes by up to 4×. A comparative evaluation against MinIO, Google Drive, and Dropbox shows 23% higher throughput in read-heavy and balanced mixes. Ongoing work includes integration with Kubernetes operators for multi-region deployments and enhanced policy verification.
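To make the "code-defined" idea concrete, here is a hypothetical sketch of a policy driving elastic replication from observed queue delay; the abstract does not show MictlanX's real API, so every name and type below is invented for illustration:

```python
from dataclasses import dataclass, field

# Hypothetical policy objects; MictlanX's actual interfaces may differ.
@dataclass
class Bucket:
    name: str
    filters: list = field(default_factory=list)  # in-transit stages, e.g. ["compress", "encrypt"]
    replicas: int = 1

def autoscale(bucket, queue_delay_s, target_delay_s=0.1, max_replicas=4):
    """Policy-driven elasticity: grow the replica set while observed
    queue delay exceeds the target, shrink it back once load subsides."""
    if queue_delay_s > target_delay_s and bucket.replicas < max_replicas:
        bucket.replicas += 1
    elif queue_delay_s <= target_delay_s and bucket.replicas > 1:
        bucket.replicas -= 1
    return bucket.replicas

b = Bucket("logs", filters=["compress", "encrypt"])
history = [autoscale(b, d) for d in [0.5, 0.4, 0.05, 0.05]]
```

The point of the sketch is that both the in-transit filter chain and the scaling rule live in ordinary code, not in deployment-time configuration.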

Buffer Size Impact on End-to-End DNN Acceleration in Near-Bank DRAM-PIM Systems

Lu Zhao, Yunyu Ling and Simei Yang (South China University of Technology)

Processing-in-Memory (PIM) helps reduce DRAM access bottlenecks in accelerating Deep Neural Networks (DNNs). However, existing near-bank PIM solutions, such as SK Hynix GDDR6-AiM, face two major challenges: (1) limited operation support, i.e., mainly multiply-accumulate (MAC) operations; (2) small buffer capacity, leading to high host-PIM traffic and limited data reuse. To address these challenges, we propose architectural optimizations that enable end-to-end DNN execution within the DRAM-PIM system, using ResNet as a case study. Specifically, we optimize the architecture with two types of processing cores: bank-level PIMcores for convolution and a channel-level GBcore for element-wise operations such as pooling and residual addition. We also introduce a local buffer (LBUF) inside each PIMcore, complementing the existing global buffer (GBUF), to improve data reuse potential. Using a modified Ramulator2 simulator and a ResNet18 workload, we evaluate our proposed architecture under various buffer configurations. Results show that adding a 64B LBUF reduces memory system cycles to 75% of the baseline, while increasing it to 256B provides less than 5% additional improvement. Moreover, combining an 8KB GBUF with a 64B LBUF achieves a 59% cycle reduction. These observations highlight the performance benefits of introducing buffers in near-bank PIM and motivate further study of energy and area trade-offs, especially for DNN training.
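A toy reuse model (illustrative only; the paper's numbers come from a modified Ramulator2 simulation, not from this arithmetic) shows the intuition for why a small LBUF helps and why enlarging it soon stops paying off:

```python
def link_transfers(tile_bytes, uses_per_tile, lbuf_bytes):
    """Toy data-reuse model: with a local buffer (LBUF) large enough to
    hold a tile, the tile crosses the host-PIM link once and is reused
    locally; otherwise it is refetched on every use. Tile sizes and
    reuse counts below are made up for illustration."""
    if lbuf_bytes >= tile_bytes:
        return tile_bytes
    return tile_bytes * uses_per_tile

# A 64B tile reused 8 times: a 64B LBUF cuts link traffic 8x, while
# growing the buffer beyond the tile size buys nothing further.
no_lbuf = link_transfers(64, 8, 0)      # refetch on every use
small   = link_transfers(64, 8, 64)     # fetch once, reuse from LBUF
bigger  = link_transfers(64, 8, 256)    # no additional benefit
```

This saturation effect mirrors the reported result that a 64B LBUF captures most of the gain while 256B adds under 5%.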

Accelerating AI through Novel Compute Architecture and Digital Photonic Gates

Fahim Bellan and Amit Golander (Tel-Aviv University)

The growing computational demands and performance limits of AI models are often constrained by their underlying mathematical processing units. This issue spans various model sizes and hardware—from laptop CPUs to large GPU clusters. Current units, like a CPU’s Floating-Point Unit (FPU), use multi-cycle pipelines. While this design supports complex computations, it also adds latency and slows floating-point performance. Spending multiple cycles on an operation is slow, but preferable to lowering the global clock frequency
to support these intricate units.
This paper proposes a new direction for AI and mathematical compute architecture. We leverage recent advancements in which photonic and electronic gates can coexist on the same die [2], and mix fast photonic FPUs with the cheaper electronic logic used for the rest of the chip.

AURA: Automated Updates and Repository Assistant

Daniel Ivanov, Itamar Alon, Daniel Bar-On and Sarel Cohen (The Academic College of Tel Aviv-Yaffo)

AI development faces challenges from fragmented workflows and varied LLM integrations. This modular GitHub framework offers structured guidance, seamless support for major LLMs, and clear, low-maintenance workflows, enabling sustainable and collaborative innovation.

VisVec: A Milestone Towards Compressing Images by Converting Them to SVG using LLMs

Shchar Levy, Maayan Mashhadi, Sarel Cohen and Ohad Rubin (The Academic College of Tel Aviv-Yaffo)

Generating SVGs from raster images is a challenging compression task due to the semantic and structural mismatch between pixel and vector formats. We introduce VisVec, a large-scale dataset of triplets: raster image, SVG vector, and textual description, designed to improve training for vision-language models (VLMs) in vector generation tasks. The initial dataset comprises 2.5k high-resolution, semantically rich examples. Our dataset aims to address key limitations in models like GPT-4V and Claude 3, which often fail to produce valid SVG outputs due to a lack of structured training data. Our dataset achieves a 10.52 compression ratio when using SVG over PNG. Training LLMs on it could enable high-quality image-to-SVG compression.

SafeStep: A Wearable System for Obstacle Detection

Bar Tzipori, Matanel Ohayon, Esti Bricker, Itai Dabran and Tom Sofer (Technion – Israel Institute of Technology)

Approximately 2.2 billion people around the world live with some degree of visual impairment. For individuals with severe vision loss, maintaining functional independence and safely navigating their environment is a significant challenge. In this paper, we present SafeStep, a novel wearable system that enables early detection of obstacles at waist height and above, thus filling the gap in existing aids and enhancing user safety and independence.

FocusFlow: A Real-Time Engagement System Enabled by Client-Side Inference

Roman Bazinin, Alona Gatker, Jonathan Shaya and Sarel Cohen (The Academic College of Tel Aviv-Yaffo)

Passive video consumption is a primary cause of student disengagement in online education. To address this, we developed FocusFlow, a web platform that transforms video lectures into an interactive experience. FocusFlow uses AI to monitor student engagement in real-time and intervenes with context-aware, AI-generated quizzes when it detects a drop in focus. The effectiveness of such a system hinges on its ability to provide immediate feedback, which poses a significant systems challenge.
We present the FocusFlow system, its features, and the critical architectural decision, offloading inference to the client, that makes its real-time capabilities possible. It uses a viewer’s webcam, with MediaPipe for facial landmark extraction and a custom-trained ONNX model running in the browser to generate a continuous engagement score. For each video, a generative AI creates a set of context-aware questions. If a student’s engagement drops, the system presents one of these pre-generated questions as a real-time intervention to re-engage them.
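The trigger logic described above (engagement score stream, drop detection, pre-generated question) can be sketched as follows; the threshold, patience window, and all names are illustrative placeholders, not FocusFlow's actual values:

```python
def make_engagement_monitor(questions, threshold=0.4, patience=3):
    """Fire a pre-generated quiz question when the engagement score stays
    below `threshold` for `patience` consecutive readings. In the real
    system the scores come from an in-browser ONNX model over MediaPipe
    facial landmarks; here they are plain floats for illustration."""
    state = {"low_streak": 0, "next_q": 0}

    def on_score(score):
        state["low_streak"] = state["low_streak"] + 1 if score < threshold else 0
        if state["low_streak"] >= patience and state["next_q"] < len(questions):
            q = questions[state["next_q"]]
            state["next_q"] += 1
            state["low_streak"] = 0
            return q       # intervene with a context-aware question
        return None        # keep the video playing

    return on_score

monitor = make_engagement_monitor(["Q1: What is backprop?"])
events = [monitor(s) for s in [0.9, 0.3, 0.2, 0.1, 0.8]]
```

Requiring a streak rather than a single low reading keeps momentary glances away from triggering an interruption, which matters when the feedback loop runs entirely client-side.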