Reliability of Hardware/Edge ML Accelerators.

Victor Oyadongh, Electrical Engineering, College of Engineering, North Carolina Agricultural and Technical State University

Description

This study evaluates the reliability of hardware accelerators for edge machine learning (ML) applications by examining how computing architectures and model optimizations affect system robustness under fault conditions. It begins with a detailed review of the literature on the vulnerability of current accelerator architectures, namely GPUs, DSPs, and NPUs. It then investigates whether platform-specific vulnerabilities lead to differences in user-visible errors during fault injection experiments on two distinct platforms: a Raspberry Pi 4 augmented with a Coral Edge TPU and a Jetson Xavier using GPU acceleration. Additionally, the study compares the impact of optimization frameworks (TensorFlow Lite versus TensorRT) and assesses the relative resilience of specialized computational units (GPU, DSP, and NPU) in common ML operations such as matrix multiplication and convolution. ML applications are deployed on both platforms and subjected to controlled fault injection, using tools such as TensorFI and BFA, to analyze error propagation and performance degradation. The resulting comparative analysis aims to identify the most reliable hardware accelerator and computing model configuration for edge ML deployments, and to inform the design of robust and resilient edge computing systems.
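
To make the idea of fault injection concrete, the following is a minimal sketch of a single bit-flip injected into one weight of a matrix multiplication, with the resulting output error measured against a fault-free ("golden") run. It is an illustration only and an assumption on our part: it does not use the TensorFI or BFA interfaces, it is not the study's experimental setup, and all function names (flip_bit, matmul, inject_fault, max_abs_error) are hypothetical. It assumes weights are stored as 32-bit floats.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of value's float32 encoding."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inject_fault(weights, row, col, bit):
    """Return a copy of weights with a single bit flipped at (row, col)."""
    faulty = [r[:] for r in weights]
    faulty[row][col] = flip_bit(faulty[row][col], bit)
    return faulty


def max_abs_error(golden, observed):
    """Largest absolute element-wise difference between two matrices."""
    return max(
        abs(g - o)
        for g_row, o_row in zip(golden, observed)
        for g, o in zip(g_row, o_row)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    inputs = [[rng.uniform(0.1, 1.0) for _ in range(4)] for _ in range(2)]
    weights = [[rng.uniform(-0.9, 0.9) for _ in range(4)] for _ in range(4)]
    golden = matmul(inputs, weights)

    # Compare a low mantissa bit, the lowest exponent bit, and the highest exponent bit.
    for bit in (0, 23, 30):
        faulty = matmul(inputs, inject_fault(weights, 1, 2, bit))
        print(f"bit {bit:2d}: max abs error = {max_abs_error(golden, faulty):.3e}")
```

Running the sketch shows why the position of a fault matters: a flip in a low mantissa bit barely changes the output, whereas a flip in a high exponent bit changes the affected weight by many orders of magnitude and corrupts every output that depends on it.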