Evaluating Training Acceleration through Selective Workload Skipping: Methods and Benchmarks
Open Access
Article
Conference Proceedings
Authors: Kareem Ibrahim, Milos Nikolic, Nicholas Giamblanco, Ali Hadi Zadeh, Enrique Torres Sanchez, Andreas Moshovos
Abstract: Work-skipping methods accelerate neural network training by selectively skipping work that is deemed not to contribute significantly to learning. The goal of such methods is to reduce training time while incurring negligible reduction in output accuracy. We identify a “blind spot” in the current best-practice methodologies used to evaluate the effectiveness of work-skipping methods: they fail to establish objective ways of determining whether a given time-reduction vs. accuracy-drop trade-off is indeed beneficial. We propose a set of guidelines for evaluating the effectiveness of workload-skipping techniques. Our guidelines emphasize the importance of using wall-clock time, comparing against random-skipping baselines, incorporating early stopping or time-to-accuracy measures, and utilizing Pareto curves. By providing a structured framework, we aim to help practitioners accurately determine the true speed advantages of training acceleration algorithms that involve workload skipping. To illustrate the appropriateness of our guidelines, we study two work-skipping methods: GSkip, which skips entire layers’ gradient computations and weight updates based on their relative changes, and DeadHorse, which selects data samples for backpropagation according to output confidence. We demonstrate how our methodology can establish when these methods are indeed beneficial. We find that, on many occasions, random skipping, early termination, or hyperparameter tuning may be as effective, if not more so.
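The random-skipping baseline the abstract calls for can be made concrete with a small sketch. The snippet below illustrates, under assumed simplifications, a DeadHorse-style selection rule (backpropagate only samples whose output confidence falls below a threshold) alongside a random baseline that skips the same fraction of samples; the threshold value and function names are hypothetical, not taken from the paper.

```python
import random

def confidence_skip(confidences, threshold=0.95):
    """DeadHorse-style sketch: select for backpropagation only the
    samples whose top-class output confidence is below `threshold`.
    The threshold is an assumed illustrative value."""
    return [i for i, c in enumerate(confidences) if c < threshold]

def random_skip(n, keep_fraction, seed=0):
    """Baseline the guidelines recommend: skip the same fraction of
    samples uniformly at random, independent of confidence."""
    rng = random.Random(seed)
    kept = rng.sample(range(n), max(1, round(n * keep_fraction)))
    return sorted(kept)

# Toy batch of per-sample output confidences.
confs = [0.99, 0.42, 0.97, 0.61, 0.88, 0.99, 0.30, 0.96]
kept = confidence_skip(confs)                        # -> [1, 3, 4, 6]
baseline = random_skip(len(confs), len(kept) / len(confs))
print(kept, baseline)
```

A fair evaluation would train with both selections at the same keep rate and compare wall-clock time-to-accuracy, since equal skip rates make the two methods' overheads directly comparable.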
Keywords: Neural Network Training Acceleration, Workload-Skipping, Gradient Skipping, Sample Skipping, Performance Evaluation
DOI: 10.54941/ahfe1005898