Search results for: Chengcheng Hu
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1

1 SAP-Reduce: Staleness-Aware P-Reduce with Weight Generator

Authors: Lizhi Ma, Chengcheng Hu, Fuxian Wong

Abstract:

Partial reduce (P-Reduce) has achieved state-of-the-art performance for distributed machine learning in heterogeneous environments, outperforming the all-reduce architecture. Dynamic P-Reduce based on the exponential moving average (EMA) approach predicts all intermediate model parameters, which introduces unreliability: the approximation trick can yield incorrect model parameters on every node. This paper proposes SAP-Reduce, a variant of the all-reduce distributed training model with staleness-aware dynamic P-Reduce. Instead of approximating parameters, SAP-Reduce directly applies an EMA-like algorithm to generate normalized aggregation weights. To demonstrate the effectiveness of the algorithm, experiments on several deep learning models compare the single-step training acceleration ratio and convergence time. SAP-Reduce simplifies dynamic P-Reduce and outperforms the intermediate-approximation variant; empirical results show it is 1.3×–2.1× faster than existing baselines.
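To illustrate the staleness-aware weighting idea described in the abstract, the sketch below shows one plausible form of an EMA-like weight generator: each node's contribution is decayed by its staleness (how many steps it lags the freshest model) and the weights are normalized to sum to one. The function name, the decay factor `beta`, and the exact decay rule are assumptions for illustration, not the paper's actual algorithm.

```python
def staleness_weights(staleness, beta=0.9):
    """Generate normalized aggregation weights from per-node staleness.

    Hypothetical sketch of an EMA-like weight generator: a node that is
    s steps stale receives raw weight beta**s, so fresher nodes dominate
    the aggregate; the weights are then normalized to sum to 1.
    """
    raw = [beta ** s for s in staleness]
    total = sum(raw)
    return [w / total for w in raw]

# Example: three nodes, 0, 1, and 3 steps stale.
weights = staleness_weights([0, 1, 3])
```

Under this assumed rule, the freshest node always receives the largest normalized weight, and a fully synchronous cluster (all staleness zero) degenerates to uniform averaging, matching plain all-reduce.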

Keywords: collective communication, decentralized distributed training, machine learning, P-Reduce
