Elastic Deep Learning (EDL) is an LF AI Foundation incubation project designed to help deep learning cloud service providers build cluster cloud services using deep learning frameworks such as PaddlePaddle. The project was originally developed and open-sourced by Baidu and is licensed under the Apache 2.0 license. EDL includes a Kubernetes controller, a PaddlePaddle auto-scaler that changes the number of processes of distributed jobs according to the idle hardware resources in the cluster, and a new fault-tolerant architecture.
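To make the auto-scaling idea concrete, here is a minimal Python sketch of how idle resources might be handed out to running jobs. It is an illustration only and not EDL's actual algorithm or API: the `Job` class, the `plan_scaling` function, and the "smallest job first" policy are all assumptions made for this example, and scale-down is left out.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical description of a distributed training job."""
    name: str
    procs: int      # processes currently running
    max_procs: int  # upper bound the job can make use of


def plan_scaling(jobs, idle_slots):
    """Return {job name: new process count} after handing out idle slots.

    Assumed policy: give one slot at a time to the job that currently has
    the fewest processes, never exceeding a job's max_procs.
    """
    plan = {job.name: job.procs for job in jobs}
    limits = {job.name: job.max_procs for job in jobs}
    remaining = idle_slots
    progressed = True
    while remaining > 0 and progressed:
        progressed = False
        # Visit the smallest jobs first in each round.
        for name in sorted(plan, key=lambda n: plan[n]):
            if remaining == 0:
                break
            if plan[name] < limits[name]:
                plan[name] += 1
                remaining -= 1
                progressed = True
    return plan


if __name__ == "__main__":
    jobs = [Job("resnet", procs=1, max_procs=4), Job("bert", procs=2, max_procs=2)]
    print(plan_scaling(jobs, idle_slots=3))
```

In this sketch a job that is already at its limit is skipped, so leftover slots simply stay idle. A real scheduler would also need to shrink jobs when resources are reclaimed.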
Provides parallelism strategies to minimize adjustment overheads.
Accuracy is verified on multiple models against the same models trained without scaling.
Any component can be killed or can join at any time.
Easy to Use
Only a few lines of code need to be added to support EDL.
Please visit us on GitHub, where our development happens. We invite you to join our community both as a user of EDL and as a contributor to its development. We look forward to your contributions!