Frequently Asked Questions
Training with Multiple Nodes
RLite implements distributed training based on Ray. All users need to do is configure the Ray cluster and call rlite.init().
Specifically, users first need to SSH into the master node and start the Ray cluster:
# On master node
ray start --head
Then, users need to SSH into other nodes and start Ray workers:
# On other nodes
ray start --address='<master_node_ip>:6379'
After that, users can start training with the same code:
# On master node
python my_script.py