Q: As a heavy user of Megatron: what is the difference? Is Megatron being deprecated?
A: NVIDIA NeMo is built on top of Megatron. You can use the same model, or other open-source models, within NeMo.

Q: Can NeMo run on a single NVIDIA L40?
A: NeMo can run on L40s too. If you are using cloud APIs, you won't need GPUs. Depending on the use cases and GPUs available, we have a tool that can help you with initial model-size estimates for training/inference. You can host NIMs locally too.

Q: I thought NeMo had no support for pipeline parallelism? At least the documentation still says that.
A: NeMo Framework does support pipeline parallelism: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/parallelisms.html

Q: Could you give more details about how Megatron is used within NeMo? For example, if I have a fix or added functionality in Megatron, how would I implement it in NeMo? I assume Megatron-Core is somewhere in the NeMo codebase.
A: That's right, part of NeMo is built on top of Megatron-Core; the customizer module is an abstraction layer on top of it with some higher-level functionality. Megatron-Core can be used for deeper customization or for building a framework on top of it (Megatron-LM, for example), or you can use NeMo Framework out of the box.

Misc chat: https://github.com/NVIDIA/Megatron-LM?tab=readme-ov-file#training-speed-and-scalability