NVIDIA Solutions Architect Intern - CSP&CRISP - 2025

At NVIDIA, we're pioneering transformative AI technologies that reshape industries globally. Join our dynamic, forward-thinking team working directly at the cutting edge of Large Language Models (LLMs). This internship offers a unique opportunity to contribute to accelerated customer deployments, where your engineering skills directly impact leading enterprises. You'll collaborate closely with senior engineers and hardware experts to push the boundaries of efficiency and capability in LLM training and inference. Gain unparalleled experience applying your knowledge to real-world challenges on NVIDIA's industry-defining Tensor Core GPUs and full-stack AI platforms. Stand shoulder-to-shoulder with world-class researchers and engineers solving the next generation of AI scale and speed.

What you’ll be doing

  • Drive LLM efficiency: Design and apply advanced low-precision quantization techniques (INT8, FP8, FP4) to optimize inference performance for customer deployments.

  • Innovate with frameworks: Simulate, optimize, and extend cutting-edge training & inference frameworks (e.g., vLLM, SGLang, TensorRT-LLM, NeMo, Megatron) within NVIDIA's ecosystem.

  • Enable new AI capabilities: Integrate and validate next-generation LLM architectures and features within core frameworks to expand NVIDIA's solution offerings.

  • Tune for peak performance: Conduct rigorous performance analysis and tuning of LLM workloads for optimal execution on cloud and on-premises NVIDIA platforms.

  • Collaborate on customer solutions: Partner with engineering teams and solution architects to translate customer requirements into high-impact LLM engineering implementations.

What we need to see

  • Pursuing an MS or PhD in Computer Science, Artificial Intelligence, Electrical Engineering, or a related field.

  • Hands-on experience with large language model (LLM) training and/or inference frameworks from project work, research, or prior internships.

  • Strong proficiency in PyTorch and Python programming.

  • Solid foundational understanding of:

    • Transformer architectures & core LLM algorithms.

    • Principles and trade-offs of model quantization techniques.

    • Distributed training paradigms (e.g., FSDP, ZeRO, 3D/5D parallelism, RLHF infrastructure).

  • A link to your GitHub profile or code samples demonstrating relevant projects is required with your application.

Ways to stand out from the crowd

  • Demonstrable experience with quantization tools and workflows (e.g., GPTQ, AWQ, SmoothQuant).

  • Contributions to relevant open-source software projects (e.g., vLLM, SGLang, Hugging Face Transformers, PyTorch, DeepSpeed).

  • Understanding of GPU architecture and CUDA programming, high-performance computing concepts, and collective communication libraries (e.g., NCCL, MPI).

  • Record of published research in machine learning, NLP, or systems at major conferences/journals.

  • Experience deploying or optimizing workloads on NVIDIA GPUs and familiarity with NVIDIA AI software stacks.


