Responsibilities:

  • Explore, experiment with, and implement novel algorithms to improve the performance, efficiency, and capabilities of large language models (LLMs).
  • Stay up to date with the latest advances in academia and industry; design and validate new approaches to address key challenges in pre-training, such as cost efficiency, data quality, and model capability scaling.
  • Collaborate closely with engineering and data teams to translate research ideas into scalable, production-ready pre-training systems and workflows.

Qualifications:

  • Master’s or PhD degree (or equivalent practical research experience) in Computer Science, Artificial Intelligence, Mathematics, or a related field.
  • Research experience: At least 3 years of hands-on experience in deep learning and/or NLP, including at least 1 year of direct involvement in pre-training large language models at the 30B+ parameter scale.
  • Strong algorithmic foundation: Deep understanding of the mathematical principles behind deep learning and NLP; solid expertise in architectures and techniques such as Transformers, Diffusion models, and RLHF.
  • Strong engineering and systems skills: Extensive experience with large-scale system debugging, profiling, and performance optimization in distributed training environments.

If you are interested in this position, please submit your resume and cover letter to shandahr@shanda.com. Referrals from recruitment agencies are also welcome.