Responsibilities:

  • Understand training data requirements for foundation models and translate model performance goals into clear, high-quality data delivery plans and execution roadmaps.
  • Identify and source high-quality industry data resources for foundation model training; design and execute data sourcing and expansion strategies to build long-term data reserves.
  • Design and manage the overall foundation model training data framework, including data asset accumulation, organization, and lifecycle management.

Qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • Familiarity with common data cleaning and processing techniques and tools, such as content extraction, deduplication, and quality/toxicity filtering models.
  • At least 1 year of experience in a related field; solid understanding of training data requirements for large language models and multimodal models. Experience working with data vendors or maintaining data supplier networks is a strong plus.
  • Experience with data asset management concepts and practices, including data cataloging, governance, and lifecycle management.

If you are interested in these job openings, please submit your resume and cover letter to shandahr@shanda.com. We also welcome assistance from recruitment agencies.