S&P Global

Data Scientist

5d

New York City, US · Full-time · $75,000 – $90,000

About this role

The Collection Platforms & AI team builds ML-powered products and capabilities for natural language understanding, data extraction, information retrieval, and data sourcing solutions. You will spearhead development of production-ready AI products and pipelines, working in a global team that encourages thoughtful risk-taking and self-initiative.

Own all stages of the data science project lifecycle, developing, deploying, monitoring, and scaling ML and GenAI models through the full Software Development Life Cycle into production. Perform exploratory data analysis, proof-of-concepts, model benchmarking, and validation experiments for both ML and GenAI approaches. Partner with business leaders, domain experts, and end-users to gather requirements and align on success metrics.

Join a dynamic team solving diverse problems using applied machine learning and web development, with end-to-end implementation from inception to production. Be part of a highly skilled, hands-on technical team in a highly engaging work environment. Contribute at enterprise scale within a global company.

Develop next-generation products while enhancing existing ones to solve high-impact business problems. Build end-to-end production-ready pipelines from ideation to deployment. Grow by contributing to high-complexity, high-impact challenges with a team that has delivered breakthrough products.

Requirements

  • Strong grasp of the statistics, probability, and mathematics underpinning modern AI, including linear programming, optimization, and gradient-based optimizers such as Adam and SGD
  • Hands-on experience with large language models (e.g., OpenAI, Anthropic, Llama), prompt engineering, fine-tuning/customization, and embedding-based retrieval
  • Intermediate proficiency in Python (NumPy, pandas, spaCy, scikit-learn, PyTorch/TensorFlow 2, Hugging Face Transformers)
  • Understanding of ML and deep learning models, including architectures for NLP (transformers), graph neural networks (GNNs), and multimodal systems
  • Solid understanding of database structures and SQL
  • Ability to perform independent research and synthesize current AI/ML research, with a track record of applying new methods in production
  • Experience in end-to-end GenAI or advanced NLP projects such as NER, table extraction, OCR integrations, or GNN solutions
  • Familiarity with orchestration and deployment tools: Airflow, Redis, Flask/Django/FastAPI, SQL, R Shiny/Dash/Streamlit

Responsibilities

  • Develop and deploy large-scale ML and GenAI-powered products and pipelines
  • Develop, deploy, monitor, and scale models through the full Software Development Life Cycle into production
  • Perform exploratory data analysis, proof-of-concepts, model benchmarking, and validation experiments for ML and GenAI
  • Partner with business leaders, domain experts, and end-users to gather requirements and align on success metrics
  • Follow coding standards, perform code reviews, and optimize data science workflows
  • Evaluate, interpret, and communicate results to executive stakeholders

Benefits

  • Be part of a dynamic team that solves diverse problems using applied machine learning and web development with end-to-end implementation
  • Build solutions at enterprise scale as part of a global company
  • Grow with a highly skilled, hands-on technical team
  • Contribute to solving high-complexity, high-impact problems end-to-end
  • Build end-to-end production-ready pipelines from ideation to deployment