[Remote] Director, Data & AI/ML Platform Engineering

Remote · USA · Full-time

Note: This is a remote role open to candidates in the USA. Stitch Fix is redefining retail by combining human creativity with advanced data science and Generative AI. The company is seeking a Director of Data & AI/ML Platform Engineering to lead the engineering organization responsible for its enterprise data platform, machine learning platform, and generative AI platform, and to ensure these platforms meet the needs of user groups across the company.

Responsibilities

  • Data infrastructure at scale. The systems that ingest and store data and make it accessible across the company: a petabyte-scale lakehouse, event streaming, workflow orchestration, data governance, and the self-service tools that make this infrastructure usable without platform team involvement at every step
  • Machine learning platform. The infrastructure that enables data scientists and engineers to build, experiment with, and serve models in production quickly: feature stores, training pipelines, distributed model serving, and the MLOps practices that keep production models healthy, observable, and improving
  • Generative AI platform. The platform that enables teams across the company to build, deploy, and govern AI agents and GenAI-powered applications: runtime and routing infrastructure, self-service agent-building tools, context and retrieval management, observability and evaluation frameworks, and the cost and safety controls that keep AI reliable, governed, and improving in production
  • The next generation of personalization and decisioning. The foundational platform work behind the company's highest-priority strategic initiatives. Partner with Data Science, Algorithms, and Product to build the next generation of intelligence infrastructure: a deeper understanding of clients, products, and style, powered by real-time data, AI reasoning, and systems that continuously improve
  • Set and own the product vision for each platform area. Treat internal platforms as products. Understand your users, define north star metrics for platform health and adoption, build a roadmap that earns trust, and communicate the vision in a way that rallies engineers and gains stakeholder buy-in
  • Own platform modernization decisions. Lead strategic architectural shifts (open table format migration, feature store re-foundation, model serving modernization, agentic AI infrastructure buildout) on behalf of users and stakeholders. Drive these from problem definition through adoption, not just implementation
  • Compress time from idea to production. Build the developer experience, self-service tooling, and golden paths that reduce friction for every type of user, from engineers and data scientists building pipelines and models, to analysts exploring data in BI tools, to business operators building and running AI-assisted workflows. Speed to insight and speed to production are both critical
  • Lead and grow the organization. Manage engineering managers and senior ICs across three platform areas. Create clarity, remove blockers, and develop people while continuously evolving how the team works: apply the AI capabilities you build to accelerate your own org's velocity, and shape the skills and structure the team needs for an AI-first engineering model
  • Drive cross-functional alignment. Partner with Data Science, ML Engineering, Data Engineering, Product, and Business leaders to align platform investment with business priorities. Represent the platform in quarterly planning, architecture reviews, and executive forums
  • Communicate with authority at every level. Write crisp strategy documents. Present platform trade-offs to the C-suite. Sit with an engineer and whiteboard a system design. Fluency across these modes is a requirement, not a nice-to-have
  • Run the business. Own budget, headcount planning, vendor relationships, contractor management, and the long-horizon platform strategy. Balance investment in new capabilities with operational excellence and the retirement of legacy systems

Skills

  • 10+ years in software, data, or ML/AI platform engineering; 5+ years leading engineering managers or multi-team platform organizations
  • Track record of owning and evolving production-grade platform systems at scale: not just building them, but driving adoption, rationalizing legacy systems, and measurably improving developer and data science productivity over time
  • History of making and landing consequential architectural decisions in complex, high-availability environments; comfort with the full lifecycle from design through post-launch iteration
  • Hands-on experience with distributed compute and storage (Spark, Trino/Presto, Apache Iceberg or Delta Lake), event streaming (Kafka, Flink), workflow orchestration (Airflow), and data governance and quality systems
  • Experience with feature engineering and feature stores, model training pipelines, model deployment and serving (Ray Serve, Triton, or equivalent), monitoring and validation, and the operational practices of running ML in production (MLOps)
  • Experience with LLM orchestration frameworks, retrieval-augmented generation (RAG), agent architectures, evaluation frameworks, cost and latency governance, and the emerging standards around agentic AI (Model Context Protocol or equivalent)
  • Experience building internal developer platforms (IDPs), self-service tooling, and platform abstractions that reduce friction for engineering teams; familiarity with developer experience metrics and platform adoption patterns
  • Expertise in distributed systems design, container orchestration (Kubernetes), and cloud infrastructure at scale (AWS preferred)
  • Product-led mindset. You approach internal platforms the same way a strong product leader approaches external products: segmented user personas, defined success metrics, a prioritized roadmap, and a bias toward adoption and impact over feature completeness
  • 360-degree execution. You own the full loop: discovery and planning, iterative delivery, production quality, user enablement and evangelism, and the feedback loops that close on real-world impact
  • Strategic communication and influence. You can make a compelling case for a multi-year platform investment to a CxO, write a technical design doc your engineers will actually follow, and give a data scientist a useful answer about why their job is slower than it should be. Each of these is a different skill; you have all three
  • You represent users' needs inside the platform team. You hold the bar on developer experience, self-service reliability, and documentation quality. You treat user complaints as signal, not noise

Benefits

  • Competitive salary
  • Equity
  • Annual bonus
  • New hire and ongoing grants of restricted stock units, depending on employee and company performance
  • Medical, dental, vision, and other benefits

Company Overview

  • Stitch Fix is a personal styling platform that delivers curated and personalized apparel and accessory items for women. It was founded in 2011, is headquartered in San Francisco, California, USA, and has a workforce of 1,001-5,000 employees. Its website is http://stitchfix.com.
Company H1B Sponsorship

  • Stitch Fix has a track record of offering H1B sponsorships, with 4 in 2026, 22 in 2025, 18 in 2024, 17 in 2023, 45 in 2022, 34 in 2021, and 30 in 2020. Please note that this does not guarantee sponsorship for this specific role.