MLOps Engineer at Shizuku AI

MISSION

As the founding MLOps engineer, design and build Shizuku’s ML infrastructure from the ground up. Establish the complete pipeline — from data ingestion through training environments to model serving — creating an internal platform that empowers ML engineers to iterate on models at maximum velocity.

Replace individual, siloed development environments with a unified team-scale ML development platform, maximizing the speed of Shizuku’s evolution.

ABOUT SHIZUKU

Shizuku is a Japan-born AI companion actively engaging audiences on YouTube and X (formerly Twitter). Already running live streams and cultivating a growing community, Shizuku is now entering its next phase of rapid scale.

As the first Japanese startup to receive investment from a16z, we closed our seed round and are on a mission to bring Japanese entertainment × AI to the global stage.

TEAM STRUCTURE

You will work closely with founder Aki (ML engineer and researcher, ex-Meta, ex-Luma AI) and Engineering Director Ohno to drive the design and construction of our ML infrastructure. As the first MLOps engineer, you’ll have significant autonomy — from technology selection to operational design.

Once the foundation is in place, career paths include a management track leading a growing team and an IC track deepening technical expertise, depending on your aspirations.

CURRENT STATE & WHAT YOU’LL BUILD

Infrastructure Status: Modern application infrastructure is in place, but ML training and MLOps tooling are not yet established. AWS adoption is planned.
What You’ll Build: An internal platform for the ML engineers developing Shizuku’s AI models. The goal is to replace siloed, ad-hoc local workflows and individually owned code with a team-oriented ML development foundation.

KEY RESPONSIBILITIES

Design, build, and operate the end-to-end ML training pipeline: data collection/preprocessing → training → evaluation → deployment
Design and build GPU training infrastructure on AWS (A100, L4, etc.) with cost optimization
Build an internal ML platform for engineers: experiment tracking, model versioning, and reproducibility guarantees
Design and build model serving infrastructure: inference APIs, auto-scaling, and latency management
Establish training data management and quality assurance pipelines
Design and implement CI/CD for ML: automated training, model testing/evaluation, and staged rollouts
Drive production integration of models in collaboration with ML Engineer and SWE teams
Build monitoring and visibility infrastructure for long-term compute cost and GPU utilization tracking

REQUIREMENTS

3+ years of experience designing, building, and operating cloud infrastructure on AWS, GCP, or equivalent platforms
Experience building ML/DL pipelines and infrastructure
Hands-on experience designing and operating production environments using container technologies (Docker/Kubernetes)
Experience managing infrastructure as code (Terraform, Pulumi, etc.)
Strong Python skills for building tools and pipelines
Ability to work on-site at our Tokyo office (primarily in-office with flexible remote arrangements)

NICE TO HAVE

Experience building, operating, and cost-optimizing GPU clusters (A100, H100, L4, etc.)
Experience with ML platforms: SageMaker, Vertex AI, Ray, Kubeflow, etc.
Experience deploying and operating experiment tracking infrastructure: MLflow, Weights & Biases, DVC, etc.
Experience building model serving infrastructure: Triton Inference Server, TorchServe, vLLM, SGLang, etc.
Experience designing and building internal ML development platforms
Domain-specific knowledge of ML workloads in speech, NLP, or vision
Experience as a founding infrastructure/MLOps engineer at a startup
Technical communication skills in English (currently Japanese-first internally; transitioning to a global environment in the medium term)

WHO YOU ARE

Founding Engineer Mentality — You don’t wait for established systems to improve — you define the design philosophy and build the foundation from zero. You’re energized by creating the system itself, not just refining one
ML-Literate Infrastructure Engineer — You understand the unique characteristics of ML training and inference workloads, and you translate that understanding into optimally designed infrastructure
Purpose-Driven Ownership — You reverse-engineer from “maximizing ML team velocity,” set your own priorities, and drive execution autonomously
Comfort with Ambiguity — You design for a world where model count, training frequency, and data volume are still being defined — starting small and scaling architecturally as the picture clarifies
Resilience & Respect — You engage as an equal partner with ML Engineers and SWEs, elevating the entire team’s productivity through collaboration
