SINGAPORE / REMOTE / FULL TIME

AI Infrastructure Architect

We are seeking a senior infrastructure architect to design, build, and own the cloud-agnostic platform powering our Causal AI operating systems. This is a high-ownership, greenfield role.

Job Title: AI Infrastructure Architect

Company: Nubio.World

Location: SINGAPORE / REMOTE

Reports To: CTO

About Nubio.World: Forging the Causal AI Frontier

At Nubio, we are architecting Causal AI Operating Systems to decode and master the hidden physics of high-stakes industries. We treat complex sectors—finance, aviation, energy, and global supply chains—as intricate, simulatable universes. Our mission is to move beyond superficial correlations and build AI that models causal depth, enabling cognitive, adaptive industrial systems.

Inspired by the philosophy that "information is primary, more fundamental than energy and matter," we engineer systems that simulate reality through self-play and reinforcement learning—discovering strategies that transcend known data.

We are a high-velocity, mission-driven team with a "relentless shipping" mindset.

The Role: Building the Cognitive Platform

Nubio is seeking a senior AI Infrastructure Architect to design and own the cloud-agnostic infrastructure platform that powers our Causal AI operating systems. You will ensure our world model simulation and inference engines run reliably, securely, and at scale — across AWS, GCP, and Azure — without vendor lock-in.

This is a greenfield role. You will make foundational architectural decisions that shape how Nubio operates for the next decade.

What You Will Do

  • Cloud-Agnostic Infrastructure Design: Architect and implement multi-cloud infrastructure using Terraform and Pulumi across AWS, GCP, and Azure — building for portability, not lock-in.
  • MLOps & LLMOps Pipelines: Build and maintain end-to-end pipelines for training, evaluation, versioning, and deployment of causal AI models at scale.
  • Data Ingestion Architecture: Engineer high-throughput, fault-tolerant data pipelines that ingest from aviation, energy, and supply chain data sources across multiple geographies.
  • GPU & Accelerator Workloads: Provision and manage GPU and accelerator workloads for large-scale model training and real-time inference, without depending on any single cloud provider.
  • CI/CD & Deployment Frameworks: Establish zero-downtime CI/CD pipelines for platform microservices and model serving endpoints across all environments.
  • Observability & Reliability: Implement distributed tracing, structured logging, and alerting stacks across cloud environments using OpenTelemetry, Prometheus, and Grafana.
  • Security & Compliance: Architect enterprise-grade, secure infrastructure that satisfies the compliance requirements of aviation and energy sector customers.
  • Platform Engineering Culture: Champion infrastructure-as-code practices, SRE principles, and reliability engineering standards across the engineering organisation.

Ideal Candidate Profile

  • Experience: 5+ years in infrastructure engineering, platform engineering, or Site Reliability Engineering (SRE).
  • Cloud-Agnostic Mindset: Deep expertise in at least two of AWS, GCP, and Azure, with a principled, portable design philosophy that avoids unnecessary vendor dependency.
  • Kubernetes & Containers: Strong proficiency in Kubernetes, Helm, and container orchestration at production scale.
  • Infrastructure as Code: Hands-on experience with Terraform or Pulumi for managing multi-cloud environments.
  • MLOps Tooling: Experience with MLOps platforms (Kubeflow, MLflow, Ray, or similar) and GPU workload scheduling on cloud infrastructure.
  • Languages: Proficient in Python and Bash; Go is a strong plus.
  • Observability: Experience with OpenTelemetry, Prometheus/Grafana, or Datadog for production observability.
  • Industry Exposure: Experience in regulated industries (aviation, energy, or finance) is a strong advantage.

Tech Stack (Cloud-Agnostic)

Cloud: AWS / GCP / Azure (multi-cloud)

Orchestration: Kubernetes, Helm

IaC: Terraform, Pulumi

CI/CD: GitHub Actions

MLOps: Kubeflow / MLflow / Ray

Observability: OpenTelemetry, Prometheus, Grafana

Data: PostgreSQL, Redis, object storage (S3/GCS/Azure Blob)

Languages: Python, Bash, Go

Please send your CV to careers@nubio.world.

Benefits

We invest in your wellbeing so you can do your best work.

Unlimited PTO
Health benefits
Flexible hours
Great culture