Senior Software Engineer, AI Infrastructure

Engineering · Full-time · CA, United States

Job description

THE COMPANY

Our mission is to build the Covariant Brain, a universal AI to give robots the ability to see, reason, and act on the world around them. Bringing AI from research in the lab to the infinite variability and constant change of our customers’ real-world operations requires new ideas, approaches, and techniques.

Success in the real world requires a team that represents that world: diversity of backgrounds, points of view, and experiences. Our common denominator: ambitious expectations, love of learning, empathy for those around us, and a team-first mindset.

THE ROLE

The AI Infrastructure team makes Covariant’s robot data accessible and easy to use for developing, debugging, and deploying AI-based software. Our vision is to automate and refine every step of the AI lifecycle, from collecting, indexing, and annotating data to training and deploying new models and monitoring performance across our robot fleet. To this end, we are designing a data platform that processes terabytes of robot telemetry data every day, making it searchable and usable by the rest of the company. We build the core libraries, services, and tools that form the foundation of AI software development at Covariant, and we are hiring senior engineers to help us achieve our vision.

AREAS OF FOCUS

  • Building services and APIs to search and annotate our rapidly growing robot dataset
  • Designing libraries to help us train, deploy, monitor, and understand our models
  • Full-stack development of tools that leverage our libraries and services to visualize and explore Covariant’s robot data

YOU WILL

  • Work closely with the research and solutions teams to spec, develop, and ship features for our robot data platform
  • Lead and manage full-stack projects with cross-functional stakeholders
  • Build tools to search and visualize robot telemetry data and facilitate fast performance iteration
  • Implement scalable data pipelines to ingest and process robot telemetry data
  • Develop and deploy distributed systems that span customer warehouses to the public and private cloud
  • Advocate for and facilitate quality software design principles including system observability and debuggability

YOU HAVE

  • 4+ years of programming experience in modern programming languages such as Python
  • 4+ years of experience working on full-stack or backend web development, or cloud infrastructure
  • Designed, built, and deployed modern web APIs
  • Designed and deployed solutions using public cloud providers like AWS
  • Experience with containerization technologies like Docker and container orchestration platforms like Kubernetes and Amazon ECS
  • Strong communication skills; able to efficiently communicate technical details to a varied audience
  • Experience with building model training infrastructure, libraries, and tools
  • The ability to work independently on open-ended cross-functional projects

NICE TO HAVES

  • Experience architecting data infrastructure for machine learning systems
  • Experience with Django and/or Postgres

SAMPLE WEEK IN THE LIFE

  • Develop a scalable data pipeline leveraging services like Amazon SQS or Kinesis
  • Design a new database model and corresponding API endpoints and views
  • Deploy a service to Kubernetes and monitor its performance
  • Triage and debug a performance issue in Postgres
  • Add a feature to a computational graph library
  • Meet with the research team to gather requirements and understand how we should support a new research project, such as training and deploying a new model
  • Prepare a technical deep-dive presentation on a project you recently completed
  • Independently run a meeting for your latest project to keep stakeholders on other teams up to date