Woven Capital
Site Reliability Engineer, Cloud Platform

Engineering · Tokyo, Japan

Job description

ABOUT WOVEN BY TOYOTA

Woven by Toyota, a part of the Toyota Group, is challenging the current state of mobility through human-centric innovation and empowering mobility transformation. Through our AD/ADAS technology, our automotive software development platform Arene OS, our mobility test course Toyota Woven City, and Toyota’s growth fund, Woven Capital, we are pioneering the movement of people, goods, information, and energy, weaving a future of enhanced safety, connectivity, and well-being for all.

=========================================================================

TEAM

Our mission is to make software development for Woven by Toyota and the greater Toyota organization as a whole more productive and efficient. We use the latest technologies to help engineering teams go faster, with safety as our top priority. Our modern, agile, and transparent services are designed to bring to life Woven by Toyota's vision of "Mobility to Love, Safety to Live."

WHO ARE WE LOOKING FOR?

The Enterprise Technology SRE team collaborates with the product development team and shares the same codebase, but focuses primarily on non-functional requirements. Our objective is to improve production readiness and reliability. We are looking for an SRE with a background in software engineering, DevOps, and cloud engineering who is passionate about establishing SRE best practices. You will report to our SRE Manager. This is a hybrid role requiring on-site presence three days per week.

RESPONSIBILITIES

Develop software systems for improved product monitoring, reliability, and development efficiency
Take part in on-call duty to monitor and respond to incidents and keep services healthy; the 8-hour on-call rotation covers workdays, weekends, and holidays, and can be worked remotely
Provide guidance on reliability practices throughout the software development lifecycle, including architecture and code reviews
Establish SRE best practices within product teams, including capacity planning, chaos testing, and disaster recovery drills
Learn from incidents through blameless post-mortems and address service reliability issues through hands-on coding
Improve the efficiency of development and operations teams

MINIMUM QUALIFICATIONS

Bachelor’s degree in Computer Science, Technology, Engineering, Mathematics, or equivalent practical experience
4+ years of experience in Go, Python, or a similar language, with proficiency in data structures, algorithms, and software design
Intermediate to advanced expertise in public cloud technologies, Kubernetes, and Infrastructure as Code
Proficient in production on-call, troubleshooting, and incident management
Business-level English skills

NICE TO HAVES

Hands-on experience with SRE best practices, including SLO monitoring, disaster recovery planning, chaos testing, capacity planning, automation, toil reduction, and more
Experience with APM solutions and monitoring systems such as Prometheus, Wavefront, and GCP monitoring
Previous experience in an SRE, DevOps, or Platform Engineering role
AWS, GCP, or Kubernetes Certifications
Japanese language skills for communicating with customers
