Remote
Posted 3 years ago

Website Wikimedia Foundation

Summary

The Wikimedia Foundation is looking for a Senior Site Reliability Engineer (SRE) to join our team, reporting to the Director of Data Engineering. As an SRE on the Data Engineering team, you will be responsible for building, maintaining and operating the shared data infrastructure that empowers the use of data at the Foundation as well as the Wikimedia Movement. You will be part of a larger community of SREs where you’ll have plenty of space and opportunities to learn and get familiar with our tech. For more details about our stack see: https://wikitech.wikimedia.org/wiki/Data_Engineering

We are a fully remote, internationally distributed team. We see each other in person 1-2 times a year during one of our off-sites (the last few have been in places like Copenhagen, Majorca and New York) or Wikimania, the annual international conference for the Wiki community.

You are responsible for:

Deployment, configuration and maintenance of the distributed data systems that comprise our data platform. Our stack includes Hadoop, Kafka, Spark, Cassandra, Presto, Druid, Airflow, Superset, DataHub and Turnilo.
Monitoring of systems and services, optimization of performance and resource utilization
Cookbook/runbook implementation for common maintenance actions
Development and maintenance of data platform infrastructure running on Kubernetes as well as Ceph storage, virtual machines and bare metal servers
Automation and streamlining of tasks as well as identifying process gaps

Skills and Experience:

5+ years of experience in an SRE/Operations/DevOps role as part of a team
Experience supporting high availability distributed production systems
Comfortable with configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.), and modern observability infrastructure (monitoring, metrics and logging)
Comfortable with shell and scripting languages such as Python, Go, Bash, Ruby
Good understanding of Linux/Unix fundamentals and debugging skills
Excellent written and verbal communication skills
BS or MS degree, preferably in Computer Science, or equivalent work experience

Qualities that are important to us:

Commitment to the mission of the organization and our values
Commitment to our guiding principles
Commitment to diversity, equity, and inclusion
Cross-cultural sensitivity and awareness
Collaborative working experience

Additionally, we’d love it if you have:

Experience working with Big Data
Experience with Hadoop, Yarn, Spark and Kafka
Experience implementing containerized applications (Docker, Kubernetes)
Experience with package management for operating systems (Debian, etc.)
We are avid supporters (and users) of open source software; history of contributing to open source projects is valued
Prior participation in the Wikimedia movement

To apply for this job please visit boards.greenhouse.io.