Job Opportunity for Senior Site Reliability Engineer (Data Engineering) at Wikimedia Foundation

  • Remote


Summary

The Wikimedia Foundation is looking for a Senior Site Reliability Engineer (SRE) to join our team, reporting to the Director of Data Engineering. As an SRE on the Data Engineering team, you will be responsible for building, maintaining and operating the shared data infrastructure that empowers the use of data at the Foundation as well as the Wikimedia Movement. You will be part of a larger community of SREs where you’ll have plenty of space and opportunities to learn and get familiar with our tech. For more details about our stack see: https://wikitech.wikimedia.org/wiki/Data_Engineering

We are a fully remote, internationally distributed team. We see each other in person 1-2 times a year during one of our off-sites (the last few have been in places like Copenhagen, Majorca and New York) or Wikimania, the annual international conference for the Wiki community.

You are responsible for:

  • Deployment, configuration and maintenance of the distributed data systems that comprise our data platform. Our stack includes Hadoop, Kafka, Spark, Cassandra, Presto, Druid, Airflow, Superset, DataHub and Turnilo.
  • Monitoring of systems and services, optimization of performance and resource utilization
  • Cookbook/runbook implementation for common maintenance actions
  • Development and maintenance of data platform infrastructure running on Kubernetes as well as Ceph storage, virtual machines and bare metal servers
  • Automation and streamlining of tasks as well as identifying process gaps

Skills and Experience:

  • 5+ years of experience in an SRE/Operations/DevOps role as part of a team
  • Experience supporting high-availability distributed production systems
  • Comfortable with configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.), and modern observability infrastructure (monitoring, metrics and logging)
  • Comfortable with shell and scripting languages such as Python, Go, Bash, Ruby
  • Good understanding of Linux/Unix fundamentals and debugging skills
  • Excellent written and verbal communication skills
  • BS or MS degree, preferably in Computer Science, or equivalent work experience

Qualities that are important to us:

  • Commitment to the mission of the organization and our values
  • Commitment to our guiding principles
  • Commitment to diversity, equity, and inclusion
  • Cross-cultural sensitivity and awareness
  • Collaborative working experience

Additionally, we’d love it if you have:

  • Experience working with Big Data 
  • Experience with Hadoop, Yarn, Spark and Kafka
  • Experience implementing containerized applications (Docker, Kubernetes)
  • Experience with package management for operating systems (Debian, etc.)
  • We are avid supporters (and users) of open source software; history of contributing to Open Source projects is valued
  • Prior participation in the Wikimedia movement

To apply for this job, please visit boards.greenhouse.io.