
Description of the position
System Expert.
Stimulating. Motivating. Challenging.
Do you have a DevOps mindset? Do you enjoy working with cutting-edge technology? Our office in Spain has a young spirit and there are many colleagues with a cloud computing background waiting for you :)
Location: Madrid, Spain
NAGRA, a digital TV division of the Kudelski Group, provides security and multiscreen user experience solutions for the monetization of digital media. The company offers content providers and DTV operators worldwide secure, open, integrated platforms and applications over broadcast, broadband and mobile platforms, enabling compelling and personalized viewing experiences.
System Expert
Mission
You will be part of our Cloud Operations Team, which works on R&D activities to develop and deliver NAGRA's next-generation Pay TV solutions for our customers.
As a System Expert in the Site Reliability Engineering (SRE) team, you are responsible for deploying our SaaS services and keeping them up and running. You play a major role in the continuous improvement of the automation, performance, scalability, and monitoring of our solutions. You will join a friendly team with strong leadership whose members share a collaborative mindset with people from development and operations.
Responsibilities
- Manage and improve the reliability and performance of cloud infrastructure and applications that are deployed on public cloud and form part of digital TV ecosystems.
- Work with product engineering teams to define, deliver and fine-tune automated deployment/operations/monitoring tools or new product features.
- Deploy and upgrade systems in an AWS environment, based on EC2 IaaS, EKS, AWS services and databases.
- Contribute to software and architecture design based on performance and reliability observations and on industry best practices.
- Follow AWS services evolution and propose related product improvements.
- Define metrics, collect data, and improve service monitoring to detect problems before they’re visible to customers.
- Predict future system failures and work proactively to mitigate them.
- Optimize and tune systems to ensure reliability and high availability in a 24/7 production environment.
- Troubleshoot and solve production issues from the network and application layers all the way down to the infra level including reliability issues (availability, performance, monitoring, etc.).
- Collaborate with security architects to define and implement cloud security features based on our ISMS and best practices.
- Monitor and improve cloud costs by implementing cloud best practices.
- Participate in the 24/7 on-call rotation to monitor services and respond to emergencies.
Requirements / Profile
- 5+ years’ experience as a System Expert, Site Reliability Engineer or similar.
- Strong knowledge of AWS Public Cloud, container technologies (Docker, Kubernetes/EKS) and networking.
- Strong knowledge of Linux systems in a production environment, including configuration, networking and basic scripting (Python, Bash, Ruby, or any other).
- Strong knowledge and experience in cloud security implementation.
- Strong knowledge of logging and monitoring tools such as ELK, Prometheus or Grafana.
- Good experience with deploying and monitoring solutions in AWS.
- Ability to develop tools and scripts to automate infrastructure deployment/upgrades.
- Familiarity with Agile frameworks and DevOps concepts.
- Good spoken and written English skills.
Technical Knowledge Required
- AWS services (load balancers, Auto Scaling groups, EKS, S3, Lambda, EC2, VPC, Security Groups, etc.) – AWS certifications are a plus.
- Kubernetes knowledge: standard deployment and usage of Kubernetes – K8s certifications are a plus.
- Service Mesh concepts – Istio is a plus.
- Cloud deployment tools: Ansible, Terraform, CloudFormation, Helm.
- Database technologies including SQL (like Oracle) and NoSQL databases (Cassandra, MongoDB, DynamoDB, Elasticsearch).
- Monitoring tools: Grafana, Prometheus, Alertmanager, Kiali.
- CI/CD pipeline architecture and technologies: Jenkins, GitLab, AWS CodePipeline.
- Nginx.
- AWS Cloud security concepts and tooling: IAM, CloudTrail, Secrets Manager, DDoS protection tools, Cloudflare, etc.
- Hands-on experience in Linux.
- Experience working with version control systems such as CodeCommit/GitHub/GitLab.
- Network: firewall, TCP, routing, ACLs.
- Keycloak user management tools.
Soft Skills
- Creative problem-solver, exhibiting common sense, analytical reasoning skills and taking ownership of issues to the point of resolution. Attention to detail.
- Fast learner, able to quickly understand complex systems.
- Strong team player able to work with multi-cultural teams located in different countries.
- Strong customer focus and ability to understand the different needs of the teams he/she would work with.
- Motivated, autonomous, methodical, and proactive.
- Effective communication, written and oral, with various stakeholders.
Reference : 13154
Publication Date : 13-09-2022