DevOps - Senior Site Reliability Engineer

About Us

SignifAI is a venture-backed startup based in Sunnyvale, CA and Tel Aviv.

SignifAI is machine intelligence that helps DevOps teams get to accurate answers faster by finding correlations in real time among very large volumes of log, event, and metrics monitoring data. These correlations are driven by algorithms plus the team’s and SignifAI’s collective expertise. This means the team can get to root causes quickly, regardless of the seniority of the engineers currently on shift. And because SignifAI is a machine, it has perfect memory. This means it can match the cause and resolution of a past issue with an issue happening right now. These powerful correlations also unlock predictive insights into issues that could threaten uptime in the future.

When DevOps teams deliver more uptime, they finally find the time to work on more complex problems that require creative solutions...precisely the things that machines can’t do.

Who we are looking for

We are looking for a highly energetic, passionate site reliability (DevOps) engineer with a startup mentality to manage our entire production platform while helping our customers identify issues in their implementations of our product.

If you are not afraid of highly distributed scalable infrastructure, this position is for you.

The Role 

This is a unique role that involves site reliability engineering work combined with customer success responsibilities.

You will ensure a highly scalable, reliable, and secure service while managing release processes, workflows, and live deployments. You will help architect scalable deployments for complex production serving systems in AWS and Google Cloud, and automate everything, using Ansible and other tools to manage configuration and deployments. You will continuously improve application and system monitoring, log analysis, and metrics, as well as the operational tools used to detect and rapidly respond to incidents and issues, while using SignifAI's own product.

You will contribute anywhere else you can to move the company’s engineering forward. You will also work behind the scenes to support our customers' large production environments. This is a huge opportunity to grow your skills fast with cutting-edge technologies, open source libraries, and large-scale challenges.

   
Your Qualifications

  • Ability to root cause sources of instability in a high-traffic, large-scale distributed system 
  • Experience with configuration and troubleshooting of Linux, Java, Tomcat, and other middleware technologies
  • Understanding of large-scale complex systems from a reliability perspective
  • Scripting abilities in Python, Go, or JVM-based languages
  • Passion for resolving reliability issues and identifying strategies to mitigate them going forward
  • Experience in both software engineering and cloud-based ops, having previously held a DevOps, Site Reliability Engineer, or Operations Engineer role.
  • Passion for automation, performance, visibility, and using the best tools possible.

Pluses:

  • Computer Science degree from a strong program or professional experience (8200 or other technological unit)
  • Experience with AWS/Google Cloud, including working with and deploying into Kubernetes clusters.
  • Experience working on Agile projects in distributed teams.