Site Reliability Engineer

<div class="container-3Gm1a"><b>Overview</b><br><p>The Azure Specialized team works collaboratively to bring the next generation of workloads to our Public Cloud platform. We work together across Microsoft to enable new end-to-end scenarios for Azure customers. Our team imagines and builds differentiating customer features and fundamental building blocks at the heart of the Azure platform, working collaboratively with many industry partners.</p><p><br>We are a highly impactful team with robust growth opportunities. If you are interested in working on the latest areas that will help you develop skills in AI infrastructure, Cloud services, and Security, this is the team you are looking for! We are a small, agile and nimble team in Azure, focused on bringing the state of the art of mission-critical software into Microsoft.<br><br>As an SRE II in Azure Specialized, you will gain valuable experience in service architecture, datacenter networking, monitoring and security as well as working with partner teams. You have the opportunity to work on control and data plane enablement required by Azure Specialized workloads. A primary focus is designing, developing, deploying, managing, and monitoring various product features and infrastructure. This will allow you to develop backend infrastructure supporting diverse services. The work for this position will cross many layers of Azure Services, presenting unique engineering challenges. This role also offers great opportunities to work with many partner teams and gain broad exposure to control plane and data plane technologies end-to-end.<br><br>Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. 
Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.</p><br><br><b>Responsibilities</b><br><ul><li>Acts as a Designated Responsible Individual (DRI) working on call to monitor service for degradation, downtime, or interruptions. Alerts stakeholders as to the status and gains approval to restore system/product/service for simple problems. Responds within Service Level Agreement (SLA) timeframe. Escalates issues to appropriate owners. </li><li>Contributes to efforts to collect, classify, and analyze data with little oversight on a range of metrics (e.g., health of the system, where bugs might be occurring). Contributes to the refinement of product features by escalating findings from analyses to inform decisions regarding the engineering of products. </li><li>Contributes to the development of automation within production and deployment of a complex product feature. Runs code in simulated, or other non-production environments to confirm functionality and error-free runtime for products with little to no oversight. </li><li>Contributes to efforts to ensure the correct processes are followed to achieve a high degree of security, privacy, safety, and accessibility. Checks for visible evidence to demonstrate compliance for product areas. Develops and holds an understanding of the implications of onboarding new technologies following expectations of compliance at Microsoft. </li><li>Remains current in skills by investing time and effort into staying abreast of current developments that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale. <br>Applies best practices to reliably build code that is based on well-established methods. 
Follows best practices for product development and scaling to customer requirements and applies best practices for meeting scaling needs and performance expectations. </li><li>Maintains communication with key partners across the Microsoft ecosystem of engineers. Considers partners across teams and their end goals for products to drive and achieve desirable user experiences and to fit the dynamic needs of partners/customers through product development. </li><li>Maintains operations of live service as issues arise on a rotational, on-call basis. Implements solutions and mitigations to more complex issues impacting performance or functionality of Live Site service and escalates as necessary. Reviews and writes incident postmortems and shares insights with the team.</li><li>Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook, working on call to monitor system/product/service for degradation, downtime, or interruptions. Alerts stakeholders as to status and initiates actions to restore system/product/service for simple problems and complex problems when appropriate. Responds within Service Level Agreement (SLA) timeframe. Drives efforts to reduce incident volume, looking globally at incidents and providing broad resolutions. 
Escalates issues to appropriate owners.</li></ul><br><br><b>Qualifications</b><br><p><strong>Required/Minimum Qualifications:</strong></p><ul><li>Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.</li><li>1+ years experience managing physical infrastructure </li></ul><p>Other Qualifications:</p><ul><li>Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: <ul><li>Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.</li></ul></li></ul><p><strong>Additional or Preferred Qualifications:</strong></p><ul><li><div>Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.</div></li><li><div>2+ years technical experience working with large-scale cloud or distributed systems.</div></li><li>1+ year(s) people management experience.</li><li>Experience working on large-scale distributed services with on-call responsibilities.  </li><li>Ability to build and influence broadly towards common goals and priorities.  
</li><li>Ownership of end-to-end project lifecycle with solid project management and communication skills.</li><li>Experience with managing physical infrastructure, supporting GPUs and InfiniBand</li></ul> <br><br><p>Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $100,600 - $199,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $131,400 - $215,400 per year. </p><p></p> <p>Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:<br><a href="https://careers.microsoft.com/us/en/us-corporate-pay">https://careers.microsoft.com/us/en/us-corporate-pay</a></p><br><p>This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.</p><br><hr><br><p>Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about <a href="https://careers.microsoft.com/v2/global/en/accessibility.html"><b><u>requesting accommodations.</u></b></a></p> </div>

Common Interview Questions And Answers

1. HOW DO YOU PLAN YOUR DAY?

This question is really asking: When do you focus and start working seriously? At what hours do you work best? Are you a night owl? An early bird? Remote teams can be made up of people working different shifts around the world, so you won't necessarily be stuck with a 9-5 schedule if it's not for you...

2. HOW DO YOU USE THE DIFFERENT COMMUNICATION TOOLS IN DIFFERENT SITUATIONS?

When you're working on a remote team, there's no way to chat in the hallway between meetings or catch up on the latest project during an office carpool. Therefore, virtual communication will be absolutely essential to get your work done...

3. WHAT DOES "WORKING REMOTELY" REALLY MEAN TO YOU?

Many people want to work remotely because of the flexibility it allows. You can work anywhere and at any time of the day...

4. WHAT DO YOU NEED IN YOUR PHYSICAL WORKSPACE TO SUCCEED IN YOUR WORK?

With this question, companies are looking to see what equipment they may need to provide you with and to verify how aware you are of what remote working could mean for you physically and logistically...

5. HOW DO YOU PROCESS INFORMATION?

Several years ago, I was working on a team planning a big event. My supervisor had us all work as a team before the big day. One of our activities was to find out how each of us processes information...

6. HOW DO YOU MANAGE YOUR CALENDAR AND SCHEDULE? WHICH APPLICATIONS / SYSTEMS DO YOU USE?

Or you may receive even more specific questions, such as: What's on your calendar? Do you plan blocks of time to do certain types of work? Do you have an open calendar that everyone can see?...

7. HOW DO YOU ORGANIZE FILES, LINKS, AND TABS ON YOUR COMPUTER?

Just like your schedule, how you track files and other information is very important. After all, everything is digital!...

8. HOW DO YOU PRIORITIZE WORK?

The day I watched Marie Forleo's film on separating the important from the urgent, my life changed. Not all remote jobs are fast-paced, but most of them are...

9. HOW DO YOU PREPARE FOR AND FACILITATE A MEETING? WHAT DO YOU SEE HAPPENING DURING THE MEETING?

Just as communication is essential when working remotely, so is organization. Because you won't have chance encounters in the elevator or casual conversations in the lunchroom, you should make the most of the limited time you have in a video or phone conference...

10. HOW DO YOU USE TECHNOLOGY ON A DAILY BASIS, AT WORK AND FOR PLEASURE?

This is a great question because it shows your comfort level with technology, which is very important for a remote worker because you will be working with technology all the time...