Week 14
Tuesday
No meeting held
Thursday
🎥 Meeting Recording
🚩 Agenda
- Presentation by Brennan Macaig on the role of an SRE
- Break for Pizza 🍕
- Break into teams and work on Immersion
🪄 Meeting Resources
TBD
📓 Meeting Notes
Attendance:
- In-Person: 15
- Virtual: 2
Presenter: Brennan Macaig - Site Reliability Engineer
Site Reliability Engineering - an Introduction
- Why should you listen?
- What is an SRE?
- Differences to DevOps?
- Why care about SRE?
- How is SRE implemented?
- What is it like?
- At Scale
- AWS
- Stories
- Questions
Why Listen?
- Experience with GCP & Azure
- 5 Years of experience in the field
- Worked at small and large companies
- Went to UML (2017-2020)
- Was a college tour guide and RA for a while
- Used to be a full stack dev
What even is SRE?
- SRE are software engineers
- Solve operational problems
- Cross between traditional IT and Software development
Why Care?
- Limit the impacts of problems on customers
- Customer-obsessed people
- Help maintain the SLA (Service level agreement)
- Improve observability into the tech stack
- Catch errors before impacting revenue
- Co-own production with developers
- Highly focused on observability
- Telemetry
- Documentation
- Visibility of information
Why should I care?
- Rewarding jobs that are relatively high paying
- Fast paced
- Working on unique and unsolved problems
- Real impact on customer experiences
What makes an SRE an SRE?
- Be an owner
- Value customers above all
- Implement the DevOps philosophy, without being DevOps
- Create tools
Thinking like an SRE - DevOps
- 5 Principles
- Reduce org. silos
- Accept failure as NORMAL and INEVITABLE
- Implement change gradually
- Leverage tooling and automation
- Measure everything
- SRE is not DevOps
Thinking like an SRE - Not DevOps
- Own production with devs.
- Make gradual changes, with canarying and testing
- Leverage/create tooling and automation, especially to eliminate toil, tech debt, and overhead.
- Toil - "Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows"
Terms
- SLIs
- Service level indicator
- Measurement of a service's health or status.
- SLOs
- Service Level Objectives
- A target for your service regarding availability.
- SLAs
- Service Level Agreements
- A promise made to customers about availability.
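As a rough illustration of how the three terms relate, the sketch below computes an availability SLI from request counts and compares it against an SLO and an SLA. The `ServiceWindow` shape, the request counts, and both target values are assumptions made up for this example, not figures from the presentation.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """Request counts for one measurement window (hypothetical data shape)."""
    total_requests: int
    successful_requests: int


def availability_sli(window: ServiceWindow) -> float:
    """SLI: fraction of requests served successfully in the window."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return window.successful_requests / window.total_requests


def meets_target(sli: float, target: float) -> bool:
    """True when the measured SLI is at or above the target."""
    return sli >= target


SLO_TARGET = 0.999  # internal objective (assumed value)
SLA_TARGET = 0.995  # customer-facing promise (assumed value)

window = ServiceWindow(total_requests=1_000_000, successful_requests=998_500)
sli = availability_sli(window)
print(f"SLI: {sli:.4%}")
print("SLO met:", meets_target(sli, SLO_TARGET))
print("SLA met:", meets_target(sli, SLA_TARGET))
```

In this made-up window the service misses its internal SLO while still honoring the SLA. Setting the SLO tighter than the SLA is a common choice because it gives the team warning before a customer promise is broken.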
Day in a life
- On-call and production outage management
- Writing code
- Standup, sprints, scrum, agile, JIRA tickets, investigations
- Meetings
- Meet to discuss new services
- Meet to discuss old services
- Writing Terraform
- Operations tasks
- Deal with scaling services
- Write tech plans
- Reduce tech debt
- Write scripts to reduce daily toil
- Other infra provisioning tools
- Deal with CI/CD pipelines
Customer Obsession
- Focus on positive customer experiences
- Developers' customers are the people who use the company's app
- SRE customers are developers
- Make processes better for devs
- Make workflows dev centered
- Make doing things wrong more challenging
- Protect customers from themselves
- Support when customers do custom stuff
At Scale
- What do services at scale need?
- Reliability
- Easy to fix and work on
- Easy to deploy and create
- Be process defined/constrained
- Be monitored
- When possible, self heal
- Avoid late night pages
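One way to picture "self heal" and "avoid late night pages" together is a watchdog that tries automated restarts first and only pages a human when they fail. This is a minimal sketch: `self_heal`, `FakeService`, and the three callables are hypothetical names invented for illustration, not tooling described in the talk.

```python
from typing import Callable


def self_heal(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    page_on_call: Callable[[str], None],
    max_restarts: int = 3,
) -> bool:
    """Try automated restarts first; page a human only if they all fail.

    Returns True if the service is healthy when the function returns.
    """
    if is_healthy():
        return True
    for _ in range(max_restarts):
        restart()
        if is_healthy():
            return True
    page_on_call(f"service still unhealthy after {max_restarts} restarts")
    return False


class FakeService:
    """Stand-in service that recovers after a set number of restarts."""

    def __init__(self, restarts_needed: int) -> None:
        self.restarts_needed = restarts_needed
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.restarts_needed == 0

    def restart(self) -> None:
        self.restarts += 1
        self.restarts_needed = max(0, self.restarts_needed - 1)


pages: list[str] = []
svc = FakeService(restarts_needed=2)
recovered = self_heal(svc.is_healthy, svc.restart, pages.append)
print("recovered:", recovered, "restarts:", svc.restarts, "pages:", len(pages))
```

Capping the restart count keeps a truly broken service from restarting forever and makes sure a person is eventually told about it.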
AWS - What do SREs touch?
- Since SREs own production, they touch all prod assets
- Company dependent
- Typical things Brennan uses
- EC2, EKS, IAM, Lambda, S3, SNS, RDS, KMS, Route 53, API Gateway, Step Functions, CloudWatch, WAF, Security Groups (IAM)
❓ Questions
5 Years into the industry and no problem?
- Yes
- Backstory: At UML from 2017 to 2020, not great in school. COVID happened, didn't enjoy online school, got an internship. Stayed on through the fall semester, then over winter break decided to stay full time. Obtained a full-time offer and continued working. Got a job at DraftKings after the original company went down, then moved to Aqua Security, and is now at a new company.
- Most important thing is getting internship experience. Internships and co-ops are very important for getting hired.
Still important to get a degree
- Easier to get a job if you have a degree or are still working on one
Work Life balance
- Work to live
- Hard during on-call weeks
- Being able to separate your work life and personal life
- Company culture has more to do with the amount of workload you get as an SRE than the role itself.
- Depends more on your ability to segment or avoid work in your personal time and vice versa
What is ownership:
- Tagging and ensuring compliance
- Responsibility for when things go wrong, how to keep improving
🧑‍💻 Hands-on Section
No hands-on section today, good luck on exams!
🚀 Next Meeting
We'll be having an entertaining and fun night of AWS Trivia with prizes for winners! Be sure to check back on the schedule for more details soon!
🥂 Cheers!