If you haven't already read my "Introduction to Engineering Design Reviews" blog post, you should read that first.
The purpose of an Engineering Design Review (EDR) is to make sure that everyone on the team is on the same page and understands how a system is structured and how its components interact. It is also useful for identifying deficiencies that may have been overlooked, as well as improvements or alternative approaches to the design, by drawing on the team's collective brain power and experience.
Design reviews are NOT intended to be critical in nature or run as a presentation. They are an open and collaborative discussion to ensure that we set each other up for success every time.
Why do we perform design reviews?
- Time efficiency.
Rather than ONLY having conversations over time while the design keeps changing, putting a bit more thought up front into what we are doing and why, and clarifying the requirements, will save everyone time in the long run.
- Clarifying requirements.
Forces a conversation to ensure that we have enough requirements to even start building something.
- Surfacing existing work.
Companies can be big and there may be existing components that can be leveraged.
- Onboarding new engineers.
Creates easily digestible documentation.
- Productive contribution.
Helps catch little details because someone with a different point of view jumps in.
When is a design review necessary?
- A new project, service, or component is being created
- Any time a data contract change is proposed for an established service
- Any significant change to an existing design or deployment
- Any significant change to the requirements/acceptance criteria
Anyone is welcome to participate in the reviews, though there are a few ground rules that everyone needs to follow to keep things as time-efficient as possible:
- No typing except for the note taker (sessions are recorded and archived)
- Only those who have read the supporting documents prior to the meeting may participate
- We are not necessarily looking to answer questions or solve problems in the room, but rather to understand objections and questions
Design Review process
- Write the engineering design document (see Design Document Sample Structure at the bottom of this page)
  - Background (required)
  - Design goals (required)
  - System diagram
  - Design summary
  - Design details
  - Tradeoffs made
- Include any supporting documentation
- The team reviews the document individually and adds any questions they may have
- Review of action items
- Review ground rules/objectives (if new attendees present)
- Quick run-through for a big picture review (e.g. are we in the right ballpark? what are the use cases?)
- Slower, more thorough review
- Next steps for the system discussed
- Questions for management
- Questions for end-users/customers
- Parking lot (interesting ideas that we won't pursue at this time)
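Since the Background and Design goals sections are marked as required, a team could check a draft for them before scheduling the review. The following is only a minimal sketch, assuming the document is written in Markdown with `## ` section headings as in the sample structure; the function name `missing_required_sections` and the rule that a section with no body text counts as missing are assumptions made for illustration, not part of the process described here.

```python
import re

# Assumption: these are the two sections marked "(required)" in the process list.
REQUIRED_SECTIONS = ("Background", "Design Goals")


def missing_required_sections(markdown_text: str) -> list[str]:
    """Return the required sections that are absent or have no body text."""
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        # Only level-2 headings ("## Title") start a new section.
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            current = match.group(1).lower()
            sections[current] = ""
        elif current is not None:
            # Accumulate non-whitespace body text under the current heading.
            sections[current] += line.strip()
    return [name for name in REQUIRED_SECTIONS if not sections.get(name.lower())]


if __name__ == "__main__":
    draft = "## Background\nCheckout is slow.\n## Design Goals\n\n## Solution Summary\nCache it."
    # Design Goals has a heading but no body, so it is reported as missing.
    print(missing_required_sections(draft))
```

A check like this only confirms that the sections exist; whether the content is good enough is still a question for the reviewers.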
Design Document Sample Structure
## Background
Background on the problem we are trying to solve (should include some business value, or at least WHY).

## Design Goals
Requirements and goals of the project. This should include any RTO, RPO, SLO, load assumptions, etc.

## Solution Summary
Summary of the solution in a paragraph or two. Should include assumptions and design limitations or unknown variables (including any proposed divestiture plans or proposals).

## Solution Details
Nitty-gritty details, add/remove sections as necessary. The ones included below are examples.

## System Diagram
UML, data flows, architecture diagram, etc.

## Wire Frames
Is there a UI that needs to be completed? Are there wireframes or user flows for it?

## Code
Are we leveraging any common repositories or libraries? Any open-source packages to accelerate/simplify things? Is this expanding an existing repository or a new repository?

## Testing
What kind of testing are we going to do? Unit testing, integration/regression testing, etc.

## Scaling
What aspects of scaling this up have we considered? If the traffic or data stores grow 10x, are we still okay? What about 100x, 1000x, 10000x? Where are we going to have a problem?

## Operational Details
How are we going to monitor this? Is this deemed a critical system to be added to on-call? What are the key performance indicators and metrics that need to be tracked? What calculations are we going to use to determine availability and other metrics? Are we going to alert on these? What do we do with these events?

## Tradeoffs Made
What big trade-offs were made coming up with this design?

## Other Proposals
What other options were looked at or evaluated and weren't going to work? Why not?

## Appendix
Are there any external supporting docs or references that would help to read/review?
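The Operational Details section asks which calculations will be used to determine availability. One common choice, shown here only as a sketch, is request-based availability (successful requests divided by total requests) together with the downtime budget implied by an SLO. The function names, the 30-day window, and the request counts below are assumptions for illustration; a real design document should state whichever definition the team actually agrees on.

```python
def availability(successful_requests: int, total_requests: int) -> float:
    """Request-based availability: the fraction of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime budget, in minutes, implied by an availability SLO over a window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    minutes_in_window = window_days * 24 * 60
    return (1.0 - slo) * minutes_in_window


if __name__ == "__main__":
    # Hypothetical numbers: 999,500 successes out of 1,000,000 requests.
    print(f"availability: {availability(999_500, 1_000_000):.4%}")
    # A 99.9% SLO over 30 days leaves roughly 43.2 minutes of downtime.
    print(f"budget: {allowed_downtime_minutes(0.999):.1f} minutes")
```

Writing the formula down in the design document makes it easier to agree on alert thresholds, because everyone is measuring against the same definition.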