Software Engineer Interview: How to Design a Search Engine like Google
Recently, I was asked in an interview to design a search engine like Google and to talk through its overall architecture.
Let’s break it down into the following points:
- The overall architecture
- The role of the PageRank algorithm
- The offline metrics and online experiments
The Overall Architecture
First off, it’s good to talk about the overall architecture before diving into the specifics. A search engine at Google’s scale has to index more than 100 billion web pages and serve around 40K searches per second. So the system needs to be fast and scalable, and it needs to keep up with the latest news.
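In an interview, it helps to turn those two numbers into a quick back-of-envelope estimate. The sketch below does that arithmetic; the pages-per-shard and replica counts are hypothetical planning values picked only for illustration, not figures from Google.

```python
# Figures quoted in the paragraph above.
QUERIES_PER_SECOND = 40_000
INDEXED_PAGES = 100_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical planning numbers (assumptions, chosen only for illustration).
PAGES_PER_SHARD = 50_000_000
REPLICAS_PER_SHARD = 3

# Daily query volume implied by the per-second rate.
queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY

# Number of index shards needed, using ceiling division.
shards = -(-INDEXED_PAGES // PAGES_PER_SHARD)

# Each shard is replicated for throughput and fault tolerance.
index_servers = shards * REPLICAS_PER_SHARD

print(f"{queries_per_day:,} queries per day")
print(f"{shards:,} shards, {index_servers:,} index servers")
```

The exact values matter less than the conclusion: the index cannot live on one machine, so it has to be sharded and replicated.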
Query Understanding
When a query comes in, it goes through a spell check and is expanded with additional terms, such as synonyms, to better cover what the user meant. Google usually relies on RankBrain here. RankBrain is a machine-learning system, not a search engine in its own right, that helps Google interpret queries and return more relevant results for users. According to Google, RankBrain is used 10% of the time.
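To make the spell-check and expansion steps concrete, here is a minimal sketch using only the Python standard library. It is not how RankBrain works; the vocabulary, the synonym table, and the function names `correct_spelling` and `understand_query` are all hypothetical, and a real engine would learn these from query logs rather than hard-code them.

```python
import difflib

# Hypothetical toy data; stands in for what a real engine learns from query logs.
VOCABULARY = ["cheap", "flights", "hotel", "london", "paris", "to"]
SYNONYMS = {"cheap": ["budget", "affordable"], "flights": ["airfare"]}


def correct_spelling(token):
    """Return the closest vocabulary word, or the token itself if nothing is close."""
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.7)
    return matches[0] if matches else token


def understand_query(raw_query):
    """Spell-correct each token, then append synonyms as extra search terms."""
    tokens = [correct_spelling(t) for t in raw_query.lower().split()]
    expanded = list(tokens)
    for token in tokens:
        for synonym in SYNONYMS.get(token, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return {"corrected": tokens, "expanded": expanded}


if __name__ == "__main__":
    print(understand_query("cheep flihgts to London"))
```

Keeping the corrected tokens separate from the expanded ones lets a later ranking stage weight the user’s own words more heavily than the added synonyms.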
The query is then matched against an index of documents, most likely by keyword matching, so that retrieval is very fast. A…