Redefining Communications in gRPC, WeJam! (part 1)
Three years ago, in July 2021, I was working at Pathao. We had a massive 3-day hackathon, and our team Genjam (in English: team Chaos) won 💫2nd place 💥. This is the first part of the story of what we did and how we did it.
What did we try to solve?
Let’s set the background first. Pathao is an on-demand digital platform in Bangladesh, though most people know it as a ride-sharing, food delivery & parcel delivery company. I would say Pathao has become the largest tech platform in Bangladesh, where engineers have to solve a wide variety of tech problems every day.
With so many interesting problems around, it’s a bit tough to choose the right one 😅. So we (team Genjam) went looking for a very old and persistent problem...
Push Notification aka. Realtime Notification.
Since the beginning of time at Pathao, we have been heavily dependent on Google’s FCM. Whenever we want to send real-time information to any of our users (drivers, customers, or business partners), on the web or in the mobile app, we use Firebase push notifications or WebSockets. On top of that, we have to maintain millions of HTTP connections with our backend servers.
💡 And either way, those are becoming liabilities.
Why?
First of all, our WebSocket service is very old and does not scale well with our new microservice architecture, which slows us down when shipping new features in the web apps.
Google FCM is not as real-time as advertised, and at Pathao every split second matters. We need notifications to reach our driver app within seconds, but sometimes FCM delivers messages after 30-60 seconds, which Pathao simply can’t afford.
Sometimes FCM behaves differently depending on geolocation: notifications work fine in Bangladesh while lagging in Nepal. That is a nightmare for the support team: orders get canceled, and ride-sharing goes on hold for an entire city 🙈.
At peak time, we miss almost 15% of all the push notifications that should be delivered in real time every day.
At peak time, Pathao has to maintain around 100 million active HTTP connections with backend servers, which causes a lot of issues:
Server Outages
Instant scalability
Network Bandwidth for servers
Network Bandwidth for HTTP clients
Heavy battery drain in mobile clients due to long polling
TBH, every one of these points deserves attention in its own right.
And we thought we should solve this, or at least try to build a technical MVP. Win or lose, this was worth trying, and we could live up to our name Chaos 😂! The solution itself is a chaotic idea 🤯.
What was the idea?
So we were trying to build an architecture with a working prototype called WeJam 📡
💡 💥 A gRPC-based communication system for Clients & Servers
🤔 How would that solve anything ⁉️
This architecture/system can offer things like:
1 connection between the server & client is enough for receiving and sending data.
bi-directional ↔️ communication between server and clients.
No communication lag/loss if the server and client are connected
It can also remove the dependency on FCM.
Low network bandwidth usage because of protobuf
No long polling in mobile apps
Instant Scalability
Can be used for push notifications using bi-directional communication
For prototyping we chose these tools/technologies
Tech/Tool | Usage
gRPC | Communication protocol
gRPC web | gRPC implementation for web apps
Envoy Proxy | Reverse proxy for gRPC
golang | Just another language 🙂
redis | For pub/sub communication
mysql | Just another database
vue.js | For showing the implementation demo in the web app
swift | For showing the implementation demo in the mobile app
So the prototype WeJam had these features
Server
Establish gRPC connection between server & client
Receive data from downstream clients and pass those to proper upstream services
Receive data from upstream services and route it to proper clients
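To make the routing part a bit more concrete, here is a minimal sketch in Go of how a server could route upstream messages to the right connected client. This is not the actual WeJam code: the `Hub`, `Message`, `Register`, and `Route` names are made up for illustration, and a plain Go channel stands in for the send side of each client's gRPC stream.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Message is a hypothetical envelope; a real system would likely use a protobuf message.
type Message struct {
	ClientID string
	Body     string
}

var (
	ErrNotConnected = errors.New("client not connected")
	ErrQueueFull    = errors.New("client queue full")
)

// Hub maps each connected client to the outbound queue of its stream.
type Hub struct {
	mu      sync.RWMutex
	clients map[string]chan Message
}

func NewHub() *Hub {
	return &Hub{clients: make(map[string]chan Message)}
}

// Register is called when a client opens its stream. The returned channel
// stands in for the send side of that client's gRPC stream (an assumption).
func (h *Hub) Register(clientID string, buffer int) <-chan Message {
	ch := make(chan Message, buffer)
	h.mu.Lock()
	h.clients[clientID] = ch
	h.mu.Unlock()
	return ch
}

// Unregister is called when the stream ends; it closes and forgets the queue.
func (h *Hub) Unregister(clientID string) {
	h.mu.Lock()
	if ch, ok := h.clients[clientID]; ok {
		close(ch)
		delete(h.clients, clientID)
	}
	h.mu.Unlock()
}

// Route hands a message from an upstream service to the target client's queue.
// It never blocks: a missing client or a full queue is reported as an error.
func (h *Hub) Route(msg Message) error {
	h.mu.RLock()
	defer h.mu.RUnlock()
	ch, ok := h.clients[msg.ClientID]
	if !ok {
		return ErrNotConnected
	}
	select {
	case ch <- msg:
		return nil
	default:
		return ErrQueueFull
	}
}

func main() {
	hub := NewHub()
	driver := hub.Register("driver-42", 8)

	if err := hub.Route(Message{ClientID: "driver-42", Body: "new ride request"}); err != nil {
		fmt.Println("route failed:", err)
	}
	fmt.Println((<-driver).Body)

	hub.Unregister("driver-42")
	fmt.Println(hub.Route(Message{ClientID: "driver-42", Body: "too late"}))
}
```

Returning an error instead of blocking keeps one slow client from stalling everyone else, and it gives the caller a natural place to decide what to do with the undelivered message.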
Mobile
Build a connection with the upstream gRPC server
Detect connection health and auto re-connect
Transfer data through the gRPC connection
Web
Build a connection with the upstream gRPC server
Transfer data through the gRPC connection
We planned some features while designing the system but couldn’t implement them ☹️, like:
Implement a message communication system
Dead letter Queue for missing messages between the gRPC server and clients
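Since we never built the dead letter queue, the following is only one possible shape for it, sketched in Go under my own assumptions: undelivered messages are parked per client in memory, the backlog is bounded (oldest dropped first), and everything is replayed in order when the client reconnects. `DeadLetterQueue`, `Park`, and `Drain` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a hypothetical envelope for a notification.
type Message struct {
	ClientID string
	Body     string
}

// DeadLetterQueue keeps undelivered messages per client, up to limit each.
type DeadLetterQueue struct {
	mu      sync.Mutex
	limit   int
	pending map[string][]Message
}

func NewDeadLetterQueue(limit int) *DeadLetterQueue {
	return &DeadLetterQueue{limit: limit, pending: make(map[string][]Message)}
}

// Park stores a message that could not be delivered. When the client's
// backlog exceeds the limit, the oldest messages are dropped.
func (q *DeadLetterQueue) Park(msg Message) {
	q.mu.Lock()
	defer q.mu.Unlock()
	backlog := append(q.pending[msg.ClientID], msg)
	if len(backlog) > q.limit {
		backlog = backlog[len(backlog)-q.limit:]
	}
	q.pending[msg.ClientID] = backlog
}

// Drain returns the client's backlog in arrival order and clears it.
// It would be called when the client reconnects.
func (q *DeadLetterQueue) Drain(clientID string) []Message {
	q.mu.Lock()
	defer q.mu.Unlock()
	backlog := q.pending[clientID]
	delete(q.pending, clientID)
	return backlog
}

func main() {
	dlq := NewDeadLetterQueue(2)
	dlq.Park(Message{ClientID: "driver-42", Body: "ride request #1"})
	dlq.Park(Message{ClientID: "driver-42", Body: "ride request #2"})
	dlq.Park(Message{ClientID: "driver-42", Body: "ride request #3"})

	// Only the two most recent messages survive the limit of 2.
	for _, m := range dlq.Drain("driver-42") {
		fmt.Println(m.Body)
	}
}
```

A production version would probably need persistence and message expiry (a ride request from ten minutes ago is useless), but that is beyond this sketch.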
What did we actually do?
Stay Tuned for Part 2 :)