Redefining Communications with gRPC: WeJam! (Part 1)
Three years ago, in July 2021, I was working at Pathao. We had a massive three-day hackathon, and our team Genjam (in English: team Chaos) won 💫 2nd place 💥. This is the first part of the story of what we did and how we did it.
What did we try to solve?
Let’s set the background first. Pathao is an on-demand digital platform in Bangladesh, though most people know it as a ride-sharing, food delivery, and parcel delivery company. I would say Pathao has become the largest tech platform in Bangladesh, where engineers solve a wide variety of tech problems every day.
With so many interesting problems to pick from, it’s a bit tough to choose the right one 😅. So what we (team Genjam) did was look for a very old and persistent problem...
Push Notification, a.k.a. Realtime Notification.
Since the beginning of time at Pathao, we have depended heavily on Google’s FCM (Firebase Cloud Messaging). Whenever we want to send real-time information to users such as drivers, customers, or business partners, on the web or in the mobile app, we use Firebase push notifications or WebSockets. On top of that, we have to maintain millions of HTTP connections with our backend servers.
Why?
- First of all, our WebSocket service is very old and does not scale well with our new microservice architecture, which slows us down when shipping new features in web apps.
- Google FCM is not as real-time as advertised, and at Pathao every split second matters. We need notifications to reach our driver app within seconds, but FCM sometimes delivers messages 30-60 seconds late, which Pathao literally can’t afford.
- FCM sometimes behaves differently depending on geolocation: notifications work fine in Bangladesh while lagging in Nepal. That is a nightmare for the support team. Orders get canceled, and ride-sharing goes on hold for an entire city 🙈.
- At peak time, we miss almost 15% of the push notifications that should be delivered in real time each day.
- At peak time, Pathao has to maintain roughly 100 million active HTTP connections with backend servers, which causes a lot of issues:
  - Server outages
  - Trouble scaling instantly
  - Network bandwidth pressure on servers
  - Network bandwidth pressure on HTTP clients
  - Heavy battery drain in mobile clients due to long polling
TBH, every one of these points deserves attention in its own right.
So we thought we should solve this, or at least try to build a technical MVP. Win or lose, it was worth trying, and we could live up to our name, Chaos 😂! The solution itself was a chaotic idea 🤯.
What was the idea?
So we set out to build an architecture, with a working prototype, called WeJam 📡
🤔 How would that solve anything ⁉️
This architecture/system can provide solutions like:
- One connection between the server and client is enough for both receiving and sending data.
- Bi-directional ↔️ communication between server and clients.
- No communication lag or loss while the server and client are connected.
- It can also remove the dependency on FCM.
- Low network bandwidth usage because of protobuf (Protocol Buffers).
- No long polling in mobile apps.
- Instant scalability.
- Can be used for push notifications over the bi-directional connection.
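To make the "one connection, both directions" idea a bit more concrete, here is a minimal Go sketch. It is not WeJam code and it does not use gRPC: `net.Pipe` from the standard library stands in for a single bi-directional stream, and the function name `exchange` and the sample messages are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// exchange opens ONE in-memory full-duplex connection and uses it in both
// directions at once: the client sends clientMsg upstream while the server
// pushes serverPush downstream. net.Pipe is only a stand-in for a gRPC stream.
// It returns what the server received and what the client received.
func exchange(clientMsg, serverPush string) (string, string, error) {
	clientConn, serverConn := net.Pipe()
	defer clientConn.Close()
	defer serverConn.Close()

	// Server side: push a notification and read the client's message concurrently.
	serverGot := make(chan string, 1)
	go func() { fmt.Fprintln(serverConn, serverPush) }()
	go func() {
		line, _ := bufio.NewReader(serverConn).ReadString('\n')
		serverGot <- strings.TrimSpace(line)
	}()

	// Client side: send upstream in the background, read the push in the foreground.
	go func() { fmt.Fprintln(clientConn, clientMsg) }()
	line, err := bufio.NewReader(clientConn).ReadString('\n')
	if err != nil {
		return "", "", err
	}
	return <-serverGot, strings.TrimSpace(line), nil
}

func main() {
	serverGot, clientGot, err := exchange("driver location update", "new ride request")
	if err != nil {
		fmt.Println("exchange failed:", err)
		return
	}
	fmt.Println("server received:", serverGot)
	fmt.Println("client received:", clientGot)
}
```

The point of the sketch is only that neither side has to poll: both ends read and write on the same open connection, which is the property we wanted from a gRPC bi-directional stream.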
For prototyping, we chose these tools/technologies:
Tech/Tool | Usages |
---|---|
gRPC | Communication Protocol |
gRPC web | gRPC implementation for Web Apps |
Envoy Proxy | Reverse Proxy for gRPC |
golang | Just another language 🙂 |
redis | for pub/sub communication |
mysql | Just another Database |
vue.js | For showing implementation demo in web app |
swift | For showing implementation demo in Mobile App |
So the WeJam prototype had these features:
Server
- Establish gRPC connection between server & client
- Receive data from downstream clients and pass it to the proper upstream services
- Receive data from upstream services and route it to the proper clients
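The routing part of the server list can be sketched roughly as below. This is an assumption-heavy illustration, not the actual WeJam server: the `Hub` and `Message` types, their method names, and the client ID `driver-42` are all hypothetical, and plain Go channels stand in for both the gRPC streams and the redis pub/sub layer.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a hypothetical envelope for data coming from an upstream service.
type Message struct {
	ClientID string
	Payload  string
}

// Hub tracks connected clients and routes upstream messages to them.
type Hub struct {
	mu      sync.RWMutex
	clients map[string]chan Message
}

func NewHub() *Hub {
	return &Hub{clients: make(map[string]chan Message)}
}

// Connect registers a client and returns the channel its stream would drain.
func (h *Hub) Connect(id string, buffer int) <-chan Message {
	ch := make(chan Message, buffer)
	h.mu.Lock()
	h.clients[id] = ch
	h.mu.Unlock()
	return ch
}

// Disconnect removes the client and closes its channel.
func (h *Hub) Disconnect(id string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if ch, ok := h.clients[id]; ok {
		delete(h.clients, id)
		close(ch)
	}
}

// Route delivers msg to its client. It reports false when the client is not
// connected or its buffer is full, so the caller can decide what to do next.
func (h *Hub) Route(msg Message) bool {
	h.mu.RLock()
	defer h.mu.RUnlock()
	ch, ok := h.clients[msg.ClientID]
	if !ok {
		return false
	}
	select {
	case ch <- msg:
		return true
	default:
		return false
	}
}

func main() {
	hub := NewHub()
	inbox := hub.Connect("driver-42", 8)

	delivered := hub.Route(Message{ClientID: "driver-42", Payload: "new ride request"})
	fmt.Println("delivered:", delivered, "payload:", (<-inbox).Payload)

	hub.Disconnect("driver-42")
	fmt.Println("delivered after disconnect:", hub.Route(Message{ClientID: "driver-42", Payload: "late"}))
}
```

A `false` from `Route` is the kind of case a dead letter queue would be meant to catch.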
Mobile
- Build a connection with the upstream gRPC server
- Detect connection health and auto re-connect
- Transfer data through the gRPC connection
Web
- Build a connection with the upstream gRPC server
- Transfer data through the gRPC connection
We also planned some features while designing the system but couldn’t implement them ☹️, such as:
- Implement a message communication system
- Dead letter Queue for missing messages between the gRPC server and clients
What did we actually do?
Stay tuned for Part 2 :)