Project Sentiment Tracker Using AWS Comprehend + Serverless in 1 hour
Using AWS Comprehend + DynamoDB + Serverless
Github is a great place to contribute and create amazing projects. As a project scales, so does the team, and so do the issues raised by users. Most are clear, concise and constructive. Developers leave comments and answer these questions. For a large team, the sentiment around the project needs to be tracked to keep it positive. The goal of this project was to provide a fast way to maintain a signboard showing the overall project sentiment.
AWS Comprehend provides an NLP service that extracts insights from text, including sentiment analysis. I wanted to see how quickly we could go from idea to deployment for a trivial service without having to manage any resources. Below is my experience solving the project sentiment problem with AWS at the heart of the solution.
Project Sentiment Tracker
You can view it at: https://st.arif.work/
- Processes the recent issues and comments in the project
- Analyzes the text for sentiment patterns
- Provides a quick badge for project owners
Analyzer - AWS Comprehend
AWS Comprehend was the obvious choice for this low-volume requirement. The API is easy to use, with no surprises in the usage pricing. Although it can do a lot more, I used it purely for sentiment analysis. In the future, I hope to use Key Phrase Extraction, Topic Modeling and Entity Recognition to tag issues and comments automatically, and possibly even to assign them.
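A minimal sketch of the sentiment call with boto3, assuming the texts are English and that a client created with `boto3.client("comprehend")` is passed in (injecting it keeps the function testable without AWS credentials). It uses `batch_detect_sentiment`, which accepts up to 25 documents per call. The helper names `detect_sentiments`, `truncate_utf8` and `summarize` are hypothetical, not taken from the tracker's source.

```python
from collections import Counter

# Comprehend's batch API takes at most 25 documents per call, and each
# document must stay under the ~5 KB (UTF-8) per-document limit.
MAX_BATCH = 25
MAX_BYTES = 4999


def truncate_utf8(text, limit=MAX_BYTES):
    """Trim text to at most `limit` UTF-8 bytes without splitting a character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def detect_sentiments(texts, client, language="en"):
    """Return one label (POSITIVE, NEGATIVE, NEUTRAL, MIXED or None) per text."""
    labels = [None] * len(texts)
    # Comprehend rejects empty documents, so only non-blank texts are sent.
    trimmed = [(i, truncate_utf8(t or "")) for i, t in enumerate(texts)]
    pending = [(i, t) for i, t in trimmed if t.strip()]
    for start in range(0, len(pending), MAX_BATCH):
        chunk = pending[start:start + MAX_BATCH]
        response = client.batch_detect_sentiment(
            TextList=[text for _, text in chunk], LanguageCode=language
        )
        # "Index" is a position inside this chunk; map it back to `texts`.
        # Failed documents appear in response["ErrorList"] and stay None.
        for result in response["ResultList"]:
            labels[chunk[result["Index"]][0]] = result["Sentiment"]
    return labels


def summarize(labels):
    """Count sentiment labels, skipping texts that were not analyzed."""
    return Counter(label for label in labels if label is not None)
```

With a real client, `summarize(detect_sentiments(texts, boto3.client("comprehend")))` would yield counts such as `Counter({"POSITIVE": 12, "NEUTRAL": 5})`, which is all a badge needs.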
Platform - Serverless - AWS Lambda
The API endpoints are simple. Apart from a small OAuth workflow, nothing adds complexity, which makes this a perfect recipe for a serverless solution on AWS Lambda. I used Flask as the web framework and Zappa to deploy it to Lambda.
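The Flask side might look roughly like the sketch below. The route paths, the `get_summary` hook and the repo-name pattern are hypothetical, and the OAuth workflow is left out. Flask is imported inside `create_app` so the validation helper can be used without it.

```python
import re

# Rough, hypothetical check for "owner/name"; Github's real rules are stricter.
REPO_PATTERN = re.compile(r"[A-Za-z0-9-]+/[A-Za-z0-9._-]+")


def parse_repo_name(value):
    """Return a trimmed 'owner/name' string, or None if it does not look valid."""
    value = (value or "").strip()
    return value if REPO_PATTERN.fullmatch(value) else None


def create_app(get_summary):
    """Build the Flask app; `get_summary(repo)` is a hypothetical lookup hook."""
    # Imported here so parse_repo_name can be used and tested without Flask.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/track", methods=["POST"])
    def track():
        # Submitting a repo name (the OAuth step is omitted in this sketch).
        repo = parse_repo_name(request.form.get("repo"))
        if repo is None:
            return jsonify(error="expected owner/name"), 400
        return jsonify(repo=repo, summary=get_summary(repo))

    @app.route("/summary/<owner>/<name>")
    def summary(owner, name):
        repo = parse_repo_name(f"{owner}/{name}")
        if repo is None:
            return jsonify(error="invalid repo"), 400
        return jsonify(repo=repo, summary=get_summary(repo))

    return app
```

For Zappa, a module such as `app.py` would expose `app = create_app(...)`, and the `app_function` setting in `zappa_settings.json` would point at `app.app`.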
Database - DynamoDB
I had never used DynamoDB in production before. It has some nuances, but for this small use case it was the easiest option to get going. I used boto3 to connect to AWS services, which was a breeze.
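One way to implement a 24-hour cache in DynamoDB is to lean on its Time to Live (TTL) feature. The sketch below assumes a table (for example `boto3.resource("dynamodb").Table("sentiment-cache")`, name hypothetical) with `repo` as its partition key and TTL enabled on a numeric `expires_at` attribute holding epoch seconds. One nuance worth knowing: TTL deletion runs in the background, so an expired item can still be read until it is removed, and the code checks the expiry itself.

```python
import time

CACHE_SECONDS = 24 * 60 * 60  # the 24-hour cache window


def put_summary(table, repo, summary, now=None):
    """Store a sentiment summary with an epoch-seconds expiry attribute."""
    now = int(time.time() if now is None else now)
    table.put_item(Item={
        "repo": repo,                       # assumed partition key
        "summary": summary,                 # e.g. {"POSITIVE": 31, "NEGATIVE": 1}
        "expires_at": now + CACHE_SECONDS,  # assumed TTL attribute name
    })


def get_summary(table, repo, now=None):
    """Return a cached summary, or None if it is missing or past its expiry."""
    now = int(time.time() if now is None else now)
    item = table.get_item(Key={"repo": repo}).get("Item")
    # Expired items may linger until TTL deletes them, so check explicitly.
    if item is None or int(item["expires_at"]) <= now:
        return None
    return item["summary"]
```

Note that boto3 returns DynamoDB numbers as `decimal.Decimal`, so the counts read back from the table are Decimals rather than ints.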
Github Connector - PyGithub
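A possible shape for the PyGithub connector, as a minimal sketch: it pulls the most recently updated issues and issue comments of a repository. The function name and the limits (20 issues, 50 comments) are assumptions. `repo` would come from `Github(token).get_repo("owner/name")`; passing it in keeps the function independent of authentication.

```python
from itertools import islice


def recent_texts(repo, max_issues=20, max_comments=50):
    """Collect text from the most recently updated issues and comments.

    `repo` is expected to behave like a PyGithub Repository object.
    """
    texts = []
    # Github's issues API also returns pull requests, so they are included.
    issues = repo.get_issues(state="all", sort="updated", direction="desc")
    # islice stops early, so only the pages actually needed are fetched.
    for issue in islice(issues, max_issues):
        texts.append(f"{issue.title}\n{issue.body or ''}".strip())
    comments = repo.get_issues_comments(sort="updated", direction="desc")
    for comment in islice(comments, max_comments):
        if comment.body:
            texts.append(comment.body)
    return texts
```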
Presentation - Shields.io
Shields.io is the gold standard for presenting data points about a Github project. Apart from the vast array of services it supports (I was really surprised by how much comes out of the box!), it also lets you integrate your own endpoint. The tracker integrates with Shields.io to give the repo owner a badge that shows the current sentiment of the project.
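Shields.io's endpoint badge fetches a small JSON document (`schemaVersion`, `label`, `message`, `color`) from a URL you control and renders it. The sketch below maps sentiment counts to that JSON; the label text, the percentage message and the color choices are assumptions, not the tracker's actual output.

```python
from urllib.parse import quote

# Hypothetical color per dominant sentiment (all are Shields.io named colors).
COLORS = {
    "POSITIVE": "brightgreen",
    "NEUTRAL": "blue",
    "MIXED": "yellow",
    "NEGATIVE": "red",
}


def shields_payload(summary):
    """Build a Shields.io endpoint-badge JSON object from sentiment counts."""
    total = sum(summary.values())
    if not total:
        return {"schemaVersion": 1, "label": "sentiment",
                "message": "no data", "color": "lightgrey"}
    # Dominant label; ties go to whichever label appears first in `summary`.
    dominant = max(summary, key=summary.get)
    share = round(100 * summary[dominant] / total)
    return {
        "schemaVersion": 1,
        "label": "sentiment",
        "message": f"{dominant.lower()} {share}%",
        "color": COLORS.get(dominant, "lightgrey"),
    }


def badge_url(json_url):
    """Shields.io URL that renders the badge described at `json_url`."""
    return "https://img.shields.io/endpoint?url=" + quote(json_url, safe="")
```

The badge image is then requested from `https://img.shields.io/endpoint?url=` followed by the URL-encoded address of the JSON endpoint, which is what `badge_url` builds.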
The idea is simple:
- Submit a Github repo name
- Authenticate with Github
- Tracker connects with Github and fetches the most recent issues and comments
- Submit them to AWS Comprehend for analysis
- Store response in cache for 24 hours
- Shields.io requests the badge, which is served from the sentiment cache
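The steps above can be sketched as a single cache-or-compute function. This is a minimal sketch under stated assumptions: `cache` is any dict-like store of `(computed_at, summary)` pairs, and `fetch_texts` and `analyze` are hypothetical hooks standing in for the Github and Comprehend calls.

```python
import time

CACHE_SECONDS = 24 * 60 * 60  # recompute at most once per 24 hours


def sentiment_for(repo, cache, fetch_texts, analyze, now=None):
    """Serve a repo's sentiment summary, recomputing only when stale.

    cache:        dict-like, repo -> (computed_at, summary)  (hypothetical)
    fetch_texts:  callable(repo) -> list of issue/comment texts
    analyze:      callable(texts) -> dict of sentiment counts
    """
    now = time.time() if now is None else now
    entry = cache.get(repo)
    if entry is not None and now - entry[0] < CACHE_SECONDS:
        return entry[1]  # fresh hit: no Github or Comprehend calls
    summary = analyze(fetch_texts(repo))
    cache[repo] = (now, summary)
    return summary
```

Only a missing or stale entry triggers calls to Github and Comprehend, so repeated badge requests from Shields.io stay cheap and do not eat into rate limits.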
Source code is available at https://github.com/kontinuity/github-sentiment-tracker
- OAuth support for private repos
- Refresh repo sentiment on updates
- Circumvent Github rate limits
In part 2, I will walk through the setup and organization of the code. I hope you enjoyed the article.