Sentiment Analysis of Movie Reviews

This post classifies text data from a data source consisting of movie reviews. Text data must be preprocessed before machine learning techniques can be applied to it. I labeled each review as positive or negative: 1 if the rating is greater than 7, and 0 if the rating is less than 4. Some of the data is unlabeled, and I did not include it in my analysis.
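The labeling rule can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the post: the function name `label_review`, the sample reviews, and the choice to treat ratings from 4 to 7 as unlabeled are all assumptions, since the rule only defines labels for ratings above 7 and below 4.

```python
from typing import Optional


def label_review(rating: int) -> Optional[int]:
    """Map a numeric rating to a sentiment label (hypothetical helper)."""
    if rating > 7:
        return 1  # positive review
    if rating < 4:
        return 0  # negative review
    # Assumption: ratings from 4 to 7 fall outside the rule, so they get no label.
    return None


# Made-up sample data for illustration only: (review text, rating).
reviews = [("Loved it", 9), ("Terrible", 2), ("It was fine", 5)]

# Keep only the reviews that receive a label, mirroring the exclusion of unlabeled data.
labeled = [
    (text, label_review(rating))
    for text, rating in reviews
    if label_review(rating) is not None
]
print(labeled)
```

Returning `None` for the middle ratings keeps them easy to filter out before training, instead of forcing them into one of the two classes.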



The following story will be a long one to read, so let’s not spend our time on an introduction to Airflow.

I was working on a Data Engineering project in which I had several tasks on my plate: parsing JSON data that described various data sources, and picking out the data sources related to COVID-19. Then I had to download the data, which was in JSON format, convert it into a Pandas DataFrame, and store it in a Postgres database. These activities then had to be scheduled to run periodically using Apache Airflow (a workflow scheduler commonly used to automate ETL jobs). …


Recommendation System

Imagine you are on an online shopping spree, looking at various winter jackets to buy so that you can withstand the cold, windy weather this year (it is going to be a bit windy this year). You loved a jacket from a particular brand named: 1. You viewed the jacket for several minutes and read the whole description, including the material, the size, and so on. Then you wanted to explore some other brands to see whether they had items similar to the one you had viewed for so long. And Boom…



I worked for Hartford Steam Boiler (HSB) in the summer and fall of 2020 (remotely, due to COVID-19) as a Machine Learning Intern. I got the chance to work on HSB's large-scale machine learning projects with state-of-the-art big data technologies such as PySpark, SparkR, Hadoop, and Scala. I was developing a prediction model (a Bayesian hierarchical model) using a Gibbs sampler (R-JAGS), which involved writing around a thousand lines of R code. I was able to run longer Markov Chain Monte Carlo (MCMC) chains and deploy the model and its predictions into production. I was amazed…

Uttasarga Singh

Machine Learning Engineer / Software Developer with more than 3 years of experience developing and deploying machine learning models and web-based applications.
