
About Data Science Projects

In my data science projects, I have applied a wide range of machine learning methods, including regression, classification, clustering, NLP, and neural networks. I have used techniques such as decision trees, random forests, support vector machines, and deep learning models, and relied on feature engineering, model tuning, and cross-validation to improve predictions. I work in Python and R and am experienced with libraries like scikit-learn and TensorFlow. The projects below span several domains, including healthcare, gold prices, housing, food delivery, and air travel.


Covid-19 Detection

In this project, I developed a Convolutional Neural Network (CNN) model to classify COVID-19 patients based on their X-ray images. The model reached an accuracy of 97% in distinguishing COVID-19 positive from negative cases. I then deployed it as a web application using Streamlit, which allows easy access and real-time predictions. Visit my GitHub repository to explore this project.
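For a diagnostic classifier, accuracy is usually read alongside sensitivity and specificity, since a missed positive and a false alarm carry different costs. The sketch below shows how those figures could be computed from true and predicted labels. The function names and the sample labels are made up for illustration and are not the project's data or code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classifier."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of actual positives that the model caught."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual negatives that the model cleared."""
    return tn / (tn + fp)


# Hypothetical labels: 1 = COVID-19 positive, 0 = negative.
sample_true = [1, 1, 1, 0, 0, 0, 1, 0]
sample_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(sample_true, sample_pred)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```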

Sentiment Analysis

This project uses 20,492 reviews and ratings of one hotel, and the objective is to predict the sentiment of each review. First, I cleaned the data and carried out exploratory data analysis (EDA), then created word clouds for the entire dataset and for each rating category. I removed all stopwords and punctuation marks and lemmatized the text to reduce words to their base forms. I then built a model using the NLTK and scikit-learn libraries, which achieved an accuracy of 86%. The model predicts whether the sentiment of a review is 'GOOD', 'BAD', or 'NEUTRAL'. Finally, I deployed the model on Streamlit.
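The cleaning steps described above (lowercasing, stripping punctuation, dropping stopwords) can be sketched in plain Python. This is a minimal illustration, not the project's code: the stopword set is a tiny made-up sample, where a real pipeline would typically use a fuller list such as the one shipped with NLTK. The rating thresholds in `rating_to_sentiment` are also an assumption, since the text does not say how ratings were mapped to labels.

```python
import string

# Tiny illustrative stopword set (assumption, not the list used in the project).
STOPWORDS = {"the", "a", "an", "was", "is", "and", "it", "to", "of", "in"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_review(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    no_punct = text.lower().translate(_PUNCT_TABLE)
    return [word for word in no_punct.split() if word not in STOPWORDS]


def rating_to_sentiment(rating):
    """Map a 1-5 rating to a label. The thresholds are hypothetical."""
    if rating >= 4:
        return "GOOD"
    if rating == 3:
        return "NEUTRAL"
    return "BAD"


tokens = clean_review("The room was clean, and the staff is friendly!")
print(tokens)
print(rating_to_sentiment(5), rating_to_sentiment(3), rating_to_sentiment(1))
```

Lemmatization would follow as a further step on the token list. It is left out here because it needs a dictionary-backed library.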


Liver Disease Prediction

This is a classification project. The dataset contains blood samples from 616 individuals, covering five classes: cirrhosis, fibrosis, hepatitis, suspect_disease, and no_disease. The goal is to train a model on this dataset and build software that can identify the class of liver disease from a given sample.


Gold Price Prediction

This is a forecasting project. The gold price data runs from 01-01-2016 to 21-12-2021, and the goal is to forecast the price for the next 30 days, that is, the period starting on 22-12-2021 and running to about 22-01-2022. For this 30-day forecast I built an LSTM neural network model, and it provided accurate predictions. For more details, please check the .ipynb file.
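A sequence model such as an LSTM is usually trained on sliding windows of past prices and then rolled forward one step at a time, feeding each prediction back in as input. The sketch below shows that framing in plain Python. The window size is arbitrary, and `mean_of_window` is only a stand-in for a trained model, so nothing here reflects the notebook's actual code or settings.

```python
def make_windows(series, window):
    """Turn a series into (inputs, targets) pairs for supervised training.

    Each input is `window` consecutive values; the target is the next value.
    """
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


def recursive_forecast(predict_next, history, window, steps):
    """Forecast `steps` values ahead, feeding each prediction back in."""
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_next(buffer)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new forecast.
        buffer = buffer[1:] + [next_value]
    return forecasts


def mean_of_window(window_values):
    """Placeholder for a trained model (assumption for illustration only)."""
    return sum(window_values) / len(window_values)


inputs, targets = make_windows([1, 2, 3, 4, 5], window=2)
print(inputs, targets)

next_30_days = recursive_forecast(mean_of_window, [10.0, 20.0], window=2, steps=30)
print(len(next_30_days), next_30_days[:2])
```

Because every step builds on earlier predictions, errors can compound over the 30-day horizon, which is worth keeping in mind when reading long-range forecasts.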


House Price Prediction

This was a Kaggle competition project, which I also entered. The training data contains 1460 rows and 81 columns, and the testing data contains 1459 rows and 80 columns. The data is therefore split almost evenly between training and testing, so the model has to perform well on a test set about as large as the training set. In this project, I used the AutoSklearn model, which gave me an 87% accuracy. To explore the entire project, you can check this

ETA For Food Delivery

This ETA project is also from a Kaggle competition. The dataset contains 45,593 separate text files in the training data and 11,399 text files in the testing data. Each text file contains information about one user, and the task is to build a model that predicts the estimated time of food arrival for the 11,399 customers in the test set. XGBoost is a good fit for this kind of problem, so I built an XGBRegressor model. On evaluation, it reached an RMSE of 5.71. You can check my full project here.
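RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values, so it is expressed in the same unit as the target. A minimal sketch of the calculation follows. The sample values are invented for illustration and are not taken from the competition data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical delivery times and predictions.
sample_actual = [30, 25, 40, 35]
sample_predicted = [24, 33, 40, 35]
print(rmse(sample_actual, sample_predicted))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.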


Airline Passenger Traffic Forecasting

This project uses the SARIMAX model to forecast airline passenger traffic from historical data, with the aim of supporting airlines in their decision-making. SARIMAX accounts for seasonality, trends, and exogenous variables when producing its predictions. The project covers time series analysis, data preprocessing, and model selection, and is intended for airlines and industry professionals who want to optimize operations and plan for future passenger demand.
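One way seasonal models of this family handle a repeating pattern is seasonal differencing: subtracting from each observation the value one full season earlier. The sketch below shows only that single step in plain Python. The period and the sample series are assumptions for illustration, and the project itself relied on a SARIMAX implementation rather than hand-written code like this.

```python
def seasonal_difference(series, period):
    """Subtract from each value the value one season (`period` steps) earlier."""
    if period <= 0 or period >= len(series):
        raise ValueError("period must be positive and shorter than the series")
    return [series[i] - series[i - period] for i in range(period, len(series))]


# Hypothetical series with a repeating 4-step pattern and a steady rise.
sample_series = [10, 20, 30, 40, 12, 22, 32, 42]
print(seasonal_difference(sample_series, period=4))
```

After differencing, the repeating pattern is gone and only the change from one season to the next remains, which is easier for the non-seasonal part of the model to fit.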
