Karan Patil | Data Analyst
About Data Science Projects
In my data science projects, I have applied a wide range of machine learning methods, including regression, classification, clustering, NLP, and neural networks. I have used techniques such as decision trees, random forests, support vector machines, and deep learning models, and relied on feature engineering, model tuning, and cross-validation to get accurate predictions and useful insights. I am proficient in Python and R and experienced with libraries such as scikit-learn and TensorFlow. I have a strong track record of delivering results across diverse domains.
Covid-19 Detection
In this project, I developed a Convolutional Neural Network (CNN) model to classify patients as COVID-19 positive or negative from their X-ray images. The model reached an accuracy of 97%. I then deployed it as a web application using Streamlit, enabling easy access and real-time predictions. The project applies deep learning to medical imaging for healthcare diagnostics. Visit my GitHub repository to explore this project.
Sentiment Analysis
In this project, we have 20,492 reviews and ratings of one hotel, and the objective is to predict the sentiment of each review. First, I completed the data cleaning and exploratory data analysis (EDA) phase. Then, I created word clouds for the entire dataset and for each rating category. I removed all stopwords and punctuation marks, and then lemmatized the text so that inflected forms of a word are reduced to a common base form. Next, I built a model using the NLTK and scikit-learn libraries, which achieved an accuracy of 86%. The model predicts whether the sentiment of a review is 'GOOD', 'BAD', or 'NEUTRAL'. Finally, I deployed the model on Streamlit.
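The cleaning steps described above can be sketched in a few lines of plain Python. This is only an illustration, not the project's code: the stopword list is a tiny hypothetical stand-in for NLTK's, and the `rating_to_sentiment` mapping assumes a 1 to 5 star scale with thresholds chosen for the example.

```python
import string

# Hypothetical, deliberately tiny stopword list; NLTK's real list is much longer.
STOPWORDS = {"the", "a", "an", "was", "is", "and", "of", "to", "in", "it"}


def clean_review(text: str) -> list[str]:
    """Lowercase a review, strip punctuation, and drop stopwords."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOPWORDS]


def rating_to_sentiment(rating: int) -> str:
    """Assumed mapping from a 1-5 star rating to a sentiment label."""
    if rating >= 4:
        return "GOOD"
    if rating <= 2:
        return "BAD"
    return "NEUTRAL"


tokens = clean_review("The room was clean, and the staff is friendly!")
print(tokens)                  # ['room', 'clean', 'staff', 'friendly']
print(rating_to_sentiment(3))  # NEUTRAL
```

Lemmatization is left out here because it needs a dictionary-backed tool such as the one in NLTK; the token list produced by `clean_review` is the kind of input it would receive.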
Liver Disease Prediction
This is a classification project. The dataset contains blood samples from 616 individuals, covering five classes: cirrhosis, fibrosis, hepatitis, suspect_disease, and no_disease. The goal is to train a model on this dataset and build fully functional software that can accurately diagnose the specific type of liver disease from a given sample.
Gold Price Prediction
This is a forecasting project. We have gold price data from 01-01-2016 to 21-12-2021, and the goal is to forecast the gold price for the next 30 days, which covers the period from 22-12-2021 to 22-01-2022. For this, I built an LSTM neural network model, and it provided accurate predictions. For more details, please check the .ipynb file.
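A common way to set up this kind of multi-day forecast is to train on sliding windows and then roll the model forward one day at a time. The sketch below assumes that setup; the notebook may do it differently. The names `make_windows`, `forecast`, and `mean_predictor` are hypothetical, the prices are made up, and `mean_predictor` is only a placeholder for a trained LSTM.

```python
def make_windows(series, lookback):
    """Split a series into (inputs, target) pairs for one-step-ahead training."""
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return inputs, targets


def forecast(predict_next, history, lookback, horizon):
    """Roll a one-step predictor forward, feeding each prediction back in."""
    window = list(history[-lookback:])
    predictions = []
    for _ in range(horizon):
        next_value = predict_next(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]  # slide the window by one step
    return predictions


def mean_predictor(window):
    """Placeholder for a trained model: predicts the mean of the window."""
    return sum(window) / len(window)


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up values, not real gold prices
X, y = make_windows(prices, lookback=3)
future = forecast(mean_predictor, prices, lookback=3, horizon=30)
print(X, y)
print(len(future))
```

Because each prediction is fed back in as input, errors can compound over the 30 steps, which is worth keeping in mind when judging the later days of the forecast.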
House Price Prediction
This was a Kaggle competition project, which I also entered. The training data contains 1460 rows and 81 columns, and the testing data contains 1459 rows and 80 columns. The data is therefore split almost exactly in half between training and testing, so the model has to perform well on a test set as large as the training set. In this project, I used the AutoSklearn model, which gave me an 87% accuracy. To explore the entire project, you can check this
ETA For Food Delivery
This ETA project is also from a Kaggle competition. The dataset contains 45,593 separate text files in the training data and 11,399 text files in the testing data. Each text file contains information about one user, so the task is to build a model that predicts the estimated food arrival time for 11,399 customers. I chose XGBoost, which is well suited to this kind of regression problem, and built an XGBRegressor model. On evaluation, the model achieved an RMSE of 5.71. You can check my full project here.
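For reference, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values, so it is expressed in the same unit as the target. A minimal sketch follows; the sample values are made up and the unit of minutes is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up delivery times (unit assumed to be minutes), not project data.
actual_minutes = [30.0, 25.0, 40.0]
predicted_minutes = [27.0, 29.0, 40.0]
print(round(rmse(actual_minutes, predicted_minutes), 3))  # 2.887
```

Squaring the errors before averaging means a few large misses raise the RMSE more than many small ones.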
Airline Passenger Traffic Forecasting
This airline passenger traffic forecasting project leverages the SARIMAX model to predict future trends in air travel. By analyzing historical data, the project accurately forecasts passenger traffic, aiding airlines in their decision-making process. The SARIMAX model incorporates seasonality, trends, and exogenous variables to provide robust predictions. With a focus on accuracy and reliability, this project showcases expertise in time series analysis, data preprocessing, and model selection. It offers valuable insights for airlines and industry professionals looking to optimize operations and plan for future passenger demand.
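One building block of the seasonal part of a SARIMAX model is seasonal differencing: subtracting from each observation the value one full season earlier. The sketch below shows only that step, in plain Python, with made-up numbers and a season length of 4 chosen for brevity. It is not the project's code, and a full SARIMAX fit would normally be done with a statistics library.

```python
def seasonal_difference(series, period):
    """Return series[t] - series[t - period] for every t >= period."""
    if period <= 0 or period >= len(series):
        raise ValueError("period must be positive and shorter than the series")
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Made-up passenger counts with a repeating 4-step pattern plus slow growth.
passengers = [100, 120, 140, 110, 104, 125, 146, 113]
print(seasonal_difference(passengers, period=4))  # [4, 5, 6, 3]
```

After differencing, the repeating seasonal pattern is removed and what remains is the change from one season to the next, which is easier to model.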