Provide a centralized and easily accessible Congo River Basin rainforest data dashboard and repository
Python (FastAPI | SQLAlchemy | Pandas | Pytest) | JSON | REST API | AWS | PostgreSQL
- Built from scratch in less than 8 weeks with a team of eight other engineers and one UI/UX designer in an Agile environment
- Personally designed REST API functionality and unit tests; architected a database covering over 4,000 affected species; collected, scraped, and investigated messy species data
- Designed, tested, and deployed the API application using FastAPI and SQLAlchemy; deployed the database on AWS RDS with PostgreSQL
Source code can be found here.

Spotify Song Suggester Data Backend
Suggest new Spotify tracks related to a user's input, based on the tracks' audio features
Python (Flask | SQLAlchemy | Pandas | scikit-learn | matplotlib | NumPy) | JSON | REST API | AWS | PostgreSQL | Heroku
- Built from scratch in less than 1 week with a team of eight other engineers in an Agile environment
- Personally designed and built functionality for 8 REST API routes; architected a database with over 131,000 tracks
- Developed the API application using Flask and SQLAlchemy; deployed the database on AWS RDS with PostgreSQL; merged and analyzed data with Pandas
Source code can be found here.

Microsoft Malware Prediction
Predictive modeling and machine learning on Microsoft's large malware dataset
Python (matplotlib | Dask | NumPy | Pandas | scikit-learn)
- Performed analysis individually in less than 1 week
- Collected, cleaned, investigated, and visualized over 17,000,000 rows of data; trained and interpreted machine learning models on a 5% subset
- Wrote a custom function for minimizing the size of a DataFrame in memory
- Built and validated machine learning models in scikit-learn; manipulated data with Pandas and NumPy
Source code can be found here.
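A minimal sketch of what a DataFrame memory-minimizing function of this kind might look like. This is not the project's actual code: the function name `shrink_dataframe`, the `category_threshold` parameter, and the choice to downcast numeric columns and convert low-cardinality text columns to categories are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api import types as pdt


def shrink_dataframe(df: pd.DataFrame, category_threshold: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with columns stored in smaller dtypes where possible.

    Hypothetical helper: the name and threshold are illustrative assumptions.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pdt.is_integer_dtype(series):
            # Pick the smallest integer type that can hold every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pdt.is_float_dtype(series):
            # float64 -> float32 where values fit; this trades away some precision.
            out[col] = pd.to_numeric(series, downcast="float")
        elif pdt.is_object_dtype(series) or pdt.is_string_dtype(series):
            # Text columns with few distinct values are cheaper as categories.
            unique_ratio = series.nunique(dropna=True) / max(len(series), 1)
            if unique_ratio < category_threshold:
                out[col] = series.astype("category")
    return out


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "count": np.arange(1_000, dtype="int64"),
            "score": np.linspace(0.0, 1.0, 1_000),
            "label": ["clean", "infected"] * 500,
        }
    )
    before = demo.memory_usage(deep=True).sum()
    after = shrink_dataframe(demo).memory_usage(deep=True).sum()
    print(f"{before} bytes -> {after} bytes")
```

Downcasting floats to 32 bits reduces precision, so a function like this is best applied to features where that loss is acceptable.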

Expedia Hotel Data Storytelling
Data storytelling and visualization project on 2015 Expedia hotel data for further business analysis
Python (matplotlib | Seaborn | NumPy | Pandas | scikit-learn)
- Performed analysis individually in less than 1 week
- Collected, cleaned, investigated, and visualized data; trained and interpreted a clustering algorithm on 6,000,000 rows
- Visualized data with customized graphs built in matplotlib and Seaborn; manipulated data with Pandas and NumPy
Source code can be found here.
