Projects

Ongoing and previous work

Provides a centralized, easily accessible data dashboard and repository for the Congo River Basin rainforest

Python (FastAPI | SQLAlchemy | Pandas | Pytest) | JSON | REST API | AWS | PostgreSQL

Built from scratch in less than 8 weeks with a team of eight other engineers and one UI/UX designer in an Agile environment
Personally devised REST API functionality and unit tests; architected a database with data on over 4,000 affected species; collected, scraped, and investigated messy species data
Designed, tested, and deployed the API application using FastAPI and SQLAlchemy; deployed the database on AWS RDS with PostgreSQL

Source code can be found here.

Spotify Song Suggester Data Backend

Python (Flask | SQLAlchemy | Pandas | SciKit-Learn | matplotlib | NumPy) | JSON | REST API | AWS | PostgreSQL | Heroku

Built from scratch in less than 1 week with a team of eight other engineers in an Agile environment
Personally designed and built functionality for 8 REST API routes; architected a database with over 131,000 tracks
Developed the API application using Flask and SQLAlchemy; deployed the database on AWS RDS with PostgreSQL; merged and analyzed data with Pandas

Source code can be found here.

Microsoft Malware Prediction

Predictive modeling and machine learning on the large Microsoft Malware dataset

Python (matplotlib | dask | NumPy | Pandas | SciKit-Learn)

Performed analysis individually in less than 1 week
Collected, cleaned, investigated, and visualized over 17,000,000 rows of data; trained and interpreted machine learning models on a 5% subset
Wrote a custom function for minimizing the size of a DataFrame in memory
Built and validated machine learning models in SciKit-Learn; manipulated data with Pandas and NumPy
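
The memory-minimizing function mentioned above is not reproduced on this page. As a rough illustration of the general idea only, here is a minimal sketch that downcasts numeric columns and converts low-cardinality text columns to categoricals. The function name `shrink_dataframe` and the `category_threshold` parameter are hypothetical, and this is not the project's actual code.

```python
import numpy as np
import pandas as pd


def shrink_dataframe(df: pd.DataFrame, category_threshold: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with columns stored in smaller dtypes where possible.

    Hypothetical helper for illustration; not the project's implementation.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if isinstance(series.dtype, pd.CategoricalDtype):
            # Already compact; leave as-is.
            continue
        if pd.api.types.is_integer_dtype(series):
            # Pick the smallest integer type that can hold every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # Assumption: float32 precision is acceptable for this column.
            out[col] = pd.to_numeric(series, downcast="float")
        elif series.dtype == object or isinstance(series.dtype, pd.StringDtype):
            # Text columns with few distinct values are cheaper as categoricals.
            if len(series) > 0 and series.nunique() / len(series) < category_threshold:
                out[col] = series.astype("category")
    return out


if __name__ == "__main__":
    demo = pd.DataFrame({
        "count": np.arange(100, dtype="int64"),
        "label": ["a", "b"] * 50,
    })
    before = demo.memory_usage(deep=True).sum()
    after = shrink_dataframe(demo).memory_usage(deep=True).sum()
    print(f"bytes before: {before}, bytes after: {after}")
```

Downcasting floats trades precision for space, so a sketch like this suits exploratory work on wide datasets better than columns where exact values matter.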

Source code can be found here.

Expedia Hotel Data Storytelling

Data storytelling and visualization project on 2015 Expedia hotel data for further business analysis

Python (matplotlib | Seaborn | NumPy | Pandas | SciKit-Learn)

Performed analysis individually in less than 1 week
Collected, cleaned, investigated, and visualized data; trained and interpreted a clustering algorithm on 6,000,000 rows
Visualized data with customized graphs built in matplotlib and Seaborn; manipulated data with Pandas and NumPy

Source code can be found here.