A centralized, easily accessible data dashboard and repository for the Congo River Basin rainforest

Python (FastAPI | SQLAlchemy | Pandas | Pytest) | JSON | REST API | AWS | PostgreSQL

  • Built from scratch in less than 8 weeks with a team of eight other engineers and one UI/UX designer in an Agile environment
  • Personally devised the REST API functionality and unit tests; architected a database covering over 4,000 affected species; collected, scraped, and investigated messy species data
  • Designed, tested, and deployed the API application using FastAPI and SQLAlchemy; deployed the database on AWS RDS with PostgreSQL

Source code can be found here.


Spotify Song Suggester

Spotify Song Suggester Data Backend

Python (Flask | SQLAlchemy | Pandas | scikit-learn | matplotlib | NumPy) | JSON | REST API | AWS | PostgreSQL | Heroku

  • Built from scratch in less than 1 week with a team of eight other engineers in an Agile environment
  • Personally designed and built the functionality for 8 REST API routes; architected a database holding over 131,000 tracks
  • Developed the API application using Flask and SQLAlchemy; deployed the database on AWS RDS with PostgreSQL; merged and analyzed data with Pandas

Source code can be found here.


Microsoft Malware Prediction

Predictive modeling and machine learning on the large Microsoft Malware Prediction dataset

Python (matplotlib | Dask | NumPy | Pandas | scikit-learn)

  • Performed analysis individually in less than 1 week
  • Collected, cleaned, investigated, and visualized over 17,000,000 rows of data; trained and interpreted machine learning models on a 5% subset
  • Wrote a custom function for minimizing the size of a DataFrame in memory
  • Built and validated machine learning models in scikit-learn; manipulated data with Pandas and NumPy
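
The memory-minimizing function mentioned above is not reproduced here. As a rough illustration of the general idea only, the sketch below downcasts numeric columns and converts low-cardinality text columns to categoricals. The name `shrink_dataframe` and the `category_threshold` parameter are hypothetical and are not taken from the project's source code.

```python
import numpy as np
import pandas as pd


def shrink_dataframe(df: pd.DataFrame, category_threshold: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with columns stored in smaller dtypes where possible.

    Hypothetical sketch: the project's actual function may differ.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_bool_dtype(series):
            # Booleans already use one byte per value.
            continue
        if pd.api.types.is_integer_dtype(series):
            # Smallest signed integer type that can hold every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # float64 -> float32 where the values fit; this trades precision for memory.
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series):
            # Assumes the column holds hashable values such as strings.
            # Convert only when distinct values are a small share of the rows.
            if len(series) > 0 and series.nunique() / len(series) < category_threshold:
                out[col] = series.astype("category")
    return out


# Small demonstration frame (made-up values, not the malware dataset).
demo = pd.DataFrame(
    {
        "count": [1, 2, 3, 4, 5, 6],
        "score": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
        "label": ["x", "y", "x", "x", "y", "x"],
    }
)
small = shrink_dataframe(demo)
print(small.dtypes)
print(demo.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

On a frame with millions of rows, comparing `memory_usage(deep=True)` before and after is a quick way to check whether the conversion is worth the loss of float precision.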

Source code can be found here.


Expedia Hotel Data Storytelling

Data storytelling and visualization project on 2015 Expedia hotel data, intended to support further business analysis

Python (matplotlib | Seaborn | NumPy | Pandas | scikit-learn)

  • Performed analysis individually in less than 1 week
  • Collected, cleaned, investigated, and visualized data; trained and interpreted a clustering algorithm on 6,000,000 rows
  • Visualized data with customized graphs built in matplotlib and Seaborn; manipulated data with Pandas and NumPy

Source code can be found here.