Data analysis experience

From working with Python to querying APIs and using visualization software.


Investigation of a dataset

The objective of this investigation was to identify factors that may help predict whether a patient will miss a scheduled hospital appointment. I used a Kaggle dataset that records information from over 100k medical appointments in Brazil.
I selected several candidate factors and carried out exploratory data analysis on each one to look for correlations between that variable and the dependent variable, "no-show".
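
As a rough illustration of the per-variable check described above, the sketch below compares the no-show rate across the levels of one candidate factor. The column names (`sms_received`, `no_show`) and the tiny inline sample are assumptions made for this example, not the dataset's actual schema or values.

```python
import pandas as pd

# Hypothetical sample; the real dataset has over 100k rows and its own column names.
appointments = pd.DataFrame({
    "sms_received": [0, 0, 0, 1, 1, 1, 1, 0],
    "no_show":      ["No", "No", "Yes", "Yes", "No", "Yes", "No", "No"],
})

def no_show_rate_by(df, factor, target="no_show"):
    """Return the share of missed appointments for each level of `factor`."""
    missed = df[target].eq("Yes")  # True where the patient missed the appointment
    return missed.groupby(df[factor]).mean()

rates = no_show_rate_by(appointments, "sms_received")
print(rates)
```

A large gap between the rates of two groups suggests the factor is worth a closer look; it does not by itself show that the factor causes missed appointments.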

Here's the full analysis.

Data wrangling

For this project, I downloaded the WeRateDogs Twitter archive and programmatically downloaded the image predictions file using the get method of the 'requests' library, then loaded it into a second pandas DataFrame.
I queried the Twitter API for the retweet and favourite counts of each tweet in the archive DataFrame, using the tweet ID as the identifier. I saved the results to a text file, read that file into a third pandas DataFrame, and then cleaned the datasets.
View my complete wrangling here, or check out this summarized wrangle report. Alternatively, you can read this blog-style post about my insights.
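
The step of reading the saved API results back into a DataFrame and joining them to the archive might look roughly like the sketch below. It assumes the text file holds one JSON object per line and uses a small in-memory stand-in instead of a real file; the output column names (`tweet_id`, `retweets`, `favourites`) and the sample rows are illustrative assumptions, and the download and API calls themselves are left out.

```python
import io
import json

import pandas as pd

# Stand-in for the saved text file: one JSON object per line (assumed layout).
tweet_json_file = io.StringIO(
    '{"id": 1, "retweet_count": 10, "favorite_count": 50}\n'
    '{"id": 2, "retweet_count": 3, "favorite_count": 7}\n'
)

# Read each line into a record, keeping only the fields of interest.
records = []
for line in tweet_json_file:
    tweet = json.loads(line)
    records.append({
        "tweet_id": tweet["id"],
        "retweets": tweet["retweet_count"],
        "favourites": tweet["favorite_count"],
    })
counts = pd.DataFrame(records)

# Hypothetical slice of the archive DataFrame.
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "text": ["a", "b", "c"]})

# A left merge keeps every archived tweet, even ones with no counts available.
merged = archive.merge(counts, on="tweet_id", how="left")
print(merged)
```

Tweets without a matching record end up with missing counts after the left merge, which makes them easy to spot during cleaning.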


Data visualization

In this project, I carried out an exploratory data analysis on a loan dataset from ProsperLoans that contains various variables related to their borrowers and the company's loan services. The dataset was provided by Udacity, and this helpful data dictionary sheds some light on the various variables.
From my analysis, I was able to visualize how factors like APR, loan amount, and listing category influence a loan's status. Here's a short slideshow in case you're in a rush.
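
One way to look at how a categorical factor relates to loan status is to compute the share of each status within every category and draw it as a stacked bar chart. The sketch below is only an illustration of that idea: the column names (`listing_category`, `loan_status`), the sample rows, and the helper `plot_status_share` are hypothetical and are not taken from the dataset or the project.

```python
import pandas as pd

# Hypothetical sample; column names and values are illustrative only.
loans = pd.DataFrame({
    "listing_category": ["Auto", "Auto", "Business", "Business", "Business", "Business"],
    "loan_status": ["Completed", "Defaulted", "Completed", "Completed", "Defaulted", "Completed"],
})

# Share of each loan status within every listing category (each row sums to 1).
status_share = pd.crosstab(
    loans["listing_category"], loans["loan_status"], normalize="index"
)
print(status_share)

def plot_status_share(table):
    """Draw a stacked bar chart of status shares; needs matplotlib installed."""
    ax = table.plot(kind="bar", stacked=True)
    ax.set_xlabel("Listing category")
    ax.set_ylabel("Share of loans")
    return ax
```

Calling `plot_status_share(status_share)` draws the chart. Normalizing by row keeps categories with very different loan counts comparable.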

Have you got some time? Take a look at the full project here.