Python

I started my python journey learning how to visualise, clean and present data. Nowadays as an analyst, python is more of an automation tool, writing scripts to automate the ingestion of data, whereas the visualisation is mostly left to Tableau. Therefore, the following notebooks either represent the work before starting my career in data, or data science projects i've done out of interest.

Statistical analysis

Statistical analysis - Naive Bayes spam filter (python)

Building a messaging spam filter using a Naive Bayes algorithm, training and test data.

Statistical analysis - Lottery addiction probability app (python)

Calculating probabilities in order to build some of the logic for an app designed to help lottery addicts.

Statistical analysis - Market analysis (python)

Examining the best markets for an E-learning company to advertise in, based upon average spend by location and language preferences. As part of the analysis, I visualise the distribution and identify outliers that may impact the data.

Statistical analysis - Fandango movie ratings controversy (python)

Based on a piece of popular data analysis from 2015, I explore Fandango's movie ratings by comparing two samples of popular movies from different years for inconsistencies in ratings. I use kernel density estimate (KDE) plots and create some summary statistics of my findings.

Visualisation, cleaning, analysis

Visualisation and analysis - popular data science questions (python)

Inferring popular data science topics from StackExchange's data science posts. Seaborn heat maps and other visuals, tracking interest of topics over time, public data base querying.

Data cleaning - Star Wars survey (python)

Just a short data cleaning walthrough, based on data collected by the team over at FiveThirtyEight regarding public opinion on the best Star Wars film. This was a good dataset to work with for data cleaning practice, as multiple answers were allowed for some of the questions.

Data visualisation - storytelling (python)

Putting into practice how to use information design principles such as data-ink ratios to create an engaging graph for a more general audience, using pandas and matplotlib.

Data cleaning - SAT scores (python)

A complete data-cleaning walkthrough for multiple datasets concerning American SAT scores in New York City (NYC).

Data cleaning - employee exit surveys (python)

Exit survey results from employees of the Department of Education, Training and Employment (DETE), and Technical and Further Education (TAFE) institute in Queensland, Australia. Aiming to demonstrate some data cleaning techniques and working with some missing data.

Data visualisation - Matplotlib and Seaborn (python)

Analysing a dataset about traffic conditions on an American interstate highway, to determine some indicators of heavy traffic.

Data cleaning - NumPy and Pandas (python)

First attempt at using NumPy and Pandas for a project (have used them subsequently and frequently). This dataset consists of used car listings on German eBay. The aim is to clean and analyse the data.

Analysing post feed from Hacker News (python)

This was my second ever project, no libraries used. Baby steps, part 2.

Analysing user engagement with free apps (python)

This was my first ever project, no libraries used. The scenario used is from the perspective of a data analyst at a company that makes free apps. The data has to be analysed to help developers make their next product.