Correlation in data

An overview

5 min read · Feb 9, 2018


Perhaps the first step towards getting useful information out of a dataset is knowing how the individual parts correlate with one another.

In a nutshell, correlation describes how one set of numbers relates to another. If the two sets show some relationship, we can use that insight to explore and test causation, and even to forecast future data.
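One common way to put a number on "how one set of numbers relates to another" is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is only an illustration: the values in `x` and `y` are made up and are not the article's dataset, and `pearson_r` is a hypothetical helper name. It computes the coefficient by hand and checks it against NumPy's `corrcoef`.

```python
import numpy as np

# Made-up illustrative values, NOT the article's ice cream data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])


def pearson_r(a, b):
    """Pearson correlation: covariance of a and b scaled by their spreads."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return float(numerator / denominator)


r = pearson_r(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])  # same quantity via NumPy

print(f"by hand: {r:.4f}, numpy: {r_numpy:.4f}")
```

A value close to 1 means the two series rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.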

Dataset: Our dataset was cobbled together from monthly average ice cream production in 2011 and the average monthly temperature for the US.

File: correlation/weatherIceCream.csv

Scatter plots are great for visualizing these types of relationships, or at least for identifying whether there is a relationship in the first place. Notice that in the following example we got rid of the month column and are only plotting temperature and ice cream production, since those are our target variables.

File: Correlation/scatters1.py

# Import Libraries
from bokeh.models import HoverTool
from bokeh.plotting import figure, show, output_file
import pandas as pd
# Read Data
df = pd.read_csv("correlation/weatherIceCream.csv"…
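The listing above is cut off in the source, so the following is only a rough sketch of how a hover-enabled Bokeh scatter plot of two columns might be put together, not the author's actual script. The column names `temperature` and `production`, the helper `build_scatter`, and the placeholder rows are all assumptions made for illustration; the real CSV's column names and values are not shown in the article.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure


def build_scatter(df, x_col, y_col):
    """Return a Bokeh figure plotting y_col against x_col with hover tooltips."""
    source = ColumnDataSource(df[[x_col, y_col]])
    p = figure(title=f"{y_col} vs. {x_col}",
               x_axis_label=x_col, y_axis_label=y_col)
    p.scatter(x=x_col, y=y_col, source=source, size=8)
    # Show both values when the pointer hovers over a point.
    p.add_tools(HoverTool(tooltips=[(x_col, f"@{x_col}"), (y_col, f"@{y_col}")]))
    return p


# Placeholder rows with hypothetical column names, NOT the real dataset.
demo = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "temperature": [30.0, 35.0, 45.0, 55.0],
    "production": [10.0, 11.0, 14.0, 17.0],
})

# Drop the month column so only the two target variables remain.
targets = demo.drop(columns=["month"])
plot = build_scatter(targets, "temperature", "production")

# To render in a browser, one could then call:
#   from bokeh.plotting import output_file, show
#   output_file("scatter.html"); show(plot)
```

Keeping the plot construction in a function that returns the figure makes it easy to reuse with the real DataFrame once the actual column names are known.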
