# Preamble script block to identify host, user, and kernel
import sys
! hostname
! whoami
print(sys.executable)
print(sys.version)
print(sys.version_info)
! pwd
%%html
<!--Script block to left align Markdown Tables-->
<style>
table {margin-left: 0 !important;}
</style>
Last GitHub Commit Date: 2 Mar 2021
The CT concepts include:
Recall our speed and time example, repeated below.
# Our data
time = [0,1.0,2.0,3.0,4.0,5.0,6.0]
speed = [0,3,7,12,20,30,45.6]
# Our model
def poly1(b0,b1,x):
# return y = b0 + b1*x
poly1=b0+b1*x
return(poly1)
# Our plotting function
import matplotlib.pyplot as plt
def make2plot(listx1,listy1,listx2,listy2,strlablx,strlably,strtitle):
mydata = plt.figure(figsize = (10,5)) # build a square drawing canvass from figure class
plt.plot(listx1,listy1, c='red', marker='v',linewidth=0) # basic data plot
plt.plot(listx2,listy2, c='blue',linewidth=1) # basic model plot
plt.xlabel(strlablx)
plt.ylabel(strlably)
plt.legend(['Data','Model'])# modify for argument insertion
plt.title(strtitle)
plt.show()
# Our "fitting" process
intercept = 5.0
slope = 3.0
modelSpeed = [] # empty list
for i in range(len(time)):
modelSpeed.append(poly1(intercept,slope,time[i]))
# Plotting results
make2plot(time,speed,time,modelSpeed,'time (sec.)','speed (m/s)','Plot of model and observations')
We are limited in the ability to fit the data because our representation function is limited to a straight line, now lets make a quadratic a possible model option.
# Our new model
def poly2(b0,b1,b2,x):
# return y = b0 + b1*x
poly2=b0+(b1+b2*x)*x # faster than b0 + b1*x + b2*x**2
return(poly2)
Now try fitting and plotting using our new model and should get indetical result, then we can explore using the new parameter b2
# Our "fitting" process
intercept = 5.0 # set to 0.0
slope = 3.0 # adjust to 2.0
curvature = 0.0 # adjust to 0.9
modelSpeed = [] # empty list
for i in range(len(time)):
modelSpeed.append(poly2(intercept,slope,curvature,time[i]))
# Plotting results
make2plot(time,speed,time,modelSpeed,'time (sec.)','speed (m/s)','Plot of model and observations')
# Our "fitting" process
intercept = 0.0 # set to 0.0
slope = 2.0 # adjust to 2.0
curvature = 0.9 # adjust to 0.9
modelSpeed = [] # empty list
for i in range(len(time)):
modelSpeed.append(poly2(intercept,slope,curvature,time[i]))
# Plotting results
make2plot(time,speed,time,modelSpeed,'time (sec.)','speed (m/s)','Plot of model and observations')
Now which "model" is more useful for these data? Explain your reasoning.
Lets take a look over the process we just implemented
That's it, later we will explore ways to quantify the fit, which will help us choose a data model when multiple models appear good.
## Now lets apply our tools to different data, first we will read data from a file
amatrix = []
xvalue = []
yvalue = []
rowNumA = 0
file1 = open("MyFile.txt", "r") # get the data
for line in file1:
amatrix.append([float(n) for n in line.strip().split()])
rowNumA += 1
file1.close() # Disconnect the file
for i in range(len(amatrix)): # deconstruct the list, rename each column
xvalue.append(amatrix[i][0])
yvalue.append(amatrix[i][1])
make2plot(xvalue,yvalue,[],[],'x-value','y-value','EDA Plot of model and observations')
# Our "fitting" process
intercept = 0.0 # 0.0
slope = 0.01 # 0.018
curvature = 1e-09 # -0.0001
modelY = [] # empty list
for i in range(len(xvalue)):
modelY.append(poly2(intercept,slope,curvature,xvalue[i]))
# Plotting results
make2plot(xvalue,yvalue,xvalue,modelY,'x-value','y-value','EDA Plot of model and observations')
Lets build a different type of data model, here we will use a special function called the normal distribution function. A useful notation using the Normal density function as an example is:
$$ \text{pdf(x)} = \frac{1}{\sigma \sqrt{2\pi}} \times exp (-\frac{(x-\mu)^2}{2 \sigma^2}) $$In the function, $x$ is the random variable, $\mu$ is the population mean and $\sigma^2$ is the population variance. These parameters ($\mu$, and $\sigma^2$) play the same role that $b_0,b_1,b_2, \dots$ play in our polynomial model - they simply adjust shape of the model.
Often we don't actually know the population values so we estimate them from the collection of observations, in this context these are called the sample mean and variance. Computation of the sample values is done using methods described in the lesson on descriptive statistics.
The integral of the $ \text{pdf(x)} $ from $-\infty~to ~ X $, produces a result called the cumulative distribution function. The value $ X $ is not a random variable, but the integral value of the probability of the random variable $x$ being less than or equal to $X$.
A useful notation using the Normal distribution as an example is:
$$ F(X) = \int_{-\infty}^X{\frac{1}{\sigma \sqrt{2\pi}} \times exp (-\frac{(x-\mu)^2}{2 \sigma^2}) dx}$$For the Normal distribution the integral is a special function called the Error function and can be written as:
$$ F(X) = \frac{1}{2} \cdot (1+erf(\frac{(X-\mu)}{\sqrt{2} \sigma}))$$We will use these concepts to build an alternative to poly1
and poly2
as data models.
Here we will build a normal distribution model, essentially the functions for the above equations, and then will plot them. Then we will sample from a list of numbers from 1 to 100 and see if the data model is representative of the sample.
import math
def normdensity(x,mu,sigma):
weight = 1.0 /(sigma * math.sqrt(2.0*math.pi))
argument = ((x - mu)**2)/(2.0*sigma**2)
normdensity = weight*math.exp(-1.0*argument)
return normdensity
def normdist(x,mu,sigma):
argument = (x - mu)/(math.sqrt(2.0)*sigma)
normdist = (1.0 + math.erf(argument))/2.0
return normdist
# Our "fitting" process
mu = 50.0 # 50.0
sigma = 10 # 850.01**0.5
modelY = [] # empty list
for i in range(len(xvalue)):
modelY.append(normdist(xvalue[i],mu,sigma))
# Plotting results
make2plot(xvalue,yvalue,xvalue,modelY,'x-value','y-value','EDA Plot of model and observations')
Lets return to our time/speed model and estimate the speed at 4.5 seconds
# Our "fitting" process
intercept = 0.0 # set to 0.0
slope = 2.0 # adjust to 2.0
curvature = 0.9 # adjust to 0.9
modelSpeed = [] # empty list
for i in range(len(time)):
modelSpeed.append(poly2(intercept,slope,curvature,time[i]))
# Plotting results
make2plot(time,speed,time,modelSpeed,'time (sec.)','speed (m/s)','Plot of model and observations')
print('Speed estimate at time ',4.5,'is ',poly2(intercept,slope,curvature,4.5))
print('Speed estimate at time ',9.5,'is ',poly2(intercept,slope,curvature,9.5))
Jamie Chan (2014) Learn Python in One Day and Learn It Well. LCF Publishing. Kindle Edition. http://www.learncodingfast.com/python
Grus, Joel. Data Science from Scratch: First Principles with Python. O'Reilly Media. Kindle Edition. (http://safaribooksonline.com)
Christian, B, and Griffiths Tom (2016) Algorithms to live by: The computer science of human decisions. Henry Holt and Company, ISBN 9781627790369 (hardcover)|ISBN 9781627790376 (electronic book)
https://www.amazon.com/Distributional-Statistics-Environment-Statistical-Computing/dp/1463508417
https://www.inferentialthinking.com/chapters/10/Sampling_and_Empirical_Distributions.html
https://www.inferentialthinking.com/chapters/15/Prediction.html