Copyright © 2021 Theodore G. Cleveland and Farhang Forghanparast
Last GitHub Commit Date: 13 July 2021
# Script block to identify host, user, and kernel
import sys
! hostname; ! whoami; ! pwd;
print(sys.executable)
atomickitty sensei /home/sensei/engr-1330-webroot/1-Lessons/Lesson01 /opt/jupyterhub/bin/python3
Computational thinking (CT) refers to the thought processes involved in expressing solutions as computational steps or algorithms that can be carried out by a computer.
CT is a process for breaking a problem into smaller parts, looking for patterns in those parts, identifying what information is needed, developing a step-by-step solution, and implementing that solution.
The point of this course is to develop problem-solving skills and to begin using some tools (Statistics, Numerical Methods, and Data Science, implemented as JupyterLab/Python programs).
The scientific method (https://en.wikipedia.org/wiki/Scientific_method) is one example of an effective problem solving strategy. Stated as a protocol it goes something like:
This protocol can be directly adapted to CT/DS problems as:
For actual computational methods the protocol becomes:
Oddly enough, the first step is the most important and sometimes the most difficult. In a practical problem, step 2 is sometimes difficult because a skilled programmer is needed to translate the governing principles into an algorithm for the general solution (step 4).
Consider the need to compute an arithmetic mean. What would the process look like?
Step 1. Develop a script to compute the arithmetic mean of a stream of data of unknown length.
Step 2.
Step 3. Work a sample problem by hand to test the general solution.
Data |
---|
23.43 |
37.43 |
34.91 |
28.37 |
30.62 |
The arithmetic mean requires us to count how many elements are in the data stream (in this case there are 5) and compute their sum (in this case 154.76), and finally divide the sum by the count and report this result as the arithmetic mean.
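Written for a general data set, and assuming $n$ values labeled $x_1, x_2, \dots, x_n$ (notation introduced here only for illustration), the same count, sum, and divide recipe is:

$$ \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i $$

With $n = 5$ this reduces to the by-hand calculation for the sample data.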
$$ \bar x = \frac{1}{5}(23.43+37.43+34.91+28.37+30.62)=\frac{154.76}{5}\approx 30.95 $$
Step 4. Develop a general solution (code)
The by-hand exercise helps identify the required steps in an “algorithm” or recipe to compute mean values. First we capture (read) the values, then count how many there are (either as we go or as a separate step), then sum the values, then divide the sum by the count, and finally report the result.
In a flow-chart it would look like:
Figure: Flowchart for Arithmetic Mean Algorithm
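The read, count, sum, and divide steps can be sketched for a stream of unknown length by counting "as we go". This is a minimal illustration, not the course's reference solution; the function name `running_mean` and the empty-stream check are assumptions added here.

```python
def running_mean(stream):
    """Return the arithmetic mean of an iterable whose length is not known in advance."""
    count = 0      # how many values have been read so far
    total = 0.0    # running sum of the values
    for value in stream:
        total = total + float(value)
        count = count + 1
    if count == 0:
        # Assumption for this sketch: an empty stream is treated as an error
        raise ValueError("cannot compute the mean of an empty stream")
    return total / count

sample = iter([23.43, 37.43, 34.91, 28.37, 30.62])  # an iterator has no len()
mean_value = running_mean(sample)
print("arithmetic mean = ", round(mean_value, 2))
```

Because the count is updated inside the loop, the function never needs to know the length of the data ahead of time.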
Step 5. In this step we code the algorithm expressed in the flowchart and test it with the by-hand data and other small datasets until we are convinced it works correctly.
In a simple JupyterLab script:
# Arithmetic Mean in Very Elementary and Primitive Python
xlist = [23.43,37.43,34.91,28.37,30.62] # list is a type of data structure
howlong = len(xlist) # len is a built-in function that returns how many items are in a list
accumulator = 0 # a variable to accumulate the sum
for i in range(howlong):
    accumulator = accumulator + xlist[i] # add each value to the running sum
print("arithmetic mean = ",(accumulator/howlong))
arithmetic mean = 30.951999999999998
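The trailing digits in the printed result come from binary floating-point arithmetic (most decimal fractions cannot be stored exactly), not from a flaw in the algorithm. As a sketch, the same recipe can be written with the built-ins `sum` and `len`, and `math.isclose` offers one way to compare the result with the by-hand value within a tolerance. The variable names `by_hand` and `agrees` are introduced here for illustration.

```python
import math

xlist = [23.43, 37.43, 34.91, 28.37, 30.62]
mean_value = sum(xlist) / len(xlist)   # same sum-then-divide recipe as the loop
by_hand = 154.76 / 5                   # the by-hand result from Step 3

# Compare with a tolerance instead of ==, because floats carry rounding error
agrees = math.isclose(mean_value, by_hand, rel_tol=1e-9)
print("computed:", mean_value)
print("agrees with by-hand value:", agrees)
```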
Step 6. In this step we refine the code to generalize the algorithm. In the example, we want a way to supply xlist from a file, and to tidy the output by rounding to two decimal places. Rounding is relatively simple:
# Arithmetic Mean in Very Elementary and Primitive Python
xlist = [23.43,37.43,34.91,28.37,30.62] # list is a type of data structure
howlong = len(xlist) # len is a built-in function that returns how many items are in a list
accumulator = 0 # a variable to accumulate the sum
for i in range(howlong):
    accumulator = accumulator + xlist[i] # add each value to the running sum
print("arithmetic mean = ",round((accumulator/howlong),2)) # round to 2 decimal places
arithmetic mean = 30.95
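Note that `round` returns a new, rounded number. If only the displayed text should change, a format specification is an alternative. The short sketch below contrasts the two; the variable names are illustrative.

```python
mean_value = 30.951999999999998          # value printed by the unrounded script
rounded = round(mean_value, 2)           # a new float, rounded to 2 decimals
formatted = f"{mean_value:.2f}"          # a string that always shows 2 decimals

print("arithmetic mean = ", rounded)
print(f"arithmetic mean = {mean_value:.2f}")
```

The format specification keeps trailing zeros (for example, 30.5 is displayed as 30.50), which can help when results are lined up in a report.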
Reading from a file is a bit more complicated. We need to create a connection to the file, read the contents into our script, and then put the contents into xlist:
xlist=[] # start with an empty list
externalfile = open("data.txt",'r') # create connection to file, set to read (r), file must exist
how_many_lines = 0
for line in externalfile: # parse each line, append to xlist
    xlist.append(line) # each line is stored as a string
    how_many_lines += 1
externalfile.close() # close the file connection
howlong = len(xlist) # len is a built-in function that returns how many items are in a list
accumulator = 0 # a variable to accumulate the sum
for i in range(howlong):
    accumulator = accumulator + float(xlist[i]) # convert each string to a number before adding
print("arithmetic mean = ",round((accumulator/howlong),2))
arithmetic mean = 30.95
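A common variation uses a `with` block, which closes the file automatically, and converts each line to a number as it is read. The sketch below is self-contained: it first writes a temporary stand-in for data.txt, assuming one value per line (the actual file layout is not shown in this lesson), and it skips blank lines.

```python
import os
import tempfile

# Create a small stand-in for data.txt so the sketch is self-contained
# (one value per line is an assumption about the file layout).
workdir = tempfile.mkdtemp()
datafile = os.path.join(workdir, "data.txt")
with open(datafile, "w") as handle:
    handle.write("23.43\n37.43\n34.91\n28.37\n30.62\n\n")  # ends with a blank line

xlist = []
with open(datafile, "r") as externalfile:   # file is closed automatically on exit
    for line in externalfile:
        text = line.strip()                 # drop the newline and surrounding spaces
        if text:                            # skip blank lines
            xlist.append(float(text))       # convert once, while reading

mean_value = sum(xlist) / len(xlist)
print("arithmetic mean = ", round(mean_value, 2))
```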
Finally, if we want to reuse the code a lot, it is convenient to make it into a function:
def average(inputlist):
    # inputlist should be a list of values
    howlong = len(inputlist) # len is a built-in function that returns how many items are in a list
    accumulator = 0 # a variable to accumulate the sum
    for i in range(howlong):
        accumulator = accumulator + float(inputlist[i])
    result = (accumulator/howlong)
    return result
Then combine the file-reading code with a call to the function to compute the mean:
xlist=[] # start with an empty list
externalfile = open("data.txt",'r') # create connection to file, set to read (r), file must exist
how_many_lines = 0
for line in externalfile: # parse each line, append to xlist
    xlist.append(line) # each line is stored as a string
    how_many_lines += 1
externalfile.close() # close the file connection
print("arithmetic mean = ",round(average(xlist),2))
arithmetic mean = 30.95
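Once the code is a function, the Step 5 idea of testing against by-hand results can be written down as a few checks. The sketch below repeats the function so it runs on its own; the empty-list guard and the specific test cases are additions made for illustration, not part of the original function.

```python
def average(inputlist):
    """Arithmetic mean of a list of numbers (or numeric strings)."""
    howlong = len(inputlist)
    if howlong == 0:                       # guard added for this sketch
        raise ValueError("average() needs at least one value")
    accumulator = 0.0
    for item in inputlist:
        accumulator = accumulator + float(item)
    return accumulator / howlong

# Check 1: the by-hand data from Step 3
assert round(average([23.43, 37.43, 34.91, 28.37, 30.62]), 2) == 30.95
# Check 2: lines read from a file arrive as strings with newlines
assert average(["1\n", "2\n", "3\n"]) == 2.0
# Check 3: a single value is its own mean
assert average([7]) == 7.0
# Check 4: an empty list is reported, not divided by zero
try:
    average([])
    empty_handled = False
except ValueError:
    empty_handled = True
assert empty_handled
print("all checks passed")
```

Small cases with answers that are easy to verify by hand make it quick to notice when a later refinement breaks the function.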
So the simple task of computing the mean of a collection of values is a bit more complex when decomposed than it first appears, but it illustrates a five-step process (with a refinement step). Throughout the course this process is always in the background.
A lot of the problems we will encounter from a CT/DS perspective have already been solved, or at least analogs have been solved. It is perfectly acceptable to use prior work for a new set of conditions as long as proper attribution is made. We call this process CCMR:
In cases where we apply CCMR we are scaffolding parts (https://en.wikipedia.org/wiki/Scaffold_(programming)) - a legitimate and valuable engineering activity.
Ani Adhikari and John DeNero, Computational and Inferential Thinking: The Foundations of Data Science, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND), Chapter 1. https://www.inferentialthinking.com/chapters/01/what-is-data-science.html
Learn Python the Hard Way (Online Book) (https://learnpythonthehardway.org/book/) Recommended for beginners who want a complete course in programming with Python.
LearnPython.org (Interactive Tutorial) (https://www.learnpython.org/) Short, interactive tutorial for those who just need a quick way to pick up Python syntax.
How to Think Like a Computer Scientist (Interactive Book) (https://runestone.academy/runestone/books/published/thinkcspy/index.html) Interactive "CS 101" course taught in Python that really focuses on the art of problem solving.
How to Learn Python for Data Science, The Self-Starter Way (https://elitedatascience.com/learn-python-for-data-science)
Theodore G. Cleveland, Farhang Forghanparast, Dinesh Sundaravadivelu Devarajan, Turgut Batuhan Baturalp (Batu), Tanja Karp, Long Nguyen, and Mona Rizvi. (2021) Computational Thinking and Data Science: A WebBook to Accompany ENGR 1330 at TTU, Whitacre College of Engineering, DOI (pending). http://54.243.252.9/engr-1330-webroot/engr-1330-webbook/ctds-psuedocourse/site/