Introduction

Why study machine learning?

The availability of big data is transforming the civil engineering profession. Data-driven tools and algorithms promise to model nonlinear civil engineering phenomena and to extract information that traditional modeling methods cannot. Machine learning (ML) is a fast-growing field, and its use in civil engineering will likely become routine in the next few years. The primary objective of this course is to provide a theoretical introduction to ML and exposure to its applications in civil engineering.

Computational Environment

The course itself is mathematically oriented and will require developing scripts (computer programs). The default computational environment is a Jupyter Notebook running an IPython kernel.

Note

In fact, this online document is a collection of Jupyter Notebooks rendered with Sphinx, a program that converts the notebooks into a website (which you are now reading). For me, the author, this is convenient because I just write notebooks and bind them at my leisure (there are some nuances to getting figures into the book and embedding code, but it's not too hard to do). Jupyter as a literate programming environment is quite useful even outside the university.

Jupyter Notebooks

To follow along with the examples in these notes, you need access to Jupyter Notebook. Jupyter Notebook is an open-source interactive computational environment built on a client-server architecture. It includes a web server and a web application that works like an integrated development environment (IDE). This web application lets us create Jupyter Notebook documents (commonly referred to simply as notebooks or IPYNB files) that combine code, text, and images. To use Jupyter Notebook, we have several options:

1. Anaconda

The first option is to install an instance on your own system. The easiest way to do that is to install a software distribution known as Anaconda. Anaconda comes with over 250 packages pre-installed, including NumPy, pandas, Matplotlib, and Scikit-Learn, all of which we’ll be using in this book. In addition, it includes many useful applications and IDEs, such as the Jupyter Notebook application mentioned above. This is probably the easiest approach if you already have a laptop and want to work offline.
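After installing Anaconda (or in any other environment), a quick way to confirm that the packages used in these notes are available is to check them from a notebook cell. The sketch below is one way to do it, assuming Python 3.8 or newer (for importlib.metadata); the helper name check_packages is hypothetical, and the import-name to distribution-name mapping is the only detail that differs between packages (scikit-learn is imported as sklearn).

```python
# Sketch: report whether the core packages used in these notes are installed,
# and which version of each is present. Run it in a notebook cell.
import importlib.util
from importlib.metadata import version, PackageNotFoundError

# Import name -> distribution (pip/conda) name; they differ for scikit-learn.
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
}

def check_packages(packages=PACKAGES):
    """Return {import_name: version string, or None if not importable}."""
    report = {}
    for import_name, dist_name in packages.items():
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = None  # module cannot be found at all
            continue
        try:
            report[import_name] = version(dist_name)
        except PackageNotFoundError:
            # Importable, but no distribution metadata was found (unusual).
            report[import_name] = "unknown"
    return report

if __name__ == "__main__":
    for name, ver in check_packages().items():
        print(f"{name:12s} {ver if ver else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with conda or pip before working through the examples.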

2. Google Colaboratory

The second option to access Jupyter Notebook is to use Google Colab (short for Google Colaboratory). Google Colab is a free cloud-based Jupyter Notebook environment hosted by Google that requires no installation and offers free access to online computing resources. However, you need to be connected to the internet when you use Google Colab. This connection is mainly used to run the code and does not consume much data.

Note

Most of the scripts in this book can be cut-and-pasted into a Colab instance and seem to run as expected. The only realistic limitation is likely bandwidth (and possibly storage for big data). Otherwise, this is a good compromise and training environment for this course. Files must be uploaded for each session and are deleted when you disconnect, so you may need to write code that fetches your files every time. Anyway, it is a usable option.
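One way to cope with files disappearing between Colab sessions is to start each notebook with a cell that downloads its data only if the file is not already present. The sketch below assumes the data is reachable at a URL; the function name fetch_if_missing and the example URL are hypothetical placeholders. (For files on your own computer, Colab also provides an interactive upload dialog via google.colab.files.upload().)

```python
# Sketch: fetch a data file at the start of a session, skipping the download
# if the file already exists. URL and paths below are placeholders.
import os
import urllib.request

def fetch_if_missing(url, dest):
    """Download url to dest unless dest already exists; return dest."""
    if os.path.exists(dest):
        return dest  # already fetched earlier in this session
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)  # create e.g. data/ if needed
    urllib.request.urlretrieve(url, dest)
    return dest

# Example (hypothetical URL; replace with your own data source):
# path = fetch_if_missing("https://example.com/data/flows.csv", "data/flows.csv")
```

Because the check is on the destination path, re-running the cell later in the same session costs nothing.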

3. Build from Repositories

A third option is to build a JupyterLab server on a machine you control (such as an AWS virtual private server, a Raspberry Pi running on a home network, or a similar setup) and essentially replicate a Colab-type environment. This option is not for the faint of heart; it is the structure I used for this document. If your host machine runs Linux, a good starting place is Installing Jupyter. This method is most definitely not point-and-click, but it can build a fully capable system on hardware you control and can scale.

Note

The hardware requirements are modest. This JupyterBook is developed on my home machine, a Raspberry Pi 4B 8GB single-board computer (SBC) with a 256GB microSD card holding the OS and data files. At current prices the hardware costs about $230, so hardware is not a limiting issue.

Here’s a price list for your own JupyterHub server:

| Item | Price |
|---|---|
| Raspberry Pi 4B 8GB | $184.99 |
| Power Supply | $10.99 |
| Micro SD Card, Class 10 or faster | $34.99 |
| Total | $230.97 |

If you can find the Raspberry Pi at the MSRP ($75) you will fare even better.
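To see how much the MSRP matters, the arithmetic can be worked out directly. This small sketch uses only the three prices from the list and the $75 MSRP quoted above (it ignores tax and shipping), and uses Decimal rather than floats so the cents add up exactly.

```python
# Worked arithmetic for the price list: listed total vs. total with the
# Raspberry Pi bought at its $75 MSRP. Tax and shipping are ignored.
from decimal import Decimal

parts = {
    "Raspberry Pi 4B 8GB": Decimal("184.99"),
    "Power Supply": Decimal("10.99"),
    "Micro SD Card, Class 10 or faster": Decimal("34.99"),
}

total = sum(parts.values())  # 184.99 + 10.99 + 34.99
msrp_total = total - parts["Raspberry Pi 4B 8GB"] + Decimal("75.00")

print(f"Total at listed prices: ${total}")      # $230.97
print(f"Total with Pi at MSRP:  ${msrp_total}")  # $120.98
```

In other words, buying the board at MSRP roughly halves the cost of the build.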

Build Notes for Raspberry Pi running Ubuntu

Here are the build commands to make your own JupyterHub on a Raspberry Pi.

First you will want a web server. You might as well install R at the same time, to see whether it can be added to the Jupyter kernel list:

# install and configure apache
sudo apt install apache2
sudo systemctl status apache2
sudo systemctl stop apache2
# install R
sudo apt-get install r-base-core
sudo apt-get install r-base

Next, the Jupyter-specific instructions:

# install and configure JupyterHub
sudo apt install -y python3-pip
sudo apt install -y build-essential libssl-dev libffi-dev python3-dev
sudo apt-get install python3-venv
sudo python3 -m venv /opt/jupyterhub/
sudo apt install nodejs npm
sudo npm install -g configurable-http-proxy
sudo /opt/jupyterhub/bin/python3 -m pip install wheel
sudo /opt/jupyterhub/bin/python3 -m pip install jupyterhub jupyterlab
sudo /opt/jupyterhub/bin/python3 -m pip install ipywidgets
sudo mkdir -p /opt/jupyterhub/etc/jupyterhub/
cd /opt/jupyterhub/etc/jupyterhub/
sudo /opt/jupyterhub/bin/jupyterhub --generate-config
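# make JupyterLab the default interface (appended lines override the commented-out default)
echo "c.Spawner.default_url = '/lab'" | sudo tee -a jupyterhub_config.py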
sudo mkdir -p /opt/jupyterhub/etc/systemd
# write the systemd unit file (same effect as pasting this text into an editor)
sudo tee /opt/jupyterhub/etc/systemd/jupyterhub.service > /dev/null <<'EOF'
[Unit]
Description=JupyterHub
After=syslog.target network.target

[Service]
User=root
Environment="PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/jupyterhub/bin"
ExecStart=/opt/jupyterhub/bin/jupyterhub -f /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py

[Install]
WantedBy=multi-user.target
EOF
sudo ln -s /opt/jupyterhub/etc/systemd/jupyterhub.service /etc/systemd/system/jupyterhub.service
sudo systemctl daemon-reload
sudo systemctl enable jupyterhub.service
sudo systemctl start jupyterhub.service
sudo systemctl status jupyterhub.service
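Once the hub is installed, you can check which kernels its environment can see (for example, whether R made it into the kernel list). The sketch below assumes it is run with the hub's own interpreter, /opt/jupyterhub/bin/python3, and uses jupyter_client's KernelSpecManager; from the shell, /opt/jupyterhub/bin/jupyter kernelspec list reports the same thing. Note that installing r-base alone does not register an R kernel; that takes a separate step such as installing IRkernel and running IRkernel::installspec() from R.

```python
# Sketch: list the Jupyter kernels registered for the interpreter running this
# script. Run with /opt/jupyterhub/bin/python3 to inspect the hub's environment.
def list_kernels():
    """Return {kernel_name: resource_dir}, or {} if jupyter_client is missing."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return {}  # Jupyter is not installed for this interpreter
    return KernelSpecManager().find_kernel_specs()

if __name__ == "__main__":
    for name, path in sorted(list_kernels().items()):
        print(f"{name:10s} {path}")
```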

If you want the mathematical typesetting to work, you also need a LaTeX engine.

If you want to be able to build PDF renderings of notebooks, a few more dependencies are needed:

# Dependencies to get nbconvert PDF export working. This is a list, not a single
# command: install a few at a time (sudo apt install <packages>) until it works.
# "cups-bsd | lpr | lprng" means any one of the three will do.
  texlive-lang-french texlive-latex-base texlive-latex-recommended
  python-pil-doc python3-pil-dbg python-pygments-doc ttf-bitstream-vera
  python-pyparsing-doc dvipng imagemagick-6.q16 latexmk libjs-mathjax
  python3-sphinx-rtd-theme python3-stemmer sphinx-doc texlive-fonts-recommended
  texlive-latex-extra texlive-plain-generic sgml-base-doc debhelper
  gdb-doc python3-doc python3-pil.imagetk python3-gdbm-dbg python3-tk-dbg
  ghostscript-x imagemagick-doc autotrace cups-bsd | lpr | lprng enscript ffmpeg gimp 
  gnuplot grads graphviz hp2xx html2ps
  libwmf-bin mplayer povray radiance sane-utils transfig ufraw-batch colord libfftw3-bin 
  libfftw3-dev libgd-tools gvfs fonts-mathjax-extras fonts-stix libjs-mathjax-doc inkscape 
  libjxr-tools librsvg2-bin libwmf0.2-7-gtk www-browser zathura-ps zathura-djvu zathura-cb
# end of dependency list
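Because "install a few at a time until it works" is trial and error, a quick check of the external programs can shorten the loop. This sketch assumes nbconvert's defaults, which call xelatex to compile the LaTeX and pandoc to convert Markdown cells; the helper name missing_tools is hypothetical. On Ubuntu, xelatex is provided by the texlive-xetex package and pandoc by the pandoc package.

```python
# Sketch: check that the external programs nbconvert's PDF export relies on
# (by default xelatex and pandoc) can be found on the PATH.
import shutil

REQUIRED_TOOLS = ("xelatex", "pandoc")

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Toolchain found; try: jupyter nbconvert --to pdf notebook.ipynb")
```

Finding both programs does not guarantee every LaTeX package a notebook needs is present, but a missing one is the most common reason PDF export fails outright.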

Build a desktop

# install Xfce and TightVNC for a desktop
sudo apt update
sudo apt install xfce4 xfce4-goodies
sudo apt install tightvncserver

Add the following to ~/.vnc/xstartup:

#!/bin/bash
xrdb $HOME/.Xresources
startxfce4 &

Then make the file executable with chmod +x ~/.vnc/xstartup.

Next, open holes in the firewall so that VNC (from the local network only), Apache, and SSH can get through:

sudo ufw allow from 192.168.1.0/24 to any port 5901
sudo ufw allow 'Apache Full'
sudo ufw allow 'OpenSSH'
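The VNC rule combines two conventions worth checking: a VNC server on display :N listens on TCP port 5900 + N (so the first TightVNC display, :1, uses 5901), and a /24 rule on the 192.168.1.x network admits every address from 192.168.1.0 to 192.168.1.255. The sketch below uses Python's standard ipaddress module to confirm this; the helper names vnc_port and allowed are hypothetical, and the subnet assumes a typical 192.168.1.x home router.

```python
# Sketch: sanity-check the LAN-only VNC firewall rule.
import ipaddress

def vnc_port(display):
    """TCP port used by VNC display :display (e.g. :1 -> 5901)."""
    return 5900 + display

def allowed(client_ip, subnet="192.168.1.0/24"):
    """True if client_ip lies inside the subnet the firewall rule admits."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(subnet)

# ip_network() is strict by default: "192.168.1.1/24" has host bits set and
# raises ValueError; strict=False normalizes it to the same /24 network.
print(ipaddress.ip_network("192.168.1.1/24", strict=False))  # 192.168.1.0/24
print(vnc_port(1))               # 5901
print(allowed("192.168.1.57"))   # True
print(allowed("192.168.2.10"))   # False
```

If your router hands out addresses on a different network (for example 192.168.0.x), the ufw rule and the subnet argument need to change accordingly.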

At this point you would be about 3-5 hours into the build and should have a usable JupyterHub (a lot like Colaboratory, but you own it, warts and all!).
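A simple way to confirm the hub is actually answering is to query its REST API: GET /hub/api is unauthenticated and returns a small JSON document containing the JupyterHub version. The sketch below assumes it runs on the Pi itself and that the hub uses its default bind address (port 8000); the function names are hypothetical.

```python
# Sketch: check that JupyterHub responds on its REST API.
# Assumes the default bind address (port 8000) on the local machine.
import json
import urllib.request

def parse_version(payload):
    """Extract the version string from the JSON body of GET /hub/api."""
    return json.loads(payload)["version"]

def hub_version(base_url="http://127.0.0.1:8000", timeout=5):
    """Query <base_url>/hub/api and return the JupyterHub version string."""
    url = base_url.rstrip("/") + "/hub/api"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_version(resp.read().decode("utf-8"))

if __name__ == "__main__":
    try:
        print("JupyterHub", hub_version(), "is up")
    except OSError as err:  # connection refused, timeout, or HTTP error
        print("Hub not reachable:", err)
```

If the check fails, sudo systemctl status jupyterhub.service and journalctl -u jupyterhub are the first places to look.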

Why Python?

A skilled user can install an R kernel into Jupyter and run R scripts, or just run R directly. I tend to use Python because I also teach undergraduate programming in Python and don’t want to confuse myself. I am literate in R but slightly prefer Python, so we will default to Python unless something is much easier in R (and even then I will probably still make a mixed-language call!).

While no prior experience with Python or R is required, familiarity with the programming concepts covered in ENGR 1330: Computational Thinking with Data Science and the statistical concepts in CE 5315: Probabilistic Methods for Engineers is useful, as are the concepts and applications in CE 5310: Numerical Methods in Engineering.

The two computational environments can be downloaded and installed from:

| Software Title | Internet Source Link |
|---|---|
| Anaconda: A modeling environment that integrates Jupyter and Python | https://www.anaconda.com/ |
| R statistical and programming environment | https://cran.r-project.org/ |