Download (right-click, save target as ...) this page as a jupyterlab notebook Lab21-TH
LAST NAME, FIRST NAME
R00000000
ENGR 1330 Laboratory 21 - Homework
Population: In statistics, a population is the entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements.
Sample: In statistics and quantitative research methodology, a sample is a set of individuals or objects collected or selected from a statistical population by a defined procedure. The elements of a sample are known as sample points, sampling units or observations.
Distribution (Data Model): A data distribution is a function or a listing which shows all the possible values (or intervals) of the data. It also (and this is important) tells you how often each value occurs.
From https://www.investopedia.com/terms
https://www.statisticshowto.com/data-distribution/
The file 08150000.pkf is an actual WATSTORE-formatted file for a USGS gage at Junction, Texas. The first few lines of the file look like:
Z08150000 USGS
H08150000 3030150994403004848267SW120902041854 1849 1634.32
N08150000 Llano Rv nr Junction, TX
Y08150000
308150000 19160522 11100 8.80
308150000 19170511 192 1.88
308150000 19180414 14900 10.50
308150000 19190924 35700
308150000 19200514 13700 10.00
308150000 19210319 880 2.19
308150000 19220403 16100 10.97
308150000 19230425 60400
The first column holds agency codes that identify the station. From the fifth row onward, the second column is a date in YYYYMMDD format and the third column is a discharge in CFS; the fourth and fifth columns are not relevant for this laboratory exercise. The file was downloaded from
https://nwis.waterdata.usgs.gov/tx/nwis/peak?site_no=08150000&agency_cd=USGS&format=hn2
The original file contains several codes that have been manually removed from the copy below, so use the file at
http://54.243.252.9/engr-1330-webroot/8-Labs/Lab21/08150000.pkf
The laboratory task is to fit the data models to this data, decide the best model from a visual perspective, and report from that data model the magnitudes of peak flow associated with the probabilities below (i.e. populate the table).
Non-Exceedance Probability | Flow Value | Remarks |
---|---|---|
25% | ???? | 75% chance of greater value |
50% | ???? | 50% chance of greater value |
75% | ???? | 25% chance of greater value |
90% | ???? | 10% chance of greater value |
99% | ???? | 1% chance of greater value (in flood statistics, this is the 1 in 100-yr chance event) |
99.8% | ???? | 0.2% chance of greater value (in flood statistics, this is the 1 in 500-yr chance event) |
99.9% | ???? | 0.1% chance of greater value (in flood statistics, this is the 1 in 1000-yr chance event) |
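
The arithmetic behind the remarks column can be checked with a short sketch. It reads the first column as a cumulative probability F (the chance in any one year of a value no greater than the tabulated flow), so the chance of a greater value is 1 - F and the return period is its reciprocal. The function name `return_period` is an illustrative assumption, not something from the lecture notebook.

```python
def return_period(non_exceedance):
    """Return period in years for an annual cumulative probability F, with 0 <= F < 1."""
    exceedance = 1.0 - non_exceedance  # chance of a greater value in any one year
    return 1.0 / exceedance


# Probabilities taken from the last four rows of the table
for F in (0.90, 0.99, 0.998, 0.999):
    print(f"F = {F:.3f}  chance of greater value = {1.0 - F:.3f}  T = {return_period(F):.0f} years")
```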
The first step is to read the file, skipping the first part, then build a dataframe:
# Code to download the file here, or manual download
# Read the data file
import pandas
col0 = []  # station code field
col1 = []  # date, YYYYMMDD
col2 = []  # peak discharge, CFS
with open('08150000.pkf', 'r') as afile:
    lines_after_4 = afile.readlines()[4:]  # skip the four header records (Z, H, N, Y)
# no explicit close() is needed; the with block disconnects the file
for line in lines_after_4:
    fields = line.strip().split()
    if len(fields) < 3:
        continue  # skip blank or incomplete records
    col0.append(fields[0])
    col1.append(fields[1])
    col2.append(float(fields[2]))  # store flow as a number so statistics can be computed
# col1 is date, col2 is peak flow
# now build a dataframe
df = pandas.DataFrame({'station_code': col0, 'date': col1, 'flow': col2})
df.head()
Now explore whether you can plot the peak flows in the dataframe versus date.
# Plot here
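One obstacle to plotting peaks versus date is that the dates are read as YYYYMMDD strings. The sketch below, using only the standard library, shows one way to turn them into date objects that a plotting library can place on a true time axis. The helper name `to_date` is an assumption for illustration, and the three records are the first three shown in the file listing above.

```python
from datetime import datetime


def to_date(yyyymmdd):
    """Convert a YYYYMMDD string such as '19160522' to a datetime.date."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").date()


# First three records from the file listing: (date string, peak flow in CFS)
records = [("19160522", 11100.0), ("19170511", 192.0), ("19180414", 14900.0)]
dates = [to_date(d) for d, _ in records]
flows = [q for _, q in records]

for d, q in zip(dates, flows):
    print(d.isoformat(), q)
```

With the dataframe, `pandas.to_datetime(df['date'], format='%Y%m%d')` performs the same conversion on the whole column. If any record carries an incomplete date, the conversion raises a `ValueError`, so such records would need separate handling.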
From here on you can proceed using the lecture notebook as a go-by, although you should use functions as much as practical to keep your work concise.
# Descriptive Statistics
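As a starting point, descriptive statistics can be wrapped in one function so that the same call works on raw and log-transformed flows. This is a minimal sketch using the standard `statistics` module; the function name `describe` and the eight-value sample (the first eight peaks in the file listing) are illustrative assumptions, and the actual exercise would use the full `flow` column.

```python
import statistics

# First eight peak flows (CFS) from the file listing
flows = [11100.0, 192.0, 14900.0, 35700.0, 13700.0, 880.0, 16100.0, 60400.0]


def describe(sample):
    """Return basic descriptive statistics of a list of numbers as a dictionary."""
    return {
        "count": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation (n - 1 divisor)
        "min": min(sample),
        "max": max(sample),
    }


summary = describe(flows)
for name, value in summary.items():
    print(name, value)
```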
# Weibull Plotting Position Function
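The Weibull plotting position assigns the i-th smallest of n observations the cumulative probability i / (n + 1). A sketch of such a function follows; the name `weibull_plotting_position` and its return layout are assumptions, and the lecture notebook may organize it differently.

```python
def weibull_plotting_position(sample):
    """Return (sorted values, positions) where positions[i-1] = i / (n + 1) for i = 1..n."""
    ordered = sorted(sample)  # ascending, so positions are cumulative probabilities
    n = len(ordered)
    positions = [i / (n + 1) for i in range(1, n + 1)]
    return ordered, positions


# First eight peak flows (CFS) from the file listing
flows = [11100.0, 192.0, 14900.0, 35700.0, 13700.0, 880.0, 16100.0, 60400.0]
ordered, positions = weibull_plotting_position(flows)
for q, p in zip(ordered, positions):
    print(f"{q:10.1f}  {p:.4f}")
```

Because the divisor is n + 1, no observation is assigned a probability of exactly 0 or 1, which keeps every point plottable against a fitted quantile function.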
# Normal Quantile Function
# Fitting Data to Normal Data Model
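One way to fit the normal data model is the method of moments: take the sample mean and standard deviation as the parameters, then invert the cumulative distribution at each probability in the table. The sketch below assumes that approach and uses `statistics.NormalDist` from the standard library; the function name `normal_quantile` and the eight-value sample are illustrative only.

```python
import statistics

# First eight peak flows (CFS) from the file listing
flows = [11100.0, 192.0, 14900.0, 35700.0, 13700.0, 880.0, 16100.0, 60400.0]


def normal_quantile(p, mu, sigma):
    """Value x with cumulative probability p under a normal(mu, sigma) model."""
    return statistics.NormalDist(mu, sigma).inv_cdf(p)


# Method-of-moments fit: parameters are the sample mean and standard deviation
mu = statistics.mean(flows)
sigma = statistics.stdev(flows)

for p in (0.25, 0.50, 0.75, 0.90, 0.99, 0.998, 0.999):
    print(f"{p:6.3f}  {normal_quantile(p, mu, sigma):12.1f}")
```

A normal model is unbounded below, so it can return negative flows at low probabilities when the standard deviation is large relative to the mean. That is one thing to look for when judging the fit visually.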
Non-Exceedance Probability | Flow Value | Remarks |
---|---|---|
25% | ???? | 75% chance of greater value |
50% | ???? | 50% chance of greater value |
75% | ???? | 25% chance of greater value |
90% | ???? | 10% chance of greater value |
99% | ???? | 1% chance of greater value (in flood statistics, this is the 1 in 100-yr chance event) |
99.8% | ???? | 0.2% chance of greater value (in flood statistics, this is the 1 in 500-yr chance event) |
99.9% | ???? | 0.1% chance of greater value (in flood statistics, this is the 1 in 1000-yr chance event) |
# Log-Normal Quantile Function
# Fitting Data to Log-Normal Data Model
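The log-normal model can reuse the normal machinery: fit a normal distribution to the logarithms of the flows, take the quantile in log space, and exponentiate. The sketch below assumes natural logarithms and a method-of-moments fit; `lognormal_quantile` is a hypothetical name, and the sample is the first eight peaks from the file listing.

```python
import math
import statistics

# First eight peak flows (CFS) from the file listing
flows = [11100.0, 192.0, 14900.0, 35700.0, 13700.0, 880.0, 16100.0, 60400.0]

# Fit in log space: mean and standard deviation of ln(flow)
logs = [math.log(q) for q in flows]
mu_log = statistics.mean(logs)
sigma_log = statistics.stdev(logs)


def lognormal_quantile(p, mu_log, sigma_log):
    """Value x with cumulative probability p when ln(x) is normal(mu_log, sigma_log)."""
    return math.exp(statistics.NormalDist(mu_log, sigma_log).inv_cdf(p))


for p in (0.25, 0.50, 0.75, 0.90, 0.99, 0.998, 0.999):
    print(f"{p:6.3f}  {lognormal_quantile(p, mu_log, sigma_log):14.1f}")
```

Unlike the normal model, every quantile here is positive, since the result is an exponential.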
Non-Exceedance Probability | Flow Value | Remarks |
---|---|---|
25% | ???? | 75% chance of greater value |
50% | ???? | 50% chance of greater value |
75% | ???? | 25% chance of greater value |
90% | ???? | 10% chance of greater value |
99% | ???? | 1% chance of greater value (in flood statistics, this is the 1 in 100-yr chance event) |
99.8% | ???? | 0.2% chance of greater value (in flood statistics, this is the 1 in 500-yr chance event) |
99.9% | ???? | 0.1% chance of greater value (in flood statistics, this is the 1 in 1000-yr chance event) |
# Gumbel EV1 Quantile Function
# Fitting Data to Gumbel EV1 Data Model
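The Gumbel (EV1) distribution has the closed-form cumulative function F(x) = exp(-exp(-(x - alpha) / beta)), which inverts directly to x = alpha - beta ln(-ln F). The sketch below pairs that quantile with a method-of-moments fit, where beta = s sqrt(6) / pi and alpha = mean - 0.5772 beta. The function names are hypothetical, and the lecture notebook may instead adjust the parameters by trial and error against the plotting positions.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

# First eight peak flows (CFS) from the file listing
flows = [11100.0, 192.0, 14900.0, 35700.0, 13700.0, 880.0, 16100.0, 60400.0]


def fit_gumbel_moments(sample):
    """Method-of-moments estimates (alpha = location, beta = scale) for Gumbel EV1."""
    mean = statistics.mean(sample)
    stdev = statistics.stdev(sample)
    beta = stdev * math.sqrt(6.0) / math.pi
    alpha = mean - EULER_GAMMA * beta
    return alpha, beta


def gumbel_quantile(p, alpha, beta):
    """Value x with cumulative probability p, for 0 < p < 1."""
    return alpha - beta * math.log(-math.log(p))


alpha, beta = fit_gumbel_moments(flows)
for p in (0.25, 0.50, 0.75, 0.90, 0.99, 0.998, 0.999):
    print(f"{p:6.3f}  {gumbel_quantile(p, alpha, beta):12.1f}")
```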
Non-Exceedance Probability | Flow Value | Remarks |
---|---|---|
25% | ???? | 75% chance of greater value |
50% | ???? | 50% chance of greater value |
75% | ???? | 25% chance of greater value |
90% | ???? | 10% chance of greater value |
99% | ???? | 1% chance of greater value (in flood statistics, this is the 1 in 100-yr chance event) |
99.8% | ???? | 0.2% chance of greater value (in flood statistics, this is the 1 in 500-yr chance event) |
99.9% | ???? | 0.1% chance of greater value (in flood statistics, this is the 1 in 1000-yr chance event) |
# Gamma (Pearson Type III) Quantile Function
# Fitting Data to Pearson (Gamma) III Data Model
# This is new, in lecture the fit was to log-Pearson, same procedure, but not log transformed
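For a standard-library sketch of the Pearson Type III quantile, the example below swaps in the Wilson-Hilferty frequency-factor approximation, K = (2/g)[(1 + g z/6 - g^2/36)^3 - 1], where z is the standard normal quantile and g is the sample skew. This is not an exact gamma inverse and not necessarily the lecture's method. The quantile is then mean + K times the standard deviation. All function names are hypothetical, and the approximation degrades for large skew.

```python
import statistics


def sample_skew(sample):
    """Sample skew coefficient with the n / ((n - 1)(n - 2)) adjustment."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)
    return n * sum((x - mean) ** 3 for x in sample) / ((n - 1) * (n - 2) * s ** 3)


def frequency_factor(p, skew):
    """Wilson-Hilferty approximation to the Pearson III frequency factor K."""
    z = statistics.NormalDist().inv_cdf(p)  # standard normal quantile
    if abs(skew) < 1e-6:
        return z  # zero skew reduces to the normal model
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def pearson3_quantile(p, mean, stdev, skew):
    """Approximate value with cumulative probability p under Pearson Type III."""
    return mean + frequency_factor(p, skew) * stdev


# First eight peak flows (CFS) from the file listing
flows = [11100.0, 192.0, 14900.0, 35700.0, 13700.0, 880.0, 16100.0, 60400.0]
m, s, g = statistics.mean(flows), statistics.stdev(flows), sample_skew(flows)
for p in (0.25, 0.50, 0.75, 0.90, 0.99, 0.998, 0.999):
    print(f"{p:6.3f}  {pearson3_quantile(p, m, s, g):12.1f}")
```

The same functions applied to the logarithms of the flows, with the result exponentiated, give a log-Pearson Type III estimate. If SciPy is available, `scipy.stats.pearson3.ppf` offers a library quantile to compare against.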
Non-Exceedance Probability | Flow Value | Remarks |
---|---|---|
25% | ???? | 75% chance of greater value |
50% | ???? | 50% chance of greater value |
75% | ???? | 25% chance of greater value |
90% | ???? | 10% chance of greater value |
99% | ???? | 1% chance of greater value (in flood statistics, this is the 1 in 100-yr chance event) |
99.8% | ???? | 0.2% chance of greater value (in flood statistics, this is the 1 in 500-yr chance event) |
99.9% | ???? | 0.1% chance of greater value (in flood statistics, this is the 1 in 1000-yr chance event) |
# Fitting Data to Log-Pearson (Log-Gamma) III Data Model
Non-Exceedance Probability | Flow Value | Remarks |
---|---|---|
25% | ???? | 75% chance of greater value |
50% | ???? | 50% chance of greater value |
75% | ???? | 25% chance of greater value |
90% | ???? | 10% chance of greater value |
99% | ???? | 1% chance of greater value (in flood statistics, this is the 1 in 100-yr chance event) |
99.8% | ???? | 0.2% chance of greater value (in flood statistics, this is the 1 in 500-yr chance event) |
99.9% | ???? | 0.1% chance of greater value (in flood statistics, this is the 1 in 1000-yr chance event) |
# your interpretation here