The Colorado River is an approximately 862-mile (1,387 km) long river in the U.S. state of Texas. It is the 18th longest river in the United States and the longest river with both its source and its mouth within Texas.
The Colorado River originates south of Lubbock, on the Llano Estacado near Lamesa. It flows generally southeast out of the Llano Estacado and through the Texas Hill Country, then through several reservoirs including Lake J.B. Thomas, E.V. Spence Reservoir, and O.H. Ivie Lake. The river flows through several more reservoirs before reaching Austin, including Lake Buchanan, Inks Lake, Lake Lyndon B. Johnson (commonly referred to as Lake LBJ), and Lake Travis. The Llano River joins the Colorado at Lake LBJ near Kingsland, and the Pedernales River joins at Lake Travis near Briarcliff. After passing through Austin, the Colorado River continues flowing southeast until emptying into Matagorda Bay on the Gulf of Mexico, near Matagorda. The Colorado is the largest river lying entirely within Texas; it drains an area of about 39,900 square miles (103,350 square km) and receives several forks of the Concho River, the Pecan Bayou, and the San Saba, Llano, and Pedernales rivers.
The figure below shows a map of the Colorado River.
The river is an important source of water for farming, cities, and electrical power production. In addition to power plants operating on each of the major lakes, waters of the Colorado are used for cooling the South Texas Nuclear Project near Bay City. Altogether, there are over 7,500 miles of creeks, streams, and rivers in the basin, and well over 2 million people live and work there. The Colorado’s watershed includes several major metropolitan areas, including Midland-Odessa, San Angelo, and Austin, and there are hundreds of smaller towns and communities as well. Many communities, like Austin, rely on the Colorado River for 100% of their municipal water. Because of its importance to the state’s economy, environment, industry, and agriculture, it is recognized as the lifeblood of Texas.
Streamflow data was downloaded from "waterdata.usgs.gov" for the most upstream streamflow monitoring station on the river. USGS Streamflow Monitoring Station 08117995, located in Borden County, Texas (Latitude 32°37'43", Longitude 101°17'06" NAD27), provided monthly streamflow records from March 1988 until May 2021.
Meteorological data is downloaded for the same period of time and the same time scale from "prism.oregonstate.edu". This data includes monthly precipitation and temperature (max, mean, and min) records.
Why is Streamflow Analysis Important?
Information gained from streamflow data is used for many different purposes:
In this project, you will analyse a streamflow dataset and build a Machine Learning Model to predict the flow status and flowrate in a river.
USGS describes the process at https://www.usgs.gov/special-topic/water-science-school/science/how-streamflow-measured?qt-science_center_objects=0#qt-science_center_objects
Streamgaging generally involves 3 steps:
The dataset http://54.243.252.9/engr-1330-webroot/4-Databases/ColoradoRiverData.csv contains the following information:
Columns | Info. |
---|---|
Date | The date of a measurement in YYYY-MM format |
ppt (inches) | The total recorded precipitation in inches for each month |
tmin (degrees F) | The minimum recorded temperature in degrees Fahrenheit for each month |
tmean (degrees F) | The average recorded temperature in degrees Fahrenheit for each month |
tmax (degrees F) | The maximum recorded temperature in degrees Fahrenheit for each month |
Flowrate (cfs) | The average recorded streamflow in cubic feet per second for each month |
A script to get and download the database is provided below:
import requests
remote_url = "http://54.243.252.9/engr-1330-webroot/4-Databases/ColoradoRiverData.csv"  # URL of the remote CSV file
rget = requests.get(remote_url, allow_redirects=True)  # get the remote resource, following any redirects
rget.raise_for_status()  # stop here if the download failed
with open('ColoradoRiverData.csv', 'wb') as local_file:  # write the downloaded bytes to a local file
    local_file.write(rget.content)
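Once the file is on disk, it could be loaded roughly as follows. This is only a minimal sketch: it assumes pandas is installed and that the CSV header matches the column names in the table above. The helper name `load_river_data` and the two inline sample rows are hypothetical and exist only so the sketch runs without the real file.

```python
import io
import pandas as pd

def load_river_data(source):
    """Read the CSV and index it by month (assumes a 'Date' column in YYYY-MM format)."""
    df = pd.read_csv(source)
    df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m")
    return df.set_index("Date").sort_index()

# Made-up sample rows (NOT real measurements), used only to exercise the helper.
sample = io.StringIO(
    "Date,ppt (inches),tmin (degrees F),tmean (degrees F),tmax (degrees F),Flowrate (cfs)\n"
    "1988-03,0.5,40.0,55.0,70.0,12.0\n"
    "1988-04,1.2,48.0,62.0,76.0,0.0\n"
)
demo = load_river_data(sample)
print(demo.head())

# With the downloaded file, the call would be:
# df = load_river_data("ColoradoRiverData.csv")
```

Indexing by a parsed date keeps the records in chronological order, which matters later when the first 75% of the dataset is used for training.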
In a short essay (1-2 pages):
Some places to start are:
Cuo, L., Pagano, T. C., & Wang, Q. J. (2011). A Review of Quantitative Precipitation Forecasts and Their Use in Short- to Medium-Range Streamflow Forecasting, Journal of Hydrometeorology, 12(5), 713-728. Retrieved Oct 21, 2021, from https://journals.ametsoc.org/view/journals/hydr/12/5/2011jhm1347_
Yaseen, Z. M., El-Shafie, A., Jaafar, O., Afan, H. A., & Sayl, K. N. (2015). Artificial intelligence based models for stream-flow forecasting: 2000–2015. Journal of Hydrology, 530, 829-844. Available at https://www.sciencedirect.com/science/article/abs/pii/S0022169415008069
Provide a summary (description) of the dataset in 2-3 pages. This summary should present the essential information about the dataset in a concise, well-written, and clear manner. Things you may want to include ...
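One possible starting point for such a summary is a table of descriptive statistics per column. The sketch below is illustrative only: it assumes pandas and NumPy are available, the helper names `summarize` and `no_flow_fraction` are hypothetical, and the small data frame is a made-up placeholder rather than the real dataset.

```python
import numpy as np
import pandas as pd

def summarize(df):
    """Return count, mean, std, min, quartiles, max, and missing-value count per column."""
    summary = df.describe().T              # one row per numeric column
    summary["missing"] = df.isna().sum()   # number of NaN entries in each column
    return summary

def no_flow_fraction(df, flow_column="Flowrate (cfs)"):
    """Fraction of months with zero recorded flow (column name is an assumption)."""
    return float((df[flow_column] == 0).mean())

# Made-up placeholder values (NOT real measurements).
demo = pd.DataFrame({
    "ppt (inches)": [0.0, 1.5, np.nan, 2.5],
    "Flowrate (cfs)": [0.0, 10.0, 20.0, 30.0],
})
table = summarize(demo)
print(table)
print("No-flow fraction:", no_flow_fraction(demo))
```

Reporting missing values and the share of zero-flow months alongside the usual statistics gives an early indication of how balanced the "Flow" and "No-Flow" classes are.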
Your EDA section must include your answers for the following questions:
In this part, the goal is to make models to predict flowrates in the Colorado River, and then evaluate their performance using appropriate goodness-of-fit measures, and analyze the outcomes. Use the first 75% of the dataset for training your models and the remaining 25% for testing.
ppt | tmax | tmean | tmin | last month flowrate |
---|---|---|---|---|
0.0 | 113.0 | 99.0 | 85.0 | 0.0 |
4.5 | 95.0 | 85.0 | 75.0 | 74.5 |
2.2 | 20.0 | 10.0 | 0.0 | 55.0 |
1.0 | 80.0 | 60.0 | 40.0 | 36.3 |
0.0 | 80.0 | 60.0 | 40.0 | 12.0 |
Note that you may not need all the values for each case, depending on your best model.
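The workflow for this part (chronological 75%/25% split, model fit, goodness-of-fit evaluation) can be sketched as below. This is one possible approach, not a required method: it uses ordinary least squares via NumPy as a stand-in model, RMSE and R² as example goodness-of-fit measures, and synthetic data generated on the spot. All function names and the synthetic relationship between weather and flow are assumptions for illustration.

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.75):
    """Split without shuffling: the first 75% of records train, the rest test."""
    n_train = int(len(y) * train_fraction)
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target (cfs here)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in data (NOT the real dataset); the linear relation is invented.
rng = np.random.default_rng(0)
n = 200
ppt = rng.uniform(0.0, 5.0, n)
tmean = rng.uniform(40.0, 95.0, n)
flow = 8.0 * ppt - 0.1 * tmean + 20.0 + rng.normal(0.0, 1.0, n)

# Features: this month's weather plus last month's flowrate (a one-month lag).
X = np.column_stack([ppt[1:], tmean[1:], flow[:-1]])
y = flow[1:]

X_train, X_test, y_train, y_test = chronological_split(X, y)
coef = fit_linear(X_train, y_train)
y_hat = predict_linear(coef, X_test)
print(f"RMSE = {rmse(y_test, y_hat):.2f} cfs, R^2 = {r_squared(y_test, y_hat):.3f}")
```

Because the lagged flowrate needs a previous month, the first record is dropped before splitting. The split is deliberately not shuffled, so the test set is always the most recent 25% of the record.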
Your "Model Building: Part 1" section must include your answers for the following questions:
In this part, the goal is to make models to predict whether the Colorado River is in the "Flow" or the "No-Flow" state. Then, evaluate their performance using appropriate goodness-of-fit measures, and analyze the outcomes. Use the first 75% of the dataset for training your models and the remaining 25% for testing.
ppt | tmax | tmean | tmin |
---|---|---|---|
0.0 | 113.0 | 99.0 | 85.0 |
4.5 | 95.0 | 85.0 | 75.0 |
2.2 | 20.0 | 10.0 | 0.0 |
1.0 | 80.0 | 60.0 | 40.0 |
0.0 | 80.0 | 60.0 | 40.0 |
Note that you may not need all the values for each case, depending on your best model.
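A classification workflow for this part might look like the sketch below. It is only an illustration under several assumptions: "Flow" is taken to mean a flowrate greater than 0 cfs (the threshold is not defined here), logistic regression fitted by gradient descent is used as a stand-in classifier, accuracy and confusion-matrix counts serve as example performance measures, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np

def standardize(train, test):
    """Scale both sets using the training mean and standard deviation only."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sigma, (test - mu) / sigma

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Logistic regression fitted by batch gradient descent; returns (weights, bias)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of "Flow"
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient of the mean log-loss
        b -= lr * np.mean(p - y)
    return w, b

def predict_flow(w, b, X):
    """Return 1 ("Flow") where the predicted probability is at least 0.5, else 0."""
    return (X @ w + b >= 0.0).astype(int)

def confusion_counts(y_true, y_pred):
    """Return (true positives, true negatives, false positives, false negatives)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

# Synthetic stand-in data (NOT the real dataset); the relation is invented.
rng = np.random.default_rng(1)
n = 200
ppt = rng.uniform(0.0, 5.0, n)
tmax = rng.uniform(60.0, 110.0, n)
flow = np.maximum(0.0, 10.0 * ppt - 0.3 * tmax + 5.0 + rng.normal(0.0, 1.0, n))
status = (flow > 0.0).astype(int)  # assumption: "Flow" means flowrate > 0 cfs

X = np.column_stack([ppt, tmax])
n_train = int(n * 0.75)            # first 75% for training, no shuffling
X_train, X_test = standardize(X[:n_train], X[n_train:])
y_train, y_test = status[:n_train], status[n_train:]

w, b = fit_logistic(X_train, y_train)
tp, tn, fp, fn = confusion_counts(y_test, predict_flow(w, b, X_test))
accuracy = (tp + tn) / len(y_test)
print(f"accuracy = {accuracy:.2f} (TP={tp}, TN={tn}, FP={fp}, FN={fn})")
```

Scaling with training-set statistics only avoids leaking information from the test period into the model. If the two classes turn out to be imbalanced in the real data, measures such as precision and recall can be computed from the same four counts.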
Your "Model Building: Part 2" section must include your answers for the following questions:
Each team must submit an effort sheet, which is a table with a clear description of the tasks undertaken by each member and the signatures of all team members. The effort sheets should be submitted digitally via email.
This report must include:
Your report should be limited to 7 pages, in 12 pt font with double line spacing (references are NOT included in the page count). You need to cite/reference all sources you used. This report must be submitted by midnight on April 20th in PDF format.
This report must include:
All the references used in the entire length of the project.
This report must be submitted by Midnight May 2nd in PDF format, along with the following documents:
A well-documented Jupyter Notebook (.ipynb file) for the implementation of the data model user interface.
The above items can reside in a single notebook, but clearly identify the sections that perform different tasks.
A how-to video demonstrating the application, performance and description of what you did for the project, including the problems that you solved as well as those that you were not able to solve.
A project management video (up to 5 minutes) in which you explain how you completed the project and how you worked as a team.
The above items can reside in a single video, but structure the video into two parts and use an obvious transition when moving from the "how to ..." portion into the project management portion.
Keep the total video length to less than 10 minutes; submit as an "unlisted" YouTube video, and just supply the link (someone on each team is likely to have a YouTube creator account). Keep in mind a 10 minute video can approach 100MB file size before compression, so it won't upload to Blackboard and cannot be emailed.