# Files from the Web (`requests.get ...`)
- Download files from a remote server


---
## Objectives
1. Apply packages to directly obtain a file from a remote server


---

### Downloading files from websites 

This section shows how to get files from a remote computer with a script, which avoids the tedious select, right-click, save-target steps used in the previous example.  There are several ways to get files; here we examine just one.

The most important part is that you need the complete URL to the file: the server's FQDN (or IP address) plus the path to the file.

---

### A Method to get the actual file from a remote web server (unencrypted)

> - You know the URL of the file; it will have the structure "http://server-name/.../filename.ext"
> - The server is running ordinary (unencrypted) web services, i.e. `http://...`

We will need a module to interface with the remote server. Here we will use ``requests``, so first we load the module.

> You may need to install the module into your Anaconda environment using the Anaconda PowerShell prompt (or another shell); on my computer the commands are:
> - **sudo -H /opt/jupyterhub/bin/python3 -m pip install requests** 
>
> Or:
> - **sudo -H /opt/conda/envs/python/bin/python -m pip install requests**
>
> You will have to do some reading, but with any luck something similar will work for you. 

The example below will get a copy of a file named `all_quads_gross_evaporation.csv` that is stored on the class server; the URL is http://54.243.252.9/engr-1330-webroot/4-Databases/all_quads_gross_evaporation.csv.  Here we can observe that the website is unencrypted: `http` instead of `https`.  If we visit the URL in a browser we can confirm that the file exists (or get a 404 error if there is no file).

First we will import the requests module.

In [1]:
import requests # Module to process http/https requests

Assuming the requests module loads, let's next clear any existing local (to our machine) copies of the file.  In this example we already have the name, so we will just send a system command to delete the file.  This step is mostly for the classroom demonstration; the script will happily overwrite (clobber) an existing file.

> The system command below may be different on Windows!  What's here works on macOS and Linux.

In [2]:
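import sys # Module for interpreter/system services (not required by the shell escape below)
# The leading "!" is a Jupyter/IPython shell escape: the rest of the line is passed to the OS shell
! rm -rf all_quads_gross_evaporation.csv # delete file if it exists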
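The shell command above depends on the operating system. As a minimal sketch, the same cleanup for a single file can be done in Python itself, which should behave the same on Windows, macOS, and Linux. The helper name `remove_if_present` is made up for this illustration, and `missing_ok` assumes Python 3.8 or newer.

```python
from pathlib import Path


def remove_if_present(filename):
    """Delete `filename` if it exists; do nothing if it is already absent."""
    # missing_ok=True suppresses the error raised when the file does not exist
    Path(filename).unlink(missing_ok=True)


# Same intent as the shell command, for this one file
remove_if_present("all_quads_gross_evaporation.csv")
```

Unlike `rm -rf`, this sketch only removes a regular file; it will raise an error if the name refers to a directory.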

Now we will generate a ``GET`` request to the remote http server.  I chose to store the remote URL in a variable so I can reuse the code in future projects.  The ``GET`` request (an http/https method) is generated with the requests function ``get`` and the response is assigned to an object named ``rget``; the name is arbitrary.  Next we extract the file contents from the ``rget`` object and write them to a local file with the same name as the remote file, essentially automating the download process.

In [3]:
remote_url="http://54.243.252.9/engr-1330-webroot/4-Databases/all_quads_gross_evaporation.csv"  # set the url
rget = requests.get(remote_url, allow_redirects=True)  # get the remote resource, follow HTTP redirects
localfile = open('all_quads_gross_evaporation.csv','wb') # open connection to a local file same name as remote
localfile.write(rget.content) # extract from the remote the contents,insert into the local file same name
localfile.close() # close connection to the local file
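The cell above writes whatever the server returns, including an error page if the file is missing. The sketch below is one possible way to wrap the same steps in a reusable function that checks the HTTP status first and closes the file automatically. The function names `filename_from_url` and `download_file` and the 30 second timeout are assumptions made for this illustration, not part of the lesson's script.

```python
import os
from urllib.parse import urlparse

import requests  # third-party module, installed as described earlier


def filename_from_url(url):
    """Return the last path component of a URL, for example 'data.csv'."""
    return os.path.basename(urlparse(url).path)


def download_file(url, timeout=30):
    """Download `url` into the current directory and return the local file name."""
    response = requests.get(url, allow_redirects=True, timeout=timeout)
    response.raise_for_status()  # raise an exception on 404 and other HTTP errors
    local_name = filename_from_url(url)
    with open(local_name, "wb") as localfile:  # closed automatically after the block
        localfile.write(response.content)
    return local_name
```

With these definitions, `download_file(remote_url)` would save the file under the same name it has on the server, or raise an exception instead of silently saving an error page.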

In [4]:
print(type(localfile)) # verify object is an I/O object

<class '_io.BufferedWriter'>


In [5]:
# verify file exists
! pwd # list absolute path to script directory
! ls -lah # list directory contents, owner, file sizes ...

/home/sensei/ce-5319-webroot/ce5319jb/lessons/lesson4.1


total 404K
drwxrwxr-x  3 sensei sensei 4.0K Nov 22 19:54 .
drwxrwxr-x 21 sensei sensei 4.0K Nov 15 20:33 ..
drwxrwxr-x  2 sensei sensei 4.0K Oct  9 02:12 .ipynb_checkpoints
-rw-rw-r--  1 sensei sensei  571 Oct  9 02:12 1filesystems.ipynb
-rw-rw-r--  1 sensei sensei  550 Oct  9 02:12 2readwritecore.ipynb
-rw-rw-r--  1 sensei sensei  19K Oct  9 02:12 3readwriteweb.ipynb
-rw-rw-r--  1 sensei sensei  120 Nov  3 22:35 A.txt
-rw-rw-r--  1 sensei sensei 355K Nov 22 19:54 all_quads_gross_evaporation.csv
-rw-rw-r--  1 sensei sensei  886 Oct  9 02:12 filetransfer.ipynb


Now we can list the file contents and check its structure, before proceeding.

In [6]:
! cat all_quads_gross_evaporation.csv

YYYY-MM,104,105,106,107,108,204,205,206,207,208,304,305,306,307,308,309,404,405,406,407,408,409,410,411,412,413,414,504,505,506,507,508,509,510,511,512,513,514,601,602,603,604,605,606,607,608,609,610,611,612,613,614,701,702,703,704,705,706,707,708,709,710,711,712,713,714,803,804,805,806,807,808,809,810,811,812,813,814,907,908,909,910,911,912,1008,1009,1010,1011,1108,1109,1110,12101954-01,1.8,1.8,2.02,2.24,2.24,2.34,1.89,1.8,1.99,2.02,2.67,2.46,2.11,1.83,1.59,1.17,2.09,2.5,2.22,1.83,1.77,1.62,1.23,1.23,1.27,1.27,1.27,2.98,2.8,2.36,2.16,1.96,1.63,1.52,1.52,1.41,1.38,1.33,2.42,2.54,3.01,2.96,2.81,2.57,2.49,2.22,1.72,1.73,1.66,1.6,1.45,1.45,2.42,2.43,2.45,2.54,2.46,2.29,2.52,2.17,1.78,2.19,2.08,1.87,1.37,1.39,2.45,2.25,2.05,2.3,2.41,2.02,1.94,2.45,1.85,1.53,1.27,1.26,1.93,1.9,2.37,1.91,1.42,1.3,2.5,2.42,1.94,1.29,2.59,2.49,2.22,2.271954-02,4.27,4.27,4.13,3.98,3.9,4.18,4.26,4.27,4.26,4.18,4.1,3.98,3.8,3.9,4.69,3.81,3.83,3.48,3.34,3.24,4.16,3.68,3.13,4.22,4.21,2.53,2.39,4.39,4.01,3.99,4.52

.78,6.44,6.37,5.64,4.79,4.42,4.35,4.38,4.45,4.46,8.06,8.07,7.26,6.4,5.98,5.77,5.96,6.04,4.78,4.95,5.03,4.82,5.56,4.94,8.09,7.21,6.27,6.26,5.76,5.43,5.63,5.87,4.98,4.83,5.07,4.75,4.56,4.77,6.26,6.26,6.28,6.28,6.1,5.98,5.71,4.79,4.96,4.92,4.62,4.62,6.29,7,7.19,5.85,4.61,4.53,7.72,7.03,6.31,5.82,7.74,6.22,7.11,7.352009-06,6.97,8.74,8.71,8.71,7.2,8.82,8.51,8.61,8.17,6.94,8.84,8.29,8.89,7.77,7.4,7.28,8.69,8.47,9.28,8.57,8.32,8.22,7.2,6.18,5.78,5.72,5.61,8.33,8.43,8.33,8.46,8.75,7.61,6.96,7.27,7.17,6.75,6.49,8.45,9.17,8.69,7.91,7.72,7.74,8.67,9.22,7.08,8.02,8.57,7.5,8.18,7.46,8.4,8.25,7.83,7.82,7.47,7.2,7.04,8.91,7.52,7.43,8.25,7.16,6.48,6.87,7.82,7.84,8.4,8.5,8,8.94,8.03,6.66,6.98,6.83,5.95,5.95,8.51,8.9,8.94,6.16,6.33,6.47,9.17,8.12,7.1,6.53,9.1,7.13,7.46,7.792009-07,7.6,9.85,9.86,9.86,8.89,9.47,9.73,9.83,9.44,8.49,9.19,9.03,9.06,8.7,8.65,8.82,8.61,8.48,8.49,8.76,8.9,9.35,7.99,6.75,6.32,5.44,6.17,9.53,8.75,8.67,8.57,8.33,8.48,7.54,7.48,7.48,6.34,5.72,9.35,9.18,8.87,8.67,8.34,8.22,8.4,8.8

The structure looks like a spreadsheet, as expected.  Notice the unusual character `^M`; this unprintable character is the carriage return (CR) of the *carriage return + line feed* (CR+LF) line ending used by the MS DOS (Windows) architecture.  The script below will strip the line endings correctly, but sometimes all that is needed is a quick conversion using the system shell.

:::{note}
> Here are some simple system commands (on Linux or macOS) to handle the conversion for ASCII files
> - `sed -e 's/$/\r/' inputfile > outputfile`                # UNIX to DOS  (adding CRs)
> - `sed -e 's/\r$//' inputfile > outputfile`                # DOS  to UNIX (removing CRs)
> - `perl -pe 's/\r\n|\n|\r/\r\n/g' inputfile > outputfile`  # Convert to DOS
> - `perl -pe 's/\r\n|\n|\r/\n/g'   inputfile > outputfile`  # Convert to UNIX
> - `perl -pe 's/\r\n|\n|\r/\r/g'   inputfile > outputfile`  # Convert to old Mac

**Links to URLs with explanation in a future revision**
:::
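The same conversions can also be done without leaving Python. The following is a minimal sketch that works on the raw bytes of a file; the function names `to_unix` and `to_dos` and the sample data are made up for the illustration.

```python
def to_unix(data: bytes) -> bytes:
    """Convert DOS (CR+LF) and old Mac (CR) line endings to UNIX (LF)."""
    # Replace CR+LF pairs first so they do not become two line feeds
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_dos(data: bytes) -> bytes:
    """Convert any of the three line-ending styles to DOS (CR+LF)."""
    return to_unix(data).replace(b"\n", b"\r\n")


sample = b"YYYY-MM,104\r\n1954-01,1.8\r\n"  # two made-up lines with DOS endings
print(to_unix(sample))
```

To convert a whole file, read it with `open(name, "rb")`, pass the bytes through one of the functions, and write the result with `open(name, "wb")`.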

Now let's actually read the file into a list for some processing.  We will read it into an empty list and split each line on the commas (so we will be building a matrix of strings). Then we will print the first few rows and columns.

In [7]:
# now let's process the file
localfile = open('all_quads_gross_evaporation.csv','r') # open a connection for reading
aList = [] # empty list to store the rows read
rowNumA = 0 # counter to keep track of rows
for line in localfile:
    #aList.append([str(n) for n in line.strip().split()])
    aList.append([str(n) for n in line.strip().split(",")]) # process each line, strip whitespace, split on ","
    rowNumA += 1 # increment the counter
localfile.close() # close the connection - aList contains the file contents
# print((aList[0])) # print 1st row
for irow in range(0,10):
    print([aList[irow][jcol] for jcol in range(0,10)])  # list comprehension builds the first 10 columns of each row

['YYYY-MM', '104', '105', '106', '107', '108', '204', '205', '206', '207']
['1954-01', '1.8', '1.8', '2.02', '2.24', '2.24', '2.34', '1.89', '1.8', '1.99']
['1954-02', '4.27', '4.27', '4.13', '3.98', '3.9', '4.18', '4.26', '4.27', '4.26']
['1954-03', '4.98', '4.98', '4.62', '4.25', '4.2', '5.01', '4.98', '4.98', '4.68']
['1954-04', '6.09', '5.94', '5.94', '6.07', '5.27', '6.31', '5.98', '5.89', '5.72']
['1954-05', '5.41', '5.09', '5.14', '4.4', '3.61', '5.57', '4.56', '4.47', '4.18']
['1954-06', '9.56', '11.75', '12.1', '9.77', '8.06', '9.47', '8.42', '8.66', '8.78']
['1954-07', '8.65', '11.12', '11.33', '11.12', '10.09', '9.44', '9.32', '9.42', '10.14']
['1954-08', '5.81', '7.68', '9.97', '11.34', '9.76', '7.15', '8.56', '8.59', '9.43']
['1954-09', '7.42', '10.41', '10.64', '8.68', '7.67', '7.39', '8.31', '8.65', '8.42']
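As an alternative to splitting each line by hand, Python's standard `csv` module can do the parsing and also copes with the CR+LF line endings. This is a sketch under the assumption that the file is plain comma-separated text; the function name `read_csv_rows` is made up for the illustration.

```python
import csv


def read_csv_rows(path):
    """Return the file contents as a list of rows, each row a list of strings."""
    # newline="" lets the csv module handle the line endings itself
    with open(path, "r", newline="") as csvfile:
        return list(csv.reader(csvfile))
```

Calling `read_csv_rows('all_quads_gross_evaporation.csv')` should give the same kind of matrix of strings as `aList`, so a slice such as `rows[irow][0:10]` can be printed in the same way.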


Now suppose we are interested in the column with the label 910.  We need to find the position of that column; then let's print just the dates (column 0) and the evaporation values for cell 910 (column position not yet known).

We know the first row contains the column headings, so we can use a while loop to find the position like:

In [8]:
flag = True
c910 = 0
while flag:
    try:
        if aList[0][c910] == '910': # test if header is 910
            flag = False # switch flag to exit loop
        else :
            c910 += 1 # increment counter if not right header
    except IndexError: # ran past the last header without finding a match
        print('No column position found, resetting to 0')
        c910 = 0
        break

if c910 != 0:
    for irow in range(0,10): # activate to show first few rows
#    for irow in range(0,rowNumA): # activate to print entire list
        print(aList[irow][0],aList[irow][c910])  # print the date and the value in column 910

YYYY-MM 910
1954-01 1.91
1954-02 3.53
1954-03 4.32
1954-04 4.51
1954-05 4.25
1954-06 6.85
1954-07 7.99
1954-08 7.88
1954-09 6.55
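The search loop can be written more compactly with the built-in `list.index` method. The sketch below returns `None` when the label is absent, which keeps "not found" distinct from a legitimate match at position 0. The function name `find_column` and the shortened header row are made up for the illustration.

```python
def find_column(header_row, label):
    """Return the position of `label` in `header_row`, or None if it is absent."""
    try:
        return header_row.index(label)
    except ValueError:  # list.index raises ValueError when the label is missing
        return None


header = ["YYYY-MM", "104", "105", "910"]  # shortened, made-up header row
position = find_column(header, "910")
if position is not None:
    print("column 910 is at position", position)
```

With the real data, `find_column(aList[0], '910')` would take the place of the `while` loop.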


---

### A Method to get the actual file from a remote web server (SSL/TLS encrypted)

> - You know the URL of the file; it will have the structure "https://server-name/.../filename.ext"
> - The server is running SSL/TLS web services, i.e. `https://...`
> - The server has a CA certificate that is valid or possibly a self-signed certificate

**This section is saved for future semesters**
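Until that section is written, here is a hedged sketch of how the conditions listed above map onto the `verify` parameter of `requests.get`. By default `requests` checks the server certificate against its trusted CA bundle; for a self-signed certificate, `verify` can instead be given the path to a certificate file to trust. The function name `fetch_https` and the `ca_bundle` argument are assumptions made for this illustration.

```python
import requests


def fetch_https(url, ca_bundle=None, timeout=30):
    """Return the body of an https URL as bytes, verifying the server certificate.

    ca_bundle is a hypothetical path to a PEM file holding the certificate
    (or CA chain) to trust, for a server that uses a self-signed certificate.
    """
    if not url.startswith("https://"):
        raise ValueError("expected an https:// URL")
    # verify=True (the default) checks against the trusted CA bundle;
    # a file path makes requests trust the certificates in that file instead
    verify = ca_bundle if ca_bundle is not None else True
    response = requests.get(url, verify=verify, timeout=timeout)
    response.raise_for_status()  # raise an exception on 404 and other HTTP errors
    return response.content
```

Setting `verify=False` would skip the check entirely, which defeats the purpose of using `https`, so the sketch does not offer that option.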

---

#### Reading data from a file.

Recall that earlier we manually downloaded files for reading, as in:

> To continue our exploration, suppose we want to read from a file, and we know it is a data file. In this section the files we will use are `A.txt`, `B.txt`, and `x.txt`, all located at http://54.243.252.9/engr-1330-webroot/4-Databases/. To follow along, download these files to the directory where your script is running.
>
> Our task is to determine if $x$ is a solution to $A \cdot x = B$
>
> From our problem solving protocol the algorithmic tasks are:
>
> 1. Allocate objects to store file contents;
> 2. Read in A, B, and x from respective files;
> 3. Echo the inputs (pedagogical in this case);
> 4. Perform the matrix arithmetic Ax = RHS;
> 5. Test if RHS == B;
> 6. Report results;

Here we will insert the necessary script to automate the process.


In [9]:
import requests # Module to process http/https requests
remote_url="http://54.243.252.9/engr-1330-webroot/4-Databases/A.txt"  # set the url
rget = requests.get(remote_url, allow_redirects=True)  # get the remote resource, follow HTTP redirects
localfile = open('A.txt','wb') # open connection to a local file same name as remote
localfile.write(rget.content) # extract from the remote the contents,insert into the local file same name
localfile.close() # close connection to the local file
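The cell above fetches only `A.txt`. Since the task also needs `B.txt` and `x.txt` from the same directory, the download can be repeated in a loop. This is a sketch, not the lesson's script: the names `BASE_URL`, `FILE_NAMES`, and `download_all` are made up, and it assumes all three files are present on the server.

```python
import requests

BASE_URL = "http://54.243.252.9/engr-1330-webroot/4-Databases/"
FILE_NAMES = ["A.txt", "B.txt", "x.txt"]


def download_all(base_url, file_names, timeout=30):
    """Download each named file from base_url into the current directory."""
    with requests.Session() as session:  # a Session can reuse the connection
        for name in file_names:
            response = session.get(base_url + name, timeout=timeout)
            response.raise_for_status()  # stop on a 404 or other HTTP error
            with open(name, "wb") as localfile:
                localfile.write(response.content)
```

Calling `download_all(BASE_URL, FILE_NAMES)` would then leave local copies of all three files next to the script.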

Now we read the file contents in a script

In [10]:
# Code to read A, X, and b - Notice we need somewhere for the data to go, hence the null lists
amatrix = [] # null list to store matrix read
rowNumA = 0
localfile = open("A.txt","r") # connect and read file for MATRIX A
for line in localfile:
    amatrix.append([float(n) for n in line.strip().split()])
    rowNumA += 1
localfile.close() # Disconnect the file
colNumA = len(amatrix[0]) # get the column count
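Because the same read pattern is needed for `A.txt`, `B.txt`, and `x.txt`, it can be convenient to put it in a function. The sketch below uses a `with` block so the file is closed even if a line fails to convert. The name `read_matrix` is made up for the illustration, and unlike the cell above it skips blank lines.

```python
def read_matrix(path):
    """Read whitespace-separated numbers from a text file into a list of rows."""
    matrix = []
    with open(path, "r") as datafile:  # closed automatically after the block
        for line in datafile:
            if line.strip():  # skip blank lines
                matrix.append([float(n) for n in line.split()])
    return matrix
```

With this helper, `amatrix = read_matrix("A.txt")` gives the same list of rows, and `len(amatrix)` and `len(amatrix[0])` give the row and column counts.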

In [11]:
print('A matrix')
for i in range(0,rowNumA,1):
    print ( (amatrix[i][0:colNumA]))

A matrix
[4.0, 1.5, 0.7, 1.2, 0.5]
[1.0, 6.0, 0.9, 1.4, 0.7]
[0.5, 1.0, 3.9, 3.2, 0.9]
[0.2, 2.0, 0.2, 7.5, 1.9]
[1.7, 0.9, 1.2, 2.3, 4.9]
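The remaining steps of the task (form $A \cdot x$ and test it against $B$) can be sketched as follows. The matrix is the one echoed above, but `xvector` and `bvector` are hypothetical values chosen for the illustration, not the contents of `x.txt` and `B.txt`. The comparison uses a tolerance because floating-point sums rarely match exactly.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def is_solution(matrix, vector, rhs, tol=1e-9):
    """Return True if matrix * vector equals rhs within a tolerance."""
    product = matvec(matrix, vector)
    return len(product) == len(rhs) and all(
        math.isclose(p, b, rel_tol=tol, abs_tol=tol) for p, b in zip(product, rhs)
    )


amatrix = [
    [4.0, 1.5, 0.7, 1.2, 0.5],
    [1.0, 6.0, 0.9, 1.4, 0.7],
    [0.5, 1.0, 3.9, 3.2, 0.9],
    [0.2, 2.0, 0.2, 7.5, 1.9],
    [1.7, 0.9, 1.2, 2.3, 4.9],
]
xvector = [1.0, 1.0, 1.0, 1.0, 1.0]      # hypothetical x
bvector = [7.9, 10.0, 9.5, 11.8, 11.0]   # hypothetical B (the row sums of A)
print("x is a solution:", is_solution(amatrix, xvector, bvector))
```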

