### Get data files from a URL
This notebook shows how to obtain data files from public URLs.  
If you are the one distributing the files (for example, as a web developer), they must be placed in the server's webroot so that they are publicly reachable.

Here we will do an example with a file that contains topographic data in XYZ format, without header information.

The first few lines of the remote file look like:

    74.90959724	93.21251922	0
    75.17907367	64.40278759	0
    94.9935575	93.07951286	0
    95.26234119	64.60091165	0
    54.04976655	64.21159095	0
    54.52914363	35.06934342	0
    75.44993558	34.93079513	0
    75.09317373	5.462959114	0
    74.87357468	10.43130083	0
    74.86249082	15.72938748	0

Importantly, the file is tab delimited.

The Python standard-library package for working with URLs is called ``urllib``.

Search the web to learn more about it; here we use only a small part of the package, without any exception handling.
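
Since this notebook skips exception trapping, here is a minimal sketch of what it could look like. The helper name `fetch_text` and the 10-second timeout are illustrative assumptions, not part of the notebook; `HTTPError` and `URLError` are the exceptions `urllib` raises for error responses and connection-level failures.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Return the decoded body at `url`, or None if the request fails.

    Hypothetical helper for illustration; not used elsewhere in this notebook.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode()
    except HTTPError as err:  # server answered with an error status (404, 500, ...)
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:  # connection-level failure (bad host, missing file, ...)
        print(f"Could not reach {url}: {err.reason}")
    return None
```

Calling `fetch_text` with the URL used in this notebook would return the file contents as one string, or print a message and return `None` if the download failed. `HTTPError` is caught first because it is a subclass of `URLError`.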
    

In [1]:
#Step 1: import needed modules to interact with the internet
from urllib.request import urlopen # import a function that connects to a URL and reads the file contents
import pandas # import pandas to build a dataframe later

The next code fragment sets a string called ``remote_url``; it is just a variable, and its name can be anything that honors Python naming rules.
Then the ``urllib`` function ``urlopen`` is called, followed by the ``read``, ``decode``, and ``split`` methods; the result is stored in a list named ``elevationXYZ``.

In [2]:
#Step 2: make the connection to the remote file (similar to "curl http://fqdn/path ..." in bash, but nothing is saved to disk)
remote_url = 'http://www.rtfmps.com/share_files/pip-corner-sumps.txt' # location of the remote file
elevationXYZ = urlopen(remote_url).read().decode().split() # Gets the file contents as a single list of strings, split on whitespace; file is not retained locally
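
To see what the `read().decode().split()` chain produces, the sketch below applies the same `decode` and `split` calls to a small bytes literal standing in for the downloaded content (the first two lines of the file shown earlier). The names `raw_bytes`, `text`, and `tokens` are assumptions for illustration.

```python
# Stand-in for what urlopen(...).read() returns: raw bytes, tab delimited
raw_bytes = b"74.90959724\t93.21251922\t0\n75.17907367\t64.40278759\t0\n"

text = raw_bytes.decode()   # bytes -> str (UTF-8 by default)
tokens = text.split()       # split on any whitespace: tabs and newlines alike

print(tokens)
# ['74.90959724', '93.21251922', '0', '75.17907367', '64.40278759', '0']
```

Because `split()` with no argument treats tabs and newlines identically, the row structure is lost and the values come back as one flat list of strings.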

At this point the object exists as a single flat list of strings, one per value in the file. We now need to structure the content. Here, using Python primitives and knowing how the data are supposed to look, we prepare variables to receive the structured results.

In [3]:
#Step 3: Python primitives to structure the data, or use fancy modules (probably easy in numpy)
howmany = len(elevationXYZ) # how long is the vector?
nrow = howmany // 3 # three values (x, y, z) per row
xyz = [[0 for j in range(3)] for i in range(nrow)] # zero-filled nrow-by-3 list of lists to receive the data
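
The row count computed above silently drops any leftover values if the number of tokens is not a multiple of three. A minimal sketch of a guard using `divmod` follows; the function name `count_rows` is hypothetical.

```python
def count_rows(n_values, n_columns=3):
    """Return the number of complete rows, refusing ragged input."""
    nrow, leftover = divmod(n_values, n_columns)
    if leftover:
        raise ValueError(
            f"{n_values} values do not divide evenly into {n_columns} columns "
            f"({leftover} left over)"
        )
    return nrow


print(count_rows(2322))  # 774
```

A check like this turns a truncated or malformed download into an immediate error instead of a quietly shortened table.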

Now that everything is ready, we can extract from the object the values we want into ``xyz``

In [4]:
#Step 4: now build xyz as a matrix with 3 columns
index = 0
for irow in range(0,nrow):
    xyz[irow][0]=float(elevationXYZ[index])
    xyz[irow][1]=float(elevationXYZ[index+1])
    xyz[irow][2]=float(elevationXYZ[index+2])
    index += 3 #increment the index
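
The same restructuring can be written more compactly with slicing. This is a sketch of an alternative, not the method used above; `tokens` is a short stand-in for `elevationXYZ`, and `xyz_rows` is a name chosen for illustration.

```python
# Short stand-in for elevationXYZ: a flat list of strings, three per row
tokens = ['74.90959724', '93.21251922', '0',
          '75.17907367', '64.40278759', '0']

# Take the flat list three items at a time and convert each item to float
xyz_rows = [
    [float(value) for value in tokens[start:start + 3]]
    for start in range(0, len(tokens), 3)
]

print(xyz_rows)
# [[74.90959724, 93.21251922, 0.0], [75.17907367, 64.40278759, 0.0]]
```

Unlike the preallocated version, this form would leave a short final row if the token count were not a multiple of three, so it is best paired with a length check.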

``xyz`` is now a list of rows, each holding three floats, and can be converted directly to a data frame.
Here we use the ``pandas`` ``DataFrame`` constructor to build it.

In [5]:
df = pandas.DataFrame(xyz)
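
For tab-delimited text like this, `pandas.read_csv` can also do the fetching and parsing in a single call, since it accepts a URL, a path, or a file-like object. The sketch below uses an in-memory sample so it runs without a network connection; the column names `x`, `y`, `z` and the helper name `load_xyz` are assumptions for illustration.

```python
import io

import pandas


def load_xyz(source):
    """Read headerless, tab-delimited XYZ data into a named-column DataFrame."""
    return pandas.read_csv(source, sep="\t", header=None, names=["x", "y", "z"])


# In-memory stand-in for the remote file (first three lines shown earlier)
sample = io.StringIO(
    "74.90959724\t93.21251922\t0\n"
    "75.17907367\t64.40278759\t0\n"
    "94.9935575\t93.07951286\t0\n"
)
df_sample = load_xyz(sample)
print(df_sample.shape)  # (3, 3)
```

Passing `remote_url` instead of `sample` should work the same way, provided the server is reachable. One difference from the manual approach: for this sample, `read_csv` infers `z` as an integer column because every value is a whole number, whereas the loop above converts everything to float.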

Get some info: yep, three columns (ordered triples to be precise!)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 774 entries, 0 to 773
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       774 non-null    float64
 1   1       774 non-null    float64
 2   2       774 non-null    float64
dtypes: float64(3)
memory usage: 18.3 KB


Next, some summary statistics (meaningless for these data). The point is that we have now taken data from the internet and prepared it for analysis.

In [7]:
df.describe()

       0           1           2
count  774.0       774.0       774.0
mean   52.064621   48.77006    2.364341
std    30.8834     32.886277   1.497413
min    -2.113554   -11.36096   0.0
25%    25.640786   21.809579   2.0
50%    55.795821   49.05995    2.0
75%    76.75229    75.015933   4.0
max    111.726727  115.123931  4.0


And let's look at the first few rows:

In [10]:
df.head()

   0          1          2
0  74.909597  93.212519  0.0
1  75.179074  64.402788  0.0
2  94.993557  93.079513  0.0
3  95.262341  64.600912  0.0
4  54.049767  64.211591  0.0
