Download this page as a JupyterLab notebook (right-click, save target as ...) from: ES-12


Exercise Set 12: Practice with Pandas

LAST NAME, FIRST NAME

R00000000

ENGR 1330 ES 12 - Homework


Exercise 1

Profile your computer

Run the script below exactly as written

In [1]:
import sys
! hostname
! whoami
print(sys.executable)
atomickitty
sensei
/opt/jupyterhub/bin/python3

Exercise 2

This homework uses a wine-quality database, so the first step is to get it. Run the script below to download the database. If the script does not work for you, download the database manually from the URL in the script.

In [2]:
######### CODE TO AUTOMATICALLY DOWNLOAD THE DATABASE ################
#! pip install requests # install the package into the local environment if it is missing
import requests # module for making HTTP requests
# fetch the remote file (similar in effect to "curl -O http://fqdn/path ...")
remote_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv' # a csv file
response = requests.get(remote_url) # get the file contents as a response object
response.raise_for_status() # stop here if the download failed instead of saving an error page
with open('winequality-red.csv', 'wb') as output: # prepare a local destination; closed automatically
    output.write(response.content) # write the downloaded bytes to the local file

2.1 Verify you have the database.

Some ways to verify:

  • Open the database in the file system and take a screen capture of it, or
  • Use a system command such as ! cat winequality-red.csv and observe the output.
    Windows users can try ! type winequality-red.csv to list the file contents.

In [ ]:
! cat winequality-red.csv
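
A platform-independent alternative to the shell commands above is to check the file from Python. The following is a minimal sketch, not part of the original assignment: the helper name preview_file is hypothetical, and it assumes the file sits in the notebook's working directory.

```python
from pathlib import Path

def preview_file(path, n_lines=5):
    """Return the first n_lines lines of a text file, or an empty list if the file is missing."""
    p = Path(path)
    if not p.is_file():
        return []
    lines = []
    with p.open("r", encoding="utf-8", errors="replace") as handle:
        for i, line in enumerate(handle):
            if i >= n_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines

# Prints nothing if the download in the previous cell did not succeed.
for line in preview_file("winequality-red.csv"):
    print(line)
```

Looking at the first lines this way also shows which character separates the columns, which matters when the file is read into a data frame.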

2.2 Read the data as a data frame and print the first few rows. In a few lines explain what can be understood about the data from this.

In [6]:
# your code here
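
As a sketch of the general pattern, pandas.read_csv followed by DataFrame.head loads a delimited file and shows its first rows. The example below reads a small made-up table from a string rather than the wine file; the column names and values are hypothetical, and the semicolon delimiter is an assumption chosen for illustration. Check the delimiter of the real file by inspecting it in 2.1.

```python
import io
import pandas as pd

# Hypothetical stand-in for a data file; values are invented for illustration.
raw_text = "pH;alcohol;quality\n3.51;9.4;5\n3.20;9.8;5\n3.26;9.8;6\n"

# sep tells pandas which character separates the columns (the default is a comma).
demo = pd.read_csv(io.StringIO(raw_text), sep=";")

print(demo.head(2))   # first two rows
print(demo.shape)     # (number of rows, number of columns)
```

If every value ends up in a single column, the delimiter passed to sep is most likely wrong.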

Put your explanation here

...

...

2.3 Use the appropriate function and get a summary of information on the data frame. Explain what you can learn from this summary report.

In [ ]:
# your code here

2.4 Are there any missing values in the data? Justify your answer.

In [ ]:
# your code here
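
One common way to check for missing values is DataFrame.isna combined with sum. The sketch below uses a small hypothetical frame with two gaps inserted on purpose; it is an illustration of the technique, not the answer for the wine data.

```python
import pandas as pd

nan = float("nan")
# Hypothetical data with two deliberately missing entries.
demo = pd.DataFrame({
    "pH": [3.51, nan, 3.26],
    "alcohol": [9.4, 9.8, nan],
    "quality": [5, 5, 6],
})

missing_per_column = demo.isna().sum()      # count of missing entries per column
any_missing = bool(demo.isna().any().any()) # True if at least one entry is missing

print(missing_per_column)
print(any_missing)
```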

2.5 Use the appropriate function and get the 5-number summary for the data frame. Explain what you can learn from this summary report for each column.

In [ ]:
# your code here
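
For reference, DataFrame.describe reports count, mean, and standard deviation in addition to the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below, on a hypothetical single-column frame, shows one way to keep only the five-number rows.

```python
import pandas as pd

# Hypothetical values chosen so the quartiles are easy to read.
demo = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

summary = demo.describe()
# Keep only the rows that make up the five-number summary.
five_number = summary.loc[["min", "25%", "50%", "75%", "max"]]

print(five_number)
```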

2.6 Rename the "quality (score _0to10)" column heading to "quality"

In [ ]:
# your code here
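
Column headings can be changed with DataFrame.rename, which takes a mapping from old names to new names. A minimal sketch on a hypothetical frame (the column names here are invented and are not those of the wine data):

```python
import pandas as pd

demo = pd.DataFrame({"pH value": [3.51, 3.20], "alc (% vol)": [9.4, 9.8]})

# rename returns a new frame; the original is left unchanged unless reassigned.
renamed = demo.rename(columns={"alc (% vol)": "alcohol"})

print(list(renamed.columns))
```

The old name must match the existing heading exactly, including spaces and punctuation; a key that matches nothing is silently ignored.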

2.7 Make a subset of all the wines with a quality above 7. Name this subset "TopQ".

In [ ]:
# your code here

2.8 What percentage of the wines in "TopQ" have an alcohol content less than 10%? What percentage of the entire set of wines (the original data) do these wines represent?

In [ ]:
# your code here
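
Subsetting and percentage questions of this kind usually come down to boolean masks and row counts. The following sketch uses a hypothetical five-row frame with invented values; the variable names are illustrative only.

```python
import pandas as pd

demo = pd.DataFrame({
    "alcohol": [9.4, 12.1, 9.9, 11.0, 13.2],
    "quality": [5, 8, 8, 6, 8],
})

high = demo[demo["quality"] > 7]               # rows that satisfy the first condition
low_alcohol_high = high[high["alcohol"] < 10]  # rows of that subset that satisfy the second

pct_of_subset = 100 * len(low_alcohol_high) / len(high)  # share within the subset
pct_of_all = 100 * len(low_alcohol_high) / len(demo)     # share within the full frame

print(round(pct_of_subset, 2), round(pct_of_all, 2))
```

The two percentages differ only in the denominator, so it helps to state which one is being reported.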

2.9 Print the above subset of the dataframe, sorted by wine quality.

In [ ]:
# your code here

2.10 Define a function that labels the wines based on their quality according to the table below:

Quality score    Label
q >= 7           Top
5 < q < 7        Average
q <= 5           Low

In [ ]:
# your code here

2.11 Apply the function on the data frame and store the result in a new column "Qlabel".

In [ ]:
# your code here
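
The general pattern for this step is Series.apply, which calls a function once per value and collects the results. The sketch below uses a hypothetical two-band rule on an invented alcohol column rather than the quality table above, so it shows the mechanics only.

```python
import pandas as pd

def strength_label(alcohol):
    """Hypothetical two-band rule, for illustration only."""
    return "strong" if alcohol >= 12 else "light"

demo = pd.DataFrame({"alcohol": [9.4, 12.1, 9.9, 13.2]})

# apply runs strength_label on each value; the result becomes a new column.
demo["strength"] = demo["alcohol"].apply(strength_label)

print(demo)
```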

2.12 Report the share of each quality label in percentage.

In [ ]:
# your code here
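
Shares of a categorical column can be obtained with Series.value_counts using normalize=True, which returns fractions instead of counts. A minimal sketch on a hypothetical list of labels:

```python
import pandas as pd

# Invented labels, not taken from the wine data.
labels = pd.Series(["Low", "Low", "Average", "Top", "Low"])

# normalize=True gives fractions that sum to 1; multiply by 100 for percent.
share_pct = labels.value_counts(normalize=True) * 100

print(share_pct.round(1))
```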

2.13 Plot a histogram of pH for all the Low quality wines. Explain what you can infer from this plot.

In [ ]:
# your code here

2.14 Make a similar histogram for pH for all the Top quality wines.

In [ ]:
# your code here

Bonus

Put the new histogram and the previous one next to each other and explain what you can infer by comparing them.
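
One possible layout for side-by-side histograms is matplotlib's plt.subplots with one row and two columns. The sketch below assumes matplotlib is installed and uses a hypothetical frame with invented pH values; sharing both axes is a design choice that makes the two panels directly comparable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; this line can be dropped inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented values for illustration only.
demo = pd.DataFrame({
    "pH": [3.1, 3.3, 3.4, 3.2, 3.5, 3.3, 3.0, 3.6],
    "Qlabel": ["Low", "Low", "Low", "Low", "Top", "Top", "Top", "Top"],
})

# One row, two columns; shared axes keep the scales identical across panels.
fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(8, 3))
for ax, label in zip(axes, ["Low", "Top"]):
    ax.hist(demo.loc[demo["Qlabel"] == label, "pH"], bins=5)
    ax.set_title(f"{label} quality")
    ax.set_xlabel("pH")
axes[0].set_ylabel("count")
fig.tight_layout()
```

When the groups differ greatly in size, passing density=True to hist compares the shapes of the distributions rather than raw counts.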