Laboratory 10: Databases

LAST NAME, FIRST NAME

R00000000

ENGR 1330 Laboratory 10 - Homework

Pandas Cheat Sheet(s)

The Pandas library is one of the preferred tools data scientists use for data manipulation and analysis, alongside matplotlib for data visualization and NumPy for scientific computing in Python.

The fast, flexible, and expressive Pandas data structures are designed to make real-world data analysis significantly easier, but this may not be immediately apparent to those just getting started. Because so much functionality is built into the package, the options can be overwhelming.

Hence, summary sheets (cheat sheets) are useful.

Exercise 1: Reading a File into a Dataframe

Pandas has methods to read common file types, such as CSV, XLSX, and JSON. Ordinary text files are also quite manageable. (We will study these more in Lesson 11.)
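As a quick illustration of the readers mentioned above, the sketch below uses `pd.read_csv` and `pd.read_json`, plus `pd.read_csv` with a whitespace separator for an ordinary text file. The tiny tables are made-up examples held in memory with `io.StringIO`, so no files are needed; with real files you would pass a filename instead. `pd.read_excel` works the same way, but it needs an engine package installed (for example openpyxl for .xlsx files).

```python
from io import StringIO

import pandas as pd

# Made-up, in-memory examples; with real files pass a filename such as "data.csv"
csv_text = "a,b\n1,2\n3,4\n"
json_text = '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'
txt_text = "a b\n1 2\n3 4\n"

csv_df = pd.read_csv(StringIO(csv_text))              # comma-separated values
json_df = pd.read_json(StringIO(json_text))           # a list of JSON records
txt_df = pd.read_csv(StringIO(txt_text), sep=r"\s+")  # whitespace-delimited text

# Each reader returns a DataFrame; show its (rows, columns) shape and column labels
for name, df in [("csv", csv_df), ("json", json_df), ("txt", txt_df)]:
    print(name, df.shape, list(df.columns))
```

The point is that every reader hands back the same kind of object, a DataFrame, so the rest of your analysis does not depend on the file format.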

Here are the steps to follow:

  1. Download the file CSV_ReadingFile.csv to your local computer.
  2. Run the cell below; it connects to the file and reads it into the object `readfilecsv`.
  3. Print the contents of the object `readfilecsv`.
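A minimal sketch of steps 2 and 3, assuming `CSV_ReadingFile.csv` was saved in the notebook's working directory (usually the same folder as the .ipynb file). It uses `pd.read_csv`, the standard pandas reader for CSV files. If the file is not found, the sketch falls back to a small made-up table so it still runs; that stand-in is hypothetical and is not the content of the real file.

```python
import os
from io import StringIO

import pandas as pd

filename = "CSV_ReadingFile.csv"  # assumed to be in the current working directory

if os.path.exists(filename):
    # Step 2: read the downloaded file into a DataFrame object
    readfilecsv = pd.read_csv(filename)
else:
    # Hypothetical stand-in (NOT the real file) so the cell runs without the download
    readfilecsv = pd.read_csv(StringIO("Name,Value\nfirst,1\nsecond,2\n"))

# Step 3: print the contents of the object
print(readfilecsv)
```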

Exercise 2

Now that you have downloaded and read a file, let's do it again, but with feeling!

Download the file named concreteData.xls to your local computer.

The file is an Excel 97-2004 Workbook; you probably cannot inspect it within Anaconda (though you might be able to). The file size is about 130 KB. We are going to rely on Pandas to do the work here!

Read the file into a dataframe object named `concreteData`. The method is:

  • `object_name = pandas.read_excel(filename)`
  • It should work as above once you replace the placeholders with the correct names.

Then perform the following activities.

  1. Read the file into an object.
  2. Examine the first few rows of the dataframe (for example, with the `head()` method) and describe its structure in words in a markdown cell just after you run that method.
  3. Simplify the column names to "Cement", "BlastFurnaceSlag", "FlyAsh", "Water", "Superplasticizer", "CoarseAggregate", "FineAggregate", "Age", "CC_Strength".
  4. Determine and report summary statistics for each of the columns.
  5. Then copy the script below into your notebook (after the summary statistics), run it, and describe the output in words in a markdown cell.
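A minimal sketch of steps 1 through 4 only (it is not the script referred to in step 5). It assumes `concreteData.xls` is in the working directory, that the optional xlrd package is installed (pandas uses it to read legacy .xls workbooks), and that the file has exactly nine columns in the same order as the new names in step 3, because the columns are renamed by position. If the file is not found, a dummy nine-column frame of placeholder numbers is used so the sketch still runs; those numbers are not concrete data.

```python
import os

import pandas as pd

NEW_NAMES = ["Cement", "BlastFurnaceSlag", "FlyAsh", "Water", "Superplasticizer",
             "CoarseAggregate", "FineAggregate", "Age", "CC_Strength"]

filename = "concreteData.xls"  # assumed to be in the current working directory

if os.path.exists(filename):
    # Step 1: read the workbook (pandas needs the xlrd package for .xls files)
    concreteData = pd.read_excel(filename)
else:
    # Dummy placeholder frame (NOT real concrete data) so the sketch runs anywhere
    concreteData = pd.DataFrame(
        {f"col{i}": [float(i), float(i + 1), float(i + 2)] for i in range(len(NEW_NAMES))}
    )

# Step 2: look at the first few rows (5 by default)
print(concreteData.head())

# Step 3: rename columns by position, after checking the column count matches
if len(concreteData.columns) != len(NEW_NAMES):
    raise ValueError(f"expected {len(NEW_NAMES)} columns, found {len(concreteData.columns)}")
concreteData.columns = NEW_NAMES

# Step 4: count, mean, std, min, quartiles and max for each numeric column
summary = concreteData.describe()
print(summary)
```

Assigning a list to `.columns` is the shortest way to rename every column at once. `concreteData.rename(columns={...})` is an alternative if you prefer to map old names to new ones explicitly.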