Exploratory Data Analysis

Concepts

Before performing any advanced calculations, it is important to understand the data. For example, image data may arrive in a structural format different from the one your code accepts, in which case you have to supply the rotations yourself. The only way to figure out how to wrangle the images is to examine one (as data) and learn its structure. This effort is solidly in the domain of exploratory data analysis (EDA).
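As a minimal sketch of examining an image as data, the snippet below inspects a tiny made-up image stored as nested lists and reports its shape. The helper names `nested_shape` and `to_channels_first` and the pixel values are hypothetical, chosen only for illustration; with an array library one would normally read a shape attribute instead of walking the lists by hand.

```python
def nested_shape(data):
    """Return the shape of a regular nested list, e.g. (rows, cols, channels)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]  # descend one level, assuming all siblings have equal length
    return tuple(shape)


def to_channels_first(img):
    """Reorder a (height, width, channels) image to (channels, height, width)."""
    h, w, c = nested_shape(img)
    return [[[img[y][x][k] for x in range(w)] for y in range(h)] for k in range(c)]


# Hypothetical 2x3 RGB image stored channels-last: image[row][col] = [R, G, B]
image_hwc = [
    [[0, 0, 0], [255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 0], [255, 255, 255]],
]

image_chw = to_channels_first(image_hwc)

print("channels-last shape: ", nested_shape(image_hwc))
print("channels-first shape:", nested_shape(image_chw))
```

Looking at the shape first tells you which axis holds rows, columns, and color channels, and therefore whether a reordering step is needed before the data is passed to code that expects a different layout.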

EDA was advocated by Tukey, J. (1977), Exploratory Data Analysis. Its goal is to provide initial insights into the data and guidance on how to proceed with an analysis. EDA uses summary measures and visualization to understand data. No rigorous analyses are carried out.

The objective of EDA is to look for patterns and unique features in the dataset. We try to see things that we did not initially think about. How to do EDA is subjective: there are no set rules, tools, or procedures for performing it. Knowing some particular data summary and visualization tools is about the best one can do, so we proceed with those.

EDA can be reasonably classified into two kinds of analysis:

  1. Exploratory Analysis using Data Summaries

  • Descriptive Statistics

  • Record lengths

  • Array shapes

  2. Exploratory Analysis using Visual Summaries

  • Scatter plots

  • Box-Plots

  • Histograms

  • Contour Plots

  • Violin Plots
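The data-summary items in the list above can be sketched with the Python standard library alone. The `records` and `sample` values below are made up for illustration, and the text histogram is only a crude stand-in for a real plot drawn with a plotting library.

```python
import statistics
from collections import Counter

# Hypothetical records: one row is deliberately short to show what a length check reveals
records = [
    ["A", 1.2, 3.4],
    ["B", 2.5, 1.1],
    ["C", 0.7],
    ["D", 3.3, 2.2],
]

# Record lengths: how many rows have each length?
length_counts = Counter(len(r) for r in records)
print("record lengths:", dict(length_counts))

# Descriptive statistics for the second field of every record
values = [r[1] for r in records]
summary = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")

# A crude text histogram with unit-width bins (a visual summary in miniature)
sample = [0.7, 1.2, 1.4, 1.9, 2.1, 2.5, 2.8, 3.3]
bins = Counter(int(v) for v in sample)
for left_edge in sorted(bins):
    print(f"[{left_edge}, {left_edge + 1}): {'#' * bins[left_edge]}")
```

A length count that is not a single value, as here, is exactly the kind of irregularity EDA is meant to surface before any modeling begins.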

Tukey himself is credited with the invention of the modern box plot, although there is no doubt that it preceded his introduction of it in his 1977 book (cited above).
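A box plot is drawn from a handful of summary numbers: the median, the quartiles, and the whisker ends. The sketch below computes them for a made-up sample. The 1.5 x IQR whisker rule and the "inclusive" quantile method are common conventions assumed here, not something the text above prescribes, and other conventions give slightly different quartiles.

```python
import statistics

# Hypothetical sample with one value far from the rest
data = sorted([2, 4, 4, 5, 6, 7, 8, 9, 30])

# Quartiles; method="inclusive" treats the sample as containing its own min and max
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: whiskers reach the most extreme points within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

Points beyond the whiskers are drawn individually on the plot, which is what makes the box plot useful for spotting unusual values at a glance.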
