A to Z Full Forms and Acronyms

What is Exploratory Data Analysis in Python

This article is a brief introduction to Exploratory Data Analysis (EDA), the types of EDA, and the programming and non-programming tools that data scientists can use to perform it.

Exploratory Data Analysis | Part 1

What is EDA? 

Exploratory Data Analysis (EDA) is a very important part of every data science project. It is the first step every dataset goes through. EDA summarizes the main characteristics of the data, often with the help of data visualization. It helps us quickly understand the data and gain insights about the phenomena the data represents.

It is an iterative process in which data scientists ask questions about the data, understand it, and transform it.

Key steps in EDA are:

  • Importing datasets.
  • Identifying the number of features, i.e., the number of columns.
  • Identifying the number of observations, i.e., the number of rows.
  • Checking if the dataset has empty cells.
  • Identifying the number of empty cells by columns or by features.
  • Exploring categorical features, and so on.
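
The steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not a complete workflow: the inline CSV text and its column names (name, city, age, salary) are made up for the example and stand in for a real file, and treating every non-numeric column as categorical is a simplifying assumption.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file such as "data.csv".
CSV_TEXT = """name,city,age,salary
Asha,Delhi,34,52000
Ben,Mumbai,,61000
Chen,Delhi,29,
Dina,,41,58000
"""

# Importing the dataset.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Number of observations (rows) and number of features (columns).
n_rows, n_cols = df.shape

# Does the dataset have any empty cells at all?
has_missing = bool(df.isna().any().any())

# Number of empty cells per column (feature).
missing_per_column = df.isna().sum()

# Categorical features: assumed here to be the columns that are not numeric.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Frequency of each category in one feature, counting empty cells too.
city_counts = df["city"].value_counts(dropna=False)

print(f"rows={n_rows}, columns={n_cols}, any missing={has_missing}")
print(missing_per_column)
print("categorical features:", categorical_cols)
print(city_counts)
```

With a real dataset, the only line that should need changing is the read_csv call, which would take a file path instead of the in-memory text.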

There are several ways of performing EDA. The methods can be classified along two lines:

  • Non-graphical or graphical: Non-graphical methods calculate summary statistics, whereas graphical methods summarize the data in pictures and diagrams.
  • Univariate or multivariate: Univariate methods look at a single variable (i.e., one data column) at a time, whereas multivariate methods look at two or more variables at a time.
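
As a rough illustration of the univariate versus multivariate distinction on the non-graphical side, the sketch below computes summary statistics for one column at a time and then statistics that relate two columns. The toy data and its column names (hours_studied, exam_score, passed) are invented for this example.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration only.
df = pd.DataFrame(
    {
        "hours_studied": [1, 2, 3, 4, 5],
        "exam_score": [52, 55, 61, 70, 74],
        "passed": ["no", "no", "yes", "yes", "yes"],
    }
)

# Univariate, non-graphical: summary statistics for one column at a time.
score_summary = df["exam_score"].describe()
passed_counts = df["passed"].value_counts()

# Multivariate, non-graphical: statistics that relate two columns.
correlation = df["hours_studied"].corr(df["exam_score"])
mean_score_by_outcome = df.groupby("passed")["exam_score"].mean()

print(score_summary)
print(passed_counts)
print(f"Pearson correlation: {correlation:.3f}")
print(mean_score_by_outcome)
```

Here describe() and value_counts() each look at a single column, while corr() and the grouped mean need two columns to produce a result.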

Combining these two classifications gives four types of EDA:

  • univariate non-graphical
  • multivariate non-graphical
  • univariate graphical
  • multivariate graphical.
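
The two graphical types can be sketched with matplotlib, which is one possible plotting library and an assumption of this example rather than a tool named in the article. A histogram of a single column is a univariate graphical view, and a scatter plot of two columns is a multivariate graphical view. The data, column names, and output file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, made up for illustration only.
df = pd.DataFrame(
    {
        "hours_studied": [1, 2, 2, 3, 4, 5, 6, 7],
        "exam_score": [50, 52, 55, 61, 66, 70, 74, 81],
    }
)

fig, (ax_uni, ax_multi) = plt.subplots(1, 2, figsize=(9, 4))

# Univariate graphical: distribution of a single column.
ax_uni.hist(df["exam_score"], bins=5)
ax_uni.set_title("Univariate: exam_score")
ax_uni.set_xlabel("exam_score")
ax_uni.set_ylabel("count")

# Multivariate graphical: relationship between two columns.
ax_multi.scatter(df["hours_studied"], df["exam_score"])
ax_multi.set_title("Multivariate: hours_studied vs exam_score")
ax_multi.set_xlabel("hours_studied")
ax_multi.set_ylabel("exam_score")

fig.tight_layout()
fig.savefig("eda_plots.png")  # hypothetical output file name
```

The non-graphical counterparts of these two plots would be summary statistics for exam_score alone and a correlation between the two columns.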

There are ample tools available for both programmers and non-programmers to perform EDA.

Some of the non-programming tools are:

  • Excel Spreadsheets
  • Trifacta 
  • RapidMiner
  • Rattle GUI
  • QlikView
  • KNIME 
  • Tableau Public 
  • Datawrapper
  • Data Science Studio (DSS)
  • OpenRefine and many more.

Some of the programming tools or packages that focus on making EDA as easy, automatic, and efficient as possible are:

  • pandas-profiling (Python)
  • summarytools (R)
  • explore (R)
  • dataMaid (R)

In my upcoming article, you will learn about pandas-profiling, a package that generates profile reports from pandas DataFrames. I will explain what pandas-profiling is, what its features are, and why it is known as an EDA booster.

We will also create a profile report using pandas-profiling. So, stay tuned.

Thanks for reading this article. I hope you liked the content.
