What is Pandas?

The name "Pandas" comes from the econometrics term "panel data", which describes data sets that include observations over multiple time periods. Pandas was created by Wes McKinney in 2008.

Pandas is a fast, powerful, flexible, and easy-to-use open-source Python library for data analysis and manipulation. It works with data sets and provides functions for analyzing, cleaning, exploring, and manipulating data.

Why Use Pandas?

Pandas allows us to analyze large data sets and draw conclusions based on statistical theory. It can also clean messy data sets and make them readable and relevant, which matters because relevant data is very important in data science.

What Can Pandas Do?

  • Data set cleaning, merging, and joining.
  • Easy handling of missing data.
  • Insertion and deletion of columns in DataFrames and higher-dimensional objects.
  • Calculations with aggregate functions.
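
The capabilities listed above can be sketched in a few lines. This is a minimal illustration, not a complete tour: the two small tables and the Team values are hypothetical data invented for the example, and the column name Fifty_plus is likewise an assumption.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
scores = pd.DataFrame({
    "Player_name": ["Sachin", "Dhawan", "Rohit"],
    "Runs": [83, 77, 56],
})
teams = pd.DataFrame({
    "Player_name": ["Sachin", "Dhawan", "Rohit"],
    "Team": ["Mumbai", "Delhi", "Mumbai"],
})

# Merging: combine the two tables on their shared column.
merged = scores.merge(teams, on="Player_name")

# Inserting a column: derive a new column from an existing one.
merged["Fifty_plus"] = merged["Runs"] >= 50

# Deleting a column: drop() returns a new DataFrame without it.
trimmed = merged.drop(columns=["Fifty_plus"])

# Aggregation: total runs per team.
runs_per_team = merged.groupby("Team")["Runs"].sum()

print(merged)
print(runs_per_team)
```

Note that drop() leaves merged unchanged and returns a new DataFrame, so the result has to be assigned to a name to be kept.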

Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
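
As a rough sketch of this kind of cleaning, the example below removes rows that contain an empty value. The data is hypothetical, with one missing entry added on purpose; dropna() is one of several ways to clean data, not the only one.

```python
import pandas as pd

# Hypothetical data with one missing value (None is stored as NaN in a numeric column).
raw = pd.DataFrame({
    "Player_name": ["Sachin", "Dhawan", "Rohit"],
    "Runs": [83, None, 56],
})

# Count the missing values in each column before cleaning.
missing_counts = raw.isna().sum()

# Drop every row that contains at least one missing value.
cleaned = raw.dropna()

print(missing_counts)
print(cleaned)
```

If dropping rows would lose too much data, fillna() can replace the missing values with a chosen default instead.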

Installation of Pandas

If you already have Python and pip installed on your system, installing Pandas is very easy.

Install it using this command:

C:\Users\Your Name>pip install pandas

If this command fails, use a Python distribution that already includes Pandas, such as Anaconda or Spyder.
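
To confirm from Python itself whether the installation succeeded, one option is to ask the import system whether it can locate the package. The sketch below uses the standard library's importlib.util.find_spec; the helper name is_installed is a hypothetical name chosen for this example.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("pandas"):
    print("pandas is available")
else:
    print("pandas is not installed; try: pip install pandas")
```

This only checks that the package can be located. It does not import it, so it will not reveal a broken installation.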

Checking Pandas Version

The version string is stored in the __version__ attribute.

Example

import pandas as pd
print(pd.__version__)
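
Because __version__ is an ordinary string, it can be processed like any other text. The sketch below extracts the major version number, assuming the string begins with a numeric component followed by a dot (as in "2.2.1"); the value printed depends on your installation.

```python
import pandas as pd

# __version__ is a plain string; the exact value depends on the installed release.
version_text = pd.__version__

# Take the first dot-separated component as the major version number.
major_version = int(version_text.split(".")[0])

print(version_text)
print(major_version)
```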

Import Pandas

Once Pandas is installed, import it into your application with the import keyword:

import pandas

Now Pandas is imported and ready to use.

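Example

import pandas

# Each key is a column name; each list holds that column's values.
Record = {
  "Player_name": ["Sachin", "Dhawan", "Rohit"],
  "Runs": [83, 77, 56]
}
# Build a DataFrame (a two-dimensional table) from the dictionary.
score = pandas.DataFrame(Record)
print(score)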

Pandas as pd

Pandas is usually imported under the pd alias. Create an alias with the as keyword while importing:

import pandas as pd

Now the Pandas package can be referred to as pd instead of pandas.

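Example

import pandas as pd

# Each key is a column name; each list holds that column's values.
Record = {
  "Player_name": ["Sachin", "Dhawan", "Rohit"],
  "Runs": [83, 77, 56]
}
# Same DataFrame as before, created through the pd alias.
score = pd.DataFrame(Record)
print(score)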
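
An alias does not create a second copy of the library. As a small sketch, importing Pandas both ways shows that the two names refer to the same module object:

```python
import pandas
import pandas as pd

# Both names are bound to the same module object.
same_module = pd is pandas
print(same_module)  # True
```

In practice only one of the two import forms is needed; pd is the common convention.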