Importing An Excel File to Pandas Data Frame

Numbers in excel sheet. (Photo by Mika Baumeister on Unsplash)

Excel format (.xlsx) is one of the most common document formats when you’re dealing with data analysis. I generally prefer CSV document format, but sometimes you have to deal with Excel document too. One of the main differences between an Excel and CSV format is you can have up to several sheets in an Excel document, meanwhile in CSV document there is no sheet.

We are going to run through both reading Excel document from certain sheet only as well as reading Excel document from all available sheets.

Here is how you can import your data from your Excel file to pandas data frame.

Importing data from certain sheet only

Sometimes you only need to read certain sheet only. You can go straight putting the sheet name there, but in case you still don’t know all the available sheet names, you can print them all first.

Using xlrd

To do this method, you have to install xlrd first.

!pip install xlrd

After installing xlrd you can import and use it as follows. The variable sheets will contain all your Excel’s sheet names in a Python list.

import xlrd

fname = ""
xls = xlrd.open_workbook(fname, on_demand=True)
sheets = xls.sheet_names()

Then, you declare the list that will contain all the sheet names you want to read.

selected_sheets = ["sheets", "you", "want", "to", "read"]

After that, you can import certain sheet(s) from the Excel file.

import pandas as pd

d = []
for sheet in selected_sheets:
    temp = pd.read_excel(fname, sheet_name=sheet)
    d.append(temp)


df = pd.concat(d, axis=0, ignore_index=True)

Importing data from all sheets to one pandas data frame

If you want to read all sheets in Excel file to one huge pandas data frame, it’s even simpler as you only need pandas to do that.

import pandas as pd

sheets = pd.read_excel(fname, sheet_name=None)
df = pd.concat(sheets, axis=0, ignore_index=True)

The code above works by listing all sheets available and putting them in a list called sheets. Then you concatenate the data in the list into one data frame.

I hope you find this post useful.

References

https://stackoverflow.com/questions/12250024/how-to-obtain-sheet-names-from-xls-files-without-loading-the-whole-file

https://stackoverflow.com/questions/53877687/how-can-i-concat-multiple-dataframes-in-python

4 Comments Add yours

  1. Vipulkunwar says:

    Panda are great visualization tool. It’s very helpful tool with python.
    I’m python blogger,you can visit https://vipulkunwar503.code.blog

    Like

    1. catris25 says:

      Hey, there. A little correction, pandas isn’t a visualization tool, it’s a library for data analysis. You may want to use Matplotlib or Plotly for visualization. Thanks for visiting.

      Like

      1. Vipulkunwar says:

        Thanks, for correction

        Like

  2. Vipulkunwar says:

    It’s great,Catris.

    Like

Leave a Reply to Vipulkunwar Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s