Introducing Pandas DataFrame for Python data analysis

Pandas is an open-source Python library for data analysis. It lets Python work with spreadsheet-like data, with fast loading, manipulating, aligning, and merging, among other functions. To give Python these enhanced features, Pandas introduces two new data types: Series and DataFrame. The DataFrame represents your entire spreadsheet or rectangular data set, whereas the Series is a single column of the DataFrame. A Pandas DataFrame can also be thought of as a dictionary or collection of Series objects.
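To make the relationship between the two types concrete, here is a minimal sketch. The column names and values are made up for illustration and do not come from any real data set; it assumes Pandas is already installed.

```python
import pandas as pd

# Hypothetical toy data; names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Alan"],
    "born": [1815, 1906, 1912],
})

# Selecting a single column from a DataFrame returns a Series.
born = df["born"]
print(type(df).__name__)    # DataFrame
print(type(born).__name__)  # Series

# A DataFrame can be rebuilt from a dictionary of its own Series,
# which is the "dictionary of Series" view described above.
rebuilt = pd.DataFrame({col: df[col] for col in df.columns})
print(rebuilt.equals(df))   # True
```

Each key in the dictionary becomes a column label, and each Series keeps the row index it had in the original DataFrame, which is why the rebuilt object compares equal.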

Why should you use a programming language like Python and a tool like Pandas to work with data? It boils down to automation and reproducibility. If a particular set of analyses needs to be performed on multiple data sets, a programming language can automate the analysis across those data sets. Although many spreadsheet programs have their own macro programming languages, many users do not use them. Furthermore, not all spreadsheet programs are available on all operating systems. Performing data analysis in a programming language forces the user to maintain a running record of every step performed on the data. I, like many people, have accidentally hit a key while viewing data in a spreadsheet program, only to find out later that my results no longer made any sense because the data had been corrupted. This is not to say that spreadsheet programs are bad or that they do not have their place in the data workflow; they do. Rather, my point is that there are better and more reliable tools out there.

Loading your first data set

When given a data set, you first load it and begin looking at its structure and contents. The simplest way of looking at a data set is to examine and subset specific rows and columns. You can see which type of information is stored in each column, and can start looking for patterns by aggregating descriptive statistics.

Because Pandas is not part of the Python standard library, you first have to tell Python to load (import) the library before you can use it.
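A minimal sketch of the import, followed by the load, inspect, subset, and aggregate steps described above. The alias pd is a widely used convention rather than a requirement. The inline CSV text, its column names, and its values are hypothetical stand-ins for a real file, chosen only so the example runs on its own.

```python
from io import StringIO

import pandas as pd  # "pd" is a conventional alias, not a requirement

# Hypothetical inline CSV standing in for a real file on disk.
csv_text = StringIO(
    "country,year,life_exp\n"
    "A,2000,70.0\n"
    "A,2010,72.0\n"
    "B,2000,60.0\n"
    "B,2010,64.0\n"
)

# Load the data; read_csv accepts a path or a file-like object.
df = pd.read_csv(csv_text)

# Look at the structure: dimensions, column types, first rows.
print(df.shape)   # (4, 3)
print(df.dtypes)
print(df.head())

# Subset a column, a row, and a filtered selection of rows and columns.
years = df["year"]
first_row = df.loc[0]
subset = df.loc[df["year"] == 2010, ["country", "life_exp"]]

# Aggregate a descriptive statistic per group.
mean_by_country = df.groupby("country")["life_exp"].mean()
print(mean_by_country)
```

With a real file, the StringIO object would be replaced by the file's path; the rest of the steps stay the same.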
