Pankaj Parashar

Developer. Designer. Writer.

Hello. I'm Pankaj Parashar, a 30-year-old frontend designer and developer from Mumbai, India. I make things for the web and write about them on my blog here.

Introducing Pandas

The intent of this post is to introduce you to the Pandas library by performing common tasks such as reading files, writing files, and manipulating data.

Published on Nov 08, 2015

Pandas is an open source Python library that provides easy-to-use data structures for performing complex operations on data. To get up and running, install it with pip:

$ pip install pandas

Once the installation has finished, you can verify the package by running the following in a Python shell:

>>> import pandas as pd
>>> pd.__version__
u'0.17.0'
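
To make the "data structure" mentioned above a little more concrete, here is a minimal sketch of what a dataframe looks like from the inside. The column names and values are made up for illustration, and the snippet assumes a reasonably recent Pandas on Python 3.

```python
import pandas as pd

# Hypothetical sample data: each dict key becomes a column label.
df = pd.DataFrame({'ColA': [1, 4], 'ColB': [2, 5]})

# Selecting one column returns a Series that shares the dataframe's row index.
col_a = df['ColA']

n_rows, n_cols = df.shape          # (number of rows, number of columns)
column_labels = list(df.columns)   # labels along the column axis
row_labels = list(df.index)        # labels along the row axis

print(type(col_a).__name__, n_rows, n_cols, column_labels, row_labels)
```

In other words, a dataframe can be thought of as a table of labelled columns that all share one row index, which is what the examples in this post build and modify.
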
  1. Construct a random Pandas dataframe (uses NumPy)

    >>> import numpy as np
    >>> df = pd.DataFrame(np.random.randn(2,1), columns=['ColA'])
    >>> df
          ColA
    0 -0.585067
    1 -1.387787
    
  2. Construct a dataframe with a list of tuples

    >>> data = [(1,2,3), (4,5,6),]
    >>> df = pd.DataFrame(data, columns=['ColA', 'ColB', 'ColC'])
    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    
  3. Construct a dataframe from a CSV file

    >>> df = pd.read_csv('file.csv')
    >>> df
             Date    price  factor_1  factor_2
    0  2012-06-11  1600.20     1.255     1.548
    1  2012-06-12  1610.02     1.258     1.554
    
  4. Construct a dataframe from an Excel (.xlsx) file

    >>> xls_file = pd.ExcelFile('file.xlsx')
    >>> xls_file.sheet_names
    ['Sheet1', 'Sheet2']
    >>> df = xls_file.parse('Sheet1')
    >>> df
             Date    price  factor_1  factor_2
    0  2012-06-11  1600.20     1.255     1.548
    1  2012-06-12  1610.02     1.258     1.554
    
  5. Add, Remove and Rename a column in a dataframe

    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    
    #: Add column(s)
    >>> df.insert(loc=3, column='ColD', value='NewVal')
    >>> df
       ColA  ColB  ColC    ColD
    0     1     2     3  NewVal
    1     4     5     6  NewVal
    
    #: Rename column(s)
    >>> df.rename(columns={'ColD': 'ColE'}, inplace=True)
    >>> df
       ColA  ColB  ColC    ColE
    0     1     2     3  NewVal
    1     4     5     6  NewVal
    
    #: Remove column(s)
    >>> df.drop(labels=['ColE'], axis=1, inplace=True)
    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    
  6. Insert, Update and Delete rows from a dataframe

    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    
    #: Insert row(s) at the end
    >>> idx = len(df)
    >>> df.loc[idx] = [7,8,9]
    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    2     7     8     9
    
    #: Update row(s) based on criterion
    >>> df.loc[  df.ColB%2 == 0,  'ColC'  ] = 10
    >>> df
       ColA  ColB  ColC
    0     1     2    10
    1     4     5     6
    2     7     8    10
    
    #: Delete row(s) based on criterion (keeps only the rows where ColC is 10)
    >>> df = df[  df.ColC == 10  ]
    >>> df
       ColA  ColB  ColC
    0     1     2    10
    2     7     8    10
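
The column and row operations above modify the dataframe in place. As a rough sketch of an alternative, the same kind of steps can be written so that every call returns a new dataframe and the original stays untouched. This assumes a newer Pandas than the 0.17.0 shown earlier (the `columns=` keyword of `drop` needs 0.21 or later), and the intermediate variable names are my own.

```python
import pandas as pd

df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=['ColA', 'ColB', 'ColC'])

# Each call returns a new dataframe; `df` itself is left unchanged.
added = df.assign(ColD='NewVal')                    # add a column
renamed = added.rename(columns={'ColD': 'ColE'})    # rename it
dropped = renamed.drop(columns=['ColE'])            # remove it again

# Append a row by concatenating a one-row frame; ignore_index renumbers rows.
new_row = pd.DataFrame([(7, 8, 9)], columns=df.columns)
extended = pd.concat([dropped, new_row], ignore_index=True)

# Delete the rows where ColB is even by keeping the negated mask.
odd_only = extended[~(extended['ColB'] % 2 == 0)]
print(odd_only)
```

Negating the mask with `~` makes the "delete rows matching a criterion" intent explicit, whereas a plain boolean filter keeps the rows that match.
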
    

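The examples above only read files, so here is a minimal sketch of the writing side using `to_csv`. An in-memory buffer stands in for a real file so the snippet is self-contained; the file name mentioned in the comment is hypothetical.

```python
import io
import pandas as pd

df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=['ColA', 'ColB', 'ColC'])

# An in-memory buffer stands in for a file here; a path such as 'out.csv'
# (hypothetical name) can be passed to to_csv instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves the row labels out

csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back to confirm the round trip.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))
```

Leaving out the index is a choice, not a requirement: without `index=False`, the row labels are written as an extra unnamed first column, which then shows up again when the file is read back.
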
Of course, Pandas is a huge library, and this article covers only a small part of it. As I learn new tricks in Pandas, I will update the article, so make sure you bookmark this post for future reference!
