Pankaj Parashar, developer

email@pankajparashar.com

Introducing Pandas

The intent of this post is to introduce you to the Pandas library by performing common tasks such as reading and writing files and manipulating data.

Nov 08, 2015

Pandas is an open-source Python library that provides easy-to-use data structures, most notably the DataFrame, for performing complex operations on data. Here are some ways to get up and running with Pandas, starting with installation:

$ pip install pandas

Once the installation has finished, you can verify it by importing the package and checking its version:

>>> import pandas as pd
>>> pd.__version__
u'0.17.0'

  1. Construct a random Pandas dataframe (uses NumPy)

    >>> import numpy as np
    >>> df = pd.DataFrame(np.random.randn(2,1), columns=['ColA'])
    >>> df
          ColA
    0 -0.585067
    1 -1.387787
    
  2. Construct a dataframe from a list of tuples

    >>> data = [(1, 2, 3), (4, 5, 6)]
    >>> df = pd.DataFrame(data, columns=['ColA', 'ColB', 'ColC'])
    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    
  3. Construct a dataframe from a CSV file

    >>> df = pd.read_csv('file.csv')
    >>> df
             Date    price  factor_1  factor_2
    0  2012-06-11  1600.20     1.255     1.548
    1  2012-06-12  1610.02     1.258     1.554
    
  4. Construct a dataframe from an Excel (.xlsx) file

    >>> xls_file = pd.ExcelFile('file.xlsx')
    >>> xls_file.sheet_names
    ['Sheet1', 'Sheet2']
    >>> df = xls_file.parse('Sheet1')
    >>> df
             Date    price  factor_1  factor_2
    0  2012-06-11  1600.20     1.255     1.548
    1  2012-06-12  1610.02     1.258     1.554
    
  5. Add, rename and remove columns in a dataframe

    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    
    #: Add column(s)
    >>> df.insert(loc=3, column='ColD', value='NewVal')
    >>> df
       ColA  ColB  ColC    ColD
    0     1     2     3  NewVal
    1     4     5     6  NewVal
    
    #: Rename column(s)
    >>> df.rename(columns={'ColD': 'ColE'}, inplace=True)
    >>> df
       ColA  ColB  ColC    ColE
    0     1     2     3  NewVal
    1     4     5     6  NewVal
    
    #: Remove column(s)
    >>> df.drop(labels=['ColE'], axis=1, inplace=True)
    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    
  6. Insert, update and delete rows in a dataframe

    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    
    #: Insert row(s) at the end
    >>> idx = len(df)
    >>> df.loc[idx] = [7,8,9]
    >>> df
       ColA  ColB  ColC
    0     1     2     3
    1     4     5     6
    2     7     8     9
    
    #: Update row(s) based on criterion
    >>> df.loc[df.ColB % 2 == 0, 'ColC'] = 10
    >>> df
       ColA  ColB  ColC
    0     1     2    10
    1     4     5     6
    2     7     8    10
    
    #: Delete row(s) based on criterion (keep only the rows where ColC == 10)
    >>> df = df[df.ColC == 10]
    >>> df
       ColA  ColB  ColC
    0     1     2    10
    2     7     8    10
    
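The introduction mentions writing files, which the examples above do not show. Here is a minimal sketch of a round trip. It builds a dataframe from a list of tuples, writes it with `DataFrame.to_csv` and reads it back with `pd.read_csv`. To keep it self-contained, it writes to an in-memory `io.StringIO` buffer instead of a real file. With a path such as `'file.csv'`, the calls would look the same. The column names and values only mirror the CSV example above and are illustrative.

```python
import io

import pandas as pd

# Build a small DataFrame from a list of tuples (same idea as example 2).
rows = [("2012-06-11", 1600.20, 1.255), ("2012-06-12", 1610.02, 1.258)]
df = pd.DataFrame(rows, columns=["Date", "price", "factor_1"])

# Write it out as CSV. index=False leaves out the 0, 1, ... row labels,
# so reading it back does not produce an extra "Unnamed: 0" column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read it back; parse_dates turns the Date strings
# into datetime64 values instead of plain Python strings.
buffer.seek(0)
df_read = pd.read_csv(buffer, parse_dates=["Date"])

print(df_read)
print(df_read.dtypes)
```

Passing `index=False` is a judgment call. Keep the default if the row labels carry meaning, and pass `index_col=0` to `read_csv` when reading the file back.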

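The column and row examples above modify `df` in place. Also, `df.loc[len(df)] = ...` only appends at the end when the index is the default 0, 1, 2, ... range. With other labels, `len(df)` may collide with an existing label and overwrite that row. The sketch below is one alternative that builds each step as a new dataframe, using the same toy data as the list:

- `pd.concat` appends rows.
- `assign`, `rename` and `drop(columns=...)` handle columns.
- A boolean mask filters rows.

This is one style among several, not the only idiomatic one.

```python
import pandas as pd

df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=["ColA", "ColB", "ColC"])

# Append a row without relying on the index values: concatenate a one-row
# DataFrame and let ignore_index=True renumber the rows 0, 1, 2.
new_row = pd.DataFrame([(7, 8, 9)], columns=df.columns)
df = pd.concat([df, new_row], ignore_index=True)

# Column operations as a chain; each call returns a new DataFrame.
with_cols = (
    df.assign(ColD="NewVal")               # add a constant column
      .rename(columns={"ColD": "ColE"})    # rename it
)
without_cols = with_cols.drop(columns=["ColE"])  # remove it again

# Update: set ColC to 10 wherever ColB is even (same rule as example 6).
df.loc[df["ColB"] % 2 == 0, "ColC"] = 10

# Delete rows that fail the criterion by keeping only the matching ones,
# then renumber the surviving rows from 0.
kept = df[df["ColC"] == 10].reset_index(drop=True)

print(with_cols)
print(kept)
```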
Of course, Pandas is a huge library, and no single article can cover it completely. As I learn new tricks, I will add them here, so bookmark this post for future reference!

Did you enjoy reading this article? I'd love to hear your thoughts. Feel free to send me a tweet or open an issue on GitHub with your comments.

2012-2020