Alexandra Gore

Pandas Tricks Part 1

Pandas Tips and Tricks for Beginners

Lesson 1: Unexpected behavior after vertically stacking data frames

In [26]:
import pandas as pd

Example

In [27]:
food = pd.DataFrame([["ramen", 100], ["strawberry", 1000]], columns=["Item", "Price"]); food
Out[27]:
         Item  Price
0       ramen    100
1  strawberry   1000
In [28]:
morefood = pd.DataFrame([["Roll Cake",300],["Rice",50]],columns=["Item","Price"]); morefood
Out[28]:
        Item  Price
0  Roll Cake    300
1       Rice     50
In [29]:
allthefood = pd.concat([food,morefood],sort=False)

Selecting the row with index label 0 returns two rows instead of one.

In [30]:
allthefood.loc[0,:]
Out[30]:
        Item  Price
0      ramen    100
0  Roll Cake    300

Hmm? Σ(・ิ¬・ิ)
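To see where the second row comes from, you can inspect the index of the stacked frame. The following is a minimal sketch that rebuilds the same two frames so it runs on its own. It also shows pd.concat's verify_integrity parameter, which is an optional guard not used in the original notebook: it makes concat raise a ValueError instead of silently producing duplicate labels.

```python
import pandas as pd

# The same two frames as above
food = pd.DataFrame([["ramen", 100], ["strawberry", 1000]], columns=["Item", "Price"])
morefood = pd.DataFrame([["Roll Cake", 300], ["Rice", 50]], columns=["Item", "Price"])

allthefood = pd.concat([food, morefood])

# concat keeps each input's row labels, so 0 and 1 each appear twice
print(allthefood.index.tolist())   # [0, 1, 0, 1]
print(allthefood.index.is_unique)  # False

# Optional guard: verify_integrity=True raises instead of creating duplicate labels
try:
    pd.concat([food, morefood], verify_integrity=True)
except ValueError as err:
    print("concat refused:", err)
```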

Solution

In [24]:
allthefood = pd.concat([food,morefood]).reset_index(drop=True)

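pd.concat keeps each input's original index labels, so after stacking, the labels 0 and 1 each appear twice. Resetting the index right after stacking renumbers the rows 0 to 3, and drop=True discards the old labels instead of inserting them as a new column. Be sure to do this whenever you vertically stack data frames to avoid unexpected behavior later on.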

In [25]:
allthefood.loc[0,:]
Out[25]:
Item     ramen
Price      100
Name: 0, dtype: object
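If you don't need the original labels, you can also renumber the rows in one step by passing ignore_index=True to pd.concat. The following is a minimal sketch that rebuilds the same frames. It also checks that the result matches the concat-then-reset_index approach above.

```python
import pandas as pd

food = pd.DataFrame([["ramen", 100], ["strawberry", 1000]], columns=["Item", "Price"])
morefood = pd.DataFrame([["Roll Cake", 300], ["Rice", 50]], columns=["Item", "Price"])

# ignore_index=True drops the input labels and numbers the result 0..n-1
allthefood = pd.concat([food, morefood], ignore_index=True)

print(allthefood.index.tolist())  # [0, 1, 2, 3]
print(allthefood.loc[0, "Item"])  # ramen

# Same result as stacking first and resetting the index afterwards
print(allthefood.equals(pd.concat([food, morefood]).reset_index(drop=True)))  # True
```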


Lesson 2: Avoiding the "SettingWithCopyWarning"

How to avoid the SettingWithCopyWarning when updating the values of a column based on other values in the data frame
In [41]:
import pandas as pd
import numpy as np

Example

In [51]:
mycats = pd.DataFrame([["Russian Blue","male",1,"Chekhov",np.nan],["Bengal","female",.5,"Nina",np.nan]],columns=["Breed","Sex","Age","Name","Favorite napping spot"]); mycats
Out[51]:
          Breed     Sex  Age     Name  Favorite napping spot
0  Russian Blue    male  1.0  Chekhov                    NaN
1        Bengal  female  0.5     Nina                    NaN

Let's imagine that we receive some new data: kittens (cats under 1 year old) prefer to nap on the couch.

Let's update the data frame with this information.

In [53]:
mycats[mycats.Age<1]['Favorite napping spot'] = 'couch'
/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

Ach! (╯︵╰,)
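The warning signals a real problem. Boolean indexing such as mycats[mycats.Age<1] returns a new DataFrame, so the second assignment writes to that temporary object and mycats itself never changes. The following is a minimal sketch that confirms this. It rebuilds the same frame and silences the warning only for this demonstration. In newer pandas versions that use Copy-on-Write, you may see a different chained-assignment warning, but the original frame is still left unchanged.

```python
import warnings

import numpy as np
import pandas as pd

mycats = pd.DataFrame(
    [["Russian Blue", "male", 1, "Chekhov", np.nan],
     ["Bengal", "female", .5, "Nina", np.nan]],
    columns=["Breed", "Sex", "Age", "Name", "Favorite napping spot"],
)

# mycats[mycats.Age < 1] builds a new DataFrame; the ["..."] = assignment
# then modifies that temporary object, not mycats
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the warning for this demo only
    mycats[mycats.Age < 1]["Favorite napping spot"] = "couch"

# The original frame was never updated: every napping spot is still NaN
print(mycats["Favorite napping spot"].isna().all())  # True
```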

Solution

In [57]:
mycats.loc[mycats.Age<1,'Favorite napping spot'] = 'couch'
In [58]:
mycats
Out[58]:
          Breed     Sex  Age     Name  Favorite napping spot
0  Russian Blue    male  1.0  Chekhov                    NaN
1        Bengal  female  0.5     Nina                  couch

When the warning message says:

Try using .loc[row_indexer,col_indexer] = value instead

this is exactly what you should do.

The row indexer is a Series of boolean values with the same length (and index) as the data frame. Only the rows where it is True are updated.

The column indexer is the label of the column whose values are set.

In the example above, the row indexer is mycats.Age<1, which evaluates to:

In [62]:
mycats.Age<1
Out[62]:
0    False
1     True
Name: Age, dtype: bool
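Because the row indexer is just a boolean Series aligned on the index, you can store it in a variable and combine it with other conditions using & (and) or | (or). Each comparison needs its own parentheses. The following is a minimal sketch that rebuilds the same mycats frame. The names is_kitten and is_female_kitten are illustrative. The astype call is an addition not in the original notebook: it stores the napping-spot column as object dtype, because writing a string into an all-NaN float64 column may trigger a dtype FutureWarning in newer pandas versions.

```python
import numpy as np
import pandas as pd

mycats = pd.DataFrame(
    [["Russian Blue", "male", 1, "Chekhov", np.nan],
     ["Bengal", "female", .5, "Nina", np.nan]],
    columns=["Breed", "Sex", "Age", "Name", "Favorite napping spot"],
)
# Assumption: use object dtype so the column can hold strings without a dtype warning
mycats = mycats.astype({"Favorite napping spot": object})

# Name the row indexer, then use it in .loc[row_indexer, col_indexer]
is_kitten = mycats["Age"] < 1
mycats.loc[is_kitten, "Favorite napping spot"] = "couch"

# Combine conditions with &; wrap each comparison in parentheses
is_female_kitten = (mycats["Age"] < 1) & (mycats["Sex"] == "female")
print(mycats.loc[is_female_kitten, "Name"].tolist())  # ['Nina']
print(mycats["Favorite napping spot"].tolist())       # [nan, 'couch']
```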