Here, I'm going to extend the last article I have written some time ago.
You guys can read it in here Most wanted pandas functions (Part 01)
1. Applying a function.
When dealing with the data, we often have to use functions. There are few ways to use a function over a DataFrame. We should avoid iterating approaches since it is slow going through the panda's rows.
We can use .apply() in this case.
# Add $ to values of the Fare column.
def add_dolor_mark(money_str):
money_str = '$ '+str(money_str)
return money_str
df['Fare'] = df['Fare'].apply(add_dolor_mark)
df.head()
2. Renaming columns.
Sometimes it is necessary to rename columns of our DataFrame.
We can pass dictionary {‘column_1’ : ’new_column_1’, ‘column_2’ : ’new_column_2’} to .rename()
Let's try to rename Embarked into Port and Fare into Price.
df.rename(columns = {'Embarked':'Port', 'Fare':'Price'}, inplace=True)
df.head()
3. Adding prefixes or suffixes to column names.
In this case we can use add_prefix() and add_suffix()
df = df.add_prefix('X_')
df.head()
4. Conditional replacing.
“Where” can be used to replace the values in rows or columns based on a condition. The way where is works a bit different. It is crucial to identify the way “where” works. "Where" select values based on the condition, then the remaining values are replaced with the specified value.
So "where" is behaving like a masking operation.
# we need to replace survied people with True
df['X_Survived'] = df['X_Survived'].where(df['X_Survived'] == 0 , True)
df.head()
5. Melting
This is a bit infamous function. But very useful. We can use melt to convert wide DataFrame to narrow ones.
Let's assume we have measured the temperature of 5 cities and for a week. And the data is looking like this.
(we can create a sample DataFrame as follows)
import pandas as pd
temp = {'City': ['Ofuna','Tokyo','Yokohama','Okayama', 'Tsukuba'],
'Day_1': [22,23,21,20,19],
'Day_2': [23,22,24,29,18],
'Day_3': [25,24,23,22,23],
'Day_4': [21,20,25,21,20],
'Day_5': [20,19,25,22,21],
}
df = pd.DataFrame(temp, columns = ['City', 'Day_1', 'Day_2', 'Day_3', 'Day_4', 'Day_5’])
df = df.melt(id_vars=['City'])
df
6. Save as csv
In some situations, we have to save processed DataFrame. In that case, we can use the to_csv function.
df.to_csv('myfile.csv')
7. Query
Among the several filtering methods query is another filtering method that we can quickly and easily access to the subsections of the DataFrame.
#letes assume that we want to get list of age of 10 and 15
seek_age = [10,15]
df.query("X_Age in @seek_age")
8. Memory usage
How much memory is used by a pandas DataFrame is really beneficial when dealing with large DataFrames. So we can avoid trouble with dead kernel due to out of memory
df.memory_usage()
For total Memory usage we can use,
df.memory_usage().sum()
9. Profile report
It generates reports from a pandas DataFrame. The pandas df.describe() function is handy but a little basic for good exploratory data analysis. pandas_profiling extends the capability for quick data analysis.
from pandas_profiling import ProfileReport
profile = ProfileReport(df, title='Pandas Profiling Report', style={'full_width':True}, correlations={'kendall': False})
profile
10. Explode
Let's think that our data set holds multiple entries in a single row. But for our analysis, we need to analyze them on separate rows. In that case, we can use the explode function.
(First lets create example DataFrame)
df_ = pd.DataFrame({'id':['a','b','c','d'],
'humidity %':[62,55,[29,39,81],77],
'day':[1,2,3,4]})
df_
df_.explode('humidity %')
11. Crosstab
It is used to group 2 or more variables and make a summary table. It is used to get insight of columns. Please note that crosstab() works with categorical data.
For example, let's think that we need to get how many people survived/not-survived according to their ports.
#First lets replace True with 1 in X_Survived column.
df['X_Survived'] = df['X_Survived'].where(df['X_Survived'] == 0 , 1)
port_survived = pd.crosstab(df["X_Port"],df["X_Survived"])
port_survived
There are more many useful functions for analyzing data. I will introduce few more functions in the next article.
Hope this will help you....
*本記事は @qualitia_cdevの中の一人、@nuwanさんが書いてくれました。
*This article is written by @nuwan a member of @qualitia_cdev.