Most Commonly Used Python Pandas Methods

Pandas is a powerful Python library for data manipulation and analysis. It provides a wide range of methods that can be used to perform a variety of tasks, including data cleaning, data exploration, and data visualization.

In this blog post, we will discuss some of the most commonly used Pandas methods, along with examples of how to use them.


1. head()

The head() method returns the first n rows of a DataFrame. This can be useful for getting a quick overview of the data in your DataFrame.


import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'age': [20, 30, 40], 'city': ['New York', 'Boston', 'Chicago']})

# Print the first 2 rows of the DataFrame
print(df.head(2))


Output:

    name  age      city
0  Alice   20  New York
1    Bob   30    Boston


2. tail()

The tail() method returns the last n rows of a DataFrame. This can be useful for getting a quick overview of the data at the end of your DataFrame.

# Print the last 2 rows of the DataFrame
print(df.tail(2))


Output:
      name  age     city
1      Bob   30   Boston
2  Charlie   40  Chicago

3. info()

The info() method prints a concise summary of the DataFrame, including the number of rows and columns, the data type of each column, and the number of non-null values in each column. Comparing the non-null count with the total number of rows shows how many values are missing.


# Print a summary of the DataFrame
# info() prints directly and returns None, so wrapping it in print() would also print "None"
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   age     3 non-null      int64 
 2   city    3 non-null      object
dtypes: int64(1), object(2)
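
Since info() reports non-null counts per column, it can help to count the missing values directly as well. The following is a minimal sketch, assuming a hypothetical variant of the example data in which one age is missing (df_missing and missing_counts are illustrative names, not part of the example above):

```python
import pandas as pd

# Hypothetical variant of the example data with one missing age
df_missing = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, None, 40],
    'city': ['New York', 'Boston', 'Chicago'],
})

# isna() marks missing cells as True; sum() counts them per column
missing_counts = df_missing.isna().sum()
print(missing_counts)
```

Here only the 'age' column should report a missing value.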

4. describe()

The describe() method provides a statistical summary of each numeric column: the count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and maximum. The 50th percentile is the median.

# Print a statistical summary of the DataFrame
print(df.describe())


Output:

        age
count   3.0
mean   30.0
std    10.0
min    20.0
25%    25.0
50%    30.0
75%    35.0
max    40.0

5. sort_values()

The sort_values() method sorts the DataFrame by one or more columns. You can specify the sorting order (ascending or descending) for each column.

# Sort the DataFrame by the 'age' column in ascending order
print(df.sort_values('age'))


Output:

      name  age      city
0    Alice   20  New York
1      Bob   30    Boston
2  Charlie   40   Chicago
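
Sorting by several columns with a different direction for each can be sketched as follows. This is an illustration only: the people DataFrame is hypothetical data with a repeated age, chosen so that the second sort key has a visible effect.

```python
import pandas as pd

# Hypothetical data with a repeated age so the second sort key matters
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'age': [20, 30, 40, 30],
})

# Sort by age descending, then by name ascending to break ties
sorted_people = people.sort_values(['age', 'name'], ascending=[False, True])
print(sorted_people)
```

The ascending argument takes one boolean per sort column, in the same order as the column list. Note that sort_values() returns a new DataFrame and leaves the original unchanged.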

6. groupby()

The groupby() method groups the DataFrame by one or more columns. You can then perform aggregation operations (such as mean(), sum(), and count()) on each group.

# Group the DataFrame by the 'city' column and calculate the mean age for each group
print(df.groupby('city')['age'].mean())

Output:

city
Boston      30.0
Chicago     40.0
New York    20.0
Name: age, dtype: float64
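
To compute several aggregations per group in one call, agg() accepts a list of function names. The sketch below assumes a hypothetical visitors DataFrame in which one city appears more than once, so the grouping has something to combine:

```python
import pandas as pd

# Hypothetical data: Boston appears twice, Chicago once
visitors = pd.DataFrame({
    'city': ['Boston', 'Boston', 'Chicago'],
    'age': [30, 50, 40],
})

# One row per city, one column per aggregation
summary = visitors.groupby('city')['age'].agg(['mean', 'sum', 'count'])
print(summary)
```

The result is a DataFrame indexed by city with the columns 'mean', 'sum', and 'count'.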

7. merge()

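The merge() method merges two DataFrames based on one or more common columns. You can specify the type of merge (inner, outer, left, or right) with the how parameter to control how the rows from the two DataFrames are combined. The default is an inner merge, which keeps only the rows whose key appears in both DataFrames.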

# Create a second DataFrame
df2 = pd.DataFrame({'city': ['New York', 'Boston'], 'population': [1000000, 600000]})

# Merge the two DataFrames on the 'city' column
print(df.merge(df2, on='city'))


Output:

    name  age      city  population
0  Alice   20  New York     1000000
1    Bob   30    Boston      600000
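
Charlie is missing from the result because Chicago has no row in the second DataFrame. As a sketch of how the merge type changes this, a left merge on the same two DataFrames keeps every row of the first one (left_merged is an illustrative name):

```python
import pandas as pd

# The two DataFrames from the examples above
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 30, 40],
    'city': ['New York', 'Boston', 'Chicago'],
})
df2 = pd.DataFrame({'city': ['New York', 'Boston'], 'population': [1000000, 600000]})

# how='left' keeps all rows of df; cities without a match get NaN
left_merged = df.merge(df2, on='city', how='left')
print(left_merged)
```

Because a missing population is stored as NaN, the 'population' column becomes a float column in the left-merged result.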

8. pivot_table()

The pivot_table() method creates a pivot table from a DataFrame. A pivot table is a cross-tabulation of the data: each cell holds the result of an aggregation over the rows that match that cell's row and column labels. The aggregation is the mean by default; others, such as sum or count, can be selected with the aggfunc parameter.

# The 'population' column lives in df2, not df, so merge the two DataFrames first
merged = df.merge(df2, on='city')
# Create a pivot table with 'city' as the rows, 'age' as the columns, and 'population' as the values
print(merged.pivot_table(index='city', columns='age', values='population'))
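
A pivot table is easier to read when several rows fall into the same cell, so that the aggregation has something to combine. The following sketch uses a hypothetical orders DataFrame (all names and values are made up for illustration) and sums the amounts per city and product:

```python
import pandas as pd

# Hypothetical order data: Chicago has two orders for product A
orders = pd.DataFrame({
    'city': ['Boston', 'Boston', 'Chicago', 'Chicago'],
    'product': ['A', 'B', 'A', 'A'],
    'amount': [10, 20, 30, 50],
})

# Rows are cities, columns are products, cells are summed amounts
table = orders.pivot_table(index='city', columns='product',
                           values='amount', aggfunc='sum')
print(table)
```

Combinations with no matching rows, such as Chicago and product B here, appear as NaN.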


Conclusion

These are just a few of the most commonly used Pandas methods. By understanding how to use these methods, you can easily clean, explore, and analyze data in Python.
