Most Commonly Used Python Pandas Methods

Pandas is a powerful Python library for data manipulation and analysis. It provides a wide range of methods that can be used to perform a variety of tasks, including data cleaning, data exploration, and data visualization.

In this blog post, we will discuss some of the most commonly used Pandas methods, along with examples of how to use them.


1. head()

The head() method returns the first n rows of a DataFrame. This can be useful for getting a quick overview of the data in your DataFrame.


import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [20, 30, 40],
                   'city': ['New York', 'Boston', 'Chicago']})

# Print the first 2 rows of the DataFrame
print(df.head(2))


Output:
    name  age      city
0  Alice   20  New York
1    Bob   30    Boston
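Two details of head() are easy to miss: called with no argument it returns the first 5 rows, and a negative n returns every row except the last |n|. Here is a minimal sketch; the ten-row `numbers` DataFrame is made up purely for illustration.

```python
import pandas as pd

# Hypothetical ten-row DataFrame, used only for illustration
numbers = pd.DataFrame({'value': range(10)})

# With no argument, head() returns the first 5 rows
first_five = numbers.head()
print(first_five)

# A negative n returns every row except the last |n| rows
all_but_last_three = numbers.head(-3)
print(all_but_last_three)
```

The same two rules apply to tail(), counted from the other end of the DataFrame.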


2. tail()

The tail() method returns the last n rows of a DataFrame. This can be useful for getting a quick overview of the data at the end of your DataFrame.

# Print the last 2 rows of the DataFrame
print(df.tail(2))


Output:
      name  age     city
1      Bob   30   Boston
2  Charlie   40  Chicago

3. info()

The info() method prints a concise summary of the DataFrame, including the number of rows and columns, the data type of each column, and the number of non-null values in each column (from which you can infer how many values are missing).


# Print a summary of the DataFrame
# (info() prints directly and returns None, so wrapping it in print() would also print "None")
df.info()

Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   name    3 non-null      object
 1   age     3 non-null      int64
 2   city    3 non-null      object
dtypes: int64(1), object(2)
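Because info() reports non-null counts rather than missing counts, it can help to count the missing values directly. Here is a small sketch using isna().sum(); the `people` DataFrame and its missing age are invented for illustration.

```python
import pandas as pd

# Hypothetical data with one missing age, used only for illustration
people = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                       'age': [20, None, 40]})

# info() prints its report itself and returns None
people.info()

# isna() marks missing cells as True; sum() counts them per column
missing_counts = people.isna().sum()
print(missing_counts)
```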


4. describe()

The describe() method provides a statistical summary of the DataFrame, including the count, mean, standard deviation, minimum, quartiles (the 50% row is the median), and maximum for each numeric column.

# Print a statistical summary of the DataFrame
print(df.describe())


Output:
        age
count   3.0
mean   30.0
std    10.0
min    20.0
25%    25.0
50%    30.0
75%    35.0
max    40.0
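The result of describe() is itself a DataFrame, so individual statistics can be looked up by row label. The sketch below reads the median from the 50% row and uses include='all' to summarize the text columns as well; it rebuilds the same example DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [20, 30, 40],
                   'city': ['New York', 'Boston', 'Chicago']})

# describe() returns a DataFrame; the median is stored in the '50%' row
summary = df.describe()
median_age = summary.loc['50%', 'age']
print(median_age)

# include='all' also summarizes non-numeric columns (count, unique, top, freq)
full_summary = df.describe(include='all')
print(full_summary)
```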


5. sort_values()

The sort_values() method sorts the DataFrame by one or more columns. You can specify the sorting order (ascending or descending) for each column.

# Sort the DataFrame by the 'age' column in ascending order
print(df.sort_values('age'))


Output:

      name  age      city
0    Alice   20  New York
1      Bob   30    Boston
2  Charlie   40   Chicago
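Since the example data is already ordered by age, the ascending sort leaves it unchanged. The sketch below shows the other two cases mentioned above, a descending sort and a sort on several columns with a separate order for each; the variable names are ours.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [20, 30, 40],
                   'city': ['New York', 'Boston', 'Chicago']})

# Sort by age, oldest first
oldest_first = df.sort_values('age', ascending=False)
print(oldest_first)

# Sort by city A-Z, then by age descending within each city
by_city_then_age = df.sort_values(['city', 'age'], ascending=[True, False])
print(by_city_then_age)
```

Note that sort_values() returns a new DataFrame and leaves df itself unchanged.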

6. groupby()

The groupby() method groups the DataFrame by one or more columns. You can then perform aggregation operations (such as mean(), sum(), and count()) on each group.

# Group the DataFrame by the 'city' column and calculate the mean age for each group
print(df.groupby('city')['age'].mean())


Output:

city
Boston      30.0
Chicago     40.0
New York    20.0
Name: age, dtype: float64
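With one person per city, every group above holds a single row, so the mean is just that person's age. The sketch below uses made-up data with repeated cities and computes several aggregations at once with agg(); the `people` DataFrame is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical data with repeated cities so each group has more than one row
people = pd.DataFrame({'city': ['Boston', 'Boston', 'Chicago', 'Chicago'],
                       'age': [30, 50, 40, 20]})

# Compute the mean, sum, and count of 'age' for each city in one call
age_stats = people.groupby('city')['age'].agg(['mean', 'sum', 'count'])
print(age_stats)
```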


7. merge()

The merge() method merges two DataFrames based on one or more common columns. You can specify the type of merge (inner, outer, left, or right) to control how the rows from the two DataFrames are combined. The default is an inner merge, which keeps only the rows whose key appears in both DataFrames.

# Create a second DataFrame
df2 = pd.DataFrame({'city': ['New York', 'Boston'],
                    'population': [1000000, 600000]})

# Merge the two DataFrames on the 'city' column
print(df.merge(df2, on='city'))


Output:
    name  age      city  population
0  Alice   20  New York     1000000
1    Bob   30    Boston      600000
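Charlie is missing from the result because Chicago does not appear in the second DataFrame and the merge type defaults to inner. As a sketch of how the how parameter changes this, a left merge keeps every row of the first DataFrame and fills the unmatched population with NaN; the example rebuilds both DataFrames so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [20, 30, 40],
                   'city': ['New York', 'Boston', 'Chicago']})
df2 = pd.DataFrame({'city': ['New York', 'Boston'],
                    'population': [1000000, 600000]})

# Default (inner) merge: only cities present in both DataFrames survive
inner = df.merge(df2, on='city')

# Left merge: every row of df is kept; population is NaN where no match exists
left = df.merge(df2, on='city', how='left')
print(left)
```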


8. pivot_table()

The pivot_table() method creates a pivot table from a DataFrame. A pivot table is a cross-tabulation of the data: each cell holds the result of an aggregation (such as mean(), sum(), or count()) over the rows that match that cell's row and column labels.

# df has no 'population' column, so merge it in from df2 first
merged = df.merge(df2, on='city')

# Create a pivot table with the 'city' column as the rows, the 'age' column as the columns, and the 'population' column as the values
print(merged.pivot_table(index='city', columns='age', values='population'))
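By default pivot_table() aggregates with the mean; the aggfunc parameter selects a different aggregation. Here is a minimal sketch on a made-up `sales` table (the column names and figures are hypothetical), where a pivot reads more naturally than it does on the three-person example.

```python
import pandas as pd

# Hypothetical sales data, used only for illustration
sales = pd.DataFrame({'city': ['Boston', 'Boston', 'Chicago', 'Chicago'],
                      'year': [2022, 2023, 2022, 2023],
                      'revenue': [100, 150, 200, 250]})

# Rows are cities, columns are years, and each cell is the summed revenue
revenue_table = sales.pivot_table(index='city', columns='year',
                                  values='revenue', aggfunc='sum')
print(revenue_table)
```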


Conclusion

These are just a few of the most commonly used Pandas methods. By understanding how to use these methods, you can easily clean, explore, and analyze data in Python.
