Published in Python

Iterating over Rows in a DataFrame in Pandas

Pandas offers several ways to iterate over the rows of a DataFrame. Depending on the method, each row is handed to you as a Series or as a namedtuple, which you can then process or analyze.

#1. Using iterrows()

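The iterrows() method yields an (index, row) pair for each row of the DataFrame, where the row is a Series. Because each row is packed into a single Series, the dtypes of the original columns are not necessarily preserved.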

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Tokyo']}

df = pd.DataFrame(data)

# Iterating over rows using iterrows()
for index, row in df.iterrows():
    print(f"Index: {index}, Data: {row}")

Output:

Index: 0, Data: Name       Alice
Age           25
City    New York
Name: 0, dtype: object
Index: 1, Data: Name       Bob
Age         30
City    London
Name: 1, dtype: object
Index: 2, Data: Name    Charlie
Age          35
City      Tokyo
Name: 2, dtype: object
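
In practice, a loop usually reads individual fields from each row rather than printing the whole Series. The following is a minimal sketch of that pattern, using the same sample data. The second DataFrame (numeric, with the hypothetical columns count and ratio) is an assumption added only to illustrate the dtype caveat of iterrows().

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'London', 'Tokyo']})

# Read individual fields from each row by column label
summaries = []
for index, row in df.iterrows():
    summaries.append(f"{row['Name']} ({row['Age']}) lives in {row['City']}")

print(summaries)

# dtype caveat: a row that mixes int and float columns is upcast to float
# ('count' and 'ratio' are illustrative column names)
numeric = pd.DataFrame({'count': [1, 2], 'ratio': [0.5, 1.5]})
_, first_row = next(numeric.iterrows())
print(first_row['count'])  # a float, although the column itself holds ints
```

If the original dtypes matter, itertuples() or a column-wise operation is likely the safer choice.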

#2. Using itertuples()

The itertuples() method is generally a faster alternative to iterrows(). It returns an iterator of namedtuples, one per row. Passing index=False, as in the example below, leaves the index out of each tuple.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Tokyo']}

df = pd.DataFrame(data)

# Iterating over rows using itertuples()
for row in df.itertuples(index=False):
    print(row)

Output:

Pandas(Name='Alice', Age=25, City='New York')
Pandas(Name='Bob', Age=30, City='London')
Pandas(Name='Charlie', Age=35, City='Tokyo')
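
Since each row is a namedtuple, its fields can be read as attributes. The sketch below, again on the sample data, shows attribute access with the default index=True (where the index is exposed as the Index field) and the name=None option, which yields plain tuples. The variable names labels and plain are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'London', 'Tokyo']})

# With the default index=True, the index is the first field, named Index
labels = []
for row in df.itertuples():
    labels.append(f"{row.Index}: {row.Name} is {row.Age}")

print(labels)

# name=None yields plain tuples instead of namedtuples
plain = list(df.itertuples(index=False, name=None))
print(plain[0])
```

Attribute access only works when the column names are valid Python identifiers; otherwise positional access on the tuple is the fallback.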

#3. Vectorized Operations

Vectorized operations act on a whole column at once, and in most cases they are more efficient than iterating over rows. Pandas is designed around them, so avoid explicit loops whenever possible.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Tokyo']}

df = pd.DataFrame(data)

# Performing a vectorized operation on the 'Age' column
df['AgeSquared'] = df['Age'] ** 2

print(df)

Output:

      Name  Age      City  AgeSquared
0    Alice   25  New York         625
1      Bob   30    London         900
2  Charlie   35     Tokyo        1225

Vectorized operations are more efficient than explicit iteration over rows, and often easier to read and maintain.
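
To make the comparison concrete, here is a small sketch that computes the same result twice: once with a row loop and once with a single vectorized comparison. The column name Is30OrOlder and the threshold of 30 are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'London', 'Tokyo']})

# Loop version: build the result one row at a time
loop_result = []
for row in df.itertuples(index=False):
    loop_result.append(row.Age >= 30)

# Vectorized version: one comparison over the whole column
df['Is30OrOlder'] = df['Age'] >= 30

# A boolean mask also selects rows without a loop
older_names = df.loc[df['Is30OrOlder'], 'Name'].tolist()
print(older_names)
```

Both versions produce the same values, but the vectorized form states the intent in one line and leaves the iteration to Pandas.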
