In Pandas, you can iterate over rows in a DataFrame using various methods. Each row in a DataFrame is represented as a Series, and there are different ways to access the rows for processing or analysis.
#1. Using iterrows()
The iterrows()
method allows you to iterate over the DataFrame and get the index and data for each row as a Series.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)
# Iterating over rows using iterrows()
for index, row in df.iterrows():
print(f"Index: {index}, Data: {row}")
Output:
Index: 0, Data: Name Alice
Age 25
City New York
Name: 0, dtype: object
Index: 1, Data: Name Bob
Age 30
City London
Name: 1, dtype: object
Index: 2, Data: Name Charlie
Age 35
City Tokyo
Name: 2, dtype: object
#2. Using itertuples()
The itertuples()
method is a faster alternative to iterrows()
. It returns an iterator of namedtuples that represent each row.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)
# Iterating over rows using itertuples()
for row in df.itertuples(index=False):
print(row)
Output:
Pandas(Name='Alice', Age=25, City='New York')
Pandas(Name='Bob', Age=30, City='London')
Pandas(Name='Charlie', Age=35, City='Tokyo')
#3. Vectorized Operations
In most cases, using vectorized operations is more efficient than iterating over rows in a DataFrame. Pandas is designed to work well with vectorized operations, so whenever possible, try to avoid using explicit loops.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)
# Performing a vectorized operation on the 'Age' column
df['AgeSquared'] = df['Age'] ** 2
print(df)
Output:
Name Age City AgeSquared
0 Alice 25 New York 625
1 Bob 30 London 900
2 Charlie 35 Tokyo 1225
Using vectorized operations is more efficient and often easier to read and maintain compared to explicit iteration over rows.
0 Comment