Pandas: Avoid inplace¶
Sourcery suggestion id: pandas-avoid-inplace¶
Available starting with version 1.1.0
Description¶
Don't use inplace for methods that always create a copy under the hood.
Before¶
import pandas as pd
df = pd.DataFrame(
    [["Python", 190], ["JavaScript", 33]],
    columns=["Language", "Number of rules"],
)
df.sort_values("Language", inplace=True)
After¶
import pandas as pd
df = pd.DataFrame(
    [["Python", 190], ["JavaScript", 33]],
    columns=["Language", "Number of rules"],
)
df = df.sort_values("Language")
Before¶
import pandas as pd
df = pd.DataFrame(
    [["Python", 190], ["JavaScript", 33]],
    columns=["Language", "Number of rules"],
)
df.copy().sort_values("Language", inplace=True)
After¶
import pandas as pd
df = pd.DataFrame(
    [["Python", 190], ["JavaScript", 33]],
    columns=["Language", "Number of rules"],
)
df.copy().sort_values("Language")
Explanation¶
Some DataFrame methods can never operate inplace. Their operation (like reordering rows) requires copying, so they create a copy even if you provide inplace=True.
For these methods, inplace doesn't bring a performance gain.
It's only "syntactic sugar for reassigning the new result to the calling DataFrame/Series."
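A minimal sketch of that equivalence, assuming the same two-row DataFrame as in the examples above: sorting with inplace=True and reassigning the sorted result should leave the variable holding the same data. The helper make_df is a hypothetical name introduced only for this illustration.

```python
import pandas as pd


def make_df() -> pd.DataFrame:
    # Hypothetical helper: builds the small example frame used on this page.
    return pd.DataFrame(
        [["Python", 190], ["JavaScript", 33]],
        columns=["Language", "Number of rules"],
    )


# Variant 1: sort with inplace=True.
via_inplace = make_df()
via_inplace.sort_values("Language", inplace=True)

# Variant 2: sort and reassign the result to the same name.
via_reassign = make_df()
via_reassign = via_reassign.sort_values("Language")

# Both variables now hold the same sorted rows with the same index labels.
same = via_inplace.equals(via_reassign)
```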
Drawbacks of using inplace:
- You can't use method chaining with inplace=True.
- The inplace keyword complicates type annotations (because the return value depends on the value of inplace).
- Using inplace=True gives code that mutates the state of an object and thus has side effects. That can introduce subtle bugs and is harder to debug.
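The method chaining drawback can be illustrated with a short sketch. It assumes an example frame with one duplicated row, added here purely for illustration. Because each call returns a new DataFrame, the steps compose into a single chain, which would not be possible if any step used inplace=True.

```python
import pandas as pd

# Example data; the duplicated "Python" row is an assumption for this sketch.
df = pd.DataFrame(
    [["Python", 190], ["JavaScript", 33], ["Python", 190]],
    columns=["Language", "Number of rules"],
)

# Every step returns a new DataFrame, so the next method can be called on it.
cleaned = (
    df.drop_duplicates()
    .sort_values("Language")
    .reset_index(drop=True)
)
```

The original df is left untouched, so intermediate states stay easy to inspect while debugging.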
PDEP-8 proposes deprecating the inplace option for methods that can never operate inplace.
Best practice: explicitly reassign the result to the calling DataFrame. For example:
df = df.sort_values("Language")
When the caller is an expression rather than a variable, inplace has no useful effect anyway.
df.copy().sort_values("Language", inplace=True)
copy creates a new DataFrame object, which isn't assigned to any variable.
inplace therefore modifies this temporary copy, not the df object.
In this case, the only effect of inplace is that the expression returns None instead of a new DataFrame.
Thus, it should be omitted for clarity.
df.copy().sort_values("Language")
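The following sketch, using the same example frame as above, is one way to check this behavior. The variable names unchanged and sorted_copy are hypothetical and exist only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["Python", 190], ["JavaScript", 33]],
    columns=["Language", "Number of rules"],
)

# The sort is applied to the temporary copy, which is discarded right away.
df.copy().sort_values("Language", inplace=True)
unchanged = df["Language"].tolist()  # df keeps its original row order

# To keep the sorted copy, drop inplace and bind the result to a name.
sorted_copy = df.copy().sort_values("Language")
```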
DataFrame Methods Affected¶
These DataFrame methods always create a copy under the hood, even if you provide the inplace keyword.
PDEP-8 refers to them as "Group 4" methods.
- dropna
- drop_duplicates
- sort_values
- sort_index
- eval
- query
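As a rough sketch, the reassignment pattern applies to each of the methods listed above in the same way. The data and the column names Rules and Doubled are assumptions chosen for this example (a column name without spaces keeps the query and eval expressions simple).

```python
import pandas as pd

# Illustrative data: one missing value and one duplicated row.
df = pd.DataFrame(
    {
        "Language": ["Python", "JavaScript", "Python", None],
        "Rules": [190, 33, 190, 5],
    }
)

df = df.dropna()                     # drop the row with a missing Language
df = df.drop_duplicates()            # drop the repeated Python row
df = df.sort_values("Language")      # reorder rows by column value
df = df.sort_index()                 # reorder rows by index label
df = df.query("Rules > 50")          # keep only rows matching the condition
df = df.eval("Doubled = Rules * 2")  # add a derived column
```

Each line rebinds df to the returned DataFrame, so none of the calls needs the inplace keyword.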