Bug: impossible to delete infinite values from DataFrame
This is my DataFrame df:
col1 col2
-0.441406 2.523047
-0.321105 1.555589
-0.412857 2.223047
-0.356610 2.513048
When I check df, I see that there are some infinite values:
np.any(np.isnan(df))
np.all(np.isfinite(df))
False
True
What is the difference between NaN and infinite? Also, how can I delete all infinite values to get True from np.all(np.isfinite(X))?
This is what I tried:
df = df.replace([np.inf, -np.inf], np.nan).dropna(how="all")
But the check for infinite values still gives me True.
Moreover, .apply(lambda s: s[np.isfinite(s)].dropna()).count() gives me the same number of rows for every column as df.shape, which indicates that there are no infinite values. But in that case, why does np.all(np.isfinite(df)) return True?
1 Answer
Your question is similar to "dropping infinite values from dataframes in pandas?". Did you try:
df.replace([np.inf, -np.inf], np.nan).dropna(subset=["col1", "col2"], how="all")
np.nan is not considered finite, so you may want to replace np.nan with a finite number instead. See this code for example:
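import pandas as pd
import numpy as np

# Build a one-row frame with float columns containing inf and -inf
df = pd.DataFrame([[1.0, np.inf, -np.inf]], columns=list('ABC'))
print(df)
print(np.all(np.isfinite(df)))

# Replacing inf with NaN is not enough: NaN is not finite either
df_nan = df.replace([np.inf, -np.inf], np.nan).dropna(subset=df.columns, how="all")
print(df_nan)
print(np.all(np.isfinite(df_nan)))

# Replacing inf with a finite number (here 0) makes the check pass
df_0 = df.replace([np.inf, -np.inf], 0).dropna(subset=df.columns, how="all")
print(df_0)
print(np.all(np.isfinite(df_0)))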
Result:
A B C
0 1.0 inf -inf
False
A B C
0 1.0 NaN NaN
False
A B C
0 1.0 0.0 0.0
True
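If the goal is to drop the offending rows rather than fill them with a number, one possible approach is to filter with a boolean mask built from np.isfinite. The following is only a sketch: the data is made up for illustration and is not the asker's actual DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is clean, row 1 contains inf, row 2 contains NaN.
df = pd.DataFrame({
    "col1": [-0.441406, np.inf, -0.412857],
    "col2": [2.523047, 1.555589, np.nan],
})

values = df.to_numpy()

# np.isnan flags only NaN; np.isfinite rejects NaN, inf and -inf alike.
print(np.isnan(values).any())      # True
print(np.isfinite(values).all())   # False

# Keep only the rows in which every value is finite.
finite_rows = np.isfinite(values).all(axis=1)
df_clean = df[finite_rows]

print(np.isfinite(df_clean.to_numpy()).all())  # True
```

This removes rows containing NaN as well as rows containing inf or -inf, which is what np.all(np.isfinite(...)) needs in order to return True.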
Not exactly the same, because .dropna(subset=["col1", "col2"], how="all") != .dropna()
– A STEFANI
Jun 28 at 15:46
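To illustrate the difference between the two dropna calls mentioned in this comment, here is a small sketch on made-up data (the values are hypothetical; only the column names col1 and col2 come from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: after the replace, row 0 is clean,
# row 1 is partly NaN and row 2 is entirely NaN.
df = pd.DataFrame({
    "col1": [1.0, np.inf, -np.inf],
    "col2": [2.0, 3.0, np.inf],
})
df_nan = df.replace([np.inf, -np.inf], np.nan)

# how="all" drops a row only when every selected column is NaN.
kept_all = df_nan.dropna(subset=["col1", "col2"], how="all")

# The default (how="any") drops a row when at least one value is NaN.
kept_any = df_nan.dropna()

print(len(kept_all))  # 2: rows 0 and 1 survive
print(len(kept_any))  # 1: only row 0 survives
```

With how="all", a row that still holds a NaN in one column is kept, so a later np.isfinite check can still fail.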
Should I mention all the columns? Can I do .dropna(subset=df.columns, how="all")?
– ScalaBoy
Jun 28 at 15:58
I added the screenshot of my Jupyter Notebook to my post.
– ScalaBoy
Jun 28 at 16:03
How is it different from what I posted in my question? This is exactly what I tried and it didn't work.
– ScalaBoy
Jun 28 at 15:38