Skip to content

how to ignore null values in DataFrame when comparing columns

I am new to Pandas and learning. I am reading excel to DataFrame and comparing columns and highlight the column that’s not same. For example if Column A is not same as Column B then highlight the Column B. However I have some null values in Column A and Column B. When I execute the code, I don’t want to highlight the null values in Column B. How can I do that? Below is my code:

file = Path(path to excel)
df = pd.read_excel(file)
def color(x):
   c1 = 'background-color: red'
   m1 = x['AMOUNT A'] != x['AMOUNT B']
   m2 = x['AMOUNT C'] != x['AMOUNT D']
   df = pd.DataFrame('',index=x.index, columns=x.columns)
   df['AMOUNT B'] = np.select([m1],[c1], default='')
   df['AMOUNT D'] = np.select([m2],[c1], default='')
   return df
writer = pd.ExcelWriter(path to excel)
df.style.apply(color,axis=None).to_excel(writer, 'data', index=False)

df before color function:

    AMOUNT A  AMOUNT B  AMOUNT C  AMOUNT D
0     100.0     200.0       NaN       NaN
1     200.0     200.0      45.0      25.0
2     100.0     500.0       NaN       NaN
3       NaN       NaN       NaN       NaN
4       NaN       NaN       NaN       NaN
5     200.0       1.0       NaN       NaN

Output after running script:

enter image description here

Expected output:

enter image description here

Answer

The issue here is NaN == NaN will return False

def color(x):
   c1 = 'background-color: red'
   m1 = x['AMOUNT A'].fillna('') != x['AMOUNT B'].fillna('')
   m2 = x['AMOUNT C'].fillna('') != x['AMOUNT D'].fillna('')
   df = pd.DataFrame('',index=x.index, columns=x.columns)
   df['AMOUNT B'] = np.select([m1],[c1], default='')
   df['AMOUNT D'] = np.select([m2],[c1], default='')
   return df

More Info

np.nan == np.nan
Out[527]: False