How to rename a column header and fill that column with a value taken from another header name?

I have multiple Pandas dataframes like this one (for different years):

df1=

        Unnamed: 0           b      c     Monthly Flow (2018)
1              nan   -0.041619  43.91               -0.041619
2              nan    0.011913  43.91               -0.041619
3              nan   -0.048801  43.91               -0.041619
4              nan    0.002857  43.91               -0.041619
5              nan    0.002204  43.91               -0.041619
6              nan   -0.007692  43.91               -0.041619
7              nan   -0.014992  43.91               -0.041619
8              nan   -0.035381  43.91               -0.041619

I would like to replace the nan values with the year that appears in the Monthly Flow (2018) column header, and rename that first column to Year, to get this output:

       Year           b      c     Monthly Flow (2018)
1      2018   -0.041619  43.91               -0.041619
2      2018    0.011913  43.91               -0.041619
3      2018   -0.048801  43.91               -0.041619
4      2018    0.002857  43.91               -0.041619
5      2018    0.002204  43.91               -0.041619
6      2018   -0.007692  43.91               -0.041619
7      2018   -0.014992  43.91               -0.041619
8      2018   -0.035381  43.91               -0.041619

I know how to replace these nan values with a specific year, one dataframe at a time.

But since I have a lot of dataframes (and will have more in the future), I would like to know a way to do this automatically, for example by extracting the year from the column name Monthly Flow (2018).

Answer

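Assuming the Monthly Flow column is always the 5th column (positional index 4), you can do it like this. Note that in the frame shown in the question it is the 4th column, so there the index would be 3: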

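import re
df = df.rename(columns={'Unnamed: 0': 'Year'})
# r'\d{4}' matches the first run of four digits in the fifth column's name;
# int() stores the year as a number rather than as a string
df['Year'] = int(re.search(r'\d{4}', df.columns[4]).group(0))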

Explanation:

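re.search looks for four consecutive digits in the name of the fifth column (df.columns[4]), and group(0) returns the matched digits as a string.

The Unnamed: 0 column is renamed to Year and then filled with that value.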
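
To see the extraction step on its own, here is a minimal sketch that applies the pattern to the header text from the question. The variable names are only illustrative. Note that re.search returns None when nothing matches, so a guard is worth having if some headers might not contain a year:

```python
import re

# Header text taken from the question's dataframe.
column_name = "Monthly Flow (2018)"

# Find the first run of four consecutive digits in the header.
match = re.search(r"\d{4}", column_name)

# re.search returns None when there is no match, so guard before .group().
year = match.group(0) if match else None
print(year)
```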

Working code:

import pandas as pd
import numpy as np
import re
df = pd.DataFrame({'Unnamed: 0': {0: np.nan},
 'a': {0: 1},
 'a2': {0: 1},
 'a3': {0: 1},
 'Monthly Flow (2018)': {0: 'b'}})
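df = df.rename(columns={'Unnamed: 0': 'Year'})
# r'\d{4}' (with the backslash) matches four digits; 'd{4}' would look for the literal text "dddd"
df['Year'] = int(re.search(r'\d{4}', df.columns[4]).group(0))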
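
Since the question mentions many dataframes, one possible way to apply this to all of them is sketched below. It looks the column up by name instead of by position, so it does not depend on Monthly Flow being the 5th column. The helper name add_year_column, the prefix argument, and the sample frames are assumptions made for illustration, not part of the original answer:

```python
import re

import numpy as np
import pandas as pd


def add_year_column(df, prefix="Monthly Flow"):
    """Return a copy of df with 'Unnamed: 0' renamed to 'Year' and filled
    with the year found in the header that starts with `prefix`."""
    # Locate the column by name rather than by position.
    flow_col = next((c for c in df.columns if str(c).startswith(prefix)), None)
    if flow_col is None:
        raise ValueError(f"no column starting with {prefix!r}")
    match = re.search(r"\d{4}", str(flow_col))
    if match is None:
        raise ValueError(f"no four-digit year in {flow_col!r}")
    out = df.rename(columns={"Unnamed: 0": "Year"})
    out["Year"] = int(match.group(0))
    return out


# Hypothetical sample frames standing in for the per-year dataframes.
frames = [
    pd.DataFrame({
        "Unnamed: 0": [np.nan, np.nan],
        "b": [-0.041619, 0.011913],
        "c": [43.91, 43.91],
        f"Monthly Flow ({y})": [-0.041619, -0.041619],
    })
    for y in (2018, 2019)
]
frames = [add_year_column(f) for f in frames]
print(frames[0])
```

Raising an error when no year is found is a design choice here; returning the frame unchanged would also be reasonable if some dataframes are expected to lack the column.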