How to make a list from a range within a dataframe?

I am trying to expand ranges stored in a dataframe column into lists.

Here is my column of strings:

df['ID'] = ['', '2', '4', '', '8', '', '16-18', '25', '30-31']  # empty strings represent null

I would like to create an output like this:

df['ID'] = [' ', 'ID 2', 'ID 4', ' ', 'ID 8', ' ', ['ID 16', 'ID 17', 'ID 18'], 'ID 25', ['ID 30', 'ID 31']]

Can someone please help?

2 Answers

IIUC

df.ID.str.split('-').apply(lambda x: x[0] if len(x) <= 1 else list(range(int(x[0]), int(x[1]) + 1)))
Out[182]:
0
1               2
2               4
3
4               8
5
6    [16, 17, 18]
7              25
8        [30, 31]
Name: ID, dtype: object

thanks, but I get the error: TypeError: object of type 'float' has no len()
– richierich
2 days ago
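
The TypeError in the comment above is what you get when the column contains NaN: str.split returns NaN (a float) for those rows, and len() fails on it. A minimal sketch of a guard, assuming missing values should simply pass through unchanged (the helper name expand_range and the sample data are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd

# Sample column mixing empty strings, real NaN values, single IDs and ranges.
df = pd.DataFrame({"ID": ["", "2", "4", np.nan, "8", np.nan, "16-18", "25", "30-31"]})

def expand_range(parts):
    # str.split yields a list for strings and NaN for missing values.
    if not isinstance(parts, list):
        return parts  # leave NaN untouched instead of calling len() on it
    if len(parts) <= 1:
        return parts[0]  # single value (or empty string): keep as is
    # "16-18" -> [16, 17, 18]; assumes exactly "start-end" with integer bounds
    return list(range(int(parts[0]), int(parts[1]) + 1))

result = df["ID"].str.split("-").apply(expand_range)
print(result)
```

As in the answer above, this returns plain integers for the ranges and does not add the "ID " prefix.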

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['ID'] = [np.nan, 2, 4, np.nan, 8, np.nan, '16-18', 25, '30-31']

First, build the "ID" labels for the ranges:

s = df.ID.str.split("-")
s2 = s[s.notna()].apply(lambda x: ("ID " + pd.Series(list(range(int(x[0]), int(x[1]) + 1))).astype(str)).tolist())

and then for the regular cases (except NaNs and ranges)

non_na = df.ID[df.ID.notna()]
non_na_range = non_na[~non_na.index.isin(s2.index)]
s3 = "ID " + non_na_range.astype(str)

Then assign

df.loc[s2.index, "ID"] = s2
df.loc[s3.index, "ID"] = s3

Output

                      ID
0                    NaN
1                   ID 2
2                   ID 4
3                    NaN
4                   ID 8
5                    NaN
6  [ID 16, ID 17, ID 18]
7                  ID 25
8         [ID 30, ID 31]

Everything runs well until I get to the last line, where I get this error: ValueError: cannot reindex from a duplicate axis. Also, all of the values in the ID column of my df are strings, not floats or ints.
– richierich
2 days ago

The first line outputs NaN for all values and does not split the strings.
– richierich
2 days ago
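
For a column that holds only strings, as in the question, one alternative is to handle each value with an ordinary function instead of splitting the column into sub-series. The following is a sketch under a few assumptions: ranges always look like "start-end" with integer bounds, empty strings should be left as they are, and label_ids is a hypothetical helper name.

```python
import pandas as pd

def label_ids(value):
    # Hypothetical helper: "2" -> "ID 2", "16-18" -> ["ID 16", "ID 17", "ID 18"].
    if not isinstance(value, str) or not value.strip():
        return value  # leave empty strings and NaN unchanged
    if "-" in value:
        # Assumes exactly one dash separating two integers.
        start, end = value.split("-")
        return [f"ID {n}" for n in range(int(start), int(end) + 1)]
    return f"ID {value.strip()}"

# Same data as in the question: every entry is a string.
df = pd.DataFrame({"ID": ["", "2", "4", "", "8", "", "16-18", "25", "30-31"]})
df["ID"] = df["ID"].apply(label_ids)
print(df)
```

The "cannot reindex from a duplicate axis" error mentioned above usually points to repeated labels in the dataframe's index; calling df = df.reset_index(drop=True) before assigning is one way to rule that out.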
