How to make a list from a range within a dataframe?
I am trying to expand the ranges in a dataframe column into lists.
Here is my column of strings:
df['ID'] = ['', '2', '4', '', '8', '', '16-18', '25', '30-31']
# empty strings represent null values
I would like to create an output like this:
df['ID'] = ['', 'ID 2', 'ID 4', '', 'ID 8', '', ['ID 16', 'ID 17', 'ID 18'],
            'ID 25', ['ID 30', 'ID 31']]
Can someone please help?
2 Answers
IIUC
df.ID.str.split('-').apply(lambda x : x[0] if len(x)<=1 else list(range(int(x[0]),int(x[1])+1)))
Out[182]:
0
1               2
2               4
3
4               8
5
6    [16, 17, 18]
7              25
8        [30, 31]
Name: ID, dtype: object
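The output above keeps the bare numbers, while the question asks for an "ID " prefix on every value. One way to get there is to map a small helper over the column. This is only a sketch and not part of the answer above: `expand_id` is a hypothetical name, and it assumes every non-empty entry is either a non-negative integer or two such integers joined by a single hyphen.

```python
import pandas as pd


def expand_id(value):
    """Turn '2' into 'ID 2' and '16-18' into ['ID 16', 'ID 17', 'ID 18']."""
    # Leave missing values (NaN, None) and blank strings untouched.
    if not isinstance(value, str) or not value.strip():
        return value
    # partition() returns an empty `end` when there is no hyphen.
    start, _, end = value.partition("-")
    if not end:
        return f"ID {int(start)}"
    # Ranges are inclusive on both ends, hence the + 1.
    return [f"ID {n}" for n in range(int(start), int(end) + 1)]


df = pd.DataFrame({"ID": ["", "2", "4", "", "8", "", "16-18", "25", "30-31"]})
df["ID"] = df["ID"].map(expand_id)
print(df["ID"].tolist())
```

Because the helper never relies on index labels, it should behave the same whether or not the index is unique.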
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['ID'] = [np.nan, 2, 4, np.nan, 8, np.nan, '16-18', 25, '30-31']
Then first build the "ID" lists for the ranges:
s = df.ID.str.split("-")
s2 = s[s.notna()].apply(lambda x: ("ID "+pd.Series(list(range(int(x[0]), int(x[1])+1))).astype(str)).tolist())
And then handle the regular cases (everything except NaNs and ranges):
non_na = df.ID[df.ID.notna()]
non_na_range = non_na[~non_na.index.isin(s2.index)]
s3 = "ID " + non_na_range.astype(str)
Then assign
df.loc[s2.index, "ID"] = s2
df.loc[s3.index, "ID"] = s3
Output
                       ID
0                     NaN
1                    ID 2
2                    ID 4
3                     NaN
4                    ID 8
5                     NaN
6  [ID 16, ID 17, ID 18]
7                   ID 25
8          [ID 30, ID 31]
Everything runs well until I get to the last line, where I get this error: ValueError: cannot reindex from a duplicate axis. Also, all of the values in the ID column of my df are strings, not floats or ints.
– richierich
2 days ago
The first line outputs NaN for all values and does not split the strings.
– richierich
2 days ago
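The "cannot reindex from a duplicate axis" error typically shows up when the frame's index contains repeated labels, because the label-based assignment has to align two objects by index. That is only a guess about the commenter's data. A minimal sketch of checking for it and resetting the index first, using a made-up three-row frame:

```python
import pandas as pd

# Hypothetical frame with repeated index labels (0 appears twice),
# as can happen after concatenating or filtering frames.
df = pd.DataFrame({"ID": ["2", "4", "25"]}, index=[0, 1, 0])
has_duplicates = not df.index.is_unique
print(has_duplicates)

# Give every row a fresh, unique label before any label-based assignment.
df = df.reset_index(drop=True)

prefixed = "ID " + df["ID"]
df.loc[prefixed.index, "ID"] = prefixed
print(df)
```

If the original index labels matter, they would need to be kept in a column (for example with `reset_index()` without `drop=True`) rather than discarded.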
Thanks, but I get the error: TypeError: object of type 'float' has no len()
– richierich
2 days ago
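A likely cause of this TypeError is that the column holds real NaN values rather than empty strings: `str.split` leaves NaN as a float, and calling `len()` on it fails. A hedged sketch of a guard, using a made-up three-row Series and a hypothetical helper named `expand`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one plain value, one missing value, one range.
ids = pd.Series(["2", np.nan, "16-18"], dtype=object)


def expand(parts):
    # Missing rows come through as float NaN, not lists, so pass them along.
    if not isinstance(parts, list):
        return parts
    if len(parts) == 1:
        return parts[0]
    # Inclusive range between the two ends of 'a-b'.
    return list(range(int(parts[0]), int(parts[1]) + 1))


result = ids.str.split("-").apply(expand)
print(result)
```

Replacing the missing values with empty strings before splitting (`ids.fillna("")`) would be another way to avoid the error, at the cost of losing the NaN markers.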