Force the number of columns when splitting one column with str.split (panda)


Force the number of columns when splitting one column with str.split (panda)



I don't know if the procedure is possible with str.split. But for example, I have the following column in a dataframe df:


Column
0 a-b-c-d-e-f-g-h-i-j
1 a-a-b-b-c-c
2 a-a-b-b



I know that if I do


df['Column'].str.split('-', expand=True)



then I will have a result like the following:


0 1 2 3 4 5 6 7 8 9
0 a b c d e f g h i j
1 a a b b c c None None None None
2 a a b b None None None None None None



which creates a number of columns depending on the highest number of elements when the splitting is done.



I want to know if is possible to always have 10 columns regardless of the number of elements as long as it is between 0 and 10 and fill the remaining columns with 'None' like here.



So something which would turn the following column:


Column
0 a-b-c-d-e-f-g-h
1 a-a-b-b-c-c
2 a-a-b-b



into:


0 1 2 3 4 5 6 7 8 9
0 a b c d e f g h None None
1 a a b b c c None None None None
2 a a b b None None None None None None





You could create a dataframe first with the desired number of columns, and then append the data to it. You should be able to append the str.split
– Sagasaki
2 days ago




2 Answers
2



reindex afterwards
With improved implementation from user3483203


reindex


df.Column.str.split('-', expand=True).reindex(columns=range(10))

0 1 2 3 4 5 6 7 8 9
0 a b c d e f g h i j
1 a a b b c c None None None None
2 a a b b None None None None None None



A comprehension approach


pd.DataFrame([
(lambda l: l + [None] * (10 - len(l)))(x.split('-'))
for x in df.Column
], df.index)

0 1 2 3 4 5 6 7 8 9
0 a b c d e f g h None None
1 a a b b c c None None None None
2 a a b b None None None None None None





Does it create columns with all None when we have <10 elements ?
– Harv Ipan
2 days ago


None





Yeah, I didn't get that at first.
– piRSquared
2 days ago





Would simply using df.Column.str.split('-', expand=True).reindex(columns=range(10)) work?
– user3483203
2 days ago



df.Column.str.split('-', expand=True).reindex(columns=range(10))





Yes, that would work
– piRSquared
2 days ago





I have tried both and think df.Column.str.split('-', expand=True).reindex(columns=range(10)) is indeed more straightforward. I am going to accept this answer as both solutions are working. Thank you both.
– Sd Junk
2 days ago



df.Column.str.split('-', expand=True).reindex(columns=range(10))



You can use below:


pd.concat([pd.DataFrame([np.nan]*10).T, pd.DataFrame(df['Column'].str.split('-').tolist())], ignore_index=True).iloc[1:]



Output:


0 1 2 3 4 5 6 7 8 9
1 a b c d e f g h NaN NaN
2 a a b b c c NaN NaN NaN NaN
3 a a b b NaN NaN NaN NaN NaN NaN






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Comments

Popular posts from this blog

paramiko-expect timeout is happening after executing the command

Opening a url is failing in Swift

Export result set on Dbeaver to CSV