Force the number of columns when splitting one column with str.split (panda)
Force the number of columns when splitting one column with str.split (panda)
I don't know if the procedure is possible with str.split. But for example, I have the following column in a dataframe df:
Column
0 a-b-c-d-e-f-g-h-i-j
1 a-a-b-b-c-c
2 a-a-b-b
I know that if I do
df['Column'].str.split('-', expand=True)
then I will have a result like the following:
0 1 2 3 4 5 6 7 8 9
0 a b c d e f g h i j
1 a a b b c c None None None None
2 a a b b None None None None None None
which creates a number of columns depending on the highest number of elements when the splitting is done.
I want to know if is possible to always have 10 columns regardless of the number of elements as long as it is between 0 and 10 and fill the remaining columns with 'None' like here.
So something which would turn the following column:
Column
0 a-b-c-d-e-f-g-h
1 a-a-b-b-c-c
2 a-a-b-b
into:
0 1 2 3 4 5 6 7 8 9
0 a b c d e f g h None None
1 a a b b c c None None None None
2 a a b b None None None None None None
2 Answers
2
reindex
afterwards
With improved implementation from user3483203
reindex
df.Column.str.split('-', expand=True).reindex(columns=range(10))
0 1 2 3 4 5 6 7 8 9
0 a b c d e f g h i j
1 a a b b c c None None None None
2 a a b b None None None None None None
A comprehension approach
pd.DataFrame([
(lambda l: l + [None] * (10 - len(l)))(x.split('-'))
for x in df.Column
], df.index)
0 1 2 3 4 5 6 7 8 9
0 a b c d e f g h None None
1 a a b b c c None None None None
2 a a b b None None None None None None
Does it create columns with all
None
when we have <10 elements ?– Harv Ipan
2 days ago
None
Yeah, I didn't get that at first.
– piRSquared
2 days ago
Would simply using
df.Column.str.split('-', expand=True).reindex(columns=range(10))
work?– user3483203
2 days ago
df.Column.str.split('-', expand=True).reindex(columns=range(10))
Yes, that would work
– piRSquared
2 days ago
I have tried both and think
df.Column.str.split('-', expand=True).reindex(columns=range(10))
is indeed more straightforward. I am going to accept this answer as both solutions are working. Thank you both.– Sd Junk
2 days ago
df.Column.str.split('-', expand=True).reindex(columns=range(10))
You can use below:
pd.concat([pd.DataFrame([np.nan]*10).T, pd.DataFrame(df['Column'].str.split('-').tolist())], ignore_index=True).iloc[1:]
Output:
0 1 2 3 4 5 6 7 8 9
1 a b c d e f g h NaN NaN
2 a a b b c c NaN NaN NaN NaN
3 a a b b NaN NaN NaN NaN NaN NaN
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
You could create a dataframe first with the desired number of columns, and then append the data to it. You should be able to append the str.split
– Sagasaki
2 days ago