Operations on the amount of values using python pandas


Operations on the amount of values using python pandas



I have 2 dataframes 'transactions' and 'offsets'



offsets:


Contact Account Name
0 TODD HOWARD
1 TODD HOWARD
2 JEFF COX
3 JEFF COX
4 TODD HOWARD
5 JEFF COX
6 MIKE BALDWIN



transactions:


Contact Account Name
0 TODD HOWARD
1 TODD HOWARD
2 JEFF COX
3 JEFF COX
4 TODD HOWARD
5 JEFF COX
6 TODD HOWARD
7 MIKE BALDWIN
8 MIKE BALDWIN
9 JEFF COX
10 JC WHITE



What it want to do:
1) Is to count each unique values. For this purpose I've used:


df1 = offsets.groupby('Contact Account Name').size()
df2 = transactions.groupby('Contact Account Name').size()



And I've got



df1:


Contact Account Name
TODD HOWARD 3
JEFF COX 3
MIKE BALDWIN 1



df2:


Contact Account Name
JC WHITE 1
TODD HOWARD 4
JEFF COX 4
MIKE BALDWIN 2



2) I want to merge both dataframes. I've tried merge but it didn't work.


merge



3) I want to create another dataframe and to calculate percentage of the offsets out of overall transactions.



What result i want to see at the end:


Contact Account Name Offset Percentage
TODD HOWARD 75
JEFF COX 75
MIKE BALDWIN 50
JC WHITE 100



Thanks in advance!




1 Answer
1



Output of aggregation is Series, so is possible divide by div with multiple by mul and last reset_index:


Series


div


mul


reset_index


df = df1.div(df2, fill_value=1).mul(100).reset_index(name='Offset Percentage')
print (df)
Contact Account Name Offset Percentage
0 JC WHITE 100.0
1 JEFF COX 75.0
2 MIKE BALDWIN 50.0
3 TODD HOWARD 75.0



Similar solution with value_counts:


value_counts


df1 = offsets['Contact Account Name'].value_counts()
df2 = transactions['Contact Account Name'].value_counts()

df = (df1.div(df2, fill_value=1)
.mul(100)
.rename_axis('Contact Account Name')
.reset_index(name='Offset Percentage'))
print (df)
Contact Account Name Offset Percentage
0 JC WHITE 100.0
1 JEFF COX 75.0
2 MIKE BALDWIN 50.0
3 TODD HOWARD 75.0



If need join both Series together call concat:


concat


df = pd.concat([df2, df1], axis=1, keys=('Offset Percentage','b'))
df['Offset Percentage'] = df.b.div(df['Offset Percentage'], fill_value=1).mul(100)
df = df.drop('b', 1).rename_axis('Contact Account Name').reset_index()
print (df)
Contact Account Name Offset Percentage
0 JC WHITE 100.0
1 JEFF COX 75.0
2 MIKE BALDWIN 50.0
3 TODD HOWARD 75.0






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Comments

Popular posts from this blog

paramiko-expect timeout is happening after executing the command

Export result set on Dbeaver to CSV

The forked VM terminated without saying properly goodbye. VM crash or System.exit called