Remove categories having count as 0 in pandas groupby
I want to remove the categories whose count is 0 after calling the pandas value_counts() function.
My data is as follows:
categories:
Index(['Average', 'Good', 'Poor', 'VeryGood', 'VeryPoor'], dtype='object')
Output of value counts:
score Frequency
VG 21
G 15
A 63
P 27
VP 0
My result should be:
score Frequency
VG 21
G 15
A 63
P 27
I want to store this in a dataframe and plot a bar graph of it. I don't want to show VP in the graph because its count is 0, so I want to eliminate that category.
My code:
quality_scores=quality.SCORE.value_counts()
quality_scores=pd.Series.to_frame(quality_scores)
quality_scores=quality_scores.rename(columns={'SCORE': 'Frequency'})
quality_scores['Score']=quality_scores.index
quality_scores=quality_scores.reset_index(drop=True)
quality_scores = quality_scores[quality_scores.Frequency != 0]
quality_scores
I am editing the question based on the comments:
The result is correct when I print the dataframe. However, when I check the categories using quality_scores['Score'].cat.categories, I still see the VP category, which shouldn't be there.
Also, I don't expect to see the VP category in the graph, but it's displayed on the axis.
The following is the code for the graph:
plt.figure(figsize=(15,7))
quality_graph=sns.barplot(y=quality_scores["Frequency"], x=quality_scores["Score"])
quality_graph.set_xlabel('Frequency')
quality_graph.set_title('Score Distribution of Quality Measure:',fontsize=25)
plt.savefig('graphsQuality_Measure.png')
As you can see in the graph, there are many blank categories. These are not actually present in the quality_scores dataframe.
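A minimal sketch that seems to reproduce the behaviour described above, assuming the SCORE column has a categorical dtype. The `score` series and its values are hypothetical stand-ins for quality.SCORE, built from the counts shown earlier:

```python
import pandas as pd

# Hypothetical stand-in for quality.SCORE; the categorical dtype is an assumption.
levels = ['VG', 'G', 'A', 'P', 'VP']
score = pd.Series(
    ['VG'] * 21 + ['G'] * 15 + ['A'] * 63 + ['P'] * 27,
    dtype=pd.CategoricalDtype(categories=levels),
)

# value_counts() on a categorical reports every declared category,
# so VP is listed with a count of 0 even though it never occurs.
counts = score.value_counts()
print(counts)

# Same shape as the dataframe in the question: one Score column, one Frequency column.
quality_scores = counts.rename_axis('Score').reset_index(name='Frequency')
quality_scores = quality_scores[quality_scores.Frequency != 0]

# The VP row is gone, but VP is still declared as a category on the column.
print(quality_scores)
print(quality_scores['Score'].cat.categories)
```

Filtering rows only removes the values; the list of declared categories travels with the column's dtype, which would explain why VP still shows up in .cat.categories.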
I didn't get you.
– Marcelo BD
Jun 29 at 15:38
In your question, you say you want to filter out frequency == 0. Isn't that what you're doing already?
– jpp
Jun 29 at 15:38
Yes, but it's not working. When I plot the graph, it still shows VP with count 0.
– Marcelo BD
Jun 29 at 15:55
Could you add to your question 1) sample code, and 2) the result you're getting that you're not expecting? What is the value of quality_scores after you run the second-to-last line in your code, "quality_scores = quality_scores[quality_scores.Frequency != 0]"?
– David Gaertner
Jun 29 at 15:55
1 Answer
Remember that case matters: "SCORE" and "Score" are not the same. You created two columns, one called "SCORE" and the other called "Score".
I ran the following code, and it worked as expected.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
grades = ['VG','G','A','P','VP']
counts = [21,15,63,27,0]
d = { 'Score' : grades, 'Frequency': counts }
quality_scores = pd.DataFrame(data = d)
quality_scores=quality_scores.reset_index(drop=True)
quality_scores = quality_scores[quality_scores.Frequency != 0]
plt.figure(figsize=(15,7))
quality_graph=sns.barplot(y=quality_scores['Frequency'], x=quality_scores['Score'])
quality_graph.set_xlabel('Frequency')
quality_graph.set_title('Score Distribution of Quality Measure:',fontsize=25)
plt.savefig('Quality_Measure.png')
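The code above builds Score from plain strings, so no unused category can linger. If the column is categorical, as the quality_scores['Score'].cat.categories check in the question suggests, dropping the zero-count rows is probably not enough on its own. A minimal sketch, assuming a categorical Score column (the dataframe below is a hypothetical reconstruction of the question's data), using pandas' cat.remove_unused_categories():

```python
import pandas as pd

# Hypothetical reconstruction of the question's data with a categorical Score column.
levels = ['VG', 'G', 'A', 'P', 'VP']
quality_scores = pd.DataFrame({
    'Score': pd.Categorical(['VG', 'G', 'A', 'P', 'VP'], categories=levels),
    'Frequency': [21, 15, 63, 27, 0],
})

# Drop the zero-count rows; .copy() avoids assigning into a view of the original frame.
quality_scores = quality_scores[quality_scores.Frequency != 0].copy()

# Drop categories that no longer occur in the column, so VP is no longer declared.
quality_scores['Score'] = quality_scores['Score'].cat.remove_unused_categories()

print(list(quality_scores['Score'].cat.categories))
```

Passing the cleaned columns to sns.barplot as above should then leave no empty VP slot on the axis, since seaborn appears to take the axis order for a categorical column from its declared categories.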
What's wrong with what you're doing now, i.e. quality_scores = quality_scores[quality_scores.Frequency != 0]?
– jpp
Jun 29 at 15:29