In an SVM model, will results be viable when I decrease the test size to 0.06?


I used a support vector machine model for classification on the iris data set. I used the train_test_split function to split the data set into training and testing subsets.



When test_size was 0.3 the accuracy was low, so I decreased the size of the testing subset to 0.06, and now the accuracy is 1, i.e. 100%. The reason seems clear: with less testing data, the amount of noise and fluctuation decreases.



My question is: we want our model to be efficient, but what value of test_size is acceptable for that? At what value of test_size will the result be viable?



Here are some lines of code from my program:


from sklearn import datasets
from sklearn import svm
import numpy as np
from sklearn import metrics

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
C=1.0

from sklearn.model_selection import train_test_split  # sklearn.cross_validation is deprecated/removed
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.06, random_state=4)

svc = svm.SVC(kernel='linear', C=C).fit(x_train,y_train)
y_pred = svc.predict(x_test)
print(metrics.accuracy_score(y_test,y_pred))

lin_svc = svm.LinearSVC(C=C).fit(x_train,y_train)
y_pred = lin_svc.predict(x_test)
print(metrics.accuracy_score(y_test,y_pred))

rbf_svc = svm.SVC(kernel='rbf', gamma=0.7, C=C).fit(x_train, y_train)
y_pred = rbf_svc.predict(x_test)
print(metrics.accuracy_score(y_test, y_pred))

poly_svc = svm.SVC(kernel='poly',degree=3, C=C).fit(x_train,y_train)
y_pred = poly_svc.predict(x_test)
print(metrics.accuracy_score(y_test,y_pred))



The result is 100% accuracy in all four cases.





There are literally hundreds of tutorials on the iris dataset that can give you useful ideas. The general rule of thumb for the size of the test set is 20-30%; the results will certainly not be viable for a size of 6%, but you obviously know this already. Also, the question is off-topic here; it is better suited for the Cross Validated sister site.
– desertnaut
Jun 29 at 8:56






In general, values of 0.2 or 0.3 are used for the test_size parameter. If you are getting low accuracy with these, try to tune the parameters or do some feature engineering. If you use a test_size that low (0.06), your model is obviously overfitting.
– Akshay Nevrekar
Jun 29 at 8:58







You can do cross-validation so that different data is chosen as test data in each fold, and then average the scores.
– Vivek Kumar
3 hours ago
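The cross-validation approach described in this comment can be sketched as follows, assuming the same iris setup as in the question:

```python
# Minimal sketch: 5-fold cross-validation so every sample is used as
# test data exactly once, then average the per-fold accuracies.
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

svc = svm.SVC(kernel='linear', C=1.0)
scores = cross_val_score(svc, X, y, cv=5)  # one accuracy per fold
print(scores.mean(), scores.std())
```

The mean score gives a more stable estimate than a single train/test split, and the standard deviation across folds indicates how much the accuracy fluctuates with the choice of test data.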









