Read CSV file with features and labels in the same row in Tensorflow
I have a .csv file with around 5000 rows and 3757 columns. The first 3751 columns of each row are the features and the last 6 columns are the labels. Each row is one features-labels pair.
I'd like to know if there are built-in functions or any fast ways that I can:
Basically, I want to train a DNN model with 3751 features and 1 label, and I'd like the output of the parsing function to be fed into the following function for training:
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(training_set.data)},
    y=np.array(training_set.target),
    num_epochs=None,
    shuffle=True)
I know some functions like "tf.contrib.learn.datasets.base.load_csv_without_header" can do similar things, but that one is already deprecated.
2 Answers
You could look into tf.data.Dataset's input pipelines (LINK). Basically, you read the CSV file, optionally batch/shuffle/map it, and create an iterator over the dataset. Each time you evaluate iterator.get_next(), you get a number of lines from your CSV equal to the batch size. To separate features and labels, you can slice with standard Python syntax: for a single parsed row, features = row[:-6] and label = row[-1]; for a 2-D batch with one row per example, slice along the column axis instead, i.e. features = batch[:, :-6] and label = batch[:, -1]. Then feed them to whatever function you like.
On the TensorFlow site, there's an in-depth tutorial about how to use these input pipelines (LINK).
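As a rough illustration of the pipeline described above, here is a minimal sketch. It assumes a TensorFlow version that provides tf.io.decode_csv and eager iteration over datasets (TensorFlow 2.x), a headerless CSV of floats, and that the last column is the label you want. The names parse_line and make_dataset and their parameters are hypothetical, not part of TensorFlow.

```python
import tensorflow as tf


def parse_line(line, num_features, num_labels, label_index=-1):
    """Split one CSV line into a feature dict and a single scalar label."""
    num_columns = num_features + num_labels
    # decode_csv returns one scalar tensor per column; every column is read as float32.
    columns = tf.io.decode_csv(line, record_defaults=[[0.0]] * num_columns)
    row = tf.stack(columns)
    features = row[:num_features]
    label = row[label_index]  # assumption: -1 selects the last of the label columns
    return {"x": features}, label


def make_dataset(csv_path, num_features=3751, num_labels=6,
                 batch_size=32, shuffle=True):
    """Build a repeating, batched dataset of (features, label) pairs."""
    dataset = tf.data.TextLineDataset(csv_path)
    dataset = dataset.map(
        lambda line: parse_line(line, num_features, num_labels))
    if shuffle:
        # A buffer at least as large as the file gives a full shuffle.
        dataset = dataset.shuffle(buffer_size=5000)
    return dataset.repeat().batch(batch_size)
```

An input function for an estimator could then simply return make_dataset(train_file). Note that decoding 3757 separate columns per line is not especially fast, so for a file of about 5000 rows, loading everything into memory once may be just as practical.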
import csv

import numpy as np
import pandas as pd

# train_file is the path to the .csv file.
train_features_interim = []
with open(train_file) as f:
    csvreader = csv.reader(f)
    for row in csvreader:
        train_features_interim.append(row)
train_features_interim = pd.DataFrame(train_features_interim)
# Index of the first label column (the last 6 columns are labels).
a = len(train_features_interim.columns) - 6
train_labels_interim = train_features_interim.iloc[:, a:a+1]  # train one label first
train_features_interim = train_features_interim.iloc[:, :a]
train_features_numpy = np.asarray(train_features_interim, dtype=np.float32)
train_labels_numpy = np.asarray(train_labels_interim, dtype=np.float32)
I have this working now. It is not very clean, but it works. I can tune the "a:a+1" part to decide how many (or which) columns I'd like to put into train_labels_interim.
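For comparison, the same split can probably be written more compactly with NumPy alone. This is only a sketch, assuming the file has no header row and contains nothing but numeric values; the function name load_features_and_labels and its parameters are made up for illustration.

```python
import numpy as np


def load_features_and_labels(csv_path, num_labels=6, label_columns=slice(0, 1)):
    """Load a headerless numeric CSV and split it into features and labels.

    label_columns selects which of the trailing label columns to keep;
    the default slice(0, 1) mirrors the "a:a+1" choice above.
    """
    # ndmin=2 keeps the result 2-D even if the file has a single row.
    data = np.loadtxt(csv_path, delimiter=",", dtype=np.float32, ndmin=2)
    split = data.shape[1] - num_labels  # index of the first label column
    features = data[:, :split]
    labels = data[:, split:][:, label_columns]
    return features, labels
```

Calling load_features_and_labels(train_file) returns two float32 arrays that can be passed as x={"x": features} and y=labels to numpy_input_fn. Passing label_columns=slice(0, 6) would keep all six label columns.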