Converting a .csv.gz to .csv in Python 2.7





I have read the documentation and a few additional posts on SO and other various places, but I can't quite figure out this concept:



When you call csvFilename = gzip.open(filename, 'rb') then reader = csv.reader(open(csvFilename)), is that reader not a valid csv file?





I am trying to solve the problem outlined below, and I am getting the error "coercing to Unicode: need string or buffer, GzipFile found" on line 41 of File 1 and line 7 of File 2 (both marked in the code below), which leads me to believe that gzip.open and csv.reader do not work the way I had thought.





Problem I am trying to solve



I am trying to take a results.csv.gz and convert it to a results.csv so that I can turn the results.csv into a Python dictionary and then combine it with another Python dictionary.





File 1:


alertFile = payload.get('results_file')
alertDataCSV = rh.dataToDict(alertFile) # LINE 41
alertDataTotal = rh.mergeTwoDicts(splunkParams, alertDataCSV)



Calls File 2:


import gzip
import csv

def dataToDict(filename):
    csvFilename = gzip.open(filename, 'rb')
    reader = csv.reader(open(csvFilename)) # LINE 7
    alertData={}
    for row in reader:
        alertData[row[0]]=row[1:]
    return alertData

def mergeTwoDicts(dictA, dictB):
    dictC = dictA.copy()
    dictC.update(dictB)
    return dictC



Edit: also, please forgive my non-PEP 8 naming style.





Btw, if you're not using the whole dict you may use ChainMap and save the copying
– Bharel
Jun 29 at 19:18







Appreciate the help, but I believe ChainMap is a Python 3 feature, which sadly I am unable to use on this particular project.
– deversEatsALot
Jun 29 at 19:29




1 Answer



gzip.open returns a file-like object (just like plain open does), not the name of a decompressed file. The error comes from passing that GzipFile object to open, which expects a filename. Pass the result of gzip.open directly to csv.reader instead and it will work: csv.reader will receive the decompressed lines. The csv module does expect text on Python 3, so there you need to open the file in text mode. On Python 2, 'rb' is fine: gzip doesn't deal with encodings there, and neither does the csv module. Simply change:




csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename))



to:


# Python 2
csvFile = gzip.open(filename, 'rb')
reader = csv.reader(csvFile) # No reopening involved

# Python 3
csvFile = gzip.open(filename, 'rt', newline='') # Open in text mode, not binary, no line ending translation
reader = csv.reader(csvFile) # No reopening involved
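
Putting that together, here is a rough sketch of how dataToDict from the question might look with the fix applied. The snake_case name data_to_dict, the version check, and the blank-row guard are my own additions for illustration, not part of the original code:

```python
import csv
import gzip
import sys


def data_to_dict(filename):
    """Read a gzip-compressed CSV into {first column: remaining columns}."""
    if sys.version_info[0] >= 3:
        # Python 3: csv needs text, so open in text mode with no
        # line ending translation.
        csv_file = gzip.open(filename, 'rt', newline='')
    else:
        # Python 2: csv works on byte strings, so binary mode is fine.
        csv_file = gzip.open(filename, 'rb')
    try:
        alert_data = {}
        # Pass the file-like object straight to csv.reader; no reopening.
        for row in csv.reader(csv_file):
            if row:  # assumption: blank lines should be skipped
                alert_data[row[0]] = row[1:]
        return alert_data
    finally:
        csv_file.close()
```

Note that no results.csv ever has to be written to disk: the rows are decompressed on the fly as csv.reader iterates over the file object.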





Note: On Python 3, if the file's encoding might not match your locale encoding, pass encoding='utf-8' (or 'latin-1', 'utf-16', or whatever the file's encoding really is) to gzip.open so that the text is decoded correctly.
– ShadowRanger
Jun 29 at 19:07
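
For what it's worth, a minimal Python 3 sketch of the encoding suggestion above could look like this. The helper name read_rows and the 'utf-8' default are assumptions made for the example; use whatever encoding the file really has:

```python
import csv
import gzip


def read_rows(filename, encoding='utf-8'):
    """Python 3 only: decode a gzip-compressed CSV with an explicit encoding."""
    # Text mode ('rt') is required for the encoding argument to apply.
    with gzip.open(filename, 'rt', encoding=encoding, newline='') as csv_file:
        return list(csv.reader(csv_file))
```

On Python 2 this does not apply, since gzip.open there has no encoding parameter and csv reads byte strings.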







Thank you so much, I felt stupid asking but I had already wasted so much time trying to figure out what was wrong. (and yes, my naming conventions suck)
– deversEatsALot
Jun 29 at 19:10






Just realized the question is for Python 2, so I've made a few more updates to address both.
– ShadowRanger
Jun 29 at 19:11





