Converting a .csv.gz to .csv in Python 2.7
I have read the documentation and a few other posts on SO and elsewhere, but I can't quite figure out this concept: when you call csvFilename = gzip.open(filename, 'rb') and then reader = csv.reader(open(csvFilename)), is that reader not a valid CSV file?
I am trying to solve the problem outlined below, and I am getting a "coercing to Unicode: need string or buffer, GzipFile found" error on lines 41 and 7 (marked below), leading me to believe that gzip.open and csv.reader do not work as I had previously thought.
Problem I am trying to solve
I am trying to take a results.csv.gz and convert it to a results.csv so that I can turn the results.csv into a Python dictionary and then combine it with another Python dictionary.
File 1:
alertFile = payload.get('results_file')
alertDataCSV = rh.dataToDict(alertFile) # LINE 41
alertDataTotal = rh.mergeTwoDicts(splunkParams, alertDataCSV)
Calls File 2:
import gzip
import csv

def dataToDict(filename):
    csvFilename = gzip.open(filename, 'rb')
    reader = csv.reader(open(csvFilename)) # LINE 7
    alertData={}
    for row in reader:
        alertData[row[0]]=row[1:]
    return alertData

def mergeTwoDicts(dictA, dictB):
    dictC = dictA.copy()
    dictC.update(dictB)
    return dictC
*edit: also forgive my non-PEP 8 style of naming in Python
Appreciate the help, but I believe ChainMap is a Python 3 feature, which sadly I am unable to use on this particular project.
– deversEatsALot
Jun 29 at 19:29
1 Answer
gzip.open returns a file-like object (the same kind of thing plain open returns), not the name of the decompressed file. open expects a path string, which is why passing it the GzipFile object raises the coercing error. Simply pass the result of gzip.open directly to csv.reader and it will work (the csv.reader will receive the decompressed lines). csv does expect text though, so on Python 3 you need to open the file for reading as text (on Python 2, 'rb' is fine: the gzip module doesn't deal with encodings, but then, neither does the csv module). Simply change:
csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename))
to:
# Python 2
csvFile = gzip.open(filename, 'rb')
reader = csv.reader(csvFile) # No reopening involved
# Python 3
csvFile = gzip.open(filename, 'rt', newline='') # Open in text mode, not binary, no line ending translation
reader = csv.reader(csvFile) # No reopening involved
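Putting that change back into the question's function, a minimal sketch might look like the following. The name data_to_dict is just a hypothetical PEP 8 rename of dataToDict, and the version check and the blank-row guard are additions for illustration, not part of the original code.

```python
import csv
import gzip
import sys


def data_to_dict(filename):
    """Map the first column of each row in a gzipped CSV to the remaining columns."""
    if sys.version_info[0] >= 3:
        # Python 3: csv needs text, so decompress in text mode.
        csv_file = gzip.open(filename, 'rt', newline='')
    else:
        # Python 2: csv works on byte strings, so binary mode is fine.
        csv_file = gzip.open(filename, 'rb')
    try:
        alert_data = {}
        # Pass the file-like object straight to csv.reader; no second open().
        for row in csv.reader(csv_file):
            if row:  # assumption: blank lines should be skipped
                alert_data[row[0]] = row[1:]
        return alert_data
    finally:
        csv_file.close()
```

With this, data_to_dict('results.csv.gz') returns the dictionary directly and no intermediate results.csv is ever written to disk. If two rows share the same first column, the later row wins.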
Note: If the file encoding might not match your locale encoding, make sure to pass encoding='utf-8' (or 'latin-1', or 'utf-16', or whatever the file's encoding really is) to gzip.open to make sure it is decoding correctly.
– ShadowRanger
Jun 29 at 19:07
Thank you so much, I felt stupid asking but I had already wasted so much time trying to figure out what was wrong. (and yes, my naming conventions suck)
– deversEatsALot
Jun 29 at 19:10
Just realized the question is for Python 2, so I've made a few more updates to address both.
– ShadowRanger
Jun 29 at 19:11
Btw, if you're not using the whole dict you may use ChainMap and save the copying.
– Bharel
Jun 29 at 19:18