Python Japanese code encoding: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 0: invalid start byte
I use the following code (Python 3.6) to insert CSV data into a database:
# -*- coding: UTF-8 -*-
import pandas as pd
import pymssql

def insert_report_pn_dictionary(server, user, password, database):
    dict_list = []
    tw_df = pd.read_csv(r'D:/dict.csv', encoding='utf-8')
    word_list = list(tw_df['WORD'])
    pn_list = list(tw_df['PN'])
    pn_dict = dict(zip(word_list, pn_list))
    for key, value in pn_dict.items():
        dict_list.append((key, value))
    try:
        conn = pymssql.connect(server, user, password, database)
        cur = conn.cursor()
        sql = ' insert into report_pn_dictionary (dict_keyword,dict_pn) ' \
              ' values(%s, %s) '
        cur.executemany(sql, dict_list)
        conn.commit()
    except pymssql.Error as ex:
        raise ex
    except Exception as ex:
        raise ex
    finally:
        conn.close()


if __name__=="__main__":
    server = '10.10.10.10'
    user = 'test'
    password = 'test'
    database = 'DBTest'
    insert_report_pn_dictionary(server, user, password, database)
But I get an error. The error message is:
E:\Anaconda3\python.exe E:/test_opencv/TRkeywordpn/set_dictionary.py
Traceback (most recent call last):
  File "E:/test_opencv/TRkeywordpn/set_dictionary.py", line 33, in <module>
    insert_report_pn_dictionary(server, user, password, database)
  File "E:/test_opencv/TRkeywordpn/set_dictionary.py", line 7, in insert_report_pn_dictionary
    tw_df = pd.read_csv(r'D:/dict.csv', encoding='utf-8')
  File "E:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "E:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
    data = parser.read()
  File "E:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "E:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
  File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
  File "pandas\parser.pyx", line 947, in pandas.parser.TextReader._read_rows (pandas\parser.c:11728)
  File "pandas\parser.pyx", line 1049, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:13162)
  File "pandas\parser.pyx", line 1108, in pandas.parser.TextReader._convert_tokens (pandas\parser.c:14116)
  File "pandas\parser.pyx", line 1206, in pandas.parser.TextReader._convert_with_dtype (pandas\parser.c:16172)
  File "pandas\parser.pyx", line 1222, in pandas.parser.TextReader._string_convert (pandas\parser.c:16400)
  File "pandas\parser.pyx", line 1458, in pandas.parser._string_box_utf8 (pandas\parser.c:22072)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 0: invalid start byte
My CSV content looks like this:
WORD PN
優れる 1
良い 0.999995
喜ぶ 0.999979
褒める 0.999979
めでたい 0.999645
賢い 0.999486
善い 0.999314
I have more than 50 thousand lines of data.
How should I modify my code? I think this is a Japanese encoding problem.
@bobrobbob I modified it as you said and it worked, thanks! – MadFrog 12 hours ago
Your CSV is not in UTF-8. Try to guess which encoding it uses; SHIFT_JIS may be the one. – bobrobbob Jun 29 at 9:03
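
To make the "guess which encoding it uses" suggestion concrete, here is a minimal sketch that tries a few candidate encodings on raw bytes and reports the first one that decodes cleanly. The helper name guess_encoding, the candidate list, and the comma-separated sample are assumptions for illustration, not part of the original post. cp932 is Python's name for the Windows variant of Shift_JIS. The byte 0x97 from the error message is a valid lead byte in Shift_JIS, but in UTF-8 it can only appear as a continuation byte, which is why the utf-8 codec rejects it as an invalid start byte.

```python
from typing import Optional, Sequence

# Assumed candidate list: encodings commonly seen for Japanese text.
# Order matters, because some byte sequences decode under more than one codec.
CANDIDATES = ("utf-8", "cp932", "euc_jp")


def guess_encoding(raw: bytes, candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate that decodes `raw` without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Stand-in for the start of the CSV file (comma-separated layout is assumed),
# encoded the way the real file appears to be.
sample = "WORD,PN\n優れる,1\n良い,0.999995\n".encode("cp932")

print(hex(sample[8]))          # first byte of 優 in this encoding: 0x97
print(guess_encoding(sample))  # cp932
```

With the real file, the bytes could come from open(r'D:/dict.csv', 'rb').read(), and the returned name could then be passed as the encoding argument of pd.read_csv. A clean decode does not prove the guess is right, so it is worth checking a few rows by eye afterwards.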