Convert python nested JSON-like data to dataframe
My records look like this, and I need to write them to a csv file:

my_data = {"data": [{"id": "xyz", "type": "book", "attributes": {"doc_type": "article", "action": "cut"}}]}

It looks like JSON, but the next record also starts with "data" rather than "data1", which forces me to read each record separately. I then convert each record to a dict using eval() so I can iterate through the keys and values along a certain path to get to the values I need. From those I build a list of the keys and values I want, and pd.DataFrame() converts that list into a dataframe, which I know how to write to csv. My code, which works, is below, but I'm sure there are better ways to do this; mine scales poorly. Thanks.
"data"
"data1"
eval()
pd.dataframe()
counter = 1
k = []
v = []
res = []
m = 0
for line in f2:
    jline = eval(line)
    counter += 1
    for items in jline:
        k.append(jline[u'data'][0].keys())
        v.append(jline[u'data'][0].values())
    print 'keys are:', k
    i = 0
    j = 0
    while i < 3:
        while j < 3:
            if k[i][j] == u'id':
                res.append(v[i][j])
            j += 1
        i += 1
    # res is my result set
    del k[:]
    del v[:]
1 Answer
Changing my_data to be:
my_data = [{"id":"xyz","type":"book","attributes":{"doc_type":"article","action":"cut"}}, # Data One
{"id":"xyz2","type":"book","attributes":{"doc_type":"article","action":"cut"}}, # Data Two
{"id":"xyz3","type":"book","attributes":{"doc_type":"article","action":"cut"}}] # Data Three
You can dump this directly into a dataframe like so:
mydf = pd.DataFrame(my_data)
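Note that pd.DataFrame(my_data) keeps each nested attributes dict in a single column. If you want the nested keys as their own columns, pandas' json_normalize can flatten them; a minimal sketch, assuming a pandas version recent enough to expose json_normalize at the top level:

import pandas as pd

my_data = [{"id": "xyz", "type": "book",
            "attributes": {"doc_type": "article", "action": "cut"}}]

# Nested dicts become dotted column names:
# id, type, attributes.doc_type, attributes.action
flat = pd.json_normalize(my_data)
print(flat.columns.tolist())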
It's not clear what your data path would be, but if you are looking for specific combinations of id, type, etc., you could search explicitly:
def find_my_way(data, pattern):
    # pattern = {'id': 'someid', 'type': 'sometype', ...}
    res = []
    for row in data:
        if row.get('id') == pattern.get('id'):
            res.append(row)
    return res  # return the list of matches, not the last row

mydf = pd.DataFrame(find_my_way(my_data, pattern))
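For example, with the three-record list above (the pattern value here is just an illustration):

pattern = {'id': 'xyz2'}
matches = find_my_way(my_data, pattern)
mydf = pd.DataFrame(matches)  # one row per matching record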
EDIT:
Without going into how the api works, in pseudo-code, you'll want to do something like the following:
my_objects = []
calls = 0
while calls < maximum:
    my_data = call_the_api(params)
    calls += 1
    data = my_data.get('data')
    if not data:
        continue
    if isinstance(data, list):
        my_objects = [*data, *my_objects]
    elif isinstance(data, dict):
        my_objects = [{**data}, *my_objects]

# This will unpack the data responses into a list that you can then load into
# a DataFrame with the attributes from the api as the columns
df = pd.DataFrame(my_objects)
That should get you all of the data (unfiltered) into a DataFrame.
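From there, writing the csv is a single call; the keys become the column headers (the filename here is just a placeholder):

df.to_csv('records.csv', index=False)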
Two issues: 1. It looks like you modified my data set and removed '{"data": ...}'. How can I do this flattening in code? 2. My objective is not searching. I'd like to extract the data exhaustively and write it to csv. How would you do that? – Y. Eman, 2 days ago
I'm not sure I understand "extract the data exhaustively"; do you just want all of the data in a csv, without filtering? I did remove 'data', since you want that key to point to multiple values, per "but the next record starts with "data" and not "data1"." Unless I misunderstood and you want different keys. – C.Nivs, 2 days ago
Sorry, I wasn't clear. Yes, I want all the data, without filtering, written into a csv. Each record has an id. I would like a csv with one row per record and the columns loaded with the values; the keys will be my column headers. The source of the data is here: github.com/dowjones/dj-dna-streams-python – Y. Eman, 2 days ago
It looks like it may be completely unnecessary for you to use the 'data' key. Because of how dictionaries work, you cannot have data map to more than one value. If you just use a list, you can rely on positional indices and the dataframe can index them for you. – C.Nivs, 2 days ago
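To illustrate that last point: a repeated key in a Python dict silently overwrites the earlier value, so a single dict can never hold more than one record under "data", while a list keeps them all (a small hypothetical example):

# The second "data" entry replaces the first; only one record survives
as_dict = {"data": {"id": "xyz"}, "data": {"id": "xyz2"}}
print(as_dict)  # {'data': {'id': 'xyz2'}}

# A list keeps every record, addressable by position
as_list = [{"id": "xyz"}, {"id": "xyz2"}]
print(as_list[0]["id"], as_list[1]["id"])  # xyz xyz2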