Convert python nested JSON-like data to dataframe
My records look like this, and I need to write them to a csv file:

my_data = {"data": [{"id": "xyz", "type": "book", "attributes": {"doc_type": "article", "action": "cut"}}]}

It looks like JSON, but the next record also starts with "data" rather than "data1", which forces me to read each record separately. I then convert each record to a dict using eval() so I can iterate through the keys and values along a certain path to get to the values I need. From those I build a list of the keys and values I want, and pd.DataFrame() converts that list into a dataframe, which I know how to write to csv. My code, which works, is below, but I'm sure there are better ways to do this; mine scales poorly. Thanks.
"data"
"data1"
eval()
pd.dataframe()
counter = 1
k = []
v = []
res = []
m = 0
for line in f2:
    jline = eval(line)
    counter += 1
    for items in jline:
        k.append(jline[u'data'][0].keys())
        v.append(jline[u'data'][0].values())
    print 'keys are:', k
    i = 0
    j = 0
    while i < 3:
        while j < 3:
            if k[i][j] == u'id':
                res.append(v[i][j])
            j += 1
        i += 1
    # res is my result set
    del k[:]
    del v[:]
1 Answer
Changing my_data to be:
my_data = [{"id":"xyz","type":"book","attributes":{"doc_type":"article","action":"cut"}}, # Data One
{"id":"xyz2","type":"book","attributes":{"doc_type":"article","action":"cut"}}, # Data Two
{"id":"xyz3","type":"book","attributes":{"doc_type":"article","action":"cut"}}] # Data Three
You can dump this directly into a dataframe like so:
mydf = pd.DataFrame(my_data)
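Note that pd.DataFrame(my_data) keeps each nested attributes dict in a single column. If you want the nested keys as their own columns, pandas' json_normalize can flatten them; a minimal sketch, assuming a pandas version recent enough to expose json_normalize at the top level:

import pandas as pd

my_data = [{"id": "xyz", "type": "book",
            "attributes": {"doc_type": "article", "action": "cut"}}]

# Nested dicts become dotted column names:
# id, type, attributes.doc_type, attributes.action
flat = pd.json_normalize(my_data)
print(flat.columns.tolist())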
It's not clear what your data path would be, but if you are looking for specific combinations of id, type, etc., you could search explicitly:
def find_my_way(data, pattern):
    # pattern = {'id': 'someid', 'type': 'sometype', ...}
    res = []
    for row in data:
        if row.get('id') == pattern.get('id'):
            res.append(row)
    return res  # return the list of matches, not the last row

mydf = pd.DataFrame(find_my_way(my_data, pattern))
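For example, with the three-record list above (the pattern value here is just an illustration):

pattern = {'id': 'xyz2'}
matches = find_my_way(my_data, pattern)
mydf = pd.DataFrame(matches)  # one row per matching record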
EDIT:
Without going into how the api works, in pseudo-code, you'll want to do something like the following:
my_objects = []
calls = 0
while calls < maximum:
    my_data = call_the_api(params)
    calls += 1
    data = my_data.get('data')
    if not data:
        continue
    if isinstance(data, list):
        my_objects = [*data, *my_objects]
    elif isinstance(data, dict):
        my_objects = [{**data}, *my_objects]

# This will unpack the data responses into a list that you can then load into
# a DataFrame with the attributes from the api as the columns
df = pd.DataFrame(my_objects)
That should get you all of the data (unfiltered) into a DataFrame.
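From there, writing the csv is a single call; the keys become the column headers (the filename here is just a placeholder):

df.to_csv('records.csv', index=False)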
Two issues: 1. It looks like you modified my data set and removed '{"data": ...}'. How can I do this flattening in code? 2. My objective is not searching. I'd like to extract the data exhaustively and write it to csv. How would you do that? – Y. Eman, 2 days ago
I'm not sure I understand "extract the data exhaustively"; do you just want all of the data in a csv, without filtering? I did remove 'data', since you want that key to point to multiple values, per "but the next record starts with "data" and not "data1"." Unless I misunderstood and you want different keys. – C.Nivs, 2 days ago
Sorry, I wasn't clear. Yes, I want all the data, without filtering, written into a csv. Each record has an id. I would like a csv with one row per record and the columns loaded with the values; the keys will be my column headers. The source of the data is here: github.com/dowjones/dj-dna-streams-python – Y. Eman, 2 days ago
It looks like it may be completely unnecessary for you to use the 'data' key. Because of how dictionaries work, you cannot have data map to more than one value. If you just use a list, you can rely on positional indices and the dataframe can index them for you. – C.Nivs, 2 days ago
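To illustrate that last point: a repeated key in a Python dict silently overwrites the earlier value, so a single dict can never hold more than one record under "data", while a list keeps them all (a small hypothetical example):

# The second "data" entry replaces the first; only one record survives
as_dict = {"data": {"id": "xyz"}, "data": {"id": "xyz2"}}
print(as_dict)  # {'data': {'id': 'xyz2'}}

# A list keeps every record, addressable by position
as_list = [{"id": "xyz"}, {"id": "xyz2"}]
print(as_list[0]["id"], as_list[1]["id"])  # xyz xyz2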