Reformat data using awk


Reformat data using awk



I have a dataset that contains rows of UUIDs followed by locations and transaction IDs. The UUIDs are separated by a semi-colon (';') and the transactions are separated by tabs, like the following:


01234;LOC_1=ABC LOC_1=BCD LOC_2=CDE
56789;LOC_2=DEF LOC_3=EFG



I know all of the location codes in advance. What I want to do is transform this data into a format I can load into SQL/Postgres for analysis, like this:


01234;LOC_1=ABC
01234;LOC_1=BCD
01234;LOC_2=CDE
56789;LOC_2=DEF
56789;LOC_3=EFG



I'm pretty sure I can do this easily using awk (or similar) by looking up location IDs from a file (ex. LOC_1) and matching any instance of the location ID and printing that out next to the UUID. I haven't been able to get it right yet, and any help is much appreciated!



My locations file is named location and my dataset is data. Note that I can edit the original file or write the results to a new file, either is fine.


location


data





Please avoid "Give me the codez" questions that have been asked and answered so many times you have to make an effort to avoid finding an answer. Also see How much research effort is expected of Stack Overflow users?
– jww
Jun 29 at 18:57




5 Answers
5



You comment on knowing the locations and the mapping file makes me suspicious what your example seems to have done isn't exactly what is being asked - but it seems like you're wanting to reformat each set of tab delimited LOC= values into a row with their UUID in front.


LOC=



If so, this will do the trick:


awk ' BEGIN {OFS=FS=";"} {split($2,locs,"t"); for (n in locs) { print $1,locs[n]}}'



Given:


$ cat -A data.txt
01234;LOC_1=ABC^ILOC_1=BCD^ILOC_2=CDE$
56789;LOC_2=DEF^ILOC_3=EFG$



Then:


$ awk ' BEGIN {OFS=FS=";"} {split($2,locs,"t"); for (n in locs) { print $1,locs[n]}}' data.txt
01234;LOC_1=ABC
01234;LOC_1=BCD
01234;LOC_2=CDE
56789;LOC_2=DEF
56789;LOC_3=EFG



The BEGIN {OFS=FS=";"} block sets the input and output delimiter to ;.


BEGIN {OFS=FS=";"}



For each row, we then split the second field into an array named locs, splitting on tab, via - split($2,locs,"t")


split($2,locs,"t")



And then loop through locs printing the UUID and each loc value - for (n in locs) { print $1,locs[n]}


for (n in locs) { print $1,locs[n]}





Worked like a charm, thanks!
– user3456269
Jun 29 at 18:24





Hopefully you dont care about the order of the output since this will scramble it in a random order. You MAY have got lucky and got the order you expected for your posted sample input but that'd just be luck.
– Ed Morton
Jun 29 at 18:37





@EdMorton is correct as usual =). Capturing splits output (e.g. n=split($2,locs,"t")) and modifying the loop to iterate over that - for (i=1;i<=n;i++) { print $1,locs[i]}},would give you ordered results if desired.
– zzevannn
Jun 29 at 18:42


n=split($2,locs,"t")


for (i=1;i<=n;i++) { print $1,locs[i]}}





Not "someone" - @jww. I'm upvoting them all as usual now to compensate. Let me know which other questions he's crapped all over lately and I'll go upvote them too.
– Ed Morton
2 days ago






Done, all referenced Q&A blindly upvoted to compensate for @jww's blind downvoting.
– Ed Morton
yesterday



awk without using split: use semicolon or tab as the field separator


split


awk -F'[;t]' -v OFS=';' '{for (i=2; i<=NF; i++) print $1,$i}' file





Yup, it's that simple to do the job robustly.
– Ed Morton
Jun 29 at 18:38



I don't think you need to match against a known list of locations; you should be able to just print each line as you go:


$ awk '{print $1; split($1,a,";"); for (i=2; i<=NF; ++i) print a[1] ";" $i}' file
01234;LOC_1=ABC
01234;LOC_1=BCD
01234;LOC_2=CDE
56789;LOC_2=DEF
56789;LOC_3=EFG



How about without loop or without split one as follows.(considering that Input_file is same as shown samples only)


awk 'BEGIN{FS=OFS=";"}{gsub(/[[:space:]]+/,"n"$1 OFS)} 1' Input_file



This might work for you (GNU sed):


sed -r 's/((.*;)S+)s+(S+)/1n23/;P;D' file



Repeatedly replace the white space between locations with a newline, followed by the UUID and a ;, printing/deleting each line as it appears.


;






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Comments

Popular posts from this blog

paramiko-expect timeout is happening after executing the command

Opening a url is failing in Swift

Export result set on Dbeaver to CSV