Reformat data using awk

I have a dataset that contains rows of UUIDs followed by locations and transaction IDs. The UUIDs are separated by a semi-colon (';') and the transactions are separated by tabs, like the following:

01234;LOC_1=ABC LOC_1=BCD LOC_2=CDE 56789;LOC_2=DEF LOC_3=EFG

I know all of the location codes in advance. What I want to do is transform this data into a format I can load into SQL/Postgres for analysis, like this:

01234;LOC_1=ABC 01234;LOC_1=BCD 01234;LOC_2=CDE 56789;LOC_2=DEF 56789;LOC_3=EFG

I'm pretty sure I can do this easily using awk (or similar) by looking up location IDs from a file (ex. LOC_1) and matching any instance of the location ID and printing that out next to the UUID. I haven't been able to get it right yet, and any help is much appreciated!

My locations file is named location and my dataset is data. Note that I can edit the original file or write the results to a new file, either is fine.

location

data

Please avoid "Give me the codez" questions that have been asked and answered so many times you have to make an effort to avoid finding an answer. Also see How much research effort is expected of Stack Overflow users?
– jww
Jun 29 at 18:57

5 Answers
5

You comment on knowing the locations and the mapping file makes me suspicious what your example seems to have done isn't exactly what is being asked - but it seems like you're wanting to reformat each set of tab delimited LOC= values into a row with their UUID in front.

LOC=

If so, this will do the trick:

awk ' BEGIN {OFS=FS=";"} {split($2,locs,"t"); for (n in locs) { print $1,locs[n]}}'

Given:

$ cat -A data.txt 01234;LOC_1=ABC^ILOC_1=BCD^ILOC_2=CDE$ 56789;LOC_2=DEF^ILOC_3=EFG$

Then:

$ awk ' BEGIN {OFS=FS=";"} {split($2,locs,"t"); for (n in locs) { print $1,locs[n]}}' data.txt 01234;LOC_1=ABC 01234;LOC_1=BCD 01234;LOC_2=CDE 56789;LOC_2=DEF 56789;LOC_3=EFG

The BEGIN {OFS=FS=";"} block sets the input and output delimiter to ;.

BEGIN {OFS=FS=";"}

For each row, we then split the second field into an array named locs, splitting on tab, via - split($2,locs,"t")

split($2,locs,"t")

And then loop through locs printing the UUID and each loc value - for (n in locs) { print $1,locs[n]}

for (n in locs) { print $1,locs[n]}

Worked like a charm, thanks!
– user3456269
Jun 29 at 18:24

Hopefully you dont care about the order of the output since this will scramble it in a random order. You MAY have got lucky and got the order you expected for your posted sample input but that'd just be luck.
– Ed Morton
Jun 29 at 18:37

@EdMorton is correct as usual =). Capturing splits output (e.g. n=split($2,locs,"t")) and modifying the loop to iterate over that - for (i=1;i<=n;i++) { print $1,locs[i]}},would give you ordered results if desired.
– zzevannn
Jun 29 at 18:42

n=split($2,locs,"t")

for (i=1;i<=n;i++) { print $1,locs[i]}}

Not "someone" - @jww. I'm upvoting them all as usual now to compensate. Let me know which other questions he's crapped all over lately and I'll go upvote them too.
– Ed Morton
2 days ago

Done, all referenced Q&A blindly upvoted to compensate for @jww's blind downvoting.
– Ed Morton
yesterday

awk without using split: use semicolon or tab as the field separator

split

awk -F'[;t]' -v OFS=';' '{for (i=2; i<=NF; i++) print $1,$i}' file

Yup, it's that simple to do the job robustly.
– Ed Morton
Jun 29 at 18:38

I don't think you need to match against a known list of locations; you should be able to just print each line as you go:

$ awk '{print $1; split($1,a,";"); for (i=2; i<=NF; ++i) print a[1] ";" $i}' file 01234;LOC_1=ABC 01234;LOC_1=BCD 01234;LOC_2=CDE 56789;LOC_2=DEF 56789;LOC_3=EFG

How about without loop or without split one as follows.(considering that Input_file is same as shown samples only)

awk 'BEGIN{FS=OFS=";"}{gsub(/[[:space:]]+/,"n"$1 OFS)} 1' Input_file

This might work for you (GNU sed):

sed -r 's/((.*;)S+)s+(S+)/1n23/;P;D' file

Repeatedly replace the white space between locations with a newline, followed by the UUID and a ;, printing/deleting each line as it appears.

;

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Search This Blog

Mgiyuk