How do I convert key value paired list into table with columns using AWK?

I need to convert a dataset from a key-value paired list (Informix dbaccess output) into a columned CSV. I'm fairly certain this can be done easily with awk or sed.



UPDATE: The solution needs to be a single-line command. I am using NSH (which is based on ZSH), so some of the typical "bashy" commands will not work.



Here is my data sample set:


part_no 100000001
date_part 2010-10-13 12:12:12
history_code ABCD
user_id rsmith
other_information note: Monday, December 10
pool_no 101011777

part_no 100000002
date_part 2010-10-21 12:12:12
history_code GHIJ
user_id jsmith
other_information
pool_no 101011888

part_no 100000002
date_part 2010-10-27 12:12:12
history_code LMNO
user_id fevers
other_information [Mail]
pool_no 101011999

part_no 100000003
date_part 2010-11-13 12:12:12
history_code QXRT
user_id sjohnson
other_information note: Tuesday, August 31
pool_no 101011111



I need it to look like this:


part_no,date_part,history_code,user_id,other_information,pool_no
100000001,10/13/2010 12:12:12,ABCD,rsmith,note: Monday, December 10,101011777
100000002,10/21/2010 12:12:12,GHIJ,jsmith,,101011888
100000002,10/27/2010 12:12:12,LMNO,fevers,[Mail],101011999
100000003,11/13/2010 12:12:12,QXRT,sjohnson,note: Tuesday, August 31,101011111





Welcome to SO. Stack Overflow is a question and answer site for professional and enthusiast programmers. The goal is that you add some code of your own to your question to show at least the research effort you made to solve this yourself.
– Cyrus
Jun 29 at 15:28





Does the date format have to change from 2010-10-13 to 10/13/2010? Also, there are commas within fields, but the suggested output doesn't use the convention to double quote fields and is thus ambiguous.
– Benjamin W.
Jun 29 at 15:34







Data format does not have to change, and the output can use double-quoted fields (commas will be present in the data)
– Matt Ragland
Jun 29 at 15:42
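

For reference, with conventional CSV quoting (fields containing commas wrapped in double quotes, any embedded quotes doubled), the first data row from the sample would come out as:

100000001,2010-10-13 12:12:12,ABCD,rsmith,"note: Monday, December 10",101011777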





Update your question to show the expected output plus what you've tried so far and do it quick before your question gets closed as unclear and/or you get any more crazy answers with 20 sed commands, cats, cuts, pipes and the batman symbol.
– Ed Morton
Jun 29 at 15:54






You create 'single line' solutions by creating a shell script that does the job, and running the shell script from your 'single line' system. Use an absolute path name if necessary. Otherwise, you're fighting with one hand tied behind your back (actually, both hands tied behind your back).
– Jonathan Leffler
Jun 29 at 20:32
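

For what it's worth, a minimal sketch of that wrapper approach (every path and name below is hypothetical, and tst.awk refers to the awk script posted in the answer further down):

#!/bin/sh
# /usr/local/bin/kv2csv (hypothetical path; chmod +x after creating it)
# run the awk program, saved separately as /usr/local/bin/tst.awk, on the named file(s)
exec awk -f /usr/local/bin/tst.awk "$@"

# the single-line call from NSH is then just:
/usr/local/bin/kv2csv file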




5 Answers



Your question isn't clear but this MAY be what you're looking for:


$ cat tst.awk
BEGIN { RS=""; FS="\n"; OFS=","; ofmt="\"%s\"%s" }
{
    for (i=1; i<=NF; i++) {
        tag = val = $i
        sub(/[[:space:]].*/,"",tag)
        sub(/[^[:space:]]+[[:space:]]+/,"",val)
        tags[i] = tag
        vals[i] = val
    }
}
NR==1 {
    for (i=1; i<=NF; i++) {
        printf ofmt, tags[i], (i<NF ? OFS : ORS)
    }
}
{
    for (i=1; i<=NF; i++) {
        printf ofmt, vals[i], (i<NF ? OFS : ORS)
    }
}

$ awk -f tst.awk file
"part_no","date_part","history_code","user_id","other_information","pool_no"
"100000001","2010-10-13 12:12:12","ABCD","rsmith","note: Monday, December 10","101011777"
"100000002","2010-10-21 12:12:12","GHIJ","jsmith","other_information","101011888"
"100000002","2010-10-27 12:12:12","LMNO","fevers","[Mail]","101011999"
"100000003","2010-11-13 12:12:12","QXRT","sjohnson","note: Tuesday, August 31","101011111"





Can your awk statement be run from a pipe (on a single line)? I am using this as part of an "extended object" in BMC Server Automation.
– Matt Ragland
Jun 29 at 16:10





Of course. Just replace every newline with a ; and call it as whatever | awk 'script'. I have no idea what an "extended object" in BMC Server Automation is of course.
– Ed Morton
Jun 29 at 18:21
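

For reference, joining the tst.awk body above with semicolons gives a one-liner along these lines (same logic as the script above, untested in NSH):

whatever | awk 'BEGIN{RS=""; FS="\n"; OFS=","; ofmt="\"%s\"%s"} {for (i=1;i<=NF;i++) {tag=val=$i; sub(/[[:space:]].*/,"",tag); sub(/[^[:space:]]+[[:space:]]+/,"",val); tags[i]=tag; vals[i]=val}} NR==1{for (i=1;i<=NF;i++) printf ofmt, tags[i], (i<NF?OFS:ORS)} {for (i=1;i<=NF;i++) printf ofmt, vals[i], (i<NF?OFS:ORS)}'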






I'm tackling this as an Informix question rather than an Awk question.



Using the standard Informix SQL commands, you could create an external table in CSV format, too — but you have to know that there is an undocumented format "DB2" that you can use:


"DB2"


DROP TABLE IF EXISTS data_table;

CREATE TABLE data_table
(
part_no INTEGER,
date_part DATETIME YEAR TO SECOND,
history_code VARCHAR(4),
user_id VARCHAR(32),
other_information VARCHAR(64),
pool_no INTEGER
);

INSERT INTO data_table VALUES(100000001, "2010-10-13 12:12:12", "ABCD", "rsmith", "note: Monday, December 10", 101011777);
INSERT INTO data_table VALUES(100000002, "2010-10-21 12:12:12", "GHIJ", "jsmith", NULL, 101011888);
INSERT INTO data_table VALUES(100000002, "2010-10-27 12:12:12", "LMNO", "fevers", "[Mail]", 101011999);
INSERT INTO data_table VALUES(100000003, "2010-11-13 12:12:12", "QXRT", "sjohnson", "note: Tuesday, August 31", 101011111);

DROP TABLE IF EXISTS csv_data;
CREATE EXTERNAL TABLE csv_data
(
part_no INTEGER,
date_part DATETIME YEAR TO SECOND,
history_code VARCHAR(4),
user_id VARCHAR(32),
other_information VARCHAR(64),
pool_no INTEGER
)
USING (FORMAT "DB2", DELIMITER ",", DATAFILES("DISK:/tmp/data/csv_data.csv"));

INSERT INTO csv_data
SELECT part_no, date_part, history_code, user_id, other_information, pool_no
FROM data_table;



The content of /tmp/data/csv_data.csv then looks like:




100000001,2010-10-13 12:12:12,"ABCD","rsmith","note: Monday, December 10",101011777
100000002,2010-10-21 12:12:12,"GHIJ","jsmith",,101011888
100000002,2010-10-27 12:12:12,"LMNO","fevers","[Mail]",101011999
100000003,2010-11-13 12:12:12,"QXRT","sjohnson","note: Tuesday, August 31",101011111



The default output from DB-Access is not readily amenable to parsing in practice.
It may be doable in some limited cases, such as the one you show, but you'd do better using the UNLOAD format instead of the command-line output, and then converting the UNLOAD data format to CSV.



I have a Perl script that does that. It uses the Perl Text::CSV module to handle CSV formatting. It doesn't pretend to handle the first line with column names; those aren't present in the UNLOAD format file.


#!/usr/bin/env perl
#
# @(#)$Id: unl2csv.pl,v 1.3 2018/06/29 20:36:58 jleffler Exp $
#
# Convert Informix UNLOAD format to CSV

use strict;
use warnings;
use Text::CSV;
use IO::Wrap;

my $csv = new Text::CSV({ binary => 1 }) or die "Failed to create CSV handle ($!)";
my $dlm = defined $ENV{DBDELIMITER} ? $ENV{DBDELIMITER} : "|";
my $out = wraphandle(\*STDOUT);
my $rgx = qr/((?:[^$dlm\\]|(?:\\.))*)$dlm/sm;

# $csv->eol("\r\n");

while (my $line = <>)
{
    # print "1: $line";    # debug trace
    MultiLine:
    while ($line eq "\\\n" || $line =~ m/[^\\](?:\\\\)*\\$/)
    {
        my $extra = <>;
        last MultiLine unless defined $extra;
        $line .= $extra;
    }
    my @fields = split_unload($line);
    $csv->print($out, \@fields);
}

sub split_unload
{
    my($line) = @_;
    my @fields;
    # print "$line";    # debug trace

    while ($line =~ m/$rgx/g)
    {
        # printf "%d: %s\n", scalar(@fields), $1;    # debug trace
        push @fields, $1;
    }
    return @fields;
}

__END__

=head1 NAME

unl2csv - Convert Informix UNLOAD to CSV format

=head1 SYNOPSIS

unl2csv [file ...]

=head1 DESCRIPTION

The unl2csv program converts a file from Informix UNLOAD file format to
the corresponding CSV (comma separated values) format.

The input delimiter is determined by the environment variable
DBDELIMITER, and defaults to the pipe symbol "|".
It is not assumed that each input line is terminated with a delimiter
(there are two variants of the UNLOAD format, one with and one without
the final delimiter).

=head1 EXAMPLES

Input:

10|12|excessive|cost \|of, living|
20|40|bouncing tigger|grrrrrrrr|

Output:

10,12,"excessive","cost |of, living"
20,40,"bouncing tigger",grrrrrrrr

=head1 PRE-REQUISITES

Text::CSV_XS

=head1 AUTHOR

Jonathan Leffler <jonathan.leffler@hcl.com>

=cut



You would use a command such as this (via DB-Access):


UNLOAD TO "datatable.unl" SELECT * FROM DataTable;



and then run:


perl unl2csv datatable.unl > datatable.csv



If you have my SQLCMD program (available from the IIUG web site in the software repository — and wholly unrelated to Microsoft's johnny-come-lately with the same name), then you can unload direct to CSV format:


sqlcmd -d database -F csv -e 'unload to "data_table.csv" select * from data_table'



Try this:


cat $file | cut -d ' ' -f 2- | sed 's/^[ \t]*//' | sed 's/$/,/' \
| xargs | sed 's/ , /\n/g' | sed 's/.$//' | sed 's/, /,/g' \
| sed '1ipart_no,date_part,history_code,user_id,other_information,pool_no'





I should have mentioned that I am using NSH (network shell) which is based off of ZSH. Here is the output I received: sed: : No such file or directory nsh: command not found: xargs sed: : No such file or directory sed: 1: "1ipart_no,date_part,his ...": command i expects \ followed by text
– Matt Ragland
Jun 29 at 15:53






You should have, yes. I tried it with Ubuntu and it worked. Sorry dude. $file is the name of the file you want to convert.
– Bernat Pedrol Vozmediano
Jun 29 at 15:54








I changed $file to the name of my file, and that was the output I received
– Matt Ragland
Jun 29 at 15:57



I know the OP said awk but bash was just sitting there.


#
# line to be printed
line=""

#
# first value on a line flag
first=""

#
# read the file
while read key val; do
    #
    # if key is empty then the input line is empty.
    if [ "$key" = "" ] ; then
        #
        # skip leading blank lines in the file
        if [ "$line" = "" ] ; then
            continue
        else
            #
            # print and reset the line
            echo "$line"
            line=""
            first=""
        fi
    else
        #
        # quote the value; only add a comma before the second and later values
        if [ "$first" = "" ] ; then
            line="\"$val\""
            first="1"
        else
            line="$line,\"$val\""
        fi
    fi
done < file.txt

#
# print the last line, if there is one
if [ "$line" != "" ] ; then
    echo "$line"
fi





Downvoting this would be cruel; but really, don't do this.
– tripleee
Jun 29 at 19:57





Agreed. See unix.stackexchange.com/q/169716/133219 for just a few of the reasons but there are more too.
– Ed Morton
Jun 30 at 3:48




Could you please try the following and let me know if this helps you.


awk -v s1="," '/part_no/ && value{if(header){print header;flag=1;header=""};print value;value=""} NF{if(!flag){header=(header?header s1 "":"")$1};sub(/^[^[:space:]]+[[:space:]]+/,"");value=value?value s1 $0:$0} END{if(value){print value}}' Input_file



Output will be as follows.


part_no,date_part,history_code,user_id,other_information,pool_no
100000001,2010-10-13 12:12:12,ABCD,rsmith,note: Monday, December 10,101011777
100000002,2010-10-21 12:12:12,GHIJ,jsmith,,101011888
100000002,2010-10-27 12:12:12,LMNO,fevers,[Mail],101011999
100000003,2010-11-13 12:12:12,QXRT,sjohnson,note: Tuesday, August 31,101011111



Adding a non-one-liner form of the solution too now.


awk -v s1="," '
/part_no/ && value{
if(header){
print header;
flag=1;
header=""}
print value;
value=""
}
NF{
if(!flag){
header=(header?header s1 "":"")$1}
sub(/^[^[:space:]]+[[:space:]]+/,"")
value=value?value s1 $0:$0
}
END{
if(value){
print value}
}' Input_file





Can this become a single line statement?
– Matt Ragland
Jun 29 at 16:11





Ahh good catch, yes, mine had carriage returns, I converted it to unix format, and that did the trick, however I am getting extra commas where there shouldn't be any. See the other information column?
– Matt Ragland
Jun 29 at 16:25






Take a sec to think about your ternary expression and you'll understand why it can fail. Also, assigning to $1 will transform all other white space in the input to commas. Finally, it's obviously bad practice to hard-code the header line and thereby tightly couple your script to the specific values and the order they appear in the input data when the values are already present in the input in the order they need to be printed.
– Ed Morton
Jun 29 at 17:08






Your ternaries instead of var = var ? var s1 $X: s1 $X should be var = (var == "" ? "" : var s1) $X to avoid duplicate code, not cause syntax errors in some awks, and so they don't fail when the first value of var evaluates numerically to 0. wrt the $1 assignment - I'm really not sure what you're trying to do but I THINK it might be to just get rid of the header value from each line and that'd be just sub(/^[^[:space:]]+[[:space:]]+/,"") instead of $1=""; gsub(/^ +|^,/,"");
– Ed Morton
Jun 29 at 17:44
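

A tiny standalone demo of that ternary point (not part of either script above): a field whose value is 0 tests false in awk, so the var ? ... : ... form silently drops it, while the var == "" test does not.

echo '0 next' | awk '{
    var = $1                                  # a field holding the numeric string "0"
    bad  = (var       ? var "," : "") $2      # "0" tests false here, so the 0 is lost
    good = (var == "" ? ""      : var ",") $2
    print "bad:  " bad                        # prints: bad:  next
    print "good: " good                       # prints: good: 0,next
}'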








@EdMorton, sure Ed have done that and as usual THANKS A TON for guiding here, really grateful to you, happy weekend.
– RavinderSingh13
Jun 29 at 22:53





