Split File based on multiple column values

classic Classic list List threaded Threaded
2 messages Options
Tom
Reply | Threaded
Open this post in threaded view
|

Split File based on multiple column values

Tom
I need to take this file & split it to have a separate file for each unique combination of col5 and col6. The resulting files will have all 32 columns.
Also the naming convention needs to be pulled from the file as well:
"$5"_"$6"_P"sysdate"_IU"$4"_60".zip

col1, col2, col3, col4, col5, col6 ..... col32
1234, 6789, 1, 01/31/2017 00:00:00, 1000, 1234 ...col32
1235, 1233, 1, 01/31/2017 00:00:00, 1000, 1234 ...col32
1236, 4423, 1, 01/31/2017 00:00:00, 1000, 5678 ...col32
1237, 3323, 1, 01/31/2017 00:00:00, 1000, 1234 ...col32
1238, 0808, 1, 01/31/2017 00:00:00, 1000, 1234 ...col32
1239, 2222, 1, 01/31/2017 00:00:00, 2000, 1234 ...col32
1231, 4535, 1, 01/31/2017 00:00:00, 2000, 1234 ...col32
1232, 8080, 1, 01/31/2017 00:00:00, 2000, 5678 ...col32
1233, 7878, 1, 01/31/2017 00:00:00, 2000, 5678 ...col32


Results should look like:

1000_1234_P20170203_IU20170131_60.ZIP
1234, 6789, 1, 01/31/2017 00:00:00, 1000, 1234 ...col32
1235, 1233, 1, 01/31/2017 00:00:00, 1000, 1234 ...col32
1237, 3323, 1, 01/31/2017 00:00:00, 1000, 1234 ...col32
1238, 0808, 1, 01/31/2017 00:00:00, 1000, 1234 ...col32

1000_5678_P20170203_IU20170131_60.ZIP
1236, 4423, 1, 01/31/2017 00:00:00, 1000, 5678 ...col32

2000_1234_P20170203_IU20170131_60.ZIP
1239, 2222, 1, 01/31/2017 00:00:00, 2000, 1234 ...col32
1231, 4535, 1, 01/31/2017 00:00:00, 2000, 1234 ...col32

2000_5678_P20170203_IU20170131_60.ZIP
1232, 8080, 1, 01/31/2017 00:00:00, 2000, 5678 ...col32
1233, 7878, 1, 01/31/2017 00:00:00, 2000, 5678 ...col32
Tom
Reply | Threaded
Open this post in threaded view
|

Re: Split File based on multiple column values

Tom
Resolved using:

awk -F', *' -v sysdate="$(date +'%Y%m%d')" '
NR==1 { hdr = $0; next }
(cnt[$5,$6]++ % 150000) == 0 { sfx[$5,$6]++ }
{
    split($4,d,/[\/ ]/)
    out = $5 "_" $6 "_P" sysdate "_IU" d[3] d[1] d[2] "_60_" sfx[$5,$6] ".zip"
    if (!seen[out]++) {
        print hdr > out
    }
    print > out
}
' file