Splitting XML file on basis of contents

Ajju
I have one file of around 20 MB and want to split it by content using awk or the split utility.

I have already split it by size, but the resulting files are of no use, so I want to split it by content instead.

So I need to split this file by content, adding the opening and closing tags to each split file.
For example:
The original file has these opening tags:
<?xml version="1.0" encoding="UTF-8"?>
<ns0:ABCFile xmlns:ns0="urn:PQR:OTHERS:WXYZ:HELLOTEST">
<ABCFileHeader>
<RecordType>01</RecordType>
<Date>20140405</Date>
<TotalRecord>46048</TotalRecord> // 46048/4 = 11512 records in each file
</ABCFileHeader>
.
.
The actual records start like this:
<ABRecordDetail>
<RecordType>02</RecordType>
<LineItem>0000000002</LineItem>
<CompanyCode>PQR</CompanyCode>
<ABDate>20130901</ABDate>
<CurrencyKey>PVR</CurrencyKey>
<AmountInDC>0</AmountInDC>
<AmountInLC>0</AmountInLC>
<CostCenter>BBN</CostCenter>
<FType>DTH</FType>
<QNumber>VBR3581 </QNumber>
<SNumber>9kBQ</SNumber>
<VNumber>BBGRB</VNumber>
<Assignment>0945</Assignment>
</ABRecordDetail>

So the 15 lines above make up one record, and the original file has 46048 such records. I want to split it so that each file gets 46048/4 = 11512 records, plus the opening and closing tags.

Opening tags:

<?xml version="1.0" encoding="UTF-8"?>
<ns0:ABCFile xmlns:ns0="urn:PQR:OTHERS:WXYZ:HELLOTEST">
<ABCFileHeader>
<RecordType>01</RecordType>
<Date>20140405</Date>
<TotalRecord>46048</TotalRecord> // 46048/4 = 11512 records in each file, so in each split file the tag would be <TotalRecord>11512</TotalRecord>
</ABCFileHeader>

Closing tag:
</ns0:ABCFile>

I hope that is clear. Put simply, the file needs to be split by content [by record], i.e. in units of 15 lines, with the fixed tags added at the top and bottom of each file.
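
The content-based split described above could be sketched roughly as follows. This is written in Python rather than awk, and it is only a sketch under assumptions: the function name `split_by_records` is made up for illustration, the whole file is read into memory (reasonable for about 20 MB), everything before the first <ABRecordDetail> is treated as the opening tags, and everything after the last </ABRecordDetail> is treated as the closing tag.

```python
import re

def split_by_records(text, parts, open_tag="<ABRecordDetail>",
                     close_tag="</ABRecordDetail>"):
    """Split one XML document into `parts` documents, keeping records whole."""
    first = text.index(open_tag)                      # start of the first record
    last = text.rindex(close_tag) + len(close_tag)    # end of the last record
    header, footer = text[:first], text[last:]
    # Each record runs from its opening tag to the next closing tag.
    pattern = re.escape(open_tag) + r".*?" + re.escape(close_tag)
    records = re.findall(pattern, text[first:last], flags=re.DOTALL)
    size = len(records) // parts
    outputs = []
    for i in range(parts):
        start = i * size
        # The last part also takes any remainder records.
        end = len(records) if i == parts - 1 else start + size
        chunk = records[start:end]
        # Rewrite the header count so it matches this part.
        part_header = re.sub(r"<TotalRecord>\d+</TotalRecord>",
                             f"<TotalRecord>{len(chunk)}</TotalRecord>",
                             header)
        outputs.append(part_header + "\n".join(chunk) + footer)
    return outputs
```

Each returned string can then be written to its own file. Matching on the record tags instead of counting lines means the split still works if a record ever has more or fewer than 15 lines.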

Re: Splitting XML file on basis of contents

Ajju
Put another way, the same question can be stated in terms of lines:

I have an XML file of more than half a million lines and want to split it into four files such that the top 7 lines appear at the top of each file and the bottom line appears at the bottom of each file.

The actual records start at the 8th line, and each record is 15 lines long, so lines 8 to 22 are the first record of the file.
The total number of records varies each time.

I want to divide the actual records into four chunks and write each chunk to its own new file, below the top 7 lines and above the bottom line.

For example:
cat ABCD.xml | wc -l
690728
Actual record lines = 690728 - [top 7] - [bottom 1] = 690720
Actual records = 690720/15 = 46048

Then
46048/4 = 11512 per file [if it is not exactly divisible, the remainder records should go to the last file]
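
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `chunk_sizes` and its parameter names are made up for illustration, and the 7/1/15 line counts are the ones stated in this post.

```python
def chunk_sizes(total_lines, parts=4, header_lines=7, footer_lines=1,
                record_lines=15):
    """Return (record count, list of records per part) for a file of total_lines."""
    record_area = total_lines - header_lines - footer_lines
    records, leftover = divmod(record_area, record_lines)
    if leftover:
        raise ValueError("record area is not a whole number of records")
    base, remainder = divmod(records, parts)
    sizes = [base] * parts
    sizes[-1] += remainder        # remainder records go to the last part
    return records, sizes

print(chunk_sizes(690728))  # (46048, [11512, 11512, 11512, 11512])
```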


So the first 11512 records go to ABCD_part1.xml,
the second 11512 go to ABCD_part2.xml,
the third 11512 go to ABCD_part3.xml,
and the fourth/remaining 11512 records go to ABCD_part4.xml.
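
A line-based version of this split might look like the sketch below, again in Python rather than awk. The function name `split_file` is hypothetical, and the sketch assumes the layout described here: 7 header lines, 1 footer line, and 15 lines per record. The header lines are copied unchanged, so the <TotalRecord> value is not adjusted in this variant.

```python
from pathlib import Path

def split_file(path, parts=4, header_lines=7, footer_lines=1, record_lines=15):
    """Write <name>_part1.xml ... <name>_partN.xml next to the source file."""
    src = Path(path)
    with src.open(encoding="utf-8") as f:
        lines = f.readlines()
    header = lines[:header_lines]
    footer = lines[len(lines) - footer_lines:]
    body = lines[header_lines:len(lines) - footer_lines]
    base = (len(body) // record_lines) // parts   # whole records per part
    written = []
    for i in range(parts):
        start = i * base * record_lines
        # The last part takes everything left, including remainder records.
        end = len(body) if i == parts - 1 else start + base * record_lines
        out = src.with_name(f"{src.stem}_part{i + 1}{src.suffix}")
        with out.open("w", encoding="utf-8") as f:
            f.writelines(header + body[start:end] + footer)
        written.append(out)
    return written
```

Calling `split_file("ABCD.xml")` would create ABCD_part1.xml through ABCD_part4.xml in the same directory as the source file.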

Please help with this.

Re: Splitting XML file on basis of contents

Ajju
Is this forum active?