2014年7月22日星期二

Parallelism of esProc enhances Oracle Data Import Speed

Recently, we skillfully handled an industry project to import a great amount of data from file to Oracle in comparatively short time.

At the beginning, we tried to import the data with the sqlldr of Oracle, only to find it is surprisingly time-consuming to load a great amount of data: Too slow if 2.5 hours have to be spent to import a table containing 80 million records.

Later, by adopting the parallel import of sqlldr, the time is shortened to 0.8 hour. Here are the full details:

Train of thoughts 
Split the data file to import into 10 shares. Then, with the multi-task parallelism, execute the sqlldr command for the corresponding shares. Needless to say, prepare the control files of the same amount. Then, multiple clients will start to import data to database all at the same time. Please note these two things: 1. The way to generate multiple sqlldr commands and corresponding number of control files - a bit tried if writing them one by one; 2, The way to perform parallelly - ever more tired if performing one by one.

In this case, we use the tool, esProc by name, to generate the commands and control files automatically and then run parallelly.

Implementation steps 
Main program
Responsible for task control, task distribution, and calling sub-program.



Subprogram
Generate the specific control file and sqlldr command, and execute the import command to complete the data loading .


Note: In this case, the parallelism feature of esProc is used to execute multiple sqlldr commands; The function system is used to call the system commands.

Actual result 
Because of the programmable-controlled parallel tasks, the number of parallel tasks can be set as necessary to tap into the machine’s full performance potential.

The below figure illustrates the sqlldr import speed for different degree of parallelism - linear increasing on the whole - the more parallel tasks, the faster the import would be.

没有评论:

发表评论