
Code Generator for data transformation


BI tools are essentially ETL tools, plus pre-built templates and applications where available. Within ETL, the data transformation stage is crucial. Baan reports use disk-based data transformation, which is inherently slow and disk/CPU intensive. In-memory data transformation is a major criterion when evaluating BI vendors. Here are the details of the first-ever in-memory data transformation DLL on Baan.

Assumptions

1. It is presumed that the reader of this document has a good knowledge of Baan reporting. To learn more about Baan reporting, see the BaanreportFlow link.

2. The reader has a good knowledge of Baan 3GL array handling functions, including qss.sort and qss.search.

3. Finally, the reader should be aware of the performance improvement DLL created by Jewelex India Pvt Ltd. Details of this DLL, the capabilities of the functions it contains, and its memory limitations can be found in the performance DLL for data transformation published on Baanboard.

Background

Slow disk-based data transformation

Natively, without the use of third-party tools, data transformation in Baan happens mainly in two ways:

1. Through a Baan report using an after.field section without a detail layout.

2. Through a program script that updates history / balances / cumulative data in a Baan table.

Both methods use disk-based data transformation, meaning that each detail line or transaction is written to disk (either through rprt_send() or through the database driver using db commands). This is inherently very slow. It does not exploit the bshell's ability to manipulate data in memory and then write only the summary results, which are much smaller in volume, to disk at the end.
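The difference can be illustrated with a minimal sketch. It is written in Python rather than Baan 3GL, and the function name, keys and amounts are hypothetical; it only shows the idea of accumulating detail lines in memory so that just the summary rows remain to be written.

```python
from collections import defaultdict

def summarize_in_memory(detail_lines):
    """Accumulate the amount of each detail line per key, entirely in memory."""
    totals = defaultdict(float)
    for key, amount in detail_lines:
        totals[key] += amount          # no disk write per detail line
    return dict(totals)

# Hypothetical detail lines: (customer code, amount)
detail_lines = [("C001", 100.0), ("C002", 50.0), ("C001", 25.0), ("C002", 10.0)]
summary = summarize_in_memory(detail_lines)

# Only len(summary) rows would be written out, instead of len(detail_lines).
rows_to_write = len(summary)
```

With four detail lines and two distinct keys, only two rows are left to write. The saving grows with the number of detail lines per key.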

Solution to Slow disk based transformation

Use Baan’s agile memory operations

Baan has powerful functions such as qss.sort and qss.search, which manipulate array variables in memory using fast binary search and sort algorithms. Combined with Baan’s agile 4GL data manipulation capability, these QSS functions offer very good memory-based data transformation. Baan can easily handle a single array variable of up to 5 MB, which is sufficient in most situations. You can check whether your situation hits the array limit with the help of the memory calculator sheet.
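The following sketch shows the general technique of keeping a sorted key array with a parallel accumulator array and locating keys by binary search. It does not use or reproduce the QSS API: Python's standard bisect module stands in for the search step, and the class and method names are hypothetical.

```python
import bisect

class SortedAccumulator:
    """Sorted key array plus a parallel array of totals (illustrative only)."""

    def __init__(self):
        self.keys = []     # always kept in sorted order
        self.totals = []   # totals[i] belongs to keys[i]

    def add(self, key, amount):
        # Binary search for the position of the key in the sorted array.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.totals[i] += amount        # key found: accumulate
        else:
            self.keys.insert(i, key)        # key not found: insert in order
            self.totals.insert(i, amount)

    def rows(self):
        return list(zip(self.keys, self.totals))

acc = SortedAccumulator()
for key, amount in [("B", 1), ("A", 2), ("B", 3)]:
    acc.add(key, amount)
result = acc.rows()
```

Because the keys stay sorted, the summary rows come out in key order without a separate sort pass.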

Hide QSS complexities in a DLL

QSS functions and the coding around them are quite complex and at times incomprehensible. There is a strong need for simplified common functions that hide the QSS complexities and are widely usable.

To this end, we have already created a DLL, which is published on www.baanboard.com and is freely available. The DLL code can be obtained from the following link.

http://www.baanboard.com/baanboard/showthread.php?t=28569

Now, what is the problem?

With the code contained in the aforesaid DLL, data transformation in a program definitely becomes faster. However, there are still some areas for improvement.

  1. Although the aforesaid DLL has drastically reduced the complexity surrounding the qss functions, some complexity remains that can be simplified further.
  2. Over time, the user/customer may have developed many summary reports and transformation logics, which may take a huge effort to migrate to memory-based transformation.

Proposed solution

In this scenario, an intelligent solution (one that understands the Baan data dictionary, tables and reports) that generates the relevant Baan 3GL code, ready to be integrated into the existing program structure with a mere copy/paste and recompile, is a good help to the developer and the user. Such a solution should also allow enough flexibility for real-life scenarios.

Situations where the help of the code generator may be required

1. Many summary reports linked to a single session

When there are many summary reports (without detail lines) attached to a session, there is a need to generate common code for all reports (rather than writing for each report separately).

2. Complex Summary (with / without detail lines).

In certain situations, multiple summaries are embedded in a single report. Such parallel summaries can be produced by increasing the number of rprt_send calls for each detail line with some of the sort fields set to specific values only, copying the same after.field layout with a new report layout sequence, and assigning print conditions with respect to its highest sort fields.

Such reports are special, and the code generator must take care of the minute nuances they contain. Such a report can even occur with detail lines. If an after.field layout with many aggregation values has a print condition related to a higher sort field (even though the report contains a detail layout), it is a fit case for performance improvement. These summaries in fact double, triple or quadruple the number of rprt_send() calls, depending on the number of parallel summaries required. This multiplies the disk I/O (writing, sorting and reading), which can easily be curtailed with the code generator.

3. Accumulated Values in a Table

Programs are often written to update summary values in a table for later reference. Classic examples in standard Baan IV ERP are the tfgld20[1-6], tdilc101, tdinv001 and tdinv7[56]0 tables. There can be many such tables in customizations as well. As writing or updating each detail line to a table is also disk intensive, performance can be improved there too. Disk I/O on a table may be more costly than the flat file I/O involved in rprt_send calls because of database driver overheads.

4. Use of QSS functions

Whenever the user wants to use the qss functions merely for sort and search operations, the code generator can be used as well.

Kinds of Baan Report Summaries

1. Simple summaries (only one rprt_send per detail line, and no detail layout in the report). These are denoted by ~~ in the code generator.

2. Parallel summaries (more than one rprt_send per detail line, print conditions used to print multiple parallel summaries, with or without a detail layout in the report). These are denoted by || in the code generator.
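As a rough illustration of parallel summaries computed in memory, the sketch below builds several groupings in a single pass over the detail lines, instead of sending each line once per summary. It is written in Python, not generated Baan code, and the field names and grouping names are hypothetical.

```python
from collections import defaultdict

def parallel_summaries(detail_lines, groupings):
    """Build one summary per grouping in a single pass over the detail lines."""
    results = {name: defaultdict(float) for name in groupings}
    for line in detail_lines:
        for name, key_fields in groupings.items():
            key = tuple(line[field] for field in key_fields)
            results[name][key] += line["amount"]
    return {name: dict(totals) for name, totals in results.items()}

# Hypothetical detail lines and two parallel summaries over them.
lines = [
    {"country": "IN", "customer": "C001", "amount": 100.0},
    {"country": "IN", "customer": "C002", "amount": 50.0},
    {"country": "US", "customer": "C003", "amount": 70.0},
]
out = parallel_summaries(
    lines,
    {"by_country": ["country"], "by_customer": ["country", "customer"]},
)
```

A simple summary is the special case with a single grouping. Adding another parallel summary adds one more in-memory accumulation per detail line rather than one more write per detail line.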

What the code generator does in addition to generating performance improvement code

1. Code remarks

While generating the code, the code generator also generates comment lines that explain the code to the user / developer.

2. Error Handling

While generating the code, the program also takes care of error handling in memory operations for cases where the memory limits are accidentally exceeded.

3. Flexibility

The performance improvement also requires certain new variables and accumulators. The code generator has defaults for each variable, and the user can change these variable codes if he/she wishes.

4. Top Down Grouping

Often, besides grouping, the user also needs only the first few top or bottom statistics. Normally this involves further sorting after the summary grouping is done, which with the known methods is again a disk-based operation. The performance DLL has a very good memory-based function for this, and the code generator uses it to increase speed when top-down grouping is involved.

5. Identify the reports / tables for performance improvement

The code generator can give a complete listing of the reports and tables that can be considered for performance improvement with this code generator.

6. Meaningful variable coding with comments

This is done by the code generator itself.
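The top-down grouping mentioned in point 4 can be sketched as follows. The function of the performance DLL is not shown in this document, so Python's standard heapq module is used here as a stand-in, and the summary data is hypothetical.

```python
import heapq

def top_n(summary, n):
    """Return the n summary rows with the largest totals, largest first."""
    return heapq.nlargest(n, summary.items(), key=lambda row: row[1])

def bottom_n(summary, n):
    """Return the n summary rows with the smallest totals, smallest first."""
    return heapq.nsmallest(n, summary.items(), key=lambda row: row[1])

# Hypothetical summary produced by an earlier in-memory grouping step.
summary = {"C001": 125.0, "C002": 60.0, "C003": 300.0, "C004": 10.0}
top2 = top_n(summary, 2)
bottom2 = bottom_n(summary, 2)
```

The selection works on the already grouped summary rows in memory, so no second disk-based sort is needed.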

Usage notes.

1. Different functions on same report variable

2. non-exclusive conditions on same report variable

3. Different variables for different set of summaries in same program running in same run.

4. The program structure for Baan reporting and the DLL must be known to the developer. It works on the same ETL principles.

5. The extent of performance improvement depends primarily on 2 factors (other things remaining the same).

      a. Ratio of the number of transformed rows to the number of extracted rows. The lower the ratio, the higher the performance gains. Transformed rows are the summary lines and are by nature far fewer than the extracted rows. Extracted rows are the actual detail lines. In Baan reporting, the number of extracted rows equals the number of times rprt_send is executed in the program script, and the number of transformed rows is the number of times the after.field section of the sort field of highest order is executed.

     b. The length of the sort / key fields and the number of accumulators. The greater the length of the key fields and the number of accumulators, the higher the savings. As the transformation operations move from disk-based I/O to memory I/O, the performance gains are a function of the number of accumulators and the length of the key/sort fields.

6. While evaluating the performance gains, one must note that many other factors also affect performance, such as the number of users, memory utilization and CPU utilization. To assess the performance gains objectively, one must consider all these factors too.

7. As the ratio of transformed rows to extracted rows is higher for transactional data such as sales / purchase / production orders, sales / purchase invoices, item data etc., the gains from this code generator and performance DLL may not be as high. In contrast, the ratio is much lower for KPI data such as ledger, customer, region, country, area, supplier, customer/supplier groups, company, item group, product type, selection code, statistical group etc. Hence summaries involving these KPIs will see larger gains. As the report covers a longer period, the ratio of transformed rows to extracted rows falls drastically and the performance gains increase substantially.
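The two factors in note 5 can be estimated before migrating a report. The sketch below is only an assumption-laden illustration in Python: it is not the memory calculator sheet, the function names are hypothetical, and the 8 bytes per accumulator is an assumed size. The 5 MB figure is the array size mentioned earlier in this document.

```python
def transformation_ratio(transformed_rows, extracted_rows):
    """Ratio of summary rows to detail rows; lower means larger expected gains."""
    if extracted_rows <= 0:
        raise ValueError("extracted_rows must be positive")
    return transformed_rows / extracted_rows

def estimated_array_bytes(summary_rows, key_length, accumulators,
                          bytes_per_accumulator=8):
    """Rough in-memory size of the summary; 8 bytes per accumulator is an assumption."""
    return summary_rows * (key_length + accumulators * bytes_per_accumulator)

ARRAY_LIMIT_BYTES = 5 * 1024 * 1024   # 5 MB array size quoted in this document

# Hypothetical report: 100,000 rprt_send calls collapsing into 500 summary lines.
ratio = transformation_ratio(500, 100_000)
size = estimated_array_bytes(500, key_length=20, accumulators=4)
fits = size <= ARRAY_LIMIT_BYTES
```

In this hypothetical case the ratio is 0.005 and the estimated summary size is about 26 KB, far below the array limit.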

Benefits of code generator

1. Time saving

2. Usage of the performance DLL

3. Increased user satisfaction due to speedy output generation

4. Maximizing the return on the investment in Baan

5. Increase in hardware life

Structure of the Code generated by ERPJewels Code generator

1. Comments

2. Declaration

3. Initialization

4. Data extraction and transformation

5. Loading the transformed results, i.e. termination
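The five parts can be pictured with the following skeleton. It is a Python sketch under stated assumptions, not actual output of the ERPJewels code generator (which produces Baan 3GL code): the function and parameter names are hypothetical.

```python
from collections import defaultdict

def run_summary(extract_rows, load_row):
    # 1. Comments: explain what is summarized and by which key.
    # 2. Declaration: the in-memory accumulator and a row counter.
    totals = defaultdict(float)
    extracted = 0

    # 3. Initialization: nothing further is needed in this sketch, since
    #    Python initializes the variables at the point of declaration.

    # 4. Data extraction and transformation: accumulate each detail row in memory.
    for key, amount in extract_rows():
        totals[key] += amount
        extracted += 1

    # 5. Termination: load only the transformed (summary) rows, in key order.
    for key in sorted(totals):
        load_row(key, totals[key])
    return extracted, len(totals)

# Hypothetical usage: three extracted rows become two loaded summary rows.
loaded = []
counts = run_summary(
    lambda: iter([("B", 1.0), ("A", 2.0), ("B", 3.0)]),
    lambda key, total: loaded.append((key, total)),
)
```

The returned pair gives the number of extracted rows and transformed rows, which is the ratio discussed in the usage notes.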

Other Related Links

1. Parallel Summaries

2. Procedure to use memory  based algorithms

3. Example of generated code

If you wish, you can download the Code Generator.


