2014年10月21日星期二

esProc Program: Operations of Table Sequences and Record Sequences(II)

2.Loop functions
Loop functions can compute each record of a table sequence/record sequence, express complex loop statements with simple function. For instance, select is used to make query, sort to sort, id to merge repeated records, pselect to fetch sequence numbers of eligible records and maxif to read the maximum value from eligible records. Here the most basic ones - select function and sort function- will be illustrated. For more information about loop functions, please refer to esProc Program: Loop Operation.

2.1 Query

In the following cellset, query out records whose Amount field is greater than or equal to 20,000 and whose Date is the month of March. As both record sequence and table sequence have the same expression for querying data, only the latter is selected for our illustration:

The result of B1 is as follows:
Whether the computing object is a table sequence or a record sequence, the computed result of select function will always be a record sequence, that is, the references of records instead of the physical records.

2.2 Sort
In the following cellset, sort records in ascending order according to SalesID field, and for the records having the same SalesID, sort them in descending order according to Date field. The record sequence and table sequence have the same expression for sorting. Here we’ll take the record sequence in B1 as an example:

The computed result is:

Whether the computing object is a table sequence or a record sequence, the computed result of sort function will always be a record sequence. In fact, most of the functions for table sequences and record sequences can be employed universally unless the records are modified. 

3  Aggregate function
3.1 Seek maximum value
Seek the maximum value of Amount field. The record sequence and table sequence in this example have the same expression, so we only take table sequence A1 as an example:

The computed result of A2 is as follows:
Similar functions include min(minimum value), sum(summation), avg(average value), count(count), etc.

3.2 Sum by grouping
In the following cellset, sum Amount in each group of data according to SalesID and the month, and count orders of each group. The record sequence and table sequence in this example have the same expression, so we only take the table sequence A1 as an example:

The computed result of A2 is as follows:

Note: groups function will create a new table sequence."~" in expression count(~) represents the current group. count(~) can also be written as count(ID). Besides, we don’t designate field names of computed results in writing code, so default field names like month(Date)will appear. A colon could be used in designating field names, such as =A1.groups(SalesID, month(Date):Month;sum(Amount),count(ID)).

2014年10月20日星期一

esProc Program: Operations of Table Sequences and Record Sequences (I)

The esProc table sequence is a structured two-dimensional table, having concepts of field, record, primary key and reference. These concepts originate from the data table of relational database. A table sequence is also an explicit set of genericity and orderliness, which can perform structured data computing more flexibly. A record sequence is the reference of table sequences. They are closely related and their usages are basically the same. The article will explain their basic computations from aspects of access, loop function, aggregate function and sets operations.

1.      Access
1.1   Creation

In the cellset below, read two-dimensional structured data from a file, create the table sequence object and store it in cell A1. Create the record sequence object by referring to A1 and store it in B1:

The following is the table sequence in A1 after computing. Only part of the data is displayed in the window. You can drag the scrollbar on the right to check the complete records:

The record sequence in B1 is as follows:
Note: The table sequence object can be created based on a database or a file, or be created by inserting records into an empty object. A record sequence originates from a table sequence but it doesn't store physical records. It only stores the references of some records in the table sequence.

1.2   Access field values
In the cellset below, get the field PName of the twentieth recording table sequence object A1 and store it in cell A2, and get the field PName of the second record in record sequence object B1 and store it in cell B2:

The results in A2 and B2 are as follows:
It can be seen that, as the second record in B1 is identical to the twentieth record in A1, so A2 and B2 have the same computed results. As can also be seen from the expressions of A2 and B2, both table sequence and record sequence have completely the same syntax for accessing fields.

A field name can be replaced by the field's sequence number and the result won't change. For instance:=A1(20).#3. Because this kind of replacement is employed universally in esProc, we won't go into details about it.

1.3   Access column data
In the cellset below, fetch column PName according to the column name from table sequence A1 and store it in A2, and again from table sequence A1, fetch column PName and column Amount according to the column names and store them in B2. The record sequence and table sequence have the same expression when accessing the column data, so only the latter is selected for our illustration:

The results in A3 and B3 are as follows:

Using the syntax of T.(x), you can only fetch one column of data, and the computed result is a sequence without structured column name. With T.new() function, however, you can fetch one or more columns of data and the computed result is a table sequence with structured column names.

Whether the computing object is a table sequence or a sequence, both T.new() function and A.new() function will create a new table sequence. This means the computed result of B1.new(PName, Amount)is also a table sequence.

1.4   Access row data
In the following, fetch the first two records from table sequence A1 according to row number and store them in A2, and fetch the first two records from table sequence B1 according to row number and store them in B2. Both the record sequence and table sequence in this example have the same expression for accessing row data:

The result of A2 is as follows:

The result of B2 is as follows:

2014年10月19日星期日

esProc Helps Process Heterogeneous Data Sources in Java - JSON

Java’s JSON open source package can only parse JSON data and hasn't the computational function. It is troublesome for programmers to develop a general program for performing computations, such as grouping, sorting, filtering and joining, by themselves. For example, during developing program for performing conditional filtering in JSON files using Java, the code has to be modified if the conditional expression is changed. If they want to make it as flexible as SQL in performing conditional filtering, they have to write code for analyzing and evaluating expressions dynamically. This requires quite a lot of programming work.

esProc supports dynamic expressions. It also can be embedded in Java to write the general program for computing JSON data. Let's give an example to see how it works. There are to-be-processed JSON strings that contain employee information, including fields such as EID, NAME, SURNAME, GENDER, STATE, BIRTHDAY, HIREDATE and DEPT, etc. Parse the strings and select female employees who were born on and after January 1st, 1981. The content of the strings is shown as follows:

[{EID:1,NAME:"Rebecca",SURNAME:"Moore",GENDER:"F",STATE:"California",BIRTHDAY:1974-11-20,HIREDATE:2005-03-11,DEPT:"R&D",SALARY:7000},
{EID:2,NAME:"Ashley",SURNAME:"Wilson",GENDER:"F",STATE:"New York",BIRTHDAY:1980-07-19,HIREDATE:2008-03-16,DEPT:"Finance",SALARY:11000},
{EID:3,NAME:"Rachel",SURNAME:"Johnson",GENDER:"F",STATE:"New Mexico",BIRTHDAY:1970-12-17,HIREDATE:2010-12-01,DEPT:"Sales",SALARY:9000},…]

Implementation approach: Call esProc program using Java and input the JSON strings which will then be parsed by esProc, perform the conditional filtering and return the result in JSON format to Java. Because esProc supports parsing and evaluating expression dynamically, it enables Java to filter JSON data as flexibly as SQL does.


For example, it is required to query female employees who were born on and after January 1, 1981. esProc can input two parameters: "jsonstr" and "where", as the conditions. This is shown as follows:

"where" is a string, its values isBIRTHDAY>=date(1981,1,1) && GENDER=="F".

The code written in esProc is as follows:

A1Parse the JSON data into a table sequence. esProc's IDE can display the imported data visually, as shown in the right part of the above figure.

A2: Perform the conditional filtering, using macro to realize parsing the expression dynamically. The “where” in this process is an input parameter. In executing, esProc will first compute the expression surrounded by ${…}, take the computed result as macro string value, replace ${…} with it and then interpret and execute the code. The final code to be executed in this example is =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").

A3: Generate JSON strings using the filtered table sequence.

A4Return the eligible result set to the external program.
When the filtering condition is changed, you just need to modify “where”– the parameter. For example, it is required to query female employees who were born on and after January 1, 1981, or employees whose NAME+SURNAME is equal to "RebeccaMoore". The value of "where" can be written as BIRTHDAY>=date(1981,1,1) && GENDER=="F" || NAME+SURNAME=="RebeccaMoore". After the code is executed, the result set in A2 is as follows:

Since the esProc script is called in Java through the JDBC interface, the returned result is set- the object of ResultSet. Fetch the first field of string type in set, and this is the filtered JSON string. Detailed code is as follows (save the above program in esProc as test.dfx):
         // create a connection
         Class.forName("com.esproc.jdbc.InternalDriver");
         con= DriverManager.getConnection("jdbc:esproc:local://");
         // call the program in esProc (the stored procedure); test is the name of file dfx
         com.esproc.jdbc.InternalCStatementst;
         st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call json(?,?)");
         // set the parameters; as the JSON string is long, part of it is omitted.
//In practice, JSON strings may be generated by various ways; see below for the explanation.
          String jsonstr=
"[{EID:1,NAME:\"Rebecca\",SURNAME:\"Moore\",GENDER:\"F\",STATE:\"California\...}]";
         st.setObject(1,jsonstr);
         st.setObject(2,"BIRTHDAY>=date(1981,1,1) && GENDER==\"F\"");
// execute the esProcstored procedure
         ResultSet set=st.executeQuery();
         // get the filtered JSON string
         String jsonstrResult;
         if(set.next()) jsonstrResult = set.getString(1);

JSON format is common used by interactive data in internet application. In practice, JSON strings may be retrieved from local files or remote HTTP server. esProc can directly read JSON strings from the files or the HTTP server. Take the latter as an example. It is assumed that there is a testS ervlet which returns JSON strings of employee information. The code for performing the operation is as follows:
A1Define an httpfile object, the URL is
         http://localhost:6080/myweb/servlet/testServlet?table=employee&type=json.

A2Read the result returned by the httpfile object.

A3Parse the JSON string and generate a table sequence. 

A4Filter data according to the conditions. 

A5Convert the filtered table sequence to JSON strings. 

A6Return the result in A4 to the Java code thatcalls this piece of esProc program.

2014年10月15日星期三

esProc Helps Process Heterogeneous Data Sources in Java - HDFS

It is not difficult for Java to access HDFS through API provided by Hadoop. But to realize computations, like grouping, filtering and sorting, on files in HDFS in Java is troublesome. esProc is a good helper in Java’s dealing with these computations. It can execute the access to HDFS too. With the help of esProc, Java will increase its ability in performing structured and semi-structured data computing, like the above-mentioned computations. Let’s look at how it works through an example.

The text file employee.gz in HDFS contains the employee data. You are required to import the data and select the female employees who were born on and after January 1st, 1981. The text file has been zipped with gzip in HDFS and cannot be loaded to the memory entirely.

The data in employee.gz is as follows:
EID   NAME       SURNAME        GENDER  STATE        BIRTHDAY        HIREDATE         DEPT         SALARY
1       Rebecca   Moore      F       California 1974-11-20       2005-03-11       R&D          7000
2       Ashley      Wilson      F       New York 1980-07-19       2008-03-16       Finance    11000
3       Rachel      Johnson   F       New Mexico     1970-12-17       2010-12-01       Sales         9000
4       Emily         Smith        F       Texas        1985-03-07       2006-08-15       HR    7000
5       Ashley      Smith        F       Texas        1975-05-13       2004-07-30       R&D          16000
6       Matthew Johnson   M     California 1984-07-07       2005-07-07       Sales         11000
7       Alexis        Smith        F       Illinois       1972-08-16       2002-08-16       Sales         9000
8       Megan     Wilson      F       California 1979-04-19       1984-04-19       Marketing        11000
9       Victoria    Davis        F       Texas        1983-12-07       2009-12-07       HR    3000
10     Ryan         Johnson   M     Pennsylvania    1976-03-12       2006-03-12       R&D          13000
11     Jacob        Moore      M     Texas        1974-12-16       2004-12-16       Sales         12000
12     Jessica     Davis        F       New York 1980-09-11       2008-09-11       Sales         7000
13     Daniel       Davis        M     Florida      1982-05-14       2010-05-14       Finance    10000
Implementation approach: Call the esProc script with Java program, import and compute the data, then return the result to Java program in the form of ResultSet.

First, you should develop and debug program in esProc's Integration Development Environment (IDE). The preparatory work is to copy the core packages and the configuration packages of Hadoop to "esProc's installation directory\esProc\lib",such ascommons-configuration-1.6.jarcommons-lang-2.4.jarhadoop-core-1.0.4.jarHadoop1.0.4.


Because esProc supports analyzing and evaluating expressions dynamically, it will enable Java to filter the data in HDFS file as flexibly as SQL does. For example, to query the data of female employees who were born on and after January 1st, 1981, esProc will use an input parameter "where" as the condition, as shown in the figure below:

"where" is a string, its value is BIRTHDAY>=date(1981,1,1) && GENDER=="F".
The code in esProc is as follows:

A1Define a HDFS file object cursor with the first row being the title and tab being the default field separator. The zipping mode is determined by the filenameextension. Here gzip is used. esProc also supports other zipping modes. UTF-8 is a charset, which is a JVM charset by default.

A2Filter the cursor according to the condition. Here macro is used to realize analyzing the expression dynamically, in which“where” is the input parameter. esProc will first compute the expression surrounded by ${…}, take the computed result as the macro string value and replace ${…} with it, then interpret and execute the code. The final code executed in this example is =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").

A3Return the cursor. If the filtering condition is changed, you only need to change the parameter "where" without modifying the code. For example, you are required to query the data of the female employees who were born on January 1st, 1981, or of the employees in which NAME+SURNAMEis "RebeccaMoore". The code for the value of “where” can be written as BIRTHDAY>=date(1981,1,1) && GENDER=="F" || NAME+SURNAME=="RebeccaMoore".

The code for calling this block of code in Java with esProc JDBC is as follows (save the esProc program as test.dfxand put the Hadoop jars needed by HDFS in Java's classpath):
          // create a connection usingesProcjdbc
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
// call the program in esProc (the stored procedure); test is the file name of dfx
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test(?)");
//set the parameters
st.setObject(1," BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" ||NAME+SURNAME==\"RebeccaMoore\"");// the parameters are the dynamic filtering conditions
// execute esProc stored procedure
st.execute();
// get the result set, which is the eligible set of employees
ResultSet set = st.getResultSet();


2014年10月14日星期二

esProc Helps Process Heterogeneous Data Sources in Java –Cross-Database Relating

JoinRowSet and FilteredRowSet provided by RowSet– Java's class library for data computing – can perform cross-database related computing, but they have a lot of weaknesses. First, JoinRowSet only supports inner join, it doesn't support outer join. Second, test shows that db2, mysql and hsql can work with JoinRowSet, yet the result set of join oracle11g to other databases is empty though no error reporting will appear. The fact is there were two users who perform cross-database join using oracle11g database even got the correct result. This suggests that JDBC produced by different database providers will probably affect the result obtained by using this method. Last, the code is complicated.

esProc has proved its ability in assisting Java to perform cross-database relating. It can work with various databases, such as oracle, db2, mysql, sqlserver, sybase and postgresql, to perform a variety of cross-database related computing, like inner join and outer join involving heterogeneous data. An example will teach you the way esProc works. Requirement: relate table sales in db2 to table employee in mysql through sale.sellerid and employee.eid, and then filter data in both sales and employee according to the criterion state=”California”. The way the code is written in this task applies to situations where other types of databases are involved.


The structure and data of table sales are as follows:

The structure and data of table employee are as follows:

Implementation approach: Call esProc script using Java program, join the multiple databases together to realize the cross-database relating, perform filtering and return the result to Java in the form of ResultSet.

The code written in esProc is as follows:

A1Connect to the data source db2 configured in advance.

A2Connect to the data source mysql configured in advance. In fact oracle and other types of databases can be used too.

A3, A4Retrieve table sequences: sales and employee, from db2 and mysql respectively. esProc's Integration Development Environment (IDE) can display the retrieved data visually, as shown in the right part of the figure in the above.
A5Relate sales to employee through sellerid=eid using esProc's object reference mechanism.

A6Filter the two table sequences according to state="California".

A7Generate a new table sequence and get the desired fields.

A8Return the result to the caller of esProc program.

This piece of program is called in Java using esProc JDBC to get the result. The code is as follows (save the above esProc program as test.dfx):

          //create a connection using esProcjdbc
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
// call esProc program (the stored procedure) in which test is the name of file dfx
com.esproc.jdbc.InternalCStatementst;
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()");
// execute esProc stored procedure
st.execute();
// get the result set
ResultSet set = st.getResultSet();

2014年10月13日星期一

esProc Getting Start: Basic Usage of JDBC

esProc can be embedded into Java program. So the latter can call the cellset program written in esProc using a way of connection such as JDBC. The method of calling the esProc program is the same as that of calling the stored procedure. The following is a brief introduction to esProc JDBC.

1.Description of the jars of esProc JDBC

esProc JDBC is like an incomplete database JDBC driver without physical tables. It can be regarded simply as a database that only supports the stored procedure. In addition, itisa built-in computing engine, thus no standalone servers are needed.

esProc JDBC has five basic jars, which are all situated in \esProc\ lib in installation directory:
dm.jar                   esProc computing engine and JDBC driver
poi-3.7-20101029.jar     process the access of Excel files
log4j_128.jar            process logs
icu4j_3_4_5.jar          handle internationalization
dom4j-1.6.1.jar          parse the configuration files

If other databases are to be used as the datasources of esProc JDBC, then the driver jars of these databases are required to be in place. For example, hsqldb.jar is necessary to use the demo database. Please note the esProc JDBC requires JDK1.6 or higher versions. 

2.Basic usage of esProc JDBC


In the cellset code, the result set is returned by result statement.

In the application code, arg1 is a cellset parameter. This dfx file will be named as test.dfx.

Note: The result set of dfx is returned by result statement. When dfx is called, the parameter names that receive the parameters won't be used; the values of parameters will be assigned according to their order instead. 

a.Load the jars to be used. Load the basic jars of esProc JDBC mentioned above while launching the Java application. These jars can be put in the directory of WEB-INF/libunder a web application.

b.  Deploy dfxConfig.xml, config.xml and the dfx file.

Prepare file config.xml, which contains the basic configuration information of esProc, such as registration code, searching path, datasource configuration, etc. The file can be found in the path esProc\configin esProc's installation directory. The information in it is the same as that set in the esProc Option page. The configuration is allowed to be adjusted before deployment (like modifying the Searching path which is used to search the dfx file):

Or the datasources necessary for dfxcan be configured in the Data Source Explorer:

After the modification, config.xml and dfxConfig.xml, which are situated in esProc\classes in the esProc's installation directory, will be saved in the class path of the application that will use them.

Put the test.dfx created in Step 1 in the class path of the application, or put it in the path specified by file dfxConfig.xml’s<paths/> node (i.e. the above-mentioned Searching path).

a. Further configure file dfxConfig.xml manually if necessary. For detailed operation, please refer to related documents. Please note the name of configured file should still be dfxConfig.xml and must not be changed.

b. Call dfx in Java program.

publicvoid testDataServer(){
              Connection con = null;
              com.esproc.jdbc.InternalCStatementst;
              com.esproc.jdbc.InternalCStatement st2;
              try{
                     //create a connection
                     Class.forName("com.esproc.jdbc.InternalDriver");
                     con= DriverManager.getConnection("jdbc:esproc:local://");
                     //call the stored procedure in which test is the name of dfx file
                     st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("calltest(?)");
                     //set the parameters
                     st.setObject(1,"3");
                     //the result obtained by executing the following code is the same as that obtained using the above calling method
                     st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test(3)");
                     //execute the stored procedure
                     st.execute();
                     //get result set
                     ResultSet set = st.getResultSet();
}
              catch(Exception e){
                     System.out.println(e);
              }
              finally{
                     //close the connection
                     if (con!=null) {
                            try {
                                   con.close();
                            }
                            catch(Exception e) {
                                   System.out.println(e);
                            }
                     }
              }
}

To know more about calling methods and configuration, please refer to documents that cover a more in-depth discussion at this point. 




2014年10月12日星期日

esProc Getting Start : Installation and Basic Use

esProc is a tool for programmed data computing. It can execute various types of data analysis and structured data computing, as well as get access to the database freely and perform online data analysis.

1.Installation

You can download esProc installation package on the following official website:  http://www.raqsoft.com/esproc-download.html .

To install esProc, please run the installer according to the instructions below step by step:
1) Run the installer.

2) Click Next to continue, accept the agreement.

3)        Select the installation directory, click Install.

4)        Complete the installation.
If you are familiar with the configuration of running environment of Java, and JDK1.5 or higher version has been installed locally, you can also choose to launch the esProc installer that will not install JDK automatically but prompt you to navigate to the directory of JDK in the local machine.

2.Basic Use of esProc

Run esProc main program to open esProc and click the "new" button to create the cellset file.

On the left is the cellsetsection, the active cellsetfiles you are operating are shown here; on the right, the upper area is the valueview section and the lower area is the variable section. Value view section and variable section can be retracted and expanded easily.

In esProc, if a cell has a value, then its name can be used to reference the cell value in a computational cell or an executable cell. A cell possessing a value can be a constant cell, a computational cell or other types of cells got assigned by the executable cell. The name of a cell includes the letter representing the column in which the cell is located and the integer – the serial number of the row it resides. In the above cellset, the expression in cell A2 calls the value of cell A1. The following is another example:

But different to Excel, in esProc, the letter in the cell name must be capital
esProc provides lots of functions to handle various kinds of data computing. For example:

And auxiliary function for editing functions can make code-writing in cells more easily. During editing expressions in cells, press Alt+to open/close the auxiliary function for editing functions. On the auxiliary interface for editing functions, you can see function syntax, function reference and other information:

3.Installation and Operation under Linux System

At present, ways of installing esProc under Linux are as follows:
1.deb file
2.rpm file
3.zip file
4. Download and install esProc by updating the software source

Only Linux systems that support deb files can use this kind of installation document, such as Debian and Ubuntu. For detailed information about installing and uninstalling esProc, please refer to Installation Instructions on esProc for Linux System.

By default, esProc will be installed in the path /usr/share/raqsoft. To start esProc, just execute startup-en.sh in the path esProc/bin under the installation directory:
To start the demo database, you need to execute startDataBase.sh under the above directory.