1.Sequence and Set in esProc
Unlike
traditional programming languages, esProc employs set commonly. In fact, the
sequence in esProc belongs to the field of set. So it’s quite important to deeply
understand the concept of sets when using esProc. Like an integer and a string,
the set is a basic data type; it could be a variable value, a computation of
an expression, and a return value of a function.
As a data
type, esProc provides operators of two
sets A and B, like intersection, union, union of set, subtract and so on: A^B,A|B,A&B,A\B.
It is
recommended for users to deeply understand and master these set operations. The
thinking of sets and good use of data may bring about an easier solution.
The
following is an illustration of using set operations to simplify code:
Unlike a set in the
sense of mathematics,that in esProc is arranged orderly;
there may be duplicates of a member in a set, like that in a sequence or
sequence table.
In the scope of
mathematics, the set Intersection and Union operations are both commutative; in
other words, A∩B º B∩A and A∪B º B∪A. However, this commutative property is not valid for esProc because
the member order in a sequence for esProc cannot be changed at will. For
esProc, the result of the intersection / union operation is required to be
arranged according to the order of the left operator.
Because of this
ordering feature of sequence members, we should adopt function A.eq(B)to judge whether two sequences have the same members
instead of simply using the comparison operator == :
2.Parameters in a Loop Function
With the data type
of set, you can handle several operations on the members of set in a single
clause, without the need of loop code.
Sometimes, the loop
functions will not process the set members but the values computed based on the
members. In this case, you are allowed to use parameters to represent the
formula in a function, in which the "~" represents the current
member.
For the nested loop
functions, "~" is interpreted as a member in an inner sequence; in such
case, if you want to reference a member in an outer sequence, name of the outer
sequence must headed the "~".
The above
rule is also applicable to the field reference when ~ is omitted, of which the fields
will be interpreted as fields in an inner RSeq first, if such fields cannot
be found in the inner RSeq, esProc will search for them in the outer RSeq.
3 Order of Loop
The arguments in a
loop function will be computed according to their order in the original
sequence. This is very important.
In many cases, it is
available for a single expression to implement the function which can be
achieved by simple loop code.
4.Computation Sequence
Different from such
loop functions as sum and avg returning a single aggregate value, a computation sequence function A.(x) returns a set. Often, a new set can also be created by employingcomputationsequence,
in addition to using the basic set operations such as union, intersection and
subtract.
The execution of an aggregate
function with arguments will be divided into two steps:
1) Use arguments to produce
computed column;
2) Aggregate the result.
In other
words, A.f(x) = A.(x).f()
The new function is used
to return a sequence table by computation sequence.
In addition, there
is a run function for relevant sequence computation. This function returns
the original sequences themselves instead of the result of the relevant sequence
computation. Generally, this function is used to assign values to fields in a
record sequence (sequence table).
5.Impure Sets
esProc has no
restriction on the consistency of sequence member types, that is, a sequence
may consist of numbers, character strings, and records.
However, in
many cases, it would be of little practical significance to arrange data in
variable types in one sequence. Therefore, users should not be too concerned
about it.
On the other
hand, a record sequence – A sequence consisting of records – can consist of
records from different sequence tables; this is very convenient.
Under the
environment of esProc, it is not necessary that records in a record sequence originate
from the same sequence table. As long as they have the same field names, the
records can be processed uniformly. Here, merits of esProc include simpler program
writing, higher efficiency and reduced occupation of memory. For SQL, however,
two tables with different structures must be united into a new one by using the
UNION
clause before making any operations.
6.Sets consisting of Subsets
Since the member of a
set in esProc is less restrictive by nature, a set may be a member of another
set.If A is a set consisting of other sets, functions A.conj(), A.union(), A.diff(), A.isect() could be employed to compute concatenate, union, difference and
intersection between subsets of A.
Also, a record
sequence may be a member of a sequence.
7.Understanding Group
The group
operation is commonly used in SQL. However, many people have no in-depth
knowledge about it. The nature of a group operation is to split a set into
several subsets according to a certain rule. In other words, the return value
of a group operation shall be a set consisting of subsets. However, people
often do not need to use such a set; instead, they may need to view a part of
aggregate values of its subsets. Therefore, group operations are often followed
by the summarization operations for subsets.
This is
what the SQL does. The GROUP BY clause is always followed by a summarization operation. Because
there are no explicit set data types in SQL, a set consisting of subsets cannot
be returned directly. Therefore, summarization operations must follow group
operations.
As time
passes, one would think that the group operations are always accompanied with
the follow-up summarization operations and forget that the group operations and
the summarization operations are independently.
However,
sometimes, we would still be interested in these grouped subsets but not the
summarized values. Although a part of us might be interested in the summarized
values, they would still need to hold these subsets for reuse, but not to
discard them once a summarization is completed.
This
requests us to understand the original meaning of the group operations. With a
full realization of the thinking of sets, esProc can achieve this goal very
well. In fact, the basic group functions in esProc can only be used for
grouping which is independent to the summarization operation.
The group result is
a set consisting of several subsets, and the subset also can be grouped.
Members in the group result are also sets; and they can be grouped.Both of the
two operations will produce a multilayer set.
Because these
results are so deep in hierarchy, they are rarely used in practice. The above
example is only to show you the thinking pattern of set and the nature of the
operation.
8.Non-Equi Group
Besides the
ordinary group function, esProc provides an align@a() function for processing
alignment group and an enum() function for processing the enum group.
The group
implemented by the group function is called an equi-group with the following features:
1) Any member in the original set
must and can only be in one subset;
2) There is no empty subset.
The above
two features are unavailable for the alignment group or the enumeration group.
Alignment
group is an operation that calculates grouping expressions with members in a
set and makes a match between resulting subsets and specified values. For the
alignment group, the following steps are required:
1) Specify a group of values;
2) In the set to be grouped, move
members whose property values are equal to the specified values to one subset;
3) Each resulting subset must be
corresponding to a pre-defined condition.
It is
possible that a member exists in none of subsets, an empty set exists, or any
member exists in both subsets.
The
following case will group employees by specified states sequence:
Enum group:
First, specify a group of conditions, take the members in the set to be grouped
as arguments and compute these conditions, members satisfying these conditions
will be grouped into a subset; each subset is corresponding to a pre-defined
condition. A member may be in none of these subsets, or in two subsets at the
same time; besides, an empty set may appear.
The
following case will group employees by specified age groups:
Although apparently
it seams that these two functions differ greatly with the group, the three
functions share the same nature regarding the group operation – to split a set
into several sub-sets. The only difference is that these
three functions split sets in different ways.