Both R language and esProc have the
outstanding ability to perform the stepwise computations. However, in the
particulars they differ from each other. A comparison between them will be done
by the following example:
A company’s Sales department wants to
select out the outstanding salespersons through statistics, that is, the
salespersons whose sales amounts are always among the top 3 in each month from
the January this year to the previous year. The data is mainly from the order
table of MSSQL database: salesOrder, and the main fields include the ID of
order: ordered, name of salesperson: name, sales amount: sales, and date of
order: salesDate.
The solution is
like this substantially:
1.
Compute the beginning dates of
this year and this month, and filter the data by date.
2.
Group by month and salesperson,
and compute the sales amount of each salesperson in each month.
3.
Group by month, and compute the
rankings of sales amount in each group.
4.
Filter out the top 3
salespersons from each group.
5.
Compute the set of
intersections of each group, that is, salespersons always among the top 3 in
each month.
The solution of R language is as shown
below:
01 library(RODBC)
02 odbcDataSources()
03 conn<-odbcConnect("sqlsvr")
04 originalData<-sqlQuery(conn,'select *
from salesOrder')
05 odbcClose(conn)
06
starTime<-as.POSIXlt(paste(format(Sys.Date(),'%Y'),'-01-01',sep=''))
07
endTime<-as.POSIXlt(paste(format(Sys.Date(),'%Y'),format(Sys.Date(),'%m'),'01',sep='-'))
08
fTimeData<-subset(originalData,salesDate>=starTime&salesDate<endTime)
09
gNameMonth<-aggregate(fTimeData$sales,list(fTimeData$name,format(fTimeData$salesDate,'%m')),sum)
10
names(gNameMonth)<-c('name','month','monthSales')
11 gNameMonth$rank<- do.call(c,
tapply(gNameMonth$monthSales, gNameMonth$month,function(x) rank(-x)) )
12 rData<-subset(gNameMonth,rank<=3)
13 nameList<- split(rData$name, rData$month)
14 Reduce(intersect, nameList)
The solution of esProc is as shown below:
Then, let’s compare the two solutions by
checking the database access firstly:
R language solution implements the data
access from Line01 to 05 through relatively a few more steps, and this is
acceptable considering it as the normal operations.
esProc solution allows for directly
inputting SQL statements in the cell A1, which is quite convenient.
In respect of database access, R language
and esProc differ to each other slightly. Both solutions are convenient.
Secondly, compare the time function:
R language solution computes the beginning
dates of this year and this month through line 06-07. Judging from this point,
R language is abundant in the basic functions.
esProc solution completes the same
computation in A2 and B2, in which pdate
function can be used to compute the beginning date of this month directly,
which is very convenient.
In respect of date function, it seems that
esProc is slightly better, while R language has a huge amount of 3rd-party-function
library, and maybe there is any date function that is easier to use.
The focal point is stepwise computation:
Firstly, filter by date, group by month and
sales person and then summarize by sales amount. The above functionalities are
implemented respectively in line 8-9 for R language and cell A3-A4 for esProc. The
difference is not great.
Proceed with the computation. According to
the a bit straightforward thought of analysis, the steps followed should be: 1
Group by month; 2 Add the field of ranking in the group, and compute the
rankings; 3 Filter by ranking, and only keep the salespersons that achieved the
sales amounts ranking the top 3 in each group; 4. Finally, compute the set of
intersection on the basis of the data in each group.
The corresponding codes of R language are
from line 10 – 14 in the order of 2->3->1->4. In order words, rank the
data in each group throughout the whole table, and then group. Have you noticed
anything awkward? Although it is the ranking within the group, users of R
language have to sort first and then group! This is because R language is weak
in the ability to group first and then process. To barely compose the
statements following the train of thought of 1->2->3->4, users of R
language must have a strong technical background to handle the complex iteration
statement expressions. The style of reverse thinking on this condition will
greatly simplify the codes.
esProc solution completes the similar
computation in the cell A5 – A8, not requiring any reverse thought. esProc
users can simply follow their intuitive thinking of 1->2->3->4. This
is because that esProc provides the ingenious representing style of ~. The ~
represents the current member that takes part in the computation. For this
case, the ~ is each 2-dimension table in the group (corresponds to the
data.frame of R language or the resutSet of SQL). In this way, ~.monthSales can
be used to represent a certain column of the current 2-dimension table. By compassion,
users of R language can only resort to some rather complicated means like loops
to access the current member, which is more troublesome for sure.
With regard to this comparison, esProc is
more intuitive with relatively more advantageous.
Next, let’s study on their abilities in
computing the intermediate results.
R language allows users to view the result
of each step by clicking the variable name at any time, with RStudio and other
tools.
esProc provides only one official tool,
that is, click the cell to view the result of this step.
Regarding this ability, esProc does not
differ from R language much. Considering that R language supports for a great
many of 3rd party tools, maybe there is any tool capable of
providing the better observed results.
Then, let compare their abilities to
reference the result.
R language users are only required to
define a variable for the result of computation in each step to conveniently
reference the result in the steps followed with regard to the R language
solution.
esProc users can also define variables to
reference, however, using the cell name as the variable name is more convenient
and saves the trouble of finding a meaningful name.
Next, let’s compare their performances on
set of intersection.
In the last step, the intersection set of
data of every group are to be computed. R language provides the intersect function at the bottom layer,
using together with Reduce function,
the intersection set of multiple groups of data can be computed.
esProc provides isect function to compute the set of intersection on multiple sets,
which is quite convenient.
Comparatively, R language provides the Reduce function of greater imaginary
space, and esProc is easier.
As it can be seen from the above case, R
language boasts the abundant fundamental functions and a huge amount of library
functions from the 3rd party.
In respect of data member access, esProc
provides the excellent representing style, in particular the grouping at
multi-levels. By comparison, R language relies more on the loop statements.
Both esProc and R language solutions have excellent
performances in respect of interaction.
About esProc: http://www.raqsoft.com/product-esproc
没有评论:
发表评论