The
computing power of SQL for mass structured data is complete, that is to say, it
is impossible to find anything that SQL cannot compute. But its support layer
is too low, which can lead to over-elaborate operation in practical application.
The
over-elaborate operation is specifically reflected in the following four
aspects:
l Computation without sub-step: SQL requires computation to be written out
in one statement, and it is necessary to adopt storage procedure to implement
computation step by step. No sub-step not only causes difficulty in thinking,
but also makes it difficult to use intermediate result.
l Set is unordered: SQL does not directly provide the mechanism of using position to
refer to set members, and conversion is needed to implement computation
relating to order and positioning.
l Set-lization is not complete: SQL set function is simple and is only used
to indicate the query result set and cannot be explicitly applied as basic data
type.
l It lacks object reference: SQL does not support record reference, the association between data
tables adopts equivalent foreign key scheme, and in conducting multi-table
joint computation, it is necessary to conduct join operation. So it is not only
difficult to understand, but also low in efficiency.
Implementing
data computation process based on a type of computation system is in fact the
process of translating business problem into formalized computation syntax (which
is similar to the case in which a primary-school student solves an application
problem by translating the problem into formalized four arithmetic operations).
Because of the above-mentioned four problems of SQL, in handling complex
computation, its model system is inconsistent with people’s natural thinking
habit. It causes a great barrier in translating problems, leading to the case
that the difficulty to formalize the
problem-solving method into computation syntax is much greater than to find the
solution of the problem.
We give
the following examples to describe respectively the problems in the four
aspects.
To make the statement in the examples as
simple as possible, here a large number of SQL2003 standard window functions
are used. So we adopt the ORACLE database syntax that does a relatively good
job in supporting SQL2003 standard as it will be generally more complex to
adopt the syntax of other databases to program these SQLs.
l Computation without
sub-step
Carrying
out complex computation step by step can reduce the difficulty of the problem
to a great extent, conversely, collecting a multi-step computation into one to
be completed in just one step increases the complexity of the problem.
Task1 The
number of persons of the sales department, where, the number of persons whose
native place is NY, and where, the number of female employees?
The
number of persons of the sales department
1
|
select count(*) from employee where
department='sales'
|
Where,
the number of persons whose native place is Beijing
1
|
select count(*) from employee where department=‘sales
’ and native_place='NY'
|
And
where, the number of female employees
1
|
select count (*) from employee
|
2
|
where department='sales' and native_place='NY' and gender ='female'
|
Conventional
thought: Select the persons of the sales department for counting, and from it,
find out the persons whose native place is NY for counting, and then further
find out the number of female employees for counting. The query each time is
based on the existing result last time, so it is not only simple in writing but
also higher in efficiency.
But, the
computation of SQL cannot be conducted in steps, and it is impossible to reuse
the preceding result in answering the next question, and it is only possible to
copy the query condition once more.
Task2 Each department selects a pair of male and
female employees to form a game team.
1
|
with A as
|
2
|
(select name, department,
|
3
|
row_number() over (partition by department order by 1) seq
|
4
|
from employee where gender =‘female’),
|
5
|
B as
|
6
|
(select name, department,
|
7
|
row_number() over(partition by department order by 1) seq
|
8
|
from employee where sex =‘female’)
|
9
|
select name, department from A
|
10
|
where department in ( select distinct
department from B ) and seq=1
|
11
|
union all
|
12
|
select name, department from B
|
13
|
where department in (select distinct
department from A ) and seq=1
|
Computation
without sub-step sometimes not only causes trouble in writing and low
efficiency in computation, but even causes serious deformation in the train of
thought.
The intuitive
thought of this task: For each department cycle, if this department has male
and female employees, then select one male employee and one female employee and
add them to the result set. But SQL does not support this kind of writing with
which the result set is completed step by step (to implement this kind of
scheme, it is necessary to use the stored procedure). At this time, it is
necessary to change the train of thought into: Select male employee from each department,
select female employee from each department, select out, respectively from the
two result sets, members whose departments appear in another result set, and
finally seek the union of the sets.
Fortunately,
there are still with sub-statement and window function over (SQL2003
standard begins to support); otherwise this SQL statement will be simply ugly.
l The set is unordered.
Ordered computation
is very common in mass data computation (obtain the first 3 places/the third
place, compare with the preceding period). But SQL adopts the mathematical
concept of unordered set, so ordered computation cannot be conducted directly,
and it is necessary to adjust the train of thought and change the method.
Task3 Company's employees whose ages are in the
middle
1
|
select name, birthday
|
2
|
from (select name, birthday, row_number()
over (order by birthday) ranking
|
3
|
from employee )
|
4
|
where ranking=(select
floor((count(*)+1)/2) from employee)
|
Median is
a very common computation, and originally it is only necessary to simple get
out, from the ordered set, the members whose positions are in the middle. But SQL
unordered set mechanism does not provide the mechanism which directly uses
position to access member. It is necessary to create a man-made sequence number
field, and then use the condition query method to select it out, causing the
case in which a sub-query is needed to complete the query.
Task4 For how many trading days has this stock
gone up consecutively in the longest?
1
|
select max(consecutive_day)
|
2
|
from (select count(*) (consecutive_day
|
3
|
from (select sum(rise_mark) over(order by trade_date) days_no_gain
|
4
|
from (select trade_date,
|
5
|
case when
|
6
|
closing_price >lag(closing_price) over(order by trade_date)
|
7
|
then 0 else 1 end rise_mark
|
8
|
from stock_price) )
|
9
|
group by days_no_gain)
|
Unordered
set can also cause train of thought to deform.
The conventional
train of thought for computing the number of consecutive days in which the
stock rises: Set up a temporary variable whose initial value is 0 to record the
consecutive dates in which the stock rises, and then compare it with the
preceding day. If the stock does not rise, then clear the variable to 0; if it
rises, add 1 to the variable, and see the maximum value appearing from the
variable when the cycle is over.
In using SQL,
it is impossible to describe this process, so it is necessary to change the train
of thought. To compute the accumulate number of days in which stock does not
rise from the initial date to the current date, and the one with the same number
of days in which stock does not rise is the consecutive trading days in which
the stock rises, and from its sub-group, it is possible to find out the
interval in which the stock rises, and then seek its maximum count. It is
already not so easy to read and understand this statement and it is more
difficult to write it out.
没有评论:
发表评论