Data processing with multi-database, off database, log, txt and excel: Example & Comments for SQL Computation Disadvantage (II)

l Set-lization is not complete.

It is beyond any doubt that set is the basis of mass data computation. Although SQL has the concept of set, it is limited to describing simple result set, and it does not take the set as a basic data type to enlarge its application scope.

Task5 Employees in the company whose birthday are the same as those of others

1	select from employee*
2	where to_char (birthday, ‘MMDD’) in
3	( select to_char(birthday, ‘MMDD’) from employee
4	group by to_char(birthday, ‘MMDD’)
5	having count()>1 )*

The original intention of grouping is to split the source set into several subsets, and its returned values are also these sub-sets. But SQL cannot describe this kind of "set consisting of sets", so it forcibly conducts the next step aggregating computation on these sub-sets and forms conventional result set.

But sometimes what we want is not the summary value on sub-sets, but rather the subsets themselves. At this time, it is necessary to use from the source set the condition obtained from grouping to query again, so sub-query appears again unavoidably.

Task6 Find out students whose scores ranks in top 10 for all subjects

1	select name
2	from (select name
3	from (select name,
4	rank() over(partition by subject order by score DESC) ranking
5	from score)
6	where ranking<=10)
7	group by name
8	having count()=(select count(distinct subject) from score)*

Use set-lized train of thought, order and filter the sub-sets of subjects after grouping to select the top 10 of every subject, and then it is possible to complete the task by finding out the intersection set of these sub-sets. But SQL cannot describe the "set of set" and has not the intersection operation to cope with indefinite quantity set. At this time, it is necessary to change the train of thought and use the window function to find out the top 10 of every subject, and then find out, according to student sub-group, the students whose number of appearances is the same as the quantity of subjects, which causes difficulty in understanding.

l It lacks object reference.

In SQL, the reference relation between tables depends on equivalent foreign key for maintenance and it is impossible to directly use the record at which the foreign key point as the field of this record. In query, it is necessary to seek help of multi-table join or sub-query to complete the query, which causes not only trouble in writing but also low efficiency in operations.

Task7 Female manager’s male employees

Use multi-table join.

1	select A.*
2	from employee A, department B, employee C
3	where A. department=B. department and B. manager=C. name and
4	A.sex ='male' AND C. gender ='female'

Use sub-query.

1	select from employee*
2	where department in
3	(select department from department
4	where manager in
5	(select name from employee where gender ='female'))

If the department field in the employee table points at the record in the department table while the manager field in the department table points at the record in the employee table, then it is only necessary to write this query condition simply as this kind of intuitive high-efficiency form:

where department.manager.sex ='female' and sex ='male'

But in SQL, it is only possible to use multi-table join or sub-query to write out the two kinds of obviously obscure statements.

Task8 Companies with which employees have their first jobs

Use multi-table join.

1	select name, company, first_company
2	from (select employee.name name, resume.company company,
3	row_number() over(partition by resume. name
4	order by resume.start_date) work_seq
5	from employee, resume where employee.name = resume.name)
6	where work_seq=1

Use sub-query.

1	select name,
2	(select company from resume
3	where name=A. name and
4	start date=(select min(start_date) from resume
5	where name=A. name)) first_company
6	from employee A

Without object reference mechanism and the completely set-lized of SQL, it is naturally impossible to handle the sub-table as an attribute of the primary table (field value). Regarding the query of sub-table, there are two methods. The first is to use multi-table join, increase the complexity of the statement, and use filter or grouping to convert the result set into the situation having one-to-one correspondence with the primary table record (the joined record has one-to-one correspondence with the sub-table). The second is to adopt sub-query, and each time compute temporarily the sub-table relating to the primary table record to record sub-sets, and increase the overall computation workload (it is impossible to use with sub-statement in sub-query) and trouble in writing.

Data processing with multi-database, off database, log, txt and excel

2014年7月30日星期三

Example & Comments for SQL Computation Disadvantage (II)

没有评论:

发表评论