题目:一条SQL语句,查处在此间连续三年纳税额超过2000万的企业纳税年份的起止是多少
YEAR | ENT | AMT |
---|---|---|
2010 | AA | 2000 |
2011 | AA | 2500 |
2012 | AA | 2300 |
2013 | AA | 1900 |
2014 | AA | 2200 |
2015 | AA | 2300 |
2016 | AA | 2006 |
2010 | BB | 2400 |
2011 | BB | 2401 |
2012 | BB | 1800 |
2013 | BB | 2100 |
2014 | BB | 2101 |
2015 | BB | 2102 |
2016 | BB | 2020 |
结果如下:
企业名称 | 开始时间 | 结束时间 | 平均纳税 |
---|---|---|---|
AA | 2010 | 2012 | 2266.7 |
AA | 2014 | 2016 | 2168.7 |
BB | 2013 | 2016 | 2080.8 |
Oracle实现方式:
SQL> with t
2 as
3 ( select '2010' year,'AA' ENT,2000 AMT FROM DUAL
4 UNION ALL
5 select '2011' year,'AA' ENT,2500 AMT FROM DUAL
6 UNION ALL
7 select '2012' year,'AA' ENT,2300 AMT FROM DUAL
8 UNION ALL
9 select '2013' year,'AA' ENT,1900 AMT FROM DUAL
10 UNION ALL
11 select '2014' year,'AA' ENT,2200 AMT FROM DUAL
12 UNION ALL
13 select '2015' year,'AA' ENT,2300 AMT FROM DUAL
14 UNION ALL
15 select '2016' year,'AA' ENT,2006 AMT FROM DUAL
16 UNION ALL
17 select '2010' year,'BB' ENT,2400 AMT FROM DUAL
18 UNION ALL
19 select '2011' year,'BB' ENT,2401 AMT FROM DUAL
20 UNION ALL
21 select '2012' year,'BB' ENT,1800 AMT FROM DUAL
22 UNION ALL
23 select '2013' year,'BB' ENT,2100 AMT FROM DUAL
24 UNION ALL
25 select '2014' year,'BB' ENT,2101 AMT FROM DUAL
26 UNION ALL
27 select '2015' year,'BB' ENT,2102 AMT FROM DUAL
28 UNION ALL
29 select '2016' year,'BB' ENT,2020 AMT FROM DUAL )
30 SELECT ENT, MIN(year), MAX(YEAR) ,round(AVG(AMT ),1)
31 FROM ( SELECT YEAR,ENT,AMT FROM T WHERE AMT >= 2000 ORDER BY ENT, YEAR ASC )
32 GROUP BY YEAR-ROWNUM,ENT
33 HAVING MAX(YEAR)-MIN(YEAR) >=2
34 ORDER BY ENT ,MIN(YEAR);
ENT MIN(YEAR) MAX(YEAR) ROUND(AVG(AMT),1)
--- --------- --------- -----------------
AA 2010 2012 2266.7
AA 2014 2016 2168.7
BB 2013 2016 2080.8
MySQL实现方式
mysql> select * from dao_1 ;
+------+------+------+
| year | ent | amt |
+------+------+------+
| 2010 | AA | 2000 |
| 2011 | AA | 2500 |
| 2012 | AA | 2300 |
| 2013 | AA | 1900 |
| 2014 | AA | 2200 |
| 2015 | AA | 2300 |
| 2016 | AA | 2006 |
| 2010 | BB | 2400 |
| 2011 | BB | 2401 |
| 2012 | BB | 1800 |
| 2013 | BB | 2100 |
| 2014 | BB | 2101 |
| 2015 | BB | 2102 |
| 2016 | BB | 2020 |
+------+------+------+
14 rows in set (0.00 sec)
mysql> select ent,min(year),max(year),round(avg(amt),1)
-> from (select @rownum := @rownum + 1 rn, year, ent, amt
-> from (SELECT * from dao_1 order by ent, year) dao_v,
-> (SELECT @rownum := 0) r
-> where amt >= 2000) dao_vv
-> group by year - rn,ent
-> having max(year)-min(year) >=2 ;
+------+-----------+-----------+-------------------+
| ent | min(year) | max(year) | round(avg(amt),1) |
+------+-----------+-----------+-------------------+
| BB | 2013 | 2016 | 2080.8 |
| AA | 2010 | 2012 | 2266.7 |
| AA | 2014 | 2016 | 2168.7 |
+------+-----------+-----------+-------------------+
3 rows in set (0.00 sec)
Hive实现方式
1: jdbc:hive2://localhost:10000/default> select * from bsr.bsr ;
+-----------+----------+----------+--+
| bsr.year | bsr.ent | bsr.amt |
+-----------+----------+----------+--+
| 2010 | AA | 2000 |
| 2011 | AA | 2500 |
| 2012 | AA | 2300 |
| 2013 | AA | 1900 |
| 2014 | AA | 2200 |
| 2015 | AA | 2300 |
| 2016 | AA | 2006 |
| 2010 | BB | 2400 |
| 2011 | BB | 2401 |
| 2012 | BB | 1800 |
| 2013 | BB | 2100 |
| 2014 | BB | 2101 |
| 2015 | BB | 2102 |
| 2016 | BB | 2020 |
+-----------+----------+----------+--+
1: jdbc:hive2://localhost:10000/default> select ent,min(year),max(year),round(avg(amt),1)
1: jdbc:hive2://localhost:10000/default> from
1: jdbc:hive2://localhost:10000/default> (select ent,year,amt ,year-row_number() over (order by ent,year) rn from bsr where amt>=2000 ) dao
1: jdbc:hive2://localhost:10000/default> group by dao.ent,dao.rn
1: jdbc:hive2://localhost:10000/default> having max(year)-min(year)>=2;
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:1
INFO : Submitting tokens for job: job_1524221525936_0022
INFO : The url to track the job: http://hadoop:8088/proxy/application_1524221525936_0022/
INFO : Starting Job = job_1524221525936_0022, Tracking URL = http://hadoop:8088/proxy/application_1524221525936_0022/
INFO : Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1524221525936_0022
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2018-04-20 20:12:16,322 Stage-1 map = 0%, reduce = 0%
INFO : 2018-04-20 20:12:27,181 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.21 sec
INFO : 2018-04-20 20:12:39,250 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.24 sec
INFO : MapReduce Total cumulative CPU time: 5 seconds 240 msec
INFO : Ended Job = job_1524221525936_0022
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:1
INFO : Submitting tokens for job: job_1524221525936_0023
INFO : The url to track the job: http://hadoop:8088/proxy/application_1524221525936_0023/
INFO : Starting Job = job_1524221525936_0023, Tracking URL = http://hadoop:8088/proxy/application_1524221525936_0023/
INFO : Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1524221525936_0023
INFO : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
INFO : 2018-04-20 20:12:59,635 Stage-2 map = 0%, reduce = 0%
INFO : 2018-04-20 20:13:08,677 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.44 sec
INFO : 2018-04-20 20:13:21,584 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.58 sec
INFO : MapReduce Total cumulative CPU time: 4 seconds 580 msec
INFO : Ended Job = job_1524221525936_0023
+------+-------+-------+---------+--+
| ent | _c1 | _c2 | _c3 |
+------+-------+-------+---------+--+
| AA | 2010 | 2012 | 2266.7 |
| AA | 2014 | 2016 | 2168.7 |
| BB | 2013 | 2016 | 2080.8 |
+------+-------+-------+---------+--+
3 rows selected (81.369 seconds)
深化问题
现在我们把问题深化一下, 如果出现一个企业 AA从 2014到2018 连续5年纳税超过2000万 则需要把这条记录拆分成 2014-2016 2015-2017 2016-2018
现在我们来实现一下
SQL> with t
2 as
3 ( select '2010' year,'AA' ENT,2000 AMT FROM DUAL
4 UNION ALL
5 select '2011' year,'AA' ENT,2500 AMT FROM DUAL
6 UNION ALL
7 select '2012' year,'AA' ENT,2300 AMT FROM DUAL
8 UNION ALL
9 select '2013' year,'AA' ENT,1900 AMT FROM DUAL
10 UNION ALL
11 select '2014' year,'AA' ENT,2200 AMT FROM DUAL
12 UNION ALL
13 select '2015' year,'AA' ENT,2300 AMT FROM DUAL
14 UNION ALL
15 select '2016' year,'AA' ENT,2006 AMT FROM DUAL
16 UNION ALL
17 select '2017' year,'AA' ENT,2066 AMT FROM DUAL
18 UNION ALL
19 select '2018' year,'AA' ENT,2076 AMT FROM DUAL
20 UNION ALL
21 select '2010' year,'BB' ENT,2400 AMT FROM DUAL
22 UNION ALL
23 select '2011' year,'BB' ENT,2401 AMT FROM DUAL
24 UNION ALL
25 select '2012' year,'BB' ENT,1800 AMT FROM DUAL
26 UNION ALL
27 select '2013' year,'BB' ENT,2100 AMT FROM DUAL
28 UNION ALL
29 select '2014' year,'BB' ENT,2101 AMT FROM DUAL
30 UNION ALL
31 select '2015' year,'BB' ENT,2102 AMT FROM DUAL
32 UNION ALL
33 select '2016' year,'BB' ENT,2020 AMT FROM DUAL
34 )
35 select t.ent,b3,e3,round(avg(amt),1)
36 from
37 ( SELECT ENT, MIN(year) b, MAX(YEAR) e
38 FROM ( SELECT YEAR,ENT,AMT FROM T WHERE AMT >= 2000 ORDER BY ENT, YEAR ASC )
39 GROUP BY YEAR-ROWNUM,ENT
40 HAVING MAX(YEAR)-MIN(YEAR) >=2
41 ORDER BY ENT ,MIN(YEAR) ) v1 ,
42 (select level+2000 b3 ,level+2000+2 e3
43 from dual
44 connect by level <=20 ) v2 ,
45 t
46 where v1.b <=v2.b3
47 and v1.e >= v2.e3
48 and v1.ent=t.ent
49 and t.year between v2.b3 and v2.e3
50 group by t.ent,b3,e3
51 order by ent ,b3;
ENT B3 E3 ROUND(AVG(AMT),1)
--- ---------- ---------- -----------------
AA 2010 2012 2266.7
AA 2014 2016 2168.7
AA 2015 2017 2124
AA 2016 2018 2049.3
BB 2013 2015 2101
BB 2014 2016 2074.3
6 rows selected
MySQL实现方式
mysql> select ent,begin_year,end_year
-> from
-> (select ent,min(year)min_year ,max(year) max_year
-> from (select @rownum := @rownum + 1 rn, year, ent, amt
-> from (SELECT * from dao_1 order by ent, year) dao_v,
-> (SELECT @rownum := 0) r
-> where amt >= 2000) dao_vv
-> group by year - rn,ent
-> having max(year)-min(year) >=2 ) level_1_res ,
-> (select m_year+rn begin_year, m_year+rn+2 end_year from
-> ( select @rownum2 := @rownum2 + 1 rn from dao_1 ) dao_year ,
-> (SELECT @rownum2 := 0) dao_row_meta ,
-> (select min(year)-1 m_year from dao_1 )dao_m_y
-> ) dao_rn_map
-> where min_year <= begin_year
-> and max_year >= end_year
-> order by ENT,MIN_YEAR;
+------+------------+----------+
| ent | begin_year | end_year |
+------+------------+----------+
| AA | 2010 | 2012 |
| AA | 2014 | 2016 |
| BB | 2013 | 2015 |
| BB | 2014 | 2016 |
+------+------------+----------+
4 rows in set (0.00 sec)