SQL Analytical Functions

Analytical Functions

Analytical Function:- Analytical functions provide simple solution to complex querying problem. When we want grouping data with non-grouping column then we use analytical functions. When we want values comparison of a column then only analytical is the way to do the same.  

Categories of Analytical Function:-
    1) Ranking Analytical Function:-
    a) row_number()        b) rank()        c) dense_rank()        d) percent_rank()     e) cume_dist()        f) ntile()
    2) Aggregate Analytical Function:-
    a) sum()            b) avg()        c) min()            d) max()            e) count()            f) stddev()
    3) Row Comparison Analytical Function:-
    a) lag()            b) lead()
    4) Statistical Analytical Function:-
    a) ratio_to_report
   
General Syntax:-
    function() over()
   
Type of Analytic clause:-
    a) Ordering      :-In this clause we use "order by column_name"
    b) Partitioning :-In this clause we use "partition by column_name"
    c) Windowing :-In this clause we use "rows between x preceding and y following" ,we can use "unbounded" and "current row" clause in place of a and y as well.
Note:-we can use combination of ordering, partitioning and windowing.   

1) Ranking Function:-

row_number() vs rownum :- rownum is a pseudo-column, calculating before ordering whereas row_number is a analytical function calculating analytical clause;

row_number() vs rank() vs dense_rank() :- row_number() analytical function generatr numbering on analytical clause but not handling row tie,rank() and dense_rank() generating number with handling problem of row tie in different ways,rank() skip the next rank. if row tie happen,but dense_rank() does not skip the next rank whereas give same rank of tie records.

Examples:- 1)
select empno,ename,job,sal,row_number() over(order by sal desc) row_num,
                           rank() over(order by sal desc) rank_num,
                           dense_rank() over(order by sal desc) drank_num
from emp;



Nulls and Analytical Function:-For treating nulls values we used NULLS FIRST and NULLS LAST clause in analytical clause and default value is NULLS FIRST.

Examples:-1)                               
select empno,ename,job,sal,rank() over(order by sal desc nulls first) rank_num
from emp2 ;
2)
select empno,ename,job,sal,rank() over(order by sal desc nulls last) rank_num
from emp2 ;

Note:- We can also use nvl function for trating nulls values
Examples:-1)
select empno,ename,job,nvl(sal,6000),rank() over(order by nvl(sal,6000) desc ) rank_num
from emp2
order by sal desc nulls last;
2)
select empno,ename,job,nvl(sal,6000),rank() over(order by nvl(sal,6000) desc ) rank_num
from emp2
order by sal desc

y) Partitioning :-partitioning in analytical function allow us to separate grouping of data and then perform a function from within that group.

Examples 1:- ranking according to salary and partition of deptno wise
select empno,ename,job,deptno,sal,rank() over(partition by deptno order by sal desc ) rank_num
from emp;
 

2) ranking accoring to their salary and partition of job wise
select empno,ename,job,deptno,sal,rank() over(partition by job order by sal desc ) rank_num
from emp;

Ntile():-ntile() roughly works by dividing the number of rows retrieved into the choosen number of segments then the percentile is displayed as the segment
        that the rows fall into.
Example:-
select empno,ename,job,deptno,sal,ntile(4) over(order by sal desc ) nt
from emp;

If you wanted to know which salaries where in the top 25%, then next 25%, the next 25% and the bottom 25% then ntile(4) function is used.

Calculation of NTILE:-if you use ntile(4) and max salary is 55000 and min salary is 39000 then
                    55000-39000=16000
                    16000/4=4000
                    55000 to 55000-4000=51000 is in the top 25%
                    51000 to 51000-4000=47000 is in the 2nd 25%
                    47000 to 47000-4000=43000 is in the 3rd 25%
                and 43000 to 43000-4000=39000 is in the bottom 25%

Percent_rank():- Formula for calculation of percent_rank() is :-
                percent_rank()=(rank-1)/(number of rows-1)
example:-select empno,ename,job,deptno,sal,rank() over(order by sal desc) rank,
                                  percent_rank() over(order by sal desc ) p_rank
from emp;

CUME_DIST():-formula for cumulative_distribution is :-
            cume_dist()=heighest rank of row/number of rows
example:-select empno,ename,job,deptno,sal,rank() over(order by sal desc) rank,
                                  percent_rank() over(order by sal desc ) p_rank,
                                  cume_dist() over(order by sal desc) cume_rank
from emp;


2) Aggregate functions:- Many of the aggregate function can be used as a analytical function like SUM,AVG,COUNT,STDDEV,VARIANCE,MAX and MIN.Aggregate functions used as analytical functions offer the advantage of partitioning and ordering as well;
                       
Examples:-1) select empno,ename,job,deptno,sal,round(avg(sal) over(),2) avg_sal,
                                  sum(sal) over() sum_sal
from emp;

2) select empno,ename,job,deptno,sal,round(avg(sal) over(partition by deptno),2) dept_avg_sal,
                                  sum(sal) over(partition by deptno) dept_sum_sal
from emp;

3)select empno,ename,job,deptno,sal,round(avg(sal) over(partition by deptno),2) dept_avg_sal,
                                  sum(sal) over(partition by deptno) dept_sum_sal,
                                  max(sal) over(partition by deptno) dept_max_sal,
                                  min(sal) over(partition by deptno) dept_min_sal
from emp;

Ratio_to_report() :- It is used to know fraction value of records in the group
formula for ratio_to_report:-
            ratio_to_report()=current_value/sum of values

Example:-select empno,ename,job,deptno,sal,rank() over(order by sal desc) rank,
                                  ratio_to_report(sal) over() fraction_sal
                                  from emp; 

Windowing Clause:- syntax of windowing clause
                    function(attribute1) over(order by attribute2 rows between x preceding and y following)

Example 1:- select empno,ename,job,deptno,sal,
                  sum(sal) over(order by sal desc rows between 1 preceding and 1 following) window_sum
                  from emp;

2) For running total of salary
select empno,ename,job,deptno,sal,sum(sal) over(order by sal desc rows between unbounded preceding and current row) running_total
from emp;

Lag() and Lead() :-lag() is for previous value and lead() is for next value.

Example 1:- select empno,ename,job,deptno,sal,lag(sal) over(order by sal desc) previous,
                           lead(sal) over(order by sal desc) next
                   from emp;

2)  select empno,ename,job,deptno,sal,lag(sal) over(partition by deptno order by sal desc) previous,
                                   lead(sal) over(partition by deptno order by sal desc) next
     from emp;

1 comment: