Class Recording Video on Lean Six Sigma: Basic Statistics
Lean Six Sigma: Basic Statistics Session
Why do we need data?
 To analyse performance
 To understand the system
 To review the system
 To make the correct decision
 To draw conclusions from results
 For comparison
 To identify deviations
 For future planning
 For prediction or anticipation
 To find discrepancies
 For comparison with standards
 To determine the cause of the problem
 To review plans and goals
 To prevent problems from recurring
 To visualize relationships
What do we need data for?
 To analyse the current situation
 To analyse trends
 To prepare for corrective action
 To predict
Central Tendency
Data tends to cluster around its centre.
Example: mileage of a car, CT = 15 km/litre
Mean
The mean is a good measure of central tendency when there is not too much variation in the data.
Arithmetic average
Disadvantage – it is affected by extreme high or extreme low values
Example: 10 families, each with 1000 pounds; total = 10000 pounds
Avg = sum of all observations / number of observations
= (1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000)/10
Average or Mean = 1000 pounds
Now add LNM = 46.5 bn pounds as an 11th observation:
Avg = (1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 46.5 bn)/11
≈ 4.23 bn pounds
Judged by the mean alone, this locality would appear to have only billionaires.
150 – 194, 28000
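A minimal Python sketch of the income example above, using the standard-library statistics module. The figures are the ones in the notes; the variable names are illustrative.

```python
from statistics import mean

# Ten families, each with 1000 pounds
incomes = [1000] * 10
print(mean(incomes))  # 1000

# Add one extreme value: LNM with 46.5 bn pounds
with_lnm = incomes + [46.5e9]
avg = mean(with_lnm)
# A single extreme value drags the mean up to roughly 4.23 bn
print(f"{avg / 1e9:.2f} bn")  # 4.23 bn
```

One extreme observation out of eleven is enough to make the mean useless as a description of a typical family.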
Median –
 Prefer the median when your data has high variation
 Arrange the data in ascending order; the value at the middle (50th percentile) position is the median.
 Advantage – it is not affected by extreme high or low data points
 Disadvantage – because it is a positional value, it cannot be used in further mathematical calculations
Avg = 1L; Total = 1L * 11 = 11L
Median = 1L: 50% of the time the value is less than or equal to 1L
Recruiters – 35K * 20 = 7L
Median = 27K
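A minimal Python sketch of the positional median described above; positional_median is a hypothetical helper, and statistics.median is shown alongside as the standard-library equivalent. Applied to the income data from the mean example, the median stays at 1000 pounds despite the extreme value.

```python
from statistics import median

def positional_median(values):
    """Median by position: sort ascending, then take the middle value."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle values

incomes = [1000] * 10 + [46.5e9]  # the ten families plus LNM
print(positional_median(incomes))  # 1000
print(median(incomes))             # 1000 (standard-library equivalent)
```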
Mode –
Use the mode only when the data has a limited number of possible values.
Mode = the value with the highest frequency of occurrence
Example: batsman
Runs: 0 1 2 3 4 5 6
1 2 3 4 5 6 7
12 14 19 100 2 100 1
Which value occurs most often?
Mode = 0
Example: rolling a die
Face:       1    2    3    4    5    6
Frequency:  12   14   19   85   2    42
Mode = 4
Face:       1    2    3    4    5    6
Frequency:  12   14   92   92   2    25
Mode = 3, 4 – referred to as bimodal
Face:       1    2    3    4    5    6
Frequency:  12   14   25   25   25   2
Mode = 3, 4, 5 – trimodal
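Finding the mode from a frequency table means picking every value whose count equals the highest count. A minimal Python sketch using the three die tables above (modes is a hypothetical helper):

```python
def modes(frequencies):
    """Return every value whose frequency equals the highest frequency."""
    top = max(frequencies.values())
    return [value for value, count in frequencies.items() if count == top]

# Die face -> number of times it came up (the three tables from the notes)
roll_counts = [
    {1: 12, 2: 14, 3: 19, 4: 85, 5: 2, 6: 42},
    {1: 12, 2: 14, 3: 92, 4: 92, 5: 2, 6: 25},
    {1: 12, 2: 14, 3: 25, 4: 25, 5: 25, 6: 2},
]
for counts in roll_counts:
    found = modes(counts)
    label = {1: "unimodal", 2: "bimodal", 3: "trimodal"}.get(len(found), "multimodal")
    print(found, label)  # [4] unimodal / [3, 4] bimodal / [3, 4, 5] trimodal
```

With raw observations instead of a frequency table, statistics.multimode (Python 3.8+) returns the same lists.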
Variation
Range = Max – Min
Quartiles split the ordered data into four parts, each holding 25% of the data:
 Min to Q1 = 25%
 Q1 to Median (Q2) = 25%
 Q2 to Q3 = 25%
 Q3 to Max = 25%
IQR (Interquartile Range) = Q3 – Q1
SF = Q1/Q3: best case = 1, worst case = as far from 1 as possible
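A minimal sketch of these positional measures in Python, reusing the data 1, 2, 3, 4, 5 purely for illustration. Quartile conventions differ between tools: statistics.quantiles defaults to method="exclusive", which gives Q1 = 1.5 and Q3 = 4.5 here, while method="inclusive" gives 2.0 and 4.0. SF is computed exactly as the notes define it, Q1/Q3.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5]  # illustrative data
q1, q2, q3 = quantiles(data, n=4)  # default method="exclusive"
iqr = q3 - q1
data_range = max(data) - min(data)
sf = q1 / q3  # SF as defined in the notes

print(min(data), q1, q2, q3, max(data))  # 1 1.5 3.0 4.5 5
print(iqr, data_range, round(sf, 2))     # 3.0 4 0.33
```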
Data: 1, 2, 3, 4, 5
Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Distance of data from centre:
(1 – 3) + (2 – 3) + (3 – 3) + (4 – 3) + (5 – 3) = (–2) + (–1) + (0) + (+1) + (+2) = 0
The distances from the mean always add up to zero, so they are squared:
(1 – 3)^2 + (2 – 3)^2 + (3 – 3)^2 + (4 – 3)^2 + (5 – 3)^2 = 4 + 1 + 0 + 1 + 4 = 10
Sum of Squares (SS) = sum of the squared distances of the data from its centre (mean) = 10
Variance = SS/(n – 1) = 10/4 = 2.5 >> the average squared distance of the data from its centre (dividing by n – 1 for sample data)
Std Dev = square root of Variance = √2.5 ≈ 1.58
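The same calculation in Python, as a minimal sketch: the manual steps mirror the notes (deviations, sum of squares, divide by n – 1, square root), and statistics.variance and statistics.stdev, which also divide by n – 1, are used as a cross-check.

```python
from math import sqrt
from statistics import stdev, variance

data = [1, 2, 3, 4, 5]
n = len(data)
m = sum(data) / n                     # mean = 3.0

deviations = [x - m for x in data]    # -2, -1, 0, 1, 2
ss = sum(d ** 2 for d in deviations)  # sum of squares = 10.0
var = ss / (n - 1)                    # sample variance = 2.5
sd = sqrt(var)                        # standard deviation, about 1.58

print(sum(deviations), ss, var, round(sd, 2))  # 0.0 10.0 2.5 1.58
print(variance(data), round(stdev(data), 2))   # 2.5 1.58
```

statistics.pvariance divides by n instead of n – 1 and would give 10/5 = 2.0 for the same data.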
Normality
For a process to be considered “normal”, it must follow these rules:
 Between Mean – 1 Std Dev and Mean + 1 Std Dev: about 68% of the data
 Between Mean – 2 Std Dev and Mean + 2 Std Dev: about 95% of the data
 Between Mean – 3 Std Dev and Mean + 3 Std Dev: 99.73% of the data
If these rules are not met, the process is considered non-normal or out of control, and the reason is considered special cause variation. Statistics mandates that you conduct a root cause analysis (RCA) for such special behaviour.
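A sketch of the rules above in Python, assuming data that truly follow a normal distribution. NormalDist gives the theoretical coverage (the exact figures are 68.27%, 95.45% and 99.73%), and the hypothetical share_within helper counts how much of a data set falls inside each band. The simulated mileages are invented for illustration; this counting is a rough check, not a formal normality test such as Anderson-Darling.

```python
import random
from statistics import NormalDist, mean, stdev

# Theoretical coverage of a normal distribution
z = NormalDist()  # standard normal: mean 0, std dev 1
for k in (1, 2, 3):
    coverage = z.cdf(k) - z.cdf(-k)  # share of data within mean ± k std dev
    print(f"mean ± {k} std dev: {coverage:.2%}")  # 68.27%, 95.45%, 99.73%

def share_within(data, k):
    """Fraction of observations within sample mean ± k sample std devs."""
    m, s = mean(data), stdev(data)
    return sum(m - k * s <= x <= m + k * s for x in data) / len(data)

# Hypothetical data: 10,000 simulated mileages (Mean 15, Std Dev 1)
rng = random.Random(7)
mileages = [rng.gauss(15, 1) for _ in range(10_000)]
for k in (1, 2, 3):
    print(k, round(share_within(mileages, k), 3))
```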
Mileage: Mean = 15, Std Dev = 1
Coverage   Lower            Upper            Mileage
68%        15 – 1 = 14      15 + 1 = 16      14 to 16
95%        15 – 2(1) = 13   15 + 2(1) = 17   13 to 17
99.73%     15 – 3(1) = 12   15 + 3(1) = 18   12 to 18
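The band calculation in this table can be written as a small Python function; spread_bands is a hypothetical helper name, and the 68%/95%/99.73% labels only hold if the process is normal.

```python
def spread_bands(mean, sd):
    """Mean ± 1, 2 and 3 std devs, with the coverage the notes attach to each."""
    return [(label, mean - k * sd, mean + k * sd)
            for k, label in ((1, "68%"), (2, "95%"), (3, "99.73%"))]

for label, low, high in spread_bands(15, 1):  # mileage: Mean = 15, Std Dev = 1
    print(f"{label:>7}: {low} to {high} km/litre")
```

Calling spread_bands(200, 20) reproduces the production bands that follow (180 to 220, 160 to 240, 140 to 260).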
Production
Mean = 200, Std Dev = 20
Coverage   Range
68%        200 – 20 to 200 + 20 = 180 to 220
95%        200 – 2(20) to 200 + 2(20) = 160 to 240
99.73%     200 – 3(20) to 200 + 3(20) = 140 to 260
Under normal conditions the production will be 140 to 260 (we are 99.73% sure).
Target = 300, which lies above the 99.73% upper limit of 260.
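One way to express how far the 300 target lies from this process is a z-score, the number of standard deviations between a value and the mean. A minimal Python sketch (z_score is an illustrative helper, not from the notes):

```python
def z_score(value, mean, sd):
    """How many standard deviations a value lies from the mean."""
    return (value - mean) / sd

target = 300
z = z_score(target, mean=200, sd=20)  # production: Mean = 200, Std Dev = 20
print(z)  # 5.0
print("within the 99.73% band" if abs(z) <= 3 else "outside the 99.73% band")
```

A target five standard deviations above the mean is far outside what this process delivers under normal conditions.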
Route 1: Mean = 30 min, Std Dev = 4 min
Route 2: Mean = 20 min, Std Dev = 20 min
Route 1 (99.73%):
30 + 3(4) = 30 + 12 = 42
30 – 3(4) = 30 – 12 = 18
18 to 42 min
Route 2 (99.73%):
20 + 3(20) = 20 + 60 = 80
20 – 3(20) = 20 – 60 = –40, taken as 0 because travel time cannot be negative
0 to 80 min
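A short Python sketch of the route comparison, assuming normally distributed travel times. The hypothetical time_band helper floors the lower limit at 0 as the notes do. Route 2 has the lower mean but a band more than three times as wide, which is why the mean should be read together with the std dev.

```python
def time_band(mean, sd, k=3):
    """Mean ± k std devs for a travel time, floored at 0 because time cannot be negative."""
    return max(0, mean - k * sd), mean + k * sd

routes = {"Route 1": (30, 4), "Route 2": (20, 20)}  # (mean, std dev) in minutes
for name, (m, s) in routes.items():
    low, high = time_band(m, s)
    print(f"{name}: {low} to {high} min (width {high - low})")
# Route 1: 18 to 42 min (width 24)
# Route 2: 0 to 80 min (width 80)
```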
HDFC: Mean = 10%, Std Dev = 1%, returns from 7% to 13%
Mean + 3 Std Dev = 10 + 3 = 13%
Mean – 3 Std Dev = 10 – 3 = 7%
ICICI: Mean = 20%, Std Dev = 20%, returns from –40% to 80%
Mean + 3 Std Dev = 20 + 3(20) = 20 + 60 = 80%
Mean – 3 Std Dev = 20 – 60 = –40%
Mileage: Mean = 20, Std Dev = 1
 68%: 19 to 21
 95%: 18 to 22
 99.73%: 17 to 23
Under normal circumstances, this is what my performance will be.
Every time you see special behaviour, you must conduct an RCA.
Consider the process “normal”, i.e. within control, only while its data stays within these limits.
Even if a single data point falls outside, we term it special.
Mean = 25 min, Std Dev = 1 min
Mean ± 1 Std Dev: 24 to 26 min = 68%
Mean ± 2 Std Dev: 23 to 27 min = 95%
Mean ± 3 Std Dev: 22 to 28 min = 99.73%
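The special-cause rule (a single point outside Mean ± 3 Std Dev is special) can be sketched in Python as below; special_points is a hypothetical helper and the travel times are invented for illustration.

```python
def special_points(observations, mean, sd, k=3):
    """Return observations outside mean ± k std devs (treated in the notes as special cause)."""
    low, high = mean - k * sd, mean + k * sd
    return [x for x in observations if x < low or x > high]

# Hypothetical travel times in minutes for a process with Mean = 25, Std Dev = 1
times = [24.5, 25.2, 26.1, 23.8, 29.0, 25.0]
print(special_points(times, mean=25, sd=1))  # [29.0] -> investigate with an RCA
```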
Normality
Normal
Statistics has a specific definition of the term “normal”.
The data must meet the following criteria:
 Mean ± 1 Std Dev = 68% of data
13 to 17
 Mean ± 2 Std Dev = 95% of data
11 to 19 mileage
 Mean ± 3 Std Dev = 99.73% of data
15 + 6 = 21, 15 – 6 = 9, i.e. 9 to 21
If the data adheres to all of the above, it is referred to as normal (or following a normal distribution).
If any of the above is not met, the process is considered non-normal and special cause variation is present in the data.
Mileage
Mean = 15
Std Dev = 1
14 to 16 – 68%: it is 68% likely that mileage will be between 14 and 16
13 to 17 – 95%: you are 95% sure that mileage will be between 13 and 17
12 to 18 – 99.73%
Mean = 25 Min
Std Dev = 2 Min
25 + 2 = 27
25 – 2 = 23
23 to 27 = 68% of the time
HDFC MF: Mean = 10%, Std Dev = 1%
ICICI MF: Mean = 40%, Std Dev = 20%
Fund     68%         95%        99.73%
HDFC     9 to 11     8 to 12    7 to 13
ICICI    20 to 60    0 to 80    –20 to 100
Normal data exhibits these properties:
 Mean, median and mode are equal
 It is unimodal, i.e. there is only one mode, and it lies at the centre
 The bell curve accommodates the entire process performance
 If you divide the bell curve at the centre, the two halves are mirror images (the curve is symmetric)
 The bell curve never touches the horizontal axis
Always use the mean together with the std dev (never use the mean alone).
Use the median together with Min, Q1, Median, Q3 and Max.
Mean = 25 min
Std Dev = 2 min
99.73%:
25 + (3*2) = 31
25 – (3*2) = 19
Target = 30 min
95%:
25 + 4 = 29
25 – 4 = 21
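If the trip time is assumed to be normally distributed, statistics.NormalDist can estimate how often the 30 min target is missed. A minimal sketch under that assumption:

```python
from statistics import NormalDist

trip = NormalDist(mu=25, sigma=2)  # Mean = 25 min, Std Dev = 2 min
target = 30
p_late = 1 - trip.cdf(target)      # probability that a trip takes longer than the target
print(f"{p_late:.2%}")             # 0.62%
```

The result, about 0.62%, is consistent with the bands above: 30 min lies between the 95% upper limit (29) and the 99.73% upper limit (31).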
Std Deviation
Variation
 Vis-à-vis the centre
 Centre = Mean; we want to study the distance of the data from the centre
 1, 2, 3, 4, 5
 Mean = 3
 Distance from centre
 (1 – 3) + (2 – 3) + (3 – 3) + (4 – 3) + (5 – 3) = 0
 Because the distances always add up to zero, they are squared
 (1 – 3)^2 + (2 – 3)^2 + (3 – 3)^2 + (4 – 3)^2 + (5 – 3)^2 = 10
 Sum of Squares is the sum of the squared distances of the data from its mean
 Variance = [(1 – 3)^2 + (2 – 3)^2 + (3 – 3)^2 + (4 – 3)^2 + (5 – 3)^2]/(n – 1)
 Variance = 10/4 = 2.5
 Std Dev = square root of Variance
 The square root of the average squared distance of the data from its centre
 Positional Information
 Quartiles
 Q1
 Q3
 IQR
 SF
 Range
 Min
 Max
 Quartiles