Class Recording Video on Lean Six Sigma - Basic Statistics
Lean Six Sigma - Basic Statistics Session
Why do we need data?
- To analyse performance
- To understand the system
- To review the system
- To take the correct decision
- To conclude results
- For Comparison
- Identify deviations
- Future Planning
- For prediction or anticipation
- Find out discrepancies
- Comparison with a standard
- Determine the cause of the problem
- Review of plans and Goals
- Eliminate repeatability
- Visualize relationship
What do we need data for?
- Analyze the current situation
- To Analyze trend
- Prepare for corrective action
- Predict
Central Tendency
Data tends to be close to its centre
Example: mileage of a car, CT = 15 km/litre
Mean
Is a good measure of central tendency when there is not too much variation in data.
Arithmetic average
Disadvantage – Gets impacted by extreme high or extreme low values
Example: 10 families at 1000 pounds each, total = 10 * 1000 = 10000 pounds
Avg = sum of all observations / number of observations
= (1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000) / 10
Average or Mean = 1000 pounds
Add one more observation: LNM = 46.5 Bn pounds
Avg = (1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 1000 + 46.5 Bn) / 11
= approx. 4.23 Bn pounds
Going by the mean alone, this locality appears to have only billionaires
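The effect of one extreme value on the mean can be checked with a short sketch. It uses the figures from the example (ten values of 1000 pounds plus one value of 46.5 Bn pounds); the variable names are illustrative only.

```python
from statistics import mean

# Ten families at 1000 pounds each, as in the example.
families = [1000] * 10
# The single extreme value: 46.5 Bn pounds.
lnm = 46.5e9

mean_before = mean(families)
mean_after = mean(families + [lnm])

print(f"Mean without the extreme value: {mean_before:,.0f} pounds")
print(f"Mean with the extreme value:    {mean_after / 1e9:.2f} Bn pounds")
```

One observation out of eleven moves the mean from 1000 pounds to several billion, which is why the mean alone is a poor summary when the data has extreme values.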
150 – 194, 28000
Median
- Prefer the median when your data has high variation
- Data is arranged in ascending order, and the value at the 50% (middle) position becomes your median
- Advantage: does not get impacted by extreme high or low data points
- Disadvantage: because it is a positional value, it can't be used for further mathematical calculations
Avg = 1L, Total = 1L * 11 = 11L
Median = 1L: 50% of the time the value is less than or equal to 1L
Recruiters: 35K * 20 = 7L
Median = 27K
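A minimal sketch of why the median resists extreme values, reusing the income figures from the Mean section (ten values of 1000 pounds and one of 46.5 Bn pounds). The variable names are illustrative.

```python
from statistics import mean, median

# Ten values of 1000 pounds plus one extreme value of 46.5 Bn pounds.
incomes = [1000] * 10 + [46.5e9]

mean_value = mean(incomes)
median_value = median(incomes)  # middle value once the data is sorted

print(f"Mean:   {mean_value:,.0f} pounds")
print(f"Median: {median_value:,.0f} pounds")
```

The median stays at 1000 pounds because it depends only on the middle position of the sorted data, not on the size of the largest value.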
Mode
Should be used only when the data has a limited number of possible values
Mode is based on frequency of occurrence
Batsman = 0 1 2 3 4 5 6
 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
 | 12 | 14 | 19 | 100 | 2 | 100 | 1 |
Which data is occurring the most
Mode = 0
Mode example: dice
Face | 1 | 2 | 3 | 4 | 5 | 6 |
Frequency | 12 | 14 | 19 | 85 | 2 | 42 |
Mode = 4
Face | 1 | 2 | 3 | 4 | 5 | 6 |
Frequency | 12 | 14 | 92 | 92 | 2 | 25 |
Mode = 3, 4 (referred to as bi-modal)
Face | 1 | 2 | 3 | 4 | 5 | 6 |
Frequency | 12 | 14 | 25 | 25 | 25 | 2 |
Mode = 3, 4, 5 (tri-modal)
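The three dice examples can be reproduced with a small helper that returns every face sharing the highest frequency. This is a sketch; the function name `modes` and the dictionary layout are assumptions made for illustration, while the frequencies are the ones from the dice tables.

```python
def modes(freq):
    """Return every value whose frequency equals the highest frequency."""
    top = max(freq.values())
    return [value for value, count in freq.items() if count == top]

faces = [1, 2, 3, 4, 5, 6]
unimodal = dict(zip(faces, [12, 14, 19, 85, 2, 42]))
bimodal = dict(zip(faces, [12, 14, 92, 92, 2, 25]))
trimodal = dict(zip(faces, [12, 14, 25, 25, 25, 2]))

for name, table in [("one mode", unimodal), ("bi-modal", bimodal), ("tri-modal", trimodal)]:
    print(name, modes(table))
```

Returning a list rather than a single value keeps the bi-modal and tri-modal cases visible instead of silently picking one of the tied faces.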
Variation
Range = Max - Min
Each quartile interval always holds 25% of the data
Min to Q1 = 25%
Q1 to Median (Q2) = 25%
Q2 to Q3 = 25%
Q3 to Max = 25%
IQR (Inter Quartile Range) = Q3 - Q1
SF (Stability Factor) = Q1/Q3; best case = 1, worst case = as far from 1 as possible
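A short sketch of the positional measures (Range, quartiles, IQR, SF). The data set is hypothetical, chosen only for illustration, and the quartiles come from Python's `statistics.quantiles` with its default method; other tools may use a slightly different quartile convention and give slightly different values.

```python
from statistics import quantiles

# Hypothetical data, not from the class notes.
data = [10, 12, 13, 15, 16, 18, 20, 21, 24, 27, 30]

q1, q2, q3 = quantiles(data, n=4)  # three cut points: Q1, Median (Q2), Q3

data_range = max(data) - min(data)
iqr = q3 - q1
sf = q1 / q3

print(f"Range = {data_range}")
print(f"Q1 = {q1}, Median = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}")
print(f"SF = {sf:.2f}")
```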
Worked example with data 1, 2, 3, 4, 5:
Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
Distance of data from centre:
(1-3) + (2-3) + (3-3) + (4-3) + (5-3) = (-2) + (-1) + (0) + (+1) + (+2) = 0
The signed distances always cancel to 0, so each distance is squared:
(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2 = 4 + 1 + 0 + 1 + 4 = 10
1,2,3,4,5,3 -
Sum of the squared distances of data from its centre (mean) = Sum of Squares (SS) = 10
Average of the Sum of Squares = Variance = SS / (n - 1) = 10 / 4 = 2.5
Square root of Variance = Std Dev = sqrt(2.5) = approx. 1.58
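The steps for the data 1, 2, 3, 4, 5 can be sketched as follows. The code follows the notes' definition (divide the Sum of Squares by n - 1) and cross-checks the result against Python's `statistics` module; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

data = [1, 2, 3, 4, 5]
n = len(data)

centre = sum(data) / n                    # mean = 3
distances = [x - centre for x in data]    # signed distances from the centre
ss = sum(d ** 2 for d in distances)       # Sum of Squares
var = ss / (n - 1)                        # Variance
sd = sqrt(var)                            # Std Dev

print(f"Sum of distances = {sum(distances)}")
print(f"Sum of Squares   = {ss}")
print(f"Variance         = {var}")
print(f"Std Dev          = {sd:.2f}")

# Cross-check against the library's sample variance and standard deviation.
assert abs(var - variance(data)) < 1e-9
assert abs(sd - stdev(data)) < 1e-9
```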
Normality
If a process is to be considered “Normal” it will follow the rules below:
- Mean - 1 Std Dev to Mean + 1 Std Dev = 68% of data
- Mean - 2 Std Dev to Mean + 2 Std Dev = 95% of data
- Mean - 3 Std Dev to Mean + 3 Std Dev = 99.73% of data
If these rules are not met, the process is considered non-normal or out of control, and the reason is treated as special cause variation. Statistics mandates that you conduct an RCA (root cause analysis) for such special behaviour.
Mileage: Mean = 15, Std Dev = 1
 | Lower | Upper | Mileage |
68% | 15 - 1 = 14 | 15 + 1 = 16 | 14 - 16 |
95% | 15 - 2(1) = 13 | 15 + 2(1) = 17 | 13 - 17 |
99.73% | 15 - 3(1) = 12 | 15 + 3(1) = 18 | 12 - 18 |
22 ---
Production: Mean = 200, Std Dev = 20
 | Upper | Lower | Production |
68% | 200 + 20 = 220 | 200 - 20 = 180 | 180 - 220 |
95% | 200 + 2(20) = 240 | 200 - 2(20) = 160 | 160 - 240 |
99.73% | 200 + 3(20) = 260 | 200 - 3(20) = 140 | 140 - 260 |
Under normal conditions the production will be 140 – 260 (we are 99.73% sure)
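The mileage and production ranges can be generated with one small helper. This is a sketch; the function name `sigma_bands` and its return layout are assumptions for illustration, and the percentages are the 68% / 95% / 99.73% figures from the notes.

```python
def sigma_bands(mean, std_dev):
    """Return {coverage: (lower, upper)} for 1, 2 and 3 standard deviations."""
    coverage = {1: "68%", 2: "95%", 3: "99.73%"}
    return {
        label: (mean - k * std_dev, mean + k * std_dev)
        for k, label in coverage.items()
    }

for name, mean, std_dev in [("Mileage", 15, 1), ("Production", 200, 20)]:
    print(name)
    for label, (lower, upper) in sigma_bands(mean, std_dev).items():
        print(f"  {label}: {lower} - {upper}")
```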
Target - 300
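Route 1: Mean = 30 Min, Std Dev = 4 Min
Route 2: Mean = 20 Min, Std Dev = 20 Min
Route 1 | Upper | Lower | Range |
68% | 30 + 4 = 34 | 30 - 4 = 26 | 26 - 34 Min |
95% | 30 + 2(4) = 38 | 30 - 2(4) = 22 | 22 - 38 Min |
99.73% | 30 + 3(4) = 30 + 12 = 42 | 30 - 3(4) = 30 - 12 = 18 | 18 - 42 Min |
Route 2 | Upper | Lower | Range |
68% | 20 + 20 = 40 | 20 - 20 = 0 | 0 - 40 Min |
95% | 20 + 2(20) = 60 | 20 - 2(20) = -20, taken as 0 | 0 - 60 Min |
99.73% | 20 + 3(20) = 20 + 60 = 80 | 20 - 3(20) = -40, taken as 0 | 0 - 80 Min |
Travel time cannot be negative, so a negative lower limit is taken as 0.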
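The route comparison can be sketched as below. The lower limit is floored at 0 because travel time cannot be negative; the function name `three_sigma_range` and the `floor` parameter are assumptions made for illustration.

```python
def three_sigma_range(mean, std_dev, floor=0):
    """Return (lower, upper) for mean +/- 3 std dev, with the lower limit floored."""
    lower = max(floor, mean - 3 * std_dev)
    upper = mean + 3 * std_dev
    return lower, upper

routes = {"Route 1": (30, 4), "Route 2": (20, 20)}

for name, (mean, std_dev) in routes.items():
    lower, upper = three_sigma_range(mean, std_dev)
    print(f"{name}: {lower} - {upper} Min (spread = {upper - lower} Min)")
```

Route 2 has the lower mean, but its spread is far wider, so its travel time is much harder to rely on than Route 1's.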
HDFC: Avg = 10%, Std Dev = 1%, so returns range from 7% to 13%
Mean + 3 Std Dev = 10 + 3(1) = 13%
Mean - 3 Std Dev = 10 - 3(1) = 7%
ICICI: Mean = 20%, Std Dev = 20%, so returns range from -40% to 80%
Mean + 3 Std Dev = 20 + 3(20) = 20 + 60 = 80%
Mean - 3 Std Dev = 20 - 3(20) = 20 - 60 = -40%
Mileage
Mean = 20, Std Dev = 1
68%: 19 - 21
95%: 18 - 22
99.73%: 17 - 23
Under normal circumstances this is what my performance will be
Every time you see special behaviour, you must conduct an RCA
Within these limits, consider the process “normal” or within control
Even if one data point is outside the limits, we term that as special
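Flagging special behaviour can be sketched as a check against the Mean ± 3 Std Dev limits. The mean of 20 and Std Dev of 1 are the mileage figures from the notes; the readings themselves are hypothetical, and the function name `special_points` is an assumption for illustration.

```python
def special_points(data, mean, std_dev):
    """Return the data points that fall outside mean +/- 3 std dev."""
    lower = mean - 3 * std_dev
    upper = mean + 3 * std_dev
    return [x for x in data if x < lower or x > upper]

# Hypothetical mileage readings, not from the class notes.
readings = [20, 19, 21, 22, 20, 25, 18]

flagged = special_points(readings, mean=20, std_dev=1)
print("Points needing RCA:", flagged)
```

With limits of 17 to 23, only the reading of 25 is flagged, and a single flagged point is enough to call for an RCA.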
Mean = 25 Min, Std Dev = 1 Min
Mean ± 1 Std Dev = 68%: 24 - 26 Min
Mean ± 2 Std Dev = 95%: 23 - 27 Min
Mean ± 3 Std Dev = 99.73%: 22 - 28 Min
Normality
Normal
Statistics has a definition of the term “normal”
It uses the following criteria (mileage example with Mean = 15, Std Dev = 2):
- Mean ± 1 Std Dev = 68% of data: 13 - 17
- Mean ± 2 Std Dev = 95% of data: 11 - 19
- Mean ± 3 Std Dev = 99.73% of data: 15 - 6 = 9 to 15 + 6 = 21
If any data adheres to the above, it is referred to as normal (or following normal distribution)
If any of the above is not met – process is considered non normal and there is presence of special cause variation in data.
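The three percentages can be checked against the normal distribution itself. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is an assumption for illustration.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, std dev 1

def coverage(k):
    """Share of a normal distribution lying within mean +/- k std dev."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(f"Mean +/- {k} Std Dev covers {coverage(k):.2%} of data")
```

The exact shares are slightly above 68% and 95%, which are the rounded figures used in class.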
Mileage
Mean = 15
Std Dev = 1
14 - 16 (68%): it is 68% likely that mileage will be between 14 and 16
13 - 17 (95%): you are 95% sure that mileage will be between 13 and 17
12 - 18 (99.73%): you are 99.73% sure that mileage will be between 12 and 18
Mean = 25 Min
Std Dev = 2 Min
25 + 2 = 27
25-2 = 23
23 - 27 Min = 68% of the time
HDFC MF: Mean = 10%, Std Dev = 1%
ICICI MF: Mean = 40%, Std Dev = 20%
 | 68% | 95% | 99.73% |
HDFC | 9 - 11 | 8 - 12 | 7 - 13 |
ICICI | 20 - 60 | 0 - 80 | -20 - 100 |
Normal Data exhibits these properties:
- Mean, Median and Mode will be equal
- Unimodal i.e. you will have only one mode and that shall be at the centre
- Bell curve will accommodate entire process performance
- If you divide the bell curve at the centre, you will always have identical behaviour both sides.
- Bell curve never touches the axis
Mean must be used with Std Dev (never use the mean alone)
Median should be used with the five-number summary: Min, Q1, Median, Q3, Max
Mean = 25 Min, Std Dev = 2 Min
99.73%: 25 + (3*2) = 31, 25 - (3*2) = 19
Target = 30 Min
95%: 25 + 4 = 29, 25 - 4 = 21
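One way to relate the 30 Min target to these ranges is to ask how often a normal process with Mean = 25 and Std Dev = 2 would finish within 30 Min. This is a sketch under two assumptions not stated in the notes: that the times follow a normal distribution, and that the target is an upper limit.

```python
from statistics import NormalDist

# Assumption: times are normally distributed with the stated mean and std dev.
travel_time = NormalDist(mu=25, sigma=2)
target = 30  # assumed to be an upper limit, in minutes

p_within_target = travel_time.cdf(target)
print(f"P(time <= {target} Min) = {p_within_target:.4f}")
```

The target sits 2.5 Std Dev above the mean: outside the 95% range (21 - 29) but inside the 99.73% range (19 - 31).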
Std Deviation
Variation
- Vis-à-vis Centre
- Centre = Mean, we want to study distance of data from centre
- 1,2,3,4,5
- Mean = 3
- Distance from Centre
- (1-3) + (2 -3) + (3-3) + (4-3) + (5-3) = 0
- Because the distances always cancel to 0, each distance is squared
- (1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2 = 10
- Sum of Squares is the sum of the squared distances of data from its mean
- Variance = average of the Sum of Squares = ((1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2) / (n - 1)
- Variance = 10/4 = 2.5
- Std Dev = square root of Variance
- Root of the average squared distance of data from its centre
- Positional Information
- Quartiles
- Q1
- Q3
- IQR
- SF
- Range
- Min
- Max
- Quartiles