TE Taylor Enterprises, Inc.
www.variation.com

A Pattern Test for Distinguishing Between
Autoregressive and Mean-Shift Data

Dr. Wayne A. Taylor

 

Statistical methods such as control charts and change-point analysis are commonly used to determine whether the mean has shifted.  Such methods assume independent errors around a possibly changing mean.  When such techniques are applied to autoregressive data, erroneous conclusions can result.  However, shifts of the mean create autocorrelation between the observations, making it difficult to distinguish mean-shift data from autoregressive data.  A pattern test has been devised that can reliably distinguish between these two important cases.

 

Table of Contents

Introduction
The Mean-Shift Model
The First Order Autoregressive Model
The Pattern Test
Applications of the Pattern Test
Handling Ties
Other Applications of Pi
Conclusion
Appendix A
References

Introduction

Look at Figures 1-3.  Which two sets of data are most similar in structure?

 

Figure 1:  Mean-Shift Model

 

Figure 2:  First Order Autoregressive Model - Positive Correlation

 

Figure 3:  First Order Autoregressive Model - Negative Correlation

 

Would you be surprised to find out it is the plots in Figures 2 and 3?  Both were generated using a first order autoregressive model.  The plot in Figure 1 was generated using a different model, called the mean-shift model.  When analyzing data collected over time, it is important to be able to distinguish between these two important cases.  Visual inspection of such data is unreliable.  A pattern test has been developed which can reliably distinguish between these two models.

 

The Mean-Shift Model

Statistical methods such as control charts and change-point analysis assume a series of independent observations collected over time.  At one or more points in time the mean may shift.  Let X_1, X_2, ... represent the data in time order.  The mean-shift model can be written as

 

            $$X_i = \mu_i + \varepsilon_i$$

where μ_i is the average at time i.  Generally μ_i = μ_{i-1} except for a small number of values of i called the change-points.  ε_i is the random error associated with the i-th value.  It is assumed that the ε_i are independent and identically distributed with means of zero.  Other assumptions including normality may also be made by some of these statistical methods but are not required for the proposed pattern test.

 

The data shown in Figure 1 was generated using the following model:

 

            $$\varepsilon_i \sim N(0, 1) \text{ and independent}$$

            $$\mu_1, \mu_{21}, \mu_{41}, \mu_{61}, \mu_{81} \sim N(10, 1) \text{ and independent}$$

            $$\mu_i = \mu_{i-1} \text{ for all other } i$$

N(μ,σ) means normally distributed with mean μ and standard deviation σ.  This model could result from a process where the mean shifts as a result of periodic material changes.  It could also result from a process subject to both setup and within-setup variation.  In other cases, the mean shifts could occur at random times.  The proposed pattern test works for any of these situations.
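The generating model above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the code used to produce Figure 1: the function name `simulate_mean_shift`, the parameters `segment` and `seed`, and the use of Python's `random` module are all choices made here.

```python
import random

def simulate_mean_shift(n=100, segment=20, seed=0):
    """Simulate X_i = mu_i + eps_i with a new mean every `segment` points.

    A fresh mean is drawn from N(10, 1) at points 1, 21, 41, ... (1-based);
    the errors eps_i are independent N(0, 1).  Returns (data, means).
    """
    rng = random.Random(seed)          # hypothetical seed, for repeatability
    data, means = [], []
    mu = None
    for i in range(n):                 # i is 0-based here
        if i % segment == 0:           # change-point: draw a new mean
            mu = rng.gauss(10.0, 1.0)
        means.append(mu)
        data.append(mu + rng.gauss(0.0, 1.0))
    return data, means

data, means = simulate_mean_shift()
print(len(data), len(set(means)))      # 100 values, 5 distinct means
```

With the default arguments the series has 100 points and five segments of 20 points each, matching the change-points at 1, 21, 41, 61 and 81 in the model.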

 

The First Order Autoregressive Model

The data shown in Figures 2 and 3 were generated using the first order autoregressive model:

 

            $$\varepsilon_i \sim N(0, 1) \text{ and independent}$$

            $$r_i = \phi \, r_{i-1} + \varepsilon_i$$

            $$r_0 = 0$$

            $$X_i = 10 + r_i$$

φ is a constant between -1 and 1.  The above model results in a correlation between successive values of:

            $$\mathrm{Corr}\{X_i, X_{i-1}\} = \phi$$

Values of φ=0.7 and φ=-0.7 were used in Figures 2 and 3, respectively.  When φ=0, the autoregressive model reduces to what is called the white noise model where X_i ~ N(10,1) and independent.  This is also a special case of the mean-shift model with no shifts.
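A minimal Python sketch of this autoregressive model follows.  The names `simulate_ar1` and `lag1_correlation`, the `seed` parameter and the particular sample autocorrelation estimator are assumptions introduced for illustration; the long series length in the usage lines is chosen only so that the sample correlation settles close to φ.

```python
import random

def simulate_ar1(phi, n=100, level=10.0, seed=0):
    """Simulate X_i = level + r_i with r_i = phi * r_{i-1} + eps_i, r_0 = 0."""
    rng = random.Random(seed)
    r = 0.0
    xs = []
    for _ in range(n):
        r = phi * r + rng.gauss(0.0, 1.0)   # eps_i ~ N(0, 1), independent
        xs.append(level + r)
    return xs

def lag1_correlation(xs):
    """Sample correlation between successive values."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i - 1] - mean) for i in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

for phi in (0.7, -0.7):
    print(phi, round(lag1_correlation(simulate_ar1(phi, n=20000)), 2))
```

For short series such as the 100 points plotted in Figures 2 and 3, the sample correlation will scatter more widely around φ.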

 

When checking for an autoregressive model, one frequently calculates the autocorrelations and displays them in the form of a correlogram.  However, this is only useful for distinguishing between an autoregressive model and white noise.  The mean-shift model also results in autocorrelations between the values.  In Figure 1 the correlation between consecutive values is 0.43.  Looking at the autocorrelations will not allow one to distinguish between these two models.

 

 The Pattern Test

 Figure 4 shows the six possible patterns that can result from plotting three consecutive points when there are no ties.  Pattern 1 is called the double up pattern and Pattern 6 is called the double down pattern.  The other 4 patterns will be referred to as reversal patterns.  For the autoregressive model, the double up and double down patterns are most common when there is a positive autocorrelation as in Figure 2.  The reversal patterns are most common when there is a negative correlation as in Figure 3.

 

When the means of the 3 points are the same, all six patterns are equally likely.  In this case, the double up and double down patterns should occur 1/3 of the time and the reversal patterns should occur 2/3 of the time.  The pattern test involves counting the number of times, S, that the double up/down patterns occur.  This count is slightly biased when the mean shifts or there is an outlier.  However, the bias is small and easily compensated for, making this count useful for distinguishing between mean-shift and autoregressive data.  If this count is significantly greater than a third of the number of three-point patterns, the data is autoregressive with positive correlation.  If this count is significantly less than a third, the data is autoregressive with negative correlation.  Otherwise, the data is consistent with the mean-shift model.
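The count described above is easy to compute.  The sketch below assumes there are no ties and slides a window of three consecutive points along the series; the function name `count_double_patterns` is introduced here for illustration.

```python
def count_double_patterns(xs):
    """Count windows of three consecutive points that rise twice or fall twice.

    Assumes no ties; a series of n points yields n - 2 windows.
    """
    count = 0
    for i in range(2, len(xs)):
        a, b, c = xs[i - 2], xs[i - 1], xs[i]
        if a < b < c or a > b > c:     # double up or double down
            count += 1
    return count

print(count_double_patterns([5, 4, 3, 4, 5, 6]))   # 3 of the 4 windows
print(count_double_patterns([1, 3, 2, 4, 3]))      # 0: every window is a reversal
```

A steadily trending stretch produces many double up/down patterns, while a series that zigzags produces almost none, which is why the count separates positive from negative autocorrelation.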

 

 

Figure 4:  Six Patterns for Three Consecutive Points

 

Table 1 gives critical values for S, the number of double up/down patterns, for a 2-sided test with α=0.05 for n between 10 and 200.  If S ≤ s_lower, the data is autocorrelated with negative correlation.  If S ≥ s_upper, the data is autocorrelated with positive correlation.  Otherwise, the data is consistent with the mean-shift model.  These critical values and the approximations given below are all based on the assumption that the number of shifts and outliers is less than 1 per 20 data points.  This assumption should rarely restrict the use of this procedure.

 

 

Table 1:  Two-Sided Critical Values for S = Number of Double Up/Down Patterns (α=0.05)

  n  s_lower  s_upper      n  s_lower  s_upper      n  s_lower  s_upper      n  s_lower  s_upper
 10        0        6     58       12       26    106       26       46    154       40       64
 11        0        6     59       12       27    107       26       46    155       40       64
 12        0        7     60       12       27    108       26       46    156       41       65
 13        0        7     61       13       28    109       27       47    157       41       65
 14        1        8     62       13       28    110       27       47    158       41       65
 15        1        8     63       13       28    111       27       47    159       41       66
 16        1        9     64       13       29    112       27       48    160       42       67
 17        1        9     65       14       30    113       27       48    161       42       67
 18        1        9     66       14       30    114       28       49    162       42       67
 19        2       10     67       14       30    115       28       49    163       43       68
 20        2       11     68       15       31    116       28       49    164       43       68
 21        2       11     69       15       31    117       29       50    165       43       68
 22        2       11     70       15       31    118       29       50    166       44       69
 23        3       12     71       16       32    119       29       50    167       44       69
 24        3       13     72       16       32    120       30       51    168       44       70
 25        3       13     73       16       32    121       30       52    169       44       70
 26        3       13     74       16       33    122       30       52    170       45       71
 27        4       14     75       16       33    123       30       52    171       45       71
 28        4       14     76       17       34    124       31       53    172       45       71
 29        4       14     77       17       34    125       31       53    173       46       72
 30        4       15     78       17       34    126       31       53    174       46       72
 31        4       15     79       18       35    127       32       54    175       46       72
 32        5       16     80       18       35    128       32       54    176       46       72
 33        5       16     81       18       36    129       32       54    177       47       73
 34        5       16     82       18       36    130       33       55    178       47       73
 35        6       17     83       19       37    131       33       55    179       47       73
 36        6       17     84       19       37    132       33       55    180       47       74
 37        6       18     85       19       37    133       34       56    181       48       75
 38        6       18     86       20       38    134       34       57    182       48       75
 39        7       19     87       20       38    135       34       57    183       48       75
 40        7       19     88       20       38    136       34       57    184       49       76
 41        7       20     89       21       39    137       35       58    185       49       76
 42        7       20     90       21       39    138       35       58    186       49       76
 43        8       21     91       21       40    139       35       58    187       50       77
 44        8       21     92       21       40    140       36       59    188       50       77
 45        8       21     93       22       41    141       36       59    189       50       77
 46        9       22     94       22       41    142       36       60    190       51       78
 47        9       22     95       22       41    143       37       60    191       51       78
 48        9       22     96       23       42    144       37       61    192       51       78
 49        9       23     97       23       42    145       37       61    193       52       79
 50        9       23     98       23       42    146       37       61    194       52       80
 51       10       24     99       24       43    147       38       62    195       52       80
 52       10       24    100       24       44    148       38       62    196       52       80
 53       10       24    101       24       44    149       38       62    197       53       81
 54       11       25    102       24       44    150       39       63    198       53       81
 55       11       25    103       25       45    151       39       63    199       53       81
 56       11       25    104       25       45    152       39       63    200       54       82
 57       12       26    105       25       45    153       40       64

Note:  n = sample size.  If S ≤ s_lower, the data is autocorrelated with negative correlation.  If S ≥ s_upper, the data is autocorrelated with positive correlation.  Otherwise, the data is consistent with the mean-shift model.
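The decision rule in the note can be written as a small function.  The sketch below is illustrative only: the name `classify` and the dictionary `CRITICAL` are introduced here, and the dictionary holds just four rows copied from Table 1 rather than the whole table.

```python
# (s_lower, s_upper) for a few sample sizes, copied from Table 1.
CRITICAL = {50: (9, 23), 52: (10, 24), 70: (15, 31), 100: (24, 44)}

def classify(s, s_lower, s_upper):
    """Apply the two-sided pattern test to the count S."""
    if s <= s_lower:
        return "autoregressive, negative correlation"
    if s >= s_upper:
        return "autoregressive, positive correlation"
    return "consistent with mean-shift model"

print(classify(38, *CRITICAL[100]))   # consistent with mean-shift model
print(classify(46, *CRITICAL[100]))   # autoregressive, positive correlation
print(classify(19, *CRITICAL[100]))   # autoregressive, negative correlation
```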

 

 

Formulas 1 and 2 can also be used to calculate significance levels.  If α_lower ≤ 0.025, the data is autocorrelated with negative correlation.  If α_upper ≤ 0.025, the data is autocorrelated with positive correlation.  Otherwise, any correlation in the data is the result of mean shifts.

 

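            $$\alpha_{lower} \approx I_p(a, b) \tag{1}$$

                        where $p = \frac{16n - 29}{30(n - 2)}$, $a = \frac{10(n - 2)^2}{14n - 31} - S$ and $b = S + 1$

            $$\alpha_{upper} \approx I_p(a, b) \tag{2}$$

                        where $p = \frac{14(n + t) - 31}{30(n - 2 + t)}$, $a = S$ and $b = \frac{10(n - 2 + t)^2}{14(n + t) - 31} - S + 1$, with $t = n/20$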

 

I_p(a,b) is the incomplete beta function.  The derivation of these formulas is given in Appendix A.  They are within 2% of the true value for 0.01 ≤ α ≤ 0.1 and n ≥ 10.  Formulas 3 and 4 give a second, less accurate approximation that can be used when n ≥ 100.

 

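            $$\alpha_{lower} \approx \Phi\!\left(\frac{S + \frac{1}{2} - \frac{n - 2}{3}}{\sqrt{(16n - 29)/90}}\right) \tag{3}$$

            $$\alpha_{upper} \approx 1 - \Phi\!\left(\frac{S - \frac{1}{2} - \frac{n - 2 + t}{3}}{\sqrt{(16(n + t) - 29)/90}}\right), \quad t = n/20 \tag{4}$$

Here Φ is the standard normal distribution function.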
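The normal approximation can be sketched in Python as follows.  The formulas coded here are a reading of Appendix A, not a transcription: they take E{S} = (n - 2 + t)/3 and Var{S} = (16(n + t) - 29)/90 with a continuity correction of 1/2, using t = 0 for the lower level and t = n/20 for the upper level.  Truncating n/20 to a whole number is an assumption made here; with it, the values below agree with the Eq. 3 and Eq. 4 columns of Table 2.  All function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def s_moments(n, t=0):
    """Mean and variance of S for n points and t mean shifts (assumed form)."""
    mean = (n - 2 + t) / 3.0
    var = (16 * (n + t) - 29) / 90.0
    return mean, var

def alpha_lower(s, n):
    """Approximate P(S <= s) assuming no shifts (t = 0)."""
    mean, var = s_moments(n, 0)
    return normal_cdf((s + 0.5 - mean) / math.sqrt(var))

def alpha_upper(s, n):
    """Approximate P(S >= s) assuming one shift per 20 points."""
    t = n // 20                        # assumption: whole number of shifts
    mean, var = s_moments(n, t)
    return 1.0 - normal_cdf((s - 0.5 - mean) / math.sqrt(var))

print(round(alpha_lower(38, 100), 4), round(alpha_upper(38, 100), 4))
```

A level at or below 0.025 on either side corresponds to rejecting the mean-shift model in the two-sided test with α=0.05.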

 

 Applications of the Pattern Test

Table 2 shows the results of applying the pattern test to the three sets of generated data in Figures 1-3 plus the three real sets of data shown in Figures 5-7.  In Figures 1-3, n=100, resulting in critical values s_lower=24 and s_upper=44.  For the mean-shift data in Figure 1, S=38, which falls between the two critical values.  This is consistent with a mean-shift model.  For the Figure 2 autoregressive data with positive correlation, S=46.  This exceeds the upper critical value, indicating the data is not consistent with a mean-shift model.  For the Figure 3 autoregressive data with negative correlation, S=19.  This is below the lower critical value, again indicating the data is not consistent with a mean-shift model.  The α values from Equations 1-4 support these same conclusions.  Also shown are the true α values obtained through simulation.  All four approximations are accurate to three digits when n=100.

 

Table 2:  Analysis of Example Data Sets

Fig.  Model                        n   S  s_lower  s_upper  α_lower  α_lower  α_lower  α_upper  α_upper  α_upper
                                                               true  (Eq. 1)  (Eq. 3)     true  (Eq. 2)  (Eq. 4)
1     Mean-Shift                 100  38       24       44   0.9187   0.9185   0.9187   0.2300   0.2296   0.2298
2     Autoregressive - Positive  100  46       24       44   0.9995   0.9996   0.9995   0.0047   0.0045   0.0046
3     Autoregressive - Negative  100  19       24       44   0.0007   0.0007   0.0008   0.9999   0.9999   0.9999
5     Number Sunspots             50  38        9       23   1.0000   1.0000   1.0000   0.0000   0.0000   0.0000
6     Batch Yields                70   9       15       31   0.0001   0.0000   0.0000   1.0000   1.0000   1.0000
7     Part Strength               52  19       10       24   0.8294   0.8286   0.8286   0.3491   0.3499   0.3509

 

 

Figure 5 shows the number of sunspots for a 50 year period of time.  This data is Series E from Box and Jenkins (1976).  The number of double up/down patterns is S=38.  This exceeds the upper critical value s_upper=23, indicating the data is autoregressive with positive correlation.  The α values from Equations 1-4 support this same conclusion.

 

Figure 5:  Wölfer Sunspot Data

 

Figure 6 shows the yields from 70 consecutive batches of a chemical process.  This data is Series F from Box and Jenkins (1976).  The number of double up/down patterns is S=9.  This is below the lower critical value s_lower=15, indicating the data is autoregressive with negative correlation.  The α values from Equations 1-4 support this same conclusion.

 

Figure 6:  Batch Yields

 

Figure 7 shows part strength readings taken once an hour over 52 consecutive hours.  The number of double up/down patterns is S=19.  This is between the lower critical value s_lower=10 and the upper critical value s_upper=24, indicating the data is consistent with the mean-shift model.  The α values from Equations 1-4 support this same conclusion.

 

Figure 7:  Part Strength

 

Handling Ties

When ties are possible, two new patterns can occur: the single tie and the double tie.  In this case, let P_i be defined in terms of X_{i-2}, X_{i-1}, X_i as follows:

 

           

 

Further, let S be defined as:

            $$S = \sum_{i=3}^{n} P_i$$

 

When X_{i-2}, X_{i-1}, X_i are identically distributed, E{P_i} = 1/3.  Again a test for autoregression can be constructed based on S averaging above or below 1/3 of the number of patterns.  If the number of ties is small, Table 1 and Equations 1-4 may still be used.  But if ties are more common, Table 1 and Equations 1-4 can no longer be used because the ties reduce the variation of S.  Instead Equations 5-8 should be used:

 

                                                                                              (5)

 

                        where   ,
                                        and  

 

                                                                                                   (6)

 

                        where   ,
                       
                and  

 

                     (7)

 

                (8)

 

Estimates of Var{P_i}, Cov{P_i,P_{i+1}} and Cov{P_i,P_{i+2}} can be obtained from the data.  A special case with numerous ties is pass/fail data.  In this case:

 

           

 

Then:

 

           

 

This gives:

 

           

 

For pass/fail data, the variance and covariances of Pi are:

 

                                                                                                            (9)

  

                                                           (10)

  

                                          (11)

 

For pass/fail data, an estimate of p can be obtained from the data and substituted into Equations 9-11 to estimate Var{P_i}, Cov{P_i,P_{i+1}} and Cov{P_i,P_{i+2}}.  These estimates can then be plugged into Equations 5-8 to obtain approximate α levels.

 

Other Applications of Pi

An example of a data set with ties is shown in Figure 8.  197 chemical concentrations are shown.  This data is Series A from Box and Jenkins (1976).

 

Figure 8:  Chemical Concentration Data

 

From this data P_3, ..., P_197 can be calculated.  The P_i values are time-ordered data that react to changes in the autoregressive behavior of the data.  A CUSUM chart of the P_i values is shown in Figure 9.  The sudden change in direction in the CUSUM chart indicates a sudden change in the autoregressive behavior of this data.

 

Figure 9:  CUSUM Chart of Pi for Chemical Concentration Data
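The text does not give the formula behind the CUSUM chart.  The sketch below assumes the common mean-centred form, C_0 = 0 and C_k = C_{k-1} + (P_k - average of all P), and takes the point where |C_k| is largest as the last point before the change.  The function names are hypothetical and the short 0/1 series is made-up illustration data, not the chemical concentration data.

```python
def cusum(values):
    """Cumulative sums of deviations from the overall average, starting at 0."""
    mean = sum(values) / len(values)
    sums = [0.0]
    for v in values:
        sums.append(sums[-1] + (v - mean))
    return sums

def last_point_before_change(values):
    """1-based position where the CUSUM is farthest from zero."""
    sums = cusum(values)
    return max(range(len(sums)), key=lambda k: abs(sums[k]))

p = [0, 0, 0, 1, 1, 1]                  # illustrative pattern indicators
print(cusum(p))                         # falls for three points, then rises
print(last_point_before_change(p))      # 3
```

A stretch with below-average P_i values makes the chart slope downward and a stretch with above-average values makes it slope upward, so a change in slope marks a change in behavior.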

 

A change-point analysis was then performed on the P_i using Taylor (2000).  This software performs a bootstrap analysis on the CUSUM chart to obtain confidence levels and confidence intervals for the change.  The results of this analysis are shown in Figure 10.  It verifies a change occurred with 98% confidence.  The change is estimated to have occurred just prior to point 145.  With 95% confidence it occurred between points 83 and 179.

 

Figure 10:  Results of Change-Point Analysis of Pi for Chemical Concentration Data

 

The average P_i before the change is 0.326, which is close to 1/3, indicating a lack of autoregressive behavior.  The average P_i following the change is 0.542, indicating autoregression with a positive correlation.  Separate tests for autoregression were performed on points 1-144 and points 145-197.  The results are shown in Table 3.  These tests confirm that following the change, the data is autoregressive with positive correlation, while before the change, the data is consistent with the mean-shift model.

 

 

Table 3:  Pattern Test for Chemical Concentration Data

Points     n       S  α_lower  α_lower  α_upper  α_upper
                      (Eq. 5)  (Eq. 7)  (Eq. 6)  (Eq. 8)
1-144    144   47.33   0.4358   0.4442   0.8624   0.8631
145-197   53   26.67   1.0000   1.0000   0.0000   0.0000

 

Conclusion

The pattern test has proven to be useful for distinguishing between two very important models: the mean-shift model and the first order autoregressive model.  The pattern test can be used to detect a violation of the assumption of independent errors when control charting data and performing a change-point analysis.  The series P_i can also be used to detect changes in the autoregressive behavior of the data.  It provides a useful new tool for helping to analyze complicated time series data.

 

Appendix A

The distribution of the test statistic S will be derived assuming no mean shifts or ties.  Assume that a series of n data points X_1, X_2, ..., X_n has been collected in time order.  Let P_i be an indicator function of whether the double up/down pattern occurred for points X_{i-2}, X_{i-1}, X_i.  Further let:

            $$S = \sum_{i=3}^{n} P_i$$

 

The average and variance of S are:

 

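            $$E\{S\} = \sum_{i=3}^{n} E\{P_i\} \tag{12}$$

            $$\mathrm{Var}\{S\} = \sum_{i=3}^{n} \mathrm{Var}\{P_i\} + 2\sum_{i=3}^{n-1} \mathrm{Cov}\{P_i, P_{i+1}\} + 2\sum_{i=3}^{n-2} \mathrm{Cov}\{P_i, P_{i+2}\} \tag{13}$$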

 

Assuming no ties or mean shifts, the P_i are identically distributed with:

            $$E\{P_i\} = 1/3$$

            $$\mathrm{Var}\{P_i\} = 2/9$$

            $$\mathrm{Cov}\{P_i, P_{i+1}\} = -1/36$$

            $$\mathrm{Cov}\{P_i, P_{i+2}\} = 1/180$$

 

All other covariances are zero.  The above moments were calculated by generating the 5!=120 possible patterns for 5 points.  Substituting the moments of Pi into Equations 12 and 13 gives the following moments for S:
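The enumeration described above can be reproduced with exact fractions.  The sketch below lists all 5! = 120 orderings of five distinct values and computes the moments of the pattern indicators for the three overlapping windows.  The function names are introduced here, and `var_s` assumes that Var{S} is the sum of the n - 2 variances plus twice the lag-1 and lag-2 covariances.

```python
from fractions import Fraction
from itertools import permutations

def is_double(a, b, c):
    """True for the double up or double down pattern (no ties)."""
    return a < b < c or a > b > c

def pattern_moments():
    """Return E{P_i}, Var{P_i}, Cov{P_i,P_i+1}, Cov{P_i,P_i+2} exactly."""
    orders = list(permutations(range(5)))          # 120 equally likely orderings
    total = len(orders)
    p3 = [int(is_double(*o[0:3])) for o in orders]  # window on points 1-3
    p4 = [int(is_double(*o[1:4])) for o in orders]  # window on points 2-4
    p5 = [int(is_double(*o[2:5])) for o in orders]  # window on points 3-5
    mean = Fraction(sum(p3), total)
    var = mean * (1 - mean)                         # variance of a 0/1 indicator
    cov1 = Fraction(sum(u * v for u, v in zip(p3, p4)), total) - mean * mean
    cov2 = Fraction(sum(u * v for u, v in zip(p3, p5)), total) - mean * mean
    return mean, var, cov1, cov2

def var_s(n):
    """Var{S} for n points, summing variances and lag-1, lag-2 covariances."""
    mean, var, cov1, cov2 = pattern_moments()
    return (n - 2) * var + 2 * (n - 3) * cov1 + 2 * (n - 4) * cov2

print(pattern_moments())    # 1/3, 2/9, -1/36, 1/180
print(var_s(100))           # 1571/90
```

Windows three or more positions apart share no points, so under independence their covariances vanish and five points are enough for the enumeration.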

 

            $$E\{S\} = \frac{n - 2}{3} \tag{14}$$

            $$\mathrm{Var}\{S\} = \frac{16n - 29}{90} \tag{15}$$

 

When the mean shifts between time i-1 and i, the following values change:

 

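            $$E\{P_i\} = E\{P_{i+1}\} = 1/2$$

            $$\mathrm{Var}\{P_i\} = \mathrm{Var}\{P_{i+1}\} = 1/4$$

            $$\mathrm{Cov}\{P_{i-1}, P_i\} = 0$$

            $$\mathrm{Cov}\{P_i, P_{i+1}\} = 0$$

            $$\mathrm{Cov}\{P_{i+1}, P_{i+2}\} = 0$$

            $$\mathrm{Cov}\{P_{i-2}, P_i\} = 0$$

            $$\mathrm{Cov}\{P_{i-1}, P_{i+1}\} = 0$$

            $$\mathrm{Cov}\{P_i, P_{i+2}\} = 0$$

            $$\mathrm{Cov}\{P_{i+1}, P_{i+3}\} = 0$$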

 

All other values are as before.  The above moments were calculated by generating the (4!)^2 = 576 possible patterns for 8 points where the first 4 points are all less than the last 4 points.  Let t be the number of shifts.  When t shifts occur:

 

            $$E\{S\} = \frac{n - 2 + t}{3} \tag{16}$$

            $$\mathrm{Var}\{S\} = \frac{16(n + t) - 29}{90} \tag{17}$$
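The moments around a shift can be checked by the same kind of enumeration.  The sketch below builds the (4!)^2 = 576 orderings of 8 points in which the first four values all lie below the last four, so the shift falls between points 4 and 5 (i = 5 in the notation above).  The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def is_double(a, b, c):
    """True for the double up or double down pattern (no ties)."""
    return a < b < c or a > b > c

def shifted_patterns():
    """One dict per ordering, keyed by the 1-based pattern index k = 3..8."""
    rows = []
    for low in permutations(range(4)):            # points 1-4, before the shift
        for high in permutations(range(4, 8)):    # points 5-8, after the shift
            x = low + high
            rows.append({k: int(is_double(x[k - 3], x[k - 2], x[k - 1]))
                         for k in range(3, 9)})
    return rows

def mean(rows, k):
    return Fraction(sum(r[k] for r in rows), len(rows))

def cov(rows, j, k):
    joint = Fraction(sum(r[j] * r[k] for r in rows), len(rows))
    return joint - mean(rows, j) * mean(rows, k)

rows = shifted_patterns()
print(len(rows), mean(rows, 5), mean(rows, 6))    # 576 1/2 1/2
print(cov(rows, 4, 5), cov(rows, 5, 6), cov(rows, 3, 5))
```

Each shift raises two of the E{P_i} from 1/3 to 1/2, which adds 1/3 to E{S}; the enumeration assumes the shift is large enough to separate the two groups completely.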

 

Shifts increase both E{S} and Var{S}.  To see what effect this has on the critical values, take E{S} ± 2 SD{S} as approximate critical values.  Both upper and lower critical values increase as t increases.  Figure 11 shows the percentage increase in these approximate critical values as t ranges from 0% to 10% of n.  When t is 5% of n, i.e. a change occurs once every 20 points, the critical values increase only 5%.

 

Figure 11:  Approximate Percent Increase in Critical Values As t Increases
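The size of this effect can be estimated numerically.  The sketch below is an illustration under assumptions: it takes E{S} = (n - 2 + t)/3 and Var{S} = (16(n + t) - 29)/90, which follow from the per-shift moment changes listed above when shifts are well separated, and it uses E{S} ± 2 SD{S} as the approximate critical values.  The function names are introduced here.

```python
import math

def approx_critical_values(n, t):
    """Approximate lower and upper critical values E{S} -/+ 2 SD{S}."""
    mean = (n - 2 + t) / 3.0
    sd = math.sqrt((16 * (n + t) - 29) / 90.0)
    return mean - 2.0 * sd, mean + 2.0 * sd

def percent_increase(n, t):
    """Percent increase of both approximate critical values relative to t = 0."""
    lo0, hi0 = approx_critical_values(n, 0)
    lo, hi = approx_critical_values(n, t)
    return 100.0 * (lo - lo0) / lo0, 100.0 * (hi - hi0) / hi0

for t in (0, 5, 10):
    lower, upper = percent_increase(100, t)
    print(t, round(lower, 1), round(upper, 1))
```

For n = 100 and t = 5 both increases come out in the neighborhood of 5%, in line with the statement above.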

 

Since the number of changes is not known, one cannot exactly determine the distribution of S.  However, by assuming an upper bound on the number of changes, one can bound its distribution.  It would seem reasonable to expect no more than one change per twenty points (t ≤ n/20).  A lower critical value is then calculated based on t=0 changes while the upper critical value is based on t=n/20 changes.

 

If the P_i were uncorrelated, S would follow the binomial distribution.  Since the correlations are small, one would expect the binomial distribution to provide a close approximation.  The binomial distribution B(x|n_b,p_b) has parameters n_b and p_b.  It has a mean of n_b p_b and variance n_b p_b(1-p_b).  Setting E{S} = n_b p_b and Var{S} = n_b p_b(1-p_b) and solving for n_b and p_b gives:

 

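            $$n_b = \frac{E\{S\}^2}{E\{S\} - \mathrm{Var}\{S\}} \tag{18}$$

            $$p_b = 1 - \frac{\mathrm{Var}\{S\}}{E\{S\}} \tag{19}$$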

 

Since n_b may not be an integer as required by the binomial distribution, the more general incomplete beta function, I_p(a,b), will be used.  Assuming t changes, the upper and lower significance levels for S can be approximated by:

 

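            $$\alpha_{lower} \approx I_{1 - p_b}(n_b - S, \; S + 1) \tag{20}$$

            $$\alpha_{upper} \approx I_{p_b}(S, \; n_b - S + 1) \tag{21}$$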
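The binomial-matching approximation can be sketched in Python as follows.  Several assumptions are made here and none of this is the author's code: the regularized incomplete beta function is evaluated with a standard continued-fraction expansion, the tail probabilities use the usual binomial identities P(X ≤ s) = I_{1-p}(n - s, s + 1) and P(X ≥ s) = I_p(s, n - s + 1), the moments are taken as E{S} = (n - 2 + t)/3 and Var{S} = (16(n + t) - 29)/90, and n/20 is truncated to a whole number.  All names are hypothetical.

```python
import math

def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def binomial_params(n, t):
    """Match a binomial's mean and variance to the assumed moments of S."""
    mean = (n - 2 + t) / 3.0
    var = (16 * (n + t) - 29) / 90.0
    p_b = 1.0 - var / mean
    return mean / p_b, p_b             # (n_b, p_b)

def alpha_lower_beta(s, n):
    """Approximate P(S <= s) with t = 0 shifts."""
    n_b, p_b = binomial_params(n, 0)
    if s >= n_b:
        return 1.0
    return reg_inc_beta(1.0 - p_b, n_b - s, s + 1.0)

def alpha_upper_beta(s, n):
    """Approximate P(S >= s) with one shift per 20 points."""
    n_b, p_b = binomial_params(n, n // 20)
    if s <= 0:
        return 1.0
    if s >= n_b + 1.0:
        return 0.0
    return reg_inc_beta(p_b, s, n_b - s + 1.0)

print(round(alpha_lower_beta(38, 100), 3), round(alpha_upper_beta(38, 100), 3))
```

For n = 100 the matched parameters are roughly n_b = 70 and p_b = 0.47, which shows why a non-integer "number of trials" has to be handled through the incomplete beta function.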

 

Equation 1 was obtained from Equation 20 by substituting Equations 18 and 19 and setting t=0.  Equation 2 was derived from Equation 21 by substituting Equations 18 and 19 and setting t=n/20.  Equation 5 was obtained from Equation 20 by substituting Equations 13 and 16 and setting t=0.  Equation 6 was derived from Equation 21 by substituting Equations 13 and 16 and setting t=n/20.  Simulations indicate that Equations 20 and 21 are accurate to within 2% of the true value for 0.01 ≤ α ≤ 0.1 and n ≥ 10.

 

A second, less accurate estimate can be obtained by approximating the distribution of S using the normal distribution with continuity correction.  This results in Equations 22 and 23.  Equation 3 was derived from Equation 22 by substituting Equations 16 and 17 and setting t=0.  Equation 4 was derived from Equation 23 by substituting Equations 16 and 17 and setting t=n/20.  Equation 7 was derived from Equation 22 by substituting Equations 13 and 16 and setting t=0.  Equation 8 was derived from Equation 23 by substituting Equations 13 and 16 and setting t=n/20.  These approximations should only be used when n ≥ 100.

 

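            $$\alpha_{lower} \approx \Phi\!\left(\frac{S + \frac{1}{2} - E\{S\}}{\sqrt{\mathrm{Var}\{S\}}}\right) \tag{22}$$

            $$\alpha_{upper} \approx 1 - \Phi\!\left(\frac{S - \frac{1}{2} - E\{S\}}{\sqrt{\mathrm{Var}\{S\}}}\right) \tag{23}$$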

 

References

Box, George E. P. and Jenkins, Gwilym (1976).  Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, California.

 

Taylor, Wayne (2000).  Change-Point Analyzer 2.0 software package, Taylor Enterprises, Libertyville, Illinois.  WEB: www.variation.com/cpa

 

 

Key Words:  Mean-Shift, Autoregression, Change-Point Analysis, Control Chart, Time Series

 

 

Citation:  Taylor, Wayne A. (2000), "A Pattern Test for Distinguishing Between Autoregressive and Mean-Shift Data," WEB: www.variation.com/cpa/tech/pattern.html.  


Copyright © 1997-2012 Taylor Enterprises, Inc.
Last modified: August 04, 2012