This is a cross-post from Stack Exchange; see: http://stats.stackexchange.com/quest...es-stata-and-r

I have a question about how Stata and R compute ANOVAs. I have run exactly the same ANOVA in both programs but, curiously, get a different F-statistic for one of the predictors. I'm not too familiar with Stata, but as far as I understand it, I am running a Type II SS ANOVA in both.

To make the output easier to follow, this is my model:

The outcome is a continuous variable called vertrauen (= trust).

Predictor 1 is a 2-level factor called trustee in R and Goodguy in Stata.

Predictor 2 is also a 2-level factor, called Group in R and uw in Stata.

With the interaction in the model (full output further down), the F-statistics for the Group (uw) main effect and for the Group (uw) x trustee (Goodguy) interaction are the same in R and Stata, but the F-statistics for the trustee (Goodguy) main effect differ: in R it is almost twice as high as in Stata (24.55 vs. 12.78). Changing the order of the predictors and the reference levels did not change my R output.

Without the interaction term, R and Stata give the same results for both main effects, so the difference must have something to do with how the two programs incorporate the interaction term.

With a manually computed interaction term (outputs further down), the R output matches the Stata output from the automatically computed interaction, so it seems that R and Stata differ in how they compute interactions.

This is the R output:

Code:

    > m2 <- lm(vertrauen~trustee*Group,data=RTG.UWD.short.50)
    > Anova(m2,type="2")
    Anova Table (Type II tests)

    Response: vertrauen
                  Sum Sq Df F value    Pr(>F)
    trustee       2.4928  1 24.5497 1.367e-05 ***
    Group         0.0030  1  0.0292    0.8651
    trustee:Group 0.1137  1  1.1200    0.2963
    Residuals     4.0617 40

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This is the Stata output:

Code:

    . anova vertrauen uw Goodguy uw#Goodguy

               Number of obs =         44    R-squared     =  0.3912
               Root MSE      =    .318658    Adj R-squared =  0.3455

        Source | Partial SS     df          MS         F    Prob>F
    -----------+----------------------------------------------------
         Model |  2.6095358      3   .86984526      8.57    0.0002
               |
            uw |  .00296733      1   .00296733      0.03    0.8651
       Goodguy |  1.2981586      1   1.2981586     12.78    0.0009
    uw#Goodguy |  .11373073      1   .11373073      1.12    0.2963
               |
      Residual |  4.0617062     40   .10154266
    -----------+----------------------------------------------------
         Total |   6.671242     43   .15514516

Does anyone know what causes the difference in the F-statistic here? I'm really puzzled by it; I expected it to be the same.

Here is the Stata output without the interaction:

Code:

    . anova vertrauen uw Goodguy

               Number of obs =         44    R-squared     =  0.3741
               Root MSE      =    .319124    Adj R-squared =  0.3436

        Source | Partial SS     df          MS         F    Prob>F
    -----------+----------------------------------------------------
         Model |   2.495805      2   1.2479025     12.25    0.0001
               |
            uw |  .00296733      1   .00296733      0.03    0.8653
       Goodguy |  2.4928377      1   2.4928377     24.48    0.0000
               |
      Residual |   4.175437     41   .10183993
    -----------+----------------------------------------------------
         Total |   6.671242     43   .15514516

And here is the R output without the interaction:

Code:

    > m2.4 <- lm(vertrauen~trustee+Group,data=RTG.UWD.short.50)
    > Anova(m2.4)
    Anova Table (Type II tests)

    Response: vertrauen
              Sum Sq Df F value    Pr(>F)
    trustee   2.4928  1 24.4780 1.328e-05 ***
    Group     0.0030  1  0.0291    0.8653
    Residuals 4.1754 41
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
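A note on the trustee row: its Sum Sq is 2.4928 both in the R table with the interaction and in the R table without it, and only the residual line (and hence F) changes. That is what one would expect if a Type II sum of squares for a main effect is the drop in residual SS from adding that factor to the other main effect, with the interaction playing no part in the numerator. The sketch below illustrates this in base R with made-up data. The data frame `d`, the response `y`, and the cell counts are assumptions for illustration only, not the data from this post, and the sketch does not call `Anova()` itself.

```r
# Made-up unbalanced 2x2 data; NOT the data from this post.
set.seed(1)
d <- data.frame(
  trustee = factor(rep(c("bad", "good"), each = 22)),
  Group   = factor(c(rep(1, 17), rep(2, 5), rep(1, 14), rep(2, 8)))
)
d$y <- rnorm(nrow(d)) + 0.5 * (d$trustee == "good")

fit_group <- lm(y ~ Group, data = d)            # trustee left out
fit_add   <- lm(y ~ trustee + Group, data = d)  # both main effects
fit_full  <- lm(y ~ trustee * Group, data = d)  # main effects + interaction

# Type II SS for trustee: reduction in residual SS when trustee joins Group.
ss_trustee <- deviance(fit_group) - deviance(fit_add)

# The F ratio divides by the residual mean square of the model being reported,
# so the same numerator gives slightly different F values in the two models.
f_in_additive <- ss_trustee / (deviance(fit_add)  / df.residual(fit_add))
f_in_full     <- ss_trustee / (deviance(fit_full) / df.residual(fit_full))

print(c(SS = ss_trustee, F_additive = f_in_additive, F_full = f_in_full))
```

In the tables above the same pattern shows up as 2.4928 / (4.0617 / 40) for the model with the interaction and 2.4928 / (4.1754 / 41) for the model without it.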

I also tried to manually compute the interaction term and found something interesting:

Here is the R output:

Code:

    > RTG.UWD.short.50$interaction <- as.numeric(RTG.UWD.short.50$trustee)*as.numeric(RTG.UWD.short.50$Group)
    > m2.7 Anova(m2.7)
    Anova Table (Type II tests)

    Response: vertrauen
                Sum Sq Df F value    Pr(>F)
    trustee     1.2982  1 12.7844 0.0009316 ***
    Group       0.0030  1  0.0292 0.8651282
    interaction 0.1137  1  1.1200 0.2962617
    Residuals   4.0617 40
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And here is the Stata output:

Code:

    . gen interaction=uw*Goodguy
    . anova vertrauen uw Goodguy interaction

                Number of obs =         44    R-squared     =  0.3912
                Root MSE      =    .318658    Adj R-squared =  0.3455

          Source | Partial SS     df          MS         F    Prob>F
    ------------+----------------------------------------------------
           Model |  2.6095358      3   .86984526      8.57    0.0002
                 |
              uw |   .0399785      1    .0399785      0.39    0.5339
         Goodguy |  2.3984067      1   2.3984067     23.62    0.0000
     interaction |  .11373073      1   .11373073      1.12    0.2963
                 |
        Residual |  4.0617062     40   .10154266
    ------------+----------------------------------------------------
           Total |   6.671242     43   .15514516
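The two manual interaction terms above are not built from the same codes: `as.numeric()` on a two-level factor in R returns 1 and 2, whereas `uw` and `Goodguy` run from 0 to 1 in Stata (see the Min and Max columns in the Stata summary further down). Shifting the codes leaves the product-term row and the residual unchanged but changes what the main-effect rows test, which may be part of why the main-effect rows of the two manual tables differ while the interaction row (0.1137) agrees. The following is a minimal sketch in base R with made-up data. The variables `a`, `b`, `a12`, `b12`, and `y` are hypothetical, and the sketch does not attempt to reproduce either table.

```r
# Made-up data with two 0/1 predictors; NOT the data from this post.
set.seed(2)
d <- data.frame(
  a = rep(c(0, 1), each = 22),
  b = c(rep(0, 17), rep(1, 5), rep(0, 14), rep(1, 8))
)
d$y <- rnorm(nrow(d)) + 0.5 * d$a + 0.3 * d$a * d$b

# Product term built from 0/1 codes.
fit01 <- lm(y ~ a + b + I(a * b), data = d)

# The same predictors shifted to 1/2 codes, then multiplied.
d$a12 <- d$a + 1
d$b12 <- d$b + 1
fit12 <- lm(y ~ a12 + b12 + I(a12 * b12), data = d)

# With 1 df per term, the squared t value is the F for that term
# adjusted for all other terms in the model.
t01 <- unname(coef(summary(fit01))[, "t value"])
t12 <- unname(coef(summary(fit12))[, "t value"])

print(rbind(codes_01 = t01^2, codes_12 = t12^2))
```

The last column (the product term) is identical in both rows, and so is the residual SS; the second and third columns (the main effects) are not, because each tests the effect of one predictor where the other predictor's code equals zero.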

And finally the descriptives from R:

Code:

    > describe(RTG.UWD.short.50$vertrauen)
    RTG.UWD.short.50$vertrauen
          n missing  unique    Info    Mean
         44       0      43       1  0.5046
    > describe(RTG.UWD.short.50$Group)
    RTG.UWD.short.50$Group
          n missing  unique
         44       0       2

    1 (34, 77%), 2 (10, 23%)
    > describe(RTG.UWD.short.50$trustee)
    RTG.UWD.short.50$trustee
          n missing  unique
         44       0       2

    bad (22, 50%), good (22, 50%)

and from Stata:

Code:

    . sum vertrauen uw Goodguy

        Variable |       Obs        Mean   Std. Dev.        Min        Max
    -------------+--------------------------------------------------------
       vertrauen |        44    .5045969    .3938847    .000998          1
              uw |        44    .2272727    .4239151          0          1
         Goodguy |        44          .5    .5057805          0          1
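The descriptives show an unbalanced design: Group is split 34/10 while trustee is split 22/22. With unequal cell sizes, the sum of squares for a main effect can depend on whether it is adjusted for the interaction. The sketch below, in base R on made-up data, compares a Type II sum of squares with a marginal one (each term adjusted for all others, using sum-to-zero contrasts and `drop1()`), once for balanced and once for unbalanced cells. The cell counts 17/17/5/5 are a guess that is merely compatible with the margins above, the factor names `A` and `B` and the helper functions are hypothetical, and whether Stata's Partial SS column corresponds to the marginal version is an assumption here, not something checked against Stata.

```r
# Type II SS for A: drop in residual SS when A is added to a model with B only.
type2_ss <- function(d) {
  deviance(lm(y ~ B, data = d)) - deviance(lm(y ~ A + B, data = d))
}

# Marginal SS for A: remove A's column from the full model, keep B and A:B.
# Sum-to-zero contrasts are used so the test refers to the averaged effect of A.
marginal_ss <- function(d) {
  fit <- lm(y ~ A * B, data = d,
            contrasts = list(A = "contr.sum", B = "contr.sum"))
  drop1(fit, scope = . ~ ., test = "F")["A", "Sum of Sq"]
}

# n_cells: counts for the cells (a1,b1), (a2,b1), (a1,b2), (a2,b2).
make_data <- function(n_cells) {
  d <- data.frame(
    A = factor(rep(c("a1", "a2", "a1", "a2"), times = n_cells)),
    B = factor(rep(c("b1", "b1", "b2", "b2"), times = n_cells))
  )
  d$y <- rnorm(nrow(d)) + 0.5 * (d$A == "a2") -
    0.4 * (d$A == "a2" & d$B == "b2")
  d
}

set.seed(3)
balanced   <- make_data(c(11, 11, 11, 11))
unbalanced <- make_data(c(17, 17, 5, 5))   # guessed counts, not from the post

res <- rbind(
  balanced   = c(type2 = type2_ss(balanced),   marginal = marginal_ss(balanced)),
  unbalanced = c(type2 = type2_ss(unbalanced), marginal = marginal_ss(unbalanced))
)
print(res)
```

In the balanced row the two columns agree; in the unbalanced row they do not, which is the same kind of gap as between the two trustee (Goodguy) rows in the tables with the interaction.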

## Comment