This website converted the following original .R scripts into .rmd files.
Please visit GitHub for the latest .R files.
Specific questions with respect to the .rmd files can be addressed to: Jochem Tolsma.
For questions on RSiena please visit the designated GitHub page.
A quick version of the model fitting without comments is given at the end of this script.
Let us start by loading the data
library(RSiena)
friend.data.w1 <- s501
friend.data.w2 <- s502
friend.data.w3 <- s503
drink <- s50a
smoke <- s50s
friendship <- sienaDependent(array(c(friend.data.w1, friend.data.w2, friend.data.w3), dim = c(50, 50,
3)))
smoke1 <- coCovar(smoke[, 1])
alcohol <- varCovar(drink)
mydata <- sienaDataCreate(friendship, smoke1, alcohol)
# and request
mydata
# to see what you have produced.
#> Dependent variables: friendship
#> Number of observations: 3
#>
#> Nodeset Actors
#> Number of nodes 50
#>
#> Dependent variable friendship
#> Type oneMode
#> Observations 3
#> Nodeset Actors
#> Densities 0.046 0.047 0.05
#>
#> Constant covariates: smoke1
#> Changing covariates: alcohol
Parameters of the model are estimated by the function
siena07()
. This requires the data specification; the
effects specification; and a number of parameters, or settings, for the
estimation algorithm.
The latter are contained in an object created by the function
sienaAlgorithmCreate()
. You can look at the help provided
by ?sienaAlgorithmCreate
to find out about options that you
may use here; for beginning users, only the two options mentioned below
are relevant.
Output will be written to a file with name projname.txt, where projname is whatever name is given; the default (used if no name is given) is Siena. This file will be written to your current directory. New estimation runs will append to it. A new call to print01Report will overwrite it!
myalgorithm <- sienaAlgorithmCreate(projname = "s50_3")
#> If you use this algorithm object, siena07 will create/use an output file s50_3.txt .
Let us first redefine the model, to obtain a simpler specification that will serve as an illustration here.
myeff <- getEffects(mydata)
myeff <- includeEffects(myeff, transTrip, cycle3)
myeff <- includeEffects(myeff, egoX, altX, egoXaltX, interaction1 = "alcohol")
myeff <- includeEffects(myeff, simX, interaction1 = "smoke1")
myeff
#> effectName include fix test initialValue parm
#> 1 transitive triplets TRUE FALSE FALSE 0 0
#> 2 3-cycles TRUE FALSE FALSE 0 0
#> effectName include fix test initialValue parm
#> 1 alcohol alter TRUE FALSE FALSE 0 0
#> 2 alcohol ego TRUE FALSE FALSE 0 0
#> 3 alcohol ego x alcohol alter TRUE FALSE FALSE 0 0
#> effectName include fix test initialValue parm
#> 1 smoke1 similarity TRUE FALSE FALSE 0 0
#> effectName include fix test initialValue parm
#> 1 constant friendship rate (period 1) TRUE FALSE FALSE 4.69604 0
#> 2 constant friendship rate (period 2) TRUE FALSE FALSE 4.32885 0
#> 3 outdegree (density) TRUE FALSE FALSE -1.46770 0
#> 4 reciprocity TRUE FALSE FALSE 0.00000 0
#> 5 transitive triplets TRUE FALSE FALSE 0.00000 0
#> 6 3-cycles TRUE FALSE FALSE 0.00000 0
#> 7 smoke1 similarity TRUE FALSE FALSE 0.00000 0
#> 8 alcohol alter TRUE FALSE FALSE 0.00000 0
#> 9 alcohol ego TRUE FALSE FALSE 0.00000 0
#> 10 alcohol ego x alcohol alter TRUE FALSE FALSE 0.00000 0
The function siena07()
actually fits the specified model
to the data. If you wish the pretty picture of Siena on the screen as
information about the progress of the algorithm, type:
ans <- siena07(myalgorithm, data = mydata, effects = myeff)
(ans for “answer”).
If however you do not want the pretty picture, or if this leads to
difficulties (which may happen e.g. on a Mac), then type
ans <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE)
and intermediate information will be written to the console.
Function siena07()
produces a so-called sienaFit object,
here called ans
; and it fills in a few things in the
sienaEffects object myeff
, if this is the first use of
myeff
in a siena07 call. By using various different effects
objects, i.e., with different names, you can switch between
specifications.
The batch = FALSE
parameters will give a graphical user
interface being opened which reports on the progress of the estimation
algorithm; verbose = TRUE
leads to extensive diagnostic
information being sent to the console during the estimation, and results
after the estimation (these results are also copied to the output file
projname.txt, see above); while batch=TRUE
gives only a
limited amount of printout sent to the console during the estimation
(which is seen when clicking in the console, or more immediately if the
Buffered Output is deselected in the Misc menu) which monitors the
progress of the estimation algorithm in a different way.
The call of siena07 leads to output in the file
s50_3.txt (or more generally projname.txt, where
projname is the name given in sienaAlgorithmCreate()
) and
to the creation of the object which here is called ans (for
“answer”).
To use multiple processors, e.g., if you wish to use 2 processes, request:
ans <- siena07( myalgorithm, data = mydata, effects = myeff, nbrNodes = 2, useCluster = TRUE)
.
Adjust the nbrNodes to the number that you wish to use. If you wish to work on with other programs while running siena07, it is advisable to use one node less than the number of available processors. If you wish to use other machines as well, see the more detailed instructions below. You will then need to use the clusterString argument as well.
For more advanced use, it can be helpful to have access to the
networks simulated in the so-called third phase of the estimation
algorithm. These networks can be used, e.g., for checking goodness of
fit. This can be achieved by using the parameter
returnDeps=TRUE
. The fitted object ans
will
then have a component named “sims” which contains a list (each
iteration) of lists (each data object) of lists (each dependent network
or behavior variable) of edgelists for networks or vectors for behavior
variables. See the manual for further explanation.
The file “s50_3.txt” will contain the results of the estimation. It
is contained in the current directory (getwd()
). This file
can be read by any text editor. A summary of the results is obtained on
the screen by:
ans
and a larger summary by…
summary(ans)
Depending on the random seed and the model specification, the results could be something like the following.
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 6.6750 ( 1.1897 )
#> 0.2 Rate parameter period 2 5.1921 ( 0.8558 )
#>
#> Other parameters:
#> 1. eval outdegree (density) -2.7333 ( 0.1305 ) -0.0212
#> 2. eval reciprocity 2.4437 ( 0.2310 ) -0.0208
#> 3. eval transitive triplets 0.6614 ( 0.1495 ) 0.0253
#> 4. eval 3-cycles -0.1110 ( 0.3053 ) 0.0208
#> 5. eval smoke1 similarity 0.2667 ( 0.1960 ) 0.0301
#> 6. eval alcohol alter -0.0277 ( 0.0661 ) 0.0325
#> 7. eval alcohol ego 0.0425 ( 0.0790 ) 0.0586
#> 8. eval alcohol ego x alcohol alter 0.1276 ( 0.0533 ) -0.0078
#>
#> Overall maximum convergence ratio: 0.1326
#>
#>
#> Total of 2315 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.017 -0.020 -0.010 0.014 -0.003 0.000 0.000 0.000
#> -0.680 0.053 0.014 -0.035 0.000 0.000 0.000 -0.002
#> -0.491 0.418 0.022 -0.041 -0.001 0.000 -0.002 -0.001
#> 0.343 -0.497 -0.890 0.093 0.001 -0.001 0.004 0.002
#> -0.119 0.010 -0.020 0.014 0.038 0.003 0.004 0.000
#> -0.002 -0.025 0.003 -0.047 0.209 0.004 -0.002 0.000
#> 0.026 -0.001 -0.209 0.181 0.249 -0.408 0.006 0.000
#> -0.056 -0.142 -0.128 0.109 -0.001 0.015 -0.069 0.003
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 278.700 220.775 469.028 146.529 24.579 17.119 27.944 149.380
#> 127.185 138.885 253.605 84.518 10.893 11.617 11.939 76.299
#> 343.766 312.470 1183.846 376.491 27.263 77.978 84.579 208.177
#> 154.974 154.885 547.270 184.084 13.968 28.358 26.380 83.932
#> 21.931 17.690 35.523 10.906 37.163 -50.547 -47.707 2.093
#> 27.465 30.505 91.528 31.205 -51.280 380.434 253.012 73.797
#> 23.851 15.869 70.514 20.088 -47.004 237.976 322.445 85.576
#> 144.467 134.532 303.711 98.943 2.944 53.536 54.717 555.211
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 430.012 373.066 994.309 316.886 43.209 30.585 52.125 238.885
#> 0.919 383.249 990.555 323.503 37.674 38.611 48.612 224.967
#> 0.776 0.819 3818.162 1234.370 108.715 122.533 146.554 582.517
#> 0.758 0.820 0.991 406.130 34.492 41.391 47.090 188.821
#> 0.313 0.289 0.264 0.257 44.338 -72.785 -67.939 3.045
#> 0.067 0.090 0.091 0.094 -0.500 478.026 383.878 128.769
#> 0.119 0.117 0.112 0.110 -0.481 0.828 449.216 131.681
#> 0.417 0.416 0.341 0.339 0.017 0.213 0.225 764.715
The results can also be viewed externally in the output file s50_3.txt. It is advisable that you have a look at all three reports and understand how information is organized in each of them.
To understand the table above, note that the “convergence t-ratio” is
the t-ratio for convergence checking, not the t statistic for testing
the significance of this effect! See Section 6.2 of the manual to
understand this better.
In the external output file, these are called “t-ratios for deviations
from targets”. The rule of thumb is that all t-ratios for convergence
should ideally be less than 0.1 in absolute value, and the “Overall
maximum convergence ratio” should be less than 0.25; this signifies good
convergence of the algorithm.
In the example here, this is the case. If this would not be the case,
the best thing to do would be to continue the estimation, using the
estimates produced here, and contained in ans, as the new initial
values. This is explained below. Because the estimation algorithm is
based on random simulations of the network evolution, there always will
be small differences between different runs of the algorithm. To obtain
“publication grade” estimates, where this variability is minimized,
choose the value of parameter n3 in sienaAlgorithmCreate()
(“Number of iterations in phase 3”) larger than the default value of
1000; e.g., n3=5000
.
With function siena07 we made ans as the object containing all the results of the estimation. For example,
ans$theta
# contains the vector of parameter estimates,
ans$se
# contains the standard errors, while
ans$covtheta
# contains the covariance matrix of the estimates.
#> [1] -2.73331050 2.44373875 0.66140102 -0.11102284 0.26670485 -0.02769309 0.04250091 0.12758722
#> [1] 0.13052299 0.23100999 0.14952777 0.30525026 0.19597244 0.06611808 0.07897151 0.05327322
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1.703625e-02 -0.0204940690 -9.574301e-03 0.0136709178 -3.042977e-03 -1.796669e-05
#> [2,] -2.049407e-02 0.0533656163 1.444350e-02 -0.0350621662 4.607441e-04 -3.800081e-04
#> [3,] -9.574301e-03 0.0144434970 2.235855e-02 -0.0406209377 -5.872923e-04 2.825742e-05
#> [4,] 1.367092e-02 -0.0350621662 -4.062094e-02 0.0931777227 8.507029e-04 -9.507407e-04
#> [5,] -3.042977e-03 0.0004607441 -5.872923e-04 0.0008507029 3.840520e-02 2.713486e-03
#> [6,] -1.796669e-05 -0.0003800081 2.825742e-05 -0.0009507407 2.713486e-03 4.371601e-03
#> [7,] 2.699005e-04 -0.0000165309 -2.465539e-03 0.0043661173 3.856261e-03 -2.131715e-03
#> [8,] -3.910310e-04 -0.0017461387 -1.016507e-03 0.0017727855 -1.556483e-05 5.442013e-05
#> [,7] [,8]
#> [1,] 0.0002699005 -3.910310e-04
#> [2,] -0.0000165309 -1.746139e-03
#> [3,] -0.0024655394 -1.016507e-03
#> [4,] 0.0043661173 1.772786e-03
#> [5,] 0.0038562612 -1.556483e-05
#> [6,] -0.0021317154 5.442013e-05
#> [7,] 0.0062364995 -2.899014e-04
#> [8,] -0.0002899014 2.838036e-03
There are several “methods” available for viewing the object
containing the results of the estimation. Above we already mentioned the
commands ans
and summary(ans)
The command
siena.table( ans )
will produce in your working directory a
table formatted for inclusion in a LaTeX document. The command
siena.table(ans, type="html" )
produces a table formatted
in html, which can be included, e.g., in a Word document. See
?print.sienaFit
for further information, e.g., about the
use of the xtable package for RSiena; if you use xtable, see the set of
vignettes for xtable, which
gives more options.
If the estimation algorithm has not produced good estimates (it ‘has
not converged well’), as will be indicated by some of the t-ratios for
convergence being larger than 0.1 or the overall maximum convergence
ratio being more than 0.25, (the precise values of these thresholds may
be taken with a gain of salt) the best thing to do is continuing the
estimation, using the estimates produced here, and contained in
ans
, as the new initial values.
This is done by the option prevAns
(‘previous ans’) as
in:
ans <- siena07(myalgorithm, data = mydata, effects = myeff, prevAns = ans)
the parameter estimates in ans
then are extracted and
used in the new estimation; moreover, Phase 1 will be omitted from the
algorithm, as derivatives and covariance matrix are used from the
previous run. This should be used only if the model specification in
myeff
has not changed, and if the provisional parameter
estimates obtained in ans
are reasonable; if they are not
reasonable, make a fresh estimation without the prevAns
parameter.
In Section 6.3.1 of the manual you can read more about the initial values used for the estimation algorithm; but this rarely is of any concern. Sections 6.3-6.5 of the manual give further help.
Three types of tests are available in SIENA:
Parameters can be restricted by putting TRUE in the include, fix and test columns of the effects object. For example, to request a score test for the indegree popularity effect, the commands can be as follows.
myeff <- setEffect(myeff, inPopSqrt, fix = TRUE, test = TRUE, initialValue = 0)
#> effectName include fix test initialValue parm
#> 1 indegree - popularity (sqrt) TRUE TRUE TRUE 0 0
ans <- siena07(myalgorithm, data = mydata, effects = myeff)
summary(ans)
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 6.5682 ( 1.1425 )
#> 0.2 Rate parameter period 2 5.2280 ( 0.8588 )
#>
#> Other parameters:
#> 1. eval outdegree (density) -2.7254 ( 0.1239 ) -0.0128
#> 2. eval reciprocity 2.4363 ( 0.2241 ) 0.0064
#> 3. eval transitive triplets 0.6558 ( 0.1423 ) -0.0083
#> 4. eval 3-cycles -0.1089 ( 0.2859 ) -0.0132
#> 5. eval indegree - popularity (sqrt) 0.0000 ( NA ) 0.3320
#> 6. eval smoke1 similarity 0.2650 ( 0.1980 ) 0.0216
#> 7. eval alcohol alter -0.0188 ( 0.0718 ) -0.0141
#> 8. eval alcohol ego 0.0341 ( 0.0766 ) -0.0476
#> 9. eval alcohol ego x alcohol alter 0.1276 ( 0.0509 ) 0.0360
#>
#> Overall maximum convergence ratio: 0.1121
#>
#>
#> Score test for 1 parameter:
#> chi-squared = 5.66, p = 0.0174.
#>
#> Total of 2341 iteration steps.
#>
#> Generalised score test <c>
#>
#> Testing the goodness-of-fit of the model restricted by
#> (1) eval: indegree - popularity (sqrt) = 0.0000
#> _________________________________________________
#>
#> c = 5.6596 d.f. = 1 p-value = 0.0174
#>
#> one-sided (normal variate): -2.3790
#> _________________________________________________
#>
#> One-step estimates:
#>
#> eval: outdegree (density) -1.9253
#> eval: reciprocity 2.3598
#> eval: transitive triplets 0.7850
#> eval: 3-cycles -0.2235
#> eval: indegree - popularity (sqrt) -0.4590
#> eval: smoke1 similarity 0.2676
#> eval: alcohol alter -0.0066
#> eval: alcohol ego 0.0218
#> eval: alcohol ego x alcohol alter 0.1225
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.015 -0.018 -0.008 0.009 NA -0.003 0.001 0.000 -0.001
#> -0.650 0.050 0.010 -0.028 NA -0.001 -0.001 0.000 -0.001
#> -0.442 0.328 0.020 -0.035 NA -0.002 0.000 -0.002 -0.001
#> 0.254 -0.436 -0.849 0.082 NA 0.002 -0.001 0.003 0.001
#> NA NA NA NA NA NA NA NA NA
#> -0.140 -0.016 -0.057 0.041 NA 0.039 0.003 0.004 -0.001
#> 0.073 -0.079 -0.038 -0.045 NA 0.210 0.005 -0.002 0.000
#> -0.016 0.018 -0.155 0.137 NA 0.256 -0.353 0.006 -0.001
#> -0.097 -0.082 -0.077 0.054 NA -0.091 -0.004 -0.157 0.003
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 308.287 245.678 535.529 168.619 758.406 31.880 37.571 40.067 180.994
#> 129.076 140.464 255.122 85.458 317.616 11.938 21.377 18.486 83.135
#> 350.674 314.345 1121.101 357.889 947.033 30.636 86.337 87.203 245.639
#> 162.046 156.448 521.617 176.549 433.303 14.995 34.620 30.291 103.693
#> 571.923 453.525 1073.185 335.210 1464.019 58.808 78.619 77.552 339.152
#> 29.999 24.686 49.483 15.301 74.697 37.291 -48.460 -44.162 9.917
#> 42.619 51.639 183.669 62.044 127.648 -40.720 350.526 228.950 70.379
#> 37.310 29.644 113.443 36.555 92.340 -41.736 237.670 327.621 114.653
#> 171.421 154.463 375.706 119.281 425.719 18.385 86.059 90.319 619.460
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 445.873 373.055 1028.959 331.413 1146.850 41.412 78.900 83.962 286.136
#> 0.918 370.563 962.361 316.789 969.445 33.707 77.868 74.830 259.958
#> 0.806 0.827 3658.525 1190.377 2865.116 90.922 255.883 244.901 759.228
#> 0.791 0.829 0.992 393.899 921.486 29.367 84.462 80.276 242.537
#> 0.983 0.911 0.857 0.840 3053.639 107.859 212.413 210.935 735.093
#> 0.299 0.267 0.229 0.226 0.298 42.904 -69.485 -64.332 13.142
#> 0.165 0.179 0.187 0.188 0.170 -0.468 513.410 414.441 165.087
#> 0.182 0.178 0.185 0.185 0.175 -0.449 0.836 478.279 178.849
#> 0.464 0.462 0.429 0.418 0.455 0.069 0.249 0.280 854.148
To see the results, including those for the score test. You can also simply request:
score.Test(ans)
#> Tested effects:
#> indegree - popularity (sqrt)
#> chi-squared = 5.66, d.f. = 1; one-sided Z = -2.38; two-sided p = 0.017.
see ?score.Test
for further explanation.
An application of the score test is given for the special case of parameter heterogeneity by Lospinoso et al. (2010) and implemented in RSiena. To apply the test to the results obtained above, request, e.g.,
tt2 <- sienaTimeTest(ans)
tt2
#> Joint significance test of time heterogeneity:
#> chi-squared = 6.55, d.f. = 8, p= 0.5857,
#> where H0: The following parameters are zero:
#> (1) (*)Dummy2:outdegree (density)
#> (2) (*)Dummy2:reciprocity
#> (3) (*)Dummy2:transitive triplets
#> (4) (*)Dummy2:3-cycles
#> (5) (*)Dummy2:smoke1 similarity
#> (6) (*)Dummy2:alcohol alter
#> (7) (*)Dummy2:alcohol ego
#> (8) (*)Dummy2:alcohol ego x alcohol alter
If you wish more information, see summary(tt2)
.
### Step 1: define data
friend.data.w1 <- s501
friend.data.w2 <- s502
friend.data.w3 <- s503
drink <- s50a
smoke <- s50s
# define RSiena data structures
friendship <- sienaDependent(array(c(friend.data.w1, friend.data.w2, friend.data.w3), dim = c(50, 50,
3)))
smoke1 <- coCovar(smoke[, 1])
alcohol <- varCovar(drink)
mydata <- sienaDataCreate(friendship, smoke1, alcohol)
### Step 2: create effects structure
myeff <- getEffects(mydata)
### Step 3: get initial description
print01Report(mydata, modelname = "s50_3_init")
### Step4: specify model
myeff <- includeEffects(myeff, transTrip, cycle3)
myeff <- includeEffects(myeff, egoX, altX, egoXaltX, interaction1 = "alcohol")
myeff <- includeEffects(myeff, simX, interaction1 = "smoke1")
### Step5 estimate
myAlgorithm <- sienaAlgorithmCreate(projname = "s50_3")
(ans <- siena07(myAlgorithm, data = mydata, effects = myeff))
# (the outer parentheses lead to printing the obtained result on the screen) if necessary, estimate
# further
(ans <- siena07(myAlgorithm, data = mydata, effects = myeff, prevAns = ans))
Next: SienaBehaviour FOR MODELING NETWORKS AND BEHAVIOUR BY RSIENA
Copyright © 2020 Jochem Tolsma