Start with a clean workspace:
rm(list = ls())
Our goal is to make a network based on the publications we scraped for our scholars.
fpackage.check
: Check if packages are installed (and install if not) in R (source).
fsave
: Save to processed data in repository.
fload
: To load the files back after an fsave.
fshowdf
: To print objects (tibbles / data.frame) nicely on screen in .rmd.
fpackage.check <- function(packages) {
lapply(packages, FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
})
}
fsave <- function(x, file = NULL, location = "./data/processed/") {
ifelse(!dir.exists("data"), dir.create("data"), FALSE)
ifelse(!dir.exists("data/processed"), dir.create("data/processed"), FALSE)
if (is.null(file))
file = deparse(substitute(x))
datename <- substr(gsub("[:-]", "", Sys.time()), 1, 8)
totalname <- paste(location, datename, file, ".rda", sep = "")
save(x, file = totalname) #need to fix if file is reloaded as input name, not as x.
}
fload <- function(filename) {
load(filename)
get(ls()[ls() != "filename"])
}
fshowdf <- function(x, ...) {
knitr::kable(x, digits = 2, "html", ...) %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>%
kableExtra::scroll_box(width = "100%", height = "300px")
}
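The comment in fsave notes that a reloaded object comes back under the name x rather than under its original name. One possible way around this, sketched here in base R, is to store a single object with saveRDS() and read it back with readRDS(), which returns the object so that you choose its name on assignment. The names f_save_rds and f_load_rds are hypothetical and not part of the original helpers; the date prefix only mimics the fsave naming, and the temporary directory is an assumption made so the example leaves no files behind.

```r
# Hypothetical alternative to fsave/fload (not the original helpers):
# saveRDS() stores one object without its name, readRDS() returns it.
f_save_rds <- function(x, file, location = tempdir()) {
  # create the target directory if it does not exist yet
  if (!dir.exists(location)) dir.create(location, recursive = TRUE)
  # date prefix in the form YYYYMMDD, similar to fsave
  datename <- format(Sys.Date(), "%Y%m%d")
  totalname <- file.path(location, paste0(datename, file, ".rds"))
  saveRDS(x, file = totalname)
  invisible(totalname)  # return the path so it can be reused
}

f_load_rds <- function(filename) {
  readRDS(filename)
}

# usage: round trip of a small data frame
toy <- data.frame(id = 1:3, name = c("a", "b", "c"))
path <- f_save_rds(toy, file = "toy")
toy_reloaded <- f_load_rds(path)
identical(toy, toy_reloaded)
```

Because readRDS() returns the object, there is no need to search the function environment with ls() as fload does.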
stringdist
: approximate string matching and string distances.
stringi
: string processing (used below to remove diacritics).
packages = c("stringdist", "stringi", "tidyverse")
fpackage.check(packages)
Hopefully you managed to construct the datasets yourself. If not, here are the downloads.
Please use these files:
Download 20230621df_complete.rda.
Download 20230621list_publications_jt.rda.
Save the files in the correct directory: ‘./data/processed’.
As a brief update:
Thus, based on these egonets, we have to construct a ‘complete network’.
Important question for complete network data: What is our sampling unit?
df <- fload("./data/processed/20230621df_complete.rda")
publications <- fload("./data/processed/20230621list_publications_jt.rda")
publications <- publications %>%
bind_rows() %>%
distinct(title, .keep_all = TRUE)
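To see what this step does, here is a minimal base R sketch with made-up data (the titles and ids are invented). It stacks per-scholar publication tables and keeps the first row for each title, which mirrors bind_rows() followed by distinct(title, .keep_all = TRUE). Note that a co-authored paper listed on two profiles survives only once, with the gs_id of whichever profile came first.

```r
# Toy per-scholar publication lists (invented data)
pubs_list <- list(
  data.frame(gs_id = "A", title = c("Paper 1", "Paper 2"), year = c(2019, 2021)),
  data.frame(gs_id = "B", title = c("Paper 2", "Paper 3"), year = c(2021, 2022))
)

# stack all data frames (base R counterpart of dplyr::bind_rows)
pubs_all <- do.call(rbind, pubs_list)

# keep the first row per title (counterpart of distinct(title, .keep_all = TRUE))
pubs_unique <- pubs_all[!duplicated(pubs_all$title), ]

nrow(pubs_all)     # 4 rows before deduplication
nrow(pubs_unique)  # 3 rows after: "Paper 2" is kept once, under gs_id "A"
```

This matters for the selection further on: filtering on gs_id only finds a shared paper if the retained row belongs to a selected scholar.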
We have to make a couple of decisions:
# select scholars
df %>%
filter(affil1 == "RU" | affil2 == "RU") %>%
filter(discipline == "sociology") -> df_sel
fshowdf(df_sel)
id | uni | discipline | year | name | affil1 | affil2 | position | np | lastname | firstname | ini | gender | gs_id | gs_name | gs_homepage | gs_field | gs_i10_index | gs_h_index | gs_total_cites | gs_affiliation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
471b42 | RU | sociology | 2022 | Ronald Batenburg | RU | full_prof | batenburg | ronald | male | UK7nVSEAAAAJ | Ronald Batenburg | https://www.nivel.nl/nl/ronald-batenburg | Gezondheidszorg | 94 | 33 | 4306 | Programmaleider NIVEL en bijzonder hoogleraar Radboud Universiteit Nijmegen | |||
31144d | RU | sociology | 2022 | Katia Begall | RU | assistant_prof | begall | katia | female | e7zfTqMAAAAJ | Katia Begall | NA | family sociology | 9 | 9 | 1268 | Radboud University Nijmegen | |||
888093 | RU | sociology | 2022 | Hidde Bekhuis | RU | other | bekhuis | hidde | male | Q4saWX8AAAAJ | Hidde Bekhuis | NA | Sociologie | 9 | 9 | 458 | Post Doc Sociology, Radboud University Nijmegen | |||
c47638 | RU | sociology | 2022 | Lonneke van den Berg | RU | postdoc | van den | berg | lonneke | female | vzBNQ1kAAAAJ | Lonneke van den Berg | https://lonnekevandenberg.com/ | Family sociology | 3 | 5 | 79 | Netherlands Interdisciplinary Demographic Institute (NIDI) | ||
f99757 | RU | sociology | 2022 | Lieselotte Blommaert | RU | assistant_prof | blommaert | lieselotte | female | RG54uasAAAAJ | Lieselotte Blommaert | http://www.ru.nl/english/people/blommaert-e/ | Labor market | 11 | 11 | 479 | Sociology/Social Cultural Research, Radboud University, Nijmegen, the | |||
7e85d6 | RU | sociology | 2022 | Rob Eisinga | RU | assistant_prof | eisinga | rob | male | GDHdsXAAAAAJ | Rob Eisinga | http://robeisinga.ruhosting.nl/ | research methods | 83 | 35 | 6401 | Professor social science research methods, Radboud University Nijmegen | |||
5f9d45 | RU | sociology | 2022 | Maurice Gesthuizen | RU | assistant_prof | gesthuizen | maurice | male | n6hiblQAAAAJ | Maurice Gesthuizen | NA | Sociology | 45 | 28 | 2972 | Assistant professor | |||
3e9d6d | RU | sociology | 2022 | Nella Geurts | RU | postdoc | geurts | nella | female | VCTvbTkAAAAJ | Nella Geurts | NA | Sociology | 3 | 4 | 90 | Department of Sociology, Radboud University | |||
7f8ff4 | RU | sociology | 2022 | Saskia Glas | RU | assistant_prof | glas | saskia | female | ZMc0j2YAAAAJ | Saskia Glas | NA | Gender | 6 | 7 | 223 | Assistant Professor (tenured), Radboud University | |||
6075fe | RU | sociology | 2022 | Margriet van Hek | RU | assistant_prof | van | hek | margriet | female | ZvLlx2EAAAAJ | Margriet van Hek | NA | Sociology | 10 | 10 | 487 | Radboud University | ||
2d4a95 | RU | sociology | 2022 | Remco Hoekman | RU | other | hoekman | remco | male | LsMimOEAAAAJ | Remco Hoekman | https://www.mulierinstituut.nl/over-mulier/medewerkers/remco-hoekman/ | sport sociology | 17 | 13 | 918 | Director, Mulier Institute / Senior researcher, Radboud University | |||
61b8a6 | RU | sociology | 2022 | Bas Hofstra | RU | assistant_prof | hofstra | bas | NA | Nx7pDywAAAAJ | Bas Hofstra | http://www.bashofstra.com/ | Sociology of science | 10 | 10 | 966 | Assistant Professor, Radboud University | |||
07b8a5 | RU | sociology | 2022 | Judith Koops | RU | postdoc | koops | judith | female | kLiOlQoAAAAJ | Judith C. Koops | NA | family demography | 4 | 6 | 110 | Dr. Family Demographer, Radboud University | |||
1bf89e | RU | sociology | 2022 | Gerbert Kraaykamp | RU | full_prof | kraaykamp | gerbert | male | l8aM4jAAAAAJ | Gerbert Kraaykamp | https://www.ru.nl/english/people/kraaykamp-g/ | social stratification | 107 | 51 | 9279 | Professor of Sociology, Radboud Universiteit Nijmegen | |||
f618fc | RU | sociology | 2022 | Roza Meuleman | RU | assistant_prof | meuleman | roza | female | iKs_5WkAAAAJ | Roza Meuleman | NA | Cultural Sociology | 10 | 10 | 308 | Assistant Professor - Sociology - Radboud University Nijmegen | |||
dad54c | RU | sociology | 2022 | Michael Savelkoul | RU | assistant_prof | savelkoul | michael | male | _f3krXUAAAAJ | Michael Savelkoul | NA | NA | 9 | 9 | 722 | Assistant Professor - Sociology, Radboud University Nijmegen, the Netherlands | |||
a576d6 | RU | sociology | 2022 | Peer Scheepers | RU | full_prof | scheepers | peer | male | hPeXxvEAAAAJ | peer scheepers | NA | sociale wetenschappen | 190 | 64 | 16727 | hoogleraar methodologie, faculteit der sociale wetenschappen radboud universiteit | |||
5da2d4 | RU | sociology | 2022 | Niels Spierings | RU | associate_prof | spierings | niels | male | cy3Ye6sAAAAJ | Niels Spierings | https://www.ru.nl/english/people/spierings-c/ | politics | 46 | 27 | 2673 | Associate Professor of Sociology, Radboud University | |||
07231a | RU/RUG | sociology | 2022 | Jochem Tolsma | RU | RUG | full_prof | tolsma | jochem | male | Iu23-90AAAAJ | Jochem Tolsma | http://www.jochemtolsma.nl/ | social divisions between groups | 38 | 25 | 2888 | Professor, Radboud University Nijmegen / University of Groningen | ||
b65f6f | RU | sociology | 2022 | Ellen Verbakel | RU | full_prof | verbakel | ellen | female | w2McVJAAAAAJ | Ellen Verbakel | http://www.ellenverbakel.nl/ | Family | 40 | 26 | 2242 | Professor of Sociology, Department of Sociology, Radboud University Nijmegen | |||
5e2898 | RU | sociology | 2022 | Mark Visser | RU | assistant_prof | visser | mark | male | ItITloQAAAAJ | Mark Visser | https://www.researchgate.net/profile/Mark_Visser | life course | 10 | 10 | 580 | Assistant Professor, Radboud University | |||
9ab0fc | RU | sociology | 2022 | Maarten Wolbers | RU | full_prof | wolbers | maarten | male | TqKrXnMAAAAJ | Maarten HJ Wolbers | http://www.socsci.ru.nl/maartenw/ | school-to-work transitions | 65 | 32 | 4380 | Professor of Sociology, Radboud University, Nijmegen | |||
6cca60 | RU | sociology | 2022 | Carlijn Bussemakers | RU | phd | bussemakers | carlijn | female | bDPtkIoAAAAJ | Carlijn Bussemakers | NA | NA | 3 | 3 | 102 | Unknown affiliation | |||
4ed5be | RU | sociology | 2022 | Rob Franken | RU | phd | franken | rob | male | NA | NA | NA | NA | NA | NA | NA | ||||
f92019 | RU | sociology | 2022 | Mustafa Firat | RU | phd | firat | mustafa | male | rrh0V7IAAAAJ | NA | NA | NA | NA | NA | NA | NA | |||
f782c2 | RU | sociology | 2022 | Aysegül Güneyli | RU | phd | guneyli | aysegul | female | NA | NA | NA | NA | NA | NA | NA | ||||
513b49 | RU | sociology | 2022 | Inge Hendriks | RU | phd | hendriks | inge | female | NA | NA | NA | NA | NA | NA | NA | ||||
e37333 | RU | sociology | 2022 | Thijmen Jeroense | RU | phd | jeroense | thijmen | male | izq-KNUAAAAJ | Thijmen Jeroense | https://www.ru.nl/personen/jeroense-t/ | Political Participation | 1 | 2 | 15 | PhD candidate, Radboud University Nijmegen | |||
b672bd | RU | sociology | 2022 | Rachel Kollar | RU | phd | kollar | rachel | female | b96_CCUAAAAJ | Rachel Kollar | NA | Migration | 2 | 2 | 49 | PhD Candidate Radboud University, Nijmegen | |||
ac8ab3 | RU | sociology | 2022 | Nik Linders | RU | phd | linders | nik | male | NA | NA | NA | NA | NA | NA | NA | ||||
b6cedc | RU | sociology | 2022 | Renae Loh | RU | phd | loh | renae | female | tFaMPOQAAAAJ | Renae Sze Ming Loh | http://renaeloh.com/ | sociology | 2 | 4 | 182 | PhD candidate, Radboud University | |||
e2ebbb | RU | sociology | 2022 | Maikel Meijeren | RU | phd | meijeren | maikel | male | NA | NA | NA | NA | NA | NA | NA | ||||
a527b8 | RU | sociology | 2022 | Carly van Mensvoort | RU | phd | van | mensvoort | carly | female | z6iMs-UAAAAJ | Carly van Mensvoort | https://www.ru.nl/english/people/mensvoort-c-van/ | Gender inequality | 2 | 3 | 48 | Radboud University | ||
1e61cb | RU | sociology | 2022 | Anne Maaike Mulders | RU | phd | mulders | anne | female | NA | NA | NA | NA | NA | NA | NA | ||||
554613 | RU | sociology | 2022 | Katrin Müller | RU | phd | muller | katrin | female | NA | NA | NA | NA | NA | NA | NA | ||||
7c14e8 | RU | sociology | 2022 | Klara Raiber | RU | phd | raiber | klara | female | xE65HUcAAAAJ | Klara Raiber | https://www.ru.nl/english/people/raiber-k/ | informal care | 1 | 4 | 44 | PhD candidate, Radboud University Nijmegen | |||
809f71 | RU | sociology | 2022 | Marlou Ramaekers | RU | phd | ramaekers | marlou | female | fp99JAQAAAAJ | Marlou Ramaekers | NA | Sociology | 0 | 2 | 7 | Data Manager and Researcher | |||
c1722b | RU | sociology | 2022 | Sara Wiertsema | RU | phd | wiertsema | sara | female | wgQQD6kAAAAJ | Sara Wiertsema | NA | Sociology | 0 | 1 | 2 | PhD candidate, Radboud University | |||
a00209 | RU | sociology | 2022 | Janos Betko | RU | phd | betko | janos | male | Cvdrl6AAAAAJ | Janos Betko | NA | welfare | 0 | 5 | 37 | Radboud University | |||
8e7b43 | RU | sociology | 2022 | Jansje van Middendorp | RU | phd | van | middendorp | jansje | female | gs0li6MAAAAJ | Jansje van Middendorp | NA | vrijwilligers | 0 | 1 | 8 | Buitenpromovendus Radboud Universiteit | ||
91db4e | RU | sociology | 2022 | Elize Vis | RU | phd | vis | elize | female | NA | NA | NA | NA | NA | NA | NA | ||||
bb8ffa | RU | sociology | 2022 | Tijmen Weber | RU | phd | weber | tijmen | male | KfLALRIAAAAJ | Tijmen Weber | NA | Sociology | 2 | 2 | 60 | Lecturer Statistics and Research, HAN University of Applied Sciences | |||
05a9d1 | RU | sociology | 2022 | Elissa El Khawli | RU | other | el | khawli | elissa | female | 2wDZZbsAAAAJ | Elissa El Khawli | NA | NA | 2 | 3 | 27 | Assistant Professor, Erasmus University Rotterdam | ||
683c64 | RU | sociology | 2022 | Carl Sterkens | RU | other | sterkens | carl | male | NA | NA | NA | NA | NA | NA | NA | ||||
e5d4c4 | RU | sociology | 2022 | Paul Vermeer | RU | other | vermeer | paul | male | NA | NA | NA | NA | NA | NA | NA | ||||
0e20be | RU | sociology | 2022 | Malou Grubben | RU | other | grubben | malou | female | NA | NA | NA | NA | NA | NA | NA | ||||
1c7266 | RUG/RU | sociology | 2022 | Marcel Lubbers | RUG | RU | full_prof | lubbers | marcel | male | 078qsZoAAAAJ | Marcel Lubbers | https://www.uu.nl/staff/MLubbers | Nationalism | 91 | 43 | 8010 | Professor in Interdisciplinary Social Science | ERCOMER | Utrecht University |
We will use an adjacency matrix to store our network ties: the first author is sending a tie to other authors.
We make the assumption that the composition of the network is stable!
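As a minimal sketch of this tie definition, consider three scholars and two made-up publications. Rows are senders (first authors) and columns are receivers (co-authors). The names and publications are purely illustrative, and unlike the code further on, this toy version leaves out self-ties.

```r
scholars <- c("tolsma", "hofstra", "visser")  # illustrative names
net <- matrix(0, nrow = 3, ncol = 3, dimnames = list(scholars, scholars))

# each publication is a vector of last names, first author first
toy_pubs <- list(c("tolsma", "hofstra"), c("hofstra", "visser", "tolsma"))

for (pub in toy_pubs) {
  first <- pub[1]
  coauthors <- setdiff(pub[-1], first)
  # only count ties among scholars that are part of the network
  if (first %in% scholars) {
    alters <- intersect(coauthors, scholars)
    net[first, alters] <- net[first, alters] + 1
  }
}
net
```

The resulting matrix is directed: a tie from row i to column j means that i was first author on a publication that j co-authored.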
networkw1 <- matrix(0, nrow = nrow(df_sel), ncol = nrow(df_sel))
networkw2 <- matrix(0, nrow = nrow(df_sel), ncol = nrow(df_sel))
# wave1
publications %>%
  filter(gs_id %in% df_sel$gs_id) %>%
  filter(year %in% c(2018, 2019, 2020)) -> pub_w1
nrow(pub_w1)
#> [1] 324
# wave2
publications %>%
  filter(gs_id %in% df_sel$gs_id) %>%
  filter(year %in% c(2021, 2022, 2023)) -> pub_w2
nrow(pub_w2)
#> [1] 289
Next, the author lists are cleaned a bit.
# wave1
pub_w1 <- str_split(pub_w1$author, ",") %>%
# lowercase
lapply(tolower) %>%
# Removing diacritics
lapply(stri_trans_general, id = "latin-ascii") %>%
# only last name
lapply(word, start = -1, sep = " ") %>%
# only last last name
lapply(word, start = -1, sep = "-")
# let us remove all publications with only one author
# (logical indexing also works when there is nothing to remove; x[-integer(0)] would drop everything)
pub_w1 <- pub_w1[lengths(pub_w1) > 1]
length(pub_w1)
#> [1] 278
# wave2
pub_w2 <- str_split(pub_w2$author, ",") %>%
# lowercase
lapply(tolower) %>%
# Removing diacritics
lapply(stri_trans_general, id = "latin-ascii") %>%
# only last name
lapply(word, start = -1, sep = " ") %>%
# only last last name
lapply(word, start = -1, sep = "-")
# let us remove all publications with only one author
# (logical indexing also works when there is nothing to remove; x[-integer(0)] would drop everything)
pub_w2 <- pub_w2[lengths(pub_w2) > 1]
length(pub_w2)
#> [1] 250
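To make the cleaning steps concrete, here is a small base R sketch applied to one invented author string. It follows the same order as above (split, lowercase, remove diacritics, keep the last word, keep the part after the last hyphen), but uses strsplit() and sub() instead of str_split() and word(); the diacritics step is only run if stringi happens to be installed.

```r
authors <- "J Tolsma, M Savelkoul, L van den Berg, A Smith-Jones"  # invented author string

names_clean <- strsplit(authors, ",")[[1]]
names_clean <- tolower(trimws(names_clean))
# remove diacritics only if stringi is available (optional for this ASCII example)
if (requireNamespace("stringi", quietly = TRUE)) {
  names_clean <- stringi::stri_trans_general(names_clean, id = "latin-ascii")
}
names_clean <- sub("^.* ", "", names_clean)  # keep the last word
names_clean <- sub("^.*-", "", names_clean)  # keep the part after the last hyphen
names_clean
#> [1] "tolsma"    "savelkoul" "berg"      "jones"
```

Keep in mind that after this step scholars are matched on the last part of their last name only, so two different people named "berg" cannot be told apart.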
Wouldn’t it be great if we could write a function that does all this in one go?
Okay
f_pubnets <- function(df_scholars = df, list_publications = publications, discip = "sociology", affiliation = "RU",
waves = list(wave1 = c(2018, 2019, 2020), wave2 = c(2021, 2022, 2023))) {
publications <- list_publications %>%
bind_rows() %>%
distinct(title, .keep_all = TRUE)
df_scholars %>%
filter(affil1 == affiliation | affil2 == affiliation) %>%
filter(discipline == discip) -> df_sel
networklist <- list()
for (wave in 1:length(waves)) {
networklist[[wave]] <- matrix(0, nrow = nrow(df_sel), ncol = nrow(df_sel))
}
publicationlist <- list()
for (wave in 1:length(waves)) {
publicationlist[[wave]] <- publications %>%
filter(gs_id %in% df_sel$gs_id) %>%
filter(year %in% waves[[wave]]) %>%
select(author) %>%
lapply(str_split, pattern = ",")
}
publicationlist2 <- list()
for (wave in 1:length(waves)) {
publicationlist2[[wave]] <- publicationlist[[wave]]$author %>%
# lowercase
lapply(tolower) %>%
# Removing diacritics
lapply(stri_trans_general, id = "latin-ascii") %>%
# only last name
lapply(word, start = -1, sep = " ") %>%
# only last last name
lapply(word, start = -1, sep = "-")
}
for (wave in 1:length(waves)) {
# let us remove all publications with only one author
# (logical indexing is safe even if a wave has no single-author publications)
publicationlist2[[wave]] <- publicationlist2[[wave]][lengths(publicationlist2[[wave]]) > 1]
}
for (wave in 1:length(waves)) {
pubs <- publicationlist2[[wave]]
for (ego in 1:nrow(df_sel)) {
# which ego?
lastname_ego <- df_sel$lastname[ego]
# for all publications
for (pub in seq_along(pubs)) {
# only continue if ego is author of pub
if (lastname_ego %in% pubs[[pub]]) {
aut_pot <- which.max(pubs[[pub]] %in% lastname_ego)
# only continue if ego is first author of pub
if (aut_pot == 1) {
# check all alters/co-authors
for (alter in 1:nrow(df_sel)) {
# which alter
lastname_alter <- df_sel$lastname[alter]
if (lastname_alter %in% pubs[[pub]]) {
networklist[[wave]][ego, alter] <- networklist[[wave]][ego, alter] + 1
}
}
}
}
}
}
}
return(list(df = df_sel, network = networklist))
}
outputUU_sociology <- f_pubnets(affiliation = "UU")
str(outputUU_sociology)
#> List of 2
#> $ df :'data.frame': 62 obs. of 21 variables:
#> ..$ id : chr [1:62] "f84cb8" "ad5eeb" "5a5e57" "7c3ed2" ...
#> ..$ uni : chr [1:62] "UU" "UU" "UU" "UU" ...
#> ..$ discipline : chr [1:62] "sociology" "sociology" "sociology" "sociology" ...
#> ..$ year : num [1:62] 2022 2022 2022 2022 2022 ...
#> ..$ name : chr [1:62] "Ece Arat" "Marcel Van Assen" "Weverthon Barbosa Machado" "Vardan Barsegyan" ...
#> ..$ affil1 : chr [1:62] "UU" "UU" "UU" "UU" ...
#> ..$ affil2 : chr [1:62] "" "" "" "" ...
#> ..$ position : chr [1:62] "phd" "full_prof" "other" "other" ...
#> ..$ np : chr [1:62] "" "van" "" "" ...
#> ..$ lastname : chr [1:62] "arat" "assen" "machado" "barsegyan" ...
#> ..$ firstname : chr [1:62] "ece" "marcel" "weverthon" "vardan" ...
#> ..$ ini : chr [1:62] "" "" "" "" ...
#> ..$ gender : chr [1:62] "female" "male" "male" "male" ...
#> ..$ gs_id : chr [1:62] "_MW2-cgAAAAJ&hl" "uwPhxMcAAAAJ" "miJiVQcAAAAJ" "yh5pBBoAAAAJ" ...
#> ..$ gs_name : chr [1:62] "Ece Arat" "Marcel A.L.M. van Assen" "Weverthon Machado" "Vardan Barsegyan" ...
#> ..$ gs_homepage : chr [1:62] "https://www.uu.nl/medewerkers/EArat" NA "http://weverthon.com/" "https://www.wodc.nl/over-het-wodc/organisatie/medewerkers/vardan-barsegyan" ...
#> ..$ gs_field : chr [1:62] "Sociology" NA "Sociology" "Political Inequality" ...
#> ..$ gs_i10_index : num [1:62] 0 107 1 3 6 NA NA NA 76 0 ...
#> ..$ gs_h_index : num [1:62] 2 49 3 5 6 NA NA NA 35 2 ...
#> ..$ gs_total_cites: num [1:62] 10 18428 33 66 229 ...
#> ..$ gs_affiliation: chr [1:62] "Utrecht University" "Tilburg University, the Netherlands" "Postdoctoral researcher, Utrecht University" "The Research and Documentation Centre (WODC, The Hague, the Netherlands)" ...
#> $ network:List of 2
#> ..$ : num [1:62, 1:62] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ : num [1:62, 1:62] 3 0 0 0 0 0 0 0 0 0 ...
head(outputUU_sociology[[1]])
#> id uni discipline year name affil1 affil2 position np lastname
#> 1 f84cb8 UU sociology 2022 Ece Arat UU phd arat
#> 2 ad5eeb UU sociology 2022 Marcel Van Assen UU full_prof van assen
#> 3 5a5e57 UU sociology 2022 Weverthon Barbosa Machado UU other machado
#> 4 7c3ed2 UU sociology 2022 Vardan Barsegyan UU other barsegyan
#> 5 944260 UU sociology 2022 Rutger Blom UU other blom
#> 6 6c23fc UU sociology 2022 Lute Bos UU other bos
#> firstname ini gender gs_id gs_name
#> 1 ece female _MW2-cgAAAAJ&hl Ece Arat
#> 2 marcel male uwPhxMcAAAAJ Marcel A.L.M. van Assen
#> 3 weverthon male miJiVQcAAAAJ Weverthon Machado
#> 4 vardan male yh5pBBoAAAAJ Vardan Barsegyan
#> 5 rutger male RNktFGMAAAAJ Rutger Blom
#> 6 lute male <NA>
#> gs_homepage gs_field
#> 1 https://www.uu.nl/medewerkers/EArat Sociology
#> 2 <NA> <NA>
#> 3 http://weverthon.com/ Sociology
#> 4 https://www.wodc.nl/over-het-wodc/organisatie/medewerkers/vardan-barsegyan Political Inequality
#> 5 <NA> <NA>
#> 6 <NA> <NA>
#> gs_i10_index gs_h_index gs_total_cites
#> 1 0 2 10
#> 2 107 49 18428
#> 3 1 3 33
#> 4 3 5 66
#> 5 6 6 229
#> 6 NA NA NA
#> gs_affiliation
#> 1 Utrecht University
#> 2 Tilburg University, the Netherlands
#> 3 Postdoctoral researcher, Utrecht University
#> 4 The Research and Documentation Centre (WODC, The Hague, the Netherlands)
#> 5 Postdoctoral researcher, Utrecht University
#> 6 <NA>
# head(outputUU_sociology[[2]])
It seems to work.
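One thing to notice in the str() output: the first cell of the second network is 3, and that is a diagonal cell. This is most likely because the alter loop also visits ego, so first-authored publications are counted as self-ties. Before analysing the networks you may want to set the diagonal to zero and, depending on your research question, dichotomise the counts. A minimal sketch on a made-up matrix (the numbers are invented, not taken from the data):

```r
# made-up weighted network in the same format as the f_pubnets output
net_w <- matrix(c(3, 0, 1,
                  2, 1, 0,
                  0, 0, 0), nrow = 3, byrow = TRUE)

net_clean <- net_w
diag(net_clean) <- 0            # drop self-ties
net_bin <- (net_clean > 0) * 1  # 1 if at least one publication, else 0
net_bin
```

Whether to keep the counts or the binary version is a substantive choice: the counts carry tie strength, while many descriptive statistics assume binary ties.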
Assignment 4:
4a: Have a look at the networks. Could you describe them (e.g., dyad census, triad census)?
4b: What do you make of the evolution dynamics? Is there stability or change? Could you calculate the Jaccard index?
4c: Define the relation/tie differently (see above for suggestions) and adapt the f_pubnets function so it can spit out different types of networks.
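As a starting point for 4b, the Jaccard index between two waves is commonly defined as J = N11 / (N11 + N10 + N01), where N11 counts ties present in both waves, N10 ties present only in the first wave and N01 ties present only in the second. The following is a minimal base R sketch on two made-up binary networks; f_jaccard is a hypothetical helper name, it assumes the networks are already dichotomised, and it ignores the diagonal.

```r
f_jaccard <- function(net1, net2) {
  # ignore self-ties
  diag(net1) <- NA
  diag(net2) <- NA
  n11 <- sum(net1 == 1 & net2 == 1, na.rm = TRUE)  # tie in both waves
  n10 <- sum(net1 == 1 & net2 == 0, na.rm = TRUE)  # tie only in wave 1
  n01 <- sum(net1 == 0 & net2 == 1, na.rm = TRUE)  # tie only in wave 2
  n11 / (n11 + n10 + n01)
}

w1 <- matrix(c(0, 1, 1,
               0, 0, 1,
               0, 0, 0), nrow = 3, byrow = TRUE)
w2 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 0, 0), nrow = 3, byrow = TRUE)
f_jaccard(w1, w2)  # 2 stable ties, 1 dissolved, 1 new: 2 / 4 = 0.5
```

If neither wave contains any tie, the denominator is zero and the function returns NaN, so check for empty networks first.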
Copyright © 2024 Jochem Tolsma