Introduction
Welcome to the website of the Sunbelt 2024 Workshop: Webscraping
Scientific Co-publishing Networks
In this workshop you will collect co-publishing networks. You will learn to webscrape
metadata on scientists from university and departmental websites
(via R packages such as rvest and RSelenium), assign name-based gender, and retrieve scholars’ publications (e.g. via Google
Scholar and OpenAlex). If time allows, we will use these data to construct (longitudinal) co-publishing networks.
For each step, we provide clear (proof-of-principle) coding examples and
output data, ensuring you will not get stuck along the way. Depending on
your skills and progress, you might want to collect and describe your
own chosen universities or departments.
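To give a rough idea of the final step, the sketch below turns a list of publications into a co-publishing network using base R only. The author names and the object names (`pubs`, `incidence`, `copub`) are hypothetical toy values chosen for illustration, not workshop data or the workshop's own code; the tutorials may construct the networks differently.

```r
# Hypothetical toy data: each list element holds the authors of one publication.
pubs <- list(
  c("A. Jansen", "B. de Vries"),
  c("A. Jansen", "C. Bakker", "B. de Vries"),
  c("C. Bakker")
)

# All unique scholars appearing in the publication list.
scholars <- sort(unique(unlist(pubs)))

# Publication-by-scholar incidence matrix (1 = scholar is an author).
incidence <- do.call(rbind, lapply(pubs, function(authors) {
  as.integer(scholars %in% authors)
}))
colnames(incidence) <- scholars

# Scholar-by-scholar projection: cell [i, j] counts joint publications.
copub <- crossprod(incidence)
diag(copub) <- 0  # drop self-ties (a scholar's own publication count)

print(copub)
```

The resulting matrix is symmetric and weighted by the number of joint publications. Splitting `pubs` by publication year before projecting would give one such matrix per period, which is one possible route to a longitudinal network.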
This workshop can be followed as a standalone workshop. In our second
workshop, ‘Analyzing the Structure and Evolution of Scientific
Co-publishing Networks’, we will describe and analyze the same type of
webscraped co-publishing network data using
RSiena.
You will keep track of your work in a lab journal on GitHub.
Prerequisites:
- Intermediate familiarity with working in R and using R Markdown
- A beginner’s understanding of SNA
- Entry-level knowledge of Git and GitHub
Getting started
LabJournal
During the workshop, you will document your work and assignments in
your own lab journal. A template lab journal, including instructions on
how to get started, can be found on
GitHub (https://github.com/robfranken/LabJournal).
Program
The program of this workshop will be as follows:
Introduction
Time: 9:00am - 9:30am
- All
  - Round of introductions
- Jochem, Rob, Dan
  - Our goals for today
  - Research questions based on scientific co-publishing networks
  - Data requirements for the research questions
Lab Journal
Time: 9:30am - 10:30am
BREAK
Time: 10:30am - 10:45am
Webscraping-scholars Part 1
Time: 10:45am - 12:30pm
BREAK
Time: 12:30pm - 1:45pm
genderize
Time: 1:45pm - 2:45pm
Webscraping - publications
Time: 2:45pm - 3:45pm
Errors, bugs and crashes
Time: 3:45pm - 4:30pm
- All
  - Own work
  - Questions
  - Evaluation
Copyright © 2024 Jochem Tolsma