1 Introduction

Welcome to the website of the Sunbelt 2024 Workshop: Webscraping Scientific Co-publishing Networks


In this workshop you will collect co-publishing networks. You will learn to webscrape scientific metadata from university and departmental websites (via R packages like rvest and RSelenium), assign name-based gender, and retrieve scholars’ publications (via e.g. Google Scholar and OpenAlex). If time allows, we will use these data to construct (longitudinal) co-publishing networks.

For each step, we provide clear (proof-of-principle) coding examples and output data, so that you will not get stuck along the way. Depending on your skills and progress, you might want to collect and describe data for universities or departments of your own choosing.

This workshop can be followed on its own. In our second workshop, ‘Analyzing the Structure and Evolution of Scientific Co-publishing Networks’, we will describe and analyze the same type of webscraped co-publishing network data using RSiena.

You will keep track of your work via a lab journal on GitHub.

Prerequisites:

  • Intermediate familiarity with working in R and using R Markdown
  • A beginner’s understanding of SNA
  • Entry-level knowledge of Git and GitHub

2 Getting started

2.1 LabJournal

During the workshop, you will journal your work and assignments in your own lab journal. A template lab journal can be found on GitHub, where you will also find instructions on how to get started.


3 Program

The program of this workshop will be as follows:

Introduction
Time: 9:00am - 9:30am

  • All
    • Round of introductions
  • Jochem, Rob, Dan
    • Our goals for today
    • Research Questions based on Scientific Co-publishing Networks
    • Data requirements for RQs

Lab Journal
Time: 9:30am - 10:30am

  • Rob
    • Lab Journal
    • Some Git pointers
    • Some R Markdown pointers

BREAK
Time: 10:30am - 10:45am

Webscraping - scholars, Part 1
Time: 10:45am - 12:30pm

BREAK
Time: 12:30pm - 1:45pm

genderize
Time: 1:45pm - 2:45pm

Webscraping - publications
Time: 2:45pm - 3:45pm

  • Jochem

Errors, bugs and crashes
Time: 3:45pm - 4:30pm

  • All
    • Own work
    • Questions
    • Evaluation


Copyright © 2024 Jochem Tolsma