World Scientific Community on Ebola

Web of Science, 2000-2014


Methods

Here is a brief description of the methods applied in the development of this project.

Geo-location of publications

Each address reported in the publications was assigned to a single administrative unit, which in most cases was a city. This process was performed using computer algorithms and refined by hand until the target precision (>97%) was reached (see the QA method described below). The geo-location made it possible to analyze the distribution of publications by region, i.e. by continent, country, etc.

Attribution of publications to organizations

The addresses were manually mapped to their respective organizations. During this process, the variants of an organization's name were unified into a single organization entity in the database of the study. The names used to create the organizations in the database were obtained from official websites, as well as from publications in journals.
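The unification of name variants can be illustrated with a minimal sketch. It assumes a hand-curated table of variants, which matches the manual mapping described above; the organization names, the `normalize` rule and the `unify` helper are hypothetical and are not taken from the study's database.

```python
import re
import unicodedata
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


# Hypothetical hand-curated table: canonical name -> known variants.
CURATED = {
    "Example University": ["Example Univ.", "Univ Example", "Example University"],
    "Example Research Institute": ["Example Res. Inst.", "Inst. Example Research"],
}

# Lookup from normalized variant to the single organization entity.
VARIANT_TO_ORG = {
    normalize(variant): canonical
    for canonical, variants in CURATED.items()
    for variant in variants
}


def unify(address_org: str) -> Optional[str]:
    """Return the canonical organization, or None if the variant is unknown."""
    return VARIANT_TO_ORG.get(normalize(address_org))
```

Variants that return `None` would be the ones left for manual review, for example by checking the official website of the candidate organization.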

In general, the preferred official name was recorded in the local language and, whenever possible, in English. Hierarchical relationships, such as university campuses and institutes, hospital affiliations, or research centers belonging to councils or academies, were reconstructed in the database of the study from the information contained in the addresses, after verification against the information on official websites. Further, all organizations were associated with their respective locations.

The information on hierarchy and location was used during the unification to solve cases where parent organizations were not present, as well as to unify variants when the location was the only piece of information leading to the organization. Changes in the structure of organizations within the analyzed period (the life cycle of centers) were treated as follows:


Quality assessment (QA)

The QA consisted of categorizing a number of randomly sampled addresses as “correctly” or “wrongly” mapped to an organization. Given that each examination has exactly one of two possible outcomes, the outcomes were expected to follow a binomial distribution. Thus, a negative binomial distribution X~NB(r,p) can be used to model the number of “wrong” cases (successes, each occurring with probability p) observed before a specified number r of “correct” cases (failures) occurs.

In the present study we applied the QA to ensure, with 95% confidence, that addresses were unified with a maximum error rate of 5%.
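To give a sense of the sample sizes such a criterion implies, here is a minimal sketch. It assumes a simplified fixed-size sampling scheme evaluated with the binomial tail, rather than the negative binomial stopping rule mentioned above; the function names are hypothetical and the study's actual procedure may differ.

```python
import math


def min_sample_size_zero_errors(max_error: float, confidence: float) -> int:
    """Smallest n for which n correct addresses in a row would have
    probability <= 1 - confidence if the true error rate were max_error."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_error))


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def max_wrong_allowed(n: int, max_error: float, confidence: float) -> int:
    """Largest count of wrong cases in n samples that still supports the
    claim 'error rate < max_error' at the given confidence; -1 if none does."""
    allowed = -1
    for k in range(n + 1):
        if binomial_cdf(k, n, max_error) <= 1.0 - confidence:
            allowed = k
        else:
            break
    return allowed
```

Under these assumptions, `min_sample_size_zero_errors(0.05, 0.95)` gives 59: a random sample of 59 addresses with no wrong mapping supports an error rate below 5% with 95% confidence. Larger samples can tolerate a few wrong cases, which `max_wrong_allowed` computes.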

Classification of publications into scientific disciplines

Clarivate Analytics, the owner of the Web of Science, sorts publications according to the journal in which they appear into roughly 250 scientific disciplines included in the Journal Citation Reports (JCR), which in turn are classified into 22 fields and 7 broad areas.

Detecting networks of authors

Normally, groups of researchers are detected through an iterative process that combines phases of co-authorship analysis and normalization of authors. In some cases the procedure also involves experts in the field, who gauge the plausibility of the resulting groups.

Co-authorship analysis (phase 1)

This process is performed by a computer program designed to maximize the quality of the resulting groups of authors by executing two different in-house-developed co-authorship algorithms.
In our experience, the results of the co-authorship analysis provide a good estimate of the number of groups obtained at the end of the whole process.
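The in-house algorithms are not described here, so the following is only a generic stand-in that shows what a co-authorship analysis can look like: authors who share at least `min_shared` papers are linked, and the connected components of the resulting graph are taken as groups. The function name, the threshold and the input format are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set


def coauthor_groups(papers: Iterable[List[str]], min_shared: int = 2) -> List[Set[str]]:
    """Group authors linked by at least `min_shared` co-authored papers."""
    # Count how many papers each pair of authors has in common.
    pair_counts: Counter = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over authors connected by a sufficiently strong link.
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), shared in pair_counts.items():
        if shared >= min_shared:
            parent[find(a)] = find(b)

    # Collect the connected components.
    groups: Dict[str, Set[str]] = {}
    for author in list(parent):
        groups.setdefault(find(author), set()).add(author)
    return list(groups.values())
```

A single threshold on shared papers is a deliberately simple linking rule. It tends to merge groups that are bridged by one prolific collaborator, which is one reason expert review of the resulting groups can be useful.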

Normalization of authors’ names (phase 2)

This phase involves two different processes:
- 1) the unification of the various names (signatures) under which a given author is recorded in the source dataset (unification of variants), and
- 2) the extraction of the author's publications from signatures built on common family names, which group the production of several authors in the source (disambiguation of homonyms).

The difficulty of performing these two processes depends entirely on the availability of specific pieces of information on the authors, such as (current and past) full name, host institutions, main research fields, email addresses and coauthors.
In general, the most significant effect of the normalization of authors' names is a reduction in the number of members in the detected groups.
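As an illustration of the first process (unification of variants), the sketch below groups signatures that share a family name and a first initial. The key is an assumed heuristic, the "Family, Given" input format is an assumption about the source data, and the names are hypothetical.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def signature_key(signature: str) -> Tuple[str, str]:
    """Map 'Family, Given' to (family, first initial), ignoring case and accents."""
    text = unicodedata.normalize("NFKD", signature)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    family, _, given = text.partition(",")
    return (family.strip(), given.strip()[:1])


def group_variants(signatures: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Collect the signatures that share the same normalization key."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for signature in signatures:
        groups[signature_key(signature)].append(signature)
    return dict(groups)
```

Such a coarse key merges "Doe, Jane" and "Doe, John" into one candidate author. This is the homonym problem addressed by the second process, where additional evidence such as host institutions, email addresses or coauthors is needed to separate the individuals.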