at least one capital letter in the first position of each word.⁴ Three main analyses
were conducted using place-name and concept N-grams as input data: place-names,
research topics, and publication venues.
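The capitalization filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `capitalized_ngrams`, the word-splitting pattern, and the `max_n` limit of three words are all assumptions made for the example.

```python
import re

def capitalized_ngrams(title, max_n=3):
    """Return every N-gram (1..max_n words) in which each word starts with a capital.

    Hypothetical helper: tokenization and max_n are assumptions, not from the text.
    """
    # Words start with a letter; punctuation such as commas is dropped,
    # so an N-gram may span a comma in the original title.
    words = re.findall(r"[^\W\d_][\w'-]*", title)
    found = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if all(w[0].isupper() for w in gram):
                found.append(" ".join(gram))
    return found

print(capitalized_ngrams("Soil carbon in the Cascade Mountains of Oregon"))
```

Note that the title-initial word "Soil" is returned as a candidate even though it is not a place-name, which is why a later filtering step is needed.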
13.2.2 Where Are ILTER Researchers Based, Which Regions Do They Study?
Plausible place-names matched the name of a political geographic unit (including
countries, autonomous regions, and major sub-national states) or a major geographical
feature (such as the Andes, the Arctic, or the Pacific Ocean). The automatically
coded and uncoded data were then inspected manually. Plausible place-names that
appeared five⁵ or more times in the data were given manual coding rules (e.g., place-names
ending in “-shan” were coded as occurring in China since “-shan” is a common
Romanization of the Chinese word for mountain).
A single title may include more than one place-name (such as “Kruger National
Park, South Africa”). No attempts were made to identify any hierarchical or other
relationships among such place-names. Errors of automatic coding were culled by
adding manual coding rules (e.g., excluding matches based on the n-gram “Rio”
alone, which matched many rivers in Latin and South America and parts of Europe).
A small number of endemic species, such as the Adelie penguin endemic to
Antarctica, were also used to geo-locate publications. From over 60,000 plausible
place-names, over 11,000 place-names were coded from 10,228 publication titles.
The vast majority of capitalized words in titles not accurately identifiable as place-names
were excluded from the place-name analysis. Over 90 different countries and
regions were identified from titles and abstracts in this way.
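One way to picture the coding rules described above is the sketch below. It is an assumption-laden illustration rather than the procedure actually used: `GAZETTEER` is a hypothetical two-entry stand-in for the automatic matching against political units and major features, and the sample names are invented. Only the threshold of five, the "-shan" rule, and the exclusion of "Rio" come from the text.

```python
from collections import Counter

MIN_COUNT = 5                      # lower limit stated in the text
GAZETTEER = {                      # hypothetical stand-in for automatic coding
    "South Africa": "South Africa",
    "Antarctica": "Antarctica",
}
SUFFIX_RULES = {"shan": "China"}   # manual rule from the text
EXCLUDED = {"Rio"}                 # ambiguous N-gram excluded in the text

def code_place_names(candidates):
    """Map each codable candidate name to a region; other names are omitted."""
    counts = Counter(candidates)
    coded = {}
    for name, count in counts.items():
        if name in EXCLUDED:
            continue               # too ambiguous to code on its own
        if name in GAZETTEER:
            coded[name] = GAZETTEER[name]
        elif count >= MIN_COUNT:   # manual rules apply to frequent names only
            for suffix, region in SUFFIX_RULES.items():
                if name.lower().endswith(suffix):
                    coded[name] = region
                    break
    return coded

sample = ["Changbaishan"] * 6 + ["Rio"] * 9 + ["South Africa", "Tianshan"]
print(code_place_names(sample))
```

In this sample, "Tianshan" stays uncoded because it appears only once, and "Rio" is dropped regardless of its frequency.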
The geographic origin of researchers and the geographic areas that are studied by
researchers were both coded into one of the following six geographic zones (A-F)
(Fig. 13.2):
A = Arctic (> 66° N), north of the Arctic Circle;
B = North Temperate (66° N - 23° N), south of the Arctic Circle and north of the Tropic of Cancer;
C = North Equator (23° N - 0°), south of the Tropic of Cancer and north of the Equator;
D = South Equator (0° - 23° S), south of the Equator and north of the Tropic of Capricorn;
4 For example, N-grams including “Antarctic”, “Cascade Mountains”, and “Wisconsin United
States” were identified as plausible place-names. These plausible place-names are the basis of
further analysis.
5 The lower limit of five is arbitrarily chosen, but reasonable in light of other place-names and kinds
of place-names that appear dozens or hundreds of times. Frequent non-place-names included any
word that appeared at the beginning of the title, such as “Assessing” and “The”, along with genus
names.
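The latitude zones A to D listed above could be assigned with a function along the following lines. This is a sketch under stated assumptions: the function name `latitude_zone` is hypothetical, the rounded limits of 66° and 23° are taken from the list, and the handling of values that fall exactly on a boundary is a choice made here because the text does not specify it. The remaining zones of the six-zone scheme are not defined in this excerpt, so the function returns `None` south of 23° S instead of guessing.

```python
def latitude_zone(lat):
    """Return the zone letter for a latitude in decimal degrees (north positive).

    Boundary handling is an assumption: exactly 66 and 23 fall in B, 0 falls in C.
    """
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90 degrees")
    if lat > 66:
        return "A"   # Arctic
    if lat >= 23:
        return "B"   # North Temperate
    if lat >= 0:
        return "C"   # North Equator
    if lat >= -23:
        return "D"   # South Equator
    return None      # zones south of 23° S are not defined in this excerpt

for lat in (69.0, 45.0, 10.0, -15.0):
    print(lat, latitude_zone(lat))
```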