Sources

Source of Data

Dyen, I., Kruskal, J., and Black, P. (1992). An indoeuropean classification: A lexicostatistical experiment. Transactions of the American Philosophical Society. 82, (5).

How to Download Yourself

You can download Dyen, Kruskal and Black’s (1992) dataset from github user jorgeklz’s public mstknnclust repository. Here’s how you can access the data to work with it yourself:

  1. > install.packages(“devtools”)

  2. > library(usethis)

  3. > library(devtools)

  4. > install_github(“jorgeklz/mstknnclust”)

    1. Note: If this doesn't work then look around on Github for other versions of mstknnclust
  5. > library(mstknnclust)

  6. > data(“dslanguages”)

  7. > dslanguages

    1. Running this should give you names of languages and numbers (in decimal form) representing their lexical similarity (how similar their words are to each other)
  8. > view(dslanguages)

    1. This will create a window in the top left of RStudio that lets you see all the data at once, letting you know if you have downloaded it correctly.