Goals and outcomes
The National Library of Scotland will soon be publishing a text-based dataset containing the first eight editions of the Encyclopaedia Britannica, from 1771 to the mid-nineteenth century. We will let you know when the dataset is published so you can have a look around and familiarise yourself with it.
In the meantime, you can brush up on your natural language processing skills, or look at the existing information on the collection to begin thinking about interesting questions you can ask of the data.
The analysis of text-based data is becoming an increasingly important part of data science and digital scholarship. Digital scholars are using new techniques in natural language processing to make sense of enormous datasets and gain more knowledge about our past, present and future. By contributing to this challenge, you will be developing solutions that help future scholars work with these datasets in productive ways.
What is the possible impact of a good solution?
A good solution for this challenge will not only give you a foundation for working with historical databases, but may also help us learn more about how education has evolved over the past two centuries.
Expectations and requirements for the solutions and participants
The expected outcome of this challenge is the development of new ways to learn more from text-based historical datasets. This doesn't have to be limited to code that processes the data; it can also be a set of questions to ask of the data, to better understand the information it contains and build the foundations for further analysis.
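To give a sense of what "code that processes the data" could look like, here is a minimal sketch that compares how often a word appears in different texts. It is only an illustration under stated assumptions: the dataset has not been published yet, so its format is unknown, and the sample strings, the labels `edition_a` and `edition_b`, and the helper names `tokenize` and `relative_frequency` are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency(term, text):
    """Return the number of occurrences of `term` per 1,000 tokens of `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[term.lower()] / len(tokens)


# Placeholder snippets for illustration only; these are NOT real
# Encyclopaedia Britannica text, and the real data layout may differ.
sample_editions = {
    "edition_a": "The school teaches grammar. The school teaches rhetoric.",
    "edition_b": "Education in the university and the school.",
}

for label, text in sample_editions.items():
    print(label, round(relative_frequency("school", text), 1))
```

A question-led contribution could start from the same idea without any code, for example by asking which terms related to education become more or less common from one edition to the next.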