The Bergamo Unit is currently compiling the following two corpora:
1. CADIS (Corpus of Academic English)
To aid the analysis of variation in intercultural communication and the identification of textual variants arising from the use of English as a first language, a second language, or the lingua franca of the scientific community, we have devised a corpus of English (and in part Italian) texts for academic communication, produced by scholars and academic institutions in various parts of the world.
CADIS thus enables researchers to analyse the most significant macro- and microlinguistic variants in terms of identity, evaluation and interpretation in the light of recent linguistic scholarship. More specifically, the data allow an in-depth analysis of the following aspects:
Besides including two languages and representing both native and non-native speakers, CADIS also covers four different disciplinary areas:
For each disciplinary area, four different textual genres have been considered:
At the time of writing (July 2011), CADIS comprises 2,761 academic texts totalling about 12 million tokens, selected and classified by disciplinary area, genre, language, year of publication and source journal.
Project leader: Prof. Maurizio Gotti
2. 19CSC (Corpus of 19th-Century Scottish Correspondence)
The Corpus of 19th-Century Scottish Correspondence (19CSC) aims to be a 'second-generation corpus', i.e. one allowing clearly defined, focused studies in which scholars can concentrate on relatively few authentic (rather than edited) texts, in order to highlight the specificities of linguistic traits without the interfering 'noise' created by editorial choices. At the time of writing (July 2011), 19CSC comprises ca. 400 letters (including drafts, fair copies and copies, equally distributed between familiar and business letters), for a total of ca. 110,000 orthographic units. All the letters have been diplomatically transcribed from authentic manuscripts written, for different purposes, by (or on behalf of) men and women of varying ages and levels of education.
The corpus lends itself to investigations of early business English, historical pragmatics and 'language history from below', as it includes a significant section of emigrants' letters sent from the US, Canada, India and Australia. This branch of the corpus is supplemented with (partial) transcriptions of emigrants' diaries and reports (not necessarily in manuscript form).
The corpus is expected to interact fruitfully with similar collections aiming to facilitate the study of Late Modern English, especially as far as Scottish and American English are concerned: see the Corpus of Modern Scottish Writing (CMSW, http://www.scottishcorpus.ac.uk/cmsw/, based at the University of Glasgow) and the Corpus of Historical American English (COHA, http://corpus.byu.edu/coha/, based at Brigham Young University).
Project leader: Prof. Marina Dossena