Language, Context and Location

Svenja Adolphs

Language and Context
Everyday communication has evolved rapidly over the past decade with the increasing use of digital devices, and techniques for capturing and representing language in context are changing too. Contexts are dynamic. They can be defined as "everything outside the expression itself that is necessary for unambiguous interpretation of [that] expression" (Heylighen and Dewaele 2003: 293).

Although language in context has long been regarded as critically important to linguistic research, conventional corpus linguistic methodology offers only limited facilities for examining this relationship. There is now a need to take account of this change and to develop corpora that include information about context and its dynamic nature.

This presentation discusses some of the ways in which measurements of different aspects of context gathered from multiple sensors (e.g. position, movement and time) may be related to people's use of language, across three stages: design (record), representation and analysis (replay).

DReSS I: Multimodal Corpora

DReSS II: Ubiquitous Corpora

Record: recording of discourse beyond the text
Adding video and audio to basic transcription records in multimodal corpora is a step towards enriching the textual rendering of a discourse event with further contextual data. The development of heterogeneous corpora depends on our ability to record and represent different modes of discourse and different types of contextual information in an integrated manner.

(Re)present
DRS (the Digital Replay System) provides a framework for the organisation and representation of qualitative fieldwork data, supporting synchronisation, transcription, coding, annotation, visualisation, filtering and thick description.
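Synchronisation is what allows a transcript to be lined up with other recorded streams. The following is a minimal Python sketch of the idea, not DRS's own implementation: it assumes every stream has been timestamped against a shared recording start, and the `Utterance` record and `mean_heart_rate` helper are hypothetical names introduced for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Utterance:
    """One transcribed turn, timed in seconds from a shared start point (assumed)."""
    speaker: str
    start: float
    end: float
    text: str


def mean_heart_rate(utt, samples):
    """Average the (time, bpm) sensor samples recorded during an utterance.

    `samples` must be sorted by time. Returns None when no sample falls
    inside [utt.start, utt.end].
    """
    times = [t for t, _ in samples]
    i = bisect_left(times, utt.start)  # first sample at or after the start
    window = []
    while i < len(samples) and samples[i][0] <= utt.end:
        window.append(samples[i][1])
        i += 1
    return sum(window) / len(window) if window else None


# Invented readings, one per second.
samples = [(0.0, 80), (1.0, 82), (2.0, 95), (3.0, 110), (4.0, 120)]
utt = Utterance("S1", 1.5, 3.5, "oh my god")
print(mean_heart_rate(utt, samples))  # 102.5
```

Binary search keeps the lookup cheap for long sensor logs; a real system would also have to handle clock drift between devices, which this sketch ignores.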

VIDEO 23 Replay: DRS

VIDEO 25 Replay: DRS

Thrill
A 55,000-word corpus of fairground discourse, comprising synchronised records of audio, video and sensor (i.e. heart rate) data. It covers 55 participants (mainly recorded in pairs; 19 women and 26 men), with ages ranging from teens to late 50s, and over 11 hours of video.

The data has been transcribed and divided into four key phases: the pre-ride phase, the elevation of the ride, the start of the ride and the ride terminus. The aims are to examine whether any patterns emerge in the specific language used within and across the phases, and to outline and test an appropriate approach to the analysis of heterogeneous data sets for linguistic enquiry.
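Dividing a ride into phases amounts to cutting the shared timeline at three boundary times. A minimal sketch, assuming the boundaries have been identified by hand from the video; the boundary times below are invented and `phase_of` is a hypothetical helper.

```python
from bisect import bisect_right

PHASES = ["pre-ride", "elevation of the ride", "start of the ride", "ride terminus"]


def phase_of(t, boundaries):
    """Return the phase in which time t (seconds) falls.

    `boundaries` holds the sorted start times of phases 2, 3 and 4; a time
    equal to a boundary belongs to the phase that starts there.
    """
    if len(boundaries) != len(PHASES) - 1:
        raise ValueError("expected one boundary per phase after the first")
    return PHASES[bisect_right(boundaries, t)]


boundaries = [120.0, 180.0, 240.0]  # invented boundary times
print(phase_of(10.0, boundaries))   # pre-ride
print(phase_of(200.0, boundaries))  # start of the ride
```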

VIDEO 26 SEGMENT Thrill


(Oh) my God: phase 3
(Oh) my God is used 85 times by 21 different speakers. It occurs most often in phases 2 and 3 of the ride (ride elevation and movement).
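Counts of this kind can be produced by matching the phrase in each phase-labelled utterance. A minimal sketch on invented utterances: the regular expression (an optional "oh", case ignored) is an assumption about how variants were grouped, and real transcripts may need further spellings added.

```python
import re
from collections import Counter

# "my god", optionally preceded by "oh", in any letter case.
PHRASE = re.compile(r"\b(?:oh\s+)?my\s+god\b", re.IGNORECASE)


def phrase_profile(utterances):
    """Return (total hits, number of distinct speakers, hits per phase).

    `utterances` is an iterable of (speaker, phase, text) tuples.
    """
    per_phase = Counter()
    speakers = set()
    total = 0
    for speaker, phase, text in utterances:
        hits = len(PHRASE.findall(text))
        if hits:
            per_phase[phase] += hits
            speakers.add(speaker)
            total += hits
    return total, len(speakers), per_phase


# Invented utterances for illustration only.
data = [
    ("S1", 1, "are we really doing this"),
    ("S2", 2, "oh my god it's so high"),
    ("S1", 2, "My God, look down"),
    ("S2", 3, "oh my god oh my god"),
    ("S3", 4, "that was brilliant"),
]
total, n_speakers, per_phase = phrase_profile(data)
print(total, n_speakers, dict(per_phase))  # 4 2 {2: 2, 3: 2}
```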

Location-based data
Location-based data provides the means for exploring patterns of language use across speakers, modes of interaction (e.g. interaction through computing devices), time and place. This lays the foundations for a better understanding of the importance of contextual features of discourse.

Early efforts used separate recording devices to collect data on the move.

Early visions

Fieldwork Tracker
A bespoke mobile application that creates detailed location-based logs. It was developed to support the capture of fieldwork data for qualitative analysis, providing a cheap and simple multi-function recorder that synchronises data automatically. Studies can be tracked from the user's perspective or the researcher's perspective. Alongside the recorded locations, users can take photographs, make audio and video recordings and write textual notes.

DRS and Fieldwork Tracker
Fieldwork Tracker application

Location data in DRS
DRS supports the creation and analysis of descriptive categories of location (e.g. "in a cafe", "at home"). These can be searched, sorted, filtered and queried using the DRS analysis tools. Logs of measured locations obtained with the Fieldwork Tracker can be assigned descriptive labels in DRS as a form of metadata, allowing the analyst to investigate patterns across a larger and more context-relevant dataset.
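One way to turn measured positions into descriptive categories is to match each logged fix against places the analyst has labelled. A minimal sketch, not how DRS itself assigns labels: the `places` list, the 50 m radius and the coordinates are all invented for illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def label_fix(lat, lon, places, radius_m=50.0):
    """Return the label of the nearest place within radius_m, else None.

    `places` is a list of (label, lat, lon) tuples defined by the analyst.
    """
    best_label, best_dist = None, radius_m
    for label, plat, plon in places:
        d = haversine_m(lat, lon, plat, plon)
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label


# Invented labelled places.
places = [("in a cafe", 52.9500, -1.1500), ("at home", 52.9600, -1.1600)]
print(label_fix(52.95001, -1.15001, places))  # in a cafe
print(label_fix(53.0, -1.0, places))          # None
```

A fixed radius is the simplest choice; GPS error indoors can exceed 50 m, so the threshold would need tuning against real logs.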

VIDEO 2 SEGMENT Location data in DRS

British Art Show

More than 10 hours of transcribed audio data collected from three pairs of visitors (one male-male, one male-female and one female-female pair), capturing: physical movements; interactions focused on planning and logistics; interactions focused on the socially negotiated goal of seeing art; how the visitors plan, negotiate and find each other; and variation in language across changing contexts (home, street, gallery, friends and strangers).

Video clips were recorded by the participants and the researcher, and the participants also took photographs. The British Art Show (BAS) study data was collected using the Fieldwork Tracker application, so it carries all the synchronisation information DRS needs to import an entire Fieldwork Tracker session into a DRS project with one click.

VIDEOS 3, 28, 29, 30 British Art Show

Analysing data
DRS allows users to: generate word frequency lists; run concordance searches over multiple data sources; view specific concordance outputs on a map; add metadata codes to the map, so that data can be queried by searching for co-occurrences of codes and/or lexical items; tabulate coded features; and use coded elements of the map as a means of drilling into the data.
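The first two of these operations can be sketched in a few lines of Python. This is an illustration under simple assumptions (lower-cased tokens made of letters and apostrophes, a fixed window of three words either side), not DRS's own tokeniser or concordancer; the source names and texts are invented, and map display is not modelled.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")


def tokens(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def frequency_list(sources):
    """Word frequencies pooled over all sources, most frequent first.

    `sources` maps a source name (e.g. a recording) to its transcript text.
    """
    counts = Counter()
    for text in sources.values():
        counts.update(tokens(text))
    return counts.most_common()


def concordance(sources, node, width=3):
    """Keyword-in-context lines (source, left, node, right) for `node`."""
    lines = []
    for name, text in sources.items():
        toks = tokens(text)
        for i, tok in enumerate(toks):
            if tok == node:
                left = " ".join(toks[max(0, i - width):i])
                right = " ".join(toks[i + 1:i + 1 + width])
                lines.append((name, left, tok, right))
    return lines


sources = {
    "gallery": "Shall we look at this one? I like this one.",
    "street": "Where are you? I can see the gallery.",
}
print(frequency_list(sources)[:3])  # [('this', 2), ('one', 2), ('i', 2)]
for line in concordance(sources, "i"):
    print(line)
# ('gallery', 'at this one', 'i', 'like this one')
# ('street', 'where are you', 'i', 'can see the')
```

Keeping the source name on every concordance line is what makes it possible to trace each hit back to its recording, and from there to its time and place.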

VIDEO 9 Analysing data

VIDEO 10 Analysing data

Crowdsourcing
Crowdsourcing is a method of gathering large amounts of data contributed by the population at large. People contribute in some way to a large database of information that is made publicly available. For researchers, this gives access to potentially very rich and varied datasets.

The OED provides one of the earliest examples of crowdsourcing. An open call was made to the community for volunteers to index all the words in the English language, with example quotations for each of their usages. Over the 70-year project, more than 6 million submissions were received. With the advent of pervasive technology, crowdsourcing is an increasingly viable approach to data gathering.

Ushahidi is a crowdsourcing website that was used to collect messages from a wide range of individuals following the Haiti earthquake in 2010. Users can send messages to the site to report incidents that occur at a specific time and place; 3,600 incidents were reported on the site. The database is available as a CSV file that anyone can access.
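Because each report carries a time and a place, a CSV export of this kind can be sliced by both. A minimal sketch using Python's `csv` module; the column names, the ISO date format and the rows below are invented for illustration, so the real file's header and date format should be checked first.

```python
import csv
import io
from collections import Counter
from datetime import date

# Invented rows in an assumed layout; not taken from the real export.
SAMPLE = """INCIDENT TITLE,INCIDENT DATE,LOCATION
Collapsed building,2010-01-13,Port-au-Prince
Water needed,2010-01-14,Port-au-Prince
Medical help,2010-01-14,Jacmel
Road blocked,2010-01-20,Jacmel
"""


def reports_per_location(csv_file, start, end,
                         date_field="INCIDENT DATE", place_field="LOCATION"):
    """Count reports per location whose date falls within [start, end].

    Assumes the date column begins with an ISO date (YYYY-MM-DD).
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        day = date.fromisoformat(row[date_field][:10])
        if start <= day <= end:
            counts[row[place_field].strip()] += 1
    return counts


counts = reports_per_location(io.StringIO(SAMPLE), date(2010, 1, 12), date(2010, 1, 15))
print(dict(counts))  # {'Port-au-Prince': 2, 'Jacmel': 1}
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by `open(path, newline="", encoding="utf-8")`, as the `csv` module documentation recommends.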

VIDEO 16 SEGMENT Crowdsourcing

VIDEO 17 SEGMENT Crowdsourcing

Concluding remarks
Developing a better, multifaceted picture of context (Bazzanella, 2002: 239) is an ongoing challenge. It is crucial to the development of better descriptions of language in use and of applications based on those descriptions. The ability to generate more contextually sensitive descriptions of language in use will shed new light on the relationship between form and function.

Access to heterogeneous corpora inevitably requires us to rethink the notion of the unit of analysis in corpus linguistics research. As we develop a better understanding of the co-dependencies between language and context, the unit of analysis may shift from the word or sequence of words to a contextually defined episode of interaction, which may include multiple modes of discourse and which is dynamic in nature.

Ongoing developments in this research space represent a departure from traditional corpus linguistic approaches, but they should strengthen the explanatory power of any results that emerge from the study of large, principled collections of text in context.

Acknowledgements
Research team
The Digital Records for e-Social Science (DReSS) Project is funded by the ESRC.