The Project | LInfoVis programs | Other resources | Our presentations | Partners
Comparison Arcs: Proof of concept of a way to compare linguistic properties of multiple documents by means of graphical displays.
Corpus Clouds: A novel interface to a corpus query engine, with the aim of aiding the user in exploratory search by providing, in an easy-to-comprehend visual form, information about the frequency and distribution of the search results in combination with the standard KWIC elements.
Double Tree: A visualization component targeted to support the exploratory analysis of concordances. It provides a new way to display a KWIC as a double-sided tree that can be dynamically expanded and browsed.
End to End: A visualization targeted to the exploratory analysis of corpora, with special focus on analyzing collocations in context. Occurrences of two search terms and their intermediate context are represented as networks.
Extended Linguistic Dependency Diagrams: A visualization tool specialized for the graphical presentation of dependency structures and the dynamic interaction with these visualizations.
interHist: A visualization targeted to the exploratory analysis of concordance data. It gives an interactive overview of statistically enhanced query structures by providing an intermediate layer between the search query, as defined by the user, and the KWIC results and frequencies derived from the corpus.
Structured Parallel Coordinates: Specialization of the Parallel Coordinates visualization (cf. e.g. Inselberg 2009) designed for presentation and analysis of structured multidimensional data.
Word Clouds: Simple example of using Word Clouds to display linguistic information.
Comparison Arcs is a proof of concept of a way to compare linguistic properties of multiple documents by means of graphical displays. In this version we compare the first chapter of Pinocchio in English and in Italian. It is possible to search both texts in parallel for words, lemmas, and parts of speech. For both texts the results are shown as arcs linking successive occurrences of the units found within a text. By looking at the diagrams for comparable searches in English and Italian (e.g. looking for "wood" and "legno" or "I" and "io"), we can get a quick idea of the distribution of the words in the texts.
While this program is merely a proof of concept, we have many ideas for how it could be extended. For example, the searches could be more complex or even linked to a corpus query engine. Different numbers and types of texts could also be compared.
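The following Python sketch gives a rough idea of how such arcs could be computed. It finds every occurrence of a search word in a tokenized text and links each occurrence to the next one. It assumes plain word matching on pre-tokenized text, whereas the program also searches lemmas and parts of speech. The function names and sample sentences are hypothetical.

```python
from typing import List, Tuple

def occurrence_arcs(tokens: List[str], target: str) -> List[Tuple[int, int]]:
    """Return (start, end) token-index pairs linking each occurrence of
    `target` to the next one (case-insensitive word match; hypothetical helper)."""
    positions = [i for i, tok in enumerate(tokens) if tok.lower() == target.lower()]
    # Pair each occurrence with its successor: [p0, p1, p2] -> [(p0, p1), (p1, p2)]
    return list(zip(positions, positions[1:]))

def relative_arcs(tokens: List[str], target: str) -> List[Tuple[float, float]]:
    """Scale arc endpoints to [0, 1] so texts of different lengths
    (e.g. an English and an Italian chapter) can be drawn side by side."""
    n = len(tokens)
    return [(a / n, b / n) for a, b in occurrence_arcs(tokens, target)]

english = "a piece of wood . the wood was plain wood".split()
italian = "un pezzo di legno . il legno era semplice".split()
print(occurrence_arcs(english, "wood"))   # [(3, 6), (6, 9)]
print(occurrence_arcs(italian, "legno"))  # [(3, 6)]
```

The hypothetical `relative_arcs` normalizes positions by text length. This is one way to keep arcs from a translation pair comparable even when the two texts differ in length.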
We are looking for partners. We would like to develop Comparison Arcs into applications for specific uses, e.g. in terminology, language teaching, literary analysis, etc. If you see how something like Comparison Arcs could help you, let us know and let's see what we can do together.
Try Comparison Arcs (needs at least Java 1.5 Web Start).
Corpus Clouds is a novel interface to a corpus query engine, with the aim of aiding the user in exploratory search by providing, in an easy-to-comprehend visual form, information about the frequency and distribution of the search results in combination with the standard KWIC elements. In this demo, we use the Corpus Query Processor (CQP) from Corpus Workbench (IMS Stuttgart University), but it could be adapted for other corpus query engines that provide the same relevant functionalities.
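Here is a sketch in Python of how KWIC lines can be paired with frequency information. It is a simplified stand-in that does plain token matching over an in-memory corpus. It neither uses nor imitates CQP or its query language. The function names, the per-1,000-token normalization and the sample documents are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

def kwic(tokens: List[str], query: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Standard KWIC lines: (left context, keyword, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == query.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def distribution(corpus: Dict[str, List[str]], query: str) -> Dict[str, float]:
    """Hits per 1,000 tokens in each document: a frequency/distribution summary
    that could be visualized alongside the KWIC lines (hypothetical measure)."""
    result = {}
    for doc_id, tokens in corpus.items():
        hits = sum(1 for t in tokens if t.lower() == query.lower())
        result[doc_id] = 1000 * hits / len(tokens) if tokens else 0.0
    return result

corpus = {
    "pr1": "the new bridge opens today the bridge".split(),
    "pr2": "no news today".split(),
}
print(kwic(corpus["pr1"], "bridge", width=2))
print(distribution(corpus, "bridge"))
```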
Download a demo version of the program (needs at least Java 1.5)
This fully functional demo version provides access to two small corpora of press releases from EURAC. To run the demo, unzip the archive to a local directory, then double-click the file CorpusClouds.jar.
C. Culy & V. Lyding. 2009. "Corpus Clouds - facilitating text analysis by means of visualizations". In: Proceedings of the 4th Language & Technology Conference, November 6-8, 2009, Poznan, Poland, pp. 521-525.
Double Tree (version 1.4, 24 August 2011)
Double Tree is a visualization component targeted to support exploratory corpus analysis, with particular focus on analyzing concordances. It shows a new representation of a KWIC for a single word, by collapsing the contexts into a double-sided tree. Each side can be expanded independently (but only one path per side at a time), thus allowing for dynamic browsing of the results with respect to the context on both sides of the word.
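Collapsing KWIC contexts into a double-sided tree can be sketched in Python as two prefix trees built outward from the keyword: one for the right contexts and one for the reversed left contexts. Each node counts how many concordance lines pass through it. This is only an illustration of the idea. The nested-dictionary layout and the names `build_tree`, `double_tree` and `expand` are hypothetical and do not come from the Double Tree code.

```python
from typing import Dict, List, Tuple

KwicLine = Tuple[List[str], str, List[str]]  # (left context, keyword, right context)

def new_node() -> Dict:
    return {"count": 0, "children": {}}

def build_tree(contexts: List[List[str]]) -> Dict:
    """Collapse token sequences into a prefix tree; each node counts
    how many KWIC lines pass through it."""
    root = new_node()
    for ctx in contexts:
        root["count"] += 1
        node = root
        for tok in ctx:
            node = node["children"].setdefault(tok, new_node())
            node["count"] += 1
    return root

def double_tree(lines: List[KwicLine], depth: int = 3) -> Tuple[Dict, Dict]:
    """Build left and right trees around the keyword. The left context is
    reversed so that both trees grow outward from the keyword."""
    left = build_tree([list(reversed(lc))[:depth] for lc, _, _ in lines])
    right = build_tree([rc[:depth] for _, _, rc in lines])
    return left, right

def expand(tree: Dict, path: List[str]) -> Dict[str, int]:
    """Follow one path from the keyword and return the next branches with counts."""
    node = tree
    for tok in path:
        node = node["children"][tok]
    return {tok: child["count"] for tok, child in node["children"].items()}

lines = [
    (["the", "old"], "house", ["was", "empty"]),
    (["an", "old"], "house", ["was", "sold"]),
    (["a", "new"], "house", ["is", "ready"]),
]
left, right = double_tree(lines)
print(expand(left, []))        # {'old': 2, 'new': 1}
print(expand(right, ["was"]))  # {'empty': 1, 'sold': 1}
```

Passing a single `path` to `expand` mirrors the "one path per side at a time" browsing described above.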
Double Tree is released as open source software under the new BSD license. Download Double Tree 1.4, with two live demos (Requires Java 6).
One demo uses two small corpora of press releases in German and Italian from EURAC. The search functionality of that demo is currently based on the Corpus Query Processor (CQP) from Corpus Workbench (IMS Stuttgart University).
End to End
End to End is a visualization component targeted to exploratory corpus analysis with special focus on analyzing collocations. Starting from two search words, End to End constructs and displays a network of all the contexts that occur between these two words in a corpus. The interface allows for interactive zooming and browsing of the graph.
Unfortunately, we cannot currently distribute End to End due to licensing issues, though we hope to remedy this in the near future. In the meantime, you can view Screenshots of End to End.
The demo version uses a small corpus of German press releases from EURAC. As the search engine, we currently use the Corpus Query Processor (CQP) from Corpus Workbench (IMS Stuttgart University).
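The sketch below shows one plausible way, in Python, to derive such a network. It collects the token spans that run from an occurrence of the first search word to the next occurrence of the second word, then merges those spans into a weighted word graph. Shared intermediate words become shared nodes. The gap limit `max_gap`, the function names and the sample sentence are assumptions, not details of End to End itself.

```python
from collections import Counter
from typing import List

def between_contexts(tokens: List[str], start: str, end: str, max_gap: int = 5) -> List[List[str]]:
    """Spans from each occurrence of `start` to the next `end`, allowing at most
    `max_gap` intervening tokens; both search words are kept in the span."""
    paths = []
    for i, tok in enumerate(tokens):
        if tok != start:
            continue
        for j in range(i + 1, min(len(tokens), i + 2 + max_gap)):
            if tokens[j] == end:
                paths.append(tokens[i:j + 1])
                break
    return paths

def context_network(paths: List[List[str]]) -> Counter:
    """Edge weights of the word network: each adjacent token pair on a span is an edge."""
    edges = Counter()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return edges

tokens = "from the city to the coast and from the port to the sea".split()
paths = between_contexts(tokens, "from", "to")
net = context_network(paths)
print(paths)               # [['from', 'the', 'city', 'to'], ['from', 'the', 'port', 'to']]
print(net[("from", "the")])  # 2
```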
Extended Linguistic Dependency Diagrams (xLDDs) (version 1.2, 18 August 2011)
Extended Linguistic Dependency Diagrams (xLDDs) are a visualization tool specialized for the graphical presentation of dependency structures and the dynamic interaction with these visualizations. xLDDs accommodate different types of linguistic information that are provided with dependency structures and are designed for a linguistically oriented audience, ranging from experts in dependency linguistics to language teachers and learners concerned with dependencies.
In designing xLDDs we wanted to provide a tool that can easily be customized and integrated into targeted applications. To illustrate some of xLDD's various features and interaction possibilities, we are presenting three sample applications that give the user the power to define and alter the visual presentation of the information, as well as two further examples. (NOTE: Due to limitations of the Protovis toolkit that we use, the applications will not work in older versions of Internet Explorer (version 7 and earlier); newer versions have not been tested.) You can try out the sample applications here:
a basic application for English and German data as reanalyzed by By (2008, 2009), with two predefined visual encodings
an advanced application for the German TiGerDB data in DECCA format (Boyd et al., 2007) that provides a number of predefined encoding options that the user can choose from
a fully customizable application for data from the Italian PAISÀ corpus, which is integrated with a corpus query system
an example of how to build diagrams interactively, for example for student exercises
an example of how to show functional, expandable thumbnail diagrams along with search views
The applications are also included in the download.
Extended Linguistic Dependency Diagrams are licensed under the new BSD license (see any of the files for details). While it is research grade software, we hope that you find it useful. Please send any feedback or comments to: firstname.lastname@example.org.
Download Extended Linguistic Dependency Diagrams (500 KB).
If you publish work based on Extended Linguistic Dependency Diagrams please cite the following reference:
Culy, C., Lyding, V., and Dittmann, H. (to appear). "xLDD: Extended Linguistic Dependency Diagrams". In Proceedings of the 15th International Conference on Information Visualisation IV2011, 12, 13 - 15 July 2011, University of London, UK.
We have written converters for several dependency structure formats, so far including:
the CoNLL 2007 Dependency Parsing format
Decca-XML (Boyd et al., 2007)
the format used by By (2008; 2009)
It would be easy to write converters for other formats as well, given the simplicity of JSDS.
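Here is a Python sketch of a converter for the CoNLL 2007 format. That format uses one token per line with 10 tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), separates sentences with blank lines, and uses HEAD 0 for the root. The JSDS format is not described in this section. The output therefore uses a hypothetical "nodes"/"arcs" layout, which should not be read as the actual JSDS schema.

```python
import json
import re
from typing import Dict, List

# Column order of the CoNLL 2007 (CoNLL-X style) dependency format.
CONLL_COLUMNS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]

def conll_to_dicts(text: str) -> List[Dict]:
    """Convert CoNLL 2007 text into one dict per sentence. The 'nodes'/'arcs'
    layout is hypothetical and is NOT the actual JSDS schema."""
    sentences = []
    for block in re.split(r"\n\s*\n", text.strip()):  # blank lines separate sentences
        nodes, arcs = [], []
        for line in block.splitlines():
            row = dict(zip(CONLL_COLUMNS, line.split("\t")))
            tok_id, head = int(row["id"]), int(row["head"])
            nodes.append({"id": tok_id, "form": row["form"], "pos": row["cpostag"]})
            if head != 0:  # HEAD 0 marks the root token, which has no incoming arc
                arcs.append({"from": head, "to": tok_id, "label": row["deprel"]})
        sentences.append({"nodes": nodes, "arcs": arcs})
    return sentences

sample = ("1\tJohn\tJohn\tN\tNNP\t_\t2\tSBJ\t_\t_\n"
          "2\tsleeps\tsleep\tV\tVBZ\t_\t0\tROOT\t_\t_\n")
print(json.dumps(conll_to_dicts(sample)))
```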
xLDD proper contains an extensible component for visual encoding and interaction. This component provides both the fundamental elements of the visualization (arcs, nodes, text labels, etc.) and default visual encodings and interactions. These extensive defaults can be overridden programmatically, either to provide new, complex functionality or to allow dynamic user customization, as shown in the applications presented above.
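The general pattern of overridable defaults can be illustrated in Python. Each visual property is computed by a default function, and an application or user replaces only the functions it needs. This shows the design idea only. xLDD itself is built on Protovis (JavaScript), and the property names and helpers here are hypothetical, not its API.

```python
from typing import Any, Callable, Dict, Optional

Encoding = Dict[str, Callable[[Dict[str, Any]], Any]]

# Hypothetical defaults: each entry maps a node or arc record to a visual property.
DEFAULT_ENCODING: Encoding = {
    "node_label": lambda node: node["form"],
    "node_color": lambda node: "black",
    "arc_label": lambda arc: arc["label"],
    "arc_width": lambda arc: 1,
}

def make_encoding(overrides: Optional[Encoding] = None) -> Encoding:
    """Merge overrides over the defaults; anything not overridden keeps its default."""
    encoding = dict(DEFAULT_ENCODING)
    encoding.update(overrides or {})
    return encoding

# An application might color nouns differently and emphasize subject arcs.
custom = make_encoding({
    "node_color": lambda node: "blue" if node.get("pos") == "N" else "black",
    "arc_width": lambda arc: 3 if arc["label"] == "SBJ" else 1,
})
print(custom["node_color"]({"form": "casa", "pos": "N"}))  # blue
```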
interHist (version 1.0, 8 May 2013)
interHist is a visualization prototype specialized for the interactive exploration of concordances returned as results of corpus queries. It provides an intermediate layer between a search query, as specified by the user, and the KWIC results and corresponding frequencies derived from a corpus. By abstracting over word sequences based on closed-category attributes such as parts of speech, it creates a visualization that accommodates large result sets in one concise display. Its interactive functions let the user explore the results dynamically, moving from this overview level down to the KWIC results.
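The abstraction step can be sketched in Python. Each match is grouped by its sequence of part-of-speech tags, which gives the pattern frequencies for the overview level. The word strings are kept under each pattern so that the user can drill down to them. The tagset, the function name and the Italian sample matches are assumptions for illustration and do not reflect the interHist data format.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); tags here are hypothetical

def pattern_overview(matches: List[List[Token]]):
    """Group query matches by their POS sequence. Returns pattern frequencies
    (overview level) and the word strings behind each pattern (drill-down)."""
    freqs: Counter = Counter()
    examples: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for match in matches:
        pattern = tuple(pos for _, pos in match)
        freqs[pattern] += 1
        examples[pattern].append(" ".join(word for word, _ in match))
    return freqs, examples

matches = [
    [("il", "DET"), ("gatto", "NOUN")],
    [("la", "DET"), ("casa", "NOUN"), ("bianca", "ADJ")],
    [("un", "DET"), ("cane", "NOUN")],
]
freqs, examples = pattern_overview(matches)
print(freqs.most_common())  # [(('DET', 'NOUN'), 2), (('DET', 'NOUN', 'ADJ'), 1)]
print(examples[("DET", "NOUN")])  # ['il gatto', 'un cane']
```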
To illustrate interHist's features and interaction possibilities, we are presenting a sample application for the analysis of Italian noun phrases:
interHist is licensed under the new BSD license (see demo file for details). While it is research grade software, we hope that you find it useful. Please send any feedback or comments to:
Download interHist (107 KB).
If you publish work based on interHist please cite the following reference:
- Lyding, V. / Nicolas, L. / Stemle, E. (2013): "interHist - an interactive visualization for statistically enhanced query structures", presented at the DGfS 2013 Workshop on the Visualization of Linguistic Patterns, 35. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft (DGfS), 13 March 2013.
Structured Parallel Coordinates (version 1.2, 15 February 2012)
In designing Structured Parallel Coordinates, we wanted to make it easy to extend and reuse. To illustrate those possibilities, we have constructed several examples, some of which we and/or colleagues use in our work. The examples show Structured Parallel Coordinates alone and as part of a broader process of data exploration, connected in particular with corpus queries. Rank Comparison shows how to extend Structured Parallel Coordinates to handle a particular type of data, namely sets of rank orderings (e.g. the most popular songs by week over the last year, or the nouns most commonly used with a number of different adjectives).
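The Python sketch below shows one way to reshape sets of rank orderings for a parallel-coordinates display. It produces one row per item with one axis per ranking. Items that are missing from a ranking get no value on that axis. The function name, the use of `None` for missing ranks and the adjective/noun sample data are assumptions, not the Rank Comparison implementation.

```python
from typing import Dict, List, Optional

def rank_table(orderings: Dict[str, List[str]]) -> Dict[str, List[Optional[int]]]:
    """One row per item, one column (axis) per ranked list. Ranks start at 1;
    None marks an item absent from a list."""
    axes = list(orderings)
    # Rank of each item within each ordering, e.g. {"big": {"house": 1, ...}, ...}
    positions = {a: {item: r for r, item in enumerate(ranking, start=1)}
                 for a, ranking in orderings.items()}
    # All items, in order of first appearance across the orderings
    items = list(dict.fromkeys(item for ranking in orderings.values() for item in ranking))
    return {item: [positions[a].get(item) for a in axes] for item in items}

# Nouns most commonly used with two adjectives (made-up sample data)
orderings = {
    "big": ["house", "dog", "city"],
    "old": ["man", "house", "city"],
}
print(rank_table(orderings))
# {'house': [1, 2], 'dog': [2, None], 'city': [3, 3], 'man': [None, 1]}
```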
You can try out the examples here (they are also included in the download). (NOTE: These will not work in versions of Internet Explorer earlier than IE 9; IE 9 itself has not been tested.)
Structured Parallel Coordinates is licensed under the new BSD license (see any of the files for details). While it is research grade software, we hope that you find it useful. Please send any feedback or comments to: email@example.com.
Download Structured Parallel Coordinates (327 KB).
If you publish work based on Structured Parallel Coordinates please cite the following reference:
- Culy, C., Lyding, V., and Dittmann, H. (2011). "Structured Parallel Coordinates: a visualization for analyzing structured language data" to appear in Proceedings of the 3rd International Conference on Corpus Linguistics, CILC-11, April 6-9, 2011, Valencia, Spain, 485-493.
Structured Parallel Coordinates (SPC) uses the Protovis toolkit, and is in fact a greatly reworked version of the Parallel Coordinates example on the Protovis web site. SPC has been tested with Firefox, Opera, and Safari on Windows, and with Firefox and Safari on OS X. Protovis does not currently work with many (possibly most) versions of Internet Explorer, so SPC does not either.