Multilingual Zotero

A wider world through closer collaboration

document version 1.00a52

25 March 2011

Author of this project
  • Frank G. Bennett, Jr.
Note: The information in this page is somewhat dated. A documentation update is in the works, and will appear on the CitationStylist site when ready. RSS users may wish to follow the site feed to stay abreast of project announcements.
User interface translations
  • Tatsuki Sugiura
  • Sras Hem
Implementation testing
  • Avram Lyon
  • Florian Ziche
  • Manabu Matsunaka
  • Stephan De Spiegeleire
  • Olga Zelinska

Feedback and suggestions

  • Dan Cohen
  • Bruce D'Arcus
  • Claudia Ishikawa
  • Shigeru Kagayama
  • Shoji Kajita
  • Kumiko Kanamura
  • Simon Kornblith
  • Dan Stillman
  • "Ace Strong"
  • Sean Takats
  • Rintze Zelle
  • + numerous users on the Zotero forums


Preface

A warm welcome to Multilingual Zotero (MLZ), an experimental variant of the widely distributed Zotero reference management tool. This document provides a brief overview of the project, and an outline of the features special to the Multilingual version.

Multilingual Zotero is a development branch (a variant) of the main Zotero program. It adds the ability to attach transliterations and translations of names, titles and other fields to Zotero records. Once these variants have been added to a record, citations can be constructed from them (using, for example, the romanized form of Japanese) with standard Zotero styles. Bibliographies, citations, and listings in Zotero itself can also be sorted on a field variant (such as the phonetic transliteration of Chinese text); and records containing multilingual data can be exported and imported, for exchange between database systems. If you work with multiple languages, you will want this tool.

This is an experiment, and you should install it only if you are diligent about backing up your data, and are prepared to deal with the occasional problems that arise with software under active development. That said, the problems are getting smaller, and with regular backups the program can be used (and has been used) for real-world projects.

The ultimate aim of this project is to see the incorporation of multilingual functionality into mainstream Zotero itself. This will take time, but the core developers have been sympathetic (the code of the multilingual branch is hosted on Zotero's own servers), and the prospects for seeing the functionality described here in a stable Zotero release further down the road are very good indeed. So if you choose to wait, you will not wait in vain.

Frank Bennett @ Nagoya


Installation

MLZ is a drop-in replacement for the standard Zotero client, but if you have an existing database that you use for important projects, it should be installed in a separate profile. The steps are as follows:

Install Firefox
MLZ is compatible with both Firefox 3.6 and Firefox 4.0.
Set up a separate profile for testing
MLZ will not alter your existing Zotero data, and it is possible to switch back to Zotero 2.1 after running the multilingual client; but this is experimental software, and for the safety of your data, it is strongly recommended that MLZ be run in its own Firefox profile, separate from the one that contains your main database. Instructions on how to set up a separate profile are provided in the Firefox documentation.
Install the multilingual Zotero plugin
The multilingual client can be installed by clicking on this link after starting Firefox in your new profile.
Install a word processor plugin
To insert automatically formatted citations and bibliographies into word processor documents, the appropriate word processor plugin must also be installed in Firefox. Multilingual Zotero uses the same plugins as Zotero 2.1. These can be found on the Zotero website under the heading Word Processor Plug-ins for Zotero 2.1.

Once the client and word processor plugin are installed, you are ready to go.

Features

This overview assumes that the reader is familiar with the principal features of mainstream Zotero itself. Please refer to the main Zotero website for screencasts, third-party guides (including offerings in Chinese, Danish and French), user forums and other sources of information.

There is also a set of homegrown screencasts that cover features specific to MLZ:

Defining languages and variants

Multilingual support in MLZ is built around the IANA Language Subtag Registry, a consolidated list of languages, regions and script variants maintained pursuant to RFC 5646, an Internet standards document that sets down rules for maintenance of the Registry, and provides guidelines on the proper usage of the language tags defined in it. Language tags built in conformance with RFC 5646 allow the form of entries to be expressed concisely, uniformly, and with a high degree of precision.

The Subtag Registry contains over 4,000 entries, which may be combined to specify a bewildering variety of language variants. This is rather more choice than is needed by any specific researcher or project, and for usability, the MLZ user interface shows only those tags that have been specifically enabled. Language tags are enabled automatically as required when multilingual data is imported into the MLZ database. Language tags can also be defined and managed explicitly through the Zotero Preferences panel.
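As an illustration of how such tags are structured, the following Python sketch splits a tag into its language, script, region and variant subtags. It is a simplified reading of the RFC 5646 grammar, not the parser used by MLZ: the function name parse_language_tag is hypothetical, extension, private-use and grandfathered tags are not handled, and subtags are not checked against the Registry.

```python
import re

# Simplified pattern: language, optional script, optional region, and any
# number of variants. Extensions, private-use and grandfathered tags from
# RFC 5646 are deliberately left out of this sketch.
TAG_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def parse_language_tag(tag):
    """Split a language tag into its subtags (hypothetical helper)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError("not a tag this simplified sketch can parse: %r" % tag)
    script = match.group("script")
    region = match.group("region")
    # The variants group looks like "-alalc97" or "", so split on hyphens
    # and drop the empty leading element.
    variants = [v.lower() for v in match.group("variants").split("-") if v]
    return {
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": variants,
    }


if __name__ == "__main__":
    for example in ("ja", "ja-alalc97", "zh-Latn-pinyin", "de-DE"):
        print(example, parse_language_tag(example))
```

The pattern relies on subtag length and position alone to tell a script (four letters) from a region (two letters or three digits) or a variant (five to eight characters, or four starting with a digit), which is the convention RFC 5646 itself follows.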

Setting language preferences

To access the language preferences panel, click on the cog (gear) icon, and open the Preferences panel in the usual way. MLZ offers a Languages tab in the panel; clicking on it will open the language preferences pane.

Figure 1 shows a pane from a fresh installation of MLZ, in which no languages have yet been defined.

zotero-multi-graphics/language-preferences-empty-600.png

Figure 1: Language preferences panel

Defining a language

To define a language, type a few characters of its name in the text box (the one with grayed out text reading Add a language), and select it from the drop-down menu that appears as you type. You can define as many languages as you like, and they can be deleted at any time, without adverse effects on your data. If you happen to delete a language from the Preferences panel that is used by an item in your database, the language will be re-added to your Preferences automatically the next time the item is viewed.

zotero-multi-graphics/language-preferences-pulldown-600.png

Figure 2: Selecting a language

Custom labels

After defining a language tag, you can give it a familiar name. Names can be entered in any script or language that is supported by your computer. In Figure 3, a native-language label is being added for Japanese. The custom label will be finalized (like the existing ones shown for German and Spanish) when the Enter key is pressed.

zotero-multi-graphics/language-preferences-label-600.png

Figure 3: Editing a language label

For each defined language, a set of tick-boxes appears under the twin headings User interface and Citations and bibliographies. The function of the tick-boxes is largely self-explanatory, but illustrations of their use will be given later in this overview.

Extending entries with language subtags

Primary language tags (such as de, es, or ja) can be extended with script, region, or variant subtags. For example, if we wish to add romanized Japanese text to names and titles in our database, we would extend the ja tag to create an additional language tag with the meaning Japanese text written in roman script using, say, the romanization system adopted by the U.S. Library of Congress.

To extend the Japanese primary tag, we click on the + button to the right of the tag, revealing the selection menu shown in Figure 4. Selecting the item for ALA-LC Romanization, 1997 edition will add a new tag with the value ja-alalc97.

zotero-multi-graphics/language-preferences-subtag-600.png

Figure 4: Adding a subtag to a language

The label for the newly added tag can then be edited in the usual way, giving it an easily recognized, human-friendly name such as Roman (ja).

zotero-multi-graphics/language-preferences-finished-600.png

Figure 5: Language preferences with a finished language subtag
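The composition of an extended tag from a primary tag and its subtags can be sketched in a few lines of Python. This is only an illustration of the tag format: the function name extend_tag is hypothetical, the sketch does no validation against the Subtag Registry, and it says nothing about how MLZ itself stores or builds tags.

```python
def extend_tag(primary, script=None, region=None, variant=None):
    """Join a primary language tag with optional subtags (hypothetical helper).

    Subtags are emitted in the order language, script, region, variant,
    using the conventional casing (ja, Latn, JP, alalc97).
    """
    parts = [primary.lower()]
    if script:
        parts.append(script.title())
    if region:
        parts.append(region.upper())
    if variant:
        parts.append(variant.lower())
    return "-".join(parts)


if __name__ == "__main__":
    # The example from the text: Japanese in ALA-LC romanization.
    print(extend_tag("ja", variant="alalc97"))
```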

Translation versus transliteration

It is worth stressing the importance of distinguishing between translation (into another language) and transliteration (of the same language into another script). A romanized Japanese name, for example, should not be tagged as en (English), but as ja-alalc97 or ja-hepburn (that is, Japanese text transliterated according to the rules followed by the U.S. Library of Congress, or Japanese text transliterated using a system that falls within the rough category of Hepburn transliteration rules).

As will become clear below, this distinction is important when generating citations. If romanized names are incorrectly tagged as English, it becomes impossible to distinguish between these transliterated forms (which may be used to replace the native-language form) and English translations of titles (which are used as supplementary information only, and should not replace the original title).
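The practical effect of the distinction can be sketched as follows. The example assumes that a field's variants are held in a plain dictionary keyed by language tag, and that a translation is shown in square brackets after the main form; both are assumptions made for illustration, as are the function names and the sample title, and none of this reflects MLZ's actual data model or any particular citation style.

```python
def is_transliteration(tag, original_lang):
    """A variant is a transliteration if it shares the original's primary subtag."""
    return tag != original_lang and tag.split("-")[0] == original_lang.split("-")[0]


def render_title(variants, original_lang, prefer=None, supplement=None):
    """Build a title string from tagged variants (hypothetical helper).

    prefer:     tag of a transliteration that may REPLACE the original form
    supplement: tag of a translation that is only APPENDED to the main form
    """
    main = variants[original_lang]
    if prefer and prefer in variants and is_transliteration(prefer, original_lang):
        main = variants[prefer]
    translation = variants.get(supplement) if supplement else None
    if translation:
        return "%s [%s]" % (main, translation)
    return main


if __name__ == "__main__":
    # Sample data invented for this sketch.
    title = {
        "ja": "民法総則",
        "ja-alalc97": "Minpō sōsoku",
        "en": "General Provisions of the Civil Code",
    }
    print(render_title(title, "ja", prefer="ja-alalc97", supplement="en"))
    # A form tagged as English is never allowed to replace the original.
    print(render_title(title, "ja", prefer="en"))
```

If the romanized form were tagged as en, the check in is_transliteration would fail, and the code would have no way to tell it apart from a genuine English translation.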

Adding variants

Once a set of language tags has been registered in the preferences panel, entries for the defined languages can be added to item content. The steps for adding entries and changing field languages are the same for creators (Author, Editor, etc.) and for ordinary fields that are multilingual-aware, with a very slight variation in the placement of menus. The steps for deletion are different for the two types of fields, but the differences are intuitive and largely self-explanatory.

Creators

Creators are added from a right-click context menu on the creator type label. After creation, language tags can be edited from a left-click context menu, and entries can be deleted in the usual way, by clicking on the - button to the right of each entry. Deletion of the main creator is blocked on creators that have multilingual entries, to prevent accidental loss of data.

Adding an entry

A right-click over the creator type label reveals the Add Tag context menu. The menu shows only the language tags that have not yet been added to this creator.

zotero-multi-graphics/creators-add-600.png

Figure 6: Adding a multilingual creator entry.

Changing a language

To change the language (in our example, I mistakenly entered Japanese text instead of German), use the left-click context menu from the multilingual language label.

zotero-multi-graphics/creators-edit-600.png

Figure 7: Changing the language of a multilingual creator entry.

Deleting an entry

Multilingual creator entries can be deleted in the usual way.

zotero-multi-graphics/creators-delete-600.png

Figure 8: Deleting a multilingual creator entry.

Ordinary fields

Ordinary fields that are enabled for multilingual data work in the same way as creators. Fields with multilingual support can be identified by hovering the cursor over the field label, which reveals a thin blue outline. A multilingual field can be removed simply by removing its content.

Adding an entry

Entries can be added to a field using the right-click menu from the fieldname label.

zotero-multi-graphics/fields-add-600.png

Figure 9: Adding a multilingual field.

Changing a language

The language of a multilingual field can be changed in the same way as for multilingual creators, by left-clicking over the language label.

zotero-multi-graphics/fields-edit-600.png

Figure 10: Changing the language on a multilingual field.

Deleting an entry

To delete a multilingual field, just delete its content and it will disappear.

zotero-multi-graphics/fields-delete-600.png

Figure 11: Deleting a multilingual field.

Setting a language on headline entries

While not strictly necessary for generating multilingual citations, users ...

Creators

Pass.

Future Development

Pass.

MARC support

Pass.

Possibilities

The advent of full support for multilingual reference management opens up (literally) a world of possibilities, the extent of which will only become clear in the light of experience with such tools. A couple of immediate thoughts are offered below.

International collaboration

There is a network effect in shared reference archives. Multilingual reference archives and collections that are clean, comprehensive, easily extendable by their users, and tied directly to authoring tools will dramatically lower the barriers to collaboration across language divides. What was once a cumbersome and stovepiped process of document swapping and serial revision will give way, over time, to real-time collaboration on shared documents using shared data.

The infrastructure for such a world is yet incomplete, but as researchers explore the possibilities of this new space, the gap in research efficiency between monolingual and multilingual teams will inevitably begin to close.

With Multilingual Zotero now in place, the next target for close collaboration between authors working over a distance (a frequent requirement in cross-language collaborations) will be real-time, citation-supported document authoring. The Abiword word processor, which implements a document sharing model very similar in concept to Zotero itself (i.e. with local copies updated via a synchronization server) is a promising tool in this regard. Lobbying and programming efforts in this quarter -- or indeed in any quarter that can serve this emerging need -- would be very beneficial to the research community at large.

Crowdsourcing multilingual metadata

The end-to-end consumption of structured metadata in the production of published works gives authors a strong incentive to curate their local databases, to assure that citations are formatted correctly in finished manuscripts. RDF data exchange technology opens the possibility of feeding the result of this effort back into public archives such as CiNii, to improve the quality of available metadata, and further lighten the writing process. The gains from such a cycle would be particularly great in the international context, where human intervention in the drafting and proofreading of previously unavailable multilingual content is unavoidable.

Quality control is a clear concern for any such initiative, and the ownership of the underlying metadata provided by aggregators must be respected. One possible approach, once multilingual sync becomes available, would be for site engineers to harvest user-generated content from zotero.org and other sites capable of publishing RDF metadata, to analyze the results by automated means, and to provide the result to the original content suppliers for (voluntary) review, editing and approval. Such a workflow would avoid workload spikes in the maintenance process, and would allow original submitters to retain control over officially published metadata appearing on the aggregator website.