Transkribus

From CJH Wiki

The Center implemented Transkribus in the summer of 2025, together with staff from our Partner institutions, for various pilot projects. The new technology is also being incorporated into reference and research requests in the Lillian Goldman Reading Room, the Ackman & Ziff Genealogy Institute, and throughout the Center. This is a new endeavor, and the Center community will need to learn together how to use Transkribus most effectively. To ask questions, seek advice, or share successes, email the Transkribus User Group distribution list.

Collectively, the Center has an Epoch Plan subscription, which gives the Center community access to 15 seats (project log-ins that can leverage language super models), 60,000 credits toward transcription projects, and 1TB of file storage.

To log into Transkribus, please visit https://app.transkribus.org.

Overview

Transkribus harnesses artificial intelligence to help decipher digitized handwritten and printed historical texts. Transkribus employs a credit-based system for analyzing the structure, layout, and handwriting found in uploaded images.

For most projects using Transkribus, whether utilizing a language model or training one of your own, these are the basic workflow steps:

  1. Uploading an image or PDF file
  2. Recognition
  3. Editing the transcribed text
  4. Sharing or exporting a file

Each step within the workflow presents options to maximize the accuracy of the transcribed text. With options comes troubleshooting; consult the Help Center for additional assistance.

Just as in the early days of digitization at the Center, best practices will emerge for undertaking new Transkribus projects and integrating newly transcribed and discovered information into our other shared library systems. Together, we will discuss, record, and disseminate these recommendations within the wiki.

Credit Usage

Recognition type = credit consumption

  • Handwritten Text + Lines = 1 credit
  • Printed Text + Lines = 0.5 credits
  • Lines Recognition = 0.25 credits
  • Tables Recognition = 1 credit
  • Fields Recognition = 1 credit
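
As a rough planning aid, the rates above can be turned into a quick estimate of how many credits a project might consume. The sketch below is illustrative only: it assumes each rate is charged per page processed (the list above does not state the unit), and the function name, dictionary keys, and page counts are hypothetical.

```python
from decimal import Decimal

# Credit rates copied from the list above.
# Assumption: each rate is charged per page processed.
CREDIT_RATES = {
    "handwritten_text_and_lines": Decimal("1"),
    "printed_text_and_lines": Decimal("0.5"),
    "lines": Decimal("0.25"),
    "tables": Decimal("1"),
    "fields": Decimal("1"),
}


def estimate_credits(pages_by_type):
    """Estimate credits for a mapping of recognition type -> page count."""
    total = Decimal("0")
    for recognition_type, pages in pages_by_type.items():
        if recognition_type not in CREDIT_RATES:
            raise ValueError(f"Unknown recognition type: {recognition_type}")
        if pages < 0:
            raise ValueError("Page counts cannot be negative")
        total += CREDIT_RATES[recognition_type] * pages
    return total


# Hypothetical project: 200 handwritten pages and 100 printed pages.
example = {"handwritten_text_and_lines": 200, "printed_text_and_lines": 100}
print(estimate_credits(example))  # 250.0
```

Comparing an estimate like this with the Center's shared pool of 60,000 credits can help frame the conversation when requesting an allocation.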

Credits can only be allocated by an administrator. Please contact Metadata & Discovery Services to discuss your project, coordinate an invite for a project seat, or request a credit allocation for working in Transkribus.

Resources

READ-COOP, the cooperative that develops Transkribus, offers many recorded webinars and tutorials on using the artificial intelligence tool. There is also the Transkribus Help Center, which offers extensive documentation and a search bar for troubleshooting.

Starting to use Transkribus

YouTube playlist: The Transkribus team offers a YouTube playlist, Getting Started with Transkribus, that will help with learning how to use Transkribus.

More Advanced Webinars

Past User Conferences

Other pilot projects and use cases

More information on Language Models and Super Models in Transkribus

Selected List of Available Large Language Super Models

Depending on the scope and desired outcome of a Transkribus project, using an existing language super model may be easier than training a custom model to transcribe.

  • The Text Titan I (GER, DUT, FRE, FIN, ENG, SWE)
  • Dutch Dean (DUT)
  • Dansk Dokumentalist (DAN)
  • German Genius (GER)
  • Polski Bizon (POL)
  • English Elder (ENG)
  • Faucon Français (FRE)
  • Spanish Sage (SPA)

A complete list of language models is available here.

To request exported files

Transkribus project users should primarily utilize human-legible JPG files (image/jpeg) for work within the system. A smaller file size speeds file uploads and minimizes shared storage space on Transkribus servers. Metadata & Discovery Services can assist with exporting Rosetta files and converting large preservation master files (usually TIFF files) into more lightweight JPG files.

Please use the CJH Help Desk to send Metadata & Discovery Services a list of the IEs (Rosetta intellectual entities) that you need for your project. The files will be exported from Rosetta, reformatted to JPG, and placed in a mapped SFTP location for you to retrieve. For future reference, please record provenance information for the exported materials, such as the original collection, location, and filename.
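
One lightweight way to keep the provenance information mentioned above is a small CSV log saved alongside the exported files. The sketch below is only a suggestion, not an established Center procedure; the column names, log file name, and sample values are hypothetical placeholders.

```python
import csv
from pathlib import Path

# Hypothetical column names; adjust them to suit your project.
FIELDS = ["ie_id", "original_collection", "location", "original_filename", "exported_jpg"]


def append_provenance(log_path, record):
    """Append one provenance record to a CSV log, adding a header row to a new file."""
    path = Path(log_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(record)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    append_provenance(
        "provenance_log.csv",
        {
            "ie_id": "IE0000000",
            "original_collection": "Example Collection",
            "location": "Box 1, Folder 2",
            "original_filename": "example_0001.tif",
            "exported_jpg": "example_0001.jpg",
        },
    )
```

Keeping the log in the same folder as the JPG files means the provenance record stays with the project even after the files are removed from the SFTP server.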

A new dropdown option has been added in the CJH Help Desk.

Upload the requested files to Transkribus or move them to a local desktop to work on your project. Files will be deleted from the SFTP server after 30 days.

Examples of Ethical Guidelines for the Use of Artificial Intelligence in Archives