
Migrating Various Document Formats to DITA


Most companies do not start new DITA-based projects from scratch. They already have content written in other formats, and that content needs to be converted to DITA. In this blog post, I will offer some conversion advice for each of the most common source formats.

Migrating DocBook Content to DITA

You can migrate one or multiple DocBook documents to DITA using the Oxygen Batch Documents Converter add-on: https://www.oxygenxml.com/doc/ug-addons/topics/batch-converter-addon.html.

The DocBook to DITA conversion contains an option named Create DITA maps from DocBook documents containing multiple sections. When this option is selected, all sections from your DocBook document will be separated into individual DITA topics and referenced in a DITA map.
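To illustrate what "separating sections into topics" means, here is a minimal Python sketch of the idea. It is not how the add-on is implemented. It assumes each top-level DocBook section becomes one DITA topic, carries over only titles and paragraphs, and ignores nested sections and DOCTYPE declarations. The function names and the topic_N.dita file naming are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # DocBook 5 tags look like "{http://docbook.org/ns/docbook}section";
    # strip the namespace so DocBook 4 and 5 input are treated the same.
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    # Text of the first direct child with the given local name, or "".
    for child in elem:
        if local_name(child.tag) == name:
            return "".join(child.itertext()).strip()
    return ""


def split_docbook(docbook_xml):
    """Split top-level <section> elements into DITA topics plus a map.

    Returns (map_xml, topics), where topics maps file names to topic XML.
    Only <title> and <para> are carried over; everything else is dropped.
    """
    root = ET.fromstring(docbook_xml)
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = child_text(root, "title") or "Untitled"
    topics = {}
    sections = [c for c in root if local_name(c.tag) == "section"]
    for index, section in enumerate(sections, start=1):
        topic_id = f"topic_{index}"
        topic = ET.Element("topic", id=topic_id)
        ET.SubElement(topic, "title").text = child_text(section, "title")
        body = ET.SubElement(topic, "body")
        for child in section:
            if local_name(child.tag) == "para":
                ET.SubElement(body, "p").text = "".join(child.itertext()).strip()
        file_name = f"{topic_id}.dita"
        topics[file_name] = ET.tostring(topic, encoding="unicode")
        # The map references each generated topic in document order.
        ET.SubElement(ditamap, "topicref", href=file_name)
    return ET.tostring(ditamap, encoding="unicode"), topics


if __name__ == "__main__":
    SAMPLE = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
      <title>User Guide</title>
      <section><title>Install</title><para>Run the installer.</para></section>
      <section><title>Configure</title><para>Edit the settings.</para></section>
    </article>"""
    map_xml, topics = split_docbook(SAMPLE)
    print(map_xml)
    for name, xml in topics.items():
        print(name, xml)
```

The real conversion handles far more of DocBook than this; the sketch only shows the one-section-per-topic structure that the option produces.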

Migrating Microsoft Word Content to DITA

The Oxygen XML User Manual has a detailed topic describing the available options for converting Microsoft Word content to DITA: https://www.oxygenxml.com/doc/ug-editor/topics/ooxml-to-dita.html.

Migrating Excel Content to DITA

You can use Oxygen's Smart Paste functionality to copy content from an Excel spreadsheet and paste it into a DITA topic opened in Oxygen. Alternatively, the Oxygen Batch Documents Converter add-on has been updated to support batch-converting Excel content to DITA: https://www.oxygenxml.com/doc/ug-addons/topics/batch-converter-addon.html.

Migrating LibreOffice Content to DITA

LibreOffice documents can be saved in Word format, and once you do that, you can convert the Word content to DITA as described above. Alternatively, you can save the LibreOffice documents to DocBook and then apply the DocBook to DITA conversion technique described above.
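If you have many LibreOffice documents, saving each one to Word format by hand gets tedious. LibreOffice can also run without its user interface, using the command "soffice --headless --convert-to docx --outdir <folder> <files>". The Python sketch below only wraps that command. It assumes the soffice executable is on your PATH (on some Linux distributions it is installed as libreoffice instead), and the function names are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def libreoffice_to_docx_command(sources, out_dir, soffice="soffice"):
    """Build a headless LibreOffice command that saves documents as .docx."""
    return [soffice, "--headless", "--convert-to", "docx",
            "--outdir", str(out_dir), *[str(s) for s in sources]]


def convert_to_docx(sources, out_dir):
    """Run the conversion; requires LibreOffice's soffice on PATH."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice (LibreOffice) was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(libreoffice_to_docx_command(sources, out_dir), check=True)


if __name__ == "__main__":
    # Usage: python to_docx.py report.odt notes.odt
    convert_to_docx(sys.argv[1:], "docx-out")
```

The resulting .docx files can then go through the Word to DITA conversion.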

Migrating Google Docs to DITA

There are three ways to convert Google Docs to DITA using Oxygen:
  • Copy the content from Google Docs and paste it into a DITA topic opened in Oxygen's Author visual editing mode. The pasted content should be converted to DITA automatically.
  • Save the Google document in OpenDocument Format (ODF), save the ODF document as DocBook with LibreOffice, and then apply the DocBook to DITA transformation scenario shipped with Oxygen.
  • Save the Google document as HTML, then use the Oxygen Batch Documents Converter add-on to convert it to DITA: https://www.oxygenxml.com/doc/ug-addons/topics/batch-converter-addon.html.

Migrating Markdown Content to DITA

The DITA Open Toolkit publishing engine bundled with Oxygen allows you to reference Markdown files directly in a DITA map and either publish them directly or export the Markdown files to DITA one by one: https://www.oxygenxml.com/doc/ug-editor/topics/markdown-dita-2.html. If you want to convert multiple Markdown documents at once, you can use the Oxygen Batch Documents Converter add-on: https://www.oxygenxml.com/doc/ug-addons/topics/batch-converter-addon.html.
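When a DITA map references Markdown files directly, each topicref is marked with format="markdown" so that the DITA Open Toolkit parses the file as Markdown instead of DITA XML. A minimal Python sketch that generates such a map for a list of Markdown files might look like this; the function name, map title, and file names are illustrative only.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

MAP_DOCTYPE = '<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">'


def markdown_map(title, md_files):
    """Return the text of a DITA map whose topicrefs point at Markdown files."""
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = title
    for md in md_files:
        # format="markdown" tells DITA-OT to parse the target as Markdown.
        ET.SubElement(ditamap, "topicref",
                      href=Path(md).as_posix(), format="markdown")
    body = ET.tostring(ditamap, encoding="unicode")
    return f'<?xml version="1.0" encoding="UTF-8"?>\n{MAP_DOCTYPE}\n{body}\n'


if __name__ == "__main__":
    # Usage: python make_map.py intro.md install.md > guide.ditamap
    print(markdown_map("User Guide", sys.argv[1:]), end="")
```

Using as_posix() keeps the href values in forward-slash form even when the script runs on Windows.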

Migrating HTML Content to DITA

With Oxygen's Smart Paste functionality, you can open an HTML document in a web browser, copy its content, and paste it into a DITA topic opened in Oxygen's Author visual editing mode. To convert multiple HTML files, you can use the Oxygen Batch Documents Converter add-on: https://www.oxygenxml.com/doc/ug-addons/topics/batch-converter-addon.html.

Migrating Unstructured FrameMaker to DITA

There is a detailed blog post describing the options for converting unstructured FrameMaker content to DITA: Migrating Unstructured Adobe FrameMaker Content to DITA.

Migrating MadCap Content to DITA

An open-source project provides a stylesheet that attempts to convert a MadCap Flare project to DITA XML, along with instructions for using it. Alternatively, some recent versions of MadCap Flare appear to be able to export content directly to DITA.

Migrating Confluence Content to DITA

To convert Confluence content to DITA, you can use the Oxygen Batch Documents Converter add-on: https://www.oxygenxml.com/doc/ug-addons/topics/batch-converter-addon.html.

First, export the content to HTML: log in to your Confluence account, navigate to the space that you want to export, go to Space Settings→Export space, and choose to export it as HTML. Then, back in Oxygen, use the Confluence to DITA action (available once the add-on is installed) to convert the exported index.html file into a DITA map with topics.

Migrating AsciiDoc to DITA

The Asciidoctor third-party application can be used to convert AsciiDoc content to DocBook. Then, you can convert the DocBook content to DITA using the DocBook to DITA method described above.
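Asciidoctor selects its output format with the -b (backend) option, and docbook5 is the backend that produces DocBook 5. The Python sketch below assumes the asciidoctor executable is installed and on your PATH; the function names and folder layout are hypothetical. It converts every .adoc file in one folder into a DocBook .xml file in another, ready for the DocBook to DITA step.

```python
import subprocess
import sys
from pathlib import Path


def asciidoctor_command(adoc_file, out_dir):
    """Build the command that converts one AsciiDoc file to DocBook 5."""
    src = Path(adoc_file)
    out = Path(out_dir) / (src.stem + ".xml")
    # -b selects the Asciidoctor backend; -o sets the output file.
    return ["asciidoctor", "-b", "docbook5", "-o", str(out), str(src)]


def convert_folder(src_dir, out_dir):
    """Convert every *.adoc file in src_dir; needs asciidoctor on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for adoc in sorted(Path(src_dir).glob("*.adoc")):
        subprocess.run(asciidoctor_command(adoc, out_dir), check=True)


if __name__ == "__main__":
    # Usage: python adoc_to_docbook.py adoc-folder docbook-folder
    convert_folder(sys.argv[1], sys.argv[2])
```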

Migrating reStructuredText to DITA

The Pandoc third-party application can be used to convert reStructuredText content to DocBook or HTML. Then, you can convert the DocBook or HTML content to DITA using the Oxygen Batch Documents Converter add-on.
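With Pandoc, the input and output formats are chosen with --from and --to, and --standalone makes it emit a complete document rather than a body fragment. As a sketch under the assumption that pandoc is on your PATH, the following Python helper (a hypothetical name) builds the command for one .rst file, targeting either Pandoc's docbook5 writer or its html writer.

```python
import subprocess
import sys
from pathlib import Path

# Pandoc writer names mapped to the file extension used for the output.
TARGETS = {"docbook5": ".xml", "html": ".html"}


def pandoc_rst_command(rst_file, target="docbook5"):
    """Build a Pandoc command that converts one .rst file to DocBook 5 or HTML."""
    if target not in TARGETS:
        raise ValueError(f"unsupported target: {target}")
    src = Path(rst_file)
    out = src.with_suffix(TARGETS[target])
    # --standalone asks for a complete document instead of a body fragment.
    return ["pandoc", "--from", "rst", "--to", target, "--standalone",
            "--output", str(out), str(src)]


if __name__ == "__main__":
    # Usage: python rst_to_docbook.py intro.rst install.rst
    for rst in sys.argv[1:]:
        subprocess.run(pandoc_rst_command(rst), check=True)
```

The output files land next to their sources, so the whole folder can then be passed to the Batch Documents Converter.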

Migrating LaTeX to DITA

You can use a third-party application (such as Pandoc) to convert LaTeX content to Word or HTML. Afterwards, use the Oxygen Batch Documents Converter add-on to convert the result to DITA: https://www.oxygenxml.com/doc/ug-addons/topics/batch-converter-addon.html.
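If Pandoc is your choice, a folder of LaTeX files can be converted in one go. The Python sketch below is only an example under stated assumptions: pandoc must be on your PATH, the function names are hypothetical, and Pandoc's LaTeX reader does not understand every package or custom macro, so complex documents may still need manual cleanup after conversion.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def latex_command(tex_file, out_dir, target="docx"):
    """Build a Pandoc command that converts one .tex file to Word (docx) or HTML."""
    extensions = {"docx": ".docx", "html": ".html"}
    if target not in extensions:
        raise ValueError(f"unsupported target: {target}")
    src = Path(tex_file)
    out = Path(out_dir) / (src.stem + extensions[target])
    args = ["pandoc", "--from", "latex", "--to", target, "--output", str(out)]
    if target == "html":
        args.append("--standalone")  # full HTML page rather than a fragment
    return args + [str(src)]


def convert_folder(src_dir, out_dir, target="docx"):
    """Convert every *.tex file in src_dir into out_dir."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for tex in sorted(Path(src_dir).glob("*.tex")):
        subprocess.run(latex_command(tex, out_dir, target), check=True)


if __name__ == "__main__":
    # Usage: python latex_to_word.py tex-folder docx-folder
    convert_folder(sys.argv[1], sys.argv[2])
```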

Migrating Other Formats to DITA

You may find third-party applications (such as Pandoc) that can convert your content to HTML or to an XML format such as DocBook. Once you have HTML or DocBook content, you can convert it to DITA using the advice above.