AI Positron Assistant - Retrospective Analysis

5 Nov 2024

Read time: 17 minute(s)

It's been more than one year since we started working on the Oxygen AI Positron Assistant Add-on, back when the OpenAI GPT 3.5 AI engine was released, and we are currently on version 4.0.0 of the add-on. Our main direction (which we still keep) was to provide powerful tools based on AI engines for technical documentation writers (targeted mostly towards DITA XML users) and for our XSLT/Schematron developers. I've always envisioned the AI set of tools as an exoskeleton, as a way to add more power to a technical documentation writer's punch throughout the day. The table below contains a list of concepts (mostly add-on specific features or feature categories), each with a short description based on personal experience.

Eating our own dog food?

Are we using AI tools ourselves? Yes, I have started using AI tools more and more to improve readability and fix grammar or logical inconsistencies in articles like this one. I don't ask the AI to create new article ideas, but I do use AI tools to make the content I write easier to read and understand.

Usefulness

Are AI engines useful for a technical writer's work? Definitely yes. People might think they can just send the AI a bunch of content with a simple instruction like Spell check this content and the AI will always return the correct content. The AI engine gives meaningful answers when the prompt is detailed and precise. This is why we include many predefined actions with our add-on.

Platform availability

From the start, we wanted the add-on to work for both the desktop Oxygen application and the WebAuthor in-browser editor. We included core features like predefined actions, chat support, and the ability to create custom actions. This way, we can offer most of these features on both the desktop and the web application.

AI engine choice flexibility

Our initial offer was a subscription-based license using the Oxygen AI Positron Platform as a bridge to OpenAI. We noticed that some companies wanted a choice of which AI engine to use, so we created the Oxygen AI Positron Enterprise add-on. This add-on allows companies to use their own AI engine directly, with support for connectors to OpenAI, Microsoft Azure OpenAI, and Anthropic Claude. There are also proxy servers that can wrap many AI engines like LLama and Mistral, allowing them to be accessed using the OpenAI interface. Is the current flexibility we offer enough? Not in all cases. We plan to allow developers to create their own connectors to use their own APIs to access the engines.

So are all AI engines equal? Our predefined actions have and are actively tested with OpenAI engines and with engines provided by Anthropic Claude. We also tried to use engines like Llama 3.1 or Mistral with our current predefined actions but the results are not fantastic, as the possibility to obtain broken DITA XML content with these open source engines is much higher.

Predefined actions

Over time, we improved how we write prompts for the many predefined actions that come with the add-on. For each action, we try to give a clear prompt with lots of details so the AI engine can produce the best results. We test our predefined actions using both manual and automatic tests. We got the best results with Open AI GPT-4 engines and the engines provided by Anthropic Claude, like Sonnet.

Custom actions

We wanted to give companies the flexibility to create their own custom actions using specific prompts to solve their unique problems. With the add-on, we provide a JSON schema that defines all aspects of implementing custom AI actions. The key to a successful action is to make it as precise and detailed as possible.

Content generation

There are two main ways AI-based content generation can be useful:

Summarizing a large amount of text. For example, predefined actions like Short Description try to summarize a big piece of content. These actions usually work very well.
Expanding a small amount of text. Content generation actions like Add Structured Content or Generate Documentation Draft use the AI engine to expand existing content and create valid DITA XML content. Since AI engines can make mistakes if they don't have enough details, we added support for RAG (retrieval augmented generation) functions. These functions use the existing project content to give the AI engine enough useful information to create new DITA XML content. A common use for a writer is to gather data from different sources and create a first draft of a DITA XML topic, which they can refine later. This is what we aim to provide with our predefined actions.

Vision support

Certain actions like Generate Image Alternate Text, Generate Documentation Draft, or Update Content Based on Images use the capabilities of engines like GPT-4o. These engines read content from images provided by the user and then use that information to write new DITA XML content or update existing content. In our experience, these kinds of actions are useful and work quite well to update DITA XML content based on application screenshots.

Rewrite actions

We created useful actions to Correct Grammar, Improve Readability, Use Active Voice, and Improve Structure. These actions are applied to existing content, they have precise prompts that instruct the AI to preserve the existing content as much as possible, and they are solid examples of how the AI can help a writer during their daily work.

Review actions

At some point, we had the idea that the AI, instead of fixing problems, could leave the original content as it is and just add comments around the content that needs to be changed. So, the Proofread action appeared as a way to allow the AI engine to give advice on existing content without making any changes. In a way, the AI engine becomes a reviewer of a first document draft. The Resolve Comments action appeared as a way to allow the AI engine to fix existing comments by changing the content.

Overview actions

We created some useful actions to Summarize or report on the Readability of a DITA XML topic. We also created actions to use AI to better understand what questions a certain topic answers. In the future, using an AI engine might also include asking the AI to find logical inconsistencies in already written content that may be too large for a person to fully digest easily.

Translate actions

There are studies that say the translation abilities in AI engines are better than Google Translate or DeepL. The available translation actions have clear prompts that tell the AI engine to skip certain DITA XML tags like <codeblock> or <codeph>. In general, they seem to do a good job of keeping the existing DITA XML structure, including the parts of reusable content and key references.

Intelligent agents

In the most recent add-on release, we added new actions like Split Topic and Create Topics. These actions give the AI more freedom to process existing content and create new topics on disk. The AI uses a feedback loop where it first proposes a plan of action, evaluates how the content could be changed, and then makes the necessary changes. These actions try to use the AI engine's potential to the max by letting it plan and execute more complex tasks.

Reuse actions

DITA XML is about content reuse. So, how can the AI engine help with this? For now, we have created two predefined actions. These actions use functions to let the AI engine access already reusable components. One action replaces product names with existing key references. The other action retrieves the closest reusable component to the currently selected paragraph. But I think more can be done in this area in the future.

Chat

Using AI engines means accepting that an action might not always give the right content. We give you control to preview the changes suggested by the AI and then keep chatting with the AI to fine-tune the answer until it is correct.

Chatting with the AI can also be a useful way to learn how different concepts work. We added RAG (retrieval augmented generation) support in the chat window. This lets people search for content and get precise answers based on their own documentation project.

Specializations support

So, how does the AI work with DITA XML specializations that might have custom DOCTYPE declarations or custom DITA XML elements?

Most of the built-in actions should work well with DITA XML specializations. For example, actions that rewrite the content or translate it should keep the existing DITA XML elements in the selection unchanged. Actions like New DITA Topic create basic DITA XML topics. However, there are actions like Add Structured Content that expand the content of an already created DITA XML topic. This allows the user to first create the DITA XML topic based on their specialization before using the AI engine. Actions like Improve Structure add base DITA XML elements to the selected content, so they might not use all the specialized DITA XML elements.

Third Party CMS support

Both Oxygen desktop and in-browser editing tools can be integrated with various content management systems. Most AI predefined actions should work when Oxygen is integrated with a third-party CMS. However, actions that write content on disk (like actions from the Intelligent agents category) might not be able to create new topics on the CMS. Actions that are improved by using RAG (retrieval augmented generation) might also not work well since the project content is stored on the CMS. Oxygen desktop has an Enable indexing for remote resources checkbox in the Open/Find Resource preferences page. This allows our indexing engine to index content from the CMS. But the in-browser Web Author editing tool does not yet support indexing CMS-specific content.

I hope this article, based on my experience working on AI editing tools and with AI editing tools, will be useful to you. As always, you can leave feedback using the Oxygen Feedback form, which appears below this published article.