# OpenAI Summary (ChatGPT) Nuxeo Integration

## What
Generates a text summary of a file using OpenAI ChatGPT.

![Summary example](files/Summary-example.png)

## Architecture

**Input**: Upload a file in Nuxeo

**Concepts**:


a. **Blob Conversion**: If not already a PDF, the input text file is converted to PDF using the registered converters.

b. **Nuxeo Stream (Bulk Actions)**: A Nuxeo Bulk Action is used to process the text chunks and assemble the summary asynchronously:

* A Producer/Consumer pattern is used to produce the summary
* SummaryServiceImpl.summaryProducer() -> Converts document to PDF (if not a PDF already), splits text into pages and produces records
* PageSummaryComputation -> For each text chunk, calls the OpenAI summary endpoint and saves the result to a KVS
* SummaryDoneComputation -> Once all consumers are done, saves the merged summary from the KVS of all pages in the correct order
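
The flow above can be sketched in plain Java. This is only an illustration of the pattern, not the plugin's code: the class name `SummaryPipelineSketch`, the in-memory map standing in for the KVS, and the pluggable `summarizer` function (in place of the OpenAI call) are all assumptions, and no Nuxeo API is used.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SummaryPipelineSketch {

    // Stand-in for the key/value store (KVS); keys are page indexes.
    private final Map<Integer, String> store = new ConcurrentHashMap<>();

    // Hypothetical summarizer; the real computation calls the OpenAI endpoint.
    private final Function<String, String> summarizer;

    public SummaryPipelineSketch(Function<String, String> summarizer) {
        this.summarizer = summarizer;
    }

    // Consumer step: summarize one page and store the result under its index.
    public void consumePage(int pageIndex, String pageText) {
        store.put(pageIndex, summarizer.apply(pageText));
    }

    // Completion step: merge the per-page summaries in page order.
    public String mergeSummaries(int pageCount) {
        return IntStream.range(0, pageCount)
                .mapToObj(i -> store.get(i))
                .collect(Collectors.joining("\n"));
    }

    // Producer step: one record per page; pages may finish in any order.
    public String summarize(List<String> pages) {
        IntStream.range(0, pages.size()).parallel()
                .forEach(i -> consumePage(i, pages.get(i)));
        return mergeSummaries(pages.size());
    }

    public static void main(String[] args) {
        SummaryPipelineSketch sketch =
                new SummaryPipelineSketch(text -> "summary of " + text);
        System.out.println(sketch.summarize(List.of("page 1", "page 2", "page 3")));
    }
}
```

Keying the store by page index is what lets the final step restore the page order even though the pages are summarized concurrently.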

c. **Event Listeners**:
* SummaryListener -> notified as soon as a document is created or modified. Checks if the main blob is dirty and then fires an "extractSummaryEvent" event
* BulkSummarizeListener -> catches "extractSummaryEvent", converts the blob to a PDF and splits the document into pages (equivalent to the producer). This listener is configured to run in a dedicated queue: "bulkSummarizeListener"


**Output**: A Nuxeo document with the merged summary of all PDF pages. A facet "SummaryFacet" is dynamically added to all documents that can be summarized. The summary is saved on the document in a new property: "summary:summary". The following properties are also set:

"summary:lastComputed" -> last date the summary was computed

"summary:status" -> DONE, or ERROR if the summary could not be generated

The architectural design provides a high-level overview of the solution, illustrating the key components and their interactions. It ensures that the solution processes the PDF pages one at a time, minimizing memory usage, and leverages Nuxeo's Stream Service for asynchronous processing.

## Configuration
1. Add the OpenAI API token to nuxeo.conf:
* openai.token:<api-token>

2. Default configurations:


* summary.extraction.openai.url=https://api.openai.com/v1/completions
* summary.extraction.openai.model=text-davinci-003
* summary.extraction.openai.temperature=0.3
* summary.extraction.openai.max-tokens=200
* summary.extraction.openai.top-p=1
* summary.extraction.openai.presence-penalty=0
* maretha.summary.maxRetries=10
* feature.summary.auto.generation.enabled=true (enables or disables generating the summary automatically at document creation)
* summary.extraction.enable.mime-types=text/plain,application/pdf,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document (supported mime-types)
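
As a rough illustration of how these defaults map onto a call to the completions endpoint, the sketch below builds the HTTP request with the JDK's `java.net.http` API. The class name `CompletionRequestSketch`, the helper methods, and the prompt wording are assumptions made for this example; the plugin may build its request differently.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class CompletionRequestSketch {

    // Escapes the characters that must be escaped inside a JSON string.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // JSON body using the default values listed above.
    // The prompt wording is an assumption, not the plugin's actual prompt.
    static String body(String pageText) {
        String prompt = "Summarize the following text:\n" + pageText;
        return "{"
                + "\"model\":\"text-davinci-003\","
                + "\"prompt\":\"" + jsonEscape(prompt) + "\","
                + "\"temperature\":0.3,"
                + "\"max_tokens\":200,"
                + "\"top_p\":1,"
                + "\"presence_penalty\":0"
                + "}";
    }

    // POST request to the default URL, authenticated with the API token.
    static HttpRequest request(String apiToken, String pageText) {
        return HttpRequest.newBuilder(URI.create("https://api.openai.com/v1/completions"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiToken)
                .POST(HttpRequest.BodyPublishers.ofString(body(pageText)))
                .build();
    }
}
```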

The concurrency and retry policies are configured in summary-stream-contrib.xml. 
By default, maxRetries is set to 10, but this value can be increased. Nuxeo retries every 10 seconds.
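
To make the retry behaviour concrete, here is a minimal fixed-delay retry loop in plain Java. It only illustrates the policy described above (a bounded number of retries with a constant pause); the class `FixedDelayRetry` is hypothetical and is not how Nuxeo implements its computation retry policy.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

public class FixedDelayRetry {

    // Runs the task once, then retries up to maxRetries times,
    // sleeping for a fixed delay between attempts.
    public static <T> T call(Callable<T> task, int maxRetries, Duration delay) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(delay.toMillis());
                }
            }
        }
        // All attempts failed: rethrow the last failure.
        throw last;
    }
}
```

With the defaults described above, this would correspond to `FixedDelayRetry.call(task, 10, Duration.ofSeconds(10))`, which gives up after roughly 100 seconds of waiting.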



## OpenAI documentation
https://platform.openai.com/docs/api-reference/completions

#### License
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

## DEBUG
https://doc.nuxeo.com/rest-api/1/stream-endpoint/

* GET http://localhost:8080/nuxeo/api/v1/management/stream/consumer/position?stream=bulk/summaryStream&consumer=bulk/summaryStream
* GET http://localhost:8080/nuxeo/api/v1/management/stream/consumer/position?stream=bulk/summaryStream&consumer=bulk/summaryCompletion

* PUT http://localhost:8080/nuxeo/api/v1/management/stream/consumer/stop?consumer=bulk/summaryStream
* PUT http://localhost:8080/nuxeo/api/v1/management/stream/consumer/start?consumer=bulk/summaryStream

* GET http://localhost:8080/nuxeo/api/v1/management/stream/cat?stream=bulk/summaryStream&fromGroup=bulk/scroller&rewind=1&timeout=60s

* PUT http://localhost:8080/nuxeo/api/v1/management/stream/consumer/position/end?stream=bulk/summaryStream&consumer=bulk/summaryCompletion
* PUT http://localhost:8080/nuxeo/api/v1/management/stream/consumer/position/end?stream=bulk/summaryCompletion&consumer=bulk/done
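
The consumer position check listed above can also be scripted. The sketch below uses the JDK HTTP client; the class name `StreamDebugSketch`, the use of Basic authentication, and the `Administrator` credentials are assumptions for a local development server and should be adapted to your setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StreamDebugSketch {

    static final String BASE = "http://localhost:8080/nuxeo/api/v1/management/stream";

    // Builds the GET request for the position of a consumer on a stream.
    static HttpRequest positionRequest(String stream, String consumer,
                                       String user, String password) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        URI uri = URI.create(BASE + "/consumer/position?stream=" + stream
                + "&consumer=" + consumer);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Basic " + credentials)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed credentials for a local test instance.
        HttpRequest request = positionRequest("bulk/summaryStream", "bulk/summaryStream",
                "Administrator", "Administrator");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```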