An addon to bulk map, remap, chain-transform and send metadata and binaries to the Ingest service.
Get up and running quickly!
This will ingest all the files that are under the given
<my-root-doc-id>:
curl -ss -u foo:bar -H 'Content-Type: application/json' <myNuxeoUrl>/nuxeo/api/v1/automation/Bulk.RunAction -d \
'{"params":{
"query":"SELECT * FROM Document WHERE ecm:ancestorId = '\''<my-root-doc-id>'\''",
"action":"ingest"
}
}'
However, the IngestAction is very flexible and can be finely parameterized, as the other examples show.
Ingestion offers a lot of possibilities via mapping and
transformation. You certainly want to stay in dryRun mode
until you have nailed your parameters:
Write the following content in myParameters.json and
inject it the smart way:
{
"dryRun": true
}
This will remove all mappings so you can build a new one piece by piece.
Write the following content in myParameters.json and
inject it the smart way:
{
"dryRun": true,
"aggregateDefaultMapping": false,
"aggregateDefaultTransformer": false,
"replaceMapping": true
}
This will get you back to the default mapping on a document, as if it were going to be ingested for the first time.
Write the following content in myParameters.json and
inject it the smart way:
{
"dryRun": true,
"aggregateDefaultMapping": true,
"aggregateDefaultTransformer": true,
"replaceMapping": true
}
Let’s say you want to adjust your current mapping. You can override it this way:
Write the following content in myParameters.json and
inject it the smart way:
{
"dryRun": true,
"inlineMapping": "!dc:contributors,!dc:description",
"inlineTransformer": "dc:title=meta:name=_Flag"
}
That will remove a couple of properties and map dc:title to
meta:name while changing its value with the
_Flag function. See Mapping
Documents and Transforming
Documents for more info.
Strip any mapping and transformer and add yours.
Write the following content in myParameters.json and
inject it the smart way:
{
"dryRun": true,
"inlineMapping": "dc:contributors,dc:description",
"inlineTransformer": "dc:title=meta:name=_Flag",
"aggregateDefaultMapping": false,
"aggregateDefaultTransformer": false,
"replaceMapping": true
}
That will add a couple of properties and map dc:title to
meta:name while changing its value with the
_Flag function. See Mapping
Documents and Transforming
Documents for more info.
Strip any mapping and transformer except defaults and add yours.
Write the following content in myParameters.json and
inject it the smart way:
{
"dryRun": true,
"inlineMapping": "dc:contributors,dc:description",
"inlineTransformer": "dc:title=meta:name=_Flag",
"aggregateDefaultMapping": true,
"aggregateDefaultTransformer": true,
"replaceMapping": true
}
That will add a couple of properties and map dc:title to
meta:name while changing its value with the
_Flag function. See Mapping
Documents and Transforming
Documents for more info.
Nuxeo associates metadata and content (text, binaries…).
Nuxeo indexes documents and has powerful search capabilities.
Nuxeo’s metadata are stored in schemas:
<schema xmlns:common="http://www.nuxeo.org/ecm/schemas/common/" name="common">
<common:icon>/icons/pdf.png</common:icon>
</schema>
<schema xmlns:dc="http://www.nuxeo.org/ecm/schemas/dublincore/" name="dublincore">
<dc:contributors>
<item>Administrator</item>
</dc:contributors>
<dc:created>2024-11-21T15:38:08.620Z</dc:created>
<dc:creator>Administrator</dc:creator>
<dc:description>A poem from the heart</dc:description>
<dc:lastContributor>Administrator</dc:lastContributor>
<dc:modified>2024-11-21T15:55:19.496Z</dc:modified>
<dc:nature>article</dc:nature>
<dc:title>testPoem</dc:title>
</schema>
The ingest service provides a REST API to send your documents to the Insight content lake.
The Ingest payload is an array of “ingest events” with two distinguishable parts.
A part of the schema is mandatory. It contains an object id and required fields.
Data is expected this way:
<NUXEO_HOME>/nuxeoctl mp-install nuxeo-hxai-connector
Update nuxeo.conf with the desired properties. Please refer
to the list of configuration options in the section below.
The following nuxeo.conf properties are available to configure the plugin.
| Property name | Description |
|---|---|
| hxai.api.client.id | The Hxai client ID |
| hxai.api.client.secret | The Hxai client secret |
| hxai.api.auth.baseurl | The IDP base url (ex: https://auth.iam.dev.experience.hyland.com) |
| hxai.api.ingest.baseurl | The Hxai ingest base url (ex: https://ingestion-api.insight.dev.ncp.hyland.com/v1) |
| hxai.api.ingest.env.key | The ingest environment key |
| hxinsight.environment.type | The HxAI environment type |
| hxinsight.environment.id | The HxAI environment id |
| hxinsight.service.name | The HxAI Service name |
| nuxeo.hxai.sourceid | The HxAI source ID |
As of today, Ingest only handles binaries at the root of the properties part. This is fine for
simple properties like file:content but doesn’t work for
complex properties nesting binaries, like files:files.
There are several ways to flatten binaries so they comply with Ingest’s
requirements:
Using custom
Mappers to separate, for example, files:files into
multiple simple properties and omit the initial array containing them.
Custom mapping happens before Mapping and Transforming, so the
properties generated during custom mapping can be transformed as
well.
Post-filtering the outgoing JSON payload flattens nested binaries that went unnoticed. If a complex type containing binaries does not have a custom mapping, the binaries are moved to the root of properties so that they are not silently ignored by Ingest.
If this was done with files:files, the array containing
the files would remain empty in the JSON payload. Hence the contribution
of a default
IngestiblePropertyMapper for files:files.
# original structure with a containing array:
{
"my:complex": [
{
"file": {}
},
{
"file": {}
}
]
}
# with a custom mapper, could become:
{
"renamed:transformed/0": {
"file": {}
},
"renamed:transformed/1": {
"file": {}
}
}
# with post-filtering:
{
"my:complex": [],
"my:complex/0": {
"file": {}
},
"my:complex/1": {
"file": {}
}
}
To ingest documents efficiently, the
Nuxeo HxAI Connector does the following:
To be refactored Ingest operations of uploading files and sending events are implemented in the HxAi service
We can leverage Nuxeo’s search capabilities to target
documents and send them to ingestion via a query language called
NXQL.
We leverage the Nuxeo Bulk Action Framework
(BAF) which:
NXQL
query.
curl -ss -u foo:bar -H 'Content-Type: application/json' <myNuxeoUrl>/nuxeo/api/v1/automation/Bulk.RunAction -d \
'{"params":{
"query":"SELECT * FROM Document WHERE ecm:ancestorId = '\''<my-root-doc-id>'\''",
"action":"ingest"
}
}' -o /dev/null
The actual list of parameters taken by the Action:
{
"inlineMapping": "dublincore,common",
"inlineTransformer": "a=b=Function,c=d=OtherFunction",
"replaceMapping": false,
"aggregateDefaultMapping": false,
"aggregateDefaultTransformer": false,
"persistMapping": false
}
The parameters of the IngestAction are of two types.
Some can be persisted. Some can’t.
See the HxAI facet for more details about parameters’ persistence.
inlineMapping: an inline IngestMappingDescriptor to apply to
Documents matching the NXQL query.
inlineTransformer: an inline IngestTransformerDescriptor to apply to
Documents matching the NXQL query.
replaceMapping: replaces the mapping and transformer previously saved on the
Document. Defaults to false.
aggregateDefaultMapping: leverages the default IngestMapping for the
Document based on its type. This adds up to
inlineMapping.
aggregateDefaultTransformer: leverages the default IngestTransformer for the
Document based on its type. This adds up to
inlineTransformer.
persistMapping: saves all parameters except
persistMapping itself, replaceMapping and
dryRun. This has no effect in
dryRun mode.
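The parameters above travel as a stringified JSON value in the `parameters` key of the `Bulk.RunAction` body. As a rough illustration of that escaping, here is a minimal Java sketch. The class and method names are hypothetical and not part of the connector, and only flat String and Boolean values are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the connector: builds the Bulk.RunAction body.
final class IngestParametersPayload {

    /** Escapes a string so it can be embedded as a JSON string literal (quotes included). */
    static String quote(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes flat String/Boolean parameters as a compact JSON object. */
    static String toJson(Map<String, Object> params) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : params.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            Object v = e.getValue();
            sb.append(quote(e.getKey())).append(':')
              .append(v instanceof String ? quote((String) v) : String.valueOf(v));
        }
        return sb.append('}').toString();
    }

    /** Builds the request body, with the parameters object stringified. */
    static String body(String nxql, Map<String, Object> params) {
        return "{\"params\":{\"query\":" + quote(nxql)
                + ",\"action\":\"ingest\",\"parameters\":" + quote(toJson(params)) + "}}";
    }

    public static void main(String[] args) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("inlineMapping", "dublincore,common");
        params.put("replaceMapping", false);
        params.put("dryRun", true);
        System.out.println(body("SELECT * FROM Document WHERE ecm:ancestorId = '<my-root-doc-id>'", params));
    }
}
```

The resulting string can be posted as the request body in place of the hand-escaped JSON.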
Ingestion can be automated in two distinct ways: schedule-based (default) or purely event-based (disabled by default).
Schedules are a historical feature of
Nuxeo. They fire an event following a cron expression. More
info. Schedules are the preferred way to ingest your
documents because this approach only requires read-only access to
documents.
By setting up multiple schedules, the user could run multiple ingestion jobs on subparts of her repository, each with its own config.
Here are 2 Schedules running every other second:
<?xml version="1.0" encoding="UTF-8"?>
<component name="org.nuxeo.hxai.crons.config" version="1.0.0">
<extension target="org.nuxeo.ecm.core.scheduler.SchedulerService" point="schedule">
<schedule id="ingest1">
<eventId>ingest1</eventId>
<eventCategory>ingest</eventCategory>
<cronExpression>0/2 * * * * ?</cronExpression>
</schedule>
<schedule id="ingest2">
<eventId>ingest2</eventId>
<eventCategory>ingest</eventCategory>
<cronExpression>1/2 * * * * ?</cronExpression>
</schedule>
</extension>
</component>
<?xml version="1.0" encoding="UTF-8"?>
<component name="org.nuxeo.hxai.cron.events.listeners.config" version="1.0.0">
<extension target="org.nuxeo.ecm.core.event.EventServiceComponent" point="listener">
<listener name="ingest1" async="false" postCommit="false" priority="120" class="org.nuxeo.hxai.listeners.IngestListener1">
<event>ingest1</event>
</listener>
<listener name="ingest2" async="false" postCommit="false" priority="120" class="org.nuxeo.hxai.listeners.IngestListener2">
<event>ingest2</event>
</listener>
</extension>
</component>
This sample will always execute. Depending on your use case, you will probably want to ingest documents only if they have been updated in the last X time units.
public class IngestListener1 implements EventListener {
@Override
public void handleEvent(Event event) {
String query = "SELECT * from Document WHERE ecm:path = '/default-domain/workspaces/test/test'";
BulkCommand command = new BulkCommand.Builder(IngestAction.ACTION_NAME, query,
SYSTEM_USERNAME).param(INLINE_MAPPING, "files:files,file:content,dublincore,tags,foo:bar")
.param(INLINE_TRANSFORMER, "files:files/=my:binaries")
.param(REPLACE_MAPPING, true)
.param(DRY_RUN_MODE, true)
.build();
Framework.getService(BulkService.class).submit(command);
}
}
The IngestUpdateListener will trigger ingestion on a
document when it is updated if it has the hxai
facet. However, for the document to have the facet, you must have
already sent it for ingestion once with mapping persistence enabled.
This is not the preferred way to tackle ingestion because:
Nuxeo’s side.
It is disabled by default (see Automating Ingestion). You can enable it by contributing the following:
<?xml version="1.0" encoding="UTF-8"?>
<component name="org.nuxeo.hxai.events.listener.config.test" version="1.0.0">
<require>org.nuxeo.hxai.events.listener.config</require>
<extension target="org.nuxeo.ecm.core.event.EventServiceComponent" point="listener">
<listener name="ingestlistener" enabled="true"/>
</extension>
</component>
The HxAI facet acts like a flag to tell Nuxeo that a
document has already been ingested once and is eligible for
an ingestion update when necessary.
The hxai schema holds some valuable ingestion-related
information. Aside from the ingestion status, the following
IngestAction parameters make it possible to repeat document ingestion
exactly the same way it was last done if desired (for update):
The following IngestAction parameters are not
storable:
Default configuration is based on Document type. If you
want to register a default IngestMappingDescriptor or a
default IngestTransformerDescriptor for a certain
Document type, simply give the type as
Descriptor id.
If no IngestMappingDescriptor is defined for a
Document type, the fallback will be default:
IngestTransformerDescriptor is
provided.
Please see Contributed Mappings
and keep in mind that if you want to override default (the
default IngestMappingDescriptor) you need to require the
IngestMappingServiceComponent:
<require>org.nuxeo.hxai.IngestMappingServiceComponent</require>
Those parameters need to be stringified to be sent in our query (in an additional parameters key):
We can write stringified JSON by hand, escaping all sensitive characters:
"{\"inlineMapping\":\"dublincore,common\",\"inlineTransformer\":\"a=b=Function,c=d=OtherFunction\",\"replaceMapping\":false,\"aggregateDefaultMapping\":false,\"aggregateDefaultTransformer\":false,\"persistMapping\":false}"
But who wants to do that? Let’s simply do:
$(jq -c . myParams.json | jq -R .)
Thus, the complete query becomes as below:
curl -ss -u foo:bar -H 'Content-Type: application/json' <myNuxeoUrl>/nuxeo/api/v1/automation/Bulk.RunAction -d \
'{"params":{
"query":"SELECT * FROM Document WHERE ecm:ancestorId = '\''<my-root-doc-id>'\''",
"action":"ingest",
"parameters": "{\"inlineMapping\":\"dublincore,common\",\"inlineTransformer\":\"a=b=Function,c=d=OtherFunction\",\"replaceMapping\":false,\"aggregateDefaultMapping\":false,\"aggregateDefaultTransformer\":false,\"persistMapping\":false}"
}
}'
curl -ss -u foo:bar -H 'Content-Type: application/json' <myNuxeoUrl>/nuxeo/api/v1/automation/Bulk.RunAction -d \
'{"params":{
"query":"SELECT * FROM Document WHERE ecm:ancestorId = '\''<my-root-doc-id>'\''",
"action":"ingest",
"parameters": '"$(jq -c . myParams.json | jq -R .)"'
}
}'
Mappings are defined this way:
Although they are supported, using unprefixed properties is discouraged:
files # will add files:files to the mapping
dc:title # adding single properties, one by one.
dublincore # map the 18 properties present in dublincore
More info about Mapping references
@myMappingReference # map all the mappings found in the 'myMappingReference' Mapping
Chaining comma-separated Mappings helps build
complete mappings in one line:
dc:title,dc:description #,.. there are 18 properties with the dc: prefix...
# I don't want to type them all, just take the whole dublincore schema! (and the common schema! because why not?)
dublincore,common
# I can also get "simples", properties without a prefix
dublincore,icon
# OK I want the whole dublincore and common schemas, except dc:title
dublincore,icon,!dc:title
# Order matters
dublincore,!dc:title # OK: add all dublincore except dc:title
!dc:title,dublincore # Useless: removes dc:title but adds it back!
Those baseline default mappings are applied to documents without document-type-specific default mappings:
dublincore
file:content
files:files
If a mapping contribution’s id is a document type, it will be used as default mapping instead of the baseline defaults for that document type. See Contributing Mappings.
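The left-to-right chaining of mapping expressions (additions, then `!` removals) can be illustrated with a small Java sketch. It is not the connector's implementation: the class name and the trimmed schema table are hypothetical, and `@` mapping references are not handled.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of order-sensitive mapping chaining.
final class MappingChainSketch {

    // Assumption: a trimmed schema table; the real dublincore schema holds more properties.
    static final Map<String, List<String>> SCHEMAS = Map.of(
            "dublincore", List.of("dc:title", "dc:description", "dc:creator"),
            "common", List.of("icon"));

    /** Applies comma-separated mapping expressions from left to right. */
    static Set<String> resolve(String expression) {
        Set<String> mapped = new LinkedHashSet<>();
        for (String token : expression.split(",")) {
            String item = token.trim();
            if (item.isEmpty()) {
                continue; // ignore empty items
            }
            boolean removal = item.startsWith("!");
            String name = removal ? item.substring(1) : item;
            // A schema name expands to its properties, anything else is a single property.
            List<String> properties = SCHEMAS.getOrDefault(name, List.of(name));
            if (removal) {
                mapped.removeAll(properties);
            } else {
                mapped.addAll(properties);
            }
        }
        return mapped;
    }

    public static void main(String[] args) {
        System.out.println(resolve("dublincore,!dc:title")); // dc:title is removed
        System.out.println(resolve("!dc:title,dublincore")); // dc:title is added back
    }
}
```

Because each expression is applied in order, a removal only affects what was added before it, which is why removals are best placed last.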
The IngestMappingDescriptor can be contributed via
XML, validated and ready to use at runtime.
MappingDescriptors are a centralized way to
define mappings.
id
<?xml version="1.0"?>
<component name="org.nuxeo.hxai.IngestMappingServiceComponent.test.referencing" version="1.0">
<extension target="org.nuxeo.hxai.IngestMappingServiceComponent" point="ingestMappings">
<!--default for Picture typed documents-->
<ingest id="Picture">
<properties>dc:title,icon,relatedtext:relatedtextresources</properties>
</ingest>
<!--to be referred to as @first-->
<ingest id="first">
<properties>dc:title,icon,relatedtext:relatedtextresources</properties>
</ingest>
<ingest id="second">
<properties>dc:description,uid:major_version,uid:minor_version</properties>
</ingest>
<ingest id="third">
<properties>dc:content-type</properties>
</ingest>
</extension>
</component>
Now let’s say I have a contributed Mapping with 45 properties in it:
IngestMappingDescriptor, so I can reuse it at will.
I can avoid retyping 44 properties and keep a strong connection with the original mapping by nesting it.
<?xml version="1.0"?>
<component name="org.nuxeo.hxai.IngestMappingServiceComponent.test.referencing" version="1.0">
<extension target="org.nuxeo.hxai.IngestMappingServiceComponent" point="ingestMappings">
<ingest id="first">
<properties>@bigMapping,!un:wantedprop</properties>
</ingest>
</extension>
</component>
Just like in the XML contributions:
dublincore,@first # duplicate dc:title mapping will be processed only once
# what if I don't want the relatedtext:relatedtextresources brought by @first ?
dublincore,@first,!relatedtext:relatedtextresources # let's take it off
In the same IngestMappingDescriptor, use schemas,
properties and mapping references to add and remove whatever we want.
It is good practice to put the removal mapping expressions at the end, so the removed properties don’t come back by mistake.
<?xml version="1.0"?>
<component name="org.nuxeo.hxai.IngestMappingServiceComponent.test.referencing" version="1.0">
<extension target="org.nuxeo.hxai.IngestMappingServiceComponent" point="ingestMappings">
<ingest id="mixItAllUp">
<properties>common,dc:title,@bigMapping,!@optionalMappings,!un:wantedprop,!uid</properties>
</ingest>
</extension>
</component>
Logs are an important part of this module. They can pinpoint errors
in your Descriptors.
Allow following recursive instantiation:
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: default
DEBUG [IngestMappingServiceImpl] IngestMapping: 'default' was processed successfully.
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: first
DEBUG [IngestMappingServiceImpl] IngestMapping: first directly depends on: second
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: second
DEBUG [IngestMappingServiceImpl] IngestMapping: second directly depends on: third
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: third
DEBUG [IngestMappingServiceImpl] IngestMapping: 'third' was processed successfully.
DEBUG [IngestMappingServiceImpl] IngestMapping: 'second' was processed successfully.
DEBUG [IngestMappingServiceImpl] IngestMapping: 'first' was processed successfully.
Allow to follow what happens for each mapping:
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: default
TRACE [SimpleIngestMapping] the 'dublincore' mapping was identified as a schema.
TRACE [SimpleIngestMapping] processing mapping: 'dublincore'
DEBUG [IngestMappingServiceImpl] IngestMapping: 'default' was processed successfully.
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: first
DEBUG [IngestMappingServiceImpl] IngestMapping: first directly depends on: second
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: second
DEBUG [IngestMappingServiceImpl] IngestMapping: second directly depends on: third
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: third
TRACE [SimpleIngestMapping] the 'dc:content-type' mapping was identified as a property.
TRACE [SimpleIngestMapping] processing mapping: 'dc:content-type'
DEBUG [IngestMappingServiceImpl] IngestMapping: 'third' was processed successfully.
TRACE [SimpleIngestMapping] the 'dc:description' mapping was identified as a property.
TRACE [SimpleIngestMapping] processing mapping: 'dc:description'
TRACE [SimpleIngestMapping] the '@third' mapping was identified as reference to another mapping.
TRACE [SimpleIngestMapping] processing mapping: '@third'
DEBUG [IngestMappingServiceImpl] IngestMapping: 'second' was processed successfully.
TRACE [SimpleIngestMapping] the 'dc:title' mapping was identified as a property.
TRACE [SimpleIngestMapping] processing mapping: 'dc:title'
TRACE [SimpleIngestMapping] the '@second' mapping was identified as reference to another mapping.
TRACE [SimpleIngestMapping] processing mapping: '@second'
DEBUG [IngestMappingServiceImpl] IngestMapping: 'first' was processed successfully.
Since we have opened the way for mapping references, we have cycle detection. Let’s consider the following:
<ingest id="first">
<properties>dc:title,@second</properties>
</ingest>
<ingest id="second">
<properties>dc:description,@third</properties>
</ingest>
<ingest id="third">
<properties>dc:content-type,</properties>
</ingest>
Let’s break it with an override making third depend on forth and make a cyclic reference from forth to second:
<ingest id="third">
<properties>dc:content-type,@foo,@forth</properties>
</ingest>
<ingest id="forth">
<properties>@second</properties>
</ingest>
This will prevent Nuxeo from starting (avoiding further
harm), but we also need a way to track the problem down.
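To make the idea of cycle detection over mapping references concrete, here is a minimal depth-first sketch in Java. It is only an illustration under assumptions: the class name and the reference table are hypothetical, and the actual logic in IngestMappingServiceImpl may differ. The message format mirrors the one reported by the connector.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: depth-first walk over mapping references with a path stack.
final class MappingCycleSketch {

    /** Throws if following references from id leads back to a mapping already on the path. */
    static void check(String id, Map<String, List<String>> references, Deque<String> path) {
        if (path.contains(id)) {
            StringBuilder chain = new StringBuilder();
            for (String step : path) {
                chain.append(step).append("->");
            }
            throw new IllegalArgumentException("Detected cycle in IngestMapping: " + chain + id);
        }
        path.addLast(id);
        for (String dependency : references.getOrDefault(id, List.of())) {
            check(dependency, references, path);
        }
        path.removeLast(); // this branch is done, backtrack
    }

    public static void main(String[] args) {
        // Same references as the broken contribution: third -> foo, forth and forth -> second.
        Map<String, List<String>> references = Map.of(
                "first", List.of("second"),
                "second", List.of("third"),
                "third", List.of("foo", "forth"),
                "forth", List.of("second"));
        try {
            check("first", references, new ArrayDeque<>());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the current path on a stack is what lets the error message show the full chain of references rather than only the offending mapping.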
Easily find cyclic references:
// TL;DR
java.lang.IllegalArgumentException: Detected cycle in IngestMapping: first->second->third->forth->second
// Full stack
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: default
TRACE [SimpleIngestMapping] the 'dublincore' mapping was identified as a schema.
TRACE [SimpleIngestMapping] processing mapping: 'dublincore'
DEBUG [IngestMappingServiceImpl] IngestMapping: 'default' was processed successfully.
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: first
DEBUG [IngestMappingServiceImpl] IngestMapping: first directly depends on: second
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: second
DEBUG [IngestMappingServiceImpl] IngestMapping: second directly depends on: third
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: third
DEBUG [IngestMappingServiceImpl] IngestMapping: third directly depends on: foo->forth
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: foo
TRACE [SimpleIngestMapping] the 'common' mapping was identified as a schema.
TRACE [SimpleIngestMapping] processing mapping: 'common'
DEBUG [IngestMappingServiceImpl] IngestMapping: 'foo' was processed successfully.
DEBUG [IngestMappingServiceImpl] processing mapping descriptor: forth
DEBUG [IngestMappingServiceImpl] IngestMapping: forth directly depends on: second
ERROR [RegistrationInfoImpl] Component service:org.nuxeo.hxai.IngestMappingServiceComponent notification of application started failed: Detected cycle in IngestMapping: first->second->third->forth->second
java.lang.IllegalArgumentException: Detected cycle in IngestMapping: first->second->third->forth->second
at org.nuxeo.hxai.service.IngestMappingServiceImpl.processMappingDescriptor(IngestMappingServiceImpl.java:67) ~[classes/:?]
at org.nuxeo.hxai.service.IngestMappingServiceImpl.lambda$processMappingDescriptor$5(IngestMappingServiceImpl.java:93) ~[classes/:?]
IngestiblePropertyMappers allow you to map certain
properties the way you want. This is useful to customize how complex
properties are mapped. They implement
java.util.function.Consumer<PropertyMappingContext>,
which allows them to access any element inside the properties part object of the
IngestibleDocument. This makes it possible to flatten a single
property into multiple ones, as is done
for files:files, which is spread into multiple
files:files/n entries.
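As a rough illustration of such a flattening mapper, the sketch below implements `Consumer` over a plain `Map<String, Object>`. The API of PropertyMappingContext is not shown in this document, so the map is only a stand-in for it, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: a plain Map stands in for the properties part of the IngestibleDocument.
final class FlatteningMapperSketch implements Consumer<Map<String, Object>> {

    private final String property;

    FlatteningMapperSketch(String property) {
        this.property = property;
    }

    /** Replaces a list-valued property with one root entry per item, keyed "property/index". */
    @Override
    public void accept(Map<String, Object> properties) {
        Object value = properties.get(property);
        if (!(value instanceof List)) {
            return; // nothing to flatten
        }
        properties.remove(property);
        List<?> items = (List<?>) value;
        for (int i = 0; i < items.size(); i++) {
            properties.put(property + "/" + i, items.get(i));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("files:files", List.of(Map.of("file", "a.pdf"), Map.of("file", "b.pdf")));
        new FlatteningMapperSketch("files:files").accept(properties);
        System.out.println(properties.keySet());
    }
}
```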
Here is a sample contribution to map my:property. It
indicates the Mapper to use:
my.custom.Mapper.
<?xml version="1.0" encoding="UTF-8"?>
<component name="my.component" version="1.0">
<extension target="org.nuxeo.hxai.IngestMappingServiceComponent" point="ingestPropertyMappers">
<ingestPropertyMappers id="myFileMappers">
<class property="my:property">my.custom.Mapper</class>
</ingestPropertyMappers>
</extension>
</component>
The IngestiblePropertyMappers don’t merge but replace
each other. You can still group your
IngestiblePropertyMappers by descriptor ID.
The default IngestionMappingService configuration comes
with a single IngestiblePropertyMapper for
files:files. As of today, it does mandatory work to comply
with the Ingest REST API specs and should not be touched: it flattens the
binaries in files:files at the root of the properties part of the JSON object
representing each document.
To target a property, you need to write its prefixed form in the
property attribute of each
IngestPropertyMapper class. See Sample custom contribution for
details.
Transforming covers two things: remapping keys and actually transforming values.
A Transformation is made of three optional parts.
It is done this way:
# Remap only
dc:=base: # Remap all dublincore properties to prefix them with 'base'
:title=:name # Remap all properties suffixed 'title' to the suffix 'name'
files:files/=ingestion:binaries # Remap all files:files/whatever into ingestion:binaries/whatever
# Transform only
==Function # Apply Function to everything
a==Function # Apply Function to a, don't rename it
# Remap and transform
a=b=Function # Map simple property a to b and apply Function
:title=:name=Function # Remap all properties suffixed 'title' and apply Function to them
a:b=c:d=Function # Exactly map a:b to c:d and apply Function to it
files:files/=ingestion:binaries=Function # Remap all flattened items from files:files/whatever to ingestion:binaries/whatever and apply Function to them one by one
Transformations can be chained (joined by ,
separators) into a Transformer, which will apply them in
order. Here is an inline IngestTransformerDescriptor:
# ⚠️ The following will not work as expected ⚠️
a=b=Function,a=b=OtherFunction # After being transformed into b, a is not matched by the second transformation and OtherFunction is not applied.
# This would work but there is a better way below
a=b=Function,b==OtherFunction # Function will be applied before OtherFunction
The functions used by Transformations implement
java.util.function.Function<Serializable, Serializable>, which
allows chaining them:
# The most reliable solution to chain functions on a single property doesn't require you to figure things out:
a=b=Function1=Function2=Function3,c==Function1=Function3 # a is renamed to b and Function1 to 3 are applied to it in order. c will then be transformed by Function1, then Function3
# Hard to distinguish both Transformations from each other? Add some commas. It's free!
a=b=Function1=Function2=Function3,,,c==Function1=Function3 # same result
There is a default package for functions used by
Transformers. If you put your functions there, you don’t
need to specify their package:
// assumed package
org.nuxeo.hxai.ingest.functions
However, functions can be anywhere else:
MyFunction # points to org.nuxeo.hxai.ingest.functions.MyFunction
.MyFunction # same thing
.my.sub.package.MyOtherFunction # points to org.nuxeo.hxai.ingest.functions.my.sub.package.MyOtherFunction
my.complete.package.MyFunction # use a canonical name
A few provided functions:
# The underscore differentiates bundled test functions from others.
_Flag # will assure you touched a property
_Concat # will concatenate a distinguishable value to the property value
_Count # initiates or increments a numeric value to tell you how many times it was applied
The IngestTransformerDescriptor can be contributed via
XML, validated and ready to use at runtime. They are a
centralized way to define remappings and transformations.
<?xml version="1.0"?>
<component name="org.nuxeo.hxai.IngestMappingServiceComponent.test.transforming" version="1.0">
<extension target="org.nuxeo.hxai.IngestMappingServiceComponent" point="ingestTransformers">
<transformer id="example1">
<!-- dc:title will be remapped as foo:bar and transformed with the indicated implementation of function<serializable, serializable> -->
<transformations>dc:title=foo:bar=MyFunction</transformations>
</transformer>
<transformer id="example2"><transformations>foo:bar=dc:title=MyUnfunction</transformations></transformer>
</extension>
</component>
Transformations can be malformed too. Malformed
Contributions will be caught at the initialization of
Nuxeo:
// Missing left side
DEBUG [SimpleIngestTransformer$Transformation] Instanciating Transformation: 'inline#=c=_Flag'.
TRACE [SimpleIngestTransformer$Transformation] Transformation: 'inline#=c=_Flag' left side: 'null' is of type: 'STAR' right side: 'c' is of type: 'SIMPLE'.
java.lang.IllegalArgumentException: Malformed Transformation: 'inline#=c=_Flag' with a missing left side.
// Left side only
DEBUG [SimpleIngestTransformer$Transformation] Instanciating Transformation: 'inline#a=='.
TRACE [SimpleIngestTransformer$Transformation] Transformation: 'inline#a==' left side: 'a' is of type: 'SIMPLE' right side: 'null' is of type: 'STAR'.
java.lang.IllegalArgumentException: Malformed Transformation: 'inline#a==' with a left side only.
// Right side only
DEBUG [SimpleIngestTransformer$Transformation] Instanciating Transformation: 'inline#=c='.
TRACE [SimpleIngestTransformer$Transformation] Transformation: 'inline#=c=' left side: 'null' is of type: 'STAR' right side: 'c' is of type: 'SIMPLE'.
java.lang.IllegalArgumentException: Malformed Transformation: 'inline#=c=' with a right side only.
As we said, Transformations have a remapping role. This
is parameterized in the left and right sides, which are none other than
XPaths. Thus, the prefix is like a directory and the suffix
is like a file inside that directory.
So we need to be careful not to make excessive mappings, that is, to map several properties to the same target:
// All a: prefixed properties would end up overriding each other as the simple c property (like a:foo, a:bar, a:baz, a:qux would overlap as 'icon' for example)
XPath: 'a:' cannot be the left side of: 'c' in Transformation: 'inline#a:=c=_Flag'. 'a:' is a prefix and can only be mapped to another prefix.
You may also want to see the transformation combinations glossary.
There are many possible combinations:
The full form of a remapping looks like so:
1:2=3:4
| Symbol | Meaning |
|---|---|
| ✅ | valid remap |
| ⚪️ | no remap |
| ❌ | invalid remap (many possible source, one target) |
| Status | Pattern | Meaning |
|---|---|---|
| ⚪️ | = | star to star |
| ❌ | =3 | star to simple |
| ❌ | =3: | star to prefix |
| ❌ | =:4 | star to suffix |
| ❌ | =3:4 | star to full |
| Status | Pattern | Meaning |
|---|---|---|
| ⚪️ | 1= | simple to star |
| ✅ | 1=3 | simple to simple |
| ✅ | 1=3: | simple to prefix |
| ✅ | 1=:4 | simple to suffix |
| ✅ | 1=3:4 | simple to full |
| Status | Pattern | Meaning |
|---|---|---|
| ⚪️ | 1:= | prefix to star |
| ❌ | 1:=3 | prefix to simple |
| ✅ | 1:=3: | prefix to prefix |
| ❌ | 1:=:4 | prefix to suffix |
| ❌ | 1:=3:4 | prefix to full |
| Status | Pattern | Meaning |
|---|---|---|
| ⚪️ | :2= | suffix to star |
| ❌ | :2=3 | suffix to simple |
| ❌ | :2=3: | suffix to prefix |
| ✅ | :2=:4 | suffix to suffix |
| ❌ | :2=3:4 | suffix to full |
| Status | Pattern | Meaning |
|---|---|---|
| ⚪️ | 1:2= | full to star |
| ✅ | 1:2=3 | full to simple |
| ✅ | 1:2=3: | full to prefix |
| ✅ | 1:2=:4 | full to suffix |
| ✅ | 1:2=3:4 | full to full |
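The remapping tables above follow a compact rule that can be sketched in Java. This is an illustration derived from the tables only, not the connector's code: the class, enum and method names are hypothetical, and the flattened-item form (such as files:files/) is not covered.

```java
// Hypothetical sketch: classifies both sides of a remap and applies the rules of the tables.
final class RemapSketch {

    enum Kind { STAR, SIMPLE, PREFIX, SUFFIX, FULL }

    enum Status { VALID, NONE, INVALID }

    /** Classifies one side of a remap: "" is star, "a" simple, "a:" prefix, ":b" suffix, "a:b" full. */
    static Kind kindOf(String side) {
        if (side.isEmpty() || side.equals(":")) {
            return Kind.STAR;
        }
        int colon = side.indexOf(':');
        if (colon < 0) {
            return Kind.SIMPLE;
        }
        if (colon == 0) {
            return Kind.SUFFIX;
        }
        if (colon == side.length() - 1) {
            return Kind.PREFIX;
        }
        return Kind.FULL;
    }

    /** Returns the status of a "left=right" remap according to the glossary tables. */
    static Status check(String remap) {
        String[] sides = remap.split("=", -1);
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected exactly one '=' in: " + remap);
        }
        Kind left = kindOf(sides[0]);
        Kind right = kindOf(sides[1]);
        if (right == Kind.STAR) {
            return Status.NONE; // nothing to remap to
        }
        switch (left) {
            case SIMPLE:
            case FULL:
                return Status.VALID; // one source, one target
            case PREFIX:
            case SUFFIX:
                // many possible sources: only a target of the same kind keeps them apart
                return left == right ? Status.VALID : Status.INVALID;
            default:
                return Status.INVALID; // star to anything but star
        }
    }

    public static void main(String[] args) {
        System.out.println(check("1:=3:"));
        System.out.println(check("1:=3"));
        System.out.println(check("1:2="));
    }
}
```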
There are fewer combinations than in the remapping combinations glossary, but still quite a few:
The full form of a transformation looks like so:
left=right=function1[[=function2]...]
where left and right are parts of a valid remapping
| Symbol | Meaning |
|---|---|
| ✅ | valid transformation |
| ⚪️ | no transformation |
| ❌ | invalid transformation |
| Status | Pattern | Meaning |
|---|---|---|
| ⚪️ | == | no transformation (also valid for [=,:]*) |
| Status | Pattern | Meaning |
|---|---|---|
| ✅ | ==Function | Transform every value |
| ❌ | =right= | Only right side provided |
| ❌ | =right=Function | Missing left side |
| Status | Pattern | Meaning |
|---|---|---|
| ❌ | left== | Left side only provided |
| ✅ | left==Function | Transform value for keys matching left expression without remapping |
| Status | Pattern | Meaning |
|---|---|---|
| ✅ | left=right= | Remap left matching keys to right expression |
| Status | Pattern | Meaning |
|---|---|---|
| ✅ | left=right=Function | Remap left matching keys to right expression and transform their values |
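The transformation forms above, together with the function name shortcuts described earlier, can be tied together in a small parsing sketch. It is a hypothetical illustration, not the connector's SimpleIngestTransformer: the class name is invented, the left and right sides are not validated as remappings, and the error messages merely imitate the documented ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits "left=right=function1=function2..." and resolves function names.
final class TransformationSketch {

    static final String DEFAULT_PACKAGE = "org.nuxeo.hxai.ingest.functions";

    final String left;
    final String right;
    final List<String> functions;

    private TransformationSketch(String left, String right, List<String> functions) {
        this.left = left;
        this.right = right;
        this.functions = functions;
    }

    /** Resolves the short forms of a function name against the default package. */
    static String canonicalName(String function) {
        if (function.startsWith(".")) {
            return DEFAULT_PACKAGE + function; // sub-package of the default package
        }
        if (!function.contains(".")) {
            return DEFAULT_PACKAGE + "." + function; // bare class name
        }
        return function; // already a canonical name
    }

    /** Parses one transformation and rejects the forms marked invalid in the tables. */
    static TransformationSketch parse(String expression) {
        String[] parts = expression.split("=", -1);
        String left = parts[0];
        String right = parts.length > 1 ? parts[1] : "";
        List<String> functions = new ArrayList<>();
        for (int i = 2; i < parts.length; i++) {
            if (!parts[i].isEmpty()) {
                functions.add(canonicalName(parts[i]));
            }
        }
        if (left.isEmpty() && !right.isEmpty()) {
            throw new IllegalArgumentException("Malformed Transformation: '" + expression + "' "
                    + (functions.isEmpty() ? "with a right side only." : "with a missing left side."));
        }
        if (!left.isEmpty() && right.isEmpty() && functions.isEmpty()) {
            throw new IllegalArgumentException(
                    "Malformed Transformation: '" + expression + "' with a left side only.");
        }
        return new TransformationSketch(left, right, functions);
    }

    public static void main(String[] args) {
        TransformationSketch t = parse("a=b=Function1=.my.sub.package.MyOtherFunction");
        System.out.println(t.left + " -> " + t.right + " via " + t.functions);
    }
}
```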
CI/CD workflows are present here and they include:
CI for PR: Build and test of the source code upon raising a PR against 2023 branch.
Deployment of package to Pre-production Marketplace: Merging a PR onto 2023 branch would trigger the CI followed by deployment of the generated package to the nuxeo pre-prod marketplace.
Release to Production Marketplace: A manual release job is available which, when run on a base branch, deploys the latest available minor tag version to the production marketplace.
Once a release of MAJOR.MINOR+1.0 happens to production, the project’s version is bumped automatically to MAJOR.MINOR+1-SNAPSHOT so that, until the next release of MAJOR.MINOR+2 to production, the upcoming PR merges will be deployed to pre-production as MAJOR.MINOR+1.1, MAJOR.MINOR+1.2 and so on.
Do not edit this file directly, it is autogenerated as well as
content.html which serves as the package’s embedded
documentation.
The file to edit is doc/README.md, then you need to
generate its Table Of Content (TOC):
pandoc --toc --toc-depth=6 -s -t gfm -o README.md doc/README.md
⚠️ The CI build and test pipeline will verify that this file,
README.md, is equal to the result of the above command.
This is to make sure:
TOC without having to remove it first by
hand
TOC to be outdated
The content.html is not a source file, it is generated
in the CI build and test pipeline by the following command and packaged
appropriately:
Responsive to your OS settings:
pandoc --toc --toc-depth=6 -s --embed-resources --css doc/nour-auto-lail-nahar.css --highlight-style doc/nour-lail.theme -o <path-to-be-determined.html> doc/README.md
pandoc --toc --toc-depth=6 -s --embed-resources --css doc/nour-lail.css --highlight-style doc/nour-lail.theme -o <path-to-be-determined.html> doc/README.md
pandoc --toc --toc-depth=6 -s --embed-resources --css doc/nour-nahar.css --highlight-style doc/nour-lail.theme -o <path-to-be-determined.html> doc/README.md
You can add cosmetics (neon logo and title) with the following extras:
-V logo="$(< doc/connector.svg)" --template doc/nuxeo-hxai-connector-template.html