The Entity Extraction transform includes options that control which language, dictionaries, and rules to use for extraction. The Processing Options group includes specific configuration parameters for processing.
Common
The Common option group includes a setting to run the transform as a separate process.
Option | Description |
---|---|
Run as Separate Process | Yes: Splits the transform into a separate process. |
Languages
The Languages option group includes settings to process content in different languages, such as English, German, and French. If the input content is in a language other than the specified languages, you might see unexpected results.
Note:
Predefined entities are entities associated with different languages and are part of the language modules. These entities are extracted by default.
Option | Description |
---|---|
Language | Specifies the language for processing your content. You can select a language from the drop-down menu, which lists the available languages alphabetically. |
Default Language | Specifies the language that the transform assumes if the Language option is set to Auto and the transform cannot identify a language. |
Filter By Entity Types | Specifies a list of entity types (supported by the selected language) to use for filtering the extraction output. |
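
The interaction between Language and Default Language can be pictured with a small sketch. This is only an illustration of the fallback described above, not the transform's implementation: the function name `resolve_language`, its parameters, and the idea of passing in a detection result are all hypothetical.

```python
from typing import Optional

AUTO = "Auto"  # the Language value that requests automatic identification


def resolve_language(language: str,
                     detected: Optional[str],
                     default_language: str) -> str:
    """Return the language to process with (hypothetical helper).

    language         -- value of the Language option
    detected         -- result of language identification, or None if
                        no language could be identified (assumption)
    default_language -- value of the Default Language option
    """
    if language != AUTO:
        # An explicitly selected language is used as-is.
        return language
    if detected is not None:
        # Auto mode and identification succeeded.
        return detected
    # Auto mode and identification failed: fall back to the default.
    return default_language
```

Under these assumptions, Default Language only matters when Language is Auto and identification fails; in every other case it is ignored.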
Processing Options
The Processing Options group includes configuration settings for the transform. They affect how the transform will process the content before generating the extraction output.
The Dictionary Only option is most useful when you want to extract only the entities defined in a dictionary. For example, you might want to match exactly the product and customer names from your custom dictionary and have no interest in any other extraction output. In that case, the predefined entities that the extraction process would otherwise return are not needed.
The Processing Timeout option is most useful when you want to limit the amount of time spent on processing large content or content that takes a very long time to process.
Option | Description |
---|---|
Dictionary Only | Use this option to limit the extraction process to entities defined in the specified dictionaries. You must specify a dictionary file to use this option. |
Advanced Parsing | Specifies whether advanced parsing information should be produced during extraction. Advanced parsing enriches linguistic processing with richer noun phrase structure, noun phrase coordination, and syntactic function attributes that can be leveraged in custom rules. |
Processing Timeout | Use this option to stop processing the content after a set amount of time. By default, the Processing Timeout option is set to 60 seconds. The Processing Timeout value can be one of the following: |
Document Properties | Specifies whether document properties of a binary document should be extracted, if they are present in the document. A value of YES causes the extraction, and a value of NO (the default) causes no extraction. |
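
As a rough mental model of Processing Timeout, the sketch below stops working through content once a time budget is used up. It is an assumption-laden illustration only: `process_with_timeout`, the chunked input, the `handle` callback, and the choice to keep partial results are all hypothetical, and this page does not state how the transform treats content that times out.

```python
import time
from typing import Callable, Iterable, List

DEFAULT_TIMEOUT_SECONDS = 60.0  # default value given for Processing Timeout


def process_with_timeout(chunks: Iterable[str],
                         handle: Callable[[str], str],
                         timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
                         clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Process chunks until the time budget is exhausted (hypothetical).

    The clock is injectable so the behavior can be checked without waiting.
    """
    deadline = clock() + timeout_seconds
    results: List[str] = []
    for chunk in chunks:
        if clock() >= deadline:
            # Budget used up: stop and return what was produced so far.
            break
        results.append(handle(chunk))
    return results
```

The deadline is checked before each chunk, so a single slow chunk can still overrun the budget; a stricter limit would need the work itself to be interruptible.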
Dictionaries
The Dictionaries option group includes settings to process content by specifying one or more dictionaries that should be used when performing extraction. It also enables filtering by entity types defined in each dictionary.
The Dictionaries option group consists of individual dictionaries. You can configure the transform to use multiple dictionaries. These options are found under Dictionaries > Dictionary > Dictionary File.
Option | Description |
---|---|
Dictionary | Use this option to add dictionaries that should be used during extraction or delete an existing dictionary. Right-click this option and select the option to duplicate an entry or to delete an entry. |
Dictionary File | Use the Browse option under the drop-down menu to select a valid, compiled dictionary file to use for extraction. |
Filter By Entity Types | Specifies a list of entity types (defined in the selected dictionary) to use for filtering the extraction output. |
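
The effect of Filter By Entity Types on the output can be sketched as a simple allow-list. This is a minimal illustration, not the transform's actual behavior: the function `filter_by_entity_types`, the dictionary-shaped entity records, the type names `PRODUCT` and `CUSTOMER`, and the assumption that an empty filter means "keep everything" are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def filter_by_entity_types(entities: Iterable[Dict[str, str]],
                           allowed_types: Optional[Iterable[str]] = None
                           ) -> List[Dict[str, str]]:
    """Keep only entities whose type is in the allow-list (hypothetical).

    Assumption: when no types are listed, nothing is filtered out.
    """
    if not allowed_types:
        return list(entities)
    allowed = set(allowed_types)
    return [entity for entity in entities if entity["type"] in allowed]


# Hypothetical extraction output from a custom dictionary.
extracted = [
    {"text": "Widget 3000", "type": "PRODUCT"},
    {"text": "Acme Corp", "type": "CUSTOMER"},
    {"text": "Berlin", "type": "LOCALITY"},
]
products_only = filter_by_entity_types(extracted, ["PRODUCT"])
```

The same allow-list idea applies to Filter By Rule Names in the Rules option group, with rule names in place of entity types.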
Rules
The Rules option group includes settings to process content by specifying one or more extraction rules to use when performing extraction. It also enables filtering by rule names defined in each rule file.
The Rules option group includes individual rules. You can configure the transform to use multiple rules. These options are found under Rules > Rule > Rule File.
Option | Description |
---|---|
Rule | Use this option to add rules that should be used during extraction or to delete an existing rule. Right-click this option and select the option to duplicate an entry or to delete an entry. |
Rule File | Use the Browse option under the drop-down menu to select a valid, compiled rule file to use for extraction. |
Filter By Rule Names | Specifies a list of rule names (defined in the selected rule file) to use for filtering the extraction output. |