Summary

The following instructions explain how to customize the Voice of the Customer (VoC) rules to better fit a particular domain in each of the four supported languages: English, German, French, and Spanish.


The instructions use the example of a plumbing company that wants to learn more about specific problems with its products, such as “leaky pipes”. For English, customizing the rules consists of adding the adjective “leaky” to the rules (the other languages use an equivalent expression) so that this issue is extracted as a MajorProblem. After making the changes, recompile the rules to create new FSM files.

IMPORTANT NOTE:  Back up the rule files delivered with the language modules before making any changes so that you can restore the original files if your customizations do not work as expected.
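If you prefer to script the backup, the following Python sketch is one way to do it. It assumes the rule sources live in LINK_DIR/TextAnalysis/languages/rulesources, as in the steps below, and copies the <language>-tf-voc*.rul files into a sibling folder. The function name backup_rule_files and the folder name rulesources.bak are hypothetical choices, not part of the product.

```python
import shutil
from pathlib import Path


def backup_rule_files(link_dir: str, language: str) -> list[Path]:
    """Copy <language>-tf-voc*.rul from rulesources into a sibling backup folder.

    The backup folder name 'rulesources.bak' is a hypothetical choice.
    Returns the backup files created by this call.
    """
    src_dir = Path(link_dir) / "TextAnalysis" / "languages" / "rulesources"
    backup_dir = src_dir.parent / "rulesources.bak"
    backup_dir.mkdir(exist_ok=True)
    copied = []
    for rul in sorted(src_dir.glob(f"{language}-tf-voc*.rul")):
        target = backup_dir / rul.name
        if target.exists():
            # Never overwrite an earlier backup: it may be the only pristine copy.
            continue
        shutil.copy2(rul, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Call it once per language (for example with "english" or "german") before editing anything. Because it skips files that already have a backup, running it again after your edits cannot replace the delivered files with modified ones.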

English 

1. Modifying the source rule file

Open LINK_DIR/TextAnalysis/languages/rulesources/english-tf-voc-thesaurus.rul and add the adjective “leaky” to the #define called def_problemMajorAdj:

#define def_problemMajorAdj: {
( abusive
| botched
| instable
…
| leaky
…
)}
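The edit above can also be applied programmatically. This minimal Python sketch assumes the layout shown above: one alternative per line introduced by "|", and a line beginning with ")}" that closes the #define. add_alternative is a hypothetical helper, not part of the product, and it appends the new alternative as the last one instead of placing it alphabetically.

```python
import re


def add_alternative(rul_text: str, define_name: str, word: str) -> str:
    """Insert '| word' as a new alternative just before the closing ')}' of a #define.

    Returns the text unchanged if the word is already an alternative of that #define.
    """
    lines = rul_text.splitlines(keepends=True)
    header = re.compile(rf"^#define\s+{re.escape(define_name)}\s*:")
    start = next((i for i, line in enumerate(lines) if header.match(line)), None)
    if start is None:
        raise ValueError(f"#define {define_name} not found")
    for end in range(start + 1, len(lines)):
        stripped = lines[end].strip()
        # '( abusive' and '| leaky' both reduce to the bare word here.
        if stripped.lstrip("(|").strip() == word:
            return rul_text  # already present, nothing to do
        if stripped.startswith(")}"):
            lines.insert(end, f"| {word}\n")
            return "".join(lines)
    raise ValueError(f"no closing ')}}' line found for #define {define_name}")
```

For example, read english-tf-voc-thesaurus.rul with the encoding it was delivered in, pass the text to add_alternative(text, "def_problemMajorAdj", "leaky") and write the result back. Running it a second time leaves the text unchanged.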


2. Renaming the original FSM file

To avoid overwriting the original FSM file delivered with the English language module, go to the LINK_DIR/TextAnalysis/languages folder and rename the file english-tf-voc-sentiment.fsm to english-tf-voc-sentiment.fsm.ORIG.
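The rename can be scripted as well. This Python sketch assumes the folder layout described above; preserve_original_fsm is a hypothetical helper name. It refuses to overwrite an existing .ORIG file, so rerunning it after a recompile does not replace the delivered FSM with your own build.

```python
from pathlib import Path


def preserve_original_fsm(link_dir: str, language: str) -> Path:
    """Rename <language>-tf-voc-sentiment.fsm to *.fsm.ORIG (only once) and return that path."""
    lang_dir = Path(link_dir) / "TextAnalysis" / "languages"
    fsm = lang_dir / f"{language}-tf-voc-sentiment.fsm"
    orig = lang_dir / f"{language}-tf-voc-sentiment.fsm.ORIG"
    if orig.exists():
        # A previous run already preserved the delivered file; do not clobber it
        # with a recompiled FSM.
        return orig
    fsm.rename(orig)
    return orig
```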

3. Recompiling

Open a command prompt (or shell) and navigate to the LINK_DIR/bin directory, which contains the rule compiler (tf-cgc). Use the following command to recompile the English VoC rules:

tf-cgc -i ../TextAnalysis/languages/rulesources/english-tf-voc-sentiment.rul -o ../TextAnalysis/languages/english-tf-voc-sentiment.fsm
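To run the compilation from a script, a minimal Python sketch follows. It reproduces the tf-cgc command line above (only the -i and -o options that this page documents) and runs it with LINK_DIR/bin as the working directory, so the relative paths resolve as they do in the manual step. The helper names are hypothetical, and the sketch assumes the compiler is LINK_DIR/bin/tf-cgc; on Windows you may need to append .exe.

```python
import subprocess
from pathlib import Path


def tfcgc_command(language: str) -> list[str]:
    """Build the tf-cgc command line shown above for one language.

    The paths are relative to LINK_DIR/bin, exactly as in the manual step.
    """
    rule = f"../TextAnalysis/languages/rulesources/{language}-tf-voc-sentiment.rul"
    fsm = f"../TextAnalysis/languages/{language}-tf-voc-sentiment.fsm"
    return ["tf-cgc", "-i", rule, "-o", fsm]


def compile_voc_rules(link_dir: str, language: str) -> subprocess.CompletedProcess:
    """Run tf-cgc with LINK_DIR/bin as the working directory.

    The result is returned unchecked so that you can inspect
    returncode, stdout and stderr yourself.
    """
    bin_dir = Path(link_dir) / "bin"
    cmd = tfcgc_command(language)
    cmd[0] = str(bin_dir / cmd[0])  # address the compiler by its full path
    return subprocess.run(cmd, cwd=bin_dir, capture_output=True, text=True)
```

Check the returned returncode together with stdout and stderr for compiler messages; if compilation fails, see the NOTE that follows.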

NOTE: If you receive a compilation error, copy all of the LINK_DIR/TextAnalysis/languages/rulesources/english-tf-voc*.rul files to the LINK_DIR/bin directory.

4. Testing 

a) Create an unstructured text file containing the sentence: The pipes are leaky!

b) Launch the Data Services Designer and create a job that processes this file with the Entity Extraction transform (under Text Data Processing), configured for English and using the recompiled rule file.

c) Execute the job and inspect the output. It should match the following comma-delimited rows:

1,-1,The pipes are leaky!,Sentiment,The pipes are leaky!,RULE,0,20,1,1

2,1,pipes,Topic,pipes,RULE,4,5,1,1

3,1,leaky,MajorProblem,leaky,RULE,14,5,1,1      

Because of the VoC customization, the adjective "leaky" is now identified as a major problem.
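To check the result without reading the rows by eye, the output can be parsed. The Python sketch below assumes each row has the ten comma-separated columns seen in the samples; the field names in Entity (ID, parent ID, matched text, type, normalized form, source, offset, length, paragraph, sentence) are guesses inferred from those samples, not documented column names. check_offsets returns the entities whose offset and length do not point at their own text in the input document.

```python
import csv
import io
from typing import NamedTuple


class Entity(NamedTuple):
    # Field names are guesses inferred from the sample rows, not documented names.
    id: int
    parent_id: int
    text: str
    type: str
    normalized: str
    source: str
    offset: int
    length: int
    paragraph: int
    sentence: int


def parse_entities(output: str) -> list[Entity]:
    """Parse the comma-delimited rows, skipping blank lines between them."""
    entities = []
    for row in csv.reader(io.StringIO(output)):
        if not row:
            continue
        id_, parent, text, etype, norm, source, off, length, para, sent = row
        entities.append(Entity(int(id_), int(parent), text, etype, norm, source,
                               int(off), int(length), int(para), int(sent)))
    return entities


def check_offsets(document: str, entities: list[Entity]) -> list[Entity]:
    """Return the entities whose offset/length slice of document differs from their text."""
    return [e for e in entities if document[e.offset:e.offset + e.length] != e.text]
```

With the three rows above and the sentence as it appears in the first row (The pipes are leaky!), parse_entities yields the types Sentiment, Topic and MajorProblem, and check_offsets returns an empty list.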

German 

1. Modifying the source rule file

Open LINK_DIR/TextAnalysis/languages/rulesources/german-tf-voc-thesaurus.rul and comment out the “undicht” line in the #define called def_problemadj by prefixing it with !! as shown:

#define def_problemadj: { (
.+bedürftig
| abbrechen
…
!!| undicht
…
)}

Add “undicht” to the #define called def_majorproblemadj. This ensures that “undicht” is extracted as a MajorProblem rather than a MinorProblem:

#define def_majorproblemadj: {( 
defekt
| kaputt
…
| undicht
…
)}
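Both German edits (commenting out the alternative in one #define and adding it to another) can be scripted together. This Python sketch assumes the layout shown above: one alternative per line written as "| word", "!!" marking a commented-out line, and a line starting with ")}" closing each #define. move_alternative is a hypothetical helper; it does not handle a word that is the first alternative of its #define, because that line has no leading "|".

```python
import re


def _define_span(lines: list[str], name: str) -> tuple[int, int]:
    """Return the indexes of the '#define <name>:' line and of its closing ')}' line."""
    header = re.compile(rf"^#define\s+{re.escape(name)}\s*:")
    start = next((i for i, line in enumerate(lines) if header.match(line)), None)
    if start is None:
        raise ValueError(f"#define {name} not found")
    for end in range(start + 1, len(lines)):
        if lines[end].strip().startswith(")}"):
            return start, end
    raise ValueError(f"#define {name} is not closed by a ')}}' line")


def move_alternative(rul_text: str, word: str, from_define: str, to_define: str) -> str:
    """Comment out '| word' in from_define with '!!' and append '| word' to to_define."""
    lines = rul_text.splitlines(keepends=True)
    start, end = _define_span(lines, from_define)
    for i in range(start + 1, end):
        if lines[i].strip() == f"| {word}":
            lines[i] = "!!" + lines[i].lstrip()  # same form as '!!| undicht' above
            break
    else:
        raise ValueError(f"'| {word}' is not an alternative of {from_define}")
    _, close = _define_span(lines, to_define)
    lines.insert(close, f"| {word}\n")
    return "".join(lines)
```

For example, move_alternative(text, "undicht", "def_problemadj", "def_majorproblemadj") applied to the contents of german-tf-voc-thesaurus.rul produces the two changes shown above, with "| undicht" appended as the last alternative of def_majorproblemadj. The file contains non-ASCII characters (for example in bedürftig), so read and write it with the encoding it was delivered in.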

2. Renaming the original FSM file

To avoid overwriting the original FSM file delivered with the German language module, go to the LINK_DIR/TextAnalysis/languages folder and rename the file german-tf-voc-sentiment.fsm to german-tf-voc-sentiment.fsm.ORIG.

3. Recompiling

Open a command prompt (or shell) and navigate to the LINK_DIR/bin directory, which contains the rule compiler (tf-cgc). Use the following command to recompile the German VoC rules:

tf-cgc -i ../TextAnalysis/languages/rulesources/german-tf-voc-sentiment.rul -o ../TextAnalysis/languages/german-tf-voc-sentiment.fsm

NOTE: If you receive a compilation error, copy all of the LINK_DIR/TextAnalysis/languages/rulesources/german-tf-voc*.rul files to the LINK_DIR/bin directory.

4. Testing

a) Create an unstructured text file containing the sentence: Die Rohre sind undicht.

b) Launch the Data Services Designer and create a job that processes this file with the Entity Extraction transform (under Text Data Processing), configured for German and using the recompiled rule file.

c) Execute the job and inspect the output. It should match the following comma-delimited rows:

1,-1,Die Rohre sind undicht,Sentiment,Die Rohre sind undicht,RULE,0,22,1,1

2,1,Die Rohre,Topic,Die Rohre,RULE,0,9,1,1

3,1,undicht,MajorProblem,undicht,RULE,15,7,1,1

Because of the VoC customization, the adjective "undicht" is now identified as a major problem.

French

1. Modifying the source rule file

Open LINK_DIR/TextAnalysis/languages/rulesources/french-tf-voc-thesaurus.rul and add the expressions “qui fuit” and “qui coule”, together with their plural forms “qui fuient” and “qui coulent”, to the subgroup called ProblemAdj_Major:

#subgroup ProblemAdj_Major: {(<\p{ci}
(affreux|affreuses?|afreux|afreuses?
|atroces?
…
|(<\p{ci}(qui)> <\p{ci}(fuit|fuient|coule|coulent)>)
…
)}                     


2. Renaming the original FSM file

To avoid overwriting the original FSM file delivered with the French language module, go to the LINK_DIR/TextAnalysis/languages folder and rename the file french-tf-voc-sentiment.fsm to french-tf-voc-sentiment.fsm.ORIG.

3. Recompiling

Open a command prompt (or shell) and navigate to the LINK_DIR/bin directory, which contains the rule compiler (tf-cgc). Use the following command to recompile the French VoC rules:

tf-cgc -i ../TextAnalysis/languages/rulesources/french-tf-voc-sentiment.rul -o ../TextAnalysis/languages/french-tf-voc-sentiment.fsm

NOTE: If you receive a compilation error, copy all of the LINK_DIR/TextAnalysis/languages/rulesources/french-tf-voc*.rul files to the LINK_DIR/bin directory.

4. Testing

a) Create an unstructured text file containing the sentence: Nous avons des tuyaux qui fuient.

b) Launch the Data Services Designer and create a job that processes this file with the Entity Extraction transform (under Text Data Processing), configured for French and using the recompiled rule file.

c) Execute the job and inspect the output. It should match the following comma-delimited rows:

1,-1,des tuyaux qui fuient,Sentiment,des tuyaux qui fuient,RULE,11,21,1,1

2,1,des tuyaux,Topic,des tuyaux,RULE,11,10,1,1

3,1,qui fuient,MajorProblem,qui fuient,RULE,22,10,1,1

Because of the VoC customization, the expressions "qui fuit" and "qui coule" (and their plural forms "qui fuient" and "qui coulent") are now identified as major problems.
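When preparing your own test sentences, it can help to compute the expected rows in advance. The Python sketch below builds a row in the sample layout. It assumes that offsets are counted in characters from the start of the input text, that the normalized form equals the matched text, and that the source column is RULE, as in every sample on this page. expected_row is a hypothetical helper, and it uses the first occurrence of the text.

```python
def expected_row(entity_id: int, parent_id: int, document: str, text: str,
                 entity_type: str, paragraph: int = 1, sentence: int = 1) -> str:
    """Build a comma-delimited row in the layout of the sample output.

    Assumes normalized form == matched text and source == 'RULE', as in the
    samples; the offset is the first occurrence of text in document.
    """
    offset = document.find(text)
    if offset < 0:
        raise ValueError(f"{text!r} does not occur in the document")
    fields = [entity_id, parent_id, text, entity_type, text, "RULE",
              offset, len(text), paragraph, sentence]
    return ",".join(str(f) for f in fields)
```

Applied to the test sentence Nous avons des tuyaux qui fuient., it reproduces the three French rows above, for example offset 22 and length 10 for "qui fuient".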

Spanish

1. Modifying the source rule file

Open LINK_DIR/TextAnalysis/languages/rulesources/spanish-tf-voc-thesaurus.rul and add the nouns “agujero” and “fuga” to the #define called MAPnounSoundness:

#define MAPnounSoundness: {agujero
| choque
| falla
| fuga
| quiebra
| riesgo
| ruina
}


 

2. Renaming the original FSM file

To avoid overwriting the original FSM file delivered with the Spanish language module, go to the LINK_DIR/TextAnalysis/languages folder and rename the file spanish-tf-voc-sentiment.fsm to spanish-tf-voc-sentiment.fsm.ORIG.

3. Recompiling

Open a command prompt (or shell) and navigate to the LINK_DIR/bin directory, which contains the rule compiler (tf-cgc). Use the following command to recompile the Spanish VoC rules:

tf-cgc -i ../TextAnalysis/languages/rulesources/spanish-tf-voc-sentiment.rul -o ../TextAnalysis/languages/spanish-tf-voc-sentiment.fsm

NOTE: If you receive a compilation error, copy all of the LINK_DIR/TextAnalysis/languages/rulesources/spanish-tf-voc*.rul files to the LINK_DIR/bin directory.

4. Testing

a) Create an unstructured text file containing the sentence: Tenemos una fuga. Tenemos agujeros.

b) Launch the Data Services Designer and create a job that processes this file with the Entity Extraction transform (under Text Data Processing), configured for Spanish and using the recompiled rule file.

c) Execute the job and inspect the output. It should match the following comma-delimited rows:

1,-1,Tenemos una fuga,Sentiment,Tenemos una fuga,RULE,0,16,1,1

2,1,fuga,MajorProblem,fuga,RULE,12,4,1,1

3,-1,Tenemos agujeros,Sentiment,Tenemos agujeros,RULE,18,16,1,2

4,3,agujeros,MajorProblem,agujeros,RULE,26,8,1,2

Because of the VoC customization, the nouns "agujero" and "fuga" are now identified as major problems.
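Because this test file contains two sentences, the output has two top-level Sentiment entities, each with its own child. The following Python sketch groups the rows by parent ID. It assumes, as the samples suggest, that the second column holds the parent entity's ID (-1 for a top-level entity), the third the matched text and the fourth the entity type; group_by_parent is a hypothetical helper.

```python
import csv
from collections import defaultdict


def group_by_parent(rows: list[str]) -> list[tuple[str, list[tuple[str, str]]]]:
    """Pair each top-level entity's text with the (text, type) of its children.

    Assumed columns: 1 = entity ID, 2 = parent ID (-1 for top level),
    3 = matched text, 4 = entity type. Blank lines are ignored.
    """
    records = list(csv.reader(line for line in rows if line.strip()))
    children = defaultdict(list)
    for rec in records:
        if rec[1] != "-1":
            children[rec[1]].append((rec[2], rec[3]))
    # Keep top-level entities in output order, each with its collected children.
    return [(rec[2], children[rec[0]]) for rec in records if rec[1] == "-1"]
```

For the four Spanish rows above it returns [("Tenemos una fuga", [("fuga", "MajorProblem")]), ("Tenemos agujeros", [("agujeros", "MajorProblem")])].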
