Support Article

Characters decoded incorrectly when email is sent as HTML

SA-83652

Summary



When an email is sent as HTML, special characters are missing from the content for a few email addresses. This occurs even though the UTF-8 charset is set for decoding.


Error Messages



Not Applicable


Steps to Reproduce

  1. Send an email to Gmail from Outlook.
  2. Wait for a minute to receive a notification on the case created.
  3. Search for the case created in the Pega instance.


Root Cause

Outlook for Windows uses ISO-8859-1 as the default encoding for outgoing messages. As a result, the HTML source of the email declares the ISO-8859-1 charset.
Setting the correct content type is important for email accessibility.
UTF-8 is a standard encoding that can represent every Unicode character, so all characters, including non-Latin ones, are decoded correctly.
ISO-8859-1 (the default in Outlook), in contrast, covers only Latin-based languages.
The Pega application decodes the HTML content as UTF-8. Therefore, content that is encoded as ISO-8859-1 is decoded incorrectly; to be read correctly, the content must also be encoded as UTF-8.
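
The mismatch described above can be reproduced outside Pega with a minimal sketch. The class and method names below (CharsetMismatchDemo, decodeAsUtf8, decodeAsLatin1) are hypothetical and are not part of the Pega activity; the sketch only assumes that the sender encoded the text as ISO-8859-1 and that the receiver decodes it as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the Pega product.
class CharsetMismatchDemo {

    // Decodes the bytes as UTF-8. Byte sequences that are not valid UTF-8
    // are replaced with the Unicode replacement character U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the bytes as ISO-8859-1, where every byte maps to one character.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"
        // Simulate a sender that encodes the message body as ISO-8859-1.
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the wrong charset loses the accented character.
        System.out.println(original.equals(decodeAsUtf8(latin1Bytes)));
        // Decoding with the charset the sender used restores the text.
        System.out.println(original.equals(decodeAsLatin1(latin1Bytes)));
    }
}
```

The accented character is a single byte (0xE9) in ISO-8859-1, which is not a valid UTF-8 sequence on its own, so a UTF-8 decoder cannot recover it.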



Resolution



Perform the following local change:

Instead of changing the default encoding for outgoing messages in Outlook (from 'ISO-8859-1' to 'UTF-8'), modify the code in Step 3 of the 'pyExtractHtmlFromAttachment' activity as follows:


String inputStream = htmlBase64;

byte[] bytes = org.apache.commons.codec.binary.Base64.decodeBase64(inputStream);
java.io.InputStream is = new java.io.ByteArrayInputStream(bytes);
java.nio.charset.Charset charset = java.nio.charset.StandardCharsets.UTF_8;
StringBuilder stringBuilder = new StringBuilder();
String line = null;
java.io.InputStreamReader reader = new java.io.InputStreamReader(is, charset);

try (java.io.BufferedReader bufferedReader = new java.io.BufferedReader(reader)) {        
  while ((line = bufferedReader.readLine()) != null) {
    stringBuilder.append(line);
  }
} catch (Exception ex) {
  oLog.error("Error while reading file", ex);
}

FileString = stringBuilder.toString();
oLog.debug("Html string: " + FileString);
//-------------------------------------------
org.jsoup.nodes.Document doc = org.jsoup.Jsoup.parse(FileString);

// Read the content attribute of the first meta tag; if it does not declare UTF-8,
// decode the same bytes again as ISO-8859-1
org.jsoup.select.Elements elements = doc.select("meta");
if(elements != null && !elements.isEmpty()){
  
  String charsetAttr = elements.get(0).attr("content");
  if(!charsetAttr.toUpperCase().contains("UTF-8")){
    stringBuilder = new StringBuilder();
    charset = java.nio.charset.StandardCharsets.ISO_8859_1;
    is = new java.io.ByteArrayInputStream(bytes);
    reader = new java.io.InputStreamReader(is, charset);

    try (java.io.BufferedReader bufferedReader = new java.io.BufferedReader(reader)) {    
      while ((line = bufferedReader.readLine()) != null) {
        stringBuilder.append(line);
      }
    }catch(Exception ex){
      oLog.error("Error while reading file", ex);
    } 
    FileString = stringBuilder.toString();
  }
}
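
The code above switches between UTF-8 and ISO-8859-1 only. As an illustration of the same idea in a more general form, the following sketch reads whichever charset the HTML declares and falls back to UTF-8 when none is declared or the name is not supported. It is a standalone example that uses only the Java standard library rather than jsoup; the class and method names (HtmlCharsetSketch, detectCharset, decodeHtml) and the regular expression are assumptions made for this example and are not part of the Pega activity.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the Pega product.
class HtmlCharsetSketch {

    // Matches charset=NAME, with or without quotes, as found in a meta tag.
    private static final Pattern CHARSET_DECLARATION = Pattern.compile(
        "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset declared in the HTML, or UTF-8 if none is usable.
    static Charset detectCharset(byte[] htmlBytes) {
        // ISO-8859-1 maps every byte to a character, so this probe decode
        // cannot fail and keeps the ASCII markup readable.
        String probe = new String(htmlBytes, StandardCharsets.ISO_8859_1);
        Matcher matcher = CHARSET_DECLARATION.matcher(probe);
        if (matcher.find()) {
            try {
                return Charset.forName(matcher.group(1));
            } catch (IllegalArgumentException ex) {
                // Illegal or unsupported charset name: use the default below.
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the HTML bytes with the charset that the HTML itself declares.
    static String decodeHtml(byte[] htmlBytes) {
        return new String(htmlBytes, detectCharset(htmlBytes));
    }
}
```

In an activity step, the decoded Base64 bytes would be passed to decodeHtml in place of the two read loops. This is a sketch under the stated assumptions and should be tested against real messages before use.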

Published July 9, 2019 - Updated August 12, 2019
