Create an Email Tag Cloud with PowerBI and Cognitive Services

PowerBI and Cognitive Services are a powerful combination. A nice example is a tag cloud based on the key phrases in your daily emails. This example requires the following cloud components:

  • PowerBI (of course)
  • Cognitive Services for Key Phrase extraction
  • Exchange Online
  • Flow and Table Storage in Azure

Cloud Infrastructure

First, go to your Azure portal and create a new Cognitive Services Resource. In the creation wizard place the cognitive services to a data center near your Office subscription. I’d also recomend to creata a seperate resource group where you place all the services.

Cognitive Services in Azure

At the Cognitive Services Overview tab, copy the Endpoint URL. From the Cognitive Services > Key tab also copy the Key1. You need both to connecto to the cognitive services.

Azure Storage Account

Next create a new stroage account. Like in the Cognitive services place it in the same resource group and same data center. After the storage account has been created successfuly go to the overview tab.

Azure Table Storage

Select “Tables” and create a new table. Give it a useful name e.g. keystorage. A table storage can be used to place structured data, which require at least to fields a RowKey and a PartitionKey. It is up to you to provide meaningful values to theses fields when inserting data.

Copy the storage account name and from the Access Keys tab the Key1 value. You will need both to connect to the storage account.

Implement transformation pipeline in Flow (first naive approach)

Now, lets create the extraction logic using Flow. There are some limitations with this approach that will result in errors. A more stable version of the flow is discussed at the end. Go to https://flow.microsoft.com and create a new triggered flow from blank.

Automated Flow from Blank

The trigger for the flow is Outlook > When a new email arrives.

Because almost all my mails are HTML formated, I need to add the Content Conversion > HTML to Text step to remove the HTML code from the email body.

The third step in the flow is the key phrase extraction. Therefore add the Text Analysis > Key Phrase extraction step. There you need to provide the Cognitive Services Account Key and Endpoint. The text to analyze is the output from the HTML to Text step.

The last step writes the key phrases to the Azure Table Storage. Like in the Cognitive Services step, you have to provide the name and a key. From the Table dropdown select the table you have create earlier in the Azure portal. The entity has to be a JSON string. In my example the Partition is always 1 and the Row key is a Guid. Because, one mail will have more than one key phrase, the insert is encapsulated in an Apply-to-each block

{
“PartitionKey”:”1″,
“RowKey”:@{guid()},
“KeyPhrase”:@{items(‘Apply_to_each’)}
}

Keyword Extraction Flow

Test your flow by sending an Email to your account. All the steps should succeed

Keyword Extraction Flow Test

You can use the Azure Storage Explorer in the Azure portal to lookup the phrases extracted from the email. In this example I sent an email from my company account, to my private mail account. The flow extracted the key words from the mail (Signature).

Azure Storage Explorer

Tag Cloud in PowerBI

In PowerBI add a new data source from the Azure Table storage. Again you need to provide the storage name and one of the keys. After connecting successfuly to the table, open the transformation window an take a look at the retrieved keys. You can remove the PartitionKey, RowKey and Timestamp from the data set.

Azure Table Storage in PowerBI

In the PowerBI report window, from the Visuals, klick on the Elipsis (…) and search for the Word Cloud in the marketplace. Add the Word Cloud Visual to PowerBI

Word Cloud Visual for PowerBI

Add the visual to the PowerBI report window. Set the Key Phrases as category in the visual.

Word Cloud in PowerBI Desktop

PowerBI Online Service and automated Refresh

Publish the PowerBI report to your workspace. Within PowerBI Online, go to your workspace and navigate to the dataset. From the Elipsis (…) open the settings page. Provide the Key for Azure Table storage.

Azure Table Storage Connection

Now you can also schedule the automatic refresh

Automatic Refresh from Azure Table Storage in PowerBI Online Services

Implement transformation pipline with a more stable Flow

Unfortunatelly, the text processing in Cognitive Services is limited to 5120 characters. In many cases, Emails contain more characters than this and the flow will fail with an error from the Cognitive Services. One way to address this issue, is to implement a loop that cuts the Email body into pieces of 5120 characters or less before feeding it to Cognitive Services. However, Flow is not very developer focused and requires some workarounds for simple tasks like assigning function calls with a variable to itself e.g substring()

In the first place, delcare 4 variables

Some required variables in Flow

Next execute the HTML to Text block. An optimization is to use the Builtin Data-Operations action Compose to trim() the result to remove blanks from the start and end, and populate the STRLEN and EMAILBODY. Whereas the STRLEN requires a function: length(outputs(‘Trim_Text’))

Set the variables in Flow

Next, create a Do-While Loop from the Control elements in Flow. The condition for the Loop is STRLEN <= 0 because we are cutting the Email into pieces until nothing is left

A loop to cut the Email into pieces of 5120 characters (or less)

Within the Loop, create a IF decision depending on the STRLEN. If the STRLEN variable is less then 5120, the STRLEN is set to 0 to end the Loop. The variable TEXT is set to the EMAILBODY.

Email body is shorter than 5120

If the Emailbody is longer than 5120 characters, the first 5120 characters are copied to the TEXT variable: substring(variables(‘EMAILBODY’),0,5120)

Next the variable STRLEN is reduced by 5120: sub(length(variables(‘EMAILBODY’)),5120)

In the third step, the variable EMAILBODY_SHORT is set to the substring starting at 5121 till the end of the original EMAILBODY. Is is done, because Flow does not support variable asignment by a function that contains the variable itself: substring(variables(‘EMAILBODY’),5121,sub(variables(‘STRLEN’),1))

In the last step the orignial EMAILBODY variable is set to be the EMAILBODY_SHORT. It contains now the body without the first 5120 characters.

Email body is larger than 5120

Within the loop, after the IF condition, Cognitive Services are called with the TEXT variable and the results are written to the Azure Table Storage like in the first naive implementation.

Save Cognitive Services Results to Azure Table Storage

More Optimization

There are three additional ways to optimize this solution.

One may argue, that cutting the text into pieces might cut a releveant word for the Word Cloud into pieces and therefore cannot be recognized by Cognitive Services, e.g. Micros … oft. One way to address this is to modify the substring function, by checking the last index of “_” (Blank) and cut there.

Another issue is that Cognitive Services are not aware of all stop words. Especially if using Non-English Key Phrases you may end up with a messy cloud. However, there are public available lists of stopwords in certain languages out there, that can be loaded into PowerBI and used to exclude certain findings from Cognitive Services. The Word Cloud visual provides an Exclude property where you can provide stop words to exclude.

In the example from above, the language for Cognitive Services is set to DE (german). Howerver, this might not be optimal if you receive Emails in different languages. An optimzation could be to use Cognitive Service to detect the language, and switch the Key Phrase Detection Call for the most common languages in your Email inbox, in my case German and English.

Flow Download (package)

Please find the Flow Package in the Sources Onedrive Folder. Import the .zip File in your Flow Tenant. You need to map Outlook, Cognitive Services, Azure Table Storage, etc. to your configurations.

Dynamic colored R Diagram in Power BI using Earthtone

Power BI integrates R to perform complex analysis and sophisticated visualization. Earthtones is an R library which takes a screenshot from Google Maps of certain geo coordinate and extracts the landscape colors. Earthtones can be used to color diagrams based on the local color schema.

Earthtones

The package can be found on github. There is also a description how to donwload and install the package. Using earthtones is easy. The function get_earththones takes the parameters longitude and latitude, zoom and the number of colors to extract. The earthtones for Steyr look like this:

get_earthtones(latitude=48.045,longitude=14.422,zoom=15,number_of_colors=8)

Steyr Earthtones

Power BI Data Model

The data model in this example is very simple. There are two excel sheets, one for the revenue by city and item group, another for the geo coordinates (longitude / latitude) and optimal zoom level per city.

Excel Sheet Revenue per City and Item Group

City Geo Coordinates

The Power BI model is very simple, both data sources are linked by the city name

Power BI Data Model

R Boxplot diagram in Power BI

In this example a simple boxplot is used to visualize the revenue by item group. A data slicer for the column city is used to filter the data. The R diagram takes the following columns as input:

  • Longitude
  • Latitude
  • Zoom
  • City
  • Price
  • Group

If only one city is selected, the R script shall gather the cities earthtone colors and format the diagram. If more than one city is selected, the diagram shall be formatted in red, blue and green. The following script loads the earthtone library and gets the distinct number of city names from the dataset. If there is more than 1 distinct name in the dataset the color variable is set to red,blue,green. Otherwise, earthtone is used to get the city typical color schema.

library(earthtones)

numCities <- length(unique(dataset$Stadt))
if(numCities > 1) {
color <- c(“red”,”blue”,”green”)
} else {
color <- get_earthtones(latitude = dataset$Lat[1],
longitude=dataset$Lon[1],
zoom= dataset$Zoom[1],
number_of_colors=3,include.map=FALSE)
}

boxplot(Preis~Gruppe,dataset,col=(color),ylab=”Revenue”,xlab = “Item Group”)

The R script in Power BI looks like this:

R Script and Boxplot in Power BI

If a city is selected, for example San Francisco, the diagram is formatted in the colors blue, gray and brown.

R Diagram in Power BI with dynamic color

The colors fit the blue sea, the bay and the city seen from space.

R Earthtone for San Francisco

If another city, for example Cairo, is selected the diagram gets formatted in dark green, dark- and light brown.

R Diagram in Power BI with dynamic color

That fits the cities local color schema, the brown buildings, the green plants along the Nile and the desert sand.

R Earthtone for Cairo