PowerBI dataflow for Self-Service BI

A typical challenge in a BI project is integrating data from different sources, for example files stored locally, ERP databases and cloud services. PowerBI Desktop, on the other hand, is designed for power users who want to develop reports quickly. Power users usually have the business knowledge but in most cases lack the technical knowledge to integrate all the data they need. PowerBI dataflow makes it possible to split the workload into a technical, IT-related part and a business analysis part.

PowerBI dataflow hides the complexity of integrating different data sources and provides a ready-to-use data source for PowerBI Desktop. I’ve recorded a video that shows how to integrate Excel expenses from a local folder with Dynamics 365 Sales customers and promote the result as a certified data source. The power user then accesses this promoted data source in the report.

https://youtu.be/1umwqpiozbM

Connect to the SQL database of a Dynamics 365 Finance Test instance

In Dynamics 365 Finance / SCM we can no longer access the SQL database of the production environment directly. However, we can access the SQL database of the Acceptance Test instance. All required information can be found in LCS. I’ve made a video that shows where to find this information in LCS and how to connect to the SQL database.

XYZ Analysis for Dynamics 365 FO in PowerBI

XYZ analysis is used to categorize products based on the variance of their demand. Products with a low demand variance, i.e. the same quantity demanded regularly, are categorized as X, products with an unstable demand as Y and products with a high variance in demand as Z.

The categorization is based on a calculated measure, often referred to as the variation coefficient. It is calculated as the standard deviation of the demand divided by the mean.
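To make the formula concrete, here is a minimal Python sketch of the coefficient outside of PowerBI. The function name and the demand figures are made up for illustration, and I assume the population standard deviation; a sample standard deviation would give slightly different values.

```python
from statistics import mean, pstdev


def variation_coefficient(quantities):
    """Standard deviation of the demand divided by its mean."""
    avg = mean(quantities)
    if avg == 0:
        return None  # no demand at all: the coefficient is undefined
    # Assumption: population standard deviation (pstdev), not the sample one.
    return pstdev(quantities) / avg


# Hypothetical monthly demand over one year
steady = [100] * 12
seasonal = [10, 10, 80, 90, 20, 10, 10, 10, 80, 90, 20, 10]
sporadic = [0] * 11 + [500]

for name, demand in [("steady", steady), ("seasonal", seasonal), ("sporadic", sporadic)]:
    print(name, round(variation_coefficient(demand), 2))
```

The steadier the demand, the closer the coefficient is to 0.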

Video

Here is a video tutorial on how to build the XYZ analysis in PowerBI.

Example

Here is an example of three products with different demand over a year. Toilet paper is needed every month in the same quantity, car tires have a higher demand in spring and autumn, and fireworks are only in demand on special occasions.

Demand of different products
Demand of different products in a year

Prepare Data in PowerBI

The basis of the XYZ analysis is the SalesInvoiceLines entity. At least three columns are needed: the InvoiceDate, the InvoicedQuantity and the ProductName. In this example I renamed the dataset to “Demand”.

In PowerQuery create two new columns based on the InvoiceDate, one for the year and one for the month.

YEAR  = Date.Year([InvoiceDate])
MONTH = Date.Month([InvoiceDate]) 

Next, remove the InvoiceDate column and group the records by ProductName, Year and Month and aggregate the InvoicedQuantity column. Here is an example:

Aggregate demand by product name, year and month
Aggregation in PowerBI
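The same grouping step can be sketched in plain Python. This is only an illustration of the logic, not what PowerQuery does internally; the sample invoice lines are invented, and I assume the aggregation is a sum of the invoiced quantity.

```python
from collections import defaultdict
from datetime import date

# Hypothetical invoice lines: (ProductName, InvoiceDate, InvoicedQuantity)
invoice_lines = [
    ("Toilet paper", date(2019, 1, 5), 40),
    ("Toilet paper", date(2019, 1, 20), 60),
    ("Car tires", date(2019, 3, 12), 80),
    ("Toilet paper", date(2019, 2, 3), 100),
]


def aggregate_demand(lines):
    """Sum the invoiced quantity per (ProductName, Year, Month)."""
    totals = defaultdict(int)
    for product_name, invoice_date, quantity in lines:
        # Year and month replace the invoice date as grouping keys.
        key = (product_name, invoice_date.year, invoice_date.month)
        totals[key] += quantity
    return dict(totals)


demand = aggregate_demand(invoice_lines)
for key, quantity in sorted(demand.items()):
    print(key, quantity)
```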

The second dataset contains the XYZ data template, including the ProductName, Year and Month. For simplicity you can enter the 12 records for Year / Month combinations manually. Add an additional column containing the distinct list of ProductNames and expand the rows.

ProductName = List.Distinct(Demand[ProductName])
Product name, year and month

Finally, merge the two datasets into a new one using a left outer join based on the second dataset. As a result you get a ProductName, Year, Month, Qty combination for each product and every month, no matter if there was an actual demand or not. If there was no demand, the Qty will be null and needs to be converted to 0.

Left Outer Join on calendar and demand
Merge queries in PowerBI
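As a rough sketch of what the template and the left outer join produce, the following Python builds one row per product and month and fills missing demand with 0. The function names and sample values are hypothetical; in PowerBI this is done with the merge dialog, not with code like this.

```python
from itertools import product


def build_template(product_names, year):
    """One (ProductName, Year, Month) row for every product and every month."""
    distinct_names = sorted(set(product_names))
    return [(name, year, month) for name, month in product(distinct_names, range(1, 13))]


def merge_with_demand(template, demand):
    """Left outer join on the template; months without demand get Qty 0."""
    return [
        (name, year, month, demand.get((name, year, month), 0))
        for name, year, month in template
    ]


# Hypothetical aggregated demand keyed by (ProductName, Year, Month)
demand = {("Firework", 2019, 12): 500, ("Toilet paper", 2019, 1): 100}

template = build_template([name for name, _year, _month in demand], 2019)
rows = merge_with_demand(template, demand)
print(len(rows), "rows")
```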

The resulting dataset has records for each month and product. It can be used to calculate the mean, the standard deviation and the variation coefficient. To do so, create a new measure in PowerBI. It calculates a coefficient value that can be used to categorize the products into X, Y and Z.

Variation Coefficient in PowerBI
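To illustrate the categorization step, here is a small Python sketch that works on rows of ProductName, Year, Month and Qty. The thresholds 0.5 and 1.0 are purely my assumption for this example and need to be tuned to your own data, as does the decision to put products without any demand into Z.

```python
from collections import defaultdict
from statistics import mean, pstdev

X_LIMIT = 0.5  # assumed upper bound of the coefficient for category X
Y_LIMIT = 1.0  # assumed upper bound of the coefficient for category Y


def categorize(rows):
    """rows: (ProductName, Year, Month, Qty), one row per product and month."""
    per_product = defaultdict(list)
    for name, _year, _month, qty in rows:
        per_product[name].append(qty)

    categories = {}
    for name, quantities in per_product.items():
        avg = mean(quantities)
        # Assumption: a product without any demand ends up in Z.
        coefficient = pstdev(quantities) / avg if avg else float("inf")
        if coefficient <= X_LIMIT:
            categories[name] = "X"
        elif coefficient <= Y_LIMIT:
            categories[name] = "Y"
        else:
            categories[name] = "Z"
    return categories


# Hypothetical demand for the three example products
tires = [10, 10, 80, 90, 20, 10, 10, 10, 80, 90, 20, 10]
rows = (
    [("Toilet paper", 2019, month, 100) for month in range(1, 13)]
    + [("Car tires", 2019, month, qty) for month, qty in enumerate(tires, start=1)]
    + [("Firework", 2019, month, 500 if month == 12 else 0) for month in range(1, 13)]
)
print(categorize(rows))
```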

Create an Email Tag Cloud with PowerBI and Cognitive Services

PowerBI and Cognitive Services are a powerful combination. A nice example is a tag cloud based on the key phrases in your daily emails. This example requires the following cloud components:

  • PowerBI (of course)
  • Cognitive Services for Key Phrase extraction
  • Exchange Online
  • Flow and Table Storage in Azure

Cloud Infrastructure

First, go to your Azure portal and create a new Cognitive Services resource. In the creation wizard, place the Cognitive Services in a data center near your Office subscription. I’d also recommend creating a separate resource group in which you place all the services.

Cognitive Services in Azure

On the Cognitive Services Overview tab, copy the Endpoint URL. From the Cognitive Services > Key tab also copy Key1. You need both to connect to the Cognitive Services.

Azure Storage Account

Next, create a new storage account. As with the Cognitive Services, place it in the same resource group and the same data center. After the storage account has been created successfully, go to the Overview tab.

Azure Table Storage

Select “Tables” and create a new table. Give it a useful name, e.g. keystorage. A table storage holds structured data, which requires at least two fields: a RowKey and a PartitionKey. It is up to you to provide meaningful values for these fields when inserting data.

Copy the storage account name and from the Access Keys tab the Key1 value. You will need both to connect to the storage account.

Implement transformation pipeline in Flow (first naive approach)

Now, let’s create the extraction logic using Flow. There are some limitations with this approach that will result in errors. A more stable version of the flow is discussed at the end. Go to https://flow.microsoft.com and create a new triggered flow from blank.

Automated Flow from Blank

The trigger for the flow is Outlook > When a new email arrives.

Because almost all my mails are HTML formatted, I need to add the Content Conversion > HTML to Text step to remove the HTML code from the email body.

The third step in the flow is the key phrase extraction. To do so, add the Text Analysis > Key Phrase extraction step. There you need to provide the Cognitive Services Account Key and Endpoint. The text to analyze is the output from the HTML to Text step.

The last step writes the key phrases to the Azure Table Storage. As in the Cognitive Services step, you have to provide the name and a key. From the Table dropdown select the table you created earlier in the Azure portal. The entity has to be a JSON string. In my example the Partition is always 1 and the Row key is a GUID. Because one mail will have more than one key phrase, the insert is encapsulated in an Apply-to-each block.

{
"PartitionKey":"1",
"RowKey":"@{guid()}",
"KeyPhrase":"@{items('Apply_to_each')}"
}

Keyword Extraction Flow
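To show the shape of the data that ends up in the table, here is a small Python sketch that builds one entity per key phrase, with a constant PartitionKey and a GUID as RowKey. The function name and the sample key phrases are made up, and the sketch only builds the JSON; it does not call Azure.

```python
import json
import uuid


def build_entities(key_phrases, partition_key="1"):
    """One table entity per key phrase, mirroring the Apply-to-each block."""
    return [
        {
            "PartitionKey": partition_key,
            "RowKey": str(uuid.uuid4()),  # random GUID, unique per row
            "KeyPhrase": phrase,
        }
        for phrase in key_phrases
    ]


# Hypothetical key phrases returned for one email
entities = build_entities(["Dynamics 365", "PowerBI"])
print(json.dumps(entities, indent=2))
```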

Test your flow by sending an email to your account. All the steps should succeed.

Keyword Extraction Flow Test

You can use the Azure Storage Explorer in the Azure portal to look up the phrases extracted from the email. In this example I sent an email from my company account to my private mail account. The flow extracted the key words from the mail (signature).

Azure Storage Explorer

Tag Cloud in PowerBI

In PowerBI add a new data source from the Azure Table storage. Again you need to provide the storage name and one of the keys. After connecting successfully to the table, open the transformation window and take a look at the retrieved keys. You can remove the PartitionKey, RowKey and Timestamp from the data set.

Azure Table Storage in PowerBI

In the PowerBI report window, click on the ellipsis (…) in the Visuals pane and search for the Word Cloud in the marketplace. Add the Word Cloud visual to PowerBI.

Word Cloud Visual for PowerBI

Add the visual to the PowerBI report window. Set the Key Phrases as category in the visual.

Word Cloud in PowerBI Desktop

PowerBI Online Service and automated Refresh

Publish the PowerBI report to your workspace. Within PowerBI Online, go to your workspace and navigate to the dataset. From the ellipsis (…) open the settings page. Provide the key for the Azure Table storage.

Azure Table Storage Connection

Now you can also schedule the automatic refresh.

Automatic Refresh from Azure Table Storage in PowerBI Online Services

Implement transformation pipeline with a more stable Flow

Unfortunately, the text processing in Cognitive Services is limited to 5120 characters. In many cases Emails contain more characters than this, and the flow will fail with an error from the Cognitive Services. One way to address this issue is to implement a loop that cuts the Email body into pieces of 5120 characters or less before feeding it to Cognitive Services. However, Flow is not very developer focused and requires some workarounds for simple tasks, like assigning a variable the result of a function call on that same variable, e.g. substring().

First, declare four variables.

Some required variables in Flow

Next, execute the HTML to Text block. An optimization is to use the built-in Data Operations action Compose to trim() the result, which removes blanks from the start and end, and then populate the STRLEN and EMAILBODY variables. STRLEN requires a function: length(outputs('Trim_Text'))

Set the variables in Flow

Next, create a Do until loop from the Control elements in Flow (the loop runs until its condition is met). The condition for the loop is STRLEN <= 0, because we keep cutting the Email into pieces until nothing is left.

A loop to cut the Email into pieces of 5120 characters (or less)

Within the loop, create an IF decision depending on STRLEN. If the STRLEN variable is less than 5120, STRLEN is set to 0 to end the loop and the variable TEXT is set to the EMAILBODY.

Email body is shorter than 5120

If the EMAILBODY is longer than 5120 characters, the first 5120 characters are copied to the TEXT variable: substring(variables('EMAILBODY'),0,5120)

Next, the variable STRLEN is reduced by 5120: sub(length(variables('EMAILBODY')),5120)

In the third step, the variable EMAILBODY_SHORT is set to the rest of the original EMAILBODY. Because substring() uses a zero-based start index and the first piece covers the positions 0 to 5119, the rest starts at index 5120 and is STRLEN characters long. The helper variable is needed because Flow does not support assigning a variable with a function that contains the variable itself: substring(variables('EMAILBODY'),5120,variables('STRLEN'))

In the last step the original EMAILBODY variable is set to EMAILBODY_SHORT. It now contains the body without the first 5120 characters.

Email body is larger than 5120

Within the loop, after the IF condition, Cognitive Services are called with the TEXT variable and the results are written to the Azure Table Storage like in the first naive implementation.

Save Cognitive Services Results to Azure Table Storage
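For readers who find the loop easier to follow in code, here is a minimal Python sketch of the same cutting logic. It is not how Flow executes it, the function name is made up, and unlike a Flow loop (which always runs at least once) this version simply returns no pieces for an empty body.

```python
MAX_CHARS = 5120  # character limit of the text processing in Cognitive Services


def split_email_body(email_body, max_chars=MAX_CHARS):
    """Cut the trimmed email body into pieces of max_chars characters or less."""
    remaining = email_body.strip()  # corresponds to the trim() step
    pieces = []
    while len(remaining) > 0:
        text = remaining[:max_chars]       # the TEXT variable
        remaining = remaining[max_chars:]  # the shortened EMAILBODY
        pieces.append(text)                # here the flow calls Cognitive Services
    return pieces


pieces = split_email_body("a" * 12000)
print([len(piece) for piece in pieces])
```

Python can reassign a variable from a slice of itself, which is why the EMAILBODY_SHORT helper variable is not needed here.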

More Optimization

There are three additional ways to optimize this solution.

One may argue that cutting the text into pieces might split a relevant word for the Word Cloud, which then cannot be recognized by Cognitive Services, e.g. Micros … oft. One way to address this is to modify the substring function: look up the last index of a blank (“ ”) within the first 5120 characters and cut there.
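A possible way to cut at the last blank is sketched below in Python. The function name is hypothetical, and falling back to a hard cut when a piece contains no blank at all is my own assumption for the sketch.

```python
def split_at_blanks(text, max_chars=5120):
    """Cut text into pieces of max_chars or less, preferring to cut at a blank."""
    remaining = text.strip()
    pieces = []
    while len(remaining) > max_chars:
        # Last blank that still keeps the piece within max_chars characters.
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars  # assumption: no blank found, fall back to a hard cut
        pieces.append(remaining[:cut])
        remaining = remaining[cut:].lstrip()
    if remaining:
        pieces.append(remaining)
    return pieces


print(split_at_blanks("Microsoft Dynamics PowerBI", max_chars=12))
```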

Another issue is that Cognitive Services are not aware of all stop words. Especially with non-English key phrases you may end up with a messy cloud. However, there are publicly available lists of stop words for many languages that can be loaded into PowerBI and used to exclude certain findings from Cognitive Services. The Word Cloud visual also provides an Exclude property where you can list the stop words to exclude.

In the example above, the language for Cognitive Services is set to DE (German). However, this might not be optimal if you receive Emails in different languages. An optimization could be to use Cognitive Services to detect the language and switch the Key Phrase detection call between the most common languages in your inbox, in my case German and English.
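The filtering itself is simple. As an illustration, this Python sketch removes key phrases that match a stop word list regardless of upper or lower case; the function name and the sample words are invented, and in PowerBI you would do the equivalent with a query step or the Exclude property.

```python
def remove_stop_words(key_phrases, stop_words):
    """Drop every key phrase that appears in the stop word list (case-insensitive)."""
    blocked = {word.casefold() for word in stop_words}
    return [phrase for phrase in key_phrases if phrase.casefold() not in blocked]


# Hypothetical key phrases and a tiny German stop word list
phrases = ["Danke", "PowerBI", "der", "Dynamics 365"]
print(remove_stop_words(phrases, ["danke", "der", "und"]))
```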

Flow Download (package)

Please find the Flow package in the Sources OneDrive folder. Import the .zip file into your Flow tenant. You need to map Outlook, Cognitive Services, Azure Table Storage, etc. to your own configuration.

Connect Azure Data Lake Storage with PowerBI dataflow

PowerBI dataflow performs ETL (Extract, Transform, Load) workloads in the cloud. PowerBI Pro and Premium users get dataflow storage without additional charges. However, this storage is managed by PowerBI and you cannot access it directly. Therefore BYOSA (Bring Your Own Storage Account) is supported to connect your own Azure storage account with PowerBI dataflow. Following the documentation, I’ve made a video on how to connect an Azure storage account with PowerBI. Please find my video on YouTube:

Configure Azure Data Lake storage with PowerBI dataflow

Find broken reference in Excel

I recently had to find a broken reference in an Excel file, where data was copied from one version of the file to another. In the original version there was a data validation rule with a dropdown pointing to a list of allowed base data on another Excel sheet.

Drop down to choose from list of value

In a next step the original file was copied and modified to fit new requirements. Meanwhile the original file was used and data was collected in the old file.

After the new file version was approved, the data from the old version was copied (CTRL+C, CTRL+V) to the new file and the old file was removed. However, when opening the new file, Excel reported a broken link to the old file.

Broken reference after copying data

While this problem can easily be identified in a small Excel file, it can be a challenge in a huge file with lots of sheets. One way to identify the problem is to unzip the Excel file (e.g. using WinRAR, 7-Zip, etc.). An Excel file contains many XML and other files. The worksheets, for example, can be found in the xl/worksheets folder.

Excel file opened in WinRAR

After extracting the worksheet folder to the file system, you may use a tool like Visual Studio Code and open the folder.

Open extracted excel folder in VS Code

Simply by searching for the name of the broken reference you will find the place where you need to fix the problem.

Broken reference in excel
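If you prefer not to unzip and search by hand, the same idea can be sketched in a few lines of Python, since an .xlsx file is a zip archive. The function name, the file name old_version.xlsx and the XML content of the sample workbook are made up for this example; a real workbook contains many more parts.

```python
import io
import zipfile


def find_in_workbook(workbook, search_text):
    """Return the names of the XML parts of the workbook that contain search_text."""
    needle = search_text.encode("utf-8")
    with zipfile.ZipFile(workbook) as archive:
        return [
            name
            for name in archive.namelist()
            if name.endswith((".xml", ".rels")) and needle in archive.read(name)
        ]


# Build a tiny in-memory stand-in for a workbook (hypothetical content).
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet>ok</worksheet>")
    archive.writestr("xl/worksheets/sheet2.xml", "<worksheet>old_version.xlsx</worksheet>")

hits = find_in_workbook(sample, "old_version.xlsx")
print(hits)
```

For a real file, pass the path of the workbook instead of the in-memory sample.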

Configure PowerBI on Dynamics 365 FO developer VM

I’ve created a video tutorial on how to configure PowerBI on a stand-alone Dynamics 365 Finance and Operations developer VM.