# Calculate and Visualize the Lorenz Curve in Power BI

The Lorenz curve (1) is tool in statistics to visualize the concentration of values within a dataset. A typical well known example is the concentration of wealth within the population. For example in 2017 the top 1% of the US population controls 36,8% of wealth (2). However, the Lorenz curve can also be utilized to visualize the product groups share on revenue.

### Demo Data

There are 5 product groups (A,B,C,D,E,F) and multiple sales. The data is stored in the table “Sales” including three columns ProductGroup, Product and Qty. For demonstration purpose, each product group has the same share on total sold products: 20%

 ProductGroup Product Qty A A123 2 A A234 2 B B123 4 C C123 1 C C123 1 C C123 1 C C234 1 D D123 4 E E123 2 E E234 2

The invoices are stored in a separate table called Invoice, it contains three columns: The ProductGroup, SalesId and InvoiceAmount.

 ProductGroup SalesId InvoiceAmount A S-0001 5000 A S-0002 6000 E S-0003 1000 E S-0004 10000 D S-0005 11000 B S-0006 4000 B S-0007 4000 B S-0008 5000 C S-0009 5000 C S-0010 5000 C S-0011 4000

In order to visualize the lorenz curve, a data set is required that holds the cumulated frequency of Qty and InvoiceAmount, in ascending order based on the InvoiceAmount. . The first simple SQL Statements and results look like this

select ProductGroup, sum(Qty) as QtySum
from Sales
group by ProductGroup

 ProductGroup QtySum A 4 B 4 C 4 D 4 E 4

select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
group by ProductGroup
order by InvoiceAmountSum asc

 ProductGroup InvoiceAmountSum A 11000 D 11000 E 11000 B 13000 C 14000

Now comes the tricky part. A SQL statement is required which creates the cumulated sums. To do this the SQL function row_number is used to create a line number in the result set. Then the same result set is joined with all records smaller or equal the current row number and the InvoiceAmount is summed up. The SQL Statement looks like this.

select temp1.idx, temp1.ProductGroup, temp1.InvoiceAmountSum,
sum(temp2.InvoiceAmountSum) as CumulateSum
from
(

select ROW_NUMBER() over (order by InvoiceAmountSum asc) as idx, ProductGroup, InvoiceAmountSum
from (
select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
Group by ProductGroup)
temptable

)
as temp1
inner join
(
select ROW_NUMBER() over (order by InvoiceAmountSum asc) as idx, ProductGroup, InvoiceAmountSUm
from (
select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
Group by ProductGroup)
temptable
)
as temp2 on temp1.idx >= temp2.idx
group by temp1.idx, temp1.ProductGroup, temp1.InvoiceAmountSum

The result looks like this

 Idx ProductGroup InvoiceAmountSum CumulatedSum 1 A 11000 11000 2 D 11000 22000 3 E 11000 33000 4 B 13000 46000 5 C 14000 60000

Finally  the SQL query needs to bee extended to include also the cumulated sales qty. Moreover, a line starting with 0 values is required to make the Lorenz cure look correct. This is done by adding a row using UNIOIN. Here is the complete SQL statement and the result set:

select temp1.idx, temp1.ProductGroup, temp1.InvoiceAmountSum,
sum(temp2.InvoiceAmountSum) as CumulatedSum, s.SalesQty,
sum(s.SalesQty) as CumulatedSales
from
(

select ROW_NUMBER() over (order by InvoiceAmountSum asc) as idx, ProductGroup, InvoiceAmountSum
from (
select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
Group by ProductGroup
)
temptable

)
as temp1
inner join
(
select ROW_NUMBER() over (order by InvoiceAmountSum asc) as idx,
ProductGroup, InvoiceAmountSUm
from (
select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
group by ProductGroup
)
temptable
)
as temp2 on temp1.idx >= temp2.idx

inner join
(
Select ProductGroup, sum(Qty) as SalesQty
from Sales as s
group by ProductGroup
)
as s on s.ProductGroup = temp1.ProductGroup

group by temp1.idx, temp1.ProductGroup, temp1.InvoiceAmountSum, s.SalesQty

union

select 0 as idx,” as ProductGroup, 0 as InvoiceAmountSum,0 as CumulatedSum,
0 as SalesQty,0 as CumulatedSales

 Idx ProuctGroup InvoiceAmountSum CumulatedSum SalesQty CumulatedQty 0 0 0 0 0 1 A 11000 11000 4 4 2 D 11000 22000 4 8 3 E 11000 33000 4 12 4 B 13000 46000 4 16 5 C 14000 60000 4 20

## Power BI

The next step is load the data into Power BI and calculate the relative cumulated frequency  for Sales and Invoice. The is done by diving the cumulated frequency by the total numbers.

1. Open a new Power BI report
2. Select Get Data from SQL Server and provide the connection information
In the details copy & paste the SQL Statement from above
3. Rename the data set to SalesInvoice

After this there should be a dataset with the values from the table above in Power BI. Add two additional columns.to calculate the relative cumulated frequency. Format the columns as percentage.

RelativeFreqInvoice = SalesInvoice[CumulatedSum] /
sum(SalesInvoice[InvoiceAmountSum])

and

RelativeFreqSales = SalesInvoice[CumulatedSales] / sum
(SalesInvoice[SalesQty])

The dataset in Power BI should look like this

 Idx Prouct Group InvoiceAmount Sum Cumulated Sum Sales Qty Cumulated Qty Relative FreqInvoice Relative FreqSales 0 0 0 0 0 0% 0% 1 A 11000 11000 4 4 18% 20% 2 D 11000 22000 4 8 37% 40% 3 E 11000 33000 4 12 55% 60% 4 B 13000 46000 4 16 77% 80% 5 C 14000 60000 4 20 100% 100%

Go to the report designer and add a new line chart. Set as Axis the RelativeFreqSales. To get the diagonal line for the Lorenz curve add the RelativeFreqSales again as Value. Then also add the RelativeFreqInvoice as Value in the diagram properties. Now there are two lines, the 45° line and the lorenz curve underneath. In this example it should look like this: