Calculate and Visualize the Lorenz Curve in Power BI
15. November 2017 Hinterlasse einen Kommentar
The Lorenz curve (1) is tool in statistics to visualize the concentration of values within a dataset. A typical well known example is the concentration of wealth within the population. For example in 2017 the top 1% of the US population controls 36,8% of wealth (2). However, the Lorenz curve can also be utilized to visualize the product groups share on revenue.
Demo Data
There are 5 product groups (A,B,C,D,E,F) and multiple sales. The data is stored in the table “Sales” including three columns ProductGroup, Product and Qty. For demonstration purpose, each product group has the same share on total sold products: 20%
ProductGroup |
Product |
Qty |
A | A123 | 2 |
A | A234 | 2 |
B | B123 | 4 |
C | C123 | 1 |
C | C123 | 1 |
C | C123 | 1 |
C | C234 | 1 |
D | D123 | 4 |
E | E123 | 2 |
E | E234 | 2 |
The invoices are stored in a separate table called Invoice, it contains three columns: The ProductGroup, SalesId and InvoiceAmount.
ProductGroup |
SalesId |
InvoiceAmount |
A | S-0001 | 5000 |
A | S-0002 | 6000 |
E | S-0003 | 1000 |
E | S-0004 | 10000 |
D | S-0005 | 11000 |
B | S-0006 | 4000 |
B | S-0007 | 4000 |
B | S-0008 | 5000 |
C | S-0009 | 5000 |
C | S-0010 | 5000 |
C | S-0011 | 4000 |
In order to visualize the lorenz curve, a data set is required that holds the cumulated frequency of Qty and InvoiceAmount, in ascending order based on the InvoiceAmount. . The first simple SQL Statements and results look like this
select ProductGroup, sum(Qty) as QtySum
from Sales
group by ProductGroup
ProductGroup |
QtySum |
A | 4 |
B | 4 |
C | 4 |
D | 4 |
E | 4 |
select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
group by ProductGroup
order by InvoiceAmountSum asc
ProductGroup |
InvoiceAmountSum |
A | 11000 |
D | 11000 |
E | 11000 |
B | 13000 |
C | 14000 |
Now comes the tricky part. A SQL statement is required which creates the cumulated sums. To do this the SQL function row_number is used to create a line number in the result set. Then the same result set is joined with all records smaller or equal the current row number and the InvoiceAmount is summed up. The SQL Statement looks like this.
select temp1.idx, temp1.ProductGroup, temp1.InvoiceAmountSum,
sum(temp2.InvoiceAmountSum) as CumulateSum
from
(select ROW_NUMBER() over (order by InvoiceAmountSum asc) as idx, ProductGroup, InvoiceAmountSum
from (
select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
Group by ProductGroup)
temptable)
as temp1
inner join
(
select ROW_NUMBER() over (order by InvoiceAmountSum asc) as idx, ProductGroup, InvoiceAmountSUm
from (
select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
Group by ProductGroup)
temptable
)
as temp2 on temp1.idx >= temp2.idx
group by temp1.idx, temp1.ProductGroup, temp1.InvoiceAmountSum
The result looks like this
Idx |
ProductGroup |
InvoiceAmountSum |
CumulatedSum |
1 | A | 11000 | 11000 |
2 | D | 11000 | 22000 |
3 | E | 11000 | 33000 |
4 | B | 13000 | 46000 |
5 | C | 14000 | 60000 |
Finally the SQL query needs to bee extended to include also the cumulated sales qty. Moreover, a line starting with 0 values is required to make the Lorenz cure look correct. This is done by adding a row using UNIOIN. Here is the complete SQL statement and the result set:
select temp1.idx, temp1.ProductGroup, temp1.InvoiceAmountSum,
sum(temp2.InvoiceAmountSum) as CumulatedSum, s.SalesQty,
sum(s.SalesQty) as CumulatedSales
from
(select ROW_NUMBER() over (order by InvoiceAmountSum asc) as idx, ProductGroup, InvoiceAmountSum
from (
select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
Group by ProductGroup
)
temptable)
as temp1
inner join
(
select ROW_NUMBER() over (order by InvoiceAmountSum asc) as idx,
ProductGroup, InvoiceAmountSUm
from (
select ProductGroup, sum(InvoiceAmount) as InvoiceAmountSum
from Invoice
group by ProductGroup
)
temptable
)
as temp2 on temp1.idx >= temp2.idxinner join
(
Select ProductGroup, sum(Qty) as SalesQty
from Sales as s
group by ProductGroup
)
as s on s.ProductGroup = temp1.ProductGroupgroup by temp1.idx, temp1.ProductGroup, temp1.InvoiceAmountSum, s.SalesQty
union
select 0 as idx,“ as ProductGroup, 0 as InvoiceAmountSum,0 as CumulatedSum,
0 as SalesQty,0 as CumulatedSales
Idx |
ProuctGroup |
InvoiceAmountSum |
CumulatedSum |
SalesQty |
CumulatedQty |
0 | 0 | 0 | 0 | 0 | |
1 | A | 11000 | 11000 | 4 | 4 |
2 | D | 11000 | 22000 | 4 | 8 |
3 | E | 11000 | 33000 | 4 | 12 |
4 | B | 13000 | 46000 | 4 | 16 |
5 | C | 14000 | 60000 | 4 | 20 |
Power BI
The next step is load the data into Power BI and calculate the relative cumulated frequency for Sales and Invoice. The is done by diving the cumulated frequency by the total numbers.
- Open a new Power BI report
- Select Get Data from SQL Server and provide the connection information
In the details copy & paste the SQL Statement from above - Rename the data set to SalesInvoice
After this there should be a dataset with the values from the table above in Power BI. Add two additional columns.to calculate the relative cumulated frequency. Format the columns as percentage.
RelativeFreqInvoice = SalesInvoice[CumulatedSum] /
sum(SalesInvoice[InvoiceAmountSum])
and
RelativeFreqSales = SalesInvoice[CumulatedSales] / sum
(SalesInvoice[SalesQty])
The dataset in Power BI should look like this
Idx |
Prouct |
InvoiceAmount |
Cumulated |
Sales |
Cumulated Qty |
Relative |
Relative FreqSales |
0 | 0 | 0 | 0 | 0 | 0% | 0% | |
1 | A | 11000 | 11000 | 4 | 4 | 18% | 20% |
2 | D | 11000 | 22000 | 4 | 8 | 37% | 40% |
3 | E | 11000 | 33000 | 4 | 12 | 55% | 60% |
4 | B | 13000 | 46000 | 4 | 16 | 77% | 80% |
5 | C | 14000 | 60000 | 4 | 20 | 100% | 100% |
Go to the report designer and add a new line chart. Set as Axis the RelativeFreqSales. To get the diagonal line for the Lorenz curve add the RelativeFreqSales again as Value. Then also add the RelativeFreqInvoice as Value in the diagram properties. Now there are two lines, the 45° line and the lorenz curve underneath. In this example it should look like this:
1) https://en.wikipedia.org/wiki/Lorenz_curve
2) http://money.cnn.com/2017/09/27/news/economy/inequality-record-top-1-percent-wealth/index.html