Using R in SQL 2016 to calculate the distance between Cities
6. September 2016 Leave a comment
Since SQL Server 2016, R can be used in T-SQL statements to perform sophisticated calculations. One example I was facing, was to calculate the distance between two cities. Although there are many ways to solve this tasks, R can also be used to perform a exact calculation.
Prerequisites
R services need to be installed in order to execute R scripts within T-SQL. To calculate the distance between two geo coordinates, the geosphere library is required. The procedure to install additional packages is documented at MSDN.
Data model
This example contains 2 tables, Cities and DistanceTable. The Cities table contains the name and geo coordinates of a city, while the DistanceTable contains two references FromCity and ToCtiy to the Cities table.
Column | Datatype |
CityID | int (Primary Key) |
Name | nvarchar(128) |
Longitude | real |
Latitude | real |
Column | Datatype |
JournalID | int (Primary key) |
FromCity | int (Foreign key) |
ToCity | int (Foreign key) |
For example the two Austrian Cities Linz and Vienna look like this:
An entry in the distance table looks like this:
I’ve added another view to output the geo coordinates from both cities which are referenced in the DistanceTable
CREATE VIEW [dbo].[DistanceViewLonLat]
AS
SELECT DT.JournalID,
FC.Longitude AS FromLon, FC.Latitude AS FromLat,
TC.Longitude AS ToLon, TC.Latitude AS ToLat
FROM
dbo.DistanceTable AS DT
INNER JOIN dbo.Cities AS FC ON DT.FromCity = FC.CityID
INNER JOIN dbo.Cities AS TC ON DT.ToCity = TC.CityID
GO
A record from the view looks like this
R Script
The following R script takes a record from the view as input and calculates the distance between two points and rounds the result from meter to kilometer.
exec sp_execute_external_script
@language =N’R’,
@script=N’
library(sp)
library(geosphere)
sqlvalues <- as.matrix(InputDataSet);getDistKm <- function(row)
{
p1 <- c(row[1], row[2])
p2 <- c(row[3], row[4])d <- distGeo(p1,p2) / 1000
c(row[1], row[2], row[3], row[4], d)
}km <- apply(sqlvalues,1,getDistKm)
km <- t(km)OutputDataSet <- as.data.frame(km)
‘,
@input_data_1 =N’select FromLon, FromLat , ToLon, ToLat from DistanceViewLonLat where JournalID = 1;’
with result sets (([fromlng] real, [fromlat] real, [tolng] real, [tolat] real, [km] real not null));
go
The result looks like this: