Visualizing large GPS dataset tracks

I am extremely new to PostGIS, and I am interested in visualizing a large GPS dataset. The dataset is updated daily and contains millions of points in the format (EPSG:4326):



Device ID Timestamp Latitude Longitude


Mapbox Vector Tiles seem like a good approach for visualizing GPS tracks on a map, but I do not have good intuition as to the most efficient design. Currently, I have the following approach:



  1. Use ST_MakeLine to convert the GPS coordinates for each device to a line geometry, ordering by the timestamp

  2. Simplify the route using ST_Simplify to reduce the number of points in each line

  3. Transform each line into a Mapbox Vector Tile using ST_AsMVT and ST_AsMVTGeom (a sketch of the full pipeline follows this list)
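
For reference, the whole pipeline for a single tile currently looks roughly like the query below. The table and column names (gps_points, device_id, event_time, geom) are placeholders for my actual schema, the z/x/y values and simplification tolerance are arbitrary, and ST_TileEnvelope requires PostGIS 3+:

    -- Rough sketch of steps 1-3 for one tile (z/x/y hard-coded for illustration).
    -- gps_points(device_id, event_time, geom) is a placeholder schema in EPSG:4326.
    WITH bounds AS (
        SELECT ST_TileEnvelope(12, 2048, 1362) AS env        -- tile bounds in EPSG:3857
    ),
    lines AS (
        SELECT p.device_id,
               ST_Simplify(
                   ST_MakeLine(ST_Transform(p.geom, 3857) ORDER BY p.event_time),
                   10.0                                       -- tolerance in metres
               ) AS geom
        FROM gps_points p, bounds b
        WHERE p.geom && ST_Transform(b.env, 4326)             -- only points near this tile
        GROUP BY p.device_id
    )
    SELECT ST_AsMVT(q.*, 'tracks')
    FROM (
        SELECT l.device_id,
               ST_AsMVTGeom(l.geom, b.env) AS geom
        FROM lines l, bounds b
    ) AS q;

In practice the z/x/y values would come from the tile server rather than being hard-coded.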

This seems to work reasonably well with a small dataset, but it does not scale well as the number of points per device grows (the ST_MakeLine and ST_Simplify calls become more and more expensive).



How is this typically handled? Should the dataset be further broken down into tracks, resulting in smaller line geometries? Should it be stored as lines instead of points? According to this question, adding new data is slow, but that may be acceptable with a large boost to the visualization performance.







asked Aug 27 at 8:21 by smang
1 Answer

















I've got exactly the same kind of data: millions of points coming in per day, and visualisation has always been an issue. Some tips that have helped me:



• Build spatial indexes on your data.

• Build indexes on time (and any other relevant column, such as the GPS/device ID).

• Partition your tables by month or week. Generally you're only going to be querying a limited window of time, so partitioned tables help out. This is easier if you're using Postgres 10; if you're using inherited tables to do partitioning (Postgres 9.x), indexes don't get inherited. (A sketch of the indexing and partitioning is shown just after this list.)

• Don't bother trying to visualise all the tracks at once, it's just a mess. You can hook QGIS up directly to your PostGIS DB and view subsets of the data there, but your results may vary depending on how big your data gets.

• Visualise aggregated data. I built a 0.1 x 0.1 degree grid and laid the tracks over that (for global data, you might want to go smaller). I counted the number of tracks that intersected each grid cell and displayed that. It's pretty expensive, but you end up with a heatmap. You can also do other things like showing the average speed or bearing for each lat/lon bin. Alternatively you can use the QGIS heatmap tool, but that's not great. (A sketch of the grid aggregation follows the next paragraph.)
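
As a rough illustration of the first three bullets (the table name gps_points and the columns gps_id, event_time and geom are placeholders, and declarative partitioning needs PostgreSQL 10+):

    -- Placeholder schema: a points table partitioned by time.
    CREATE TABLE gps_points (
        gps_id     bigint    NOT NULL,
        event_time timestamp NOT NULL,
        geom       geometry(Point, 4326) NOT NULL
    ) PARTITION BY RANGE (event_time);

    -- One partition per month; create new ones as data keeps arriving.
    CREATE TABLE gps_points_2018_08 PARTITION OF gps_points
        FOR VALUES FROM ('2018-08-01') TO ('2018-09-01');

    -- Spatial index plus an index on device and time, per partition
    -- (in Postgres 10 indexes have to be created on each partition).
    CREATE INDEX ON gps_points_2018_08 USING gist (geom);
    CREATE INDEX ON gps_points_2018_08 (gps_id, event_time);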

In general this is a pretty tough question, and it really depends on what you want to do with the data. There are some datasets out there that are comparable, so those could be of some help.
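
For the aggregation idea, one minimal way of doing the per-cell count (same placeholder table as above; the result can be materialised and styled as a heatmap in QGIS):

    -- Count how many distinct devices/tracks touch each 0.1 x 0.1 degree cell.
    -- Cell size and time window are illustrative.
    SELECT ST_SnapToGrid(geom, 0.1) AS cell,      -- snaps each point to its bin's corner
           count(DISTINCT gps_id)   AS n_tracks
    FROM gps_points
    WHERE event_time >= '2018-08-01' AND event_time < '2018-09-01'
    GROUP BY cell;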



Building indexes will help with ST_MakeLine. You could also save the results into a materialised view so that you don't have to keep rebuilding them each time. If you're partitioning your tables, then having a precalculated view for each table makes sense.



          Making a materialised view for a week's worth of tracks:



CREATE MATERIALIZED VIEW tracks_7days AS
SELECT
    -- order the points by time so the line follows the actual track
    ST_MakeLine(data_table.point ORDER BY data_table.event_time) AS track,
    data_table.gps_id
FROM data_table
WHERE
    data_table.event_time >= (now()::timestamp without time zone - '7 days'::interval)
    AND data_table.event_time <= now()::timestamp without time zone
GROUP BY data_table.gps_id;
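
The view is a snapshot, so it needs refreshing as new data arrives; adding a spatial index to it is optional but can help the rendering queries:

    -- Re-run periodically (e.g. nightly) to pick up new points.
    REFRESH MATERIALIZED VIEW tracks_7days;

    -- Optional: spatial index on the precomputed tracks.
    CREATE INDEX ON tracks_7days USING gist (track);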





answered Aug 27 at 8:47 by RedM, edited Aug 27 at 9:12
• Partitioning is likely to be a very big win performance-wise, as is correct index construction, of course.
  – John Powell aka Barça
  Aug 27 at 8:49










          • Thanks. That is very helpful. I've looked at that Uber write-up previously, and while it is a large dataset, it consists of only pickup and drop-off points (and not the actual track). Do you have any advice on splitting the GPS data into tracks, or are you just doing this naturally on day/week boundaries?
            – smang
            Aug 27 at 9:01










• This answer is indeed good advice. There is a slightly different concept you can build atop it that lets you dynamically select levels of simplification for tile generation, via ST_SetEffectiveArea.
  – ThingumaBob
  Aug 27 at 9:03
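
  A minimal sketch of that idea, applied to the tracks_7days view from the answer (the threshold value is illustrative; for EPSG:4326 data it is in square degrees):

      -- Visvalingam-based simplification: drop vertices whose effective area is
      -- below the threshold. A larger threshold (lower zoom) drops more vertices.
      SELECT gps_id,
             ST_SetEffectiveArea(track, 0.0001) AS simplified_track
      FROM tracks_7days;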







• It really depends on your data. Mine has a fixed ID and then GPS points for that ID going back over years. I partition my data into YYYY_MM tables and then build indexes on the ID, time and position. If I want a specific track I can usually do a query in a couple of milliseconds if it's this month (the table is sitting in RAM). If it's a historical one then it takes maybe 10 seconds to find and fetch (sitting on disc). I have built up tracks for all of the IDs for specific months and saved them to disc before, but never for all my data.
  – RedM
  Aug 27 at 9:05










          • Added some SQL to make a materialised view
            – RedM
            Aug 27 at 9:09










