Visualizing large GPS dataset tracks
I am extremely new to PostGIS, and I am interested in visualizing a large GPS dataset. The dataset is updated daily and contains millions of points in the following format (coordinates in EPSG:4326):
Device ID | Timestamp | Latitude | Longitude
Mapbox Vector Tiles seem like a good approach for visualizing GPS tracks on a map, but I do not have good intuition as to the most efficient design. Currently I take the following approach (a rough query sketch follows the list):
- Use ST_MakeLine to convert the GPS coordinates for each device into a line geometry, ordering by the timestamp
- Simplify the route using ST_Simplify to reduce the number of points in each line
- Transform each line into a Mapbox Vector tile using ST_AsMVT and ST_AsMVTGeom
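Roughly, the per-tile query looks like this (a simplified sketch; the points table, column names, simplification tolerance and the :tile_env placeholder for the tile bounds in EPSG:3857 are all illustrative):

-- Steps 1 and 2: build one line per device, ordered by time, then simplify it
WITH lines AS (
  SELECT device_id,
         ST_Simplify(
           ST_MakeLine(geom ORDER BY event_time),
           0.0001                                   -- tolerance in degrees (EPSG:4326)
         ) AS track
  FROM   gps_points
  GROUP  BY device_id
)
-- Step 3: clip and encode the lines into a Mapbox Vector Tile
SELECT ST_AsMVT(q, 'tracks') AS mvt
FROM (
  SELECT device_id,
         ST_AsMVTGeom(ST_Transform(track, 3857), :tile_env) AS geom
  FROM   lines
  WHERE  ST_Transform(track, 3857) && :tile_env
) AS q;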
This seems to work reasonably well with a small dataset, but it does not scale as the number of points per device grows (the ST_MakeLine and ST_Simplify calls become more and more expensive).
How is this typically handled? Should the dataset be broken down further into tracks, resulting in smaller line geometries? Should it be stored as lines instead of points? According to this question, adding new data is then slow (a sketch of why is below), but that may be acceptable in exchange for a large boost in visualization performance.
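For context, storing lines would mean that appending a single new GPS fix rewrites the whole per-device geometry, e.g. (sketch; table, column and values are illustrative):

-- Append one new fix to a stored per-device line; the entire linestring is rewritten
UPDATE device_tracks
SET    track = ST_AddPoint(track, ST_SetSRID(ST_MakePoint(-122.42, 37.77), 4326))
WHERE  device_id = 42;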
postgis
asked Aug 27 at 8:21
smang
1 Answer
I've got exactly the same kind of data: millions of points coming in per day, and visualisation has always been an issue. Some tips that have helped me:
- Build spatial indexes on your data
- Build indexes on time (and any other relevant variable, such as the GPS ID)
- Partition your tables by month/week. Generally you're only going to be querying a limited window of time, so having partitioned tables helps. This is easier if you're using Postgres 10; if you're using inherited tables to do partitioning, indexes don't get inherited (in Postgres 9.x). A DDL sketch for the indexes and partitioning follows this list.
- Don't bother trying to visualise all the tracks at once; it's just a mess. You can hook QGIS up directly to your PostGIS DB and view subsets of the data that way, but your results may vary depending on how big your data gets.
- Visualise aggregated data. I built a 0.1 x 0.1 degree grid and laid the tracks over it (for global data you might want to go smaller). I counted the number of tracks that intersected each grid cell and displayed that. It's pretty expensive, but you end up with a heatmap. You can also do other things like showing the average speed, bearing, etc. for each lat/lon bin. Alternatively you can use the QGIS heatmap tool, but that's not great.
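A minimal DDL sketch of that index and partitioning setup (assuming Postgres 10+ declarative partitioning; table and column names are illustrative, and in Postgres 10 the indexes have to be created on each partition):

-- Points table partitioned by month (Postgres 10+ declarative partitioning)
CREATE TABLE gps_points (
    gps_id     bigint                      NOT NULL,
    event_time timestamp without time zone NOT NULL,
    geom       geometry(Point, 4326)       NOT NULL
) PARTITION BY RANGE (event_time);

CREATE TABLE gps_points_2018_08 PARTITION OF gps_points
    FOR VALUES FROM ('2018-08-01') TO ('2018-09-01');

-- Spatial index plus an index matching the typical "this device, this time window" query
CREATE INDEX ON gps_points_2018_08 USING GIST (geom);
CREATE INDEX ON gps_points_2018_08 (gps_id, event_time);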
In general this is a pretty tough question; it really depends on what you want to do with the data. There are some comparable datasets out there, so those could be of some help.
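Coming back to the grid-aggregation idea, the counting query behind that kind of heatmap could look roughly like this (sketch; the grid table is illustrative, and it joins against the tracks_7days view defined further down):

-- Build a 0.1 x 0.1 degree grid once (global here, ~6.5M cells; restrict to your area in practice)
CREATE TABLE grid_01deg AS
SELECT ST_MakeEnvelope(lon, lat, lon + 0.1, lat + 0.1, 4326) AS cell
FROM   generate_series(-180.0, 179.9, 0.1) AS lon,
       generate_series( -90.0,  89.9, 0.1) AS lat;

CREATE INDEX ON grid_01deg USING GIST (cell);

-- Heatmap: number of distinct tracks intersecting each cell
SELECT g.cell,
       count(DISTINCT t.gps_id) AS n_tracks
FROM   grid_01deg g
JOIN   tracks_7days t ON ST_Intersects(t.track, g.cell)
GROUP  BY g.cell;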
Building indexes will help with ST_MakeLine. You could also save the results into a materialised view so that you don't have to keep rebuilding them each time. If you're partitioning your tables, then having a precalculated view for each table makes sense.
Making a materialised view for a week's worth of tracks:
-- One line per device, with the points strung together in time order
CREATE MATERIALIZED VIEW tracks_7days AS
SELECT
    ST_MakeLine(data_table.point ORDER BY data_table.event_time) AS track,
    data_table.gps_id
FROM data_table
WHERE
    data_table.event_time >= (now()::timestamp without time zone - '7 days'::interval)
    AND data_table.event_time <= now()::timestamp without time zone
GROUP BY data_table.gps_id;
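To keep the view current and serve tiles from it, something along these lines works (sketch; the MVT query mirrors the one in the question, with :tile_env again standing in for the tile bounds in EPSG:3857):

-- Rebuild the precalculated tracks, e.g. from a nightly job
REFRESH MATERIALIZED VIEW tracks_7days;

-- The tile query now only simplifies and encodes prebuilt lines instead of aggregating raw points
SELECT ST_AsMVT(q, 'tracks') AS mvt
FROM (
    SELECT gps_id,
           ST_AsMVTGeom(ST_Transform(ST_Simplify(track, 0.0001), 3857), :tile_env) AS geom
    FROM   tracks_7days
    WHERE  ST_Transform(track, 3857) && :tile_env
) AS q;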
edited Aug 27 at 9:12
answered Aug 27 at 8:47
RedM
Partitioning is likely to be a very big win performance-wise, as is correct index construction, of course.
– John Powell aka Barça
Aug 27 at 8:49
Thanks. That is very helpful. I've looked at that Uber write-up previously, and while it is a large dataset, it consists of only pickup and drop-off points (and not the actual track). Do you have any advice on splitting the GPS data into tracks, or are you just doing this naturally on day/week boundaries?
– smang
Aug 27 at 9:01
This answer is indeed good advice. There is a slightly different concept you can build on top of it that lets you dynamically select levels of simplification for tile generation, via ST_SetEffectiveArea.
– ThingumaBob
Aug 27 at 9:03
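A sketch of that idea (assuming PostGIS 2.5+ for ST_FilterByM; the threshold is illustrative): precompute each vertex's effective area once, store it in the M coordinate, and at tile time keep only vertices above a zoom-dependent threshold instead of re-running ST_Simplify.

-- Precompute Visvalingam effective areas once, stored in the M coordinate of each vertex
CREATE MATERIALIZED VIEW tracks_7days_ea AS
SELECT gps_id,
       ST_SetEffectiveArea(track) AS track_ea
FROM   tracks_7days;

-- At tile time, drop vertices whose effective area is below a per-zoom threshold
SELECT gps_id,
       ST_FilterByM(track_ea, 0.0001) AS simplified
FROM   tracks_7days_ea;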
It really depends on your data. Mine has a fixed ID and then GPS points for that ID going back over years. I partition my data into YYYY_MM tables and then build indexes on the ID, time and position. If I want a specific track I can usually do the query in a couple of milliseconds if it's this month (the table is sitting in RAM). If it's a historical one then it takes maybe 10 seconds to find and fetch (sitting on disk). I have built up tracks for all of the IDs for specific months and saved them to disk before, but never for all my data.
– RedM
Aug 27 at 9:05
Added some SQL to make a materialised view
– RedM
Aug 27 at 9:09