Visualizing large GPS dataset tracks

I am extremely new to PostGIS, and I am interested in visualizing a large GPS dataset. The dataset is updated daily and contains millions of points in the format (EPSG:4326):



Device ID Timestamp Latitude Longitude


Mapbox Vector Tiles seem like a good approach for visualizing GPS tracks on a map, but I do not have good intuition as to the most efficient design. Currently, I have the following approach:



  1. Use ST_MakeLine to convert the GPS coordinates for each device to a line geometry, ordering by the timestamp

  2. Simplify the route using ST_Simplify to reduce the number of points in each line

  3. Transform each line into a Mapbox Vector Tile using ST_AsMVT and ST_AsMVTGeom (a sketch of the full pipeline follows this list)
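
For reference, the whole pipeline for a single tile currently looks roughly like the query below. The table and column names (gps_points, device_id, event_time, geom) are placeholders for my actual schema, the z/x/y values and simplification tolerance are arbitrary, and ST_TileEnvelope requires PostGIS 3+:

    -- Rough sketch of steps 1-3 for one tile (z/x/y hard-coded for illustration).
    -- gps_points(device_id, event_time, geom) is a placeholder schema in EPSG:4326.
    WITH bounds AS (
        SELECT ST_TileEnvelope(12, 2048, 1362) AS env        -- tile bounds in EPSG:3857
    ),
    lines AS (
        SELECT p.device_id,
               ST_Simplify(
                   ST_MakeLine(ST_Transform(p.geom, 3857) ORDER BY p.event_time),
                   10.0                                       -- tolerance in metres
               ) AS geom
        FROM gps_points p, bounds b
        WHERE p.geom && ST_Transform(b.env, 4326)             -- only points near this tile
        GROUP BY p.device_id
    )
    SELECT ST_AsMVT(q.*, 'tracks')
    FROM (
        SELECT l.device_id,
               ST_AsMVTGeom(l.geom, b.env) AS geom
        FROM lines l, bounds b
    ) AS q;

In practice the z/x/y values would come from the tile server rather than being hard-coded.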

This seems to work reasonably well with a small dataset, but it does not scale well as the number of points per device grows (the ST_MakeLine and ST_Simplify calls become more and more expensive).



How is this typically handled? Should the dataset be further broken down into tracks, resulting in smaller line geometries? Should it be stored as lines instead of points? According to this question, adding new data is slow, but that may be acceptable with a large boost to the visualization performance.







asked Aug 27 at 8:21 by smang
1 Answer

















I've got exactly the same kind of data: millions of points coming in per day, and visualisation has always been an issue. Some tips that have helped me:



• Build spatial indexes on your data.

• Build indexes on time (and any other relevant column, such as the GPS/device ID).

• Partition your tables by month or week. Generally you're only going to be querying a limited window of time, so partitioned tables help out. This is easier if you're using Postgres 10; if you're using inherited tables to do partitioning (Postgres 9.x), indexes don't get inherited. (A sketch of the indexing and partitioning is shown just after this list.)

• Don't bother trying to visualise all the tracks at once, it's just a mess. You can hook QGIS up directly to your PostGIS DB and view subsets of the data there, but your results may vary depending on how big your data gets.

• Visualise aggregated data. I built a 0.1 x 0.1 degree grid and laid the tracks over that (for global data, you might want to go smaller). I counted the number of tracks that intersected each grid cell and displayed that. It's pretty expensive, but you end up with a heatmap. You can also do other things like showing the average speed or bearing for each lat/lon bin. Alternatively you can use the QGIS heatmap tool, but that's not great. (A sketch of the grid aggregation follows the next paragraph.)
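
As a rough illustration of the first three bullets (the table name gps_points and the columns gps_id, event_time and geom are placeholders, and declarative partitioning needs PostgreSQL 10+):

    -- Placeholder schema: a points table partitioned by time.
    CREATE TABLE gps_points (
        gps_id     bigint    NOT NULL,
        event_time timestamp NOT NULL,
        geom       geometry(Point, 4326) NOT NULL
    ) PARTITION BY RANGE (event_time);

    -- One partition per month; create new ones as data keeps arriving.
    CREATE TABLE gps_points_2018_08 PARTITION OF gps_points
        FOR VALUES FROM ('2018-08-01') TO ('2018-09-01');

    -- Spatial index plus an index on device and time, per partition
    -- (in Postgres 10 indexes have to be created on each partition).
    CREATE INDEX ON gps_points_2018_08 USING gist (geom);
    CREATE INDEX ON gps_points_2018_08 (gps_id, event_time);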

In general this is a pretty tough question, and it really depends on what you want to do with the data. There are some datasets out there that are comparable, so those could be of some help.
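
For the aggregation idea, one minimal way of doing the per-cell count (same placeholder table as above; the result can be materialised and styled as a heatmap in QGIS):

    -- Count how many distinct devices/tracks touch each 0.1 x 0.1 degree cell.
    -- Cell size and time window are illustrative.
    SELECT ST_SnapToGrid(geom, 0.1) AS cell,      -- snaps each point to its bin's corner
           count(DISTINCT gps_id)   AS n_tracks
    FROM gps_points
    WHERE event_time >= '2018-08-01' AND event_time < '2018-09-01'
    GROUP BY cell;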



Building indexes will help with ST_MakeLine. You could also save the results into a materialised view so that you don't have to keep rebuilding them each time. If you're partitioning your tables, then having a precalculated view for each table makes sense.



          Making a materialised view for a week's worth of tracks:



CREATE MATERIALIZED VIEW tracks_7days AS
SELECT
    -- order the points by time so the line follows the actual track
    ST_MakeLine(data_table.point ORDER BY data_table.event_time) AS track,
    data_table.gps_id
FROM data_table
WHERE
    data_table.event_time >= (now()::timestamp without time zone - '7 days'::interval)
    AND data_table.event_time <= now()::timestamp without time zone
GROUP BY data_table.gps_id;
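
The view is a snapshot, so it needs refreshing as new data arrives; adding a spatial index to it is optional but can help the rendering queries:

    -- Re-run periodically (e.g. nightly) to pick up new points.
    REFRESH MATERIALIZED VIEW tracks_7days;

    -- Optional: spatial index on the precomputed tracks.
    CREATE INDEX ON tracks_7days USING gist (track);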





answered Aug 27 at 8:47 by RedM, edited Aug 27 at 9:12
• Partitioning is likely to be a very big win performance-wise, as is correct index construction, of course.
  – John Powell aka Barça
  Aug 27 at 8:49










          • Thanks. That is very helpful. I've looked at that Uber write-up previously, and while it is a large dataset, it consists of only pickup and drop-off points (and not the actual track). Do you have any advice on splitting the GPS data into tracks, or are you just doing this naturally on day/week boundaries?
            – smang
            Aug 27 at 9:01










• This answer is indeed good advice. There is a slightly different concept you can build atop it that lets you dynamically select levels of simplification for tile generation, via ST_SetEffectiveArea.
  – ThingumaBob
  Aug 27 at 9:03
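
  A minimal sketch of that idea, applied to the tracks_7days view from the answer (the threshold value is illustrative; for EPSG:4326 data it is in square degrees):

      -- Visvalingam-based simplification: drop vertices whose effective area is
      -- below the threshold. A larger threshold (lower zoom) drops more vertices.
      SELECT gps_id,
             ST_SetEffectiveArea(track, 0.0001) AS simplified_track
      FROM tracks_7days;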







• It really depends on your data. Mine has a fixed ID and then GPS points for that ID going back over years. I partition my data into YYYY_MM tables and then build indexes on the ID, time and position. If I want a specific track I can usually do a query in a couple of milliseconds if it's this month (the table is sitting in RAM). If it's a historical one then it takes maybe 10 seconds to find and fetch (sitting on disc). I have built up tracks for all of the IDs for specific months and saved them to disc before, but never for all my data.
  – RedM
  Aug 27 at 9:05










          • Added some SQL to make a materialised view
            – RedM
            Aug 27 at 9:09










