How to cluster parts of broken line made of points?

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
3
down vote

favorite

I am studying clustering techniques and i am pretty new at this topic. Here is my problem: I created a 5 lines which are made of points. This lines are supposed to be continuous and they look like this:

# Create data
n0, x0, y0 = 17, np.linspace(0, 11, n0), np.linspace(10, 3, n0)
n1, x1, y1 = 35, np.linspace(11, 27, n1), np.linspace(3, 3, n1)
n2, x2, y2 = 15, np.linspace(27, 35, n2), np.linspace(3, 15, n2)
n3, x3, y3 = 4, np.linspace(-1, 0, n3), np.linspace(10, 10, n3)
n4, x4, y4 = 10, np.linspace(35, 46, n4), np.linspace(15, 15.5,n4) 

# Plot data
plt.figure()
plt.plot(x0, y0, 'o', color='grey', markersize=1)
plt.plot(x1, y1, 'o', color='grey', markersize=1)
plt.plot(x2, y2, 'o', color='grey', markersize=1)
plt.plot(x3, y3, 'o', color='grey', markersize=1)
plt.plot(x4, y4, 'o', color='grey', markersize=1)
plt.show()

enter image description here

My goal is to use a clustering technique in order to be able to cluster each line that i have created in order to recognize the 5 different lines presented in the plot.

To do so i have opted for the GaussianMixture clustering algorithm which i thought it could be suitable for this sort of data distribution (a line could be seen as a very skewed distribution maybe). Here is what i wrote:

# Prepare data for clustering
X = np.concatenate((x0,x1,x2,x3,x4))
Y = np.concatenate((y0,y1,y2,y3,y4))
data = np.column_stack((X,Y))

# Cluster
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=5, covariance_type='full')
X_ = gmm.fit(data)
y_ = X_.predict(data)
print(set(y_))

And the output that i got is something like this (it changes every time i run the code though...):

# Plot clusters
plt.figure()
plt.plot(data[y_==0][:,0], data[y_==0][:,1], 'o', color='red')
plt.plot(data[y_==1][:,0], data[y_==1][:,1], 'o', color='blue')
plt.plot(data[y_==2][:,0], data[y_==2][:,1], 'o', color='lime')
plt.plot(data[y_==3][:,0], data[y_==3][:,1], 'o', color='orange')
plt.plot(data[y_==4][:,0], data[y_==4][:,1], 'o', color='cyan')
plt.show()

enter image description here

As you can see the colors should represent the 5 clusters (aka lines) that i have originally created but apparently the output is not what i want.

Could you please provide a better way to approach this problem? In case what i am doing is partially correct, what am i mistaken?

edited 3 hours ago

ttnphns

37.4k12132311

asked 5 hours ago

Federico Gentile

1184

New contributor

1

This appears to be a minor generalization of the problem discussed at stats.stackexchange.com/questions/33078.
â€“Â whuberâ™¦
4 hours ago

@whuber yeah maybe i should highlight the fact that i would really need to use a clustering algorithm such as the ones offered by scikit-learn or tensorflow
â€“Â Federico Gentile
3 hours ago

One problem with your data is that in different lines constituting your "zigzag" density of points is different. Must it be so in your data?
â€“Â ttnphns
3 hours ago

...in other words, your data consists of clusters-to-be of very different densities.
â€“Â ttnphns
3 hours ago

Please clarify what you mean by "supposed to be continuous:" does that imply you are given a sequence of points rather than an unordered set of them? How accurately are these points known--is there perhaps some random error in them?
â€“Â whuberâ™¦
2 hours ago

Â |Â
show 4 more comments

up vote
3
down vote

favorite

# Create data
n0, x0, y0 = 17, np.linspace(0, 11, n0), np.linspace(10, 3, n0)
n1, x1, y1 = 35, np.linspace(11, 27, n1), np.linspace(3, 3, n1)
n2, x2, y2 = 15, np.linspace(27, 35, n2), np.linspace(3, 15, n2)
n3, x3, y3 = 4, np.linspace(-1, 0, n3), np.linspace(10, 10, n3)
n4, x4, y4 = 10, np.linspace(35, 46, n4), np.linspace(15, 15.5,n4) 

# Plot data
plt.figure()
plt.plot(x0, y0, 'o', color='grey', markersize=1)
plt.plot(x1, y1, 'o', color='grey', markersize=1)
plt.plot(x2, y2, 'o', color='grey', markersize=1)
plt.plot(x3, y3, 'o', color='grey', markersize=1)
plt.plot(x4, y4, 'o', color='grey', markersize=1)
plt.show()

enter image description here

My goal is to use a clustering technique in order to be able to cluster each line that i have created in order to recognize the 5 different lines presented in the plot.

# Prepare data for clustering
X = np.concatenate((x0,x1,x2,x3,x4))
Y = np.concatenate((y0,y1,y2,y3,y4))
data = np.column_stack((X,Y))

# Cluster
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=5, covariance_type='full')
X_ = gmm.fit(data)
y_ = X_.predict(data)
print(set(y_))

And the output that i got is something like this (it changes every time i run the code though...):

# Plot clusters
plt.figure()
plt.plot(data[y_==0][:,0], data[y_==0][:,1], 'o', color='red')
plt.plot(data[y_==1][:,0], data[y_==1][:,1], 'o', color='blue')
plt.plot(data[y_==2][:,0], data[y_==2][:,1], 'o', color='lime')
plt.plot(data[y_==3][:,0], data[y_==3][:,1], 'o', color='orange')
plt.plot(data[y_==4][:,0], data[y_==4][:,1], 'o', color='cyan')
plt.show()

enter image description here

As you can see the colors should represent the 5 clusters (aka lines) that i have originally created but apparently the output is not what i want.

Could you please provide a better way to approach this problem? In case what i am doing is partially correct, what am i mistaken?

edited 3 hours ago

ttnphns

37.4k12132311

asked 5 hours ago

Federico Gentile

1184

New contributor

1

This appears to be a minor generalization of the problem discussed at stats.stackexchange.com/questions/33078.
â€“Â whuberâ™¦
4 hours ago

@whuber yeah maybe i should highlight the fact that i would really need to use a clustering algorithm such as the ones offered by scikit-learn or tensorflow
â€“Â Federico Gentile
3 hours ago

One problem with your data is that in different lines constituting your "zigzag" density of points is different. Must it be so in your data?
â€“Â ttnphns
3 hours ago

...in other words, your data consists of clusters-to-be of very different densities.
â€“Â ttnphns
3 hours ago

Please clarify what you mean by "supposed to be continuous:" does that imply you are given a sequence of points rather than an unordered set of them? How accurately are these points known--is there perhaps some random error in them?
â€“Â whuberâ™¦
2 hours ago

Â |Â
show 4 more comments

up vote
3
down vote

favorite

# Create data
n0, x0, y0 = 17, np.linspace(0, 11, n0), np.linspace(10, 3, n0)
n1, x1, y1 = 35, np.linspace(11, 27, n1), np.linspace(3, 3, n1)
n2, x2, y2 = 15, np.linspace(27, 35, n2), np.linspace(3, 15, n2)
n3, x3, y3 = 4, np.linspace(-1, 0, n3), np.linspace(10, 10, n3)
n4, x4, y4 = 10, np.linspace(35, 46, n4), np.linspace(15, 15.5,n4) 

# Plot data
plt.figure()
plt.plot(x0, y0, 'o', color='grey', markersize=1)
plt.plot(x1, y1, 'o', color='grey', markersize=1)
plt.plot(x2, y2, 'o', color='grey', markersize=1)
plt.plot(x3, y3, 'o', color='grey', markersize=1)
plt.plot(x4, y4, 'o', color='grey', markersize=1)
plt.show()

enter image description here

My goal is to use a clustering technique in order to be able to cluster each line that i have created in order to recognize the 5 different lines presented in the plot.

# Prepare data for clustering
X = np.concatenate((x0,x1,x2,x3,x4))
Y = np.concatenate((y0,y1,y2,y3,y4))
data = np.column_stack((X,Y))

# Cluster
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=5, covariance_type='full')
X_ = gmm.fit(data)
y_ = X_.predict(data)
print(set(y_))

And the output that i got is something like this (it changes every time i run the code though...):

# Plot clusters
plt.figure()
plt.plot(data[y_==0][:,0], data[y_==0][:,1], 'o', color='red')
plt.plot(data[y_==1][:,0], data[y_==1][:,1], 'o', color='blue')
plt.plot(data[y_==2][:,0], data[y_==2][:,1], 'o', color='lime')
plt.plot(data[y_==3][:,0], data[y_==3][:,1], 'o', color='orange')
plt.plot(data[y_==4][:,0], data[y_==4][:,1], 'o', color='cyan')
plt.show()

enter image description here

As you can see the colors should represent the 5 clusters (aka lines) that i have originally created but apparently the output is not what i want.

Could you please provide a better way to approach this problem? In case what i am doing is partially correct, what am i mistaken?

edited 3 hours ago

ttnphns

37.4k12132311

asked 5 hours ago

Federico Gentile

1184

New contributor

# Create data
n0, x0, y0 = 17, np.linspace(0, 11, n0), np.linspace(10, 3, n0)
n1, x1, y1 = 35, np.linspace(11, 27, n1), np.linspace(3, 3, n1)
n2, x2, y2 = 15, np.linspace(27, 35, n2), np.linspace(3, 15, n2)
n3, x3, y3 = 4, np.linspace(-1, 0, n3), np.linspace(10, 10, n3)
n4, x4, y4 = 10, np.linspace(35, 46, n4), np.linspace(15, 15.5,n4) 

# Plot data
plt.figure()
plt.plot(x0, y0, 'o', color='grey', markersize=1)
plt.plot(x1, y1, 'o', color='grey', markersize=1)
plt.plot(x2, y2, 'o', color='grey', markersize=1)
plt.plot(x3, y3, 'o', color='grey', markersize=1)
plt.plot(x4, y4, 'o', color='grey', markersize=1)
plt.show()

enter image description here

My goal is to use a clustering technique in order to be able to cluster each line that i have created in order to recognize the 5 different lines presented in the plot.

# Prepare data for clustering
X = np.concatenate((x0,x1,x2,x3,x4))
Y = np.concatenate((y0,y1,y2,y3,y4))
data = np.column_stack((X,Y))

# Cluster
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=5, covariance_type='full')
X_ = gmm.fit(data)
y_ = X_.predict(data)
print(set(y_))

And the output that i got is something like this (it changes every time i run the code though...):

# Plot clusters
plt.figure()
plt.plot(data[y_==0][:,0], data[y_==0][:,1], 'o', color='red')
plt.plot(data[y_==1][:,0], data[y_==1][:,1], 'o', color='blue')
plt.plot(data[y_==2][:,0], data[y_==2][:,1], 'o', color='lime')
plt.plot(data[y_==3][:,0], data[y_==3][:,1], 'o', color='orange')
plt.plot(data[y_==4][:,0], data[y_==4][:,1], 'o', color='cyan')
plt.show()

enter image description here

As you can see the colors should represent the 5 clusters (aka lines) that i have originally created but apparently the output is not what i want.

Could you please provide a better way to approach this problem? In case what i am doing is partially correct, what am i mistaken?

time-series clustering scikit-learn gaussian-mixture tensorflow

edited 3 hours ago

ttnphns

37.4k12132311

asked 5 hours ago

Federico Gentile

1184

New contributor

edited 3 hours ago

ttnphns

37.4k12132311

asked 5 hours ago

Federico Gentile

1184

New contributor

edited 3 hours ago

ttnphns

37.4k12132311

edited 3 hours ago

ttnphns

37.4k12132311

edited 3 hours ago

ttnphns

37.4k12132311

asked 5 hours ago

Federico Gentile

1184

New contributor

asked 5 hours ago

Federico Gentile

1184

asked 5 hours ago

Federico Gentile

1184

New contributor

Federico Gentile is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

1

This appears to be a minor generalization of the problem discussed at stats.stackexchange.com/questions/33078.
â€“Â whuberâ™¦
4 hours ago

@whuber yeah maybe i should highlight the fact that i would really need to use a clustering algorithm such as the ones offered by scikit-learn or tensorflow
â€“Â Federico Gentile
3 hours ago

One problem with your data is that in different lines constituting your "zigzag" density of points is different. Must it be so in your data?
â€“Â ttnphns
3 hours ago

...in other words, your data consists of clusters-to-be of very different densities.
â€“Â ttnphns
3 hours ago

Please clarify what you mean by "supposed to be continuous:" does that imply you are given a sequence of points rather than an unordered set of them? How accurately are these points known--is there perhaps some random error in them?
â€“Â whuberâ™¦
2 hours ago

Â |Â
show 4 more comments

1

This appears to be a minor generalization of the problem discussed at stats.stackexchange.com/questions/33078.
â€“Â whuberâ™¦
4 hours ago

@whuber yeah maybe i should highlight the fact that i would really need to use a clustering algorithm such as the ones offered by scikit-learn or tensorflow
â€“Â Federico Gentile
3 hours ago

One problem with your data is that in different lines constituting your "zigzag" density of points is different. Must it be so in your data?
â€“Â ttnphns
3 hours ago

...in other words, your data consists of clusters-to-be of very different densities.
â€“Â ttnphns
3 hours ago

Please clarify what you mean by "supposed to be continuous:" does that imply you are given a sequence of points rather than an unordered set of them? How accurately are these points known--is there perhaps some random error in them?
â€“Â whuberâ™¦
2 hours ago

This appears to be a minor generalization of the problem discussed at stats.stackexchange.com/questions/33078.
â€“Â whuberâ™¦
4 hours ago

@whuber yeah maybe i should highlight the fact that i would really need to use a clustering algorithm such as the ones offered by scikit-learn or tensorflow
â€“Â Federico Gentile
3 hours ago

One problem with your data is that in different lines constituting your "zigzag" density of points is different. Must it be so in your data?
â€“Â ttnphns
3 hours ago

...in other words, your data consists of clusters-to-be of very different densities.
â€“Â ttnphns
3 hours ago

Please clarify what you mean by "supposed to be continuous:" does that imply you are given a sequence of points rather than an unordered set of them? How accurately are these points known--is there perhaps some random error in them?
â€“Â whuberâ™¦
2 hours ago

Â |Â
show 4 more comments

1 Answer
1

active

oldest

votes

up vote
3
down vote

accepted

The data appear to be a sequence of $(x,y)$ coordinates and the question concerns how to cluster them into linear segments.

Solution

A simple method uses a brute-force (but efficient) version of the Hough Transform.

The basic idea is that every line in the plane can be parameterized by its distance to a fixed origin and the direction perpendicular to it. Each ordered pair of distinct points, $(x_i,y_i)$ and $(x_i+1,y_i+1),$ determines a line and therefore may be encoded as a (distance, angle) ordered pair. Sequences of points lying on the same line can then be easily recognized as stationary sequences of (distance, angle) pairs. Simply cluster these.

For numerical stability, it is wise to use an origin adapted to the data--perhaps their barycenter.

There are many ways to proceed from this point. For instance, a kernel density estimate of the (distance, angle) pairs is tantamount to the Hough transform itself. After removing outliers, any reasonable clustering method for 2D Euclidean points should readily identify all the segments.

Remark

If the data do not have a definite sequence, compute the Hough transform for all possible pairs $(x_i,y_i)$ and $(x_j,y_j).$ This is more computationally demanding and will produce a much messier transform, but nevertheless any lines formed by more than a few of the points will correspond to the highest-density regions of the transform.

Example

The question presents a sequence of 81 points (not all distinct). Here are portions of the derived information described above:

 x y id Distance Angle
1 0.0000000 10.000000 1 7.999634 -2.1375255
2 0.6875000 9.562500 1 7.999634 -2.1375255
3 1.3750000 9.125000 1 7.999634 -2.1375255
...
16 10.3125000 3.437500 1 7.999634 -2.1375255
17 11.0000000 3.000000 1 NaN 0.0000000
18 11.0000000 3.000000 2 3.703704 -1.5707963
19 11.4705882 3.000000 2 3.703704 -1.5707963
...
51 26.5294118 3.000000 2 3.703704 -1.5707963
52 27.0000000 3.000000 2 NaN 0.0000000
53 27.0000000 3.000000 3 7.812028 -0.5880026
54 27.5714286 3.857143 3 7.812028 -0.5880026
...
66 34.4285714 14.142857 3 7.812028 -0.5880026
67 35.0000000 15.000000 3 6.164931 1.7088024
68 -1.0000000 10.000000 4 -3.296296 -1.5707963
69 -0.6666667 10.000000 4 -3.296296 -1.5707963
70 -0.3333333 10.000000 4 -3.296296 -1.5707963
71 0.0000000 10.000000 4 -6.102943 -1.4288993
72 35.0000000 15.000000 5 -7.610268 -1.5253730
...
80 44.7777778 15.444444 5 -7.610268 -1.5253730

The regions of stationary (distance, angle) values are clear and the breaks between them are obvious.

Here is a plot of the (distance, angle) pairs, colored according to the original segments created in the question:

I applied a hierarchical clustering solution (using the hclust function in R with the default complete linkage) and used that to classify the original data (except for the last point). Here is the classification, shown with colors:

answered 1 hour ago

whuberâ™¦

199k32430795

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

Federico Gentile is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f375787%2fhow-to-cluster-parts-of-broken-line-made-of-points%23new-answer', 'question_page');

);

Post as a guest

Name

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
3
down vote

accepted

The data appear to be a sequence of $(x,y)$ coordinates and the question concerns how to cluster them into linear segments.

Solution

A simple method uses a brute-force (but efficient) version of the Hough Transform.

For numerical stability, it is wise to use an origin adapted to the data--perhaps their barycenter.

Remark

Example

The question presents a sequence of 81 points (not all distinct). Here are portions of the derived information described above:

 x y id Distance Angle
1 0.0000000 10.000000 1 7.999634 -2.1375255
2 0.6875000 9.562500 1 7.999634 -2.1375255
3 1.3750000 9.125000 1 7.999634 -2.1375255
...
16 10.3125000 3.437500 1 7.999634 -2.1375255
17 11.0000000 3.000000 1 NaN 0.0000000
18 11.0000000 3.000000 2 3.703704 -1.5707963
19 11.4705882 3.000000 2 3.703704 -1.5707963
...
51 26.5294118 3.000000 2 3.703704 -1.5707963
52 27.0000000 3.000000 2 NaN 0.0000000
53 27.0000000 3.000000 3 7.812028 -0.5880026
54 27.5714286 3.857143 3 7.812028 -0.5880026
...
66 34.4285714 14.142857 3 7.812028 -0.5880026
67 35.0000000 15.000000 3 6.164931 1.7088024
68 -1.0000000 10.000000 4 -3.296296 -1.5707963
69 -0.6666667 10.000000 4 -3.296296 -1.5707963
70 -0.3333333 10.000000 4 -3.296296 -1.5707963
71 0.0000000 10.000000 4 -6.102943 -1.4288993
72 35.0000000 15.000000 5 -7.610268 -1.5253730
...
80 44.7777778 15.444444 5 -7.610268 -1.5253730

The regions of stationary (distance, angle) values are clear and the breaks between them are obvious.

Here is a plot of the (distance, angle) pairs, colored according to the original segments created in the question:

answered 1 hour ago

whuberâ™¦

199k32430795

add a commentÂ |Â

up vote
3
down vote

accepted

The data appear to be a sequence of $(x,y)$ coordinates and the question concerns how to cluster them into linear segments.

Solution

A simple method uses a brute-force (but efficient) version of the Hough Transform.

For numerical stability, it is wise to use an origin adapted to the data--perhaps their barycenter.

Remark

Example

The question presents a sequence of 81 points (not all distinct). Here are portions of the derived information described above:

 x y id Distance Angle
1 0.0000000 10.000000 1 7.999634 -2.1375255
2 0.6875000 9.562500 1 7.999634 -2.1375255
3 1.3750000 9.125000 1 7.999634 -2.1375255
...
16 10.3125000 3.437500 1 7.999634 -2.1375255
17 11.0000000 3.000000 1 NaN 0.0000000
18 11.0000000 3.000000 2 3.703704 -1.5707963
19 11.4705882 3.000000 2 3.703704 -1.5707963
...
51 26.5294118 3.000000 2 3.703704 -1.5707963
52 27.0000000 3.000000 2 NaN 0.0000000
53 27.0000000 3.000000 3 7.812028 -0.5880026
54 27.5714286 3.857143 3 7.812028 -0.5880026
...
66 34.4285714 14.142857 3 7.812028 -0.5880026
67 35.0000000 15.000000 3 6.164931 1.7088024
68 -1.0000000 10.000000 4 -3.296296 -1.5707963
69 -0.6666667 10.000000 4 -3.296296 -1.5707963
70 -0.3333333 10.000000 4 -3.296296 -1.5707963
71 0.0000000 10.000000 4 -6.102943 -1.4288993
72 35.0000000 15.000000 5 -7.610268 -1.5253730
...
80 44.7777778 15.444444 5 -7.610268 -1.5253730

The regions of stationary (distance, angle) values are clear and the breaks between them are obvious.

Here is a plot of the (distance, angle) pairs, colored according to the original segments created in the question:

answered 1 hour ago

whuberâ™¦

199k32430795

add a commentÂ |Â

up vote
3
down vote

accepted

The data appear to be a sequence of $(x,y)$ coordinates and the question concerns how to cluster them into linear segments.

Solution

A simple method uses a brute-force (but efficient) version of the Hough Transform.

For numerical stability, it is wise to use an origin adapted to the data--perhaps their barycenter.

Remark

Example

The question presents a sequence of 81 points (not all distinct). Here are portions of the derived information described above:

 x y id Distance Angle
1 0.0000000 10.000000 1 7.999634 -2.1375255
2 0.6875000 9.562500 1 7.999634 -2.1375255
3 1.3750000 9.125000 1 7.999634 -2.1375255
...
16 10.3125000 3.437500 1 7.999634 -2.1375255
17 11.0000000 3.000000 1 NaN 0.0000000
18 11.0000000 3.000000 2 3.703704 -1.5707963
19 11.4705882 3.000000 2 3.703704 -1.5707963
...
51 26.5294118 3.000000 2 3.703704 -1.5707963
52 27.0000000 3.000000 2 NaN 0.0000000
53 27.0000000 3.000000 3 7.812028 -0.5880026
54 27.5714286 3.857143 3 7.812028 -0.5880026
...
66 34.4285714 14.142857 3 7.812028 -0.5880026
67 35.0000000 15.000000 3 6.164931 1.7088024
68 -1.0000000 10.000000 4 -3.296296 -1.5707963
69 -0.6666667 10.000000 4 -3.296296 -1.5707963
70 -0.3333333 10.000000 4 -3.296296 -1.5707963
71 0.0000000 10.000000 4 -6.102943 -1.4288993
72 35.0000000 15.000000 5 -7.610268 -1.5253730
...
80 44.7777778 15.444444 5 -7.610268 -1.5253730

The regions of stationary (distance, angle) values are clear and the breaks between them are obvious.

Here is a plot of the (distance, angle) pairs, colored according to the original segments created in the question:

answered 1 hour ago

whuberâ™¦

199k32430795

The data appear to be a sequence of $(x,y)$ coordinates and the question concerns how to cluster them into linear segments.

Solution

A simple method uses a brute-force (but efficient) version of the Hough Transform.

For numerical stability, it is wise to use an origin adapted to the data--perhaps their barycenter.

Remark

Example

The question presents a sequence of 81 points (not all distinct). Here are portions of the derived information described above:

 x y id Distance Angle
1 0.0000000 10.000000 1 7.999634 -2.1375255
2 0.6875000 9.562500 1 7.999634 -2.1375255
3 1.3750000 9.125000 1 7.999634 -2.1375255
...
16 10.3125000 3.437500 1 7.999634 -2.1375255
17 11.0000000 3.000000 1 NaN 0.0000000
18 11.0000000 3.000000 2 3.703704 -1.5707963
19 11.4705882 3.000000 2 3.703704 -1.5707963
...
51 26.5294118 3.000000 2 3.703704 -1.5707963
52 27.0000000 3.000000 2 NaN 0.0000000
53 27.0000000 3.000000 3 7.812028 -0.5880026
54 27.5714286 3.857143 3 7.812028 -0.5880026
...
66 34.4285714 14.142857 3 7.812028 -0.5880026
67 35.0000000 15.000000 3 6.164931 1.7088024
68 -1.0000000 10.000000 4 -3.296296 -1.5707963
69 -0.6666667 10.000000 4 -3.296296 -1.5707963
70 -0.3333333 10.000000 4 -3.296296 -1.5707963
71 0.0000000 10.000000 4 -6.102943 -1.4288993
72 35.0000000 15.000000 5 -7.610268 -1.5253730
...
80 44.7777778 15.444444 5 -7.610268 -1.5253730

The regions of stationary (distance, angle) values are clear and the breaks between them are obvious.

Here is a plot of the (distance, angle) pairs, colored according to the original segments created in the question:

answered 1 hour ago

whuberâ™¦

199k32430795

answered 1 hour ago

whuberâ™¦

199k32430795

answered 1 hour ago

whuberâ™¦

199k32430795

answered 1 hour ago

whuberâ™¦

199k32430795

add a commentÂ |Â

Federico Gentile is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Federico Gentile is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Post as a guest

Name

Search This Blog

Iyfjky