STRING_AGG not behaving as expected

Clash Royale CLAN TAG#URR8PPP
up vote
10
down vote
favorite
I have the following query:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]
I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4
Instead I am getting the below output (the 4 values are separated by and):
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4
What am I missing?
Here is another example:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b
Is this a bug in SQL Server?
sql
 |Â
show 2 more comments
up vote
10
down vote
favorite
I have the following query:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]
I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4
Instead I am getting the below output (the 4 values are separated by and):
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4
What am I missing?
Here is another example:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b
Is this a bug in SQL Server?
sql
1
Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
â dnoeth
4 hours ago
This is a beauty of an optimizer bug. Simpler and more striking repro:CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages](uses the1234567case) andCASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages](leave outELSE-- now the match fails and the expression becomesNULL). No matter what the "correct" result should be, surely that's not it.
â Jeroen Mostert
4 hours ago
1
Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
â George Menoutis
4 hours ago
In the resulting execution plan, the secondSTRING_AGGis entirely missing and instead the expression is rebound to the first one, as if theCASEhad saidSTRING_AGG([Language], ' and ')twice. Any subsequentCASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.
â Jeroen Mostert
4 hours ago
1
This bug seems to be specifically tuned toSTRING_AGG. If theELSEis changed to'blargh' + STRING_AGG(...), you'll get'blarghFrench and German..., so it improperly unifies the secondSTRING_AGGwith the first. The simplest workaround is to change theELSEexpression toSTRING_AGG([Language] + '', ', ')-- this defeats CSE, suggesting there's a bug where CSE ignores the second argument toSTRING_AGG.
â Jeroen Mostert
4 hours ago
 |Â
show 2 more comments
up vote
10
down vote
favorite
up vote
10
down vote
favorite
I have the following query:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]
I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4
Instead I am getting the below output (the 4 values are separated by and):
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4
What am I missing?
Here is another example:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b
Is this a bug in SQL Server?
sql
I have the following query:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]
I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4
Instead I am getting the below output (the 4 values are separated by and):
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4
What am I missing?
Here is another example:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b
Is this a bug in SQL Server?
sql
sql
edited 6 mins ago
Salman A
165k63325410
165k63325410
asked 4 hours ago
Tom Hunter
3,20573869
3,20573869
1
Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
â dnoeth
4 hours ago
This is a beauty of an optimizer bug. Simpler and more striking repro:CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages](uses the1234567case) andCASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages](leave outELSE-- now the match fails and the expression becomesNULL). No matter what the "correct" result should be, surely that's not it.
â Jeroen Mostert
4 hours ago
1
Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
â George Menoutis
4 hours ago
In the resulting execution plan, the secondSTRING_AGGis entirely missing and instead the expression is rebound to the first one, as if theCASEhad saidSTRING_AGG([Language], ' and ')twice. Any subsequentCASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.
â Jeroen Mostert
4 hours ago
1
This bug seems to be specifically tuned toSTRING_AGG. If theELSEis changed to'blargh' + STRING_AGG(...), you'll get'blarghFrench and German..., so it improperly unifies the secondSTRING_AGGwith the first. The simplest workaround is to change theELSEexpression toSTRING_AGG([Language] + '', ', ')-- this defeats CSE, suggesting there's a bug where CSE ignores the second argument toSTRING_AGG.
â Jeroen Mostert
4 hours ago
 |Â
show 2 more comments
1
Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
â dnoeth
4 hours ago
This is a beauty of an optimizer bug. Simpler and more striking repro:CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages](uses the1234567case) andCASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages](leave outELSE-- now the match fails and the expression becomesNULL). No matter what the "correct" result should be, surely that's not it.
â Jeroen Mostert
4 hours ago
1
Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
â George Menoutis
4 hours ago
In the resulting execution plan, the secondSTRING_AGGis entirely missing and instead the expression is rebound to the first one, as if theCASEhad saidSTRING_AGG([Language], ' and ')twice. Any subsequentCASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.
â Jeroen Mostert
4 hours ago
1
This bug seems to be specifically tuned toSTRING_AGG. If theELSEis changed to'blargh' + STRING_AGG(...), you'll get'blarghFrench and German..., so it improperly unifies the secondSTRING_AGGwith the first. The simplest workaround is to change theELSEexpression toSTRING_AGG([Language] + '', ', ')-- this defeats CSE, suggesting there's a bug where CSE ignores the second argument toSTRING_AGG.
â Jeroen Mostert
4 hours ago
1
1
Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
â dnoeth
4 hours ago
Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
â dnoeth
4 hours ago
This is a beauty of an optimizer bug. Simpler and more striking repro:
CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages] (uses the 1234567 case) and CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages] (leave out ELSE -- now the match fails and the expression becomes NULL). No matter what the "correct" result should be, surely that's not it.â Jeroen Mostert
4 hours ago
This is a beauty of an optimizer bug. Simpler and more striking repro:
CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages] (uses the 1234567 case) and CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages] (leave out ELSE -- now the match fails and the expression becomes NULL). No matter what the "correct" result should be, surely that's not it.â Jeroen Mostert
4 hours ago
1
1
Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
â George Menoutis
4 hours ago
Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
â George Menoutis
4 hours ago
In the resulting execution plan, the second
STRING_AGG is entirely missing and instead the expression is rebound to the first one, as if the CASE had said STRING_AGG([Language], ' and ') twice. Any subsequent CASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.â Jeroen Mostert
4 hours ago
In the resulting execution plan, the second
STRING_AGG is entirely missing and instead the expression is rebound to the first one, as if the CASE had said STRING_AGG([Language], ' and ') twice. Any subsequent CASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.â Jeroen Mostert
4 hours ago
1
1
This bug seems to be specifically tuned to
STRING_AGG. If the ELSE is changed to 'blargh' + STRING_AGG(...), you'll get 'blarghFrench and German..., so it improperly unifies the second STRING_AGG with the first. The simplest workaround is to change the ELSE expression to STRING_AGG([Language] + '', ', ') -- this defeats CSE, suggesting there's a bug where CSE ignores the second argument to STRING_AGG.â Jeroen Mostert
4 hours ago
This bug seems to be specifically tuned to
STRING_AGG. If the ELSE is changed to 'blargh' + STRING_AGG(...), you'll get 'blarghFrench and German..., so it improperly unifies the second STRING_AGG with the first. The simplest workaround is to change the ELSE expression to STRING_AGG([Language] + '', ', ') -- this defeats CSE, suggesting there's a bug where CSE ignores the second argument to STRING_AGG.â Jeroen Mostert
4 hours ago
 |Â
show 2 more comments
2 Answers
2
active
oldest
votes
up vote
10
down vote
Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.
One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
add a comment |Â
up vote
0
down vote
Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x (Country, Language)
), results AS (
SELECT
Country,
COUNT(Language) AS LanguageCount,
STRING_AGG(Language, ', ') AS Languages
FROM cteCountryLanguageMapping
GROUP BY Country
)
SELECT Country, LanguageCount, CASE LanguageCount
WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
ELSE Languages
END AS Languages_Fixed
FROM results
Result:
| Country | LanguageCount | Languages_Fixed |
|-------------|---------------|----------------------------------|
| Spain | 2 | Spanish and English |
| Sweden | 1 | English |
| Switzerland | 4 | French, German, Italian, English |
DB Fiddle
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
10
down vote
Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.
One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
add a comment |Â
up vote
10
down vote
Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.
One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
add a comment |Â
up vote
10
down vote
up vote
10
down vote
Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.
One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.
One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
edited 4 hours ago
Dan Guzman
21.7k31538
21.7k31538
answered 4 hours ago
Jeroen Mostert
16.3k1949
16.3k1949
add a comment |Â
add a comment |Â
up vote
0
down vote
Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x (Country, Language)
), results AS (
SELECT
Country,
COUNT(Language) AS LanguageCount,
STRING_AGG(Language, ', ') AS Languages
FROM cteCountryLanguageMapping
GROUP BY Country
)
SELECT Country, LanguageCount, CASE LanguageCount
WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
ELSE Languages
END AS Languages_Fixed
FROM results
Result:
| Country | LanguageCount | Languages_Fixed |
|-------------|---------------|----------------------------------|
| Spain | 2 | Spanish and English |
| Sweden | 1 | English |
| Switzerland | 4 | French, German, Italian, English |
DB Fiddle
add a comment |Â
up vote
0
down vote
Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x (Country, Language)
), results AS (
SELECT
Country,
COUNT(Language) AS LanguageCount,
STRING_AGG(Language, ', ') AS Languages
FROM cteCountryLanguageMapping
GROUP BY Country
)
SELECT Country, LanguageCount, CASE LanguageCount
WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
ELSE Languages
END AS Languages_Fixed
FROM results
Result:
| Country | LanguageCount | Languages_Fixed |
|-------------|---------------|----------------------------------|
| Spain | 2 | Spanish and English |
| Sweden | 1 | English |
| Switzerland | 4 | French, German, Italian, English |
DB Fiddle
add a comment |Â
up vote
0
down vote
up vote
0
down vote
Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x (Country, Language)
), results AS (
SELECT
Country,
COUNT(Language) AS LanguageCount,
STRING_AGG(Language, ', ') AS Languages
FROM cteCountryLanguageMapping
GROUP BY Country
)
SELECT Country, LanguageCount, CASE LanguageCount
WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
ELSE Languages
END AS Languages_Fixed
FROM results
Result:
| Country | LanguageCount | Languages_Fixed |
|-------------|---------------|----------------------------------|
| Spain | 2 | Spanish and English |
| Sweden | 1 | English |
| Switzerland | 4 | French, German, Italian, English |
DB Fiddle
Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x (Country, Language)
), results AS (
SELECT
Country,
COUNT(Language) AS LanguageCount,
STRING_AGG(Language, ', ') AS Languages
FROM cteCountryLanguageMapping
GROUP BY Country
)
SELECT Country, LanguageCount, CASE LanguageCount
WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
ELSE Languages
END AS Languages_Fixed
FROM results
Result:
| Country | LanguageCount | Languages_Fixed |
|-------------|---------------|----------------------------------|
| Spain | 2 | Spanish and English |
| Sweden | 1 | English |
| Switzerland | 4 | French, German, Italian, English |
DB Fiddle
answered 15 mins ago
Salman A
165k63325410
165k63325410
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52533487%2fstring-agg-not-behaving-as-expected%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password

1
Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
â dnoeth
4 hours ago
This is a beauty of an optimizer bug. Simpler and more striking repro:
CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages](uses the1234567case) andCASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages](leave outELSE-- now the match fails and the expression becomesNULL). No matter what the "correct" result should be, surely that's not it.â Jeroen Mostert
4 hours ago
1
Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
â George Menoutis
4 hours ago
In the resulting execution plan, the second
STRING_AGGis entirely missing and instead the expression is rebound to the first one, as if theCASEhad saidSTRING_AGG([Language], ' and ')twice. Any subsequentCASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.â Jeroen Mostert
4 hours ago
1
This bug seems to be specifically tuned to
STRING_AGG. If theELSEis changed to'blargh' + STRING_AGG(...), you'll get'blarghFrench and German..., so it improperly unifies the secondSTRING_AGGwith the first. The simplest workaround is to change theELSEexpression toSTRING_AGG([Language] + '', ', ')-- this defeats CSE, suggesting there's a bug where CSE ignores the second argument toSTRING_AGG.â Jeroen Mostert
4 hours ago