STRING_AGG not behaving as expected

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
10
down vote

favorite
3












I have the following query:



WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]


I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:



 | Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4


Instead I am getting the below output (the 4 values are separated by and):



 | Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4


What am I missing?




Here is another example:



SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y

| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b


Is this a bug in SQL Server?










share|improve this question



















  • 1




    Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
    – dnoeth
    4 hours ago










  • This is a beauty of an optimizer bug. Simpler and more striking repro: CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages] (uses the 1234567 case) and CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages] (leave out ELSE -- now the match fails and the expression becomes NULL). No matter what the "correct" result should be, surely that's not it.
    – Jeroen Mostert
    4 hours ago






  • 1




    Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
    – George Menoutis
    4 hours ago











  • In the resulting execution plan, the second STRING_AGG is entirely missing and instead the expression is rebound to the first one, as if the CASE had said STRING_AGG([Language], ' and ') twice. Any subsequent CASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.
    – Jeroen Mostert
    4 hours ago






  • 1




    This bug seems to be specifically tuned to STRING_AGG. If the ELSE is changed to 'blargh' + STRING_AGG(...), you'll get 'blarghFrench and German..., so it improperly unifies the second STRING_AGG with the first. The simplest workaround is to change the ELSE expression to STRING_AGG([Language] + '', ', ') -- this defeats CSE, suggesting there's a bug where CSE ignores the second argument to STRING_AGG.
    – Jeroen Mostert
    4 hours ago














up vote
10
down vote

favorite
3












I have the following query:



WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]


I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:



 | Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4


Instead I am getting the below output (the 4 values are separated by and):



 | Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4


What am I missing?




Here is another example:



SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y

| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b


Is this a bug in SQL Server?










share|improve this question



















  • 1




    Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
    – dnoeth
    4 hours ago










  • This is a beauty of an optimizer bug. Simpler and more striking repro: CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages] (uses the 1234567 case) and CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages] (leave out ELSE -- now the match fails and the expression becomes NULL). No matter what the "correct" result should be, surely that's not it.
    – Jeroen Mostert
    4 hours ago






  • 1




    Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
    – George Menoutis
    4 hours ago











  • In the resulting execution plan, the second STRING_AGG is entirely missing and instead the expression is rebound to the first one, as if the CASE had said STRING_AGG([Language], ' and ') twice. Any subsequent CASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.
    – Jeroen Mostert
    4 hours ago






  • 1




    This bug seems to be specifically tuned to STRING_AGG. If the ELSE is changed to 'blargh' + STRING_AGG(...), you'll get 'blarghFrench and German..., so it improperly unifies the second STRING_AGG with the first. The simplest workaround is to change the ELSE expression to STRING_AGG([Language] + '', ', ') -- this defeats CSE, suggesting there's a bug where CSE ignores the second argument to STRING_AGG.
    – Jeroen Mostert
    4 hours ago












up vote
10
down vote

favorite
3









up vote
10
down vote

favorite
3






3





I have the following query:



WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]


I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:



 | Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4


Instead I am getting the below output (the 4 values are separated by and):



 | Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4


What am I missing?




Here is another example:



SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y

| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b


Is this a bug in SQL Server?










share|improve this question















I have the following query:



WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]


I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:



 | Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4


Instead I am getting the below output (the 4 values are separated by and):



 | Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4


What am I missing?




Here is another example:



SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y

| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b


Is this a bug in SQL Server?







sql sql-server sql-server-2017 string-aggregation






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 6 mins ago









Salman A

165k63325410




165k63325410










asked 4 hours ago









Tom Hunter

3,20573869




3,20573869







  • 1




    Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
    – dnoeth
    4 hours ago










  • This is a beauty of an optimizer bug. Simpler and more striking repro: CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages] (uses the 1234567 case) and CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages] (leave out ELSE -- now the match fails and the expression becomes NULL). No matter what the "correct" result should be, surely that's not it.
    – Jeroen Mostert
    4 hours ago






  • 1




    Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
    – George Menoutis
    4 hours ago











  • In the resulting execution plan, the second STRING_AGG is entirely missing and instead the expression is rebound to the first one, as if the CASE had said STRING_AGG([Language], ' and ') twice. Any subsequent CASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.
    – Jeroen Mostert
    4 hours ago






  • 1




    This bug seems to be specifically tuned to STRING_AGG. If the ELSE is changed to 'blargh' + STRING_AGG(...), you'll get 'blarghFrench and German..., so it improperly unifies the second STRING_AGG with the first. The simplest workaround is to change the ELSE expression to STRING_AGG([Language] + '', ', ') -- this defeats CSE, suggesting there's a bug where CSE ignores the second argument to STRING_AGG.
    – Jeroen Mostert
    4 hours ago












  • 1




    Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
    – dnoeth
    4 hours ago










  • This is a beauty of an optimizer bug. Simpler and more striking repro: CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages] (uses the 1234567 case) and CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages] (leave out ELSE -- now the match fails and the expression becomes NULL). No matter what the "correct" result should be, surely that's not it.
    – Jeroen Mostert
    4 hours ago






  • 1




    Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
    – George Menoutis
    4 hours ago











  • In the resulting execution plan, the second STRING_AGG is entirely missing and instead the expression is rebound to the first one, as if the CASE had said STRING_AGG([Language], ' and ') twice. Any subsequent CASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.
    – Jeroen Mostert
    4 hours ago






  • 1




    This bug seems to be specifically tuned to STRING_AGG. If the ELSE is changed to 'blargh' + STRING_AGG(...), you'll get 'blarghFrench and German..., so it improperly unifies the second STRING_AGG with the first. The simplest workaround is to change the ELSE expression to STRING_AGG([Language] + '', ', ') -- this defeats CSE, suggesting there's a bug where CSE ignores the second argument to STRING_AGG.
    – Jeroen Mostert
    4 hours ago







1




1




Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
– dnoeth
4 hours ago




Seems to be a bug, it's always returning the 1st STRING_AGG no matter how you write the CASE
– dnoeth
4 hours ago












This is a beauty of an optimizer bug. Simpler and more striking repro: CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages] (uses the 1234567 case) and CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages] (leave out ELSE -- now the match fails and the expression becomes NULL). No matter what the "correct" result should be, surely that's not it.
– Jeroen Mostert
4 hours ago




This is a beauty of an optimizer bug. Simpler and more striking repro: CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') ELSE STRING_AGG([Language], ', ') END AS [Languages] (uses the 1234567 case) and CASE COUNT([Language]) WHEN 1234567 THEN STRING_AGG([Language], ' and ') END AS [Languages] (leave out ELSE -- now the match fails and the expression becomes NULL). No matter what the "correct" result should be, surely that's not it.
– Jeroen Mostert
4 hours ago




1




1




Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
– George Menoutis
4 hours ago





Not sure this is appropriate, but....LOL! Since this is not a lack of knowledge on your part, rather than a real thing that many would come across, I strongly recommend changing the question title to be more accurate than a generic "not working as intended", in order to maximize benefit for all.
– George Menoutis
4 hours ago













In the resulting execution plan, the second STRING_AGG is entirely missing and instead the expression is rebound to the first one, as if the CASE had said STRING_AGG([Language], ' and ') twice. Any subsequent CASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.
– Jeroen Mostert
4 hours ago




In the resulting execution plan, the second STRING_AGG is entirely missing and instead the expression is rebound to the first one, as if the CASE had said STRING_AGG([Language], ' and ') twice. Any subsequent CASEs are absorbed as well. Looks like something very weird is going on with subexpression elimination.
– Jeroen Mostert
4 hours ago




1




1




This bug seems to be specifically tuned to STRING_AGG. If the ELSE is changed to 'blargh' + STRING_AGG(...), you'll get 'blarghFrench and German..., so it improperly unifies the second STRING_AGG with the first. The simplest workaround is to change the ELSE expression to STRING_AGG([Language] + '', ', ') -- this defeats CSE, suggesting there's a bug where CSE ignores the second argument to STRING_AGG.
– Jeroen Mostert
4 hours ago




This bug seems to be specifically tuned to STRING_AGG. If the ELSE is changed to 'blargh' + STRING_AGG(...), you'll get 'blarghFrench and German..., so it improperly unifies the second STRING_AGG with the first. The simplest workaround is to change the ELSE expression to STRING_AGG([Language] + '', ', ') -- this defeats CSE, suggesting there's a bug where CSE ignores the second argument to STRING_AGG.
– Jeroen Mostert
4 hours ago












2 Answers
2






active

oldest

votes

















up vote
10
down vote













Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.



One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:



SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y





share|improve this answer





























    up vote
    0
    down vote













    Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:



    WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
    VALUES
    ('Spain', 'English'),
    ('Spain', 'Spanish'),
    ('Sweden', 'English'),
    ('Switzerland', 'English'),
    ('Switzerland', 'French'),
    ('Switzerland', 'German'),
    ('Switzerland', 'Italian')
    ) x (Country, Language)
    ), results AS (
    SELECT
    Country,
    COUNT(Language) AS LanguageCount,
    STRING_AGG(Language, ', ') AS Languages
    FROM cteCountryLanguageMapping
    GROUP BY Country
    )
    SELECT Country, LanguageCount, CASE LanguageCount
    WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
    ELSE Languages
    END AS Languages_Fixed
    FROM results


    Result:



    | Country | LanguageCount | Languages_Fixed |
    |-------------|---------------|----------------------------------|
    | Spain | 2 | Spanish and English |
    | Sweden | 1 | English |
    | Switzerland | 4 | French, German, Italian, English |


    DB Fiddle






    share|improve this answer




















      Your Answer





      StackExchange.ifUsing("editor", function ()
      StackExchange.using("externalEditor", function ()
      StackExchange.using("snippets", function ()
      StackExchange.snippets.init();
      );
      );
      , "code-snippets");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "1"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52533487%2fstring-agg-not-behaving-as-expected%23new-answer', 'question_page');

      );

      Post as a guest






























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      10
      down vote













      Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.



      One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:



      SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
      FROM (
      VALUES
      (1, 'a'),
      (1, 'b')
      ) x (y, z)
      GROUP by y





      share|improve this answer


























        up vote
        10
        down vote













        Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.



        One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:



        SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
        FROM (
        VALUES
        (1, 'a'),
        (1, 'b')
        ) x (y, z)
        GROUP by y





        share|improve this answer
























          up vote
          10
          down vote










          up vote
          10
          down vote









          Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.



          One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:



          SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
          FROM (
          VALUES
          (1, 'a'),
          (1, 'b')
          ) x (y, z)
          GROUP by y





          share|improve this answer














          Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.



          One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:



          SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
          FROM (
          VALUES
          (1, 'a'),
          (1, 'b')
          ) x (y, z)
          GROUP by y






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 4 hours ago









          Dan Guzman

          21.7k31538




          21.7k31538










          answered 4 hours ago









          Jeroen Mostert

          16.3k1949




          16.3k1949






















              up vote
              0
              down vote













              Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:



              WITH cteCountryLanguageMapping AS (
              SELECT * FROM (
              VALUES
              ('Spain', 'English'),
              ('Spain', 'Spanish'),
              ('Sweden', 'English'),
              ('Switzerland', 'English'),
              ('Switzerland', 'French'),
              ('Switzerland', 'German'),
              ('Switzerland', 'Italian')
              ) x (Country, Language)
              ), results AS (
              SELECT
              Country,
              COUNT(Language) AS LanguageCount,
              STRING_AGG(Language, ', ') AS Languages
              FROM cteCountryLanguageMapping
              GROUP BY Country
              )
              SELECT Country, LanguageCount, CASE LanguageCount
              WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
              ELSE Languages
              END AS Languages_Fixed
              FROM results


              Result:



              | Country | LanguageCount | Languages_Fixed |
              |-------------|---------------|----------------------------------|
              | Spain | 2 | Spanish and English |
              | Sweden | 1 | English |
              | Switzerland | 4 | French, German, Italian, English |


              DB Fiddle






              share|improve this answer
























                up vote
                0
                down vote













                Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:



                WITH cteCountryLanguageMapping AS (
                SELECT * FROM (
                VALUES
                ('Spain', 'English'),
                ('Spain', 'Spanish'),
                ('Sweden', 'English'),
                ('Switzerland', 'English'),
                ('Switzerland', 'French'),
                ('Switzerland', 'German'),
                ('Switzerland', 'Italian')
                ) x (Country, Language)
                ), results AS (
                SELECT
                Country,
                COUNT(Language) AS LanguageCount,
                STRING_AGG(Language, ', ') AS Languages
                FROM cteCountryLanguageMapping
                GROUP BY Country
                )
                SELECT Country, LanguageCount, CASE LanguageCount
                WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
                ELSE Languages
                END AS Languages_Fixed
                FROM results


                Result:



                | Country | LanguageCount | Languages_Fixed |
                |-------------|---------------|----------------------------------|
                | Spain | 2 | Spanish and English |
                | Sweden | 1 | English |
                | Switzerland | 4 | French, German, Italian, English |


                DB Fiddle






                share|improve this answer






















                  up vote
                  0
                  down vote










                  up vote
                  0
                  down vote









                  Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:



                  WITH cteCountryLanguageMapping AS (
                  SELECT * FROM (
                  VALUES
                  ('Spain', 'English'),
                  ('Spain', 'Spanish'),
                  ('Sweden', 'English'),
                  ('Switzerland', 'English'),
                  ('Switzerland', 'French'),
                  ('Switzerland', 'German'),
                  ('Switzerland', 'Italian')
                  ) x (Country, Language)
                  ), results AS (
                  SELECT
                  Country,
                  COUNT(Language) AS LanguageCount,
                  STRING_AGG(Language, ', ') AS Languages
                  FROM cteCountryLanguageMapping
                  GROUP BY Country
                  )
                  SELECT Country, LanguageCount, CASE LanguageCount
                  WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
                  ELSE Languages
                  END AS Languages_Fixed
                  FROM results


                  Result:



                  | Country | LanguageCount | Languages_Fixed |
                  |-------------|---------------|----------------------------------|
                  | Spain | 2 | Spanish and English |
                  | Sweden | 1 | English |
                  | Switzerland | 4 | French, German, Italian, English |


                  DB Fiddle






                  share|improve this answer












                  Don't repeat yourself. You are repeating yourself by using MAX(...), LIST_AGG(...', ') and LIST_AGG(...' and '). You could simply rewrite your query like this and might end up with a better plan:



                  WITH cteCountryLanguageMapping AS (
                  SELECT * FROM (
                  VALUES
                  ('Spain', 'English'),
                  ('Spain', 'Spanish'),
                  ('Sweden', 'English'),
                  ('Switzerland', 'English'),
                  ('Switzerland', 'French'),
                  ('Switzerland', 'German'),
                  ('Switzerland', 'Italian')
                  ) x (Country, Language)
                  ), results AS (
                  SELECT
                  Country,
                  COUNT(Language) AS LanguageCount,
                  STRING_AGG(Language, ', ') AS Languages
                  FROM cteCountryLanguageMapping
                  GROUP BY Country
                  )
                  SELECT Country, LanguageCount, CASE LanguageCount
                  WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
                  ELSE Languages
                  END AS Languages_Fixed
                  FROM results


                  Result:



                  | Country | LanguageCount | Languages_Fixed |
                  |-------------|---------------|----------------------------------|
                  | Spain | 2 | Spanish and English |
                  | Sweden | 1 | English |
                  | Switzerland | 4 | French, German, Italian, English |


                  DB Fiddle







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 15 mins ago









                  Salman A

                  165k63325410




                  165k63325410



























                       

                      draft saved


                      draft discarded















































                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function ()
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52533487%2fstring-agg-not-behaving-as-expected%23new-answer', 'question_page');

                      );

                      Post as a guest













































































                      Comments

                      Popular posts from this blog

                      White Anglo-Saxon Protestant

                      BuddyTV

                      Conflict (narrative)