Read through small CSV file returns Regex too complicated

I have a method that reads through a CSV file and, if any of the columns match my key set, prepares them in an sObject. With my test files, which contained 1,000 records but only 4 columns, this worked fine. However, now that I have increased the number of columns to 22, it returns "Regex too complicated", even though the file has only around 100 rows.

How can I get around this issue? I have googled and found articles like this, but they are missing key parts of the code, which renders them unusable.
Tags: apex batch csv

asked 3 hours ago by Deployment Failure

          1 Answer
The "Regex too complicated" error occurs when the regex engine reaches 1,000,000 input sequences, which can happen even on incredibly small strings. For example, according to the knowledge article, this code fails:



    Pattern pat = Pattern.compile('(A)?(B)?(C)?(D)?(E)?(F)?(G)?(H)?(I)?(J)?(K)?(L)?(M)?(N)?(O)?(P)?(Q)?(R)?(S)?(T)?(U)?(V)?(W)?(X)?(Y)?(Z)?(AA)?(AB)?(AC)?(AD)?(AE)?(AF)?(AG)?(AH)?(AI)?(AJ)?(AK)?(AL)?(AM)?(AN)?(AO)?(AP)?(AQ)?(AR)?(AS)?(AT)?(AU)?(AV)?(AW)?(AX)?(AY)?(AZ)?$');
    Matcher mat = pat.matcher('asdfasdfasdfasdfasdfasdf');


This code fails because of the complexity of the pattern, not the length of the input. Realistically, any Apex code that runs regex matching over a whole CSV file of any reasonable size is doomed to hit this limit. That includes String.split(), whose argument is itself a regular expression. Your best bet is to implement a finite state parser for your CSV file (note: the link goes to a Java-based finite state CSV parser, not written by me).



Because of Apex's overhead, I wouldn't try to duplicate the Java-based version above line for line, but hopefully it will point you in a direction that lets you process larger CSV files more efficiently.



Your input-processing loop can use Apex's switch statement to implement the state machine efficiently:



    public enum PARSER_STATE { BEGIN_FIELD, ERROR, FOUND_QUOTE, QUOTED_FIELD, UNQUOTED_FIELD }

    Integer index = 0;
    PARSER_STATE currentState = PARSER_STATE.BEGIN_FIELD;
    // String.length() (not size()) returns the number of characters
    while (index < input.length()) {
        String charAtX = input.substring(index, index + 1);
        switch on currentState {
            when BEGIN_FIELD {
                // ...
            }
            when FOUND_QUOTE {
                // ...
            }
            // ... remaining states
        }
        index++;
    }
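Below is a minimal sketch of how that skeleton might be filled in, assuming RFC 4180-style input: comma delimiters, double-quote quoting, "" as an escaped quote inside a quoted field, and LF or CRLF line endings. SimpleCsvParser, parse and CsvParseException are hypothetical names, not an existing library API, and the ERROR state is represented here by throwing an exception. It never uses a Pattern or String.split(), so no regex limit applies, although walking the input one character at a time still costs CPU time on large files.

```apex
// Hypothetical class: a regex-free, character-by-character CSV parser sketch.
// Assumes comma delimiters, double-quote quoting and "" as an escaped quote.
public class SimpleCsvParser {
    public class CsvParseException extends Exception {}

    private enum ParserState { BEGIN_FIELD, QUOTED_FIELD, FOUND_QUOTE, UNQUOTED_FIELD }

    public static List<List<String>> parse(String input) {
        List<List<String>> rows = new List<List<String>>();
        List<String> row = new List<String>();
        String field = '';
        ParserState currentState = ParserState.BEGIN_FIELD;
        Integer len = (input == null) ? 0 : input.length();

        for (Integer index = 0; index < len; index++) {
            String charAtX = input.substring(index, index + 1);
            switch on currentState {
                when BEGIN_FIELD, UNQUOTED_FIELD {
                    if (charAtX == '"' && currentState == ParserState.BEGIN_FIELD) {
                        currentState = ParserState.QUOTED_FIELD;   // opening quote
                    } else if (charAtX == ',') {
                        row.add(field);                            // end of field
                        field = '';
                        currentState = ParserState.BEGIN_FIELD;
                    } else if (charAtX == '\n') {
                        row.add(field);                            // end of row
                        rows.add(row);
                        row = new List<String>();
                        field = '';
                        currentState = ParserState.BEGIN_FIELD;
                    } else if (charAtX != '\r') {                  // skip the CR of CRLF
                        field += charAtX;
                        currentState = ParserState.UNQUOTED_FIELD;
                    }
                }
                when QUOTED_FIELD {
                    if (charAtX == '"') {
                        currentState = ParserState.FOUND_QUOTE;    // closing or escaped quote
                    } else {
                        field += charAtX;                          // commas/newlines are data here
                    }
                }
                when FOUND_QUOTE {
                    if (charAtX == '"') {
                        field += '"';                              // "" means a literal quote
                        currentState = ParserState.QUOTED_FIELD;
                    } else if (charAtX == ',') {
                        row.add(field);
                        field = '';
                        currentState = ParserState.BEGIN_FIELD;
                    } else if (charAtX == '\n') {
                        row.add(field);
                        rows.add(row);
                        row = new List<String>();
                        field = '';
                        currentState = ParserState.BEGIN_FIELD;
                    } else if (charAtX != '\r') {
                        // Stands in for the ERROR state: text after a closing quote.
                        throw new CsvParseException('Unexpected character after closing quote at index ' + index);
                    }
                }
            }
        }

        if (currentState == ParserState.QUOTED_FIELD) {
            throw new CsvParseException('Unterminated quoted field');
        }
        // Flush the last row when the input does not end with a newline.
        if (!row.isEmpty() || currentState != ParserState.BEGIN_FIELD) {
            row.add(field);
            rows.add(row);
        }
        return rows;
    }
}
```

In the question's scenario, the first returned row would be the header, which could be compared against the key set to decide which column indexes to copy onto the sObject.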
edited 2 hours ago

answered 2 hours ago by sfdcfox

• N.B. I recall solving this using a custom iterator/iterable and a batch class. This works if each row can be considered independent of the others.
  – cropredy
  2 hours ago

• @cropredy Yes, that's also viable. You just can't put a full CSV through a single pattern.
  – sfdcfox
  1 hour ago
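A rough sketch of the iterator/batch approach from the first comment, assuming each record sits on one physical line (no line breaks inside quoted fields), which is the "each row is independent" condition. CsvImportBatch, LineIterator and CsvIteratorException are hypothetical names. The iterator finds line breaks with indexOf instead of a regex, and the platform passes execute() a small chunk of lines at a time, so each line can then be parsed on its own, for example with a state-machine parser as described in the answer.

```apex
// Hypothetical batch class: splits a CSV body into lines without regex and
// processes them in chunks. Assumes no line breaks inside quoted fields.
public class CsvImportBatch implements Database.Batchable<String> {
    public class CsvIteratorException extends Exception {}

    // Walks the content one line at a time using indexOf.
    public class LineIterator implements Iterable<String>, Iterator<String> {
        private final String content;
        private Integer position = 0;

        public LineIterator(String content) {
            this.content = (content == null) ? '' : content;
        }

        public Iterator<String> iterator() {
            return this;
        }

        public Boolean hasNext() {
            return position < content.length();
        }

        public String next() {
            if (!hasNext()) {
                throw new CsvIteratorException('No more lines');
            }
            Integer lineEnd = content.indexOf('\n', position);
            if (lineEnd == -1) {
                lineEnd = content.length();          // last line has no terminator
            }
            String line = content.substring(position, lineEnd);
            position = lineEnd + 1;
            return line.removeEnd('\r');             // tolerate CRLF line endings
        }
    }

    private String csvBody;

    public CsvImportBatch(String csvBody) {
        this.csvBody = csvBody;
    }

    public Iterable<String> start(Database.BatchableContext bc) {
        return new LineIterator(csvBody);
    }

    public void execute(Database.BatchableContext bc, List<String> lines) {
        for (String line : lines) {
            // Placeholder: parse this single line and map it onto an sObject here.
            System.debug(line);
        }
    }

    public void finish(Database.BatchableContext bc) {}
}
```

It could be started with Database.executeBatch(new CsvImportBatch(csvBody), 50), where csvBody is the file contents as a String (how you obtain it is up to you). The header row is the first line the iterator returns, so it would need to be skipped or handled separately.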