Read through small CSV file returns Regex too complicated

I have a method that reads through a CSV file and, if any of the columns match my key set, prepares them in an sObject. With my test files, which contained 1,000 records but only 4 columns, this worked fine. However, increasing the number of columns to 22 now returns "Regex too complicated", despite the file having only around 100 rows.

How can I get around this issue? I have googled and found articles like this, but they are missing key parts of the code, which makes them unusable.

asked 3 hours ago

Deployment Failure

1,24862553

apex batch csv

1 Answer

The "Regex too complicated" error occurs once a regular expression has accessed its input sequence 1,000,000 times, which can happen even on incredibly small strings. For example, according to the knowledge article:

Pattern pat = Pattern.compile('(A)?(B)?(C)?(D)?(E)?(F)?(G)?(H)?(I)?(J)?(K)?(L)?(M)?(N)?(O)?(P)?(Q)?(R)?(S)?(T)?(U)?(V)?(W)?(X)?(Y)?(Z)?(AA)?(AB)?(AC)?(AD)?(AE)?(AF)?(AG)?(AH)?(AI)?(AJ)?(AK)?(AL)?(AM)?(AN)?(AO)?(AP)?(AQ)?(AR)?(AS)?(AT)?(AU)?(AV)?(AW)?(AX)?(AY)?(AZ)?$');
Matcher mat = pat.matcher('asdfasdfasdfasdfasdfasdf');

This code would fail because of the complexity of the pattern. Realistically, any Apex code that tries to parse a CSV is doomed to fail if it uses regex matching on any reasonably sized file. Your best bet is to implement a finite state parser to parse your CSV file (note: link goes to a Java-based finite state CSV parser, not written by me).

Because of the overhead of Apex, I wouldn't necessarily try to directly duplicate the Java-based version above, but hopefully it will get you started in a direction that will allow you to process larger CSV files more efficiently.

Your processing input loop can make use of a switch to implement a state engine efficiently:

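public enum PARSER_STATE { BEGIN_FIELD, ERROR, FOUND_QUOTE, QUOTED_FIELD, UNQUOTED_FIELD }

Integer index = 0;
PARSER_STATE currentState = PARSER_STATE.BEGIN_FIELD;
// Walk the input one character at a time; the current state decides what each character means.
// Apex strings expose length(), not size().
while (index < input.length()) {
    String charAtX = input.substring(index, index + 1);
    switch on currentState {
        when BEGIN_FIELD {
            ...
        }
        when FOUND_QUOTE {
            ...
        }
        ...
    }
    // Advance to the next character so the loop terminates.
    index++;
}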
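
To make the state engine above a little more concrete, here is a minimal sketch of how the elided branches might be filled in. The class name CsvStateParser, the ParseState enum, and the lenient handling of malformed quotes are all assumptions of mine, not part of the answer; it assumes comma delimiters, "" as the escaped quote, and \n or \r\n line endings. It never touches Pattern, Matcher, or String.split, so the regex limit should not come into play.

```apex
public class CsvStateParser {
    // Hypothetical state names for this sketch; they loosely mirror the enum in the answer.
    private enum ParseState { BEGIN_FIELD, UNQUOTED_FIELD, QUOTED_FIELD, FOUND_QUOTE }

    // Splits CSV text into rows of fields, one character at a time.
    public static List<List<String>> parse(String input) {
        List<List<String>> rows = new List<List<String>>();
        List<String> row = new List<String>();
        String field = '';
        ParseState current = ParseState.BEGIN_FIELD;

        for (Integer index = 0; index < input.length(); index++) {
            String c = input.substring(index, index + 1);
            // Outside quotes, a carriage return is dropped so \r\n behaves like \n.
            if (c == '\r' && current != ParseState.QUOTED_FIELD) {
                continue;
            }
            Boolean delimiter = (c == ',' || c == '\n');

            switch on current {
                when QUOTED_FIELD {
                    // Inside quotes, commas and newlines are ordinary data.
                    if (c == '"') {
                        current = ParseState.FOUND_QUOTE;
                    } else {
                        field += c;
                    }
                }
                when FOUND_QUOTE {
                    if (c == '"') {
                        // A doubled quote inside a quoted field is one literal quote.
                        field += c;
                        current = ParseState.QUOTED_FIELD;
                    } else if (!delimiter) {
                        // Assumption: be lenient with text after a closing quote.
                        field += c;
                        current = ParseState.UNQUOTED_FIELD;
                    }
                }
                when else {
                    // BEGIN_FIELD or UNQUOTED_FIELD.
                    if (c == '"' && current == ParseState.BEGIN_FIELD) {
                        current = ParseState.QUOTED_FIELD;
                    } else if (!delimiter) {
                        field += c;
                        current = ParseState.UNQUOTED_FIELD;
                    }
                }
            }

            // Outside quotes, a comma closes the field and a newline also closes the row.
            if (delimiter && current != ParseState.QUOTED_FIELD) {
                row.add(field);
                field = '';
                current = ParseState.BEGIN_FIELD;
                if (c == '\n') {
                    rows.add(row);
                    row = new List<String>();
                }
            }
        }

        // Flush the last row when the input does not end with a newline.
        if (current != ParseState.BEGIN_FIELD || !row.isEmpty()) {
            row.add(field);
            rows.add(row);
        }
        return rows;
    }
}
```

For example, CsvStateParser.parse('Name,Note\n"Smith, J","said ""hi"""\n') should give two rows, with the second row holding Smith, J and said "hi". Building each field by per-character concatenation keeps the sketch readable but costs CPU time; tracking start and end offsets and calling substring once per field would probably be cheaper on larger files.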

edited 2 hours ago

answered 2 hours ago

sfdcfox

227k10175389

N.B. I can recall solving this using a custom iterator/iterable and a batch class - works if each row can be considered independent of each other
– cropredy
2 hours ago

@cropredy Yes, that's also viable. You just can't put a full csv through a single pattern.
– sfdcfox
1 hour ago
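
The iterator-plus-batch idea from the comments could look roughly like the sketch below. The names CsvRowIterator and CsvImportBatch are hypothetical, and the approach assumes no quoted field contains a line break, which is the "rows are independent" condition mentioned above. Lines are located with indexOf rather than String.split, since split takes a regular expression. In a real org each top-level class would live in its own file.

```apex
// Hypothetical helper: hands out one CSV line per call to next().
public class CsvRowIterator implements Iterator<String>, Iterable<String> {
    private String csv;
    private Integer position = 0;

    public CsvRowIterator(String csv) {
        this.csv = csv;
    }

    public Iterator<String> iterator() {
        return this;
    }

    public Boolean hasNext() {
        return position < csv.length();
    }

    public String next() {
        // Find the end of the current line; the last line may lack a newline.
        Integer lineEnd = csv.indexOf('\n', position);
        if (lineEnd == -1) {
            lineEnd = csv.length();
        }
        // Drop a trailing carriage return so \r\n files behave like \n files.
        String line = csv.substring(position, lineEnd).removeEnd('\r');
        position = lineEnd + 1;
        return line;
    }
}

// Hypothetical batch class: each execute() call receives a chunk of lines.
public class CsvImportBatch implements Database.Batchable<String> {
    private String csv;

    public CsvImportBatch(String csv) {
        this.csv = csv;
    }

    public Iterable<String> start(Database.BatchableContext context) {
        return new CsvRowIterator(csv);
    }

    public void execute(Database.BatchableContext context, List<String> lines) {
        for (String line : lines) {
            // Split the line into fields here with a non-regex field parser,
            // then map the fields onto sObjects and insert them.
        }
    }

    public void finish(Database.BatchableContext context) {
    }
}
```

It might be started with something like Database.executeBatch(new CsvImportBatch(csvText), 200), where csvText is the file body as a string. Note that the header line only arrives in the first chunk, so the column names would need to be captured some other way if later chunks depend on them.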
