Iyfjky

Question: Function to test for different conditions in input data

I have the following function that checks a data record to see if it has met certain criteria specified separately in configuration settings. The configurations are set up as a list of dictionaries with certain trigger keywords that are used by the condition check function ('in','not in', etc).

The condition checker has a lot of duplicated code, but it's just different enough that a better setup is not obvious to me. I would appreciate any feedback.

Here is a minimal working example to illustrate:

def check_if_condition_was_met( row, condition ):
    condition_met = True
    for key, val in condition.iteritems():
        if key == 'in':
            for condition_key, condition_val in val.iteritems():
                if row[condition_key] in condition_val:  # IN
                    continue
                else:
                    condition_met = False
                    break
        elif key == 'not in':
            for condition_key, condition_val in val.iteritems():
                if row[condition_key] not in condition_val:  # NOT IN
                    continue
                else:
                    condition_met = False
                    break
        elif key == 'max':
            for condition_key, condition_val in val.iteritems():
                if not row[condition_key]:
                    condition_met = False
                    break
                if int(row[condition_key]) <= int(condition_val):  # MAX (<=)
                    continue
                else:
                    condition_met = False
                    break
        elif key == 'min':
            for condition_key, condition_val in val.iteritems():
                if int(row[condition_key]) >= int(condition_val):  # MIN (>=)
                    continue
                else:
                    condition_met = False
                    break
    return condition_met


if __name__ == '__main__':
    # data
    test_data = [
         {'Flag1':'Y', 'Flag2':'Canada','Number':35}
        ,{'Flag1':'Y', 'Flag2':'United States','Number':35}
        ,{'Flag1':'N', 'Flag2':'United States','Number':35}
        ,{'Flag1':'N', 'Flag2':'England','Number':35}
        ,{'Flag1':'N', 'Flag2':'Canada','Number':35}
        ,{'Flag1':'N', 'Flag2':'Canada','Number':5}
    ]

    # configuration
    test_conditions = [
          { 'in':{'Flag1':['N'], 'Flag2':['United States']} }
        , { 'in':{'Flag1':['Y'],'Flag2':['Canada']}, 'max':{'Number':7} }
        , { 'in':{'Flag1':['Y'],'Flag2':['Canada']}, 'min':{'Number':7} }
        , { 'not in':{'Flag1':['Y']}, 'min':{'Number':7} }
    ]

    for condition_id, condition in enumerate(test_conditions):
        print
        print 'now testing for condition %i' % condition_id
        for data_id, data in enumerate(test_data):
            print '%s | %s' % ( data_id, check_if_condition_was_met(data,condition) )

It's still very vague - lots of my functions test for different conditions of their input data. Can you make it any more specific? I can see that it's difficult, because the function does seem to be a "big switch" (and obviously we want to improve on that). What's the code being used for? Is it part of a test suite? Or is it some utility you're making? — Aug 29 at 15:35

score 4 · Accepted Answer · 2018-08-29 18:44:14Z

Similar to @Peilonrayz, I would map each of the comparators to a function, but I'd go further.

Function design: check_if_condition_was_met(…) is an awkward name, partly because of the unusual use of past tense. But why make it so wordy? Couldn't you just call it verify(…)?

Furthermore, it would be customary to put the condition parameter first, and the row second. It would certainly read more naturally in English, especially after the function rename. Also, based on the observation that verify(condition) could be considered as a test, it's a general functional-programming principle that the condition parameter should be considered more tightly associated with the verification process and should therefore be put first.
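One practical payoff of putting the condition first is that it can be bound ahead of time with functools.partial, giving a one-argument predicate for filter() or a comprehension. The sketch below is only an illustration: it is written for Python 3 (items() rather than the question's iteritems()), the verify here is a simplified stand-in that handles only 'in' criteria, and the name is_unflagged_us is hypothetical.

```python
from functools import partial


def verify(condition, row):
    # Simplified stand-in (assumption): only the 'in' comparator is handled.
    return all(
        row.get(key) in allowed
        for key, allowed in condition.get('in', {}).items()
    )


rows = [
    {'Flag1': 'Y', 'Flag2': 'Canada'},
    {'Flag1': 'N', 'Flag2': 'United States'},
]

# Bind the condition once; the result takes just a row.
is_unflagged_us = partial(verify, {'in': {'Flag1': ['N'], 'Flag2': ['United States']}})

matching = list(filter(is_unflagged_us, rows))
print(matching)  # [{'Flag1': 'N', 'Flag2': 'United States'}]
```

With the row first, the same binding would need a lambda or a keyword argument instead.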

Use all(…): You want to express the idea that the function should return True if all of the conditions are met, and False if any condition fails. You can do that using all() with a generator expression. It's a lot less cumbersome than your condition_met, for, continue, and break. The entire function can be simplified down to a single expression!
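As a minimal sketch of that idea, here is just the 'in' branch collapsed into one all() call. It assumes Python 3 (so items() instead of iteritems()), and the helper name all_in is made up for the example.

```python
def all_in(criteria, row):
    """Return True if every row[key] is one of the values allowed for that key."""
    return all(row[key] in allowed for key, allowed in criteria.items())


row = {'Flag1': 'N', 'Flag2': 'United States', 'Number': 35}

print(all_in({'Flag1': ['N'], 'Flag2': ['United States']}, row))  # True
print(all_in({'Flag1': ['Y'], 'Flag2': ['United States']}, row))  # False
```

all() stops at the first falsy item, so the short-circuit behaviour of the original break is preserved.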

Naming: I think that "key" and "val" are not quite descriptive enough. I suggest the following renamings:

key → comparator
val → criteria
condition_key → key
condition_val → desired_val

I'm also skeptical about some of the behaviour when given anomalous input:

Why are there int(…) casts with 'min' and 'max'? Is it for parsing strings as numbers? None of your example cases needs such parsing, though. Is it for truncating floats towards zero? Probably not, but it might have that unintended effect, if the thresholds or data are already numeric.
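To make the truncation concern concrete, here is a small demonstration. The values 7.9 and 7 are invented for the example and do not come from the question's data.

```python
value, threshold = 7.9, 7

plain = value <= threshold              # compares the numbers as given
casted = int(value) <= int(threshold)   # int(7.9) truncates to 7 first

print(plain)       # False: 7.9 exceeds the maximum
print(casted)      # True: the cast hides the excess
print(int(-7.9))   # -7: truncation is towards zero, not downwards
print(int('35'))   # 35: the cast does parse numeric strings
```

So a 'max' check with the casts would accept 7.9 against a threshold of 7.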

What happens if the row is missing a key that is specified in the condition? Maybe it's not a concern to you, but it might be more appropriate to have the function return False rather than raise a KeyError, as your code does?
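A short sketch of the difference, assuming a row that lacks the 'Number' key. The helper at_least is hypothetical; it mirrors the "is not None" guard one would want for 'min'.

```python
row = {'Flag1': 'N'}  # no 'Number' key

try:
    row['Number']
    raised = False
except KeyError:
    raised = True  # subscripting a missing key raises

print(raised)             # True
print(row.get('Number'))  # None: get() returns a default instead of raising


def at_least(value, minimum):
    # Treat a missing (None) value as failing the check.
    return value is not None and value >= minimum


print(at_least(row.get('Number'), 7))  # False
```

Whether a missing key should fail the check or raise is a design decision; get() simply makes the "fail quietly" option easy.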

Consider writing doctests to explain what the function does. This is a situation where examples are more expressive than words.
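For anyone who has not used doctests before, a minimal sketch of the mechanics follows. The function at_most is an invented example, not part of the question; doctest.testmod() is the standard-library call that runs every example found in the module's docstrings.

```python
import doctest


def at_most(value, maximum):
    """
    Return True if value is present and does not exceed maximum.

    >>> at_most(5, 7)
    True
    >>> at_most(35, 7)
    False
    >>> at_most(None, 7)
    False
    """
    return value is not None and value <= maximum


if __name__ == '__main__':
    # Runs the docstring examples above and reports any mismatches.
    print(doctest.testmod())
```

The same checks can be run without the __main__ block via python -m doctest on the file.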

Suggested solution

COMPARATORS = {
    'in': lambda v, lst: v in lst,
    'not in': lambda v, lst: v not in lst,
    'min': lambda v, n: (v is not None) and (v >= n),
    'max': lambda v, n: (v is not None) and (v <= n),
}


def verify(condition, row):
    """
    Verify that the specified criteria are all true for the given row.

    >>> rows = [
    ...     {'Flag1':'Y', 'Flag2':'Canada', 'Number':35},
    ...     {'Flag1':'Y', 'Flag2':'United States', 'Number':35},
    ...     {'Flag1':'N', 'Flag2':'United States', 'Number':35},
    ...     {'Flag1':'N', 'Flag2':'England', 'Number':35},
    ...     {'Flag1':'N', 'Flag2':'Canada', 'Number':35},
    ...     {'Flag1':'N', 'Flag2':'Canada', 'Number':5},
    ... ]

    >>> [verify({'in': {'Flag1': ['N'], 'Flag2': ['United States']}}, r)
    ...  for r in rows]
    [False, False, True, False, False, False]

    >>> [verify({'not in': {'Flag1': ['Y']}, 'min': {'Number': 7}}, r)
    ...  for r in rows]
    [False, False, True, True, True, False]

    >>> [verify({'not in': {'Blah': ['whatever']}}, r) for r in rows]
    [True, True, True, True, True, True]
    """
    return all(
        all(
            COMPARATORS[comparator](row.get(key), desired_val)
            for key, desired_val in criteria.iteritems()
        )
        for comparator, criteria in condition.iteritems()
    )

Peilonrayz 24.7k336103 · Answer 2 · 2018-08-29 15:05:22Z

Use return to exit early. It's easier to read and understand than assigning to condition_met and using break.
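A minimal sketch of the early-return shape, reduced to a 'min'-style check. It assumes Python 3 (items()), and the function name meets_minimums is invented for the example.

```python
def meets_minimums(row, minimums):
    """Return True if every row[key] is at least the required minimum."""
    for key, minimum in minimums.items():
        if row[key] < minimum:
            return False  # the first failure decides the result
    return True  # nothing failed


print(meets_minimums({'Number': 35}, {'Number': 7}))  # True
print(meets_minimums({'Number': 5}, {'Number': 7}))   # False
```

There is no flag to initialise and no break whose scope the reader has to work out.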

I'm ignoring the extra code that the key == 'max' branch has.

Your code would be smaller and easier to read if you inverted all your ifs.

You need to check what the similarities between the ifs are:
```
if row[condition_key] in condition_val: # IN
if row[condition_key] not in condition_val: # NOT IN
if int(row[condition_key]) <= int(condition_val): # MAX (<=)
if int(row[condition_key]) >= int(condition_val): # MIN (>=)
```
From this we can see that you use one of four operators in each if: in, not in, <= and >=.
You also cast the input to int in two of your ifs. To handle this, we can pair each operator with a type to cast to.

Checking the operators that we need against the operator comparison table we can see we need:
- contains
- le
- ge
- And one for not contains.
And so we can use:
```
FUNCTIONS = {
    'in': (operator.contains, list),
    'not in': (lambda a, b: a not in b, list),
    'max': (operator.le, int),
    'min': (operator.ge, int),
}
```

import operator

FUNCTIONS = {
    'in': (operator.contains, list),
    'not in': (lambda a, b: a not in b, list),
    'max': (operator.le, int),
    'min': (operator.ge, int),
}


def check_if_condition_was_met(row, condition):
    for key, val in condition.iteritems():
        op, cast = FUNCTIONS[key]
        for condition_key, condition_val in val.iteritems():
            if not op(cast(row[condition_key]), cast(condition_val)):
                return False
    return True

Thanks for the really helpful feedback! I'll have to dig into your suggestions more carefully. But my first impression is that the output of your refactored version does not match my current setup? For example, test_data[2] with condition[0] was True, but is now False (I think True is the right answer). Anyway, the concept is clear, and I will study it more carefully. — Aug 29 at 15:26
The bug is that operator.contains(a, b) is equivalent to b in a; note the reversed operands. Confounding the discovery of the bug is the inappropriate use of the list() cast on both operands. — Aug 29 at 20:53
@200_success They're annoying, I for some reason didn't think of the list error too, but your answer cleans all of that down. — Aug 29 at 20:58
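The operand order discussed in these comments can be checked directly. This sketch uses only the standard operator module; the names value, allowed and is_in are invented for the demonstration.

```python
import operator

value = 'N'
allowed = ['N']

# operator.contains(a, b) means "b in a": the container comes first.
container_first = operator.contains(allowed, value)

# What the answer's table effectively computes after list() on both operands:
# is the list ['N'] an element of the list ['N']? It is not.
as_in_answer = operator.contains(list(value), list(allowed))

# A drop-in alternative with the operands in reading order.
is_in = lambda v, container: v in container

print(container_first)        # True
print(as_in_answer)           # False
print(is_in(value, allowed))  # True
```

That False is consistent with the mismatch reported for test_data[2] and condition[0].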

vash_the_stampede 2158 · Answer 3 · 2018-08-30 01:15:57Z

def check_if_condition_was_met( row, condition ):
    condition_met = True
    for key, val in condition.iteritems():
        if key == 'in':
            for condition_key, condition_val in val.iteritems():
                # Since we continue on 'in', only act on 'not in'
                if row[condition_key] not in condition_val:
                    condition_met = False
                    break
        elif key == 'not in':
            for condition_key, condition_val in val.iteritems():
                # Again, since we continue on 'not in', only act on 'in'
                if row[condition_key] in condition_val:
                    condition_met = False
                    break
        elif key == 'max':
            for condition_key, condition_val in val.iteritems():
                # Same here: inverting the test eliminates the continue statement
                if not row[condition_key]:
                    condition_met = False
                    break
                elif int(row[condition_key]) > int(condition_val):  # elif
                    condition_met = False
                    break
        elif key == 'min':
            for condition_key, condition_val in val.iteritems():
                # Again, only act on the '<' case that was reached by else
                if int(row[condition_key]) < int(condition_val):
                    condition_met = False
                    break
    return condition_met


if __name__ == '__main__':
    # data
    test_data = [
         {'Flag1':'Y', 'Flag2':'Canada','Number':35}
        ,{'Flag1':'Y', 'Flag2':'United States','Number':35}
        ,{'Flag1':'N', 'Flag2':'United States','Number':35}
        ,{'Flag1':'N', 'Flag2':'England','Number':35}
        ,{'Flag1':'N', 'Flag2':'Canada','Number':35}
        ,{'Flag1':'N', 'Flag2':'Canada','Number':5}
    ]

    # configuration
    test_conditions = [
          { 'in':{'Flag1':['N'], 'Flag2':['United States']} }
        , { 'in':{'Flag1':['Y'],'Flag2':['Canada']}, 'max':{'Number':7} }
        , { 'in':{'Flag1':['Y'],'Flag2':['Canada']}, 'min':{'Number':7} }
        , { 'not in':{'Flag1':['Y']}, 'min':{'Number':7} }
    ]

    for condition_id, condition in enumerate(test_conditions):
        # print? is this being used for a newline? if so remove and add \n to next print
        print('\nnow testing for condition %i' % condition_id) #newline ()'s for print
        for data_id, data in enumerate(test_data):
            print('%s | %s' % ( data_id, check_if_condition_was_met(data,condition) ))
            #wrap print with ()'s again

Here are some things I would look at in order to reach the values you are altering more efficiently.

@200_success could you please clarify? I may not be familiar with the proper format for indentation of comments. — Aug 30 at 17:17
