Create a new column only if values differ
My dataframe looks like this:
pd.DataFrame([["t1", "d2", "e3", "r4"],
              ["t1", "d2", "e2", "r4"],
              ["t1", "d2", "e1", "r4"]], columns=["a", "b", "c", "d"])
and I want:
pd.DataFrame([["t1", "d2", "e3", "r4", "e1", "e2"]],
             columns=["a", "b", "c", "d", "c1", "c2"])
i.e. only one column has values that differ, and I want to create a new dataframe with extra columns added as new values are observed. Is there an easy way to do this?
python pandas dataframe
asked 47 mins ago by FFL75
4 Answers
Edit: To generalize for any single non-unique column:
Ucols = df.columns[df.nunique() == 1].tolist()
df_out = df.set_index(Ucols).set_index(df.groupby(Ucols).cumcount(), append=True).unstack()
df_out.columns = [f'{i}{j}' if j != 0 else f'{i}' for i, j in df_out.columns]
print(df_out.reset_index())
Output:
    a   b   d   c  c1  c2
0  t1  d2  r4  e3  e2  e1
Original Answer
Use:
df_out = df.set_index(['a', 'b', 'd', df.groupby(['a', 'b', 'd']).cumcount()]).unstack()
df_out.columns = [f'{i}{j}' if j != 0 else f'{i}' for i, j in df_out.columns]
df_out.reset_index()
Output:
    a   b   d   c  c1  c2
0  t1  d2  r4  e3  e2  e1
edited 28 mins ago, answered 39 mins ago by Scott Boston
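For reference, here is the generalized version as a self-contained sketch on the question's sample frame: `Ucols` collects every column holding a single unique value, and `cumcount()` numbers the rows within each group so `unstack()` can pivot the differing column.

```python
import pandas as pd

df = pd.DataFrame([["t1", "d2", "e3", "r4"],
                   ["t1", "d2", "e2", "r4"],
                   ["t1", "d2", "e1", "r4"]], columns=["a", "b", "c", "d"])

# Columns whose values never differ become the (repeated) index;
# the cumulative count disambiguates the rows before unstacking.
Ucols = df.columns[df.nunique() == 1].tolist()          # ['a', 'b', 'd']
df_out = df.set_index(Ucols).set_index(df.groupby(Ucols).cumcount(),
                                       append=True).unstack()
# Flatten the ('c', 0), ('c', 1), ... MultiIndex into c, c1, c2.
df_out.columns = [f'{i}{j}' if j != 0 else f'{i}' for i, j in df_out.columns]
result = df_out.reset_index()
print(result)  # one row: a, b, d, c, c1, c2 -> t1, d2, r4, e3, e2, e1
```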
2
It looks to me that this doesn't generalize. I mean, you already know that column c has more values.
– user32185
31 mins ago
1
Updated solution to handle not knowing 'c' has more than one value. Thanks.
– Scott Boston
20 mins ago
You can use a dictionary comprehension. For consistency, I've included integer labeling on all columns.
res = pd.DataFrame({f'{col}{idx}': val for col in df for idx, val in
                    enumerate(df[col].unique(), 1)}, index=[0])
print(res)
   a1  b1  c1  c2  c3  d1
0  t1  d2  e3  e2  e1  r4
An alternative to df[col].unique() is df[col].drop_duplicates(), though the latter may incur an overhead from iterating a pd.Series object versus an np.ndarray.
edited 24 mins ago, answered 31 mins ago by jpp
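As a self-contained sketch on the sample frame, the comprehension builds one `{label: value}` pair per distinct value per column; `enumerate(..., 1)` starts labeling at 1, so every column gets at least a "1" suffix:

```python
import pandas as pd

df = pd.DataFrame([["t1", "d2", "e3", "r4"],
                   ["t1", "d2", "e2", "r4"],
                   ["t1", "d2", "e1", "r4"]], columns=["a", "b", "c", "d"])

# One key per distinct value per column: a1, b1, c1, c2, c3, d1.
res = pd.DataFrame({f'{col}{idx}': val
                    for col in df
                    for idx, val in enumerate(df[col].unique(), 1)},
                   index=[0])
print(res)  # t1, d2, e3, e2, e1, r4
```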
Here you can use unique instead of drop_duplicates()
– user32185
25 mins ago
1
@user32185, Good point, have included that alternative. I believe it should be more efficient as unique returns an array, i.e. no pd.Series overhead.
– jpp
23 mins ago
I'm curious to know if it's possible to use a comprehension with a for inside the else in order to obtain the OP's requested output. See my answer as reference.
– user32185
8 mins ago
@user32185, You can probably do something like d = {1: ''}; f'{col}{d.get(idx, idx)}', since you can use dict.get within an f-string. So if it's a 1 it'll just be an empty string, otherwise return the integer.
– jpp
2 mins ago
Not as beautiful as Scott's answer, but the logic you are looking for is:
out = pd.DataFrame()
for col in df.columns:
    values = df[col].unique()
    if len(values) == 1:
        out[col] = values
    else:
        for i, value in enumerate(values):
            out[col + str(i + 1)] = value
answered 36 mins ago by user32185
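Running the loop on the sample frame shows one detail worth noting: every value of a differing column gets a numbered label (c1, c2, c3), rather than keeping the first one under the original name c as in the unstack-based answer.

```python
import pandas as pd

df = pd.DataFrame([["t1", "d2", "e3", "r4"],
                   ["t1", "d2", "e2", "r4"],
                   ["t1", "d2", "e1", "r4"]], columns=["a", "b", "c", "d"])

out = pd.DataFrame()
for col in df.columns:
    values = df[col].unique()
    if len(values) == 1:
        out[col] = values                   # constant column: keep its name
    else:
        for i, value in enumerate(values):
            out[col + str(i + 1)] = value   # numbered column per distinct value

print(out)  # columns a, b, c1, c2, c3, d -> t1, d2, e3, e2, e1, r4
```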
Using drop_duplicates:
s = df.reset_index().melt('index').drop_duplicates(['variable', 'value'], keep='first')
pd.DataFrame([s.value.values.tolist()], columns=s['variable'] + s['index'].astype(str))
Out[1151]:
   a0  b0  c0  c1  c2  d0
0  t1  d2  e3  e2  e1  r4
answered 4 mins ago by Wen
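A runnable version of the same idea, as a sketch on the sample frame: `melt` stacks every column into (index, variable, value) rows, `drop_duplicates` keeps the first occurrence of each (variable, value) pair, and the surviving row index becomes the numeric suffix on each column name.

```python
import pandas as pd

df = pd.DataFrame([["t1", "d2", "e3", "r4"],
                   ["t1", "d2", "e2", "r4"],
                   ["t1", "d2", "e1", "r4"]], columns=["a", "b", "c", "d"])

# Long form: one row per (original row index, column, value),
# then drop repeated values within each column.
s = (df.reset_index()
       .melt('index')
       .drop_duplicates(['variable', 'value'], keep='first'))

# Rebuild a single row; the kept row index labels each column (a0, c1, ...).
flat = pd.DataFrame([s.value.values.tolist()],
                    columns=s['variable'] + s['index'].astype(str))
print(flat)  # t1, d2, e3, e2, e1, r4
```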