Pandas: drop_duplicates() based on a condition in Python
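I have the data set below:
data_input:
      a     b
1  c13d  c07h
2  c07h  c13d
3  b42c  b65h
4  b65h  b42c
5  a45b  a47c
That is, row 1 and row 2 of data_input hold the same pair of values in swapped order (as do rows 3 and 4). I want to keep only one row from each such pair, so row 2 should be dropped.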
I want the output below:
data_output:
      a     b
1  c13d  c07h
2  b42c  b65h
3  a45b  a47c
You can create a third column 'c' from 'a' and 'b', and use it to find the duplicates, like this:
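# Concatenate the two columns into a helper column.
df['c'] = df['a'] + df['b']
# Sort the characters so that swapped pairs produce the same key.
df['c'] = df['c'].apply(lambda x: ''.join(sorted(x)))
# Keep the first row per key, then drop the helper column.
df = df.drop_duplicates(subset='c')[['a', 'b']]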
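One caveat with the snippet above: because it sorts the characters of the concatenated string, two different pairs can end up with the same key (for example 'ab', 'cd' and 'ac', 'bd' both become 'abcd'). Below is a minimal sketch of a variant that sorts the two values themselves instead. It assumes the columns are named 'a' and 'b' and that the index runs from 1 to 5 as in the question; the names key and result are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (index 1..5 is an assumption).
df = pd.DataFrame(
    {
        "a": ["c13d", "c07h", "b42c", "b65h", "a45b"],
        "b": ["c07h", "c13d", "b65h", "b42c", "a47c"],
    },
    index=[1, 2, 3, 4, 5],
)

# Order-independent key per row: the two values as a sorted tuple,
# so ('c13d', 'c07h') and ('c07h', 'c13d') map to the same key.
key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(df["a"], df["b"])],
    index=df.index,
)

# Keep the first row for each key, then renumber the rows from 1.
result = df[~key.duplicated()].reset_index(drop=True)
result.index = result.index + 1

print(result)
```

For the sample data this keeps the original rows 1, 3 and 5 and relabels them 1 to 3, which matches data_output. If the original row labels should be kept, the two renumbering lines can be left out.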