Pandas: drop_duplicates() based on a condition in Python


I have the data set below:

data_input:
      a     b
1  c13d  c07h
2  c07h  c13d
3  b42c  b65h
4  b65h  b42c
5  a45b  a47c

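Rows 1 and 2 in data_input hold the same pair of values, only with the columns swapped (the same is true of rows 3 and 4). I want to keep just one row of each such pair, so row 2 should be dropped.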

I want the output below:

data_output:
      a     b
1  c13d  c07h
2  b42c  b65h
3  a45b  a47c

You can create a third column 'c' from 'a' and 'b', and use it to find the duplicates, like this:

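    # Concatenate both columns into a helper column 'c'.
    df['c'] = df['a'] + df['b']
    # Sort the characters so that swapped pairs produce the same string.
    df['c'] = df['c'].apply(lambda x: ''.join(sorted(x)))
    # Drop rows whose key repeats, then keep only the original columns.
    df = df.drop_duplicates(subset='c')[['a', 'b']]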
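One caveat with the approach above: it sorts the individual characters of the concatenated string, so two unrelated pairs that happen to use the same letters (for example 'ab', 'cd' and 'ac', 'bd') would end up with the same key. A possible variant, shown here only as a sketch, sorts the two values of each row as whole strings and uses that pair as the key. The names key and data_output are chosen for illustration, and the DataFrame is rebuilt from the question's data on the assumption that its columns are 'a' and 'b' and its index runs from 1 to 5.

```python
import pandas as pd

# Rebuild data_input from the question (assumed: columns 'a' and 'b', index 1..5).
df = pd.DataFrame(
    {
        "a": ["c13d", "c07h", "b42c", "b65h", "a45b"],
        "b": ["c07h", "c13d", "b65h", "b42c", "a47c"],
    },
    index=range(1, 6),
)

# Build an order-independent key per row: the two values sorted as whole strings.
key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(df["a"], df["b"])],
    index=df.index,
)

# duplicated() flags every repeat after the first occurrence of a key,
# so negating it keeps the first row of each pair.
data_output = df[~key.duplicated()]
print(data_output)
```

The surviving rows keep their original index labels (1, 3 and 5). If consecutive labels are wanted, as in the desired output, the index can be reassigned afterwards, for example with data_output.index = range(1, len(data_output) + 1).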
