How to detect this unprintable character using regex or other ways in Ruby?


I came upon a strange character (using Nokogiri).

irb(main):081:0> sss.dump
=> "\"\\u{a0}\""
irb(main):082:0> puts sss
 
=> nil
irb(main):083:0> sss
=> " "
irb(main):084:0> sss =~ /\s/
=> nil
irb(main):085:0> sss =~ /[[:print:]]/
=> 0
irb(main):087:0> sss == ' '
=> false
irb(main):088:0> sss.length
=> 1

Any idea what this strange character is?

When it's displayed in a webpage, it looks like white space, but it doesn't match whitespace \s in a regular expression. Ruby even thinks it's a printable character!

How do I detect characters like this, and either exclude them or flag them as whitespace (if possible)?

Thanks

It's the non-breaking space. In HTML, it's used pretty often and is written &nbsp;. One way to find out the identity of a character like "\u{a0}" is to search the web for U+00A0 (using four or more hexadecimal digits), because that's how the Unicode specification notates Unicode code points.
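To get from a mystery string to something searchable, you can print its code points in U+XXXX form. This is just a minimal sketch; codepoint_labels is a made-up helper name, not part of Ruby or Nokogiri.

```ruby
# Hypothetical helper: label every character in a string with its
# Unicode code point, padded to at least four hex digits.
def codepoint_labels(str)
  str.each_codepoint.map { |cp| format("U+%04X", cp) }
end

sss = "\u{a0}"
puts codepoint_labels(sss).inspect
```

For the string in the question this should print ["U+00A0"], which is the term to search for.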

The non-breaking space, among other things, is included in the regex /[[:space:]]/.
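Here is a rough sketch of how you might flag or replace such characters, assuming the string is UTF-8 and Ruby 2.4 or later (for match?). Both helper names are invented for illustration.

```ruby
# Hypothetical helper: true if the string contains whitespace that
# [[:space:]] recognises but the ASCII-only \s does not, such as U+00A0.
def non_ascii_whitespace?(str)
  str.match?(/(?!\s)[[:space:]]/)
end

# Hypothetical helper: replace every character matched by [[:space:]]
# with a plain ASCII space. Note this also turns tabs and newlines
# into spaces.
def normalize_spaces(str)
  str.gsub(/[[:space:]]/, " ")
end

sss = "\u{a0}"
p sss =~ /\s/                  # nil, \s does not cover U+00A0
p sss =~ /[[:space:]]/         # 0, the POSIX bracket class does
p non_ascii_whitespace?(sss)   # true
p normalize_spaces(sss) == " " # true
```

After normalizing, the usual \s patterns and strip behave as you would expect on the result.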

