How to detect this unprintable character using regex or other ways in Ruby?
I came upon a strange character (using Nokogiri).
    irb(main):081:0> sss.dump
    => "\"\\u{a0}\""
    irb(main):082:0> puts sss
    => nil
    irb(main):083:0> sss
    => " "
    irb(main):084:0> sss =~ /\s/
    => nil
    irb(main):085:0> sss =~ /[[:print:]]/
    => 0
    irb(main):087:0> sss == ' '
    => false
    irb(main):088:0> sss.length
    => 1

Any idea what this strange character is?
When it's displayed in a web page it looks like white space, but it doesn't match whitespace (\s) in a regular expression. Ruby even thinks it's a printable character!
How can I detect characters like this, and either exclude them or flag them as whitespace (if possible)?
Thanks.
It's a non-breaking space. In HTML it's used pretty often, and written as &nbsp;. One way to find out the identity of a character like "\u{a0}" is to search the web for U+00A0 (using four or more hexadecimal digits), because that's how the Unicode specification notates Unicode code points.
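As a rough sketch of how to get that searchable notation directly from Ruby, the snippet below formats each code point of a string as U+XXXX. The variable names (`sss`, reused from the question, and `labels`) are only illustrative.

```ruby
# The character from the question: a single non-breaking space.
sss = "\u00A0"

# Format every code point in the string in U+XXXX notation,
# padded to at least four hexadecimal digits.
labels = sss.codepoints.map { |cp| format("U+%04X", cp) }

puts labels.join(" ")   # prints "U+00A0"
```

The printed value can be pasted into a search engine or a Unicode character lookup.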
The non-breaking space, among other things, is matched by the regex /[[:space:]]/.
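A minimal sketch of how this could be used to flag or normalize such characters, assuming a UTF-8 string. The helper names `odd_whitespace?` and `normalize_whitespace` are hypothetical, not part of Ruby or Nokogiri.

```ruby
nbsp = "\u00A0"

# \s only covers ASCII whitespace in Ruby, so it misses U+00A0.
p nbsp =~ /\s/            # => nil
# The POSIX bracket class is Unicode-aware and does match it.
p nbsp =~ /[[:space:]]/   # => 0

# Hypothetical helper: true if the string contains whitespace
# that /\s/ would not catch (such as a non-breaking space).
def odd_whitespace?(str)
  str.each_char.any? { |c| c.match?(/[[:space:]]/) && !c.match?(/\s/) }
end

# Hypothetical helper: replace non-ASCII whitespace with a plain
# space, leaving ordinary spaces, tabs and newlines untouched.
def normalize_whitespace(str)
  str.gsub(/[[:space:]]/) { |c| c.match?(/\s/) ? c : " " }
end

p odd_whitespace?("a\u00A0b")        # => true
p normalize_whitespace("a\u00A0b")   # => "a b"
```

To simply exclude the characters instead of converting them, the block passed to `gsub` could return an empty string for the non-ASCII case.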