python - Transliterator fails at certain digraphs

Here is the code:

    def digraph(chars):
        als = "шжяеёющчШЖЯЕЁЮЩЧ"
        # lowercase, ALL-CAPS and Title-case keys for each digraph
        new = {'sh':als[0],'zh':als[1],'ja':als[2],'je':als[3],'jo':als[4],
               'ju':als[5],'sx':als[6],'ch':als[7],'SH':als[8],'ZH':als[9],
               'JA':als[10],'JE':als[11],'JO':als[12],'JU':als[13],'SX':als[14],
               'CH':als[15],'Sh':als[8],'Zh':als[9],'Ja':als[10],'Je':als[11],
               'Jo':als[12],'Ju':als[13],'Sx':als[14],'Ch':als[15]}
        try:
            return new[chars]
        except:
            return "[error]"

    def trans_cyr(inp):
        cyrillic = "абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
        latin = "abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
        digs = ['sh','zh','ja','je','jo','ju','sx','ch','SH','ZH',
                'JA','JE','JO','JU','SX','CH','Sh','Zh','Ja','Je','Jo','Ju','Sx',
                'Ch']
        prevc = ""
        for e, char in enumerate(inp):
            if(prevc != ""):
                comb = prevc + char
                newdig = digraph(comb)
                if(comb in digs):
                    print(newdig, end="")
                    prevc = ""
                else:
                    # this branch prints BOTH letters and clears the buffer, so a
                    # second letter like the 'j' in 'sjo' never gets to start 'jo'
                    pos = latin.index(char)
                    posp = latin.index(inp[e - 1])
                    if(inp[e-1] in "szjcSZJC"):
                        print(cyrillic[posp] + cyrillic[pos], end="")
                        prevc = ""
                    else:
                        prevc=""
                        continue
            elif(char not in "szjcSZJC"):
                try:
                    pos = latin.index(char)
                    print(cyrillic[pos], end="")
                except:
                    print(char, end="")
            else:
                prevc = char

    while True:
        cyrinp = input("\n> ")
        trans_cyr(cyrinp)

The code is supposed to transliterate the Latin alphabet into Cyrillic: it takes each character of the input (if it is not one of 'szjc' or their uppercase equivalents), gets its position in the Latin string using the index() function, and then prints the Cyrillic letter at the same position. However, Cyrillic has letters such as Я, Е, Ё, Ю, Ж, Ш, Щ and Ч that are written as digraphs (ya, ye, yo, yu, zh, sh, shch (sx), ch), and therefore cannot be transliterated from a single character. To handle them, I check whether the current letter is one of 'szjcSZJC'; if it is, instead of printing it, I store it in prevc and check whether the next character combined with prevc is in the list digs. This works perfectly for complete digraphs: if I type 'jojajo', it prints "ёяё" as it should. But if there is an unfinished digraph (c without h, s without h or x, z without h, j without a, e, u or o), the digraph that follows it is not transliterated. For example, if I enter sjo, the expected output is сё, but instead I get сйо. Is there a way I can fix this?
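A related problem in the question's code: the `latin.index(char)` call in the non-digraph branch is not wrapped in `try`, so a pair such as `s1` raises `ValueError` because `1` is not in `latin`. Below is a minimal sketch of the parallel-string lookup that falls back to the original character instead. It uses `str.find`, which returns -1 rather than raising. The helper name `translate_letter` is hypothetical.

```python
# Parallel alphabets from the question: the letter at position i in LATIN
# transliterates to the letter at position i in CYRILLIC.
LATIN = "abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
CYRILLIC = "абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "


def translate_letter(ch):
    """Translate one Latin character; return it unchanged if it has no mapping."""
    pos = LATIN.find(ch)  # -1 when ch is not in LATIN (index() would raise)
    # guard against -1: CYRILLIC[-1] would silently return the trailing space
    return CYRILLIC[pos] if pos != -1 else ch


print("".join(translate_letter(c) for c in "privet 1!"))  # привэт 1!
```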

Edit:

I wrote this code:

    while True:
        cyrillic = "абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
        latin = "abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
        als = "шжяеёющчШЖЯЕЁЮЩЧ"
        new = {'sh':als[0],'zh':als[1],'ja':als[2],'je':als[3],'jo':als[4],
               'ju':als[5],'sx':als[6],'ch':als[7],'SH':als[8],'ZH':als[9],
               'JA':als[10],'JE':als[11],'JO':als[12],'JU':als[13],'SX':als[14],
               'CH':als[15],'Sh':als[8],'Zh':als[9],'Ja':als[10],'Je':als[11],
               'Jo':als[12],'Ju':als[13],'Sx':als[14],'Ch':als[15]}
        # the trailing space keeps inp[e+1] in range for the last real character
        inp = input("\n> ") + " "
        digraph = ""
        prevc = ""
        for e, char in enumerate(inp):
            part_j = "jJ"
            part_v = "aeouAEOU"
            part_z = "zZ"
            part_h = "hH"
            part_s = "sS"
            part_x = "hxHX"
            part_c = "cC"
            # does the current character START a digraph with the next one?
            if ((char in part_j and inp[e+1] in part_v) or
                    (char in part_z and inp[e+1] in part_h) or
                    (char in part_s and inp[e+1] in part_x) or
                    (char in part_c and inp[e+1] in part_h)):
                digraph = "yes"
            else:
                digraph = "no"

            # does the current character END a digraph with the previous one?
            if ((char in part_v and inp[e-1] in part_j) or
                    (char in part_h and inp[e-1] in part_z) or
                    (char in part_x and inp[e-1] in part_s) or
                    (char in part_h and inp[e-1] in part_c)):
                comb = inp[e-1] + char
                dig = new[comb]  # mixed-case pairs such as 'jO' are not in new (KeyError)
                print(dig, end="")
            elif(digraph == "yes"):
                prevc = char  # hold back the first letter of the digraph
            else:
                try:
                    print(cyrillic[latin.index(char)], end="")
                except:
                    print(char, end="")

It appears to use the same sort of logic as the answer I selected as correct, and it works :)
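The edit works by peeking at `inp[e+1]`. Here is a minimal sketch of the same lookahead idea. It assumes the digraph table is kept in one dictionary, tries the two-character slice first, and falls back to a single letter. `DIGRAPHS`, `LETTERS` and `transliterate` are hypothetical names. Slicing past the end of a string just returns a shorter string, so no padding space is needed.

```python
# Hypothetical tables built from the strings in the question.
LATIN = "abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
CYRILLIC = "абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
LETTERS = dict(zip(LATIN, CYRILLIC))

DIGRAPHS = {}
for latin_pair, cyr in zip(["sh", "zh", "ja", "je", "jo", "ju", "sx", "ch"],
                           "шжяеёющч"):
    DIGRAPHS[latin_pair] = cyr                       # lowercase: "sh"
    DIGRAPHS[latin_pair.upper()] = cyr.upper()       # all caps: "SH"
    DIGRAPHS[latin_pair.capitalize()] = cyr.upper()  # title case: "Sh"


def transliterate(text):
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]  # only 1 char long at the very end
        if pair in DIGRAPHS:  # a digraph wins over two single letters
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(LETTERS.get(text[i], text[i]))  # unknown chars pass through
            i += 1
    return "".join(out)


print(transliterate("sjo"))     # сё
print(transliterate("jojajo"))  # ёяё
```

Because no letter that ends a digraph (a, e, o, u, h, x) can also start one (s, z, j, c), a simple left-to-right greedy match is enough here.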

Here is a solution that uses the same logic and approach as your code, but written a bit differently.

    cyrillic = u"абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
    latin = u"abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "

    digraphs = u"шжяеёющчШЖЯЕЁЮЩЧШЖЯЕЁЮЩЧ"
    latin_digraphs = [u'sh', u'zh', u'ja', u'je', u'jo', u'ju', u'sx', u'ch',
                      u'SH', u'ZH', u'JA', u'JE', u'JO', u'JU', u'SX', u'CH',
                      u'Sh', u'Zh', u'Ja', u'Je', u'Jo', u'Ju', u'Sx', u'Ch']

    # one lookup table for single letters and digraphs alike
    mapping = dict(zip(list(latin) + latin_digraphs, cyrillic + digraphs))
    digraph_first_letter = u'szjcSZJC'

    def latin_to_cyrillic(word):
        translation = []
        possible_digraph = False  # True while previous_letter is buffered
        previous_letter = u''

        for letter in word:
            if possible_digraph:
                combination = previous_letter + letter
                if combination in latin_digraphs:
                    translation.append(mapping[combination])
                    possible_digraph = False
                else:
                    # not a digraph: flush the buffered letter on its own
                    translation.append(mapping[previous_letter])
                    if letter in digraph_first_letter:
                        previous_letter = letter  # buffer the new candidate
                    else:
                        # characters without a mapping (digits, 'q', 'x', ...) pass through
                        translation.append(mapping.get(letter, letter))
                        possible_digraph = False
            else:
                if letter in digraph_first_letter:
                    possible_digraph = True
                    previous_letter = letter
                else:
                    translation.append(mapping.get(letter, letter))
        if possible_digraph:  # word ended on a possible digraph start
            translation.append(mapping[previous_letter])
        return ''.join(translation)

    print(latin_to_cyrillic('sjo'))
    print(latin_to_cyrillic('jojajo'))

The logic is as follows.

For each letter, check whether the previous letter was the beginning of a digraph.
If yes, check whether the previous and current letters form a legitimate digraph and, if so, translate them together. If not, translate the previous letter as a single letter and check whether the current letter is the beginning of a digraph. If it is, store it in the buffer and move on; otherwise translate it, too.
If the previous letter was not the beginning of a digraph, check whether the current letter is. If yes, store it in the buffer and move on; otherwise translate it.
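The three cases above can also be tracked with a single buffer variable instead of a `possible_digraph` flag plus `previous_letter`. Below is a minimal sketch of that variant. It assumes the tables are built the same way as in the answer; they are rebuilt here so the block runs on its own. `to_cyrillic` and `pending` are hypothetical names.

```python
LATIN = u"abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
CYRILLIC = u"абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
DIGRAPH_LETTERS = u"шжяеёющчШЖЯЕЁЮЩЧШЖЯЕЁЮЩЧ"
LATIN_DIGRAPHS = ["sh", "zh", "ja", "je", "jo", "ju", "sx", "ch",
                  "SH", "ZH", "JA", "JE", "JO", "JU", "SX", "CH",
                  "Sh", "Zh", "Ja", "Je", "Jo", "Ju", "Sx", "Ch"]
MAPPING = dict(zip(list(LATIN) + LATIN_DIGRAPHS, CYRILLIC + DIGRAPH_LETTERS))
FIRST_LETTERS = "szjcSZJC"


def to_cyrillic(word):
    out = []
    pending = None  # buffered letter that may start a digraph, or None
    for letter in word:
        if pending is not None:
            pair = pending + letter
            if pair in LATIN_DIGRAPHS:    # buffered + current letter form a digraph
                out.append(MAPPING[pair])
                pending = None
                continue
            out.append(MAPPING[pending])  # no digraph: flush the buffered letter
            pending = None
        # nothing buffered now: buffer a possible digraph start, else translate
        if letter in FIRST_LETTERS:
            pending = letter
        else:
            out.append(MAPPING.get(letter, letter))
    if pending is not None:               # input ended on a possible digraph start
        out.append(MAPPING[pending])
    return "".join(out)


print(to_cyrillic("sjo"))  # сё
```

Because the "flush, then treat the current letter as fresh" step falls through to the same code as the no-buffer case, a letter like the `j` in `sjo` is always re-examined as a possible digraph start.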

Instead of finding the index of a letter in the Latin alphabet and using that index in the Cyrillic alphabet, you can use a dictionary that maps each letter of one alphabet to its counterpart in the other. Create one list of the Latin symbols (single letters and digraphs) and one of the Cyrillic symbols; you have to ensure that the symbols appear in the same order in both. dict(zip(alphabet1, alphabet2)) then maps each letter of alphabet1 to the letter at the same index in alphabet2.
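Below is a minimal illustration of the `dict(zip(...))` idea, using a few symbols from the answer's alphabets. One caveat is worth checking. `zip` stops silently at the shorter input, so a character missing from one alphabet shortens the mapping instead of raising an error. Comparing the lengths first catches that, although it cannot catch two alphabets that have the same length but a different order. `build_mapping` is a hypothetical helper.

```python
def build_mapping(source, target):
    """Pair source[i] with target[i]; refuse alphabets of different length."""
    if len(source) != len(target):
        raise ValueError("alphabets differ in length: %d vs %d"
                         % (len(source), len(target)))
    return dict(zip(source, target))


latin = list("abc") + ["sh", "ch"]   # single letters followed by digraphs
cyrillic = list("абц") + ["ш", "ч"]  # same order, same length
mapping = build_mapping(latin, cyrillic)
print(mapping["sh"], mapping["c"])   # ш ц

# Plain zip would have silently dropped 'c' here instead of complaining:
print(dict(zip("abc", "аб")))        # {'a': 'а', 'b': 'б'}
```

On Python 3.10 and later, `zip(source, target, strict=True)` performs the same length check and raises `ValueError` by itself.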
