Python - Transliterator fails at certain digraphs
Here is the code:

    def digraph(chars):
        als = "шжяеёющчШЖЯЕЁЮЩЧ"
        new = {'sh':als[0],'zh':als[1],'ja':als[2],'je':als[3],'jo':als[4],
               'ju':als[5],'sx':als[6],'ch':als[7],'Sh':als[8],'Zh':als[9],
               'Ja':als[10],'Je':als[11],'Jo':als[12],'Ju':als[13],'Sx':als[14],
               'Ch':als[15],'SH':als[8],'ZH':als[9],'JA':als[10],'JE':als[11],
               'JO':als[12],'JU':als[13],'SX':als[14],'CH':als[15]}
        try:
            return new[chars]
        except:
            return "[error]"

    def trans_cyr(inp):
        cyrillic = "абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
        latin = "abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
        digs = ['sh','zh','ja','je','jo','ju','sx','ch','Sh','Zh',
                'Ja','Je','Jo','Ju','Sx','Ch','SH','ZH','JA','JE','JO','JU','SX',
                'CH']
        prevc = ""
        for e, char in enumerate(inp):
            if(prevc != ""):
                comb = prevc + char
                newdig = digraph(comb)
                if(comb in digs):
                    print(newdig, end="")
                    prevc = ""
                else:
                    pos = latin.index(char)
                    posp = latin.index(inp[e - 1])
                    if(inp[e-1] in "szjcSZJC"):
                        print(cyrillic[posp] + cyrillic[pos], end="")
                        prevc = ""
                    else:
                        prevc=""
                        continue
            elif(char not in "szjcSZJC"):
                try:
                    pos = latin.index(char)
                    print(cyrillic[pos], end="")
                except:
                    print(char, end="")
            else:
                prevc = char

    while True:
        cyrinp = input("\n> ")
        trans_cyr(cyrinp)

The code is supposed to transliterate the Latin alphabet to Cyrillic by first taking each character of the input (if it is not one of 'szjc' or their uppercase equivalents), getting its position in the Latin string using the index() function, and then acquiring the Cyrillic equivalent at the same position as the Latin one. However, Cyrillic has letters such as Я, Е, Ё, Ю, Ж, Ш, Щ, Ч, which are written as digraphs (ya, ye, yo, yu, zh, sh, shch (sx), ch) and therefore cannot be transliterated from one character.

What I do is check whether the current letter is one of 'szjcSZJC'. If it is, I do not print it but instead assign it to the name prevc, and then check whether the next character combined with prevc is in the array 'digs'. This works perfectly: if I type in 'jojajo' it prints "ёяё" as it should. But if there is an unfinished digraph (c without h, s without h or x, z without h, j without a, e, u, or o), the next digraph does not get transliterated.

Example: if I enter sjo, the expected output is сё, but instead I am getting сйо.

Is there a way I can fix this?
Edit:

I wrote this code:

    while True:
        cyrillic = "абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
        latin = "abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
        als = "шжяеёющчШЖЯЕЁЮЩЧ"
        new = {'sh':als[0],'zh':als[1],'ja':als[2],'je':als[3],'jo':als[4],
               'ju':als[5],'sx':als[6],'ch':als[7],'Sh':als[8],'Zh':als[9],
               'Ja':als[10],'Je':als[11],'Jo':als[12],'Ju':als[13],'Sx':als[14],
               'Ch':als[15],'SH':als[8],'ZH':als[9],'JA':als[10],'JE':als[11],
               'JO':als[12],'JU':als[13],'SX':als[14],'CH':als[15]}
        # The trailing space lets inp[e+1] be read safely on the last real character.
        inp = input("\n> ") + " "
        digraph = ""
        prevc = ""
        for e, char in enumerate(inp):
            part_j = "jJ"
            part_v = "aeouAEOU"
            part_z = "zZ"
            part_h = "hH"
            part_s = "sS"
            part_x = "hxHX"
            part_c = "cC"
            # Does the current character start a digraph with the next one?
            if((char in part_j and inp[e+1] in part_v) or
               (char in part_z and inp[e+1] in part_h) or
               (char in part_s and inp[e+1] in part_x) or
               (char in part_c and inp[e+1] in part_h)):
                digraph = "yes"
            else:
                digraph = "no"
            # Does the current character finish a digraph started by the previous one?
            if((char in part_v and inp[e-1] in part_j) or
               (char in part_h and inp[e-1] in part_z) or
               (char in part_x and inp[e-1] in part_s) or
               (char in part_h and inp[e-1] in part_c)):
                comb = inp[e-1] + char
                dig = new[comb]
                print(dig, end="")
            elif(digraph == "yes"):
                prevc = char
            else:
                try:
                    print(cyrillic[latin.index(char)],end="")
                except:
                    print(char, end="")

It appears to have the same sort of logic as the answer selected as correct, and it works :)
Here is a solution that uses the same logic as your approach, but written more cleanly.

    cyrillic = u"абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
    latin = u"abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
    digraphs = u"шжяеёющчШЖЯЕЁЮЩЧШЖЯЕЁЮЩЧ"
    latin_digraphs = [u'sh', u'zh', u'ja', u'je', u'jo', u'ju', u'sx', u'ch',
                      u'Sh', u'Zh', u'Ja', u'Je', u'Jo', u'Ju', u'Sx', u'Ch',
                      u'SH', u'ZH', u'JA', u'JE', u'JO', u'JU', u'SX', u'CH']
    # One dictionary for both single letters and digraphs.
    mapping = dict(zip(list(latin) + latin_digraphs, cyrillic + digraphs))
    digraph_first_letter = u'szjcSZJC'

    def latin_to_cyrillic(word):
        translation = []
        possible_digraph = False
        for letter in word:
            if possible_digraph:
                combination = previous_letter + letter
                if combination in latin_digraphs:
                    translation.append(mapping[combination])
                    possible_digraph = False
                else:
                    # Not a digraph: flush the buffered letter on its own.
                    translation.append(mapping[previous_letter])
                    if letter in digraph_first_letter:
                        previous_letter = letter
                    else:
                        # Translate the current letter too; unknown
                        # characters pass through unchanged.
                        translation.append(mapping.get(letter, letter))
                        possible_digraph = False
            else:
                if letter in digraph_first_letter:
                    possible_digraph = True
                    previous_letter = letter
                else:
                    translation.append(mapping.get(letter, letter))
        # Flush a digraph-starting letter left over at the end of the word.
        if possible_digraph:
            translation.append(mapping[previous_letter])
        return ''.join(translation)

    print(latin_to_cyrillic('sjo'))     # сё
    print(latin_to_cyrillic('jojajo'))  # ёяё

The logic is as follows.
- For each letter, check whether the previous letter was the beginning of a digraph.
- If yes, check whether the previous and current letters form a legitimate digraph, and if so translate them together. If not, translate the previous letter as a single letter and check whether the current letter is the beginning of a digraph. If it is, store it in the buffer and move on; otherwise translate it, too.
- If the previous letter was not the beginning of a digraph, check whether the current letter is the beginning of a digraph. If yes, store it in the buffer and move on; otherwise translate it.
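The same steps can also be expressed with a two-character lookahead instead of a buffer. The following is only a minimal sketch of that variant, not the code from the post: the names `SINGLE`, `DIGRAPHS` and `transliterate` are made up for illustration, and only the lowercase tables are included to keep it short.

```python
# Sketch only: lookahead variant of the steps above.
# SINGLE, DIGRAPHS and transliterate are hypothetical names; lowercase only.
SINGLE = dict(zip("abcdefghijklmnoprstuv$y'z",
                  "абцдэфгхийклмнопрстувъыьз"))
DIGRAPHS = dict(zip(["sh", "zh", "ja", "je", "jo", "ju", "sx", "ch"],
                    "шжяеёющч"))

def transliterate(text):
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # The current and next letters form a digraph: consume both.
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Otherwise translate one letter; unknown characters pass through.
            ch = text[i]
            out.append(SINGLE.get(ch, ch))
            i += 1
    return "".join(out)

print(transliterate("sjo"))     # сё
print(transliterate("jojajo"))  # ёяё
```

Because the pair is tested before the single letter, an unfinished digraph such as the `s` in `sjo` is translated on its own and the following `jo` is still matched as a digraph.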
Instead of finding the index of a letter in the Latin alphabet and using that index into the Cyrillic alphabet, you can use a dictionary that maps each letter from one language to the other. Create a list of the Latin symbols (single letters and digraphs) and one of the Cyrillic symbols. You have to ensure that the symbols are in the same order in both. dict(zip(alphabet1, alphabet2)) will then create a mapping from each letter of alphabet1 to the letter at the same index in alphabet2.
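As a small illustration of the dict(zip(...)) idea, here is a sketch with deliberately tiny, made-up symbol lists (`latin_symbols` and `cyrillic_symbols` are hypothetical names, not the full tables used above):

```python
# Hypothetical, deliberately tiny symbol lists; not the full tables from the post.
latin_symbols = ["a", "b", "sh", "zh"]
cyrillic_symbols = ["а", "б", "ш", "ж"]

# zip pairs items by position and stops silently at the shorter input,
# so check the lengths before building the mapping.
assert len(latin_symbols) == len(cyrillic_symbols)

mapping = dict(zip(latin_symbols, cyrillic_symbols))

print(mapping["sh"])  # ш
print(mapping["a"])   # а
```

Single letters and digraphs can share one dictionary because the keys are just strings of length one or two.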