# HOMOPHONE-SP, An encryption software employing homophonic substitution and # transpositions (with authentication) # Prologue. # Monoalphabetical substitution is liable to be easily analyzed due to the fact # that the frequency distribution of the plaintext letters and the corresponding # ciphertext letters are identical, since the mapping is one-to-one. Homophonic # substitution attempts to mitigate this defect through employing one-to-many # mapping instead. It is interesting that already in the 1400s a homophonic # substitution cipher was used in the city of Mantua in Italy [1, p.35], with # the set of homophones being the 26 lower-case letters plus the 10 digit # symbols, such that the more frequent letters a, e, i, o, and t can map to more # than one homophones. It is evident from the frequency distribution of the # letters of common English texts that a much larger set of homophones is # required in order that a more or less uniform statistical frequency # distribution of the homophones in the results of substitution could be # achieved. If, however, the ciphertext letters are to be human readable, which # is commonly desirable, the simple method of the genre of Mantua would limit # the set of homophones to a size of only about 70 and thus leaves the said # issue of frequency distribution of the homophones yet much to be desired. # A versatile method of obtaining larger sets of homophones -- albeit with the # trade-off of data expansion to a volume that is double of that of the # plaintext -- is shown in [1, p.35-36], where an example demonstrates the use # of the 100 pairs of decimal digits as the set of homophones. (Note that in # this case the ciphertext letters are in a rather tiny alphabet of 0-9.) # {1] D. Salomon, Data Privacy and Security, Springer-Verlag, 2003. # In designing the present software, we stipulate that the trade-off of data # expansion is practically acceptable to our targeted users, the common people, # since their message volumes are as a rule fairly small. With the method # mentioned above, any practically desirably large set of homophones can be # generated for the substitution. In order to obtain higher security, our design # is not a pure homophonic substitution but a combination of homophonic # substitution and pseudo-random transpositions. It consists of an initial # transposition of the sequence of letters of the user-given plaintext # (a pseudo-random checktext is appended to it for purpose of authentication # (integrity check)), a homophonic substitution of plaintext letters to pairs # of ciphertext letters, and a final transposition of the sequence of # ciphertext letters. # An illustrative example is given further below. # Version 2.1, released on 20.06.2017. # Update notes: # Version 1.0: Released on 06.06.2017. # Version 2.0: 15.06.2017: In order to enhance the complexity of analysis and # better cater for breaches of integrity, the sequence of letters resulting # from the homophonic substitution are chained in a sense analogous to block # chaining of modern block ciphers. See the function encryption(). # Version 2.1: 20.06.2017: The chaining processing of the outputs from # homophonic substitution is now done in a component independent of that # performing homophonic substitution. # Code lines of documents with the same version number are always identical. # There may be interim modifications of comment lines. # This software may be freely used: # 1. for all personal purposes unconditionally and # 2. for all other purposes under the condition that its name, version number # and authorship are explicitly mentioned and that the author is informed of # all eventual code modifications done. # A list of present author's software that are currently directly maintained by # himself is available at http://mok-kong-shen.de. Users are advised to # download such software from that home page only. # The author is indebted to TPS and CWL for review and suggestions throughout # HOMOPHONE-SP's development phase. Any remaining deficiencies of the software # are however the sole responsibility of the author. # Constructive critiques, comments and suggestions of extensions and # improvements are sincerely solicited. Address of the author: # # Email: mok-kong.shen@t-online.de # # Mail: Mok-Kong Shen, Postfach 340238, Munich 80099, Germany # # (Sender data optional, if no reply is required.) ################################################################################ import random # Compute from sessionkeyextension an integer v and concatenate it to secretkey # to form sessionkey. See Epilogue. def formsessionkey(): global secretkey,sessionkeyextension global sessionkey assert type(secretkey)==type(0) and type(sessionkeyextension)==type("a") by=bytearray(sessionkeyextension,'latin-1') lby=len(by) assert lby > 0 v=0 for i in range(lby): v<<=8 v|=by[i] sessionkey=(secretkey<<(lby*8))|v return # Initialization. Seed Python's built-in PRNG, compute checktext, set an initial # value for chaining the letters output from homophonic substitution, determine # numbhomophones, construct a homophonic mapping of letters of the plaintext # alphabet to letters of the ciphertext alphabet (in terms of their respective # indices) and its reverse. def inithomophone(): global sessionkey global ptalpha,ctalpha,lenptalpha,lenctalpha global ptfrequencypercent global lenchecktext,checktext global numbhomophones global mappingptindicestoctindicespairs,mappingctindicespairstoptindices global chaining for c in ptalpha: assert ptalpha.count(c)==1 for c in ctalpha: assert ctalpha.count(c)==1 lenptalpha=len(ptalpha) lenctalpha=len(ctalpha) # Seed Python's built-in PRNG. formsessionkey() random.seed(sessionkey) # Compute checktext for verification of integrity of processing and message # transmission. checktext="" for i in range(lenchecktext): checktext+=ptalpha[random.randint(0,lenptalpha-1)] # Initial value of chaining. chaining=random.randint(0,lenctalpha-1) # Using ptfrequencypercent, determine the optimal numbers of homophones # corresponding to the individual letters of ptalpha such that a possibly flat # statistical frequency distribution of the homophones (pairs of letters from # ctalpha) would result in the homophonic substitution. n=lenctalpha**2 numbhomophones=\ [round(n*ptfrequencypercent[i]/100) for i in range(lenptalpha)] # Modify (if necessary due to effects of rounding) such that min >= 1 and # sum = n. The modification is done on the largest entry of numbhomophones. for i in range(lenptalpha): if numbhomophones[i]<1: numbhomophones[i]=1 maxval=max(numbhomophones) maxidx=numbhomophones.index(maxval) diff=n-sum(numbhomophones) numbhomophones[maxidx]+=diff # Sanity check. assert min(numbhomophones) >= 1 and n > lenptalpha # allhomophonesindicespairs is a list of all possible pairs of incides of # letters from ctalpha. allctindicespairs=[] for ci in range(lenctalpha): for cj in range(lenctalpha): allctindicespairs.append([ci,cj]) # Pseudo-randomly shuffle this list and assign sections of appropriate length # as sublists of mappingptindicestoctindicespairs and compute its reverse # mappingctindicespairstoptindices. These two lists are, excepting that they # are in terms of the indices of the letters of the alphabet and not in terms # of the letters of the alphabet themselves, similar to Table 1.14 and # Table 1.15(a) of [1] respectively. random.shuffle(allctindicespairs) mappingptindicestoctindicespairs=[] mappingctindicespairstoptindices=\ [[0 for j in range(lenctalpha)] for i in range(lenctalpha)] k=0 for pti in range(lenptalpha): numbhomo=numbhomophones[pti] k1=k+numbhomo sublist=allctindicespairs[k:k1] mappingptindicestoctindicespairs.append(sublist) for j in range(numbhomo): ctindicespair=sublist[j] ct0i=ctindicespair[0] ct1i=ctindicespair[1] mappingctindicespairstoptindices[ct0i][ct1i]=pti k=k1 return def writestringfile(stringname,astring): f=open(stringname+"string.txt","w") f.write(astring) f.close() return def readstringfile(stringname): f=open(stringname+"string.txt","r") astring=f.read() f.close() return(astring) # Compute the pseudo-random indices of the initial transposition of the # user-given plaintext letters and the final transposition of the ciphertext # letters. def computetranspositionindices(): global lenusertext,lenplaintext,lenciphertext global initialtranspositionindices,finaltranspositionindices initialtranspositionindices=[i for i in range(lenusertext)] random.shuffle(initialtranspositionindices) finaltranspositionindices=[i for i in range(lenciphertext)] random.shuffle(finaltranspositionindices) return # Preprocessing (transposition) of usertext. # direction = 0: for transposition of usertext with initialtranspositionindices. # direction = 1: for reversing the effect done with direction=0. def preprocessing(usertext,direction): global lenusertext,lenplaintext,lenciphertext global initialtranspositionindices,finaltranspositionindices assert direction in [0,1] processedtextlist=list(usertext) if direction==0: for i in range(lenusertext): processedtextlist[i]=usertext[initialtranspositionindices[i]] else: for i in range(lenusertext): processedtextlist[initialtranspositionindices[i]]=usertext[i] processedusertext="".join(processedtextlist) return(processedusertext) # Postprocesing (transposition) of ciphertext. # direction = 0: for transposition of ciphertext with finaltranspositionindices. # direction = 1: for reversing the effect done with direction=0. def postprocessing(ciphertext,direction): global lenusertext,lenplaintext,lenciphertext global initialtranspositionindices,finaltranspositionindices assert direction in [0,1] processedtextlist=list(ciphertext) if direction==0: for i in range(lenciphertext): processedtextlist[i]=ciphertext[finaltranspositionindices[i]] else: for i in range(lenciphertext): processedtextlist[finaltranspositionindices[i]]=ciphertext[i] processedciphertext="".join(processedtextlist) return(processedciphertext) # Encryption processing of a text written by the user, resulting in ciphertext. def encrypt(usertext): global ptalpha,ctalpha,lenptalpha,lenctalpha global lenusertext,lenplaintext,lenciphertext global numbhomophones global mappingptindicestoctindicespairs,mappingctindicespairstoptindices global chaining global initialtranspositionindices,finaltranspositionindices global lenchecktext,checktext inithomophone() for p in usertext: assert p in ptalpha lenusertext=len(usertext) lenplaintext=lenusertext+lenchecktext lenciphertext=2*lenplaintext computetranspositionindices() # Preproossing of usertext. usertext=preprocessing(usertext,0) plaintext=usertext+checktext plaintextindiceslist=[ptalpha.index(plaintext[i]) for i in range(lenplaintext)] # Do homophonic substitution. The ciphertext letters of the letter pairs thus # obtained are, after chaining (via their indices), collected in # chainedctletterslist. chainedctletterslist=[] for i in range(lenplaintext): # pi is index of the plaintext letter in ptalpha. pi=plaintextindiceslist[i] # Find the corresponding number of homophones. numbhomo=numbhomophones[pi] # Make a pseudo-random choice. choice=random.randint(0,numbhomo-1) # Obtain the indices of the corresponding homophonic pair of letters of ctalpha. [ct0i,ct1i]=mappingptindicestoctindicespairs[pi][choice] # Chaining processing. (chaining in effect sums up all previously processed # letters.) chainedct0i=chaining=(ct0i+chaining)%lenctalpha chainedctletterslist.append(ctalpha[chainedct0i]) chainedct1i=chaining=(ct1i+chaining)%lenctalpha chainedctletterslist.append(ctalpha[chainedct1i]) ciphertext="".join(chainedctletterslist) # Postprocessing of ciphertext. ciphertext=postprocessing(ciphertext,0) return(ciphertext) # Decryption processing of ciphertext, the reverse of encrypt(). def decrypt(ciphertext): global ptalpha,ctalpha,lenptalpha,lenctalpha global lenusertext,lenplaintext,lenciphertext global numbhomophones global mappingptindicestoctindicespairs,mappingctindicespairstoptindices global chaining global initialtranspositionindices,finaltranspositionindices global lenchecktext,checktext inithomophone() for c in ciphertext: assert c in ctalpha lenciphertext=len(ciphertext) assert lenciphertext%2==0 lenplaintext=lenciphertext//2 ciphertext=postprocessing(ciphertext,1) chainedctletterslist=list(ciphertext) ptletterslist=[] for i in range(0,lenciphertext,2): chainedct0i=ctalpha.index(chainedctletterslist[i]) ct0i=(chainedct0i-chaining)%lenctalpha chaining=chainedct0i chainedct1i=ctalpha.index(chainedctletterslist[i+1]) ct1i=(chainedct1i-chaining)%lenctalpha chaining=chainedct1i pi=mappingctindicespairstoptindices[ct0i][ct1i] ptletterslist.append(ptalpha[pi]) plaintext="".join(ptletterslist) # Check whether the recovered checktext is a correct one. recoveredchecktext=plaintext[-lenchecktext:] if recoveredchecktext!=checktext: print("Error: Integrity check failed *********") exit(333) # Extract the decrypted usertext and return it. recoveredusertext=plaintext[:-lenchecktext] recoveredusertext=preprocessing(recoveredusertext,1) return(recoveredusertext) ################################################################################ # Installation of the software. # Both communication partners have to download the same version 3x of Python # from http://www.python.org. (Employing the same version of Python ensures # against any potentially possible incompatibilities among different versions.) # The present code can be stored in a file named e.g. homophonesp.py and the # examples given further below run in Python's GUI IDLE. (File --> Open to find # and open the file, then in the window showing the code Run --> Run Module to # run it. One could also type homophonesp.py in a DOS-window.) Modifications # of the code in the code window, e.g. the plaintext string, can be done online # and the code re-run. ################################################################################ # Illustrative example. # The communication partners have to specify the same (identical) system # parameter values, whose appropriate selection is users' own responsibility. # We assume for the present illustrative example the following system # parameter values: # Parameters for forming a session-dependent seed for Python's built-in PRNG. # secretkey is an integer in hexadecimal or decimal format, sessionkeyextension # is a text string. See Epilogue. secretkey=0x4a98717c70e997c2715e87c3fa1ef559 sessionkeyextension="SEL 15062017 RDI RN380" # Plaintext alphabet. ptalpha="abcdefghijklmnopqrstuvwxyz" # Corresponding plaintext letter frequencies in percent (from [2, fig.7.5]). ptfrequencypercent=\ [8.04,1.54,3.06,3.99,12.51,2.30,1.96,5.49,7.26,0.16,0.67,4.14, 2.53,7.09,7.60,2.00,0.11,6.12,6.54,9.25,2.71,0.99,1.92,0.19,1.73,0.10] # [2] A. J. Menezes et al., Handbook of Applied Cryptography, CRC Press, 1996. # Ciphertext alphabet. (Choice of its size may be assisted via printout of # numbhomophones after a test run of encryption processing.) ctalpha="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLM" # Length of checktext for purpose of integrity check. lenchecktext=10 # Sender specific code lines: print("Sender:") print() usertext="thisisanexampleusertextforourhomophoneencryptionscheme" print("usertext:") print(usertext) print() ciphertext=encrypt(usertext) print("ciphertext:") print(ciphertext) print() print() writestringfile("ciphertext",ciphertext) # Sender sends file ciphertextstring.txt to receiver (or a text string to be # assigned to a variable ciphertext by the receiver). # Receiver specific code lines: print("Receiver:") print() ciphertext=readstringfile("ciphertext") recoveredusertext=decrypt(ciphertext) print("recoveredusertext:") print(recoveredusertext) ################################################################################ # Appendix. # Users who are new to Python or who begin to use a new version of the Python # software may like to know a bit about the statistical qualities of its # built-in PRNG which is employed in HOMOPHONE-SP. The following code is # intended to serve for that purpose. prnbitn is the number of bits of the PRNs # to be generated and must be a multiple of 8. testseed is any arbitrary seed # chosen for running the test. One may like to try a number of different # prnbitn and testseed. import math,random # Maurer's Universal Test, see [6]. # # [6] J-S. Coron, D. Naccache, An Accurate Evaluation of Maurer's Universal Test. # http://www.jscoron.fr/publications/universal.pdf qq=2560 qqp1=qq+1 kk=256000 qqkkp1=qq+kk+1 def maurertest(bb): global qq,qqp1,kk,qqkkp1 eftu=7.1836656 # y1 and y2 are for rho=0.01 and rho=0.001 respectively. y1=2.5758 y2=3.2905 t=[0 for i in range(256)] for i in range(1,qqp1,1): t[bb[i]]=i sum=0.0 for i in range(qqp1,qqkkp1,1): sum+=math.log10(i-t[bb[i]]) t[bb[i]]=i tu=(sum/kk)/math.log10(2.0) c=math.sqrt(0.3732189+(0.3730195*256)/kk) sigma=c*math.sqrt(3.2386622/kk) t11=eftu-y1*sigma t12=eftu+y1*sigma t21=eftu-y2*sigma t22=eftu+y2*sigma return(tu,t11,t12,t21,t22) def maurertestresult(h,gg): global chnum global qq,qqp1,kk,qqkkp1 if h*chnum < qqkkp1: print("Error in maurertestresult") exit(6) bb=[0 for k in range(h*chnum)] u=0 k1=-1 k2=chnum-1 for i in range(h): g=gg[i] for k in range(k2,k1,-1): bb[u]=g&0xff g>>=8 u+=1 k1+=chnum k2+=chnum tu,t11,t12,t21,t22 = maurertest(bb) print("Maurer's Universal Test for L=8, rho=0.01 (Middle value is the "\ "test statistic\nand should lie between the other two values): "\ "%6.3f %6.3f %6.3f"%(t11,tu,t12)) def randtest(prnbitn,testseed): global chnum if prnbitn<8 or (prnbitn%8)!=0: print("Error randtext: Wrong prnbitn") exit(55555) chnum=prnbitn//8 h=qqkkp1//chnum if h*chnum < qqkkp1: h+=1 random.seed(testseed) tpwbitnm1=2**prnbitn-1 print("PRNs generated with random.randint():") gg=[random.randint(0,tpwbitnm1) for i in range(h)] maurertestresult(h,gg) print("PRNs generated with random.getrandbits():") gg=[random.getrandbits(prnbitn) for i in range(h)] maurertestresult(h,gg) return # Example of user chosen test parameter: prnbitn=128 testseed="My arbitrarily chosen seed" print() print() print("Test of Python's PRNG, parameters used:") print("prnbitn:",prnbitn) print("testseed:",testseed) randtest(prnbitn,testseed) ################################################################################ # Epilogue. # We presume that the computer, on which this software is run, is free from # malware infection via software and/or hardware means and that there are no # emission security risks (which could be manifold in practical situations). # In the way personally preferred by the present author, the parameter # "sessionkey" employed to seed Python's PRNG at initialization time consists of # two components: "secretkey", which should have sufficient entropy, e.g. # stemming from dice throws, and which could be used for a certain longer time # period before being renewed, and "sessionkeyextension", which is dynamic, i.e. # different for different sessions and which need not be secret (since it is its # variability that is important in the present context), being composed of # certain variable session-dependent data, e.g. date, time, subject, message # serial number, etc. In case there is no systematic scheme of derivation of # "sessionkeyextension" to be used by the receiver, it can be sent in the clear # to him. In view of the high dynamics of our sessionkey and the comparatively # low volumes of communications of our targeted users and the fact that the PRNs # are not directly used to process (e.g. via xor) the plaintext but indirectly # to treat it via certain pseudo-random operations, our use of Python's # built-in PRNG is obviously entirely secure in practice. For a utility to # convert dice throws to a hexadecimal sequence, see DICE. # Users who have also interest in public key cryptography may note that the # present author has a Python code of RSA key generation and encryption in his # software PROVABLEPRIME, which contains some comments of general interest. # Earlier the present author has written a software HOMOPHONE which does not # incur data expansion (measured in bytes) but whose ciphertext is binary and # thus unfortunately not readable. This explains the use of a suffix in the # name of the present software. # In the case of our illustrative example, print(numbhommophones) will give: # [122, 23, 47, 61, 191, 35, 30, 84, 110, 2, 10, 63, 38, 108, 116, 30, 2, 93, # 99, 141, 41, 15, 29, 3, 26, 2]