HOMOPHONE-SP

Return to main page




# HOMOPHONE-SP, An encryption software employing homophonic substitution and
# transpositions (with authentication)


# Prologue.


# Monoalphabetical substitution is liable to be easily analyzed due to the fact
# that the frequency distribution of the plaintext letters and the corresponding
# ciphertext letters are identical, since the mapping is one-to-one. Homophonic
# substitution attempts to mitigate this defect through employing one-to-many
# mapping instead. It is interesting that already in the 1400s a homophonic
# substitution cipher was used in the city of Mantua in Italy [1, p.35], with
# the set of homophones being the 26 lower-case letters plus the 10 digit
# symbols, such that the more frequent letters a, e, i, o, and t can map to more
# than one homophones. It is evident from the frequency distribution of the
# letters of common English texts that a much larger set of homophones is
# required in order that a more or less uniform statistical frequency
# distribution of the homophones in the results of substitution could be
# achieved. If, however, the ciphertext letters are to be human readable, which
# is commonly desirable, the simple method of the genre of Mantua would limit
# the set of homophones to a size of only about 70 and thus leaves the said
# issue of frequency distribution of the homophones yet much to be desired.
# A versatile method of obtaining larger sets of homophones -- albeit with the
# trade-off of data expansion to a volume that is double of that of the
# plaintext -- is shown in [1, p.35-36], where an example demonstrates the use
# of the 100 pairs of decimal digits as the set of homophones. (Note that in
# this case the ciphertext letters are in a rather tiny alphabet of 0-9.)

# {1] D. Salomon, Data Privacy and Security, Springer-Verlag, 2003.

# In designing the present software, we stipulate that the trade-off of data
# expansion is practically acceptable to our targeted users, the common people,
# since their message volumes are as a rule fairly small. With the method
# mentioned above, any practically desirably large set of homophones can be
# generated for the substitution. In order to obtain higher security, our design
# is not a pure homophonic substitution but a combination of homophonic
# substitution and pseudo-random transpositions. It consists of an initial
# transposition of the sequence of letters of the user-given plaintext
# (a pseudo-random checktext is appended to it for purpose of authentication
# (integrity check)), a homophonic substitution of plaintext letters to pairs
# of ciphertext letters, and a final transposition of the sequence of
# ciphertext letters.

# An illustrative example is given further below.


# Version 2.1, released on 20.06.2017.

# Update notes:

# Version 1.0: Released on 06.06.2017.  

# Version 2.0: 15.06.2017: In order to enhance the complexity of analysis and
# better cater for breaches of integrity, the sequence of letters resulting
# from the homophonic substitution are chained in a sense analogous to block
# chaining of modern block ciphers. See the function encryption().

# Version 2.1: 20.06.2017: The chaining processing of the outputs from
# homophonic substitution is now done in a component independent of that
# performing homophonic substitution.

# Code lines of documents with the same version number are always identical.
# There may be interim modifications of comment lines. 


# This software may be freely used:

# 1. for all personal purposes unconditionally and

# 2. for all other purposes under the condition that its name, version number 
#    and authorship are explicitly mentioned and that the author is informed of
#    all eventual code modifications done.


# A list of present author's software that are currently directly maintained by
# himself is available at http://mok-kong-shen.de. Users are advised to
# download such software from that home page only.


# The author is indebted to TPS and CWL for review and suggestions throughout
# HOMOPHONE-SP's development phase. Any remaining deficiencies of the software
# are however the sole responsibility of the author.


# Constructive critiques, comments and suggestions of extensions and
# improvements are sincerely solicited. Address of the author:
#
# Email: mok-kong.shen@t-online.de
#
# Mail: Mok-Kong Shen, Postfach 340238, Munich 80099, Germany
#
# (Sender data optional, if no reply is required.)



################################################################################



import random


# Compute from sessionkeyextension an integer v and concatenate it to secretkey
# to form sessionkey. See Epilogue.

def formsessionkey():
  global secretkey,sessionkeyextension
  global sessionkey
  assert type(secretkey)==type(0) and type(sessionkeyextension)==type("a")
  by=bytearray(sessionkeyextension,'latin-1')
  lby=len(by)
  assert lby > 0
  v=0
  for i in range(lby):
    v<<=8
    v|=by[i]
  sessionkey=(secretkey<<(lby*8))|v
  return


# Initialization. Seed Python's built-in PRNG, compute checktext, set an initial
# value for chaining the letters output from homophonic substitution, determine
# numbhomophones, construct a homophonic mapping of letters of the plaintext
# alphabet to letters of the ciphertext alphabet (in terms of their respective
# indices) and its reverse.

def inithomophone():
  global sessionkey
  global ptalpha,ctalpha,lenptalpha,lenctalpha
  global ptfrequencypercent
  global lenchecktext,checktext
  global numbhomophones
  global mappingptindicestoctindicespairs,mappingctindicespairstoptindices
  global chaining
  for c in ptalpha:
    assert ptalpha.count(c)==1
  for c in ctalpha:
    assert ctalpha.count(c)==1
  lenptalpha=len(ptalpha)
  lenctalpha=len(ctalpha)
# Seed Python's built-in PRNG.
  formsessionkey()
  random.seed(sessionkey)
# Compute checktext for verification of integrity of processing and message
# transmission.
  checktext=""
  for i in range(lenchecktext):
    checktext+=ptalpha[random.randint(0,lenptalpha-1)]
# Initial value of chaining.
  chaining=random.randint(0,lenctalpha-1)
# Using ptfrequencypercent, determine the optimal numbers of homophones
# corresponding to the individual letters of ptalpha such that a possibly flat
# statistical frequency distribution of the homophones (pairs of letters from
# ctalpha) would result in the homophonic substitution.
  n=lenctalpha**2
  numbhomophones=\
    [round(n*ptfrequencypercent[i]/100) for i in range(lenptalpha)]
# Modify (if necessary due to effects of rounding) such that min >= 1 and
# sum = n. The modification is done on the largest entry of numbhomophones.
  for i in range(lenptalpha):
    if numbhomophones[i]<1:
      numbhomophones[i]=1
  maxval=max(numbhomophones)
  maxidx=numbhomophones.index(maxval)
  diff=n-sum(numbhomophones)
  numbhomophones[maxidx]+=diff
# Sanity check.
  assert min(numbhomophones) >= 1 and n > lenptalpha
# allhomophonesindicespairs is a list of all possible pairs of incides of
# letters from ctalpha.
  allctindicespairs=[]
  for ci in range(lenctalpha):
    for cj in range(lenctalpha):
      allctindicespairs.append([ci,cj])
# Pseudo-randomly shuffle this list and assign sections of appropriate length
# as sublists of mappingptindicestoctindicespairs and compute its reverse
# mappingctindicespairstoptindices. These two lists are, excepting that they
# are in terms of the indices of the letters of the alphabet and not in terms
# of the letters of the alphabet themselves, similar to Table 1.14 and
# Table 1.15(a) of [1] respectively.
  random.shuffle(allctindicespairs) 
  mappingptindicestoctindicespairs=[]
  mappingctindicespairstoptindices=\
    [[0 for j in range(lenctalpha)] for i in range(lenctalpha)]
  k=0
  for pti in range(lenptalpha):
    numbhomo=numbhomophones[pti]
    k1=k+numbhomo
    sublist=allctindicespairs[k:k1]
    mappingptindicestoctindicespairs.append(sublist)
    for j in range(numbhomo):
      ctindicespair=sublist[j]
      ct0i=ctindicespair[0]
      ct1i=ctindicespair[1]
      mappingctindicespairstoptindices[ct0i][ct1i]=pti
    k=k1
  return


def writestringfile(stringname,astring):
  f=open(stringname+"string.txt","w")
  f.write(astring)
  f.close()
  return


def readstringfile(stringname):
  f=open(stringname+"string.txt","r")
  astring=f.read()
  f.close()
  return(astring)


# Compute the pseudo-random indices of the initial transposition of the
# user-given plaintext letters and the final transposition of the ciphertext
# letters.

def computetranspositionindices():
  global lenusertext,lenplaintext,lenciphertext
  global initialtranspositionindices,finaltranspositionindices
  initialtranspositionindices=[i for i in range(lenusertext)]
  random.shuffle(initialtranspositionindices)
  finaltranspositionindices=[i for i in range(lenciphertext)]
  random.shuffle(finaltranspositionindices)
  return


# Preprocessing (transposition) of usertext.
# direction = 0: for transposition of usertext with initialtranspositionindices.
# direction = 1: for reversing the effect done with direction=0.

def preprocessing(usertext,direction):
  global lenusertext,lenplaintext,lenciphertext
  global initialtranspositionindices,finaltranspositionindices
  assert direction in [0,1]
  processedtextlist=list(usertext)
  if direction==0:
    for i in range(lenusertext):
      processedtextlist[i]=usertext[initialtranspositionindices[i]]
  else:
    for i in range(lenusertext):
      processedtextlist[initialtranspositionindices[i]]=usertext[i]
  processedusertext="".join(processedtextlist)
  return(processedusertext)


# Postprocesing (transposition) of ciphertext.
# direction = 0: for transposition of ciphertext with finaltranspositionindices.
# direction = 1: for reversing the effect done with direction=0.

def postprocessing(ciphertext,direction):
  global lenusertext,lenplaintext,lenciphertext
  global initialtranspositionindices,finaltranspositionindices
  assert direction in [0,1]
  processedtextlist=list(ciphertext)
  if direction==0:
    for i in range(lenciphertext):
      processedtextlist[i]=ciphertext[finaltranspositionindices[i]]
  else:
    for i in range(lenciphertext):
      processedtextlist[finaltranspositionindices[i]]=ciphertext[i]
  processedciphertext="".join(processedtextlist)
  return(processedciphertext)


# Encryption processing of a text written by the user, resulting in ciphertext.

def encrypt(usertext):
  global ptalpha,ctalpha,lenptalpha,lenctalpha
  global lenusertext,lenplaintext,lenciphertext
  global numbhomophones
  global mappingptindicestoctindicespairs,mappingctindicespairstoptindices  
  global chaining
  global initialtranspositionindices,finaltranspositionindices
  global lenchecktext,checktext
  inithomophone()
  for p in usertext:
    assert p in ptalpha
  lenusertext=len(usertext)
  lenplaintext=lenusertext+lenchecktext
  lenciphertext=2*lenplaintext
  computetranspositionindices()
# Preproossing of usertext.
  usertext=preprocessing(usertext,0)
  plaintext=usertext+checktext
  plaintextindiceslist=[ptalpha.index(plaintext[i])
                        for i in range(lenplaintext)]
# Do homophonic substitution. The ciphertext letters of the letter pairs thus
# obtained are, after chaining (via their indices), collected in
# chainedctletterslist.
  chainedctletterslist=[]
  for i in range(lenplaintext):
# pi is index of the plaintext letter in ptalpha.
    pi=plaintextindiceslist[i]
# Find the corresponding number of homophones.
    numbhomo=numbhomophones[pi]
# Make a pseudo-random choice.
    choice=random.randint(0,numbhomo-1)
# Obtain the indices of the corresponding homophonic pair of letters of ctalpha.
    [ct0i,ct1i]=mappingptindicestoctindicespairs[pi][choice]
# Chaining processing. (chaining in effect sums up all previously processed
# letters.)
    chainedct0i=chaining=(ct0i+chaining)%lenctalpha
    chainedctletterslist.append(ctalpha[chainedct0i])
    chainedct1i=chaining=(ct1i+chaining)%lenctalpha
    chainedctletterslist.append(ctalpha[chainedct1i])
  ciphertext="".join(chainedctletterslist)
# Postprocessing of ciphertext.
  ciphertext=postprocessing(ciphertext,0)
  return(ciphertext)


# Decryption processing of ciphertext, the reverse of encrypt().

def decrypt(ciphertext):
  global ptalpha,ctalpha,lenptalpha,lenctalpha
  global lenusertext,lenplaintext,lenciphertext
  global numbhomophones
  global mappingptindicestoctindicespairs,mappingctindicespairstoptindices  
  global chaining
  global initialtranspositionindices,finaltranspositionindices
  global lenchecktext,checktext
  inithomophone()
  for c in ciphertext:
    assert c in ctalpha
  lenciphertext=len(ciphertext)
  assert lenciphertext%2==0
  lenplaintext=lenciphertext//2
  ciphertext=postprocessing(ciphertext,1)
  chainedctletterslist=list(ciphertext)
  ptletterslist=[]
  for i in range(0,lenciphertext,2):
    chainedct0i=ctalpha.index(chainedctletterslist[i])
    ct0i=(chainedct0i-chaining)%lenctalpha
    chaining=chainedct0i
    chainedct1i=ctalpha.index(chainedctletterslist[i+1])
    ct1i=(chainedct1i-chaining)%lenctalpha
    chaining=chainedct1i
    pi=mappingctindicespairstoptindices[ct0i][ct1i]
    ptletterslist.append(ptalpha[pi])                         
  plaintext="".join(ptletterslist)
# Check whether the recovered checktext is a correct one.
  recoveredchecktext=plaintext[-lenchecktext:]
  if recoveredchecktext!=checktext:
    print("Error: Integrity check failed *********")
    exit(333)
# Extract the decrypted usertext and return it.
  recoveredusertext=plaintext[:-lenchecktext]
  recoveredusertext=preprocessing(recoveredusertext,1)
  return(recoveredusertext)



################################################################################



# Installation of the software.

# Both communication partners have to download the same version 3x of Python
# from http://www.python.org. (Employing the same version of Python ensures
# against any potentially possible incompatibilities among different versions.)
# The present code can be stored in a file named e.g. homophonesp.py and the
# examples given further below run in Python's GUI IDLE. (File --> Open to find
# and open the file, then in the window showing the code Run --> Run Module to
# run it. One could also type homophonesp.py in a DOS-window.) Modifications
# of the code in the code window, e.g. the plaintext string, can be done online
# and the code re-run.



################################################################################



# Illustrative example.


# The communication partners have to specify the same (identical) system
# parameter values, whose appropriate selection is users' own responsibility.
# We assume for the present illustrative example the following system
# parameter values:

# Parameters for forming a session-dependent seed for Python's built-in PRNG.
# secretkey is an integer in hexadecimal or decimal format, sessionkeyextension
# is a text string. See Epilogue.

secretkey=0x4a98717c70e997c2715e87c3fa1ef559

sessionkeyextension="SEL 15062017 RDI RN380"


# Plaintext alphabet.

ptalpha="abcdefghijklmnopqrstuvwxyz"

# Corresponding plaintext letter frequencies in percent (from [2, fig.7.5]).

ptfrequencypercent=\
  [8.04,1.54,3.06,3.99,12.51,2.30,1.96,5.49,7.26,0.16,0.67,4.14,
   2.53,7.09,7.60,2.00,0.11,6.12,6.54,9.25,2.71,0.99,1.92,0.19,1.73,0.10]

# [2] A. J. Menezes et al., Handbook of Applied Cryptography, CRC Press, 1996.

# Ciphertext alphabet. (Choice of its size may be assisted via printout of
# numbhomophones after a test run of encryption processing.)

ctalpha="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLM"

# Length of checktext for purpose of integrity check.

lenchecktext=10



# Sender specific code lines:

print("Sender:")
print()

usertext="thisisanexampleusertextforourhomophoneencryptionscheme"

print("usertext:")
print(usertext)
print()

ciphertext=encrypt(usertext)

print("ciphertext:")
print(ciphertext)
print()
print()

writestringfile("ciphertext",ciphertext)

# Sender sends file ciphertextstring.txt to receiver (or a text string to be
# assigned to a variable ciphertext by the receiver).



# Receiver specific code lines:

print("Receiver:")
print()

ciphertext=readstringfile("ciphertext")

recoveredusertext=decrypt(ciphertext)

print("recoveredusertext:")
print(recoveredusertext)



################################################################################



# Appendix.


# Users who are new to Python or who begin to use a new version of the Python
# software may like to know a bit about the statistical qualities of its
# built-in PRNG which is employed in HOMOPHONE-SP. The following code is
# intended to serve for that purpose. prnbitn is the number of bits of the PRNs
# to be generated and must be a multiple of 8. testseed is any arbitrary seed
# chosen for running the test. One may like to try a number of different
# prnbitn and testseed.

import math,random

# Maurer's Universal Test, see [6].
#
# [6] J-S. Coron, D. Naccache, An Accurate Evaluation of Maurer's Universal Test.
#     http://www.jscoron.fr/publications/universal.pdf


qq=2560
qqp1=qq+1
kk=256000
qqkkp1=qq+kk+1


def maurertest(bb):
  global qq,qqp1,kk,qqkkp1
  eftu=7.1836656
# y1 and y2 are for rho=0.01 and rho=0.001 respectively.
  y1=2.5758
  y2=3.2905
  t=[0 for i in range(256)]
  for i in range(1,qqp1,1):
    t[bb[i]]=i
  sum=0.0
  for i in range(qqp1,qqkkp1,1):
    sum+=math.log10(i-t[bb[i]])
    t[bb[i]]=i
  tu=(sum/kk)/math.log10(2.0)
  c=math.sqrt(0.3732189+(0.3730195*256)/kk)
  sigma=c*math.sqrt(3.2386622/kk)
  t11=eftu-y1*sigma
  t12=eftu+y1*sigma
  t21=eftu-y2*sigma
  t22=eftu+y2*sigma
  return(tu,t11,t12,t21,t22)


def maurertestresult(h,gg):
  global chnum
  global qq,qqp1,kk,qqkkp1
  if h*chnum < qqkkp1:
    print("Error in maurertestresult")
    exit(6)
  bb=[0 for k in range(h*chnum)]
  u=0
  k1=-1
  k2=chnum-1
  for i in range(h):
    g=gg[i]
    for k in range(k2,k1,-1):
      bb[u]=g&0xff
      g>>=8
      u+=1
    k1+=chnum
    k2+=chnum
  tu,t11,t12,t21,t22 = maurertest(bb)
  print("Maurer's Universal Test for L=8, rho=0.01 (Middle value is the "\
        "test statistic\nand should lie between the other two values): "\
        "%6.3f %6.3f %6.3f"%(t11,tu,t12))


def randtest(prnbitn,testseed):
  global chnum
  if prnbitn<8 or (prnbitn%8)!=0:
    print("Error randtext: Wrong prnbitn")
    exit(55555)
  chnum=prnbitn//8
  h=qqkkp1//chnum
  if h*chnum < qqkkp1:
    h+=1
  random.seed(testseed)
  tpwbitnm1=2**prnbitn-1
  print("PRNs generated with random.randint():")
  gg=[random.randint(0,tpwbitnm1) for i in range(h)]
  maurertestresult(h,gg)
  print("PRNs generated with random.getrandbits():")
  gg=[random.getrandbits(prnbitn) for i in range(h)]
  maurertestresult(h,gg)
  return


# Example of user chosen test parameter:

prnbitn=128

testseed="My arbitrarily chosen seed"

print()
print()
print("Test of Python's PRNG, parameters used:")
print("prnbitn:",prnbitn)
print("testseed:",testseed)
randtest(prnbitn,testseed)



################################################################################



# Epilogue.


# We presume that the computer, on which this software is run, is free from
# malware infection via software and/or hardware means and that there are no
# emission security risks (which could be manifold in practical situations).

# In the way personally preferred by the present author, the parameter
# "sessionkey" employed to seed Python's PRNG at initialization time consists of
# two components: "secretkey", which should have sufficient entropy, e.g.
# stemming from dice throws, and which could be used for a certain longer time
# period before being renewed, and "sessionkeyextension", which is dynamic, i.e.
# different for different sessions and which need not be secret (since it is its
# variability that is important in the present context), being composed of
# certain variable session-dependent data, e.g. date, time, subject, message
# serial number, etc. In case there is no systematic scheme of derivation of
# "sessionkeyextension" to be used by the receiver, it can be sent in the clear
# to him. In view of the high dynamics of our sessionkey and the comparatively 
# low volumes of communications of our targeted users and the fact that the PRNs
# are not directly used to process (e.g. via xor) the plaintext but indirectly
# to treat it via certain pseudo-random operations, our use of Python's
# built-in PRNG is obviously entirely secure in practice. For a utility to 
# convert dice throws to a hexadecimal sequence, see DICE.

# Users who have also interest in public key cryptography may note that the
# present author has a Python code of RSA key generation and encryption in his
# software PROVABLEPRIME, which contains some comments of general interest.

# Earlier the present author has written a software HOMOPHONE which does not
# incur data expansion (measured in bytes) but whose ciphertext is binary and
# thus unfortunately not readable. This explains the use of a suffix in the
# name of the present software.

# In the case of our illustrative example, print(numbhommophones) will give:
# [122, 23, 47, 61, 191, 35, 30, 84, 110, 2, 10, 63, 38, 108, 116, 30, 2, 93,
# 99, 141, 41, 15, 29, 3, 26, 2]



Return to main page