WORDLISTTEXTSTEGANOGRAPHY





# WORDLISTTEXTSTEGANOGRAPHY, software for hiding secret information bits in
# natural language texts with the help of an extensive word list.


# Prologue.


# While steganography with images or audio media as cover is highly developed
# in our digital era [1, 2] (partly due to practical interest in the closely
# related issue of watermarking), there appear to have been barely any really
# significant advances in steganography with natural language texts as cover.
# In format-based text steganography, modern pixel-based printing certainly
# makes it easy to render minute modifications of the printout undetectable to
# the naked eye. However, if the digital file of a document is available for
# examination, the fact that such modifications occurred is at least highly
# susceptible to detection. The present author's EMAILSTEGANO exploits instead
# the relative freedom in the number of words in the lines of emails to
# transmit 1 stegobit per line (i.e. number of words mod 2), which should
# avoid detection well. However, the stego bitrate of that scheme is fairly
# low, although with a sufficient number of emails an arbitrarily long
# stegobit sequence could in principle be transferred that way. In the quest
# for higher stego bitrates, the present software therefore exploits an idea
# (possibly somewhat novel) that belongs to the other subfield of text
# steganography, namely linguistic steganography. In linguistic steganography
# there has for quite some time been work in the direction initiated by
# P. Wayner's paper on "mimic functions". However, in our view it is by nature
# difficult for texts generated by machines alone to achieve high naturalness.
# Hence we leave, on the contrary, the task of composing the cover text mainly
# to the user, i.e. the user writes the words into his cover text by trial and
# error, guided dynamically by system error messages, such that these words
# carry the desired stegobits and the resulting text is nonetheless
# sufficiently natural, at least in the judgement of the user.
#
# [1] J. Fridrich, Steganography in Digital Media. Cambridge, 2010.
# [2] R. Boehme, Advanced Statistical Steganalysis. Berlin, 2010.

# Basically our idea is the following: In common texts any sentence may have
# well acceptable alternative formulations, such that two alternative sentences
# may differ (1) in the words being used and (2) in the positions at which
# identical words occur in the sentences. We divide a chosen extensive word
# list into two sublists (the words entering the sublists are pseudo-randomly
# chosen via a PRNG seeded with dynamic session key materials). Words in
# wordlist0 carry the stegobit 0 and words in wordlist1 carry the stegobit 1.
# Words outside of these two lists carry no stegobits. Whenever a stegobit is
# to be embedded but the user happens to write a word that carries the wrong
# stegobit, the system issues an error message. The user then tries an
# appropriate alternative word to avoid that error or, if that's not
# successful after some effort, backs up, i.e. cancels one or more words
# written earlier, in order to better reach the goal of embedding all given
# stegobits in the final cover text.
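
# The partition into two sublists can be sketched as follows. This is only a
# minimal illustration under stated assumptions, not the code of this
# software: the names splitwords() and bitofword(), the toy word list and the
# seed value are hypothetical. A private random.Random instance is used so
# that the sketch doesn't disturb the state of the global PRNG.

```python
import random

def splitwords(words, seed):
    """Pseudo-randomly assign each word the bit 0 or 1 (hypothetical helper)."""
    rng = random.Random(seed)      # same seed -> same partition on both sides
    bitof = {}
    for w in sorted(words):        # fixed order, so sender and receiver agree
        bitof[w] = rng.getrandbits(1)
    return bitof

def bitofword(bitof, word):
    """Return 0 or 1 for a listed word, None for a word carrying no stegobit."""
    return bitof.get(word.lower())

# Toy data for illustration only (assumed, not taken from wordsEn.txt).
toywords = ["arrive", "good", "lecture", "remain", "room", "reserve"]
bitof = splitwords(toywords, seed=12345)
for w in ["Room", "good", "the"]:
    print(w, bitofword(bitof, w))
```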

# Certainly there is the universal "principle of no free lunch". The price of
# thus embedding stegobits into a cover text that is to be as natural as
# possible is some amount of hard work in composing the text and the
# corresponding time involved, whereas EMAILSTEGANO of the present author runs
# automatically. As an example run to be done by the user further below
# plausibly shows, the task should nevertheless be practically feasible,
# though several factors apparently influence the amount of work required: the
# size and/or content of the word list used, the content of excludewordlist
# (see next paragraph), the chosen theme of the cover text, the user's
# experience with this software and the user's proficiency in the language.

# In the code below there is a list named excludewordlist, which contains words
# that are to be excluded from the given word list so that the probability of
# reaching an impasse in composing the cover texts is greatly reduced. Members
# of this list have been chosen on the basis of the limited experience so far
# obtained by the present author in a small number of test runs, and hence
# the list may conceivably need substantial modification in the future in
# order to enable better performance in text composition. It is hoped that
# users will attempt their own modifications of the list as they gain
# experience with the software and will kindly inform the present author of
# the best modifications they have found, so that all other users can profit
# from their reports in a future version of this software.

# We assume that the warden (adversary) is passive, i.e. the messages of the
# sender will not be manipulated, and further that there are no technical
# transmission errors.

# A good thesaurus, e.g. Webster's New World Thesaurus, could be very helpful in
# composing the cover texts.

# In practice it appears to be unproblematic to achieve a stegobit embedding
# rate in the range of [0.5, 1.0] per word of cover text. This is supported by
# an example given further below, which also shows that alternative cover texts
# of entirely different themes can be written to satisfactorily carry the same
# stegobit sequence.
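
# As a rough illustration of what this rate means for the length of a cover
# text: n stegobytes are 8*n stegobits, hence at a rate of r stegobits per word
# about 8*n/r words are needed. The helper coverwordsneeded() below is
# hypothetical and not part of this software; it merely does this arithmetic.

```python
import math

def coverwordsneeded(nstegobytes, rate):
    """Approximate number of cover-text words for nstegobytes at rate bits/word."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return math.ceil(nstegobytes * 8 / rate)

# Estimates for 2 stegobytes at the two ends of the range [0.5, 1.0].
print(coverwordsneeded(2, 1.0), coverwordsneeded(2, 0.5))
```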

# To run this software one needs, besides Python and its IDLE, an extensive
# English word list, for which we recommend the content of wordsEn.txt in:
# http://www-01.sil.org/linguistics/wordlists/english/

# In order to familiarize users with the functionality of this software, a
# detailed description of some example runs is given further below. It is
# recommended that users complete these exercises before employing their own
# sessionkey materials and values of the parameter lstegobytes to actually
# communicate with each other, with their privacy hopefully satisfactorily
# protected by the present software.


# Version 2.1, released on 13.04.2017.

# Update notes:

# Version 1.0: Released on 28.11.2016.

# Version 1.0.1: 17.12.2016: Addition of a function to check the uniqueness of
# entries of the word list file in case the user for the first time employs a
# word list file that is different from the one we recommend above and the
# uniqueness of entries of the file is unknown.

# Version 1.0.2: 20.12.2016: checkwordlistfile() is now more efficient.

# Version 2.0: 28.03.2017: (1) Addition of software installation instructions.
# (2) Earlier versions required the sender to store the stegobytes into a file
# before proper processing work can be done. Now the stegobytes are entered 
# online and there is no information of the stegobytes left behind on the
# computer when the session is closed. (3) In previous versions, at the 
# beginning of the details of example runs, for the execution of the statements:
# by=bytearray([103,175])
# writebinaryfile(by,"stegobytes")
# in the Python shell window, it was necessary to first exit from the GUI
# window. A sentence for that was unfortunately missing but that's no longer
# relevant in Version 2.0, since the stego bytes are entered online. (4) Code
# lines of checkwordlistfile() are now merged into buildwordlists().

# Version 2.1: 13.04.2017: Version 2.0 ran ok under Python Version 3.5. Under
# the newest Python Version 3.6.1, however, the functions of tkinter.messagebox
# must be called with their fully qualified names, e.g.
# tkinter.messagebox.showerror(); the unqualified short form is not accepted
# (even under "from tkinter import *").


# Code lines of documents with the same version number are always identical.
# There may be interim modifications of comment lines. 


# This software may be freely used:

# 1. for all personal purposes unconditionally and

# 2. for all other purposes under the condition that its name, version number
#    and authorship are explicitly mentioned and that the author is informed of
#    any code modifications made.


# This software is dedicated to secure communications between activists of
# non-democratic countries and their helpers.


# A list of the present author's software that is currently maintained directly
# by himself is available at http://mok-kong-shen.de. Users are advised to
# download such software from that home page only.


# The author is indebted to TPS and CWL for review and suggestions throughout
# WORDLISTTEXTSTEGANOGRAPHY's development phase. Any remaining deficiencies of
# the software are however the sole responsibility of the author.


# Constructive critiques, comments and suggestions of extensions and
# improvements are sincerely solicited. Address of the author:
#
# Email: mok-kong.shen@t-online.de
#
# Mail: Mok-Kong Shen, Postfach 340238, Munich 80099, Germany
#
# (Sender data optional, if no reply is required.)



################################################################################



import random


# Output string, a text string, to a file. File name is the string of the
# formal parameter filename extended by ".txt".

def writestringfile(filename,string):
  f=open(filename+".txt","w")
  f.write(string)
  f.close()
  return


# The inverse of writestringfile().

def readstringfile(filename):
  f=open(filename+".txt","r")
  string=f.read()
  f.close()
  return(string)


# Write a byte sequence to a binary file.

def writebinaryfile(byarray,filename):
  fp=open(filename+".bin","wb")
  fp.write(byarray)
  fp.close()
  return


# The inverse of writebinaryfile().

def readbinaryfile(filename):
  fp=open(filename+".bin","rb")
  byarray=bytearray(fp.read())
  fp.close()
  return(byarray)


# Read a word list from a file to wlist.

def readtextfiletowlist(filename):
  f=open(filename+".txt","r")
  wlist=f.read().split()
  f.close()
  return(wlist)


# Write wlist to a word list file. (Not used, for completeness only.)

def writewlisttotextfile(wlist,filename):
# Avoid the name str, which would shadow Python's built-in type.
  string=" ".join(wlist)
  f=open(filename+".txt","w")
  f.write(string)
  f.close()
  return


# Find index of word in wordlist. We use a binary search, since wordlist is in
# strictly ascending order. (If word is not in wordlist, a negative value is
# returned which, after negation, gives the location in wordlist where the
# given word could have had its place; -1 is returned if that location is 0.)

def findindex(wordlist,lwordlist,word):
  if lwordlist==0:
    return(-1)
  low=0
  high=lwordlist-1
  if word<=wordlist[low]:
    if word==wordlist[low]:
      return(low)
    else:
      return(-1)
  elif word>=wordlist[high]:
    if word==wordlist[high]:
      return(high)
    else:
      return(-lwordlist)
# Here wordlist[low] < word < wordlist[high] holds and is maintained below.
  while high>low+1:
    mean=(high+low)//2
    wlmean=wordlist[mean]
    if word==wlmean:
      return(mean)
    elif word>wlmean:
      low=mean
    else:
      high=mean
  return(-high)
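

# For comparison, the same kind of lookup can be sketched with the bisect
# module of Python's standard library. This is an illustrative alternative, not
# the function used by this software; the name findindexbisect() is
# hypothetical, and a missing word is reported here as -(insertion position)-1,
# so that the result is always negative in that case.

```python
import bisect

def findindexbisect(wordlist, word):
    """Index of word in the ascending sorted wordlist, or a negative value."""
    i = bisect.bisect_left(wordlist, word)
    if i < len(wordlist) and wordlist[i] == word:
        return i
    return -i - 1

# Toy list for illustration only.
sample = ["apple", "pear", "plum"]
print(findindexbisect(sample, "pear"), findindexbisect(sample, "fig"))
```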
  
    
# Users may profitably enlarge includewordlist, if the cover texts often employ
# terms of certain special fields of knowledge that are not in the word list
# used. This list need not be in sorted order.

includewordlist=[]


# Words to be excluded from wordlist; they need not be in sorted order. See
# Prologue.

excludewordlist=\
['a','after','am','an','and','are','as','at',
 'be','being','but','by',
 'can','could',
 'do','did','does','doing','done',
 'each','else',
 'for','from',
 'going',
 'he','her','how',
 'i','if','in','into','is',
 'me','my',
 'not',
 'of','on','one','only','other','or','our','out',
 'shall','she','should',
 'than','that','this','the','their','then','they','to',
 'was','were','what','when','where','which','who','whom','whose',
 'why','will','with','would',
 'you','your']
 

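# Read a word list file and check the uniqueness of its entries. (Not called
# in the code below, which does the same work in buildwordlists().)

def readwordlistfile(wordlistfilename):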
  wordlist=readtextfiletowlist(wordlistfilename)
  lwordlist=len(wordlist)
  for i in range(lwordlist):
    wordlist[i]=wordlist[i].lower()
  wordlist.sort()
  for i in range(0,lwordlist-1):
    if wordlist[i]==wordlist[i+1]:
      print("Error: word list file",wordlistfilename,\
            "has doubled entries.   ###############")
      exit(111)  
  return(wordlist)


# Building wordlists.

def buildwordlists():
  global wordlist,wordlist0,wordlist1
  global lwordlist,lwordlist0,lwordlist1
  global includewordlist,excludewordlist
  global wordlistfilename
  excludewordlist1=['#','\n','(','.',',',';','!','?',':','-',')']
# The entries in the file with wordlistfilename must be unique, otherwise the
# program will be aborted. For optimal performance these entries should be
# sorted. (For example, wordsEn.txt is sorted.)
  wordlist=readtextfiletowlist(wordlistfilename)
  lwordlist=len(wordlist)
  for i in range(lwordlist):
    wordlist[i]=wordlist[i].lower()
  wordlist.sort()
  for i in range(0,lwordlist-1):
    if wordlist[i]==wordlist[i+1]:
      print("Error: word list file has doubled entries", wordlist[i],\
            ". ###############")
      exit(222)
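  for w in includewordlist:
    w=w.lower()
    if w not in wordlist:
      wordlist.append(w)
  for w in excludewordlist+excludewordlist1:
    w=w.lower()
    if w in wordlist:
      wordlist.remove(w)
# Restore the ascending order required by the binary search in findindex(),
# since words of includewordlist were appended at the end.
  wordlist.sort()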
  wordlist0=[]
  wordlist1=[]  
  for w in wordlist:
    if random.getrandbits(1)==0:
      wordlist0.append(w)
    else:
      wordlist1.append(w)
  lwordlist=len(wordlist)
  lwordlist0=len(wordlist0)
  lwordlist1=len(wordlist1)
  return


################################################################################


# Specification of the GUI.


from tkinter import *
import tkinter.scrolledtext
import tkinter.messagebox


def command1():
  global userkind
  global stegobytes,lstegobytes
  global stegobits,lstegobits
  global sendermode
  if userkind!=1:
    print("Error: Receiver should not use this button")
    tkinter.messagebox.showerror("",
                                 "Error: Receiver should not use this button")
    return
  sendermode=1
  T1.delete('1.0',END)
  T2.delete('1.0',END)
  suc=0
  while suc==0:
    print(lstegobytes,"stegobyte values will be needed:")
    blist=[]
    suc1=1
    for i in range(lstegobytes):
      u=input("Input an integer in [0,255]: ")
      # Reject non-integer input and values outside of [0,255].
      try:
        u=int(u)
      except ValueError:
        suc1=0
        break
      if u<0 or u>255:
        suc1=0
        break
      blist.append(u)
    if suc1==0:
      print("Error: invalid value, please input all values again")
      continue
    print(blist)
    yesno=input("Is the above list of values correct? Answer: 1=Yes, 0=No :")
    if yesno=="1":
      stegobytes=bytearray(blist)
      suc=1
  stegobits=[]
  for i in range(lstegobytes):
    u=stegobytes[i]
    bits=[]
    for j in range(8):
      bits.append(u&1)
      u>>=1
    bits.reverse()
    stegobits+=bits
  lstegobits=lstegobytes*8
  print("Input stegobytes done")
  return


def command2():
  global userkind
  global sendermode
  if userkind!=1:
    print("Error: Receiver should not use this button")
    tkinter.messagebox.showerror("",
                                 "Error: Receiver should not use this button")
    return
  if sendermode==0:
    tkinter.messagebox.showerror("","Press first Input stegobytes button")
    return
  bb=tkinter.messagebox.askyesno("","Read covertext.txt into user text "\
                                 "editing area, ok?")
  if bb:
    uu=readstringfile("covertext")
    T1.delete('1.0',END)
    T1.insert('1.0',uu)   
    print("Read covertext.txt done")
    checkinput()
  return


def command3():
  global userkind
  global sendermode
  if userkind!=1:
    print("Error: Receiver should not use this button")
    tkinter.messagebox.showerror("",
                                 "Error: Receiver should not use this button")
    return
  global textok
  if sendermode==0:
    tkinter.messagebox.showerror("","Press first Input stegobytes button")
    return
  checkinput()
  if textok:
    by=extraction()
    if stegobytes!=by:
      tkinter.messagebox.showerror("","System error   ###############")
      print("System error   ###############")
      exit(333)
  else:
    tkinter.messagebox.showwarning("",
          "Embedding of stegobits not yet satisfactory   ###############")
    print("Embedding of stegobits not yet satisfactory   ###############")    
  bb=tkinter.messagebox.askyesno("","Write user text editing area to "\
       "covertext.txt, ok?")
  if bb:
    uu=T1.get('1.0',END+'-1c')
    writestringfile("covertext",uu)
    print("Write covertext.txt done")
  return


# Extraction of the stegobits from the text in the user text editing area.

def extraction():
  global wordlist,wordlist0,wordlist1
  global lwordlist,lwordlist0,lwordlist1
  global stegobytes,lstegobytes
  lstegobits=lstegobytes*8
  uu=T1.get('1.0',END+'-1c')
  tt=uu.replace("\n"," # ")
  wlist=tt.split()
  swlist=[]
  for w in wlist:
    sw=w.lower()
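    if sw[0]=='(':
      sw=sw[1:]
# Test for an empty sw first (e.g. after a lone "("), since sw[-1] would
# otherwise raise an IndexError.
    while sw!='' and sw[-1] in ['.',',',';','!','?',':','-',')']:
      sw=sw[:-1]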
    if sw!='':
      swlist.append(sw)
  stegobitsrecovered=[]  
  nbits=0
  for i in range(len(swlist)):
    sw=swlist[i]
    if findindex(wordlist,lwordlist,sw)>=0:
      if findindex(wordlist0,lwordlist0,sw)>=0:
        stegobitsrecovered.append(0)
      else:
        stegobitsrecovered.append(1)
      nbits+=1
      if nbits==lstegobits:
        break
  if nbits < lstegobits:
    tkinter.messagebox.showerror("",
          "Error: wrong/manipulated file materials and/or parameters"\
          "   ###############")                  
    print("Error: wrong/manipulated file materials and/or parameters"\
          "   ###############")
    exit(444)
  k=0
  k1=8
  by=bytearray(0)
  for i in range(lstegobytes):
    u=0
    for j in range(k,k1):
      u<<=1
      u|=stegobitsrecovered[j]
    by+=bytearray([u])
    k=k1
    k1=k+8  
  return(by)
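

# The conversion between stegobytes and stegobits (most significant bit of each
# byte first) can be sketched in isolation as follows. The names bytestobits()
# and bitstobytes() are hypothetical helpers for illustration only; the byte
# values 103 and 175 are those of the example runs of this document.

```python
def bytestobits(by):
    """Expand each byte into 8 bits, most significant bit first."""
    return [(b >> k) & 1 for b in by for k in range(7, -1, -1)]

def bitstobytes(bits):
    """Pack a bit list (length a multiple of 8, MSB first) into a bytearray."""
    assert len(bits) % 8 == 0
    by = bytearray()
    for k in range(0, len(bits), 8):
        u = 0
        for bit in bits[k:k + 8]:
            u = (u << 1) | bit
        by.append(u)
    return by

bits = bytestobits(bytearray([103, 175]))
print(bits)
print(list(bitstobytes(bits)))
```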


def command4():
  bb=tkinter.messagebox.askyesno("","Read covertext.txt into user text "\
       "editing area, ok?")
  if bb:
    uu=readstringfile("covertext")
    T1.delete('1.0',END)
    T1.insert('1.0',uu)
    T2.delete('1.0',END)
    print("Read covertext.txt done")
  else:
    return
  by=extraction()
  bb=tkinter.messagebox.askyesno("","Write stegobytesrecovered.bin, ok?\n") 
  if bb:
    writebinaryfile(by,"stegobytesrecovered")
    print("Write stegobytesrecovered.bin done")
  return


# Check the embedding of given stego bits with the text in the user text editing
# area.

def checkinput():
  global wordlist,wordlist0,wordlist1
  global lwordlist,lwordlist0,lwordlist1
  global stegobits,lstegobits
  global textok
  textok=False
  uu=T1.get('1.0',END+'-1c')
  if uu.count("#")!=0:
    tkinter.messagebox.showerror("","Remove # which is not permitted")
  tt=uu.replace("\n"," # ")
  wlist=tt.split()
  swlist=[]
  for w in wlist:
    sw=w.lower()
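    if sw[0]=='(':
      sw=sw[1:]
# Test for an empty sw first (e.g. after a lone "("), since sw[-1] would
# otherwise raise an IndexError.
    while sw!='' and sw[-1] in ['.',',',';','!','?',':','-',')']:
      sw=sw[:-1]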
    if sw!='':
      swlist.append(sw)
  stbp=0
  numberr=0
  wordcount=0
  textwordcount=0
  for i in range(len(swlist)):
    sw=swlist[i]
    if sw!="#":
      textwordcount+=1
    if findindex(wordlist,lwordlist,sw)>=0:
      wordcount+=1
      if stbp < lstegobits:
        stb=stegobits[stbp]
        if (stb==0 and findindex(wordlist0,lwordlist0,sw)<0) or\
           (stb==1 and findindex(wordlist1,lwordlist1,sw)<0):
          swlist[i]=sw.upper()
          numberr+=1
        stbp+=1
  lswlist=len(swlist)
  for i in range(lswlist):
    if swlist[i]=="#":
      swlist[i]="\n"
  remainingbits=lstegobits-stbp     
  vv=' '+' '.join(swlist)+"\n\nstegobits embedded: "+str(stbp)+\
     "   number of errors: "+str(numberr)+"   remaining stego bits: "+\
     str(remainingbits)
  if numberr!=0:
    vv+="\n\nAttempt first to correct in user text editing area the first "\
        "error indicated\nhere by a word in all capital letters via revising "\
        "that word and/or words in\nits neighbourhood   ###############\n"
  vv+="\ntext wordcount: "+str(textwordcount)
  if remainingbits==0 and numberr==0:
    vv+="   longer text not necessary   ###############"\
        "\n(cover text may be terminated now and file written and sent "\
        "to the receiver)"
    textok=True
  else:
    textok=False
  T2.delete('1.0',END)
  T2.insert('1.0',vv)
  return
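

# The normalization of the words of the user text before the lookup in the
# word lists (lower case, one leading "(" and all trailing punctuation marks
# removed) can be sketched in isolation as follows. The name normalizewords()
# and the constant TRAILING are hypothetical; the sketch leaves out the
# handling of line ends via the marker "#".

```python
TRAILING = ".,;!?:-)"

def normalizewords(text):
    """Split text into lower-case words, stripped of enclosing punctuation."""
    words = []
    for w in text.split():
        sw = w.lower()
        if sw.startswith("("):
            sw = sw[1:]
        sw = sw.rstrip(TRAILING)   # remove all trailing punctuation marks
        if sw:                     # skip tokens consisting of punctuation only
            words.append(sw)
    return words

print(normalizewords("Could you ask (kindly) for a room? Yes!"))
```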


def keyreleasefunc1(event):
  global sendermode
  if sendermode==0:
    tkinter.messagebox.showerror("","Press first Input stegobytes button")
    return
  checkinput()
  return


def keyreleasefunc2(event):
  global sendermode
  tkinter.messagebox.showerror("",
    "User should not use the system message area")
  if sendermode==1:
    checkinput()
  else:
    T2.delete('1.0',END)
  return


# Dimensions of the GUI.

t1height=12
t2height=16
width=80
bwidth=22

# A state variable.

sendermode=0

# Definition of the diverse GUI components.

root=Tk()
root.title("WORDLISTTEXTSTEGANOGRAPHY")
Label(root,text="User Text Editing Area (either direct input or "\
                "load from file covertext.txt via button)").pack()

T1=tkinter.scrolledtext.ScrolledText(root,height=t1height,width=width)
T1.pack()
T1.bind("<KeyRelease>",keyreleasefunc1)

Label(root,text="System Message Area (no editing)").pack()

T2=tkinter.scrolledtext.ScrolledText(root,height=t2height,width=width)
T2.pack()
T2.bind("<KeyRelease>",keyreleasefunc2)

T3=Frame(root)
T3.pack()

Button(T3,text="Input stegobytes",bg="yellow",width=bwidth,\
       command=command1).pack(side=LEFT)
Button(T3,text="Read covertext.txt",width=bwidth,command=command2).\
       pack(side=LEFT)
Button(T3,text="Write covertext.txt",width=bwidth,command=command3).\
       pack(side=LEFT)
Button(T3,text="Gen stegobytesrecovered.bin",bg="green",width=bwidth,\
       command=command4).pack(side=LEFT)


################################################################################


# Compute from sessionkeyextension an integer v and concatenate it to secretkey
# to form sessionkey. See Epilogue.

def formsessionkey():
  global secretkey,sessionkeyextension
  global sessionkey
  assert isinstance(secretkey,int) and isinstance(sessionkeyextension,str)
  by=bytearray(sessionkeyextension,'latin-1')
  lby=len(by)
  assert lby > 0
  v=0
  for i in range(lby):
    v<<=8
    v|=by[i]
  sessionkey=(secretkey<<(lby*8))|v
  return
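

# The same concatenation can be sketched more compactly with int.from_bytes().
# This is an illustrative equivalent under stated assumptions, not the function
# used by this software; the name sketchsessionkey() and the toy arguments in
# the usage line are hypothetical.

```python
def sketchsessionkey(secretkey, sessionkeyextension):
    """Concatenate secretkey with the bytes of sessionkeyextension (sketch)."""
    by = sessionkeyextension.encode("latin-1")
    assert len(by) > 0
    v = int.from_bytes(by, "big")      # first character is most significant
    return (secretkey << (8 * len(by))) | v

# Toy values for illustration only.
print(hex(sketchsessionkey(0x4a, "AB")))
```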


# Initialization.

def initlinguisticsteganography():
  global stegobytes,lstegobytes
  global sessionkey
  formsessionkey()
# Initialize Python's built-in PRNG.
  random.seed(sessionkey)
# Build wordlist, wordlist0 and wordlist1.
  buildwordlists()
  return


# Optional check by the sender of the correctness of processing of our
# software: after having composed a cover text without the system reporting
# errors and having written it out to covertext.txt, press the rightmost
# button, then terminate the GUI and invoke this function in the Python shell
# window.

def checkimplementation():
  global userkind
  global sendermode
  global stegobytes,lstegobytes
  assert userkind==1
  if sendermode==0:
    print("Error: No processing has been done")
    return
  by1=readbinaryfile("stegobytesrecovered")
  if stegobytes==by1:
    print("Implementation ok")
  else:
    print("Implementation wrong or there were user mistakes (e.g. editing "\
          "of cover text\nwas yet premature when the file covertext.txt "
          "being used was written)\n###############")
    print("Please report to the author in the first case")
  return



################################################################################



# An example explaining the use of the software to be tried out:


# Parameters for forming a session-dependent seed for Python's built-in PRNG.
# secretkey is an integer in hexadecimal or decimal format, sessionkeyextension
# is a text string. See Epilogue.

secretkey=0x4a98717c70e997c2715e87c3fa1ef559

sessionkeyextension="RST 28.11.2016 DC348" 


# lstegobytes is a secret parameter of communication of the session specifying
# the byte length of stego information in the cover text and like the key
# materials must be identical for both sender and receiver. lstegobytes can
# be either a constant or a session variable (with its value indicated via
# e.g. the presence of an agreed-upon keyword in the cover text, or transmitted
# through other channels).

lstegobytes=2


# userkind=1: User is sender.
# userkind=2: User is receiver.

userkind=1


# We use the words in the file wordsEn.txt.

wordlistfilename="wordsEn"


# Initialization.

initlinguisticsteganography()


# Run GUI.

root.mainloop()



################################################################################



# Example runs to become familiar with the functionality of this software:


# To explain how the sender works with this software:
#
# In the specification of session parameters above, set userkind=1.
#
# Start up Python, load and run the present program. The main GUI window pops
# up. (In the following one always clicks on the "yes" button when a small
# window pops up, for that's the right decision for these example runs.)
#
# First push the "Input stegobytes" button, then in the Python shell window type 
# in the values of the stegobytes 103 and 175 (for our examples). Then in the 
# User Text Editing Area of the main GUI window type in the following text which 
# should finally result in a system message "longer text not necessary":
#
# I am arriving Zurich on Wednesday. Could you ask your acquaintance there to
# reserve for me in a good pension near the University a room? I'll remain in
# Zurich after hearing the lectures till the following Tuesday.
#
# Now push the "Write covertext.txt" button to write out that text to a file
# covertext.txt in the directory.
#
# Next try the "Read covertext.txt" button. This should in our case change
# nothing of the main GUI window, since the same text is again read in.
#
# Next push the "Gen stegobytesrecovered.bin" button to have the stegobytes
# extracted from the cover text (in the file covertext.txt) and stored in the
# file stegobytesrecovered.bin. (In application runs the sender doesn't need to
# do this step, if he is convinced from experience of the correctness of our
# implementation.)
#
# Next exit the main GUI window by clicking the X button on its upper right
# corner.
#
# Now in the existing Python shell window type: checkimplementation(), which
# should result in "Implementation ok", confirming that the content of
# stegobytesrecovered.bin is identical to the stegobytes that the sender has
# input. (Note, though, that in this case the stegobytes of the sender are thus
# available as a file in his directory, which may need to be deleted for
# security reasons.)
#
# Finally terminate the Python run by clicking on File --> Exit.
#
# We have in the above employed a cover text (for the stegobytes chosen above)
# that is the result of trial and error done by the present author, who chose
# the words in the user text editing area such that the cover text is of
# sufficiently good quality, i.e. looks quite natural and innocuous, yet has
# all stegobits successfully embedded. As indicated in the Prologue above,
# this is unfortunately as a rule not a trivial task when applying this
# software, at least before one has gained some experience with it. So the
# user is advised now to think of an arbitrary theme and attempt to compose a
# corresponding cover text for the same stegobytes, using the same sessionkey
# materials as the present author. For that purpose one proceeds as follows:
#
# Start up Python again, load and run the present program. The main GUI window
# pops up. Push the "Input stegobytes" button, input in the Python shell window
# the same stegobytes as before and start to compose a cover text of one's own
# in the user text editing area of the GUI, following the advice appearing in
# the system message area. Words shown there in upper case letters mark the
# corresponding words in the user text editing area that don't carry the
# desired stegobits. It is thus the task of the user to compose his text such
# that no words appear in all upper case letters in the system message area.
# With enough patience success is practically almost surely guaranteed. One
# continues writing till "longer text not necessary" appears in the system
# message area, after which one could additionally write any arbitrary stuff,
# if desired. One finishes by pushing the "Write covertext.txt" button, then
# exits the GUI by clicking the X on its upper right corner and terminates the
# Python run. File covertext.txt is now ready for being sent to the receiver.
#
# In case one has to pause before being able to finish the work, push the
# "Write covertext.txt" button to store the text written so far. Later in
# a new Python session one can then, after having input the stegobytes, push
# the "Read covertext.txt" button, after which the text written in the earlier
# session should reappear in the user text editing area to be further processed,
# i.e. covertext.txt is in this case a temporary file for editing.
#
# To demonstrate that the composition of a cover text isn't a hopelessly hard
# task, the following is an alternative cover text that can be tried (for the
# same stegobytes and the same sessionkey materials):
#
# Today my earlier colleague Petersen called and wanted to be informed of
# possibly existing linguistic methods to hide informations in cover texts with
# acceptable value of embedding efficiency.


# To explain how the receiver works with this software (note that he has
# obtained from the sender the file covertext.txt, which due to our previous
# example run is now already in our directory):
#
# In the specification of session parameters above, set userkind=2.
#
# Start up Python, load and run the current program. The main GUI window pops
# up. Push the "Gen stegobytesrecovered.bin" button to generate and write out
# a file stegobytesrecovered.bin in the directory, whose content should be 
# identical to the stegobytes of the sender. (cf. the verification done by the
# sender above.) Next exit the main GUI window by clicking the X button on its
# upper right corner. Finally terminate the Python run by clicking on
# File --> Exit.

# For users' real application runs (with their own agreed-upon sessionkey
# materials and values of lstegobytes):
#
# The sender needs to use only: the "Input stegobytes" and the "Write
# covertext.txt" button, if he can finish composing cover text in one session,
# or additionally the "Read covertext.txt" button, if he requires more than one
# session to finish the work.
#
# The receiver needs to use only the "Gen stegobytesrecovered.bin" button.



################################################################################



# Installation of the software.

# (1) From http://www-01.sil.org/linguistics/wordlists/english/ get the file
# wordsEn.txt.

# (2) Both communication partners have to download the same version 3.x of
# Python from http://www.python.org. (Employing the same version of Python
# guards against potential incompatibilities among different versions.)
# The present code can be stored in a file named e.g. wordliststego.py and the
# examples given above run in Python's GUI IDLE. (File --> Open to find
# and open the file, then in the window showing the code Run --> Run Module to
# run it. One could also type wordliststego.py in a DOS-window.) Modifications
# of the code in the code window, e.g. of the session parameters, can be done
# online and the code re-run.

# (3) Note that the currently newest version of Python is Version 3.6.1, which
# should be used for Version 2.1 of this software.



################################################################################



# Epilogue.


# By its very nature, good linguistic steganography (i.e. with cover texts that
# appear entirely natural) cannot achieve very high stegobit embedding rates.
# The present software is intended for its targeted users (the common people)
# who, though sharing secret keys with their partners, are (due to surveillance
# by the wardens) unable to communicate via visibly encrypted materials (this
# is the case in particular for activists in non-democratic countries) and are
# consequently forced to accept the inconvenience of a rather low (secret)
# channel bandwidth. Our scheme nevertheless achieves a stegobit embedding rate
# in the range of [0.5, 1.0] bit per word of cover text, which should be
# acceptable in practice when sending emails that each transmit e.g. some 20
# stegobytes. It may be noted that, by exploiting the pseudo-randomness of the
# outputs of a PRNG seeded by a dynamic session-dependent secret key, the
# present scheme is not purely steganographic but in fact provides secrecy
# protection via both cryptography and steganography.
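
# The rate figures above can be turned into a rough estimate of the length of
# cover text needed. The following is only a worked arithmetic sketch; the
# function name cover_words_needed is hypothetical and not part of this
# software.

```python
import math


def cover_words_needed(n_stegobytes, bits_per_word):
    """Estimate how many cover-text words are needed for n_stegobytes."""
    # 8 stegobits per stegobyte, divided by the embedding rate per word.
    return math.ceil(8 * n_stegobytes / bits_per_word)


# 20 stegobytes at the two ends of the quoted range [0.5, 1.0] bit per word.
words_at_low_rate = cover_words_needed(20, 0.5)
words_at_high_rate = cover_words_needed(20, 1.0)
print(words_at_low_rate, words_at_high_rate)
```

# Under these assumptions an email carrying 20 stegobytes needs a cover text
# of roughly 160 to 320 words.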

# In the way personally preferred by the present author, the parameter
# "sessionkey" employed to seed Python's PRNG at initialization time consists of
# two components: "secretkey", which should have sufficient entropy, e.g.
# stemming from dice throws, and which could be used for a certain longer time
# period before being renewed, and "sessionkeyextension", which is dynamic, i.e.
# different for different sessions and which need not be secret (since it is its
# variability that is important in the present context), being composed of
# certain variable session-dependent data, e.g. date, time, subject, message
# serial number, etc. In case there is no systematic scheme of derivation of
# "sessionkeyextension" to be used by the receiver, it can be sent in the clear
# to him. In view of the high dynamics of our sessionkey, the comparatively low
# volumes of communications of our targeted users and the fact that the PRNs
# are not directly used to process (e.g. via xor) the plaintext but only
# indirectly to treat it via certain pseudo-random operations, we consider our
# use of Python's built-in PRNG to be adequately secure in practice. For a
# utility to convert dice throws to a hexadecimal sequence, see author's DICE.
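
# The composition of the sessionkey described above might be sketched as
# follows. This is an illustration under stated assumptions, not the code of
# this software or of DICE: the helper names dice_to_hex and make_sessionkey,
# the "|" separator and the example values are all hypothetical.

```python
import random


def dice_to_hex(throws):
    """Read dice throws (each 1..6) as base-6 digits and return a hex string."""
    value = 0
    for t in throws:
        if not 1 <= t <= 6:
            raise ValueError("each throw must be in 1..6")
        value = 6 * value + (t - 1)
    return format(value, "x")


def make_sessionkey(secretkey, sessionkeyextension):
    """Join the long-term secret and the per-session, non-secret extension."""
    # Assumption: plain concatenation with a separator.
    return secretkey + "|" + sessionkeyextension


# Each throw gives about 2.585 bits, so roughly 50 throws are needed for
# 128 bits of entropy. The 8 throws here are for illustration only.
secretkey = dice_to_hex([3, 6, 1, 4, 2, 5, 6, 1])
sessionkeyextension = "2017-05-01 message 7"   # hypothetical session data

# Seed a PRNG instance with the sessionkey. Both partners obtain the same
# pseudo-random sequence if and only if they use the same key materials.
rng = random.Random(make_sessionkey(secretkey, sessionkeyextension))
print(rng.getrandbits(32))
```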

# Note that the secret component of the sessionkey, which in this scheme is
# kept for longer-term use, could if necessary be transported via the author's
# lower stego bitrate scheme EMAILSTEGANO.

# In the example runs the sessionkey, sessionkeyextension and lstegobytes are
# kept the same as in the state just after download of the software. These are
# of course to be properly (dynamically) modified by the users in the sessions
# of their real applications.

# It is assumed that the user's cover text input is in normal English and that
# punctuation is limited to:  '(', '.', ',', ';', '!', '?', ':', '-', ')'
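
# How a cover text restricted to these punctuation marks could be reduced to
# a list of words is sketched below. This is not the tokenizer of this
# software: the name cover_words is hypothetical, and it is an assumption that
# each punctuation mark (including '-') acts as a word separator and that the
# words are compared in lower case.

```python
PUNCTUATION = "(.,;!?:-)"


def cover_words(covertext):
    """Split a cover text into lower-case words, dropping the punctuation."""
    # Replace every permitted punctuation mark by a blank, then split.
    table = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))
    return covertext.translate(table).lower().split()


print(cover_words("Well, we met (briefly) on Monday - it was fine!"))
# prints ['well', 'we', 'met', 'briefly', 'on', 'monday', 'it', 'was', 'fine']
```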

# It is the sender's responsibility to ensure that the cover text sent to the
# receiver is one for which, immediately before it is written to covertext.txt,
# the system message window indicates that the stegobit embedding is ok, and
# that the receiver will use the same sessionkey, sessionkeyextension and
# lstegobytes in the corresponding session.

# Since the sublists wordlist0 and wordlist1 are pseudo-randomly determined
# (dependent on session key materials), the embedded stegobits should enjoy
# sufficient security protection, provided that the amount of stegobits is not
# too large and the session key materials are appropriately chosen. (Our
# targeted users, the common people, certainly don't have huge amounts of
# stegobytes to transmit in practice.)
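
# The idea of session-dependent sublists can be illustrated with a minimal
# sketch. It is written under assumptions and is not the actual code of this
# software: the function names split_wordlist and extract_bits, the halving
# of the shuffled list and the tiny word list are all hypothetical. Only the
# names wordlist0, wordlist1 and the notion of an exclude list are taken from
# the text.

```python
import random


def split_wordlist(words, sessionkey, excludewords=()):
    """Pseudo-randomly split the usable words into wordlist0 and wordlist1."""
    rng = random.Random(sessionkey)
    # Sort first so that the result depends only on the sessionkey.
    candidates = sorted(set(words) - set(excludewords))
    rng.shuffle(candidates)
    half = len(candidates) // 2
    return set(candidates[:half]), set(candidates[half:])


def extract_bits(words_of_covertext, wordlist0, wordlist1):
    """Read one stegobit per listed word; other words carry no bit."""
    bits = []
    for w in words_of_covertext:
        if w in wordlist0:
            bits.append(0)
        elif w in wordlist1:
            bits.append(1)
    return bits


words = ["apple", "bread", "cloud", "dance", "eagle", "flame", "the", "and"]
wordlist0, wordlist1 = split_wordlist(words, "demo sessionkey",
                                      excludewords=["the", "and"])
print(extract_bits(["the", "eagle", "and", "bread"], wordlist0, wordlist1))
```

# A receiver using a different sessionkey would in general obtain different
# sublists and hence a different bit sequence from the same cover text.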

# Sender and receiver should employ the same version of Python (and of this
# software, of course), the same word list file, the same includewordlist and
# the same excludewordlist. (The example runs are done with Python 3.5.)

# Python can be downloaded from http://www.python.org. The present code can be
# stored in a file named e.g. wordliststego.py and run in Python's GUI IDLE.
# (File --> Open to find and open the file, then in the window showing the code
# Run --> Run Module to run it. On later executions, the file can be found with
# File --> Recent Files.) The code may be modified, if desired, in the code
# window and saved before being executed, but saving overwrites the version
# that existed before the modification.

# It may be remarked that in the not infrequent special case of top-secret
# communications, where the messages consist only of certain stereotyped phrases
# or sentences, one could advantageously do the following: Compile a codebook of
# size 2**n (e.g. n=8) with indices in [0, 2**n-1] for all the phrases and
# sentences that may possibly be used. In a given session, use a PRNG to
# pseudo-randomly map the values of [0, 2**n-1] via a permutation polynomial mod
# 2**n to the indices to be used in the session and compose the message as a
# concatenation of the bits of the session indices of the corresponding phrases
# and sentences. (Sender and receiver do the same and can thus use the same
# codebook.) This way, the result to be processed by the stego software would
# be only a fairly small number of bytes in size. Another example of small stego
# materials to be processed is the output of a block cipher in cases where the
# input to the block cipher consists of one or two blocks only, e.g. the input
# is a new key to be used by the communication partners.
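
# The codebook idea above might be sketched as follows. This is an
# illustration under assumptions, not part of this software: the function
# names are hypothetical and a quadratic polynomial a0 + a1*x + a2*x**2 is
# chosen for simplicity. With a1 odd and a2 even such a polynomial is a
# permutation mod 2**n, since for x != y the difference of the two values is
# (x - y) times the odd number a1 + a2*(x + y).

```python
import random


def session_index_map(n, sessionkey):
    """Map codebook indices to session indices via a permutation polynomial."""
    rng = random.Random(sessionkey)
    modulus = 2 ** n
    a0 = rng.randrange(modulus)
    a1 = rng.randrange(modulus) | 1     # force a1 to be odd
    a2 = rng.randrange(modulus) & ~1    # force a2 to be even
    return [(a0 + a1 * x + a2 * x * x) % modulus for x in range(modulus)]


def encode_phrases(phrase_indices, index_map, n):
    """Concatenate the n-bit session indices of the chosen phrases."""
    return "".join(format(index_map[i], "0{}b".format(n))
                   for i in phrase_indices)


def decode_phrases(bits, index_map, n):
    """Recover the codebook indices from the concatenated bits."""
    inverse = {session: index for index, session in enumerate(index_map)}
    return [inverse[int(bits[i:i + n], 2)] for i in range(0, len(bits), n)]


n = 8
index_map = session_index_map(n, "demo sessionkey")   # hypothetical key
message_bits = encode_phrases([17, 3, 200], index_map, n)
print(len(message_bits), decode_phrases(message_bits, index_map, n))
```

# Three phrases thus yield only 24 bits, i.e. 3 stegobytes, to be embedded.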

# The freedom to compose the cover texts under the guidance of this software
# tends naturally to increase with the size of an appropriately chosen
# excludewordlist, but the trade-off is a lower stegobit embedding rate. As
# mentioned earlier, our current excludewordlist is a preliminary one and
# certainly not the best one. Users attempting to optimize the process of
# composing cover texts through modification of that list may find it
# encouraging that the English vocabulary could in a certain sense be "reduced"
# to a set of 360 words only (see
# http://learnthesewordsfirst.com/about/research-behind-the-dictionary.html),
# which suggests in particular that the optimal excludewordlist need only be
# quite small after all.

# There is a general caveat concerning the cover texts: A powerful adversary
# may be able to check whether everything written in them is indeed true.

# Our scheme obviously may also be applied to other languages (possibly with
# certain minor modifications), provided that a word list of quality comparable
# to that of wordsEn.txt is available, i.e. one containing most words in common
# use with (disregarding upper/lower case) all the diverse grammatical forms of
# the stem words. Experts in linguistics, in particular those engaged in machine
# translation, may help to supply useful information in this respect. For
# Chinese, more adaptation work of the processing code may be required, but
# a good word list would need at most 10 K entries in view of the existence of
# the telegraphic codebook that was widely used in the pre-computer era.

# It is feasible to introduce special includewordlists for obtaining higher 
# stegobit embedding rates under favourable circumstances. For example, a
# list of 16 (or more) first names of persons can be specified therewith to code
# the 16 different values of 4 bits. However, since such lists would result in
# additional, though tiny, coding logic that the general users will have to 
# examine and since that feature can be easily coded and added in by those
# desiring it, we content ourselves with this remark.
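
# For those wishing to add this feature, the coding of 4 bits by a list of 16
# first names might be sketched as follows. The names, the fixed ordering and
# the identifiers FIRST_NAMES, NAME_TO_BITS and bits_for_name are hypothetical
# illustrations and not part of this software.

```python
FIRST_NAMES = [
    "alice", "bob", "carol", "david", "emma", "frank", "grace", "henry",
    "irene", "jack", "karen", "louis", "maria", "nick", "olivia", "peter",
]

# The position of a name in the list, written with 4 binary digits, gives
# the stegobits carried by that name.
NAME_TO_BITS = {name: format(i, "04b") for i, name in enumerate(FIRST_NAMES)}


def bits_for_name(name):
    """Return the 4 stegobits coded by a first name of the special list."""
    return NAME_TO_BITS[name.lower()]


print(bits_for_name("Karen"))   # "karen" is at position 10, i.e. 1010
```

# A single name then carries as many stegobits as four to eight ordinary
# words at the rates of [0.5, 1.0] bit per word.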

# Note that the cover texts are not required to be of the highest quality from
# the language point of view but should not differ from texts that the user
# would otherwise compose without steganography, so as to avoid the notice of
# the warden. Thus not having English as one's mother tongue may even be a
# slight advantage in the present context.

# Steganography and cryptography are by nature closely related to some extent.
# Hence we recommend that users of this software also take a look at the
# Epilogue of another software of the present author: PROVABLEPRIME, which
# contains some comments of general interest.

# For information, users could perform a test of Python's built-in PRNG using
# the code given e.g. in author's PERMPOLYSP.
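
# As a much simpler illustration of such a test (this is not the code of
# PERMPOLYSP, and the function name monobit_fraction is hypothetical), one
# could check that about half of the bits output by the PRNG are ones. Passing
# such a basic frequency check is only a necessary condition for good
# pseudo-randomness and says nothing about cryptographic strength.

```python
import random


def monobit_fraction(seed, nbits=100000):
    """Return the fraction of 1-bits among nbits pseudo-random bits."""
    rng = random.Random(seed)
    value = rng.getrandbits(nbits)
    return bin(value).count("1") / nbits


fraction = monobit_fraction("demo sessionkey")   # hypothetical seed
# For 100000 bits the standard deviation of the fraction is about 0.0016,
# so values far outside 0.5 +/- 0.01 would be suspicious.
print(fraction, abs(fraction - 0.5) < 0.01)
```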

# We presume that the computer, on which this software is run, is free from
# malware infection via software and/or hardware means and that there are no
# emission security risks (which could be manifold in practical situations).


