Form extractor from word to Excel
I had a bunch of filled-out Word documents with Word forms in them and needed the data in Excel. Initially I tried CSV, but it didn't play nice with encodings, so I decided to write directly to XLS.
 
"""
Copyright 2009 Konrads Smelkovs <konrads [at] smelkovs.com>
UTF8Recorder and UnicodeWriter come from python docs
"""
 
import sys,os,csv
import win32com.client
import pywintypes
 
 
class ExcelWriter(object):
    def __init__(self,excelfile):
        self.excelapp=win32com.client.DispatchEx('Excel.Application')
        self.excelapp.Visible=0
        self.excelapp.Application.AskToUpdateLinks=0
        self.workbook=self.excelapp.Workbooks.Add()
        if os.path.exists(excelfile): # unlink raises OSError on a missing file
            os.unlink(excelfile) #TODO: remove for release
        self.workbook.SaveAs(excelfile)
        # Only worksheet 1 is used.
        self.worksheet=self.workbook.Worksheets.Item(1)
        self.currentrow=1
 
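    def _getrow(self,row):
        """Convert integer column index to Alphabetical:
        1 -> A
        2 -> B
        ...
        26 -> Z
        27 -> AA
        """
        # NB: despite the name, the argument is a 1-based column index.
        letters=""
        while row>0:
            # Shift to 0-based so that multiples of 26 map to Z.
            row,rem=divmod(row-1,26)
            letters=chr(ord('A')+rem)+letters
        return letters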
        
    def __del__(self):
        self.workbook.Save()
        self.workbook.Close()
        self.excelapp.Quit()
 
    def writerow(self,data):
        for col in xrange(1,len(data)+1):
            # A1-style address such as "B3"; named cellref so it does not shadow range()
            cellref=self._getrow(col)+str(self.currentrow)
            print >>sys.stderr,"Range: %s"  % cellref
            cell=self.worksheet.Range(cellref)
            cell.Value=data[col-1]
        self.currentrow+=1
        
def main():
    if len(sys.argv)<3:
        print "Usage: %s <directory> <outfile.xls>" % sys.argv[0]
        print "Where <directory> - directory containing Word docs with forms"
        print "and <outfile.xls> - Excel file where to put results"
        sys.exit(-1)
    directory=os.path.abspath(sys.argv[1])
    wordapp = win32com.client.Dispatch("Word.Application")
    wordapp.Visible=0 # Hide word app
    results=[]
    for docfile in os.listdir(directory):
        thisdocresults=[]
        if docfile.endswith(".doc") or docfile.endswith(".docx"):
            print >> sys.stderr, "Processing %s" % docfile
            worddoc=wordapp.Documents.Open(os.path.join(directory,docfile))
            # COM collections are 1-based.
            for i in range(1,worddoc.FormFields.Count+1):
                try:
                    form=worddoc.FormFields.Item(i)
                    name=form.Name
                    value=form.Result
                    thisdocresults.append((name,value))
                    try:
                        print >>sys.stderr, "%s: %s" % (name,value)
                    except UnicodeEncodeError,e:
                        # Only the console output failed; the value is still kept.
                        print >>sys.stderr, "Error encoding output,%s" % e
                except pywintypes.com_error,e:
                    print >>sys.stderr, "Exception: %s" % str(e)
            results.append(thisdocresults)
            worddoc.Close()
    wordapp.Quit()
    writer=ExcelWriter(os.path.abspath(sys.argv[2]))
    print >>sys.stderr,"Writing to Excel"
    for docres in results:
        # One spreadsheet row per document: values only, in form-field order.
        data=[]
        for (n,v) in docres:
            data.append(v)
        writer.writerow(data)
 
if __name__=="__main__":
    main()
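
The conversion from a column number to Excel column letters is the one piece of this script that can be checked without Word or Excel installed. Below is a minimal standalone sketch of that conversion. The function name `column_letters` is my own and is not part of the script above. It treats the letters as a base-26 system without a zero digit, so 26 maps to Z and 27 to AA.

```python
def column_letters(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Shift to 0-based so that 26 maps to 'Z' instead of rolling over.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


if __name__ == "__main__":
    # Print a few boundary cases around the one-letter/two-letter switch.
    for n in (1, 26, 27, 52, 53, 702, 703):
        print("%d -> %s" % (n, column_letters(n)))
```

Multiples of 26 (26, 52, 78, ...) are the cases worth checking by hand, since they are where a plain division and modulo approach tends to go wrong.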

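The script collects (name, value) pairs for every form field but writes only the values, so columns line up only when every document has the same fields in the same order. If that is not the case, one possible approach is to build a header row from the field names first. The sketch below is an assumption about how that could be done, not part of the original script. `results_to_rows` is a hypothetical helper that takes a list shaped like the `results` list in `main()`.

```python
def results_to_rows(results):
    """Turn per-document lists of (name, value) pairs into a header plus rows.

    `results` holds one list of (name, value) tuples per document.
    Fields that a document lacks are written as empty strings.
    """
    header = []
    for docres in results:
        for name, _value in docres:
            if name not in header:
                header.append(name)  # keep first-seen order
    rows = [header]
    for docres in results:
        # If a name repeats within one document, the last value wins.
        values = dict(docres)
        rows.append([values.get(name, "") for name in header])
    return rows


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input.
    sample = [
        [("Name", "Alice"), ("City", "Riga")],
        [("City", "Oslo"), ("Phone", "555-0100")],
    ]
    for row in results_to_rows(sample):
        print(row)
```

Each returned row is a plain list, so it could be passed to `writerow` in place of the `data` list built in `main()`.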


Last Updated ( Thursday, 18 March 2010 )
 