Grab text or source from HTML pages
import win32com.client
from time import sleep

def download_url_with_ie(url):
    """
    Given a URL, start IE, load the page, and return its HTML.

    Works only on Win32 with the Python win32com extensions installed.
    Needs IE. Why? Because sometimes you are forced to work with brain-dead
    closed-source applications that go to tremendous lengths to deliver
    output specific to browsers, the application has no interface
    other than a browser, and you want to get the data into CSV or XML
    for further analysis.

    Note: IE internally reformats all HTML to a stupid mixed-case,
    no-quotes-around-attributes syntax. So if you are planning to parse
    the data, make sure you study the output of this function rather
    than looking at View Source alone.
    """
    # If you are calling this function in a loop, it is more
    # efficient to open IE once at the beginning, outside this
    # function, and then reuse the same instance for every URL.
    ie = win32com.client.Dispatch("InternetExplorer.Application")
    ie.Visible = 1  # make this 0 if you want to hide the IE window
    # IE started; go to the requested page
    ie.Navigate(url)
    # It takes a little while for the page to load, sometimes 5 seconds,
    # so keep waiting for as long as IE reports that it is busy.
    while ie.Busy:
        sleep(1)
    # Now the page is loaded and the DOM is filled up,
    # so get the text.
    text = ie.Document.body.innerHTML
    # The text is Unicode, so drop any non-ASCII characters
    # to get a plain ASCII string.
    text = text.encode('ascii', 'ignore').decode('ascii')
    # Save some memory by quitting IE! **very important**
    ie.Quit()
    return text
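
The comments in the function above suggest opening IE once and reusing the same instance when you fetch pages in a loop. Here is a minimal sketch of that idea, assuming Python 3 and the pywin32 package. The helper names wait_until_loaded, fetch_html and download_all, as well as the timeout and polling values, are made up for this illustration and are not part of the original script. It also checks ReadyState in addition to Busy, which is an extra precaution rather than something the original does.

```python
import time


def wait_until_loaded(ie, timeout=30.0, poll=0.5):
    """Poll until IE reports the page as loaded, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    # 4 is READYSTATE_COMPLETE in the IE automation interface.
    while ie.Busy or ie.ReadyState != 4:
        if time.monotonic() > deadline:
            raise TimeoutError("page did not finish loading")
        time.sleep(poll)


def fetch_html(ie, url, timeout=30.0, poll=0.5):
    """Navigate an already running IE instance to `url` and return the body HTML."""
    ie.Navigate(url)
    wait_until_loaded(ie, timeout, poll)
    return str(ie.Document.body.innerHTML)


def download_all(urls, ie=None):
    """Fetch several pages with a single IE instance; returns {url: html}."""
    own_ie = ie is None
    if own_ie:
        # Imported here so the module still loads on machines without pywin32.
        import win32com.client
        ie = win32com.client.Dispatch("InternetExplorer.Application")
        ie.Visible = 0  # hidden window (assumption: no need to watch it)
    try:
        return {url: fetch_html(ie, url) for url in urls}
    finally:
        if own_ie:
            ie.Quit()  # release the browser process even if a page fails
```

Calling download_all(["http://example.com/a", "http://example.com/b"]) on a Windows machine with IE would start the browser once, visit both pages, and quit IE at the end. Passing your own ie object leaves it open, so you decide when to quit it.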
Last Updated ( Tuesday, 28 April 2009 )