| Who | When |
Messages | |
|
|
Michael Hudson
|
1
|
 |
|
12-09-2004 11:22 AM ET (US)
|
|
I like the way your google ads are now all about Hamlet.
|
Fredrik Lundh
|
2
|
 |
|
12-12-2004 05:33 PM ET (US)
|
|
Google moves in mysterious ways...
|
| MatsunoTokuhiro
|
3
|
 |
|
12-13-2004 09:28 AM ET (US)
|
|
|
| Michael
|
4
|
 |
|
12-13-2004 11:00 PM ET (US)
|
|
Hi,
I'm using TidyHTMLTreeBuilder, along with ElementTree, and they are excellent! However, I'm experiencing some problems when throwing CNN.com's website into it. When I do that, I get the following error:
"..in parse
    tree = TidyHTMLTreeBuilder.parse(source)
  File "C:\Python23\Lib\site-packages\elementtidy\TidyHTMLTreeBuilder.py", line 89, in parse
    return ElementTree.parse(source, TreeBuilder())
  File "C:\Python23\lib\site-packages\elementtree\ElementTree.py", line 865, in parse
    tree = ElementTree()
  File "C:\Python23\lib\site-packages\elementtree\ElementTree.py", line 590, in parse
    parser.feed(data)
  File "C:\Python23\Lib\site-packages\elementtidy\TidyHTMLTreeBuilder.py", line 75, in close
    return ElementTree.XML(stdout)
  File "C:\Python23\lib\site-packages\elementtree\ElementTree.py", line 879, in XML
    parser.feed(text)
  File "C:\Python23\lib\site-packages\elementtree\ElementTree.py", line 1169, in close
    def close(self):
ExpatError: no element found: line 1, column 0
>>> "
It seems to me that the program crashes when there's a form around all the rows, i.e. <table><form><tr><td>test</td></tr></form></table>
Could someone else try it on their system?
Cheers, Michael
|
| Ming
|
5
|
 |
|
12-14-2004 11:33 PM ET (US)
|
|
Deleted by author 12-14-2004 11:50 PM
|
| Ming
|
6
|
 |
|
12-14-2004 11:54 PM ET (US)
|
|
Edited by author 12-14-2004 11:54 PM
Hi all,
With <html><body>hello</body></html> how is "hello" stored in the element tree? Which node is it under?
Similarly, with: foo blah bar, how is bar stored? Which node is it in?
Cheers, Ming
|
Fredrik Lundh
|
7
|
 |
|
12-15-2004 02:25 AM ET (US)
|
|
in the body case, the "hello" text ends up in the 'text' attribute of the body node (in HTML/CSS terminology, this is known as an "anonymous block"). in the other cases, the trailing text is stored in the 'tail' attribute of the preceding element; see: http://effbot.org/zone/element-infoset.htm#mixed-content
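A minimal sketch of the two cases, assuming the standard library's xml.etree.ElementTree (the version of ElementTree later bundled with Python) and a made-up mixed-content fragment, since the markup in the question did not survive:

```python
import xml.etree.ElementTree as ET

# Case 1: text directly inside an element lands in that element's .text
doc = ET.fromstring("<html><body>hello</body></html>")
body = doc.find("body")

# Case 2: hypothetical mixed content (this markup is an assumption,
# not the fragment from the original question)
para = ET.fromstring("<p>foo <b>blah</b> bar</p>")
bold = para.find("b")

print(repr(body.text))   # text of the body element
print(repr(para.text))   # text before the first child
print(repr(bold.text))   # text inside the child
print(repr(bold.tail))   # text after the child, stored on the child itself
```

The trailing text belongs to the child element that precedes it, not to the parent, which is easy to miss when walking the tree.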
|
| Ming
|
8
|
 |
|
12-16-2004 01:08 AM ET (US)
|
|
Hi,
Does TidyHTML ignore some tags e.g. <caption>? When I write the tree back out, the <caption> tags disappear. Please help.
Cheers, Michael
|
| Alan
|
9
|
 |
|
12-18-2004 04:09 PM ET (US)
|
|
Preserving comments in ElementTree
Is there a way to preserve comments using ElementTree?
I thought it might have been something I was doing, but even the simplest possible round trip:
ElementTree.ElementTree(file='file-with-comments.xml').write('file2.xml')
... loses comments that were in the original file. Which is catastrophic (ok, means going and getting the backup copy) if you are actually writing back to the same file and had temporarily (you thought) commented something out.
|
Fredrik Lundh
|
10
|
 |
|
12-19-2004 04:26 PM ET (US)
|
|
The ElementTree class is not really designed for round-tripping of human-authored documents; the parsers are only concerned about the infoset, and the tree writer will happily use its own way to encode things, completely ignoring whether something was originally a character reference or an entity or a CDATA section, etc.
You can add comments to trees, though, so it should be possible to tweak one of the parsers so it preserves comments. I'll see if I can dig up an example...
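For readers on a current Python, a hedged sketch of one way to do this: the standard library's xml.etree.ElementTree (not the 2004 elementtree package discussed here) accepts an insert_comments flag on TreeBuilder from Python 3.8 onward. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative input; the comment sits inside the root element.
text = "<root><!-- keep me --><a/></root>"

# Default parser: the comment is skipped while parsing.
plain = ET.tostring(ET.fromstring(text), encoding="unicode")

# Parser whose tree builder inserts comment nodes (assumes Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(text, parser=parser)
kept = ET.tostring(root, encoding="unicode")

print(plain)
print(kept)
```

As far as I know, comments that appear before or after the root element are still not kept by this approach, so it is not a full round-trip solution.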
|
| Stewart Midwinter
|
11
|
 |
|
12-19-2004 08:35 PM ET (US)
|
|
Edited by author 12-19-2004 08:37 PM
Hi,
The alternative method for dealing with too-long strings in the Validated Entry widget on this page, http://effbot.org/zone/tkinter-entry-validate.htm, is described as follows:
'Note that if the user pastes a long string into the entry box, it will be rejected by this implementation. A better solution might be to change the validate method to:
    def validate(self, value):
        if self.maxlength:
            value = value[:self.maxlength]
        return value
'
It appears to throw an exception if you type more than maxlength characters. My thought is that the validation method cannot deal with the situation where you want to keep typing past maxlength characters. Instead, you need to define a getresults method that will return all of the validated string except in the case of a ChopLength class; in that case it will return only maxlength characters.
Also, if you attempt to set an initial value for the entry fields, it will only be accepted for the integer or float entry fields. The init method for the MaxLength subclass has to be modified to pass the value argument through.
Lastly, if you set an initial text that is longer than the MaxLength class will allow, you will be unable to edit it in any way. As described below, I've added code to deal with this.
I've described a solution on the Tkinter wiki at: http://tkinter.unpy.net/wiki/ValidateEntry
|
| Ming
|
12
|
 |
|
12-19-2004 10:40 PM ET (US)
|
|
Edited by author 12-22-2004 09:37 PM
Hi,
Could someone try feeding a Google search results page into TidyHTMLTreeBuilder and see if it crashes? Mine crashes every time with a form inside a table. I wonder if it's just me. Please confirm.
Edit: This is a page you can try: http://www.google.com.au/search?hl=en&q=effbot&meta=
Please help. This is urgent. Thank you very much in advance.
|
| Richard Sharp
|
13
|
 |
|
01-13-2005 09:30 AM ET (US)
|
|
Am I missing something here? I tried to slot in cElementTree where I previously imported ElementTree. I then tried to use XMLTreeBuilder, which is obviously not in the C module but is in the Python module. Is there a way I have overlooked for getting round this?
|
Fredrik Lundh
|
14
|
 |
|
01-13-2005 09:44 AM ET (US)
|
|
it's a compatibility glitch: cElementTree uses ET 1.3 names, and doesn't provide an ET 1.2-compatible alias for the parser class. so to use the XMLTreeBuilder directly, you have to access it as XMLParser.
I'll fix this in the next cET release.
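A small sketch of code that works with either name, assuming the standard library's xml.etree.ElementTree as a stand-in for cElementTree (the import line is the only part that would differ):

```python
import xml.etree.ElementTree as ET  # stand-in; swap in the module you actually use

# ET 1.3 calls the parser class XMLParser; ET 1.2 called it XMLTreeBuilder.
try:
    ParserClass = ET.XMLParser
except AttributeError:
    ParserClass = ET.XMLTreeBuilder

parser = ParserClass()
parser.feed("<root><child/></root>")
root = parser.close()  # close() returns the root element

print(root.tag, [child.tag for child in root])
```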
|
| Tim Cradic
|
15
|
 |
|
01-14-2005 09:07 AM ET (US)
|
|
I am trying to use your Console module to set up a non-blocking keypress poll routine. I am running a background function and want to break out if a key is pressed. Do you have any examples that show how to use the Console for this purpose?
Thank you.
|
| Bill Oldroyd
|
16
|
 |
|
01-14-2005 09:28 AM ET (US)
|
|
I am using findtext to extract data from one XML instance to another. findtext converts &lt; to < etc., which is most useful when you want text, but it means I have to convert the < back again to &lt;.
Is there any way of avoiding this?
Sorry if this is a simple question. I find ElementTree very easy to use.
Bill bill.oldroyd@bl.uk
[I am using ElementTree to help create a gateway to convert between NLM Entrez web service for PubMed and a standard HTTP search protocol SRU - if anyone is interested.]
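A hedged sketch of the behaviour in question, assuming the standard library's xml.etree.ElementTree and invented element names: if the extracted text is assigned to an element in the target tree, the writer escapes it again on output, so a manual conversion is only needed when building XML strings by hand.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical source record; element names are made up for illustration.
source = ET.fromstring("<record><title>a &lt; b</title></record>")
text = source.findtext("title")  # entity references are already decoded here

# Copying into another tree: the serializer re-escapes the text on output.
target = ET.Element("result")
ET.SubElement(target, "title").text = text
serialized = ET.tostring(target, encoding="unicode")

# Building a string by hand instead: escape explicitly.
escaped_by_hand = escape(text)

print(text)
print(serialized)
print(escaped_by_hand)
```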
|