| Who | When |
Messages | |
(not accepting new messages)
|
|
| Fredrik Lundh
|
95
|
 |
|
12-30-2005 05:47 AM ET (US)
|
|
|
Fredrik Lundh
|
94
|
 |
|
12-17-2005 03:25 AM ET (US)
|
|
Edited by author 12-23-2005 08:37 AM
(update: this bug is fixed in cElementTree 1.0.4)
Chris: this looks like a refcount bug in the default entity handler. Here's a patch:
=== cElementTree.c
==================================================================
--- cElementTree.c (revision 1128)
+++ cElementTree.c (local)
@@ -1953,7 +1953,6 @@
         res = PyObject_CallFunction(self->handle_data, "O", value);
     else
         res = NULL;
-    Py_DECREF(value);
     Py_XDECREF(res);
 } else {
     PyErr_Format(
thanks! /F
|
Chris Olds
|
93
|
 |
|
12-16-2005 07:42 PM ET (US)
|
|
I'm using ElementTree 1.2.6 with Python 2.4 on WinXP. With ElementTree.py, I can define entities by setting the entity dict in the XMLTreeBuilder object. With cElementTree, I get different behavior depending on whether or not a DOCTYPE is present in the file. If I have a doctype, parsing works, but I get a segfault when the program finishes. If I do not have a doctype, I get 'undefined entity' exceptions, but no segfault.
doc = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE patent-application-publication SYSTEM "pap-v15-2001-01-31.dtd" []>
<patent-application-publication>
<subdoc-abstract>
<paragraph id="A-0001" lvl="0">A new and distinct cultivar of Begonia plant named ‘BCT9801BEG’.</paragraph>
</subdoc-abstract>
</patent-application-publication>"""

#from elementtree import ElementTree as et
import cElementTree as et

entities = {
    u'rsquo' : u"’", # <!--=single quotation mark, right -->
    u'lsquo' : u"‘", # <!--=single quotation mark, left -->
}

parser = et.XMLTreeBuilder()
parser.entity.update(entities)
parser.feed(doc)
t = parser.close()
print t.find('.//paragraph').text
|
Fredrik Lundh
|
92
|
 |
|
12-13-2005 11:54 AM ET (US)
|
|
Working on it, working on it. (the technical issues are no problem, but where should it be placed?)
|
| Manuzhai
|
91
|
 |
|
12-13-2005 04:30 AM ET (US)
|
|
Great that ETree has landed in the stdlib. I hope you can sort out the issues with expat soon so that we can also use cElementTree.
|
Fredrik Lundh
|
90
|
 |
|
12-04-2005 04:20 PM ET (US)
|
|
"As you have figured out the encoding would it be safer to use: codecs.open(filename, mode, encoding)"
Agreed. I will update the article.
Thanks! /F
|
| mark_m
|
89
|
 |
|
12-01-2005 12:35 PM ET (US)
|
|
I would be concerned that the cElementTree encoding workaround is unsafe.
The line that concerns me ..
while 1:
    s = f.read(1000000) # <-- THIS LINE
    if not s:
If you are reading in a file that represents a single character as multiple bytes, then you could have a problem if a character starts on the 1,000,000-byte boundary, but its other byte(s) are in the next chunk.
As you have figured out the encoding, would it be safer to use: codecs.open(filename, mode, encoding)
(I have never used this function so not sure if it solves the problem)
Thanks Mark
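The chunk-boundary concern above can be shown with a small sketch. It assumes the encoding is already known and uses an incremental decoder from the standard codecs module, which buffers an incomplete byte sequence until the next chunk arrives. The helper name read_decoded_chunks and the tiny chunk size are made up for illustration; this is not the code from the article.

```python
import codecs
import io


def read_decoded_chunks(stream, encoding, chunk_size=1000000):
    """Yield decoded text from a byte stream, one chunk at a time.

    The incremental decoder holds back any trailing bytes that do not
    yet form a complete character, so a character split across two
    reads is decoded correctly.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        text = decoder.decode(data)
        if text:
            yield text
    # Flush the decoder; this raises if the input ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# u"caf\u00e9" is 5 bytes in UTF-8; a chunk size of 4 splits the last
# character (0xC3 0xA9) across two reads.
sample = u"caf\u00e9".encode("utf-8")
chunks = list(read_decoded_chunks(io.BytesIO(sample), "utf-8", chunk_size=4))
result = u"".join(chunks)

# Decoding the first raw chunk on its own fails, which is the problem
# described above.
try:
    sample[:4].decode("utf-8")
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True
```

codecs.open(filename, mode, encoding) should give the same protection, since the file object it returns decodes through a stream reader rather than chunk by chunk.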
|
|
|
88
|
 |
|
11-25-2005 10:35 PM ET (US)
|
|
Deleted by topic administrator 11-26-2005 04:38 AM
|
| Ralph Meijer
|
87
|
 |
|
11-24-2005 09:02 AM ET (US)
|
|
Referring to: http://online.effbot.org/2005_11_01_archive.htm#20051124.
1) I don't think the result of simple_eval('((...))') should be one-tuples of one-tuples (etc) of the empty tuple, but rather simply the empty tuple. The other brackets are just that.
2) CPython has a hardcoded *parser* limit of 32 nested expression levels. This has bitten me too while generating Python from other languages. When writing Python manually you wouldn't normally run into this limit.
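Point 1 can be checked against Python's own rules with ast.literal_eval from the standard library (a stand-in here, not the simple_eval from the article): extra parentheses only group, and it is the comma that makes a tuple.

```python
import ast

# Nested parentheses around an empty tuple are just grouping.
nested = ast.literal_eval("((()))")

# A trailing comma is what produces a one-tuple.
one_tuple = ast.literal_eval("((),)")

# Parentheses around a single value do not create a tuple either.
grouped = ast.literal_eval("((1))")
```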
|
| Fredrik Lundh
|
86
|
 |
|
11-23-2005 05:13 PM ET (US)
|
|
Yup. That's the "there is at least one more, but I'll return to that one later" buglet I mentioned in my last post (you're not the first one noticing this).
It's a bit embarrassing; my only defense is that the example was derived from a piece of code designed to parse the output from "repr()".
</F>
|
| Marius Gedminas
|
85
|
 |
|
11-23-2005 04:49 PM ET (US)
|
|
Speaking of bugs in the simple iterator-based parser, it also accepts invalid constructs like
>>> simple_eval("(1 2 3 4)")
(1, 2, 3, 4)
(/me looks around for a Preview button, shrugs, then submits)
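For illustration, here is a sketch of a token-based parser that insists on commas between items. It is not the simple_eval from the article: the names strict_eval and _atom are hypothetical, it only handles numbers and parentheses, and it does not check for trailing input after the expression.

```python
import io
import tokenize


def _atom(tokens, token):
    """Parse one number or parenthesized group starting at `token`."""
    if token.string == "(":
        items = []
        saw_comma = False
        token = next(tokens)
        while token.string != ")":
            items.append(_atom(tokens, token))
            token = next(tokens)
            if token.string == ",":
                saw_comma = True
                token = next(tokens)
            elif token.string != ")":
                # Two items in a row without a separating comma.
                raise SyntaxError("expected ',' or ')', got %r" % token.string)
        if len(items) == 1 and not saw_comma:
            # "(x)" is plain grouping, not a one-tuple.
            return items[0]
        return tuple(items)
    if token.type == tokenize.NUMBER:
        try:
            return int(token.string, 0)
        except ValueError:
            return float(token.string)
    raise SyntaxError("malformed expression (%r)" % token.string)


def strict_eval(source):
    """Evaluate a nested tuple-of-numbers expression from a string."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return _atom(tokens, next(tokens))


ok = strict_eval("(1, 2, 3, 4)")
try:
    strict_eval("(1 2 3 4)")
    rejected = False
except SyntaxError:
    rejected = True
```

Tracking whether a comma was seen also means that nested empty parentheses collapse to the empty tuple, as Python itself does.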
|
Fredrik Lundh
|
84
|
 |
|
11-17-2005 09:34 AM ET (US)
|
|
|
Fredrik Lundh
|
83
|
 |
|
11-17-2005 08:55 AM ET (US)
|
|
Alfred: I've fixed the mycontroller typo. The second script may yield a "TclError: wrong # args" error if you click on the canvas, but the posted version seems to work as expected if you drag lines.
Stewart: In my experience, being able to focus on one problem at a time is a great way to reduce the complexity, not increase it. As for other advantages, I hope to be able to address them in a followup article (feel free to look at the WCK version of this article for some hints).
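A rough sketch of the separation being discussed, not the code from the article: the controller below only knows about events, and the widget only needs Tkinter's bind(), create_line() and coords() methods. The class name LineDragController and its install() method are hypothetical.

```python
class LineDragController(object):
    """Hypothetical controller: draws a line while button 1 is dragged."""

    def __init__(self):
        self.start = None
        self.item = None

    def install(self, widget):
        # All event handling lives here; the widget code stays free of it.
        widget.bind("<ButtonPress-1>", self.press)
        widget.bind("<B1-Motion>", self.drag)

    def press(self, event):
        # Remember where the drag started; no line exists yet.
        self.start = (event.x, event.y)
        self.item = None

    def drag(self, event):
        if self.start is None:
            return
        canvas = event.widget
        x0, y0 = self.start
        if self.item is None:
            self.item = canvas.create_line(x0, y0, event.x, event.y)
        else:
            canvas.coords(self.item, x0, y0, event.x, event.y)


# Typical use with a real canvas (not executed here):
#
#   import tkinter
#   root = tkinter.Tk()
#   canvas = tkinter.Canvas(root)
#   canvas.pack()
#   LineDragController().install(canvas)
#   root.mainloop()
```

Because the controller never creates or configures the canvas, it can be swapped for another controller, or exercised with a stand-in object, without changing the drawing code.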
|
| Stewart Midwinter
|
82
|
 |
|
11-16-2005 07:25 PM ET (US)
|
|
Following up on Alfred's post... I read the post on tkController. I understand that it brings to Tkinter the same separation that exists in WCK: one set of classes to draw the widgets, and one set to handle the events. What's missing for me is an understanding of why you would want to do this. Other than it being an interesting intellectual exercise, what particular benefits are there that offset the probable greater complexity?
thanks S
|
| Alfred Milgrom
|
81
|
 |
|
11-16-2005 06:15 PM ET (US)
|
|
Thank you for the very interesting Tkinter Tricks ( http://online.effbot.org/#20051113 posted on November 13).
I just wanted to alert you to a simple error in the program as shown on the blog: The first example defines a class MyController, but the code later on refers to it as ClickController.
As well, when I run the second example I encounter an error, but have not tracked it down. The refactored code presented as an alternative runs without error.
Thanks for your good work,
Alfred Milgrom (fredm [at] smartypantsco [dot] com)
|
| Douglas Beethe
|
80
|
 |
|
11-10-2005 04:43 PM ET (US)
|
|
Edited by author 11-10-2005 04:46 PM
Regarding /m75 and /m76 -- I left out one important point. The socket.setdefaulttimeout(...) works OK for a single-threaded app, but falters in a multi-threaded app where the threads have differing timeout constraints. This implies getting down to the select() level -- do you happen to know of any examples which might have extended your xmlrpclib code base to support multi-threaded clients with independent timeouts? Perhaps something melding asyncore-like capability?
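One possible way to get independent timeouts without socket.setdefaulttimeout() is a transport that sets the timeout on its own connection. This is a minimal sketch, assuming a current Python where xmlrpclib is available as xmlrpc.client; the class name TimeoutTransport and the server URL are hypothetical. It does not go down to the select() level or add asyncore-style behavior.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """Hypothetical transport carrying its own socket timeout (seconds)."""

    def __init__(self, timeout, **kwargs):
        super().__init__(**kwargs)
        self.timeout = timeout

    def make_connection(self, host):
        # The base class returns an http.client.HTTPConnection that has
        # not connected yet; its timeout attribute is used on connect.
        conn = super().make_connection(host)
        conn.timeout = self.timeout
        return conn


# Each thread would build its own proxy with its own transport, so the
# timeouts do not interfere with each other.
fast = xmlrpc.client.ServerProxy(
    "http://localhost:8000/", transport=TimeoutTransport(timeout=2.5))
slow = xmlrpc.client.ServerProxy(
    "http://localhost:8000/", transport=TimeoutTransport(timeout=60))
```

Creating the proxies does not open a connection; the timeout only takes effect when a method is actually called.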
|