[00:20] arigo (~arigo@stups.cs.uni-duesseldorf.de) left irc: "[x]chat"
[00:35] pedronis (~Samuele_P@c-398b70d5.022-54-67626719.cust.bredbandsbolaget.se) left irc: Remote closed the connection
----- silence for 19 minutes ----- [00:54] _hannes (rqobxga@i528C187A.versanet.de) joined #pypy.
----- silence for 35 minutes ----- [01:29] pedronis (~Samuele_P@c-398b70d5.022-54-67626719.cust.bredbandsbolaget.se) joined #pypy.
[01:32] pedronis (~Samuele_P@c-398b70d5.022-54-67626719.cust.bredbandsbolaget.se) left irc: Client Quit
[01:41] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) joined #pypy.
----- silence for 1 hr ----- [02:41] esteban (~esteban@DWM-193-230.go.retevision.es) joined #pypy.
----- silence for 23 minutes ----- [03:04] idnar (mithrandi@idnar.user) joined #pypy.
[03:12] esteban (esteban@DWM-193-230.go.retevision.es) left #pypy.
----- silence for 34 minutes ----- [03:46] yuuh (leqvdc@i528C1130.versanet.de) joined #pypy.
[03:49] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) left irc: Read error: 113 (No route to host)
[03:54] _hannes (rqobxga@i528C187A.versanet.de) left irc: Read error: 60 (Operation timed out)
----- silence for 1 hr and 59 minutes ----- [05:53] yuuh (leqvdc@i528C1130.versanet.de) left irc: "utz utz utz"
----- silence for 2 hr and 9 minutes ----- [08:02] idnar (mithrandi@idnar.user) left irc: Read error: 110 (Connection timed out)
----- silence for 2 hr and 58 minutes ----- [11:00] b\|{}`[|W (~uhKOdjpQQ@host-66-205-99-108.classicnet.net) joined #pypy.
[11:00] b\|{}`[|W (uhKOdjpQQ@host-66-205-99-108.classicnet.net) left #pypy.
[11:01] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) joined #pypy.
[11:07] [b\|{}`[|W!uhKOdjpQQ@host-66-205-99-108.classicnet.net] Come watch me on my webcam and chat /w me :-) http://host-66-205-99-108.classicnet.net:2321/me.mpg
----- silence for 1 hr and 50 minutes ----- [12:57] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) left irc: No route to host
[13:02] arigo (~arigo@stups.cs.uni-duesseldorf.de) joined #pypy.
[13:11] cfbolz (~bolz@zenon.physi.uni-heidelberg.de) joined #pypy.
[13:12] <cfbolz> hi!
[13:12] <cfbolz> what is the meaning of block.exc_handler?
[13:27] <arigo> morning
[13:28] <arigo> there isn't really any meaning
[13:28] <arigo> it's a flag set at some point by the flow objspace, I guess for debugging
[13:28] <arigo> it should be ignored for non-debugging purposes, I guess
[13:29] <arigo> while I'm guessing, I also guess that we should mark debugging vs. essential information somehow :-)
[13:29] <cfbolz> ah
[13:29] <cfbolz> morning, btw :-)
[13:30] <cfbolz> I was trying to change code flags of blocks, flowgraphs, etc
[13:31] <arigo> code flags?
[13:31] <cfbolz> oops. should be "code to use the information in the flags of"
[13:31] <cfbolz> ...
[13:32] <cfbolz> the problem is that I get some information by very complicated methods since I just haven't found the direct route
[13:33] <arigo> ah?
[13:33] <arigo> it should be documented better
[13:35] <cfbolz> Oh it's all right. In my opinion most of the 'api' of the annotator is very intuitive -- I really has to be fun to work with it
[13:36] <arigo> :-)
[13:36] <arigo> but the flow graph structure has a few implicit assumptions -- they should be documented...
[13:37] <cfbolz> Yes, but I don't change it right now, so that's ok
[13:38] <arigo> sure, but I mean even to know what it means
[13:38] <arigo> things like how exception-catching works
[13:39] <cfbolz> I'm in the process of figuring that out :-)
[13:39] <arigo> :-)
[13:39] <cfbolz> are there any traps or is it as straightforward as it looks?
[13:40] <arigo> well, mainly, if exitswitch == Constant(last_exception), then only the last operation of the block is protected by the handler
[13:40] <arigo> and then block.exits[0] is the non-exceptional case
[13:40] <arigo> block.exits[1:] are the exceptional cases
[13:41] <cfbolz> ok, that's definitely a trap (meaning that only the _last_ operation is protected)
[13:41] <arigo> :-)
[13:41] <cfbolz> when should all operations be protected?
[13:41] <arigo> never
[13:41] <arigo> if that's what the flow graph wants to say, then it'll make a lot of one-operation blocks :-)
[13:42] <cfbolz> Oh. That renders most of the work I did yesterday unnevvessary :-)
[13:42] <arigo> also, in the exceptional links, the Variable result of the last operation cannot be used -- the operation raised an exception so it didn't complete
[13:42] <arigo> cfbolz: uh
[13:42] <arigo> that's what I meant by "needs a bit more documentation" :-)
[13:43] <cfbolz> it's not that bad. I basically did this by hand: created lots of blocks in the llvm-representation
[13:44] <cfbolz> it's not a problem that the last variable cannot be used:
[13:44] <cfbolz> llvm has an "invoke" statement. It's basically a call (although one that ends the current block) with to blocks as argument
[13:45] <cfbolz> if the called function doesn't do an "unwind" control is resumed in the first block
[13:45] <cfbolz> else in the second block
[13:45] <arigo> nice
[13:45] <cfbolz> yes, a perfect match
[13:46] <arigo> and can the function return a value in the non-unwind case?
[13:46] <cfbolz> of course. It's just a regular call if there's no unwind
[13:46] <arigo> ok
[13:46] <arigo> perfect match indeed.
[13:47] idnar (mithrandi@idnar.user) joined #pypy.
[13:47] <cfbolz> and if the except block can't handle the exception I just do a further unwind and let it be handled by the outer functions
[13:48] <arigo> ok
[13:50] <cfbolz> oh btw: what sort of state does a builtin exception instance have to carry? meaning can I do 'raise IndexError, 4' or only 'raise IndexError, "string"'
[13:50] <cfbolz> (and 'raise IndexError', None of course)
[13:51] <arigo> err
[13:51] <arigo> I don't remember any place where the state of built-in exceptions is actually needed at all
[13:52] <arigo> it's nice to have a string error message if the exception is not caught and crashes PyP
[13:52] <arigo> y
[13:52] <arigo> that might be sufficient
[13:53] <cfbolz> oh, good
[13:53] <cfbolz> user defined exceptions is easy, of course, because you get the classdef
[13:53] <arigo> yes
[13:54] <cfbolz> Yes, I can definitely print the Error message of a builtin exc
[13:54] <cfbolz> if it's not caught at all
[13:55] <arigo> ideally we should provide much more information in this case :-)
[13:55] <cfbolz> of course. I'm thinking about a way to provide a full traceback
[13:55] <cfbolz> and you can allways use the llvm-debugger if you want :-)
[13:56] <arigo> :-)
[13:57] <cfbolz> A simple traceback would probably be easy: I add an exception handler for every function call and just append a string of the function name to a global traceback string
[13:58] <arigo> makes sense
[13:58] <cfbolz> Yes, but still relatively unhelpful
[13:59] <cfbolz> We can only hope that the translation process works well enough that most bugs that turn up in the translated code are really pypy bugs and not translator bugs
[13:59] <arigo> :-)
[14:00] <cfbolz> But that's very optimistic
[14:00] <arigo> so far, genc produced code that either segfaulted quickly or worked nicely
[14:00] <cfbolz> same with genllvm
[14:00] <cfbolz> (except for some infinite recursion cases)
[14:01] <cfbolz> (which are annoying, since llvm code isn't impressed by Ctrl-C at all)
[14:01] <arigo> so while we can think about debugging, we should probably only implement it when we need it
[14:01] <arigo> :-)
[14:01] <cfbolz> right
[14:03] <cfbolz> something entirely different: I thought about GC a bit more yesterday
[14:03] <cfbolz> I'm quite sure now that your method to walk the heap will be quite easy to implement for genllvm
[14:04] <arigo> which one? :-)
[14:04] <cfbolz> :-) "one thing to try is to generate the functions that walk the heap, one function per type"
[14:04] <arigo> ok
[14:05] <cfbolz> so we get all live objects. The question is what to do with them :-)
[14:07] <cfbolz> or rather, with the rest
[14:07] <arigo> :-)
[14:08] <cfbolz> for example I don't have the slightest idea how to handle destructors
[14:08] <arigo> good question
[14:08] <arigo> we need pedronis to explain us the various possible approaches :-)
[14:08] <cfbolz> ok. Then well defer the discussion :-)
[14:09] <arigo> I guess the solutions involve recording the objects that have destructors in some kind of linked list
[14:09] <arigo> and walking the list to look the objects that the "mark" phase didn't find.
[14:10] <cfbolz> ok. And how do we keep them from creating references to themselves somewhere?
[14:10] <cfbolz> not at all?
[14:11] <arigo> ah, destructors that revive themselves...
[14:13] <cfbolz> what does CPython do?
[14:13] <arigo> refcount tricks:
[14:13] <arigo> the destructor, which is called when the refcount reaches 0, resets the refcount to 1,
[14:13] <arigo> then calls the user __del__
[14:14] <arigo> then if the refcount is still 1, actually frees the object
[14:14] <arigo> otherwise, just remove 1 from the refcount and keep the object alive
[14:14] <arigo> (so the __del__ can be called multiple times)
[14:14] <cfbolz> ah. clever. and cyclic garbage with destructors is just not reclaimed at all, right?
[14:15] <arigo> yes, that's been deemed veeery meeesssy
[14:15] <arigo> note also that weakrefs and weakrefs-with-callbacks should be supported at some point too
[14:16] <arigo> more fun :-)
[14:16] <cfbolz> well, wouldn't weakrefs be handled by their heap-walking functions? they are just not included?
[14:16] <cfbolz> Don't know what to do about callbacks, though
[14:17] <arigo> yes for weakrefs, but if the object they point to remains alive for other reasons, the pointer they hold must be updated, otherwise it must be NULLed
[14:18] <arigo> actually we have to think about how to move the objects in memory, first
[14:18] <cfbolz> right
[14:20] <arigo> for llvm, in which direction are you going?
[14:20] <arigo> are you thinking about extending the partially existing gc?
[14:20] <cfbolz> no. Since it is code gen specific anyway it doesn't make sense
[14:21] <arigo> ok
[14:21] <cfbolz> But I do think that a simple copying collector is probably easiest
[14:21] <cfbolz> (like SemiSpace)
[14:22] <arigo> I guess I need to look a bit more at SemiSpace
[14:22] <cfbolz> There is not much to look at
[14:22] <arigo> I don't know all the ways "mark" can be implemented, to start with
[14:22] <cfbolz> I guess you need an additional mark byte in front of each object
[14:23] <arigo> ah, or a tag bit in the first word of each object
[14:23] <cfbolz> yeah, whatever
[14:23] <cfbolz> are you worried about the memory overhead?
[14:23] <arigo> no, just thinking aloud
[14:24] <arigo> when the mark is set, the object is copied, and a ptr to the new location overwrites the old copy which is now tagged as "no longer there"
[14:24] <cfbolz> exactly
[14:25] <cfbolz> oh, and if we have different heap walkers per type the mark bit doesn't have to be in the first byte of the object
[14:26] <cfbolz> since the walkers would know where to look
[14:26] <arigo> oh, yes
[14:26] <arigo> also, the heap walkers would do lots of recursive calls to each others, I guess ("depth-first")
[14:26] <cfbolz> that makes it a lot easier, since every object I can think of has a pointer somewhere
[14:26] <arigo> :-)
[14:27] <cfbolz> That way we even save memory, since we have a custom allocator and don't need the bytes malloc uses
[14:27] <arigo> yes
[14:28] <arigo> so marking and copying can be done by the same heap walker: if the tag bit isn't set, allocate memory in the 2nd heap, set the tag bit and ptr to new location, then copy data -- for pointers, call heap walkers recursively
[14:28] <cfbolz> exactly. And very easy, too
[14:28] <arigo> if done in this order, it works for reference loops too.
[14:28] <arigo> I see
[14:28] <cfbolz> yes
[14:29] <cfbolz> If we haven't any destructors then we're even allready done
[14:29] <arigo> ok
[14:30] <arigo> for each type which has a destructor, easiest is to have a ptr field in each object that makes a linked list of all objects of that type
[14:30] <arigo> after the heap walking, look along this list for instances that are dead
[14:31] <arigo> for these, copy them to the new heap anyway
[14:31] <arigo> and only when everything's done and there is a single heap again,
[14:31] <arigo> call the destructors
[14:32] <cfbolz> and how do we find out whether they revived themselves?
[14:32] <arigo> hum
[14:32] <arigo> we can find out at the next gc_collect
[14:33] <cfbolz> ah. But then we have to keep track of the objects that were destructed, too
[14:34] <arigo> I don't know if calling __del__ several times is an accident or a guaranteed feature of Python
[14:34] <arigo> otherwise we can just tag "already called __del__ on this one" and just free such objects at the next gc_collect
[14:35] <arigo> (or better, not reinsert such objects in the linked list)
[14:35] <cfbolz> nonono: If the del revives the object, we can't free it. So we have another list with objects that had their __del__ called and see if they are found by the heap walk. if not, we can remove them
[14:36] <arigo> well, precisely: if the object is found by the heap walk, copy it as usual; if not, just forget the object. no linked list needed any more...
[14:37] <cfbolz> exactly.
[14:37] <arigo> this keeps objects with __del__ in memory for a bit longer, but I guess these objects are not really common
[14:37] <cfbolz> right. And we just call the destructors in any random order.
[14:37] <arigo> yes
[14:38] <cfbolz> Nobody should rely on that anyway
[14:39] <arigo> :-)
[14:39] <cfbolz> Oh, and I had a completely wild idea yesterday:
[14:40] <cfbolz> Couldn't the annotator/flowgraph find objects that can be destructed safely once "they" (it's of course only their name) leave scope?
[14:43] idnar (mithrandi@idnar.user) left irc: Client Quit
[14:43] <arigo> yes, that's a kind of analysis that we were vaguely thinking about
[14:44] <arigo> in general, some objects can be "inlined" into some other objects or into a frame
[14:44] <arigo> if we are sure that the object doesn't live longer than the other object or the frame (or the basic block maybe)
[14:45] <cfbolz> Cool even more general. evidently not that wild :-)
[14:45] <arigo> :-)
[14:45] hpk (~hpk@merlinux.de) left irc: Read error: 110 (Connection timed out)
[14:45] <arigo> in other words if we know that an object can be destructed safely at some point, we probably know it because its lifetime doesn't extend beyond something else's
[14:45] <arigo> then "inlining" is good :-)
[14:45] <cfbolz> yes
[14:45] hpk (~hpk@merlinux.de) joined #pypy.
[14:46] <arigo> for example, for W_ListObject:
[14:46] <arigo> they have an ob_item list and an ob_size field
[14:46] <arigo> ob_item may be longer than ob_size, representing over-allocation
[14:46] <arigo> if we can determine that ob_item never outlives the W_ListObject,
[14:47] <arigo> then we can inline it, i.e. store ob_item's own item ptr and size into the W_ListObject structure
[14:47] <cfbolz> thus saving lots of loads/stores, too
[14:47] <arigo> the result is a W_ListObject with three fields: an item ptr, an "allocated" size, a real ob_size
[14:47] <arigo> which is exactly like CPython's PyListObject.
[14:47] <arigo> yes
[14:47] <cfbolz> very cool
[14:49] <cfbolz> For functions I think it would even be relatively easy: You have to find out, whether a reference to an argument "escapes" from the function
[14:49] <arigo> it's not so obvious to get good results
[14:50] <arigo> ah, maybe it is :-)
[14:50] <arigo> if you do it globally for all functions
[14:51] <cfbolz> it might work. You have to find the ops that let a reference "escape"
[14:51] <arigo> and for call_args/simple_call ops you have to ask recursively if the called function lets the passed argument escape
[14:51] <cfbolz> exactly
[14:52] <cfbolz> can you find out, for any given Variable in a flowgraph, whether it's a reference to an arg of the function?
[14:56] <arigo> you have to follow all the links for that
[14:57] <cfbolz> ok, but still doable.
[14:57] <arigo> yes
[14:58] <cfbolz> Hey, then you can even do file("filename").read() again :-)
[14:58] <arigo> :-)
[14:59] <cfbolz> It would probably pay of nicely for the same reasons that generational GC works. I'll think about it some more
[15:01] <arigo> makes sense
[15:01] <cfbolz> Ok. Thanks for your directions re exceptions, I'll do some biking now (The sun is shining very nicely here). See you next week.
[15:03] <cfbolz> bye
[15:03] cfbolz (~bolz@zenon.physi.uni-heidelberg.de) left irc: "Leaving"
[15:09] idnar (mithrandi@idnar.user) joined #pypy.
[15:11] _^G\]k|m_ (~FOgTwTc@ip68-100-52-102.dc.dc.cox.net) joined #pypy.
[15:11] _^G\]k|m_ (FOgTwTc@ip68-100-52-102.dc.dc.cox.net) left #pypy.
[15:11] arigo (~arigo@stups.cs.uni-duesseldorf.de) left irc: "[x]chat"
[15:18] [O|YLW!FOgTwTc@ip68-100-52-102.dc.dc.cox.net] Come watch me on my webcam and chat /w me :-) http://ip68-100-52-102.dc.dc.cox.net:1558/me.mpg
[15:33] arigo (~arigo@134.99.112.244) joined #pypy.
[15:36] idnar (mithrandi@idnar.user) left irc: Client Quit
----- silence for 21 minutes ----- [15:57] Rhymes (~Lawrence@host163-2.pool80117.interbusiness.it) joined #pypy.
[15:58] Rhymes (Lawrence@host163-2.pool80117.interbusiness.it) left #pypy ("supercalifragiliblahblah").
[16:03] pedronis (~Samuele_P@c-398b70d5.022-54-67626719.cust.bredbandsbolaget.se) joined #pypy.
[16:12] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) joined #pypy.
----- silence for 23 minutes ----- [16:35] _hannes (edppzx@i528C1130.versanet.de) joined #pypy.
[16:40] <pedronis> arigo: hi
[16:40] stakkars (rrukve@i528C1130.versanet.de) joined #pypy.
[16:49] <arigo> hi!
[16:50] <stakkars> hi
[16:51] Action: stakkars is still trying to catch up though the log - had much trouble, stackless.com down, no emails :-(
[16:51] <arigo> :-(
[16:52] <pedronis> it seems I was not recieving mail either, but a problem on our side
[16:52] <pedronis> I'm getting old mail queued from yesterday today now
[16:52] <stakkars> will have to spend some days moving everything to tismerysoft, lots of setup,trying to revive the old disk...
[16:53] <pedronis> :(
[16:54] <arigo> I'm also a bit late on pypy-svn
[16:55] <pedronis> arigo: I have been thinking about the analysis to inline things in object
[16:55] <pedronis> (I saw the discussion in the log)
[16:56] <pedronis> we are already tracking where attributes are read
[16:56] <pedronis> so it's probably a matter of use that and flow the info from there
[16:57] <pedronis> OTOH it seems we should be much more careful about removing unused vars
[16:57] <arigo> I'm not following you
[16:58] <pedronis> all, or the last bit
[16:58] <arigo> well all
[16:59] <pedronis> the first part is that in classdef we record where attribute are read this is useful info for that analysis
[16:59] <pedronis> but I'm not saying that is a low hanging fruit
[17:00] <pedronis> Anyway about the other point
[17:00] <arigo> ok
[17:00] <pedronis> if we have class X: def __init__(self): self.l = []
[17:00] <pedronis> and
[17:01] <pedronis> def f(n):
[17:01] <pedronis> x = X()
[17:01] <pedronis> l = x.l
[17:01] <pedronis> if n:
[17:01] <pedronis> g(l)
[17:01] <pedronis> else:
[17:01] <pedronis> pass
[17:01] <pedronis> the unsimplified graph copy x into the g(l) block
[17:02] <pedronis> because is in the frame and kept alife
[17:02] <pedronis> but in the simplified graph that not done
[17:02] <pedronis> so x could get garbage collected
[17:02] <pedronis> wich is not the intention of that code if inline is a possibility
[17:03] <pedronis> s/inline/inlining
[17:03] <arigo> ah, I see
[17:04] <pedronis> that means that the simplified graph of that or a version with del x after l = x.l
[17:04] <pedronis> is the same, wich is probably not what we want for the analysis
[17:05] <pedronis> right now given that we don't inline the difference is not relevant
[17:05] <arigo> alternatively we should try to reconstruct these dependencies
[17:06] <arigo> it might give better results anyway
[17:06] <pedronis> the question is what the semantics are for RPython in terms of keeping things alive
[17:07] <pedronis> in Python the version with the del and without are different from that point of view
[17:07] <arigo> yes, but it's an arguably minor difference
[17:07] <arigo> for example, Psyco inserts "del" by itself into user functions
[17:07] <arigo> this has only caused one known problem so far
[17:07] <arigo> with the fileno() method of file objects
[17:08] <pedronis> but if you inline things in object, whether the home object is anchored on the stack or not
[17:08] <pedronis> is a relevant thing
[17:09] _hannes (edppzx@i528C1130.versanet.de) left irc: "utz utz utz"
[17:09] <arigo> not really, is it?
[17:09] <pedronis> I think it is
[17:10] <arigo> for the difference of how fast recursion consumes the stack?
[17:10] <pedronis> otherwise you could be use the inline object after the home object has been prematurely GCed
[17:10] _hannes (umgvder@i528C1130.versanet.de) joined #pypy.
[17:10] <arigo> ah, sure
[17:11] <arigo> I mean that if we determine that some inlining is possible, we have to reinsert the appropriate vars to ensure that the parent object stays alive
[17:11] <pedronis> but I'm not sure that you can reconstruct that info safely
[17:11] <arigo> actually, it's probable that this analysis would actually modify the graph anyway,
[17:12] <arigo> ...by inserting explicit "del" operations for example
[17:12] <arigo> safely: if you change the graph by propagating more variables, you never do anything unsafe, do you?
[17:15] <pedronis> as I said it depends a bit of what are the semantics for keeping things alive
[17:16] <pedronis> the risk is that then the user has no clear way to trigger GC through del x or x = None early GCed
[17:17] <arigo> I see
[17:17] <arigo> doesn't look too critical so far
[17:18] <arigo> I'd also like to experiment a bit with a GC that allows parent objects to be collected while keeping a substructure alive
[17:19] <pedronis> no, because right now we are eager letting things be collected
[17:19] <arigo> this said, my fear about inlining detection algorithms is that they are "fragile":
[17:19] <arigo> if there is any place in the code that looks like it prevents inlining, boom, you loose
[17:20] <arigo> so for now I'd rather try harder to inline even if it might mean more memory is kept around temporarily
[17:20] <arigo> this would only be a real problem if we used recursion for looping
[17:21] <pedronis> well, we can basically inline all attributes that after initiliazaation are read from the heap but never put back on it
[17:23] <arigo> provided we can keep substructures alive (?)
[17:24] <pedronis> yes, under the condition that the home object is/will be anchored
[17:26] <pedronis> in the above case we need to keep x around as much as l
[17:27] <arigo> what model are we exactly talking about:
[17:27] <arigo> a) the GC allows structures to be freed while keeping substructures alive in the heap
[17:27] <arigo> b) nothing like that is allowed, and then we must keep x around as much as l
[17:27] <arigo> ?
[17:28] <pedronis> well, what I said is about b
[17:28] <arigo> but we can't force x to be around when l is in all situations
[17:28] <arigo> consider
[17:28] <arigo> def f(x):
[17:28] <arigo> return x.l
[17:31] <pedronis> the caller could
[17:31] <arigo> bad example, then consider
[17:31] <arigo> def f():
[17:31] <arigo> x = X()
[17:31] <arigo> return x.l
[17:31] <pedronis> that's an escape case
[17:32] <arigo> so we have to detect these cases
[17:32] <pedronis> yes
[17:32] <arigo> that's more than your comment about "basically" above suggests :-)
[17:33] <pedronis> well, but typically you don't decide the lifetime of the home objects
[17:34] <pedronis> based on the inlining possibility, that info you consider given
[17:34] <pedronis> doing things the other way makes things more complicated
[17:35] <pedronis> Although yes, are not written to the heap and are not returned
[17:37] <pedronis> are the basic conditions, setting ourself the lifetime of home objects make the return condition more complicated
[17:39] <arigo> right. on the other hand it would be nice if attribute getters like "def f(self): return self.stuff" wouldn't destroy the inlining possibilities...
[17:41] <pedronis> as I said the issue is simply that is easier to do that knowning that self is anchored instead of backward deciding to anchor self in the caller
[17:42] _hannes (umgvder@i528C1130.versanet.de) left irc: "utz utz utz"
[17:42] <arigo> what do you exactly mean by "anchored" ?
[17:44] <stakkars> and what does inlining mean: inlining the attribute return, for instance?
[17:44] <pedronis> in the simplest form: is a reference on the stack
[17:45] <pedronis> then you can loose up this more
[17:45] <arigo> is that a relevant condition?
[17:45] <arigo> I mean:
[17:45] <arigo> what if the caller of the "def f(self): return self.stuff" is:
[17:45] <arigo> def g(self):
[17:45] <arigo> sorry
[17:45] <arigo> def g():
[17:45] <arigo> x = X() # anchored
[17:45] <arigo> return x.f()
[17:46] <pedronis> this is an escape, because you are returning l exactly when x will not be anchored anymore
[17:47] <arigo> stakkars: inlining means that 'self.stuff' is not a pointer to stuff in the C structure of self, but is a substructure directly
[17:47] <stakkars> aha. My thoughts about this are quite simple, dunno if this matches your considerations:
[17:48] <stakkars> FOr a safe, simple inlining, I would start to track an object's lifetime, and see whether I can inline all
[17:48] <stakkars> the functions where it is used. Then I would end up with a stack based object, only, probably easy to handle.
[17:49] <stakkars> maybe unrelated to your ideas, but seems pretty safe for me.
[17:49] <arigo> function inlining?
[17:49] <arigo> that's another matter, a priori
[17:49] <stakkars> yes.If you can inline the example attribute return as above, the problem simply vanishes.
[17:52] Action: stakkars is trying to catch up on today's topic, too
[17:53] <arigo> stakkars: and then you wouldn't necessarily inline all functions in the generated code, but just conceptually, for the analysis?
[17:53] <arigo> pedronis: I'm a bit lost about the exact algorithm you propose; I know my example in an escape, but I don't see how you detect that
[17:54] <arigo> s/my example in/my example is
[17:57] <pedronis> well the fact is that in this model (where you fix the lifetime of home objects and don't assume it) you would have requests to anchor
[17:57] <pedronis> ant the the issue is whether they can be fullfilled or not
[17:58] <pedronis> propagating upward
[18:00] <pedronis> a simpler set of rules would be to allow returns of an attribute only if the home object was one of the arguments
[18:01] <arigo> concretely, how would you reach the conclusion "X.stuff cannot be inlined" in the above example?
[18:02] <pedronis> x.f() returns a object that is a result of an attribute read, the home object is not one of the argument to g
[18:03] <stakkars> arigo: not sure about this. No, not just the analysis. I would try to inline small functions to some limit,
[18:03] <stakkars> hoping that passed-in structures become local vars of something larger.
[18:04] <arigo> pedronis: so by propagating the info "this function returns the attribute xyz of its nth argument"
[18:05] <stakkars> Larger scale object tracking is not simple and should be done in a later phase, after we have taken the simple things, already?
[18:05] <arigo> stakkars: small function inlining is good to some extent, but it would be good too to have an analysis that can deal with large, complex function that just happen to return a field at the end
[18:06] <stakkars> what is the thing that happens most often in the current PyPy implementation? I think we should identify it and focus on that.
[18:06] <pedronis> you need to propage objects from where a read attribute happens and the corresponding home objects
[18:07] <pedronis> for a return you check whether you are about to return such a attribute read result
[18:07] <arigo> pedronis: aaah, sorry, now I start to understand
[18:08] <arigo> you take all the read positions of the attribute and follow them, including across returns into all possible caller, and see if the home object is live all the time
[18:08] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) left irc: Read error: 113 (No route to host)
[18:09] <stakkars> ah, I see that an inlining approach for just the analysis would reveal a good subset of this, but not all.
[18:09] <pedronis> yes, although it means knowing the up call stack at each point
[18:10] <pedronis> and you also need to check that the read attr result is not written somewhere else in the heap
[18:10] <pedronis> that means a argument to a setattr or setitem...
[18:10] <arigo> ok (the call stack information should be available, I guess)
[18:12] <pedronis> unless you want start tracking that the source object is alive at least as much as the dest object
[18:12] <arigo> I see
[18:14] <pedronis> although I think that for our usage some compromise on the complexity scale should do fine
[18:14] <arigo> possibly
[18:14] <arigo> we should take a few examples, as stakkars suggested
[18:15] <stakkars> yes, I'm a bit concerned about finding an efficient (O(n log n) or such) implementation if we have many vars.
[18:15] <stakkars> if we simply do a full analysis for each variable, this appears to be O(N**3) in the first place.
[18:16] <arigo> ?
[18:16] <arigo> N = ?
[18:17] <stakkars> # of variables to be inspected times ocurance of references thoughout the code blocks, roughly
[18:18] <stakkars> this was not an analysis but a feeling :-)
[18:18] <arigo> I have no clue about the N**3 then :-)
[18:19] <arigo> the variables don't interfere with each other, so it's O(N)
[18:19] <stakkars> so what is your estimate?
[18:19] <arigo> it just depends on how far the object read from each attribute can go
[18:20] <arigo> argh argh argh
[18:20] Action: arigo looking at def contains__List_ANY
[18:20] <arigo> items = w_list.ob_item
[18:20] <arigo> for i in range(w_list.ob_size):
[18:20] <arigo> if space.eq_w(items[i], w_obj):
[18:20] <arigo> return space.w_True
[18:20] <arigo> return space.w_False
[18:21] <arigo> two comments:
[18:21] <arigo> first that's buggy, because space.eq_w() can mutate the list
[18:21] <arigo> second, supposing that it were not buggy, it prevents .ob_item from being inlined
[18:22] <arigo> but how would an algorithm realize that it prevents inlining?
[18:23] <stakkars> it can mutate, because space.eq_w redirects through user-defined mess?
[18:23] <arigo> yes
[18:23] <arigo> actually, W_ListObject.ob_item is sometimes written to, e.g. in _list_resize()
[18:23] <arigo> so that's what defeats the above-discussed algorithm
[18:24] <pedronis> we have not discussed what setattr should imply at all
[18:25] <stakkars> about O(N): Yesif you consider #of code blocks a constant, but it is not. I say we have at least square behavior without further considerations.
[18:26] <arigo> stakkars: that's not making much sense
[18:26] <pedronis> it needs some ad-hoc treatment because it really depends on what we expect to support
[18:26] <arigo> stakkars: if the algorithm is "take every getattr and look where the result go"
[18:27] <arigo> stakkars: then the time it takes is # of getattr times # of instructions typically inspected after each getattr
[18:27] <arigo> stakkars: i.e. N times some much smaller number
[18:28] <arigo> pedronis: well either we rewrite listobject.py, or we figure out an algorithm that cope with it -- I guess it's a prime example of thing we want to inline...
[18:28] <stakkars> ok, that sounds reasonable. I got a different impression when it looked like "looking through several stack levels"
[18:31] <pedronis> well you need to track also the home objects
[18:31] <stakkars> how would you avoid the space.eq_w in the above example?
[18:33] <arigo> by rewriting it to access w_list.ob_item and w_list.ob_size at each loop iteration
[18:34] <pedronis> wich is what CPython does
[18:35] <stakkars> oh, sure. You cannot rely on the object, of course. I thought you wanted to avoid mutation. Surely this implementation is not correct.
[18:35] <arigo> same for 'eq__List_List'
[18:35] <arigo> maybe that's an alternative: forbit mutation while the list is "in use"
[18:35] <arigo> CPython does it during sort()
[18:36] <arigo> for now, I'm copying CPython's attribute-reloading hacks
[18:37] <arigo> but that's somehow related to multithreading and locking, too
[18:38] <arigo> I mean that some kind of "read" and "write" locks could also be used to detect single-threaded recursive modification of the list
[18:39] <stakkars> I just recognized that I startedlitobj implementation and had something like list_EQ implemented using a zip of both list items: Probably safe but not exactly what CPython does.
[18:39] <stakkars> listobject
[18:40] <arigo> I think that it's fine too
[18:41] <arigo> I mean, nobody should care about this level of detail, as long as it doesn't crash
[18:41] Action: arigo crash-proofs listobject.py
[18:41] <stakkars> AND I see that I'm the author of the messyou mentioned. ;-)
[18:42] <stakkars> May 29, 2003
[18:42] <arigo> well, listobject.py is all rather delicate to get "right"
[18:42] <arigo> and that was a long time ago :-)
[18:43] <stakkars> yup. Anyway, this operation is something that deals with applevel objects.
[18:44] <stakkars> But I think we are thinking of optimizing the interpreter implementation, so I'm thinking about the internal structures
[18:44] <stakkars> instead of user stuff.Maybe I just don't see clearly.
[18:44] <stakkars> No,sorry, you are right, it does of courseapply.
[18:45] <arigo> argh, _setitem_slice_helper() sometimes takes a w_list.ob_item as argument
[18:46] <stakkars> so this example cannot easily be simplified, unless you can prove that the ob_item cannot be modified.
[18:50] <arigo> argh**3
[18:50] <arigo> _del_slice has an interesting XXX
[18:52] <stakkars> yes, this is from me, too :-)
[18:56] <pedronis> well, that's a case for wanting way to have fine control over when GC can be triggered
[18:56] <stakkars> see list_ass_slice, they want to prevend extra list operations to occour via DECREF
[18:57] <arigo> makes sense, but
[18:57] <arigo> if we do a refcounting translation, does it mean we could have a way to prevent destructors from being called immediately?
[18:58] <pedronis> well if we kept recycled until the end that would work both for refcount and finalizers
[18:58] <stakkars> exactly. The idea is to finish the operation, including a probable re-allocation of the array, before we allow further operations tohappen on it.
[18:59] <arigo> would it make sense to see if we can have a generic clean way to say
[18:59] <arigo> "I want a read lock" or "write lock" ?
[19:00] <stakkars> hmm. In a sense, this is what they express by the code, by preventing a write through the "don't decref" backdoor.
[19:01] <arigo> (also, would it be cheap enough to implement them in a single-threaded or GIL environment)
[19:01] <arigo> s/them/locks
[19:01] <stakkars> without further precaution, you would generate a dead-lock here.
[19:02] <stakkars> the operation must either fail, or be postponed until this operation is done. Needs thread-like behavior.
[19:02] <arigo> no, locks that generate ValueError: it's a bad idea to mutate this list now, I'm reading it, you fool
[19:02] <stakkars> But CPython does allow it.
[19:02] <arigo> yes, but with undefined behavior
[19:02] <stakkars> defined as "well, I won't crash"
[19:02] <arigo> nobody complained when list.sort() was increasingly changed to prevent mutations during sort
[19:02] <arigo> stakkars: yes :-)
[19:03] <stakkars> or betterto say: By using the recycle, it does exactly what a threaded equivalent would have done: Postpone the action, here by making sure it isn't triggered early. Elegant.
[19:04] <arigo> I'm also thinking about other places like == or count() or index():
[19:04] <arigo> no one sane should depend on the behavior of l1 == l2 when l1 or l2 are mutated during the comparison, IMHO
[19:05] <stakkars> well, I would even like to disallow a comparison operator to have any side effect on its arguments.
[19:05] <stakkars> But side effects of deleting elements are different.
[19:05] <arigo> precisely
[19:06] <arigo> deleting?
[19:06] <arigo> -- that's probably a question for python-dev, though (unless Jython already asked it)
[19:06] <stakkars> IOW, nobody can really know what the effect of deleting objects is. Yes the list slice assignment as we discussed.
[19:07] <stakkars> That's nothing to forbid I think. You can't easily predict this as a programmer.
[19:08] <arigo> so you're saying:
[19:08] <arigo> I'm suggesting that during a del l[:] the destructors are not allowed to even read the list l
[19:09] <arigo> and you're saying "that's too much of a restriction"
[19:09] <arigo> ?
[19:12] Nick change: pedronis -> pedronis_afk
[19:15] <stakkars> you think this is too artificial?
[19:15] <arigo> no, it's probably sensible
[19:16] <stakkars> I was thinking of something like a pseudo-implementation of weak references or such.
[19:16] <arigo> suppose that del queue[-1] removes an element whose destructor wants to inspect or even put something new on the queue
[19:17] <stakkars> That could quite easily cause cascading operations if something is tracked actively.
[19:19] <arigo> so locks are not good enough
[19:19] <stakkars> in general, touching anything that can be user-defined can cause a modification of any object.
[19:20] <stakkars> And even worse, as there arelots of sophisticated frame-worksaround,a usermight not even know about this.
[19:20] <arigo> the problem is when user-defined code runs implicitely (__del__ or multi-threads without GIL)
[19:21] <stakkars> that's why I think prevention is stronger than locking, if possible.
[19:21] <arigo> I see
[19:21] <stakkars> It is just a matterof (bad??) design of lists, that they claim to be valid for everything at any time.
[19:21] <stakkars> Reallynot the ideal data structure to share data.
[19:22] <arigo> well, for programmers, they *are* the ideal data structure to share data sometimes
[19:22] <arigo> when you know that .append() or .pop(0) are atomic
[19:23] <stakkars> exactly. That is the missing spot. If we know, we can do whatsoever.
[19:23] <arigo> for the implementor, they are a nightmare, but that's the job of the implementor to have nightmares, not the user :-)
[19:24] <arigo> gotta run! see you
[19:24] arigo (~arigo@134.99.112.244) left irc: "[x]chat"
[19:24] <stakkars> tuple(list) would be save
----- silence for 19 minutes ----- [19:43] <pedronis_afk> stackars, uncommenting recycled= is not enough
[19:44] <pedronis_afk> because the unused vars are GCed as soon as possible
[19:45] <stakkars> why? I want to keep them alive during the deletion process. Then I don't care.
[19:45] <pedronis_afk> this is what we discussed at the very beginning with Armin
[19:46] <stakkars> or is it because recycledwould be gc'ed, earlier? But *that* would not be correct.
[19:46] <pedronis_afk> the graph simplication stuff does not pass recycled along because it is ununsed
[19:46] <pedronis_afk> so yes, it could be collected earlier than the end
[19:47] <stakkars> but then the simplification stuff is wrong about side effects of keeping variables alive?
[19:47] <pedronis_afk> Armin thinks that it is fine to have rules different from CPython for that
[19:48] <pedronis_afk> look at the log of the discussion
[19:48] <stakkars> I disagree on this. I'll try to find it in today's log.
[19:55] <stakkars> pedronis_afk: I can't find it. Was it today, veryearly?
[19:58] <pedronis_afk> yes
[19:58] <pedronis_afk> the discussion
[19:58] <pedronis_afk> about
[19:58] <pedronis_afk> def f(n)
[19:59] <stakkars> [17:31] <arigo> def f():
[19:59] <stakkars> [17:31] <arigo> x = X()
[19:59] <stakkars> [17:31] <arigo> return x.l
[19:59] <stakkars> ?
[20:00] <pedronis_afk> no
[20:00] <pedronis_afk> [17:01] <pedronis> the unsimplified graph copy x into the g(l) block
[20:00] <stakkars> ah, thanks!
[20:07] <stakkars> stillnot clear to me what he thinks. Do I need to reference the object to extend its lifetime?
[20:07] <stakkars> and would a simple del do it?
[20:09] <stakkars> and it recycle doesn't stay alive until end of the function, this would be another rule to be addedto RPython
[20:09] <stakkars> s/and it/and if/
[20:13] <pedronis_afk> you need something like def use(obj): pass
[20:13] <pedronis_afk> use(recycled) at the end
[20:13] <pedronis_afk> del recycled[:]
[20:13] <pedronis_afk> would also work
[20:15] idnar (mithrandi@idnar.user) joined #pypy.
[20:17] <stakkars> and del recycled without [] ?
[20:18] <pedronis_afk> no, the graph does not sees python frame variables as such
[20:18] <pedronis_afk> so that's a no op
[20:19] <stakkars> but with use(obj), we would get trouble later, too, when it comes to inlining of functions
[20:19] <stakkars> and we figure out that this is a no-op as well. I don't believe we are doing the right thing here.
[20:21] <pedronis_afk> well, it's a trade-off between doing things as CPython and somehow risk to leak objects (that means keeping object alive we don't want to)
[20:21] <pedronis_afk> and needing to figure out other way to fine control this stuff
[20:21] <stakkars> if we let the object alive aslong as it is defined, why is this so bad?
[20:22] <pedronis_afk> because it could be a huge object that we forget to del
[20:23] <stakkars> better safe than sorry. In CPython,everything must be deled by hand.
[20:23] <stakkars> In RPython, I always assume that objects get destructed at the end of the procedure.
[20:24] <stakkars> where is the advantage to break with this convention?
[20:24] <pedronis_afk> that's not how things are implemented right now
[20:24] <pedronis_afk> I don't think it was a voluntary decision
[20:25] <pedronis_afk> as I said it's a matter between leaking objects vs. what to do to control when GC is triggered
[20:25] <pedronis_afk> for the usual case this does not matter
[20:25] <stakkars> you mean it just the way it happened to be implemented?
[20:25] <pedronis_afk> it seems that psyco introduce del of its own
[20:26] <stakkars> I always had the C++ model in mind. Yes, I read about this.
[20:26] <pedronis_afk> well, don't know if this was thought about when the simplication that removes unused vars
[20:26] <pedronis_afk> was added
[20:26] <stakkars> when we create a local object, it gets destructed when it leaves scope.
[20:27] <pedronis_afk> this is a probably an issue for geninterp too
[20:27] <pedronis_afk> because there we probably want the CPython way of doing things
[20:27] <stakkars> ahh, I see. The unused vars removal was introduced primarily to get rid of variables introduced
[20:27] <stakkars> by the flow blocks, not those which were explicitly used by the source code.
[20:28] <stakkars> yes, of course an issue, because it really changes the RPython semantics.
[20:29] <pedronis_afk> well, but the fact that recycled is kept alive is just reflected in the NOT simplified graph by unused vars copied from block to block
[20:30] <stakkars> so we need an explicit way to express that "I want this var to exist until here" thing
[20:30] <pedronis_afk> well a use function could do that, it would need to be special cased at some point to avoid to remove its effect
[20:30] <stakkars> or in other words, a variable that is a container has a side effect on the contained variables. And in that sense, it *is* not an unused variable.
[20:31] <pedronis_afk> but you know it's container only later
[20:31] <pedronis_afk> and its not clear that is always what is intended
[20:31] <pedronis_afk> and huge containers are indeed the kind of things
[20:31] <pedronis_afk> you don't want to keep alive unless necessary
[20:32] <stakkars> if you look into the listobject code, you see that I'm creating the variable and then assigning other variables
[20:32] <stakkars> into it. That means,the life time of the other variables is influenced by the lifetimeof the container.
[20:33] <stakkars> So we need to figure this out: Either keep recycle alive, *or* keep the referenced variables alive.
[20:33] <stakkars> Both would have the desired effect.
[20:34] <pedronis_afk> but Armin also proposed to use locks instead of this kind of things
[20:34] <stakkars> and I convinced him that this is not ok.
[20:35] <pedronis_afk> ok, then you need to use something like use
[20:35] <stakkars> a lock would cause a deadlock, or would raise an Exception,but the user cannot know the side effect of deletion.
[20:38] <stakkars> ok, how about distinction between simple local block vars and those which stem from source code?
[20:39] <stakkars> I would keep the latter until no more mentioned in the source.
[20:41] <stakkars> alternatively, adopting the new rule, we could try to implement the list operation in a way that keeps
[20:41] <pedronis_afk> to change the var simplication we need to agree on what we want
[20:42] <stakkars> it consistent, regardless what operations are triggered in between.
[20:42] <pedronis_afk> so we need Armin opinion
[20:42] <stakkars> yes.
[20:42] <stakkars> I could do a full rotation of the list, keeping everything alive, and at the end just
[20:43] <stakkars> nullifying the elements at the end, always sonsitently.
[20:43] <stakkars> consistently
[20:44] <stakkars> actually, this seems to be a good alternative, anyway!
----- silence for 21 minutes ----- [21:05] <stakkars> pedronis_afk: I checked in a version that should be quite safe, while a bit slower.
[21:15] <stakkars> eek, no good change! The old version is much better.
[21:16] <stakkars> Object deletin could even cause appends to the list, so I should change its size just once
[21:16] <stakkars> and do all desctruction out of the list's memory. Lists are hard to get right...
----- silence for 18 minutes ----- [21:34] arigo (~arigo@stups.cs.uni-duesseldorf.de) joined #pypy.
----- silence for 54 minutes ----- [22:28] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) joined #pypy.
[22:42] idnar (mithrandi@idnar.user) left irc: Nick collision from services.
[22:42] idnar_ (mithrandi@idnar.user) joined #pypy.
----- silence for 36 minutes ----- [23:18] arigo (~arigo@stups.cs.uni-duesseldorf.de) left irc: Remote closed the connection
----- silence for 19 minutes ----- [23:37] arigo (~arigo@134.99.18.225) joined #pypy.
[23:42] idnar_ (mithrandi@idnar.user) left irc: Read error: 104 (Connection reset by peer)
[23:42] idnar (mithrandi@idnar.user) joined #pypy.
[00:00] --- Mon Apr 4 2005