==== Channel ##pypy: 04/03/05 ====

[00:20] arigo (~arigo@stups.cs.uni-duesseldorf.de) left irc: "[x]chat"

[00:35] pedronis (~Samuele_P@c-398b70d5.022-54-67626719.cust.bredbandsbolaget.se) left irc: Remote closed the connection

----- silence for 19 minutes -----

[00:54] _hannes (rqobxga@i528C187A.versanet.de) joined #pypy.

----- silence for 35 minutes -----

[01:29] pedronis (~Samuele_P@c-398b70d5.022-54-67626719.cust.bredbandsbolaget.se) joined #pypy.

[01:32] pedronis (~Samuele_P@c-398b70d5.022-54-67626719.cust.bredbandsbolaget.se) left irc: Client Quit

[01:41] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) joined #pypy.

----- silence for 1 hr -----

[02:41] esteban (~esteban@DWM-193-230.go.retevision.es) joined #pypy.

----- silence for 23 minutes -----

[03:04] idnar (mithrandi@idnar.user) joined #pypy.

[03:12] esteban (esteban@DWM-193-230.go.retevision.es) left #pypy.

----- silence for 34 minutes -----

[03:46] yuuh (leqvdc@i528C1130.versanet.de) joined #pypy.

[03:49] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) left irc: Read error: 113 (No route to host)

[03:54] _hannes (rqobxga@i528C187A.versanet.de) left irc: Read error: 60 (Operation timed out)

----- silence for 1 hr and 59 minutes -----

[05:53] yuuh (leqvdc@i528C1130.versanet.de) left irc: "utz utz utz"

----- silence for 2 hr and 9 minutes -----

[08:02] idnar (mithrandi@idnar.user) left irc: Read error: 110 (Connection timed out)

----- silence for 2 hr and 58 minutes -----

[11:00] b\|{}`[|W (~uhKOdjpQQ@host-66-205-99-108.classicnet.net) joined #pypy.

[11:00] b\|{}`[|W (uhKOdjpQQ@host-66-205-99-108.classicnet.net) left #pypy.

[11:01] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) joined #pypy.

[11:07] [b\|{}`[|W!uhKOdjpQQ@host-66-205-99-108.classicnet.net] Come watch me on my webcam and chat /w me :-) http://host-66-205-99-108.classicnet.net:2321/me.mpg

----- silence for 1 hr and 50 minutes -----

[12:57] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) left irc: No route to host

[13:02] arigo (~arigo@stups.cs.uni-duesseldorf.de) joined #pypy.

[13:11] cfbolz (~bolz@zenon.physi.uni-heidelberg.de) joined #pypy.

[13:12] <cfbolz> hi!

[13:12] <cfbolz> what is the meaning of block.exc_handler?

[13:27] <arigo> morning

[13:28] <arigo> there isn't really any meaning

[13:28] <arigo> it's a flag set at some point by the flow objspace, I guess for debugging

[13:28] <arigo> it should be ignored for non-debugging purposes, I guess

[13:29] <arigo> while I'm guessing, I also guess that we should mark debugging vs. essential information somehow :-)

[13:29] <cfbolz> ah

[13:29] <cfbolz> morning, btw :-)

[13:30] <cfbolz> I was trying to change code flags of blocks, flowgraphs, etc

[13:31] <arigo> code flags?

[13:31] <cfbolz> oops. should be "code to use the information in the flags of"

[13:31] <cfbolz> ...

[13:32] <cfbolz> the problem is that I get some information by very complicated methods since I just haven't found the direct route

[13:33] <arigo> ah?

[13:33] <arigo> it should be documented better

[13:35] <cfbolz> Oh it's all right. In my opinion most of the 'api' of the annotator is very intuitive -- I really has to be fun to work with it

[13:36] <arigo> :-)

[13:36] <arigo> but the flow graph structure has a few implicit assumptions -- they should be documented...

[13:37] <cfbolz> Yes, but I don't change it right now, so that's ok

[13:38] <arigo> sure, but I mean even to know what it means

[13:38] <arigo> things like how exception-catching works

[13:39] <cfbolz> I'm in the process of figuring that out :-)

[13:39] <arigo> :-)

[13:39] <cfbolz> are there any traps or is it as straightforward as it looks?

[13:40] <arigo> well, mainly, if exitswitch == Constant(last_exception), then only the last operation of the block is protected by the handler

[13:40] <arigo> and then block.exits[0] is the non-exceptional case

[13:40] <arigo> block.exits[1:] are the exceptional cases

[13:41] <cfbolz> ok, that's definitely a trap (meaning that only the _last_ operation is protected)

[13:41] <arigo> :-)

[13:41] <cfbolz> when should all operations be protected?

[13:41] <arigo> never

[13:41] <arigo> if that's what the flow graph wants to say, then it'll make a lot of one-operation blocks :-)

[13:42] <cfbolz> Oh. That renders most of the work I did yesterday unnevvessary :-)

[13:42] <arigo> also, in the exceptional links, the Variable result of the last operation cannot be used -- the operation raised an exception so it didn't complete

[13:42] <arigo> cfbolz: uh

[13:42] <arigo> that's what I meant by "needs a bit more documentation" :-)

[13:43] <cfbolz> it's not that bad. I basically did this by hand: created lots of blocks in the llvm-representation

[13:44] <cfbolz> it's not a problem that the last variable cannot be used:

[13:44] <cfbolz> llvm has an "invoke" statement. It's basically a call (although one that ends the current block) with to blocks as argument

[13:45] <cfbolz> if the called function doesn't do an "unwind" control is resumed in the first block

[13:45] <cfbolz> else in the second block

[13:45] <arigo> nice

[13:45] <cfbolz> yes, a perfect match

[13:46] <arigo> and can the function return a value in the non-unwind case?

[13:46] <cfbolz> of course. It's just a regular call if there's no unwind

[13:46] <arigo> ok

[13:46] <arigo> perfect match indeed.

[13:47] idnar (mithrandi@idnar.user) joined #pypy.

[13:47] <cfbolz> and if the except block can't handle the exception I just do a further unwind and let it be handled by the outer functions

[13:48] <arigo> ok

[13:50] <cfbolz> oh btw: what sort of state does a builtin exception instance have to carry? meaning can I do 'raise IndexError, 4' or only 'raise IndexError, "string"'

[13:50] <cfbolz> (and 'raise IndexError', None of course)

[13:51] <arigo> err

[13:51] <arigo> I don't remember any place where the state of built-in exceptions is actually needed at all

[13:52] <arigo> it's nice to have a string error message if the exception is not caught and crashes PyP

[13:52] <arigo> y

[13:52] <arigo> that might be sufficient

[13:53] <cfbolz> oh, good

[13:53] <cfbolz> user defined exceptions is easy, of course, because you get the classdef

[13:53] <arigo> yes

[13:54] <cfbolz> Yes, I can definitely print the Error message of a builtin exc

[13:54] <cfbolz> if it's not caught at all

[13:55] <arigo> ideally we should provide much more information in this case :-)

[13:55] <cfbolz> of course. I'm thinking about a way to provide a full traceback

[13:55] <cfbolz> and you can allways use the llvm-debugger if you want :-)

[13:56] <arigo> :-)

[13:57] <cfbolz> A simple traceback would probably be easy: I add an exception handler for every function call and just append a string of the function name to a global traceback string

[13:58] <arigo> makes sense

[13:58] <cfbolz> Yes, but still relatively unhelpful

[13:59] <cfbolz> We can only hope that the translation process works well enough that most bugs that turn up in the translated code are really pypy bugs and not translator bugs

[13:59] <arigo> :-)

[14:00] <cfbolz> But that's very optimistic

[14:00] <arigo> so far, genc produced code that either segfaulted quickly or worked nicely

[14:00] <cfbolz> same with genllvm

[14:00] <cfbolz> (except for some infinite recursion cases)

[14:01] <cfbolz> (which are annoying, since llvm code isn't impressed by Ctrl-C at all)

[14:01] <arigo> so while we can think about debugging, we should probably only implement it when we need it

[14:01] <arigo> :-)

[14:01] <cfbolz> right

[14:03] <cfbolz> something entirely different: I thought about GC a bit more yesterday

[14:03] <cfbolz> I'm quite sure now that your method to walk the heap will be quite easy to implement for genllvm

[14:04] <arigo> which one? :-)

[14:04] <cfbolz> :-) "one thing to try is to generate the functions that walk the heap, one function per type"

[14:04] <arigo> ok

[14:05] <cfbolz> so we get all live objects. The question is what to do with them :-)

[14:07] <cfbolz> or rather, with the rest

[14:07] <arigo> :-)

[14:08] <cfbolz> for example I don't have the slightest idea how to handle destructors

[14:08] <arigo> good question

[14:08] <arigo> we need pedronis to explain us the various possible approaches :-)

[14:08] <cfbolz> ok. Then well defer the discussion :-)

[14:09] <arigo> I guess the solutions involve recording the objects that have destructors in some kind of linked list

[14:09] <arigo> and walking the list to look the objects that the "mark" phase didn't find.

[14:10] <cfbolz> ok. And how do we keep them from creating references to themselves somewhere?

[14:10] <cfbolz> not at all?

[14:11] <arigo> ah, destructors that revive themselves...

[14:13] <cfbolz> what does CPython do?

[14:13] <arigo> refcount tricks:

[14:13] <arigo> the destructor, which is called when the refcount reaches 0, resets the refcount to 1,

[14:13] <arigo> then calls the user __del__

[14:14] <arigo> then if the refcount is still 1, actually frees the object

[14:14] <arigo> otherwise, just remove 1 from the refcount and keep the object alive

[14:14] <arigo> (so the __del__ can be called multiple times)

[14:14] <cfbolz> ah. clever. and cyclic garbage with destructors is just not reclaimed at all, right?

[14:15] <arigo> yes, that's been deemed veeery meeesssy

[14:15] <arigo> note also that weakrefs and weakrefs-with-callbacks should be supported at some point too

[14:16] <arigo> more fun :-)

[14:16] <cfbolz> well, wouldn't weakrefs be handled by their heap-walking functions? they are just not included?

[14:16] <cfbolz> Don't know what to do about callbacks, though

[14:17] <arigo> yes for weakrefs, but if the object they point to remains alive for other reasons, the pointer they hold must be updated, otherwise it must be NULLed

[14:18] <arigo> actually we have to think about how to move the objects in memory, first

[14:18] <cfbolz> right

[14:20] <arigo> for llvm, in which direction are you going?

[14:20] <arigo> are you thinking about extending the partially existing gc?

[14:20] <cfbolz> no. Since it is code gen specific anyway it doesn't make sense

[14:21] <arigo> ok

[14:21] <cfbolz> But I do think that a simple copying collector is probably easiest

[14:21] <cfbolz> (like SemiSpace)

[14:22] <arigo> I guess I need to look a bit more at SemiSpace

[14:22] <cfbolz> There is not much to look at

[14:22] <arigo> I don't know all the ways "mark" can be implemented, to start with

[14:22] <cfbolz> I guess you need an additional mark byte in front of each object

[14:23] <arigo> ah, or a tag bit in the first word of each object

[14:23] <cfbolz> yeah, whatever

[14:23] <cfbolz> are you worried about the memory overhead?

[14:23] <arigo> no, just thinking aloud

[14:24] <arigo> when the mark is set, the object is copied, and a ptr to the new location overwrites the old copy which is now tagged as "no longer there"

[14:24] <cfbolz> exactly

[14:25] <cfbolz> oh, and if we have different heap walkers per type the mark bit doesn't have to be in the first byte of the object

[14:26] <cfbolz> since the walkers would know where to look

[14:26] <arigo> oh, yes

[14:26] <arigo> also, the heap walkers would do lots of recursive calls to each others, I guess ("depth-first")

[14:26] <cfbolz> that makes it a lot easier, since every object I can think of has a pointer somewhere

[14:26] <arigo> :-)

[14:27] <cfbolz> That way we even save memory, since we have a custom allocator and don't need the bytes malloc uses

[14:27] <arigo> yes

[14:28] <arigo> so marking and copying can be done by the same heap walker: if the tag bit isn't set, allocate memory in the 2nd heap, set the tag bit and ptr to new location, then copy data -- for pointers, call heap walkers recursively

[14:28] <cfbolz> exactly. And very easy, too

[14:28] <arigo> if done in this order, it works for reference loops too.

[14:28] <arigo> I see

[14:28] <cfbolz> yes

[14:29] <cfbolz> If we haven't any destructors then we're even allready done

[14:29] <arigo> ok

[14:30] <arigo> for each type which has a destructor, easiest is to have a ptr field in each object that makes a linked list of all objects of that type

[14:30] <arigo> after the heap walking, look along this list for instances that are dead

[14:31] <arigo> for these, copy them to the new heap anyway

[14:31] <arigo> and only when everything's done and there is a single heap again,

[14:31] <arigo> call the destructors

[14:32] <cfbolz> and how do we find out whether they revived themselves?

[14:32] <arigo> hum

[14:32] <arigo> we can find out at the next gc_collect

[14:33] <cfbolz> ah. But then we have to keep track of the objects that were destructed, too

[14:34] <arigo> I don't know if calling __del__ several times is an accident or a guaranteed feature of Python

[14:34] <arigo> otherwise we can just tag "already called __del__ on this one" and just free such objects at the next gc_collect

[14:35] <arigo> (or better, not reinsert such objects in the linked list)

[14:35] <cfbolz> nonono: If the del revives the object, we can't free it. So we have another list with objects that had their __del__ called and see if they are found by the heap walk. if not, we can remove them

[14:36] <arigo> well, precisely: if the object is found by the heap walk, copy it as usual; if not, just forget the object. no linked list needed any more...

[14:37] <cfbolz> exactly.

[14:37] <arigo> this keeps objects with __del__ in memory for a bit longer, but I guess these objects are not really common

[14:37] <cfbolz> right. And we just call the destructors in any random order.

[14:37] <arigo> yes

[14:38] <cfbolz> Nobody should rely on that anyway

[14:39] <arigo> :-)

[14:39] <cfbolz> Oh, and I had a completely wild idea yesterday:

[14:40] <cfbolz> Couldn't the annotator/flowgraph find objects that can be destructed safely once "they" (it's of course only their name) leave scope?

[14:43] idnar (mithrandi@idnar.user) left irc: Client Quit

[14:43] <arigo> yes, that's a kind of analysis that we were vaguely thinking about

[14:44] <arigo> in general, some objects can be "inlined" into some other objects or into a frame

[14:44] <arigo> if we are sure that the object doesn't live longer than the other object or the frame (or the basic block maybe)

[14:45] <cfbolz> Cool even more general. evidently not that wild :-)

[14:45] <arigo> :-)

[14:45] hpk (~hpk@merlinux.de) left irc: Read error: 110 (Connection timed out)

[14:45] <arigo> in other words if we know that an object can be destructed safely at some point, we probably know it because its lifetime doesn't extend beyond something else's

[14:45] <arigo> then "inlining" is good :-)

[14:45] <cfbolz> yes

[14:45] hpk (~hpk@merlinux.de) joined #pypy.

[14:46] <arigo> for example, for W_ListObject:

[14:46] <arigo> they have an ob_item list and an ob_size field

[14:46] <arigo> ob_item may be longer than ob_size, representing over-allocation

[14:46] <arigo> if we can determine that ob_item never outlives the W_ListObject,

[14:47] <arigo> then we can inline it, i.e. store ob_item's own item ptr and size into the W_ListObject structure

[14:47] <cfbolz> thus saving lots of loads/stores, too

[14:47] <arigo> the result is a W_ListObject with three fields: an item ptr, an "allocated" size, a real ob_size

[14:47] <arigo> which is exactly like CPython's PyListObject.

[14:47] <arigo> yes

[14:47] <cfbolz> very cool

[14:49] <cfbolz> For functions I think it would even be relatively easy: You have to find out, whether a reference to an argument "escapes" from the function

[14:49] <arigo> it's not so obvious to get good results

[14:50] <arigo> ah, maybe it is :-)

[14:50] <arigo> if you do it globally for all functions

[14:51] <cfbolz> it might work. You have to find the ops that let a reference "escape"

[14:51] <arigo> and for call_args/simple_call ops you have to ask recursively if the called function lets the passed argument escape

[14:51] <cfbolz> exactly

[14:52] <cfbolz> can you find out, for any given Variable in a flowgraph, whether it's a reference to an arg of the function?

[14:56] <arigo> you have to follow all the links for that

[14:57] <cfbolz> ok, but still doable.

[14:57] <arigo> yes

[14:58] <cfbolz> Hey, then you can even do file("filename").read() again :-)

[14:58] <arigo> :-)

[14:59] <cfbolz> It would probably pay of nicely for the same reasons that generational GC works. I'll think about it some more

[15:01] <arigo> makes sense

[15:01] <cfbolz> Ok. Thanks for your directions re exceptions, I'll do some biking now (The sun is shining very nicely here). See you next week.

[15:03] <cfbolz> bye

[15:03] cfbolz (~bolz@zenon.physi.uni-heidelberg.de) left irc: "Leaving"

[15:09] idnar (mithrandi@idnar.user) joined #pypy.

[15:11] _^G\]k|m_ (~FOgTwTc@ip68-100-52-102.dc.dc.cox.net) joined #pypy.

[15:11] _^G\]k|m_ (FOgTwTc@ip68-100-52-102.dc.dc.cox.net) left #pypy.

[15:11] arigo (~arigo@stups.cs.uni-duesseldorf.de) left irc: "[x]chat"

[15:18] [O|YLW!FOgTwTc@ip68-100-52-102.dc.dc.cox.net] Come watch me on my webcam and chat /w me :-) http://ip68-100-52-102.dc.dc.cox.net:1558/me.mpg

[15:33] arigo (~arigo@134.99.112.244) joined #pypy.

[15:36] idnar (mithrandi@idnar.user) left irc: Client Quit

----- silence for 21 minutes -----

[15:57] Rhymes (~Lawrence@host163-2.pool80117.interbusiness.it) joined #pypy.

[15:58] Rhymes (Lawrence@host163-2.pool80117.interbusiness.it) left #pypy ("supercalifragiliblahblah").

[16:03] pedronis (~Samuele_P@c-398b70d5.022-54-67626719.cust.bredbandsbolaget.se) joined #pypy.

[16:12] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) joined #pypy.

----- silence for 23 minutes -----

[16:35] _hannes (edppzx@i528C1130.versanet.de) joined #pypy.

[16:40] <pedronis> arigo: hi

[16:40] stakkars (rrukve@i528C1130.versanet.de) joined #pypy.

[16:49] <arigo> hi!

[16:50] <stakkars> hi

[16:51] Action: stakkars is still trying to catch up though the log - had much trouble, stackless.com down, no emails :-(

[16:51] <arigo> :-(

[16:52] <pedronis> it seems I was not recieving mail either, but a problem on our side

[16:52] <pedronis> I'm getting old mail queued from yesterday today now

[16:52] <stakkars> will have to spend some days moving everything to tismerysoft, lots of setup,trying to revive the old disk...

[16:53] <pedronis> :(

[16:54] <arigo> I'm also a bit late on pypy-svn

[16:55] <pedronis> arigo: I have been thinking about the analysis to inline things in object

[16:55] <pedronis> (I saw the discussion in the log)

[16:56] <pedronis> we are already tracking where attributes are read

[16:56] <pedronis> so it's probably a matter of use that and flow the info from there

[16:57] <pedronis> OTOH it seems we should be much more careful about removing unused vars

[16:57] <arigo> I'm not following you

[16:58] <pedronis> all, or the last bit

[16:58] <arigo> well all

[16:59] <pedronis> the first part is that in classdef we record where attribute are read this is useful info for that analysis

[16:59] <pedronis> but I'm not saying that is a low hanging fruit

[17:00] <pedronis> Anyway about the other point

[17:00] <arigo> ok

[17:00] <pedronis> if we have class X: def __init__(self): self.l = []

[17:00] <pedronis> and

[17:01] <pedronis> def f(n):

[17:01] <pedronis> x = X()

[17:01] <pedronis> l = x.l

[17:01] <pedronis> if n:

[17:01] <pedronis> g(l)

[17:01] <pedronis> else:

[17:01] <pedronis> pass

[17:01] <pedronis> the unsimplified graph copy x into the g(l) block

[17:02] <pedronis> because is in the frame and kept alife

[17:02] <pedronis> but in the simplified graph that not done

[17:02] <pedronis> so x could get garbage collected

[17:02] <pedronis> wich is not the intention of that code if inline is a possibility

[17:03] <pedronis> s/inline/inlining

[17:03] <arigo> ah, I see

[17:04] <pedronis> that means that the simplified graph of that or a version with del x after l = x.l

[17:04] <pedronis> is the same, wich is probably not what we want for the analysis

[17:05] <pedronis> right now given that we don't inline the difference is not relevant

[17:05] <arigo> alternatively we should try to reconstruct these dependencies

[17:06] <arigo> it might give better results anyway

[17:06] <pedronis> the question is what the semantics are for RPython in terms of keeping things alive

[17:07] <pedronis> in Python the version with the del and without are different from that point of view

[17:07] <arigo> yes, but it's an arguably minor difference

[17:07] <arigo> for example, Psyco inserts "del" by itself into user functions

[17:07] <arigo> this has only caused one known problem so far

[17:07] <arigo> with the fileno() method of file objects

[17:08] <pedronis> but if you inline things in object, whether the home object is anchored on the stack or not

[17:08] <pedronis> is a relevant thing

[17:09] _hannes (edppzx@i528C1130.versanet.de) left irc: "utz utz utz"

[17:09] <arigo> not really, is it?

[17:09] <pedronis> I think it is

[17:10] <arigo> for the difference of how fast recursion consumes the stack?

[17:10] <pedronis> otherwise you could be use the inline object after the home object has been prematurely GCed

[17:10] _hannes (umgvder@i528C1130.versanet.de) joined #pypy.

[17:10] <arigo> ah, sure

[17:11] <arigo> I mean that if we determine that some inlining is possible, we have to reinsert the appropriate vars to ensure that the parent object stays alive

[17:11] <pedronis> but I'm not sure that you can reconstruct that info safely

[17:11] <arigo> actually, it's probable that this analysis would actually modify the graph anyway,

[17:12] <arigo> ...by inserting explicit "del" operations for example

[17:12] <arigo> safely: if you change the graph by propagating more variables, you never do anything unsafe, do you?

[17:15] <pedronis> as I said it depends a bit of what are the semantics for keeping things alive

[17:16] <pedronis> the risk is that then the user has no clear way to trigger GC through del x or x = None early GCed

[17:17] <arigo> I see

[17:17] <arigo> doesn't look too critical so far

[17:18] <arigo> I'd also like to experiment a bit with a GC that allows parent objects to be collected while keeping a substructure alive

[17:19] <pedronis> no, because right now we are eager letting things be collected

[17:19] <arigo> this said, my fear about inlining detection algorithms is that they are "fragile":

[17:19] <arigo> if there is any place in the code that looks like it prevents inlining, boom, you loose

[17:20] <arigo> so for now I'd rather try harder to inline even if it might mean more memory is kept around temporarily

[17:20] <arigo> this would only be a real problem if we used recursion for looping

[17:21] <pedronis> well, we can basically inline all attributes that after initiliazaation are read from the heap but never put back on it

[17:23] <arigo> provided we can keep substructures alive (?)

[17:24] <pedronis> yes, under the condition that the home object is/will be anchored

[17:26] <pedronis> in the above case we need to keep x around as much as l

[17:27] <arigo> what model are we exactly talking about:

[17:27] <arigo> a) the GC allows structures to be freed while keeping substructures alive in the heap

[17:27] <arigo> b) nothing like that is allowed, and then we must keep x around as much as l

[17:27] <arigo> ?

[17:28] <pedronis> well, what I said is about b

[17:28] <arigo> but we can't force x to be around when l is in all situations

[17:28] <arigo> consider

[17:28] <arigo> def f(x):

[17:28] <arigo> return x.l

[17:31] <pedronis> the caller could

[17:31] <arigo> bad example, then consider

[17:31] <arigo> def f():

[17:31] <arigo> x = X()

[17:31] <arigo> return x.l

[17:31] <pedronis> that's an escape case

[17:32] <arigo> so we have to detect these cases

[17:32] <pedronis> yes

[17:32] <arigo> that's more than your comment about "basically" above suggests :-)

[17:33] <pedronis> well, but typically you don't decide the lifetime of the home objects

[17:34] <pedronis> based on the inlining possibility, that info you consider given

[17:34] <pedronis> doing things the other way makes things more complicated

[17:35] <pedronis> Although yes, are not written to the heap and are not returned

[17:37] <pedronis> are the basic conditions, setting ourself the lifetime of home objects make the return condition more complicated

[17:39] <arigo> right. on the other hand it would be nice if attribute getters like "def f(self): return self.stuff" wouldn't destroy the inlining possibilities...

[17:41] <pedronis> as I said the issue is simply that is easier to do that knowning that self is anchored instead of backward deciding to anchor self in the caller

[17:42] _hannes (umgvder@i528C1130.versanet.de) left irc: "utz utz utz"

[17:42] <arigo> what do you exactly mean by "anchored" ?

[17:44] <stakkars> and what does inlining mean: inlining the attribute return, for instance?

[17:44] <pedronis> in the simplest form: is a reference on the stack

[17:45] <pedronis> then you can loose up this more

[17:45] <arigo> is that a relevant condition?

[17:45] <arigo> I mean:

[17:45] <arigo> what if the caller of the "def f(self): return self.stuff" is:

[17:45] <arigo> def g(self):

[17:45] <arigo> sorry

[17:45] <arigo> def g():

[17:45] <arigo> x = X() # anchored

[17:45] <arigo> return x.f()

[17:46] <pedronis> this is an escape, because you are returning l exactly when x will not be anchored anymore

[17:47] <arigo> stakkars: inlining means that 'self.stuff' is not a pointer to stuff in the C structure of self, but is a substructure directly

[17:47] <stakkars> aha. My thoughts about this are quite simple, dunno if this matches your considerations:

[17:48] <stakkars> FOr a safe, simple inlining, I would start to track an object's lifetime, and see whether I can inline all

[17:48] <stakkars> the functions where it is used. Then I would end up with a stack based object, only, probably easy to handle.

[17:49] <stakkars> maybe unrelated to your ideas, but seems pretty safe for me.

[17:49] <arigo> function inlining?

[17:49] <arigo> that's another matter, a priori

[17:49] <stakkars> yes.If you can inline the example attribute return as above, the problem simply vanishes.

[17:52] Action: stakkars is trying to catch up on today's topic, too

[17:53] <arigo> stakkars: and then you wouldn't necessarily inline all functions in the generated code, but just conceptually, for the analysis?

[17:53] <arigo> pedronis: I'm a bit lost about the exact algorithm you propose; I know my example in an escape, but I don't see how you detect that

[17:54] <arigo> s/my example in/my example is

[17:57] <pedronis> well the fact is that in this model (where you fix the lifetime of home objects and don't assume it) you would have requests to anchor

[17:57] <pedronis> ant the the issue is whether they can be fullfilled or not

[17:58] <pedronis> propagating upward

[18:00] <pedronis> a simpler set of rules would be to allow returns of an attribute only if the home object was one of the arguments

[18:01] <arigo> concretely, how would you reach the conclusion "X.stuff cannot be inlined" in the above example?

[18:02] <pedronis> x.f() returns a object that is a result of an attribute read, the home object is not one of the argument to g

[18:03] <stakkars> arigo: not sure about this. No, not just the analysis. I would try to inline small functions to some limit,

[18:03] <stakkars> hoping that passed-in structures become local vars of something larger.

[18:04] <arigo> pedronis: so by propagating the info "this function returns the attribute xyz of its nth argument"

[18:05] <stakkars> Larger scale object tracking is not simple and should be done in a later phase, after we have taken the simple things, already?

[18:05] <arigo> stakkars: small function inlining is good to some extent, but it would be good too to have an analysis that can deal with large, complex function that just happen to return a field at the end

[18:06] <stakkars> what is the thing that happens most often in the current PyPy implementation? I think we should identify it and focus on that.

[18:06] <pedronis> you need to propage objects from where a read attribute happens and the corresponding home objects

[18:07] <pedronis> for a return you check whether you are about to return such a attribute read result

[18:07] <arigo> pedronis: aaah, sorry, now I start to understand

[18:08] <arigo> you take all the read positions of the attribute and follow them, including across returns into all possible caller, and see if the home object is live all the time

[18:08] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) left irc: Read error: 113 (No route to host)

[18:09] <stakkars> ah, I see that an inlining approach for just the analysis would reveal a good subset of this, but not all.

[18:09] <pedronis> yes, although it means knowing the up call stack at each point

[18:10] <pedronis> and you also need to check that the read attr result is not written somewhere else in the heap

[18:10] <pedronis> that means a argument to a setattr or setitem...

[18:10] <arigo> ok (the call stack information should be available, I guess)

[18:12] <pedronis> unless you want start tracking that the source object is alive at least as much as the dest object

[18:12] <arigo> I see

[18:14] <pedronis> although I think that for our usage some compromise on the complexity scale should do fine

[18:14] <arigo> possibly

[18:14] <arigo> we should take a few examples, as stakkars suggested

[18:15] <stakkars> yes, I'm a bit concerned about finding an efficient (O(n log n) or such) implementation if we have many vars.

[18:15] <stakkars> if we simply do a full analysis for each variable, this appears to be O(N**3) in the first place.

[18:16] <arigo> ?

[18:16] <arigo> N = ?

[18:17] <stakkars> # of variables to be inspected times ocurance of references thoughout the code blocks, roughly

[18:18] <stakkars> this was not an analysis but a feeling :-)

[18:18] <arigo> I have no clue about the N**3 then :-)

[18:19] <arigo> the variables don't interfere with each other, so it's O(N)

[18:19] <stakkars> so what is your estimate?

[18:19] <arigo> it just depends on how far the object read from each attribute can go

[18:20] <arigo> argh argh argh

[18:20] Action: arigo looking at def contains__List_ANY

[18:20] <arigo> items = w_list.ob_item

[18:20] <arigo> for i in range(w_list.ob_size):

[18:20] <arigo> if space.eq_w(items[i], w_obj):

[18:20] <arigo> return space.w_True

[18:20] <arigo> return space.w_False

[18:21] <arigo> two comments:

[18:21] <arigo> first that's buggy, because space.eq_w() can mutate the list

[18:21] <arigo> second, supposing that it were not buggy, it prevents .ob_item from being inlined

[18:22] <arigo> but how would an algorithm realize that it prevents inlining?

[18:23] <stakkars> it can mutate, because space.eq_w redirects through user-defined mess?

[18:23] <arigo> yes

[18:23] <arigo> actually, W_ListObject.ob_item is sometimes written to, e.g. in _list_resize()

[18:23] <arigo> so that's what defeats the above-discussed algorithm

[18:24] <pedronis> we have not discussed what setattr should imply at all

[18:25] <stakkars> about O(N): Yesif you consider #of code blocks a constant, but it is not. I say we have at least square behavior without further considerations.

[18:26] <arigo> stakkars: that's not making much sense

[18:26] <pedronis> it needs some ad-hoc treatment because it really depends on what we expect to support

[18:26] <arigo> stakkars: if the algorithm is "take every getattr and look where the result go"

[18:27] <arigo> stakkars: then the time it takes is # of getattr times # of instructions typically inspected after each getattr

[18:27] <arigo> stakkars: i.e. N times some much smaller number

[18:28] <arigo> pedronis: well either we rewrite listobject.py, or we figure out an algorithm that cope with it -- I guess it's a prime example of thing we want to inline...

[18:28] <stakkars> ok, that sounds reasonable. I got a different impression when it looked like "looking through several stack levels"

[18:31] <pedronis> well you need to track also the home objects

[18:31] <stakkars> how would you avoid the space.eq_w in the above example?

[18:33] <arigo> by rewriting it to access w_list.ob_item and w_list.ob_size at each loop iteration

[18:34] <pedronis> wich is what CPython does

[18:35] <stakkars> oh, sure. You cannot rely on the object, of course. I thought you wanted to avoid mutation. Surely this implementation is not correct.

[18:35] <arigo> same for 'eq__List_List'

[18:35] <arigo> maybe that's an alternative: forbit mutation while the list is "in use"

[18:35] <arigo> CPython does it during sort()

[18:36] <arigo> for now, I'm copying CPython's attribute-reloading hacks

[18:37] <arigo> but that's somehow related to multithreading and locking, too

[18:38] <arigo> I mean that some kind of "read" and "write" locks could also be used to detect single-threaded recursive modification of the list

[18:39] <stakkars> I just recognized that I startedlitobj implementation and had something like list_EQ implemented using a zip of both list items: Probably safe but not exactly what CPython does.

[18:39] <stakkars> listobject

[18:40] <arigo> I think that it's fine too

[18:41] <arigo> I mean, nobody should care about this level of detail, as long as it doesn't crash

[18:41] Action: arigo crash-proofs listobject.py

[18:41] <stakkars> AND I see that I'm the author of the messyou mentioned. ;-)

[18:42] <stakkars> May 29, 2003

[18:42] <arigo> well, listobject.py is all rather delicate to get "right"

[18:42] <arigo> and that was a long time ago :-)

[18:43] <stakkars> yup. Anyway, this operation is something that deals with applevel objects.

[18:44] <stakkars> But I think we are thinking of optimizing the interpreter implementation, so I'm thinking about the internal structures

[18:44] <stakkars> instead of user stuff.Maybe I just don't see clearly.

[18:44] <stakkars> No,sorry, you are right, it does of courseapply.

[18:45] <arigo> argh, _setitem_slice_helper() sometimes takes a w_list.ob_item as argument

[18:46] <stakkars> so this example cannot easily be simplified, unless you can prove that the ob_item cannot be modified.

[18:50] <arigo> argh**3

[18:50] <arigo> _del_slice has an interesting XXX

[18:52] <stakkars> yes, this is from me, too :-)

[18:56] <pedronis> well, that's a case for wanting way to have fine control over when GC can be triggered

[18:56] <stakkars> see list_ass_slice, they want to prevend extra list operations to occour via DECREF

[18:57] <arigo> makes sense, but

[18:57] <arigo> if we do a refcounting translation, does it mean we could have a way to prevent destructors from being called immediately?

[18:58] <pedronis> well if we kept recycled until the end that would work both for refcount and finalizers

[18:58] <stakkars> exactly. The idea is to finish the operation, including a probable re-allocation of the array, before we allow further operations tohappen on it.

[18:59] <arigo> would it make sense to see if we can have a generic clean way to say

[18:59] <arigo> "I want a read lock" or "write lock" ?

[19:00] <stakkars> hmm. In a sense, this is what they express by the code, by preventing a write through the "don't decref" backdoor.

[19:01] <arigo> (also, would it be cheap enough to implement them in a single-threaded or GIL environment)

[19:01] <arigo> s/them/locks

[19:01] <stakkars> without further precaution, you would generate a dead-lock here.

[19:02] <stakkars> the operation must either fail, or be postponed until this operation is done. Needs thread-like behavior.

[19:02] <arigo> no, locks that generate ValueError: it's a bad idea to mutate this list now, I'm reading it, you fool

[19:02] <stakkars> But CPython does allow it.

[19:02] <arigo> yes, but with undefined behavior

[19:02] <stakkars> defined as "well, I won't crash"

[19:02] <arigo> nobody complained when list.sort() was increasingly changed to prevent mutations during sort

[19:02] <arigo> stakkars: yes :-)

[19:03] <stakkars> or betterto say: By using the recycle, it does exactly what a threaded equivalent would have done: Postpone the action, here by making sure it isn't triggered early. Elegant.

[19:04] <arigo> I'm also thinking about other places like == or count() or index():

[19:04] <arigo> no one sane should depend on the behavior of l1 == l2 when l1 or l2 are mutated during the comparison, IMHO

[19:05] <stakkars> well, I would even like to disallow a comparison operator to have any side effect on its arguments.

[19:05] <stakkars> But side effects of deleting elements are different.

[19:05] <arigo> precisely

[19:06] <arigo> deleting?

[19:06] <arigo> -- that's probably a question for python-dev, though (unless Jython already asked it)

[19:06] <stakkars> IOW, nobody can really know what the effect of deleting objects is. Yes the list slice assignment as we discussed.

[19:07] <stakkars> That's nothing to forbid I think. You can't easily predict this as a programmer.

[19:08] <arigo> so you're saying:

[19:08] <arigo> I'm suggesting that during a del l[:] the destructors are not allowed to even read the list l

[19:09] <arigo> and you're saying "that's too much of a restriction"

[19:09] <arigo> ?

[19:12] Nick change: pedronis -> pedronis_afk

[19:15] <stakkars> you think this is too artificial?

[19:15] <arigo> no, it's probably sensible

[19:16] <stakkars> I was thinking of something like a pseudo-implementation of weak references or such.

[19:16] <arigo> suppose that del queue[-1] removes an element whose destructor wants to inspect or even put something new on the queue

[19:17] <stakkars> That could quite easily cause cascading operations if something is tracked actively.

[19:19] <arigo> so locks are not good enough

[19:19] <stakkars> in general, touching anything that can be user-defined can cause a modification of any object.

[19:20] <stakkars> And even worse, as there arelots of sophisticated frame-worksaround,a usermight not even know about this.

[19:20] <arigo> the problem is when user-defined code runs implicitely (__del__ or multi-threads without GIL)

[19:21] <stakkars> that's why I think prevention is stronger than locking, if possible.

[19:21] <arigo> I see

[19:21] <stakkars> It is just a matterof (bad??) design of lists, that they claim to be valid for everything at any time.

[19:21] <stakkars> Reallynot the ideal data structure to share data.

[19:22] <arigo> well, for programmers, they *are* the ideal data structure to share data sometimes

[19:22] <arigo> when you know that .append() or .pop(0) are atomic

[19:23] <stakkars> exactly. That is the missing spot. If we know, we can do whatsoever.

[19:23] <arigo> for the implementor, they are a nightmare, but that's the job of the implementor to have nightmares, not the user :-)

[19:24] <arigo> gotta run! see you

[19:24] arigo (~arigo@134.99.112.244) left irc: "[x]chat"

[19:24] <stakkars> tuple(list) would be save

----- silence for 19 minutes -----

[19:43] <pedronis_afk> stackars, uncommenting recycled= is not enough

[19:44] <pedronis_afk> because the unused vars are GCed as soon as possible

[19:45] <stakkars> why? I want to keep them alive during the deletion process. Then I don't care.

[19:45] <pedronis_afk> this is what we discussed at the very beginning with Armin

[19:46] <stakkars> or is it because recycledwould be gc'ed, earlier? But *that* would not be correct.

[19:46] <pedronis_afk> the graph simplication stuff does not pass recycled along because it is ununsed

[19:46] <pedronis_afk> so yes, it could be collected earlier than the end

[19:47] <stakkars> but then the simplification stuff is wrong about side effects of keeping variables alive?

[19:47] <pedronis_afk> Armin thinks that it is fine to have rules different from CPython for that

[19:48] <pedronis_afk> look at the log of the discussion

[19:48] <stakkars> I disagree on this. I'll try to find it in today's log.

[19:55] <stakkars> pedronis_afk: I can't find it. Was it today, veryearly?

[19:58] <pedronis_afk> yes

[19:58] <pedronis_afk> the discussion

[19:58] <pedronis_afk> about

[19:58] <pedronis_afk> def f(n)

[19:59] <stakkars> [17:31] <arigo> def f():

[19:59] <stakkars> [17:31] <arigo> x = X()

[19:59] <stakkars> [17:31] <arigo> return x.l

[19:59] <stakkars> ?

[20:00] <pedronis_afk> no

[20:00] <pedronis_afk> [17:01] <pedronis> the unsimplified graph copy x into the g(l) block

[20:00] <stakkars> ah, thanks!

[20:07] <stakkars> stillnot clear to me what he thinks. Do I need to reference the object to extend its lifetime?

[20:07] <stakkars> and would a simple del do it?

[20:09] <stakkars> and it recycle doesn't stay alive until end of the function, this would be another rule to be addedto RPython

[20:09] <stakkars> s/and it/and if/

[20:13] <pedronis_afk> you need something like def use(obj): pass

[20:13] <pedronis_afk> use(recycled) at the end

[20:13] <pedronis_afk> del recycled[:]

[20:13] <pedronis_afk> would also work

[20:15] idnar (mithrandi@idnar.user) joined #pypy.

[20:17] <stakkars> and del recycled without [] ?

[20:18] <pedronis_afk> no, the graph does not sees python frame variables as such

[20:18] <pedronis_afk> so that's a no op

[20:19] <stakkars> but with use(obj), we would get trouble later, too, when it comes to inlining of functions

[20:19] <stakkars> and we figure out that this is a no-op as well. I don't believe we are doing the right thing here.

[20:21] <pedronis_afk> well, it's a trade-off between doing things as CPython and somehow risk to leak objects (that means keeping object alive we don't want to)

[20:21] <pedronis_afk> and needing to figure out other way to fine control this stuff

[20:21] <stakkars> if we let the object alive aslong as it is defined, why is this so bad?

[20:22] <pedronis_afk> because it could be a huge object that we forget to del

[20:23] <stakkars> better safe than sorry. In CPython,everything must be deled by hand.

[20:23] <stakkars> In RPython, I always assume that objects get destructed at the end of the procedure.

[20:24] <stakkars> where is the advantage to break with this convention?

[20:24] <pedronis_afk> that's not how things are implemented right now

[20:24] <pedronis_afk> I don't think it was a voluntary decision

[20:25] <pedronis_afk> as I said it's a matter between leaking objects vs. what to do to control when GC is triggered

[20:25] <pedronis_afk> for the usual case this does not matter

[20:25] <stakkars> you mean it just the way it happened to be implemented?

[20:25] <pedronis_afk> it seems that psyco introduce del of its own

[20:26] <stakkars> I always had the C++ model in mind. Yes, I read about this.

[20:26] <pedronis_afk> well, don't know if this was thought about when the simplication that removes unused vars

[20:26] <pedronis_afk> was added

[20:26] <stakkars> when we create a local object, it gets destructed when it leaves scope.

[20:27] <pedronis_afk> this is a probably an issue for geninterp too

[20:27] <pedronis_afk> because there we probably want the CPython way of doing things

[20:27] <stakkars> ahh, I see. The unused vars removal was introduced primarily to get rid of variables introduced

[20:27] <stakkars> by the flow blocks, not those which were explicitly used by the source code.

[20:28] <stakkars> yes, of course an issue, because it really changes the RPython semantics.

[20:29] <pedronis_afk> well, but the fact that recycled is kept alive is just reflected in the NOT simplified graph by unused vars copied from block to block

[20:30] <stakkars> so we need an explicit way to express that "I want this var to exist until here" thing

[20:30] <pedronis_afk> well a use function could do that, it would need to be special cased at some point to avoid to remove its effect

[20:30] <stakkars> or in other words, a variable that is a container has a side effect on the contained variables. And in that sense, it *is* not an unused variable.

[20:31] <pedronis_afk> but you know it's container only later

[20:31] <pedronis_afk> and its not clear that is always what is intended

[20:31] <pedronis_afk> and huge containers are indeed the kind of things

[20:31] <pedronis_afk> you don't want to keep alive unless necessary

[20:32] <stakkars> if you look into the listobject code, you see that I'm creating the variable and then assigning other variables

[20:32] <stakkars> into it. That means,the life time of the other variables is influenced by the lifetimeof the container.

[20:33] <stakkars> So we need to figure this out: Either keep recycle alive, *or* keep the referenced variables alive.

[20:33] <stakkars> Both would have the desired effect.

[20:34] <pedronis_afk> but Armin also proposed to use locks instead of this kind of things

[20:34] <stakkars> and I convinced him that this is not ok.

[20:35] <pedronis_afk> ok, then you need to use something like use

[20:35] <stakkars> a lock would cause a deadlock, or would raise an Exception,but the user cannot know the side effect of deletion.

[20:38] <stakkars> ok, how about distinction between simple local block vars and those which stem from source code?

[20:39] <stakkars> I would keep the latter until no more mentioned in the source.

[20:41] <stakkars> alternatively, adopting the new rule, we could try to implement the list operation in a way that keeps

[20:41] <pedronis_afk> to change the var simplication we need to agree on what we want

[20:42] <stakkars> it consistent, regardless what operations are triggered in between.

[20:42] <pedronis_afk> so we need Armin opinion

[20:42] <stakkars> yes.

[20:42] <stakkars> I could do a full rotation of the list, keeping everything alive, and at the end just

[20:43] <stakkars> nullifying the elements at the end, always sonsitently.

[20:43] <stakkars> consistently

[20:44] <stakkars> actually, this seems to be a good alternative, anyway!

----- silence for 21 minutes -----

[21:05] <stakkars> pedronis_afk: I checked in a version that should be quite safe, while a bit slower.

[21:15] <stakkars> eek, no good change! The old version is much better.

[21:16] <stakkars> Object deletin could even cause appends to the list, so I should change its size just once

[21:16] <stakkars> and do all desctruction out of the list's memory. Lists are hard to get right...

----- silence for 18 minutes -----

[21:34] arigo (~arigo@stups.cs.uni-duesseldorf.de) joined #pypy.

----- silence for 54 minutes -----

[22:28] dialtone (~dialtone@host111-56.pool80117.interbusiness.it) joined #pypy.

[22:42] idnar (mithrandi@idnar.user) left irc: Nick collision from services.

[22:42] idnar_ (mithrandi@idnar.user) joined #pypy.

----- silence for 36 minutes -----

[23:18] arigo (~arigo@stups.cs.uni-duesseldorf.de) left irc: Remote closed the connection

----- silence for 19 minutes -----

[23:37] arigo (~arigo@134.99.18.225) joined #pypy.

[23:42] idnar_ (mithrandi@idnar.user) left irc: Read error: 104 (Connection reset by peer)

[23:42] idnar (mithrandi@idnar.user) joined #pypy.

[00:00] --- Mon Apr 4 2005