| Marc M. Adkins
|
67
|
 |
|
03-21-2002 02:28 AM ET (US)
|
|
Hmm, actually looked (albeit briefly) at the specifications you reference.
So...is an RDF channel the equivalent of a QuickTopic? Then in a channel are elements...which (with the threading module) can have children. So at this point we have the single-level equivalent of QuickTopic. If I understand...and I'm really skimming this stuff quickly.
I'm not seeing the equivalent of a "Thread" object, but then perhaps that isn't actually necessary. You can always think of the threads as being defined by all messages that have no parent messages (roots of the lattice, as it were). But wouldn't there be a need for information attached to the threads themselves, necessitating actual thread objects?
Then I keep thinking that there absolutely must be some sort of unique ID for messages/threads. Otherwise what happens if a user imports the same message/thread twice into his/her repository? You don't want stuff duplicated, right?
But then I start thinking about what to use for a unique ID and my head starts to spin. The closest I get is some sort of URL (from the RDF/channel/BBS, whatever) coupled with a unique sequence number that can only be generated (supposedly) by the owner of the URL. So the user then imports messages with URLs not connected with the user. And if the user is bad and removes them...well then my head actually hurts.
Regarding the "how does a child node refer to another item in the same file question," it seems like there is plenty w/in XML to provide for that. I've approached XML by way of XSLT, mostly, but certainly the use of ID and IDREF attributes allows items in an XML file to be linked during XSLT processing. Shouldn't this be sufficient? Especially if we assume unique message IDs? Then there's the whole XLink thang, about which I understand little.
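For illustration only, here is a minimal Python sketch of intra-file parent references in the ID/IDREF style. The element name `message` and the attribute names `id` and `parent` are assumptions made for this example; they are not part of any ThreadsML draft. Note that Python's `xml.etree.ElementTree` does not validate ID/IDREF types, so the lookup is done by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only.
DOC = """
<thread>
  <message id="m1">Root post</message>
  <message id="m2" parent="m1">First reply</message>
  <message id="m3" parent="m2">Reply to the reply</message>
  <message id="m4" parent="m1">Second reply to the root</message>
</thread>
"""

def build_children(xml_text):
    """Map each message id to the ids of its direct replies, in document order."""
    root = ET.fromstring(xml_text)
    by_id = {m.get("id"): m for m in root.iter("message")}
    children = {mid: [] for mid in by_id}
    for mid, m in by_id.items():
        parent = m.get("parent")
        if parent is None:
            continue  # no parent attribute: this message starts a thread
        if parent not in by_id:
            # An IDREF that points nowhere in this file
            raise ValueError(f"{mid} refers to unknown parent {parent}")
        children[parent].append(mid)
    return children

children = build_children(DOC)
print(children)
```

With unique message IDs, the same lookup table would work whether the parent lives in the same file or was imported earlier.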
But it is late and I'm probably not thinking this through. Interested in any thoughts whatsoever, glad to see some progress continues.
|
| Steve Yost
|
68
|
 |
|
04-01-2002 09:12 PM ET (US)
|
|
Edited by author 04-01-2002 09:15 PM
Marc, sorry it's taken a while to get back to you on this.

A thread for our purposes is any arbitrary linear time-sequential series of messages in a particular forum on a particular topic. It can have zero or more branching points within it, and not all messages on all branches need be included. For example, let's say we have a tree-structured thread with many small branches and sub-branches. I can pick up all the messages from 3-Jan-2002 to 4-Apr-2002 on one arbitrarily selected depth-first traversal through the tree and call that a thread. I can also pick *all* the messages between those dates and call it a thread, though it has many branches. The idea is that I should be able to export and import either of these structures.

Yes, there's a need to identify each message in a thread. That's currently done via the identifiers in the <items><rdf:Seq> section of the document. Each item in the sequence has a URI (e.g. <rdf:li rdf:resource="http://www.quicktopic.com/7/H/rhSrjkWgjnvRq/p1.1" />). That works fine for Quick Topic, where each message does have a URI. But what about email, for example? In email, each message has a Message-ID, which would suffice for tracking successive import/exports.

The purpose of the ID is to manage successive export/imports from one particular source to one particular destination (e.g. email to Quick Topic), not to be able to (for example) transport threads arbitrarily to many services and gather them up again later. So a globally unique ID isn't necessary. Are we being too short-sighted or just expedient? If we need to, can we choose a unique naming later? Duplicate messages with different IDs will be floating around in various services. Is that OK?

Maybe we should say that the original ID (which may be service-specific) should at least be preserved and identified as such. I like that. So, say I start with email, move the thread to QT, then move it on to Topica. The export from QT should include the original email Message-IDs.
Oh, and regarding intra-file parent-child node references, yes XML has ID and IDREF. My question is what we include in our particular threadsML schema in this regard. Thanks for your thoughts, and for helping to keep *this* thread moving.
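As a rough sketch of the "preserve the original ID" idea across the email to QT to Topica example, the Python below keeps two identifiers per message. The field names `local_id` and `origin_id`, the function `import_message`, and all the example addresses are hypothetical; nothing here is defined by RSS or by any ThreadsML draft.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    local_id: str   # ID assigned by the service currently holding the message
    origin_id: str  # ID assigned where the message was first written; never rewritten
    body: str

def import_message(msg: Message, new_local_id: str) -> Message:
    """Copy a message to a new service, carrying the original ID along unchanged."""
    return Message(local_id=new_local_id, origin_id=msg.origin_id, body=msg.body)

# Hypothetical IDs for illustration: an email Message-ID, then service-specific URIs.
email = Message("<abc123@mail.example>", "<abc123@mail.example>", "Hello")
on_qt = import_message(email, "http://qt.example/topic/p1.1")
on_topica = import_message(on_qt, "http://topica.example/msg/42")

print(on_topica.origin_id)  # still the email Message-ID after two hops
```

The design choice is simply that each hop may assign its own local ID but must never overwrite the origin ID, so a later export can still state where the message came from.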
|
| Ben Hammersley
|
69
|
 |
|
05-06-2002 12:43 PM ET (US)
|
|
*thread-merging from offlist discussion - the formatting may go astray.*
> Have you had any more thoughts about the unique naming? I'm especially interested in the ideas around moving from one service to another. Say, taking a thread from email, to Quicktopic, to a blog, to IM and back to email again. Off the top of my coffee addled brain I can see all sorts of very interesting visualisation applications here - graphically travelling down threads, and so on. As the thread gets longer and more branched, the very structure gets as interesting as the content.
>
> Again, off the top of my head, giving a *topic* a guid would allow for multiple services to syndicate conversations off each other. Add in Publish and Subscribe, and you could follow a slashdot thread (which itself is a branching of a quicktopic discussion) using an IM client, which is talking to a usenet gateway... Ahh - braindumpy, that, but it could be very powerful. Especially if you add in enough dc and rdf stuff and make them searchable: "find me all the conversations where Rael and Dave discussed football" "find me all the conversation points, where Rael replied to Dave in Spanish, about the History of the Walkman, and where he was using Jabber"
>
> Gnutellaish searching between conversation services: ooooh.
|
| Steve Yost
|
70
|
 |
|
05-06-2002 12:56 PM ET (US)
|
|
And my email response:
Yes! In fact I've been mulling over the need for similar tools among weblogs, e.g. the ability to search in weblog-only space or to follow links in both directions. For weblogs these exist (and I think they ought to be snapped up by Google). For web-based message threads, they should exist, including the idea of thread-hopping you mention. The example of searching you give is a great (if extreme :) specimen of the intelligent search I'm thinking of.
Given your input in addition to Marc's, I'm starting to think that it's worth the trouble(?) of adding to the RSS spec to allow for unique IDs (unless there's a module that includes one now). The IDs could be just GUIDs or they could contain information about the originating site. Maybe a separate identifier should denote the originating site.
|
| Peter Kaminski
|
71
|
 |
|
05-06-2002 12:58 PM ET (US)
|
|
Ben Hammersley writes,
> very interesting visualisation applications here graphically travelling
> down threads
Alex Shapiro has a very nice applet for generalized visualization of networks that might be nice for a highly-branched, multi-service group of related threads: TouchGraph <http://www.touchgraph.com/>. It's inspirational, at least, to play with such a nice visualizer -- my favorite application so far is the Google Similar Pages Browser.
-- Pete
http://www.istori.com/peterkaminski
|
| Marc M. Adkins
|
72
|
 |
|
05-06-2002 02:07 PM ET (US)
|
|
If one of the goals is to grab any arbitrary set of messages (say by topic and time frame) and call that a thread then I wonder if we _can_ assign an ID to a thread. It's like assigning a unique ID to a SQL query.
I think we can only assign an ID to a thread that is published somewhere. That is to say, if I have a forum and I decide that these 18 messages constitute a thread it can have an ID which is a combination of my forum's ID and some unique identifier in the context of my forum. Or I can allow any top-level message (not in response to an existing message) to define a new thread. Or whatever.
If you then come along and pull a subset of my thread, that query is not in itself a new thread. If you then publish the results of that query on your own forum then it becomes a new thread, but in the context of your forum.
A little like the distinction between an XML node or document and an XML node list, which is somewhat ephemeral.
---
I have also been thinking about graphical tools for viewing threads and messages. Here's an additional twist:
Consider mind maps and other graphical idea organizing tools. Wouldn't it be great if a set of messages on a forum (possibly/probably in multiple threads or even in multiple forums???) could be viewed/organized as a mind map?
Use case: we all decide to actually build the specification for ThreadML. We go in and organize all the message traffic herein into a mind map (or whatever model makes the most sense). We then add messages to the nodes in the mind map, filling in the (now obvious) blanks. From there we create a second projection of the underlying data, which is an outline of the projected specification document. After that, writing the document should be easy (or at least not so hard).
I found a nice site with a lot of different "mind map-like" diagrams but I can't find the linkage right now.
|
| Ben Hammersley
|
73
|
 |
|
05-06-2002 02:45 PM ET (US)
|
|
> If one of the goals is to grab any arbitrary set of messages (say by topic and time frame) and call that a thread then I wonder if we _can_ assign an ID to a thread.
I think you can - you just have to make the ID extensible: the id represents not just the message's URI, and its position in the thread and the UID of the discussion itself (i.e. the ultimate root post), but also the UID of the most root-post post on that system...
So if I grab a branch of a thread and use it elsewhere, it may look like a new thread - it may even act like a new thread, but the UIDs of new messages would contain the UID of their post-split root post, which in turn contains the UID of the original root post.
This would also allow the newly created thread, made from the cutting so to speak, to eventually be reassociated with the genuine root post, with everything in its right place.
Does that make sense?
|
| Marc M. Adkins
|
74
|
 |
|
05-06-2002 03:08 PM ET (US)
|
|
> I think you can - you just have to make the ID extensible: the id represents not just the message's URI, and its position in the thread and the UID of the discussion itself (i.e. the ultimate root post), but also the UID of the most root-post post on that system...
I'm probably not following you, but it sounds like the UIDs would keep getting longer and longer. Or that they would refer to previous UIDs (which would in turn refer to other UIDs) which would mean that the entire set of UIDs would need to exist forever or the ancestral data would be lost.
I agree that we want to keep the provenance [sic] of each message and thread and so forth. I'm just unsure how it happens efficiently.
Perhaps an example would clarify this? In your copious free time. ;)
|
| Ben Hammersley
|
75
|
 |
|
05-06-2002 03:21 PM ET (US)
|
|
> I'm probably not following you, but it sounds like the UIDs would keep getting longer and longer.
>
> Perhaps an example would clarify this? In your copious free time. ;)
You're right - they would keep getting longer and longer. But they would never be that long, and the additional utility would be of greater value than the cost of another 20, say, characters. I'll try and get an example going. But even with enormous amounts of thread branches, I've a feeling it could be done with a minimum of fuss. It just needs an *evil* encoding scheme. Well, it's worth a thought or two anyway. I'll give it a go.
Question for everyone: how many characters should a UID have max? 20? 50? 100?
|
| Marc M. Adkins
|
76
|
 |
|
05-06-2002 03:50 PM ET (US)
|
|
Edited by author 05-06-2002 03:51 PM
Try this on for size...

Let's say I have a forum at http://www.Doorways.org/Forum/DarkKnight. On the forum are a ton of messages which would be identified as http://www.Doorways.org/Forum/DarkKnight/0001 or some such. These are unique IDs. Thread IDs would look like http://www.Doorways.org/Forum/DarkKnight/Thread/0001. Again, unique IDs.

You come along and ask for all messages from http://www.Doorways.org/Forum/DarkKnight/Thread/0023 from April of 2002. You get back a structure like:

<thread id="http://www.Doorways.org/Forum/DarkKnight/Thread/0023">
  <message id="http://www.Doorways.org/Forum/DarkKnight/0691">
    Yeah, well Batman could beat up the Green Lantern ANY day!
  </message>
</thread>

Now you put the messages on your site, under your forum: http://www.ComicNerds.net/Forum/Batman/Thread/0187. Let me suggest that you store, for each message, the _original_ id (e.g. http://www.Doorways.org/Forum/DarkKnight/0056). This original ID should _always_ remain with each message since it uniquely identifies it across all time and space. So if a third party grabs your 0187 thread including my message they'll get something like:

<thread id="http://www.ComicNerds.net/Forum/Batman/Thread/0187">
  <message id="http://www.Doorways.org/Forum/DarkKnight/0691">
    Yeah, well Batman could beat up the Green Lantern ANY day!
  </message>
  <message id="http://www.ComicNerds.net/Forum/Batman/0123">
    Green Lantern would whip his butt!
  </message>
</thread>

So if the thread eventually makes its way back to the original http://www.Doorways.org/Forum/DarkKnight forum the message(s) that have originated there will be re-matched properly.

What happens if the original forum is disbanded? Well, the URL is still unique. The message is uniquely identified. If it shows up on some other site from two different message pulls it will not be duplicated. Keep in mind that XML namespaces work this way: they don't necessarily represent an actual data object on a site (though they often do point to a schema file), the URLs just specify unique IDs for the namespaces.

The only problem I see here is if the forum is _restarted_ and the same unique URL is assigned to a new message. Of course, this could be easily handled by simply checking messages with matching IDs to see if they actually match.

Now I don't see this scaling to thread IDs. For one thing, the query may only pull part of a thread. For another, the thread changes over time (I'm assuming that the message doesn't, but that's probably open for discussion). So if you pull my entire Thread/0023 and place it on your site as Forum/Batman/Thread/1117 it shouldn't necessarily have an original ID of DarkKnight/Thread/0023.
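The import rule described in this message (never duplicate a message whose ID is already known, and compare content when IDs collide) might be sketched in Python as follows. The function name `import_messages` and the dict-based store are assumptions made for illustration, not part of any proposed format.

```python
def import_messages(store, incoming):
    """Merge (message_id, body) pairs into store, a dict of id -> body.

    Returns (added, conflicts): ids newly stored, and ids already present
    whose body differs (e.g. a restarted forum that reused an ID).
    """
    added, conflicts = [], []
    for msg_id, body in incoming:
        if msg_id not in store:
            store[msg_id] = body
            added.append(msg_id)
        elif store[msg_id] != body:
            conflicts.append(msg_id)  # same ID, different content: flag, don't overwrite
        # same ID and same body: a duplicate pull, silently ignored
    return added, conflicts

BATMAN = ("http://www.Doorways.org/Forum/DarkKnight/0691",
          "Yeah, well Batman could beat up the Green Lantern ANY day!")
LANTERN = ("http://www.ComicNerds.net/Forum/Batman/0123",
           "Green Lantern would whip his butt!")

store = {}
added_a, conflicts_a = import_messages(store, [BATMAN])
added_b, conflicts_b = import_messages(store, [BATMAN, LANTERN])  # second pull overlaps the first
print(added_a, added_b, conflicts_b)
```

How a conflict should be resolved is left open here, as it is in the discussion; the sketch only reports it.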
|
| David Weinberger
|
77
|
 |
|
05-06-2002 04:04 PM ET (US)
|
|
Off topic, but easier...
Since blogthreads are neither simply chronological nor hierarchical, a map that shows all the links among them is likely to become less useful as it becomes more comprehensive. Once y'all have solved the hard problems, do you think it'd make sense to consider having an attribute that codes for "direct reference" or "main reference" or some such? For example, if C replies to the outrageous lies in A, but in passing mentions that B's blog entry also shows that A is lying through his teeth, it'd be nice to know that even though C mentions both A and B, C is really replying to A.
Other attributes that capture relevancy and popularity (e.g., "found this blog entry helpful" as per a discussion with Ben Hammersley) would be useful for whatever apps decide to make these threads visible. It might be useful to have some set of such attributes built in, in addition to of course having the standard be extensible. But this is why I don't write standards.
|
| Marc M. Adkins
|
78
|
 |
|
05-06-2002 04:13 PM ET (US)
|
|
> It might be useful to have some set of [link] attributes built in, in addition to of course having the standard be extensible. But this is why I don't write standards.
XLink will provide a lot of extra information on links. For example, an XLink can have a "role" attribute that specifies a URL that identifies some aspect of how the link would be used (TBD, so far as I know).
On the down side, XLinks are _really_ complicated to use and probably force a bunch of other design decisions. Like how would they be specified when submitting a message.
|
| Ben Hammersley
|
79
|
 |
|
05-06-2002 06:17 PM ET (US)
|
|
Right, following a nice cigar and a walk of the dog, here's my tuppence worth for the night:

First, some assumptions:
1. At the time of writing, a message knows only its parent. It can have no knowledge of its children.
2. A message has One parent.
3. Parsers can traverse the message tree.
4. The only thing you are sure to know at the time of writing your message is a) The Parent Message, b) The Thread ID and c) The Time
5. You can only look back in time

Soooo.... We define these terms (keeping with the tree metaphor for now, and in this hierarchy):
1. The Root UID - Which is the same as the Message UID of the First Post
2. The Cutting UID - Which is the Message UID of the First Post *known to the new host*, when part of the thread has moved to a new host
3. The Parent UID - The Message UID of the Parent Post
4. The Message UID
5. The Time

For the First Post of the thread, all of 1-4 are the same. The Root UID never changes for any of the messages.

Ok, so the first message is this:

<item>
  <root uid="http://example/1"/>
  <cutting uid="http://example/1"/>
  <parent uid="http://example/1"/>
  <message uid="http://example/1"/>
  <time value="202002T202020Z"/>
  <title>...
  <link>...
  <description>...
</item>

And the second message is this:

<item>
  <root uid="http://example/1"/>
  <cutting uid="http://example/1"/>
  <parent uid="http://example/1"/>
  <message uid="http://example/2"/>
  <time value="202002T202520Z"/>
  <title>...
  <link>...
  <description>...
</item>

And the third, in reply to the second:

<item>
  <root uid="http://example/1"/>
  <cutting uid="http://example/1"/>
  <parent uid="http://example/2"/>
  <message uid="http://example/3"/>
  <time value="202002T203020Z"/>
  <title>...
  <link>...
  <description>...
</item>

Now say this goes on for a while, and some rascally blogger comes along and grabs a chunk of the thread, and posts it to their site. The next reply at that site would be:

<item>
  <root uid="http://example/1"/>
  <cutting uid="http://rascallyblogger/1"/>
  <parent uid="http://example/47"/>
  <message uid="http://rascallyblogger/1"/>
  <time value="202002T212020Z"/>
  <title>...
  <link>...
  <description>...
</item>

And the next one there:

<item>
  <root uid="http://example/1"/>
  <cutting uid="http://rascallyblogger/1"/>
  <parent uid="http://rascallyblogger/1"/>
  <message uid="http://rascallyblogger/2"/>
  <time value="202002T202020Z"/>
  <title>...
  <link>...
  <description>...
</item>

And so on.

To reinsert this cutting, or to make sense of it, all the parser has to do is traverse the tree back up to the Cutting UID, where it finds the cutting point (The Cutting UID's Parent UID) and then continue on its way. If messages are lost, well you're stuck anyway, but at least you have the Root UID and Time to give you a clue as to the general idea.

If you want to refer to more than one site in the reply, that's cool - use the DC:isRelatedto or DC:refersTo - that's fine, as threading is inherently 2 Dimensional you can't give more than one Parent.

If you want to give some form of approval metric - as per David Weinberger's message - then insert this as a sub-element inside the element you are referring to, e.g.:

<root uid="http://example/1"><approval:metoo>true</approval:metoo></root>

This can, of course, be used on all the URLs you refer to:

<root uid="http://example/1"><approval:metoo>true</approval:metoo></root>
<parent uid="http://rascallyblogger/1"><approval:metoo>false</approval:metoo></parent>
<dc:isRelatedto rdf:about="http://othersite"><approval:metoo>true</approval:metoo></dc:isRelatedto>

Erm...that's all I've got so far... Thoughts?

Ben
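A possible reading of the "traverse back up to the Cutting UID" step, sketched in Python. Each item is reduced to a dict holding the four UIDs from the examples above; the function name `find_cutting_point` and the dict layout are assumptions for illustration, and the sketch only works if every ancestor up to the cutting post is available.

```python
def find_cutting_point(items, message_uid):
    """Walk parent links from message_uid up to the post named by its Cutting UID,
    then return that post's Parent UID: the place where the cutting was taken."""
    by_uid = {item["message"]: item for item in items}
    current = by_uid[message_uid]
    cutting_uid = current["cutting"]
    seen = set()
    while current["message"] != cutting_uid:
        if current["message"] in seen:
            raise ValueError("parent links form a cycle")
        seen.add(current["message"])
        parent_uid = current["parent"]
        if parent_uid not in by_uid:
            raise KeyError(f"ancestor {parent_uid} is missing; cannot reach the cutting post")
        current = by_uid[parent_uid]
    return current["parent"]

ROOT = "http://example/1"
items = [
    {"root": ROOT, "cutting": ROOT, "parent": ROOT, "message": ROOT},
    {"root": ROOT, "cutting": "http://rascallyblogger/1",
     "parent": "http://example/47", "message": "http://rascallyblogger/1"},
    {"root": ROOT, "cutting": "http://rascallyblogger/1",
     "parent": "http://rascallyblogger/1", "message": "http://rascallyblogger/2"},
]

print(find_cutting_point(items, "http://rascallyblogger/2"))
```

For the first post of a thread all four UIDs coincide, so the function returns the root itself, which matches the statement that the Root UID never changes.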
|
| Steve Yost
|
80
|
 |
|
05-06-2002 10:43 PM ET (US)
|
|
Edited by author 05-06-2002 10:53 PM
Ben and Marc, your examples help a lot (being a bear of little brain, I work best with something concrete). One telling statement was this:

> On the down side, XLinks are _really_ complicated to use and probably force a bunch of other design decisions.

... from which I'll take the lesson "don't let ThreadsML become too complicated (like XLinks) or people won't want to implement it" :-)

Let's revisit the use case that Ben started with:

> ...giving a *topic* a guid would allow for multiple services to syndicate conversations off each other.

Syndication implies to me that there's an authoritative source for the items, and others are simply re-publishing. For that I don't think thread IDs are necessary, only message IDs that identify the source. Syndication only requires a publishing schedule, which is covered by the RSS 1.0 Syndication module. So let's isolate "syndication" as a use case and say we've got it covered. Is that correct?

Then, there's this:

> Add in Publish and Subscribe, and you could follow a slashdot thread (which itself is a branching of a quicktopic discussion)...

This implies that a thread can be appended at its copied-to site (a QT-originated thread, copied to and appended at Slashdot, in this case). So far message IDs will still suffice. One question at this point: can a thread be spliced into an existing thread in its new home, i.e. can it be re-parented? I think not.

A defining use case: I do a web-wide search using a search engine that understands ThreadsML and I find a message in a thread. I should be able to follow the thread up and down its hierarchy (at the search site); going down along different branches may lead to different sites, but going up should lead only to one site. Note that given the unique message IDs, the search engine recognizes identical messages and gives me the best one (probably at the originating site unless it's down). So, at the destination site of a thread transfer, the thread should be considered some kind of root. Does that make sense?

And finally, Ben adds

> you could follow a slashdot thread ...using an IM client, which is talking to a usenet gateway...

The IM client implies what I meant way way back by "tool independent messaging" (I think I said that here). That implies to me an inter-tool protocol for live connections, which is beyond the data standard for exchange. But you could say that the data standard will be the payload of the protocol, so should support live inter-tool messaging. Let's say I'm reading and posting to a particular Slashdot thread via IM. The IM client gets the thread from Slashdot (starting with some date and message ID) and populates the IM client. I then post a message. Do I need a thread ID? I think I just need a parent message ID when posting. The parent message ID that Slashdot provides should (by our definition) be sufficient for Slashdot to locate it and post a reply.

Marc added another use case:

> So if the thread eventually makes its way back to the original http://www.Doorways.org/Forum/DarkKnight forum the message(s) that have originated there will be re-matched properly.

That is, a round trip back to the originating site. This is similar to successive export/imports in the same direction between two sites, in that proper matching has to occur. Again, message IDs suffice, I think.

What's required for some of these cases is that the importing site maintain the original IDs of the imported messages if it ever exports them again. Should we make this a requirement for anyone to claim full compliance? It adds a storage onus. I don't think it should be a requirement because I think basic one-way, one-time export/import is a big value at a reasonably low cost to implementors. So maybe we should have different levels of support:

ThreadsML export (hohum)
ThreadsML import (hmmm)
ThreadsML round-trip (maintains message IDs) (woooo)

Note that the search engine example (a pretty powerful application, I think) requires that sites be able to deliver up ThreadsML with original message IDs. So ThreadsML round-trip (maybe we should call it ThreadsML Complete) is at least strongly encouraged for web-based sites. Does this make sense, or do I have a big blind spot?
|
| Marc M. Adkins
|
81
|
 |
|
05-07-2002 02:24 AM ET (US)
|
|
It's late and the last two messages require some in-depth thought so this isn't my full response thereto. ;)
I think that Steve and I are both saying that unique message IDs solve most of the issues. It's the idea of moving threads around that causes the head-scratching.
I'm going to go out on a limb and suggest that the entire concept of a thread is a single example of a view or query on a set of messages. That is to say, if we consider a forum to be equivalent to a relational database table, then a thread is a formalized view on a forum in the same sense that a SQL query can be formalized as a view on a table.
What I want to point out is that threads are only one possible example of such a view. Another might be an outline. Or a Wiki. Or a mind map or other graphical overlay.
So what would it mean to take a collection of records from a table in my database and give it to Steve? Would the query on my database equal the result of the same query on his? And if so at the time the new records are entered, what about a week later, after new records are entered to both databases? The answer is going to tend to be 'no', and I think the same holds true for moving threads.
Moreover, the most useful forum software is going to allow me to go in and change threads. So after Steve takes all the threads on Pizza out of my recipe forum I may go back and re-classify some of the messages he took as Calzone recipes and move them to a different thread. So now some of the messages he took are marked with an originating thread that no longer contains those messages, even in the originating forum.
Another angle is Ben's assumption that a message has only one parent. I actually assume the opposite. My example is a thread on a computer-related site. A question is asked, beginning the thread. Over a week 22 messages respond and the last one actually sums up the issue very nicely. As a forum moderator, I want to be able to place the summation message onto a separate FAQ forum or thread, but I don't want it to disappear from the original thread. Thus it has two parents (and may live on multiple forums, as well).
What I'm trying to get at here is that threads are ephemeral and tracking the messages by originating thread may not make a lot of sense. Unique message IDs are doable and necessary, but thread IDs are problematic and of limited utility.
In addition, re-reading my example above, it may be that even the concept of a forum (or topic in QT) is similarly arbitrary. But I'll not go there just yet...
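To make the "thread as a view on a set of messages" analogy concrete, here is a small Python sketch. The record fields (`id`, `parent`, `topic`, `posted`) and both function names are invented for this example; the point is only that a thread computed this way is the result of a query, so it changes when the underlying messages are re-classified.

```python
from datetime import date

# Hypothetical forum contents; field names are illustrative only.
MESSAGES = [
    {"id": "m1", "parent": None, "topic": "pizza", "posted": date(2002, 4, 2)},
    {"id": "m2", "parent": "m1", "topic": "pizza", "posted": date(2002, 4, 10)},
    {"id": "m3", "parent": "m1", "topic": "pizza", "posted": date(2002, 5, 1)},
    {"id": "m4", "parent": "m2", "topic": "pizza", "posted": date(2002, 4, 20)},
]

def by_topic_and_dates(messages, topic, start, end):
    """One view: every message on a topic within a date range."""
    return [m["id"] for m in messages
            if m["topic"] == topic and start <= m["posted"] <= end]

def subtree(messages, root_id):
    """Another view: a message plus everything that replies to it, directly or not."""
    keep = {root_id}
    grew = True
    while grew:  # repeat until no new descendants are found
        grew = False
        for m in messages:
            if m["parent"] in keep and m["id"] not in keep:
                keep.add(m["id"])
                grew = True
    return [m["id"] for m in messages if m["id"] in keep]

april = (date(2002, 4, 1), date(2002, 4, 30))
before = by_topic_and_dates(MESSAGES, "pizza", *april)
replies_to_m2 = subtree(MESSAGES, "m2")

MESSAGES[3]["topic"] = "calzone"  # the moderator re-classifies m4
after = by_topic_and_dates(MESSAGES, "pizza", *april)
print(before, after, replies_to_m2)
```

The message IDs stay stable across the re-classification while the "thread" returned by the first query does not, which is the asymmetry argued for above.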
---
BTW, just because XLinks are complicated is not a definitive reason to avoid them. I agree, it's a telling statement (and a knee-jerk reaction). They're complicated because they're powerful. I brought them up because they seemed to answer the question posed.
What we may need to do is provide uplevel and downlevel options. If you go downlevel, you get the usual HTML anchor tag with no frills. If you go uplevel you get XLink links with full specificity. Of course, that's even more complex...so I think I'm fried and I'm a'gonna stop now.
|
| Ben Hammersley
|
82
|
 |
|
05-07-2002 04:06 AM ET (US)
|
|
> Another angle is Ben's assumption that a message has only one parent. I actually assume the opposite. My example is a thread on a computer-related site. A question is asked, beginning the thread. Over a week 22 messages respond and the last one actually sums up the issue very nicely. As a forum moderator, I want to be able to place the summation message onto a separate FAQ forum or thread, but I don't want it to disappear from the original thread. Thus it has two parents (and may live on multiple forums, as well).

Just quickly, before I have any coffee this morning, this is incorrect: the FAQ post does not have two parents - it might have two children, but that's not the same. The FAQ post's one singular parent is the message it replied to, which is message 21 in the previous thread in this example. Its children live in different neighbourhoods, that's all.

This is not a problem under my scheme (in fact, it's the whole point of it). Moving a message to a separate thread just starts another Cutting UID. So the original message 22 (the above example) is this:

<item>
  <root uid="http://example/1"/>
  <cutting uid="http://example/1"/>
  <parent uid="http://example/21"/>
  <message uid="http://example/22"/>
  <time value="202002T202020Z"/>
  <title>...
  <link>...
  <description>...
</item>

And the FAQ thread root post would be:

<item>
  <root uid="http://example/1"/>
  <cutting uid="http://faq/1"/>
  <parent uid="http://example/21"/>
  <message uid="http://example/22"/>
  <time value="202002T202520Z"/>
  <title>...
  <link>...
  <description>...
</item>

Note that the Parent UID is the same in both these cases. And any subsequent FAQ post would be:

<item>
  <root uid="http://example/1"/>
  <cutting uid="http://faq/1"/>
  <parent uid="http://example/22"/>
  <message uid="http://faq/2"/>
  <time value="202002T202520Z"/>
  <title>...
  <link>...
  <description>...
</item>

Mmm sweet sweet caffeine,

Ben
|
|
|