QuickTopic (SM) free message boards
Topic: Friendster spider
Views: 6795, Unique: 2988 
Subscribers: 4
nonym  4
08-13-2003 04:18 PM ET (US)
He might have gotten away with doing some casual scraping himself, but publishing the tool is asking for trouble... Friendster certainly doesn't have bandwidth to spare for this guy, so I'm pretty sure they'll take steps to block his robot.

If friendster was cool and wanted to encourage this sort of thing, they might just open their MySQL port to the public (I think K5 may do this? someone does...)
Dav Coleman  5
08-13-2003 04:22 PM ET (US)
Whoa, I'm not so sure that was a good idea. When I wrote my friendster spider so I could graph my network I purposely didn't post the code in public (a few people got copies, including Ben iirc) because I was worried about the amount of traffic it would generate if everyone decided to crawl their network with it.

It seemed the only way to let everyone have at their own network without overloading the already beleaguered Friendster servers is to do some sort of shared p2p database away from the official site so that data already scraped by someone doesn't have to get scraped again. Of course, that gets into legal issues of using their data for a different 'service' (plus up-to-date issues for the data) so I never went forward with it.
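
The "don't scrape what someone already scraped" idea can be sketched in Python as a cache-first lookup. This is only an illustration under stated assumptions: `SharedProfileCache`, `friends_of`, and the `scrape` callback are hypothetical names, an in-memory dict stands in for whatever shared p2p store would really be used, and `max_age_seconds` is one possible answer to the up-to-date issue mentioned above.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class SharedProfileCache:
    """In-memory stand-in (assumption) for a shared store of scraped friend lists."""

    def __init__(self, max_age_seconds: float) -> None:
        self.max_age_seconds = max_age_seconds
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, user_id: str, now: Optional[float] = None) -> Optional[List[str]]:
        """Return the cached friend list, or None if it is missing or too old."""
        now = time.time() if now is None else now
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        stored_at, friends = entry
        if now - stored_at > self.max_age_seconds:
            return None  # stale: treat as if nobody had scraped it
        return friends

    def put(self, user_id: str, friends: List[str], now: Optional[float] = None) -> None:
        """Record a freshly scraped friend list with its timestamp."""
        now = time.time() if now is None else now
        self._entries[user_id] = (now, list(friends))


def friends_of(
    user_id: str,
    cache: SharedProfileCache,
    scrape: Callable[[str], List[str]],
    now: Optional[float] = None,
) -> List[str]:
    """Ask the shared cache first; hit the real site only on a miss."""
    cached = cache.get(user_id, now)
    if cached is not None:
        return cached
    friends = scrape(user_id)  # the only place that would touch the site
    cache.put(user_id, friends, now)
    return friends
```

The sketch says nothing about the legal question of holding the data off-site; it only shows how repeated requests for the same profile could be avoided.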

I hope the servers hold up....
Peganthyrus  6
08-13-2003 04:58 PM ET (US)
Edited by author 08-13-2003 04:59 PM
This sort of thing is always fascinating! I've fooled around with similar stuff based on Livejournal's "friends" network; see here for a graph I made of my own friends network near the beginning of the year. (requires SVG)

Also, this is a pretty cool Java program for interactively graphing LJ friend networks.
Eli the Bearded  7
08-13-2003 06:23 PM ET (US)
The coolest thing about this was the link to the graphviz
site. I like graphing tools.
B. Mindful  8
08-13-2003 08:45 PM ET (US)
A couple months ago I started working on my own spider, but the purpose was to point out to Friendster and their users that their sandbox was still in the middle of a dangerous world. A quick series of extensive data grabs can be done by any marketing department, cultural aggregator, trend research group, or nefarious advertising agency to cull and sell a huge treasure trove of personal data that most users would never make public. I ran out of time to finish it up, but now almost anyone could build their own cultural aggregator and better target products to you that you don't want.
Ben Discoe  9
08-14-2003 05:47 AM ET (US)
I don't think that overloading Friendster's server is an issue currently, because of the slow response. 30 sec timeout wasn't long enough, i had to push it to 90! That means that the spider spends a long time waiting for each page - no worse than a real person trying to click around their network.

Now, if the spider tried to put in multiple simultaneous requests to try to go faster, then THAT would be obnoxious and inadvisable to distribute..
banksean  10
08-14-2003 02:16 PM ET (US)
Edited by author 08-14-2003 02:19 PM
I just posted the source to a friendster scraper if anyone's interested. It's single threaded, so it doesn't totally whack their servers. I used it to gather the data for my touchgraph-based browser. And to the p2p/friendster caching idea: why not export it to FOAF?
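
For the FOAF suggestion, here is a minimal Python sketch of what an export might look like. It assumes the scraped data is a plain dict mapping a nickname to the nicknames of that person's friends (a hypothetical format, not the output of any spider in this thread) and uses only `foaf:Person`, `foaf:nick`, and `foaf:knows` from the FOAF vocabulary, with blank-node IDs so no real identifiers are needed. The function name `to_foaf` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def to_foaf(friends: Dict[str, List[str]]) -> str:
    """Serialize a nickname -> friends mapping as FOAF in RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("foaf", FOAF_NS)

    # Give every person mentioned anywhere a stable blank-node ID (p0, p1, ...).
    names = sorted(set(friends) | {f for fs in friends.values() for f in fs})
    node_id = {name: f"p{i}" for i, name in enumerate(names)}

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for nick in sorted(friends):
        person = ET.SubElement(root, f"{{{FOAF_NS}}}Person")
        person.set(f"{{{RDF_NS}}}nodeID", node_id[nick])
        ET.SubElement(person, f"{{{FOAF_NS}}}nick").text = nick
        for friend in sorted(friends[nick]):
            # foaf:knows points at the friend's blank node.
            knows = ET.SubElement(person, f"{{{FOAF_NS}}}knows")
            knows.set(f"{{{RDF_NS}}}nodeID", node_id[friend])
    return ET.tostring(root, encoding="unicode")
```

Calling `to_foaf({"alice": ["bob"], "bob": ["alice"]})` returns one RDF/XML document with two `foaf:Person` nodes that reference each other.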
Dav Coleman  11
08-14-2003 02:48 PM ET (US)
You guys don't get it. It doesn't matter if the spider is single threaded. Mine was single threaded and it was even nice enough to pause between each request. The problem is that by releasing a spider so that anybody can just download it there could potentially be a huge net increase in traffic to the site. Especially, for that matter, something as easy to use as a Windows EXE, where even a non-geek can just kick off a spider process (...well, at least they aren't coming with installers yet). Imagine if thousands of people are running these spiders to map their own networks. Sure the spider isn't doing anything other than clicking on links. But it's clicking on every linked friendster it sees, incessantly, which is not normal human behavior. Seriously, how do you not see that this is going to be a giant increase in traffic (on an already overloaded set of servers)? It took me days to scrape just a small fraction of my network, by the way.

And FOAF is fine, however it doesn't address the only important issue with caching the data somewhere off of Friendster's databases, which is the legal issue of using the company's data for non-approved purposes.

Maybe this is the right approach though. If all the scraping causes problems it may force Friendster to open a net interface to their data so people can build apps on top of it.
Ben Discoe  12
08-14-2003 02:59 PM ET (US)

Dav, i think we simply disagree. I believe:
1. adding an additional pause, when it already takes 3-20 seconds to retrieve a page, seems ineffective.
2. the number of people that are really into graphing their network is smaller than you might think.
3. the number of friendsters that it is worthwhile to graph is not that large - after a few hundred people, people are too unrelated to be interesting, so spider won't run that long.
4. it's not clicking on every link it sees, but moves gradually outward into the network, gently
5. yes, 'encouraging' Friendster to open a net interface would be a nice side-effect of this stuff
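
Points 3 and 4 describe a crawl that moves outward one hop at a time and stops after a few hundred people. A minimal Python sketch of that shape is a breadth-first traversal with two caps. It is not Ben's actual spider: `crawl_outward`, `get_friends`, `max_people`, and `max_hops` are hypothetical names, and `get_friends` stands in for whatever fetches one person's friend list.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple


def crawl_outward(
    start: str,
    get_friends: Callable[[str], Iterable[str]],
    max_people: int = 300,
    max_hops: int = 3,
) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Breadth-first walk from `start`, capped by hop count and by total people.

    Returns (hops, edges): hops maps each visited person to their distance
    from `start`; edges lists the friend links seen between visited people.
    """
    hops: Dict[str, int] = {start: 0}
    edges: List[Tuple[str, str]] = []
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] >= max_hops:
            continue  # far enough out: do not fetch this person's friends
        for friend in get_friends(person):
            if friend not in hops:
                if len(hops) >= max_people:
                    continue  # network budget spent: ignore new people
                hops[friend] = hops[person] + 1
                queue.append(friend)
            edges.append((person, friend))
    return hops, edges
```

Because the queue is first-in first-out, everyone at one hop is handled before anyone at two hops, which is the "gradually outward" behavior; each person's friend list is requested at most once.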
Eli the Bearded  13
08-14-2003 03:02 PM ET (US)
Dav has a point. Does this follow the robots exclusion protocol?
(Not that www.friendster.com has a robots.txt now, but they
might someday.) How long does it wait between page hits? If
it takes 90 seconds to get a page, the site is clearly
overloaded, and a spider should wait an hour or more before
getting the next page. If the response time is speedy, one
page every minute to five minutes is my rule of thumb.
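
Both checks above can be sketched in Python with the standard library's `urllib.robotparser`. The robots.txt text is passed in as a string so the sketch makes no network requests. The delay function encodes the rule of thumb from this message: an hour when a page took 90 seconds or more, one minute otherwise. Treating everything under 90 seconds as "speedy" is an assumption, since the message does not say where the line falls, and the function names are made up for illustration.

```python
import urllib.robotparser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def polite_delay_seconds(last_response_seconds: float,
                         overloaded_after: float = 90.0) -> float:
    """Seconds to wait before the next request, based on the last response time."""
    if last_response_seconds >= overloaded_after:
        return 3600.0  # site looks overloaded: back off for an hour
    return 60.0        # low end of "one page every minute to five minutes"
```

A spider would call `allowed_by_robots` before each fetch and sleep for `polite_delay_seconds(elapsed)` after it.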
Ben Discoe  14
08-14-2003 03:06 PM ET (US)

Regarding B. Mindful's warning of "extensive data grabs can be done by any marketing department", i feel this is not a big issue because of the anonymous nature of friendster. No real names or email addresses are exposed, so the marketing uses of the data are forced to be rather benign. I also fail to see what "treasure trove of personal data that most users would never make public" is on Friendster. I mean, favorite books and movies? Posting such things anonymously hardly seems like a sensitive disclosure.
Dav Coleman  15
08-14-2003 03:10 PM ET (US)
Edited by author 08-14-2003 03:19 PM
(corrected spelling mistake)

1. adding an additional pause, when it already takes 3-20 seconds to retrieve a page, seems ineffective.

Yes indeed. I put the pause in before I ever ran the spider just in case. But my point was exactly that it didn't matter, just as being single threaded didn't matter.

2. the number of people that are really into graphing their network is smaller than you might think.

Yes I disagree. There are many more reasons to get a copy of a friendster network other than making pretty pictures.

3. the number of friendsters that it is worthwhile to graph is not that large - after a few hundred people, people are too unrelated to be interesting, so spider won't run that long.

Again, you're thinking only of how you intended to use it for your own purposes.

4. it's not clicking on every link it sees, but moves gradually outward into the network, gently

What do you mean? I didn't say every link, I said "every linked friendster." And certainly your spider is doing that? I can see how that could be misunderstood though, so to be clear the behavior I'm talking about is traversing the friend links.

5. yes, 'encouraging' Friendster to open a net interface would be a nice side-effect of this stuff

Crossing my fingers, but not holding my breath :)
Dav Coleman  16
08-14-2003 03:16 PM ET (US)
/m15: i feel this is not a big issue because of the anonymous nature of friendster. No real names or email addresses are exposed, so the marketing uses of the data are forced to be rather benign

OK, I didn't want to let this out but I feel like I need to make a point. My cousin is going to be producing an event this fall and I want to help her publicize it. I'm going to write a spider that will traverse my friendster network (it will actually start from a fakester account) and send a friendster email to every person who lists a bay area city as their location. I'm going to spam friendster.

This isn't a terribly clever idea, and I'm sure I'm not the only person who's thought of it.
Ben Discoe  17
08-14-2003 03:31 PM ET (US)
> send a friendster email to every person who lists a
> bay area city as their location.

Ah. The Friendster 'Bulletin Board' feature is supposed to already provide this capability: announcements to your personal network, although clumsy. I can see your use of the spider in this way as somewhat valid, since it works around the following failings of Friendster:

1. They need to make the 'Bulletin Board' more powerful, so that you can specify how many hops you want to send to, and geographically filter.
2. They need to give users control of who they want to hear from, i.e. how close a user has to be, either geographically or # of hops, to be allowed to send a Friendster message. This is necessary to balance out the power of the other features, and limit the appeal of spam-like uses. The current default of "any # of hops" is clearly wide open to mis-use.
To summarize, i think that when Friendster adds the safeguards that it clearly needs, the potential misuses and the appeal of large spidering will decrease.

-Ben
Ben Discoe  18
08-14-2003 03:43 PM ET (US)
On another (related) subject: Fakesters

I feel strongly that the pro-fakester people and the anti-fakester people (esp. Jonathan of Friendster) are both making a big deal out of nothing. The obvious solution is to simply add a flag for fakesters. This could be either voluntary, or a review process the way it currently is, except that fakesters are flagged rather than removed. Then, each person can choose whether they want to see their Friendster network with or without fakesters in it, and everyone is happy.

Personally, i have a "fake" flag in my spider output format, though of course it can't be gathered automatically and has to be filled in manually, so that visualization can toggle those nodes on or off depending on whether you want to see the "Real" social network or not.
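
The toggle can be sketched in Python as a filter applied while writing Graphviz DOT. The data layout is an assumption, not Ben's actual output format: `people` maps a name to a dict with a hand-filled `"fake"` flag, `edges` is a list of name pairs, and `to_dot` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def _quote(name: str) -> str:
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(
    people: Dict[str, dict],
    edges: List[Tuple[str, str]],
    include_fakes: bool = True,
) -> str:
    """Render an undirected friend graph as DOT, optionally hiding fakesters."""
    visible = {
        name for name, info in people.items()
        if include_fakes or not info.get("fake", False)
    }
    lines = ["graph friends {"]
    for name in sorted(visible):
        # Draw fakesters dashed so they stand out when they are shown.
        style = " [style=dashed]" if people[name].get("fake", False) else ""
        lines.append(f"  {_quote(name)}{style};")
    for a, b in edges:
        if a in visible and b in visible:
            lines.append(f"  {_quote(a)} -- {_quote(b)};")
    lines.append("}")
    return "\n".join(lines)
```

With `include_fakes=False`, flagged nodes and every edge touching them are left out, so the same scraped data yields either view.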
banksean  19
08-14-2003 04:40 PM ET (US)
re: fakesters: Fakester Manifesto

Dav: a combo applet of your blogosphere + my touchgraph would be neat.

I've also experimented with some derived measures of my friendster network like ranking users by the number of single people of the opposite sex in their immediate network, if they are single ("mackness").
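
As a rough Python sketch of such a derived measure: for each single user, count the single people of the opposite sex among their direct friends, then sort by that count. The field names `"status"` and `"sex"` and the function name are assumptions for illustration, not the actual scraper's data format.

```python
from typing import Dict, List, Tuple


def mackness(
    profiles: Dict[str, Dict[str, str]],
    friends: Dict[str, List[str]],
) -> List[Tuple[str, int]]:
    """Rank single users by single, opposite-sex people in their immediate network."""
    scores: Dict[str, int] = {}
    for user, profile in profiles.items():
        if profile["status"] != "single":
            continue  # the measure is only defined for single users
        scores[user] = sum(
            1
            for friend in friends.get(user, [])
            if friend in profiles
            and profiles[friend]["status"] == "single"
            and profiles[friend]["sex"] != profile["sex"]
        )
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```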

Friendster really needs to come up with reasons to keep people around (besides dating) after they've collected all their real life friends. Something like the contact list you describe is probably the way to go. I know they've pulled a few fake accounts like "Austin Parties" that are being used for this purpose. I bet they want to add support for it formally and charge for the service in some way.
Copyright ©1999-2008 Internicity Inc. All rights reserved.