Topic: CSE 291: Analytics and data mining
Messages 241-239 deleted by topic administrator 07-15-2009 12:31 PM
Charles Elkan  238
06-27-2009 02:24 AM ET (US)

Subject: Netflix Prize: Last Call for Grand Prize

Date: Sat Jun 27 00:29:54 2009 UTC
From: noreply@netflixprize.com

As of the submission by team "BellKor's Pragmatic Chaos" on June 26, 2009 18:42:37 UTC, the
Netflix Prize competition entered the "last call" period for the Grand Prize. In accord with
the Rules, teams have thirty (30) days, until July 26, 2009 18:42:37 UTC, to make submissions
that will be considered for this Prize. Good luck and thank you for participating!
 
Messages 237-236 deleted by topic administrator 07-14-2009 01:40 PM
Charles Elkan  235
06-12-2009 06:24 PM ET (US)
Final exam and letter grades

The final exam was out of 139. The mean score was 94, with standard deviation 19.

Overall, I assigned six letter grades of A- and higher, and seven of B+ and lower.

If you'd like to pick up your final exam, please email me after July 6.
234
06-12-2009 07:20 AM ET (US)
Deleted by topic administrator 06-12-2009 06:01 PM
Charles Elkan  233
06-09-2009 12:01 PM ET (US)
Final exam is today at 6:30pm

In our usual room, CSE 2154. The exam is two hours long. You will be able to stay until 9pm.

You may bring and use all materials handed out in class, printouts from the website and my notes, any handwritten notes of your own, copies of your own quizzes and assignments, and a calculator.

You may also bring one book, but I don't expect any books to be especially useful.
Charles Elkan  232
06-09-2009 11:58 AM ET (US)
Two data mining contests with cash prizes

Machine learning in immunology competition:
http://www.kios.org.cy/ICANN09/MLI.html

Environmental Toxicity Prediction Challenge:
http://www.cadaster.eu/node/65
Charles Elkan  231
06-08-2009 11:43 AM ET (US)
Last quiz and assignment

Out of 6 and 10 respectively, as usual.

Quiz: mean 4.7, stdev 1.3. Assignment: mean 7.8, stdev 1.2.
Charles Elkan  230
06-06-2009 02:09 PM ET (US)
Edited by author 06-06-2009 02:10 PM
/m229: "a well-calibrated probability means that at a given probability the % of positive examples with the same or higher probability is the same as the probability value." Not "or higher": among the examples with *the same* predicted probability, the % that are positive is the same as the probability value.

In order to verify calibration, you have to look at the same probability +/- some tolerance.
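A minimal sketch of such a check, on synthetic data. The function name `calibration_at`, the tolerance values, and the simulated classifier are all assumptions made for illustration, not anything from the quiz.

```python
import random

def calibration_at(probs, labels, p, tol=0.05):
    """Fraction of positive labels among examples whose predicted
    probability lies within p +/- tol. Returns None if no example qualifies."""
    selected = [y for q, y in zip(probs, labels) if abs(q - p) <= tol]
    if not selected:
        return None
    return sum(selected) / len(selected)

# Synthetic, perfectly calibrated classifier (an assumption for this demo):
# each label is positive with exactly the predicted probability.
random.seed(0)
probs = [random.random() for _ in range(100000)]
labels = [1 if random.random() < q else 0 for q in probs]

# For a well-calibrated classifier this should come out close to 0.3.
observed = calibration_at(probs, labels, 0.3, tol=0.02)
print(observed)
```

A smaller tolerance gives a sharper test but leaves fewer examples in the window, so the observed fraction becomes noisier.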

With a prediction threshold of 0.5, all examples are predicted to be negative, so the error rate is 5%. The standard threshold is 0.5 because that is optimal if false negatives and false positives have the same cost. Yes, error rate is 1 - accuracy.
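The arithmetic can be sketched as follows. The data are hypothetical (100 examples, 5% positive, every predicted probability below 0.5), and `optimal_threshold` assumes correct predictions cost nothing, so predicting positive is worthwhile exactly when p * cost_fn >= (1 - p) * cost_fp.

```python
def optimal_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected cost, assuming
    correct predictions cost zero: predict positive when p >= this value."""
    return cost_fp / (cost_fp + cost_fn)

def error_rate(probs, labels, threshold=0.5):
    """Fraction of examples misclassified at the given threshold (1 - accuracy)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    wrong = sum(1 for yhat, y in zip(preds, labels) if yhat != y)
    return wrong / len(labels)

# Hypothetical test set: 5 positives, 95 negatives, all scores below 0.5.
labels = [1] * 5 + [0] * 95
probs = [0.3] * 5 + [0.02] * 95

print(optimal_threshold(1, 1))   # equal costs give 0.5
print(error_rate(probs, labels)) # everything predicted negative, so 5 of 100 wrong
```

With unequal costs the threshold moves: if a false negative costs nine times as much as a false positive, `optimal_threshold(1, 9)` is 0.1.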

What is the max possible lift at 10% of this classifier? In this context, 10% means taking the tenth of test examples with the highest predicted scores.
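A minimal sketch of computing lift at 10% under this reading, using the common definition: the positive rate among the top-scored tenth divided by the overall positive rate. The function name `lift_at` and the data are hypothetical, and the example does not answer the quiz question.

```python
def lift_at(scores, labels, fraction=0.10):
    """Lift at the given fraction: positive rate among the highest-scored
    `fraction` of examples, divided by the overall positive rate."""
    n_top = max(1, round(len(scores) * fraction))
    # Rank examples by score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_labels = [y for _, y in ranked[:n_top]]
    top_rate = sum(top_labels) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical test set: 100 examples with distinct scores and 5 positives,
# 4 of which fall in the top-scored tenth.
scores = [i / 100 for i in range(100)]
labels = [0] * 100
for idx in (99, 98, 97, 96, 10):
    labels[idx] = 1

# Top tenth has positive rate 4/10, base rate is 5/100, so lift is about 8.
print(lift_at(scores, labels))
```

Only the ordering of the scores matters here, which is why the same computation applies to uncalibrated scores.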

There are many classifiers (SVMs for example) that output real-valued scores that are not well-calibrated. For these classifiers, it makes sense to rank test examples and consider the highest-ranked ones. These are the examples with highest probability, even though we don't know what their actual probabilities are.
Dave  229
06-06-2009 01:17 PM ET (US)
Edited by author 06-06-2009 01:54 PM
I have a question about the meaning of well-calibrated probability and some of the questions on the quiz that we had concerning this.

As I understand it, a well-calibrated probability means that at a given probability the % of positive examples with the same or higher probability is the same as the probability value. Is this correct?

On part b your answer says "With a prediction threshold of 0.5, all examples are predicted to be negative, so the error rate is 5%....". How did you pick a threshold of 0.5? Or is this supposed to be 0.05? Also is the "error rate" defined to be 1 - accuracy?


Based on this understanding, the part (c) question asks "What is the max possible lift at 10% of this classifier?" I am left wondering what the 10% means. Is that a 10% probability value, or something else? If it is the probability value, then if 0.1 is between (or equal to) a and b, wouldn't the answer be 0.1/0.05=2?
Charles Elkan  228
06-02-2009 08:21 PM ET (US)
/m199: For anyone interested in yesterday's talk "MINING LARGE-SCALE CELL PHONE DATA" by Jean Bolot from Sprint: the paper was published at KDD last year and can be found at https://research.sprintlabs.com/publications/uploads/dpln.pdf
Charles Elkan  227
06-02-2009 06:48 PM ET (US)
/m226: Thanks for finding this interesting paper. The first author is now a professor of Biology at UCSD, coincidentally.
Kristen Jaskie  226
06-02-2009 05:33 PM ET (US)
/m224: There is a paper I found titled: "Incremental and Decremental Support Vector Machine Learning" that was published in NIPS 2000 and so would have been available at the time this paper was written. The abstract states that "An on-line recursive algorithm for training support vector machines, one vector at a time, is presented." I believe that this is the definition of online that the authors of our paper used.

http://cbcl.mit.edu/projects/cbcl/publicat...enberghs-nips00.pdf