
TOPIC:

Mining of concurrent text and time series

5
Deleted by author 11-16-2002 01:00 PM
4
Dave Kauchak
04-30-2001
04:32 AM ET (US)
One thing I think this paper does very well is suggest ideas for new research, particularly refining how the system is used and expanding its uses. As Greg mentioned, I would be interested to see how well the system does on a larger time scale. Although day trading is interesting, particularly for making a quick dollar, I think an even more interesting experiment would examine a much larger time scale, on the order of weeks, months, or even years. The application could then be used not only for monetary gain but also as a model of a company’s future value and success.

Another thing I was curious about is the way the news articles were selected. The paper briefly mentions that the stories are simply taken from the relevant stories that Yahoo keeps. I think the choice of news stories has a strong impact on how well their system works. It would be interesting to see how much of a difference that choice makes, and also how well the system copes with noisy or incomplete data.

Dave
3
sameer agarwal
04-30-2001
01:52 AM ET (US)
Hello,

Frankly, I was quite surprised that:

1. The authors were able to get correlation with trends using such a crude language model.

2. A 5-10 hour window was sufficient to get the correlations (information does indeed seem to flow fast). Perhaps one explanation is that by the time the news agencies get their news, so do the traders on the floor; Yahoo business news is not exactly the source that people doing real-time trading depend on, so there is already an implicit time delay. I think the model will have to be calibrated for each news source separately; a single time window might not be sufficient.

Since the length of the trends is variable, it would be interesting to see whether the length of a run (trend) can also be correlated with the stories and predicted.

Another thing is that there are stories that are relevant to an entire market, sub-market, or industry but that may not be tagged to a particular stock. It might be of some use to build a hierarchical set of models that take news feeds tagged to specific industries, to the whole economy, and so on, and correlate their effects with the trends.

Overall, I think the paper is rather well written. I especially like the use of t-tests to decide the splits for trends and clustering. But if I am not mistaken, the authors did not mention the significance level at which they tested their hypotheses.
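
To illustrate why the significance level matters for the splits, here is a minimal sketch, not the authors' procedure. It assumes a split point is tested by comparing the price changes on either side with Welch's t-statistic; the names `welch_t`, `should_split`, and `critical_value` are hypothetical, and `critical_value` stands in for whatever threshold a chosen significance level would imply.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples (each needs at least 2 values)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def should_split(prices, k, critical_value):
    """Split after the k-th price change if the changes on the two sides differ.

    critical_value is an assumed threshold on |t|; a stricter significance
    level corresponds to a larger threshold and therefore fewer splits.
    """
    changes = [y - x for x, y in zip(prices, prices[1:])]
    left, right = changes[:k], changes[k:]
    return abs(welch_t(left, right)) > critical_value

# Made-up prices: a steady rise followed by a decline.
prices = [10.0, 10.5, 11.1, 11.6, 12.2, 12.0, 11.4, 10.9, 10.3]

lenient = should_split(prices, 4, critical_value=2.0)
strict = should_split(prices, 4, critical_value=20.0)
```

With these made-up numbers the same candidate split is accepted under the lenient threshold and rejected under the strict one, so the resulting trends depend directly on a parameter the paper does not seem to report.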



sameer
2
Hector Jasso
04-29-2001
06:58 PM ET (US)
I agree with Greg's comment on the time scale of the
experiments. My concern is that if one does not have
infinite credit (an assumption the authors make in
testing their model), then false positives and false
negatives can have a large impact.
For example, take the Time Warner/AOL merger that
happened this (last?) year. There was a lot of talk
about whether the merger would happen. This would
result in many predictions of a steep increase,
because the news was constantly talking about a
possible merger. Let's assume that as long as the
merger is not approved, any sell/buy action results
in random gains, corresponding to daily fluctuations
of the market, with no particular trend. When the
merger is approved, what is the actual gain? I would
say small compared to all the buys and sells that the
system did over the months of speculation about a
possible merger. I can think of many examples where a
decision is "in the air" until someone confirms it,
but the important piece of information is the
confirmation, not the speculation.

But maybe more important than that is whether specific words
represent actual pieces of information: I wonder if
the system can differentiate between a note saying that
the new Pentium chip "has no bugs" and one saying that
it "has bugs." They both have the words "has" and "bugs"!
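
To make the point concrete, here is a minimal sketch, assuming a plain whitespace-tokenized bag-of-words representation (the paper's exact tokenization is not given here, and the function names and example sentences are made up). It also shows word pairs as one possible way to capture the negation, which is a suggestion and not something the paper is known to do.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count each token."""
    return Counter(text.lower().split())

def bigrams(text):
    """Count adjacent token pairs, which keep some word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

a = "the chip has bugs"
b = "the chip has no bugs"

# Words the two notes share, and words that appear only in the second one.
shared = set(bag_of_words(a)) & set(bag_of_words(b))
only_b = set(bag_of_words(b)) - set(bag_of_words(a))

# Pairs that appear in one note but not the other.
pairs_only_a = set(bigrams(a)) - set(bigrams(b))
pairs_only_b = set(bigrams(b)) - set(bigrams(a))
```

In the unigram view the two notes differ only by the single common word "no", while the pair view separates ("has", "bugs") from ("no", "bugs").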
Edited 04-29-2001 06:58 PM
1
Greg Hamerly
04-27-2001
05:09 PM ET (US)
This is a very interesting, well-written paper. The one thing I remained curious about was the classification error rate for the other four classes besides surge: slight+, no recommendation, slight-, and plunge.

I was surprised at the small time scale at which these experiments were performed. Certainly it is useful for day-trading techniques, but I am curious what the performance would be at larger time scales (say, looking for trends a day or week long).

They used a text/time series alignment window that allowed documents to be predictive up to the time of the trend start. It may be more realistic to leave a buffer of time between the last story available and the start of the trend. In other words, if a trend starts at 11 AM, only allow articles before 10:30 AM to be predictive for that trend. I don't know how fast news flows in the stock trading world (I'm sure it's fast), but this may be something to consider.
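
The buffer idea can be sketched as follows. This is an illustration under assumptions, not the paper's alignment code: the function name `stories_for_trend`, the 5-hour window, and the timestamps are all made up.

```python
from datetime import datetime, timedelta

def stories_for_trend(story_times, trend_start, window, buffer):
    """Keep stories published in [trend_start - window, trend_start - buffer)."""
    earliest = trend_start - window
    latest = trend_start - buffer
    return [t for t in story_times if earliest <= t < latest]

trend_start = datetime(2001, 4, 27, 11, 0)
stories = [
    datetime(2001, 4, 27, 7, 15),
    datetime(2001, 4, 27, 10, 20),
    datetime(2001, 4, 27, 10, 45),
]

# No buffer: every story before the trend start can be predictive.
no_buffer = stories_for_trend(stories, trend_start, timedelta(hours=5), timedelta(0))

# 30-minute buffer: only stories before 10:30 AM count.
with_buffer = stories_for_trend(
    stories, trend_start, timedelta(hours=5), timedelta(minutes=30)
)
```

Here the 10:45 AM story is dropped once the 30-minute buffer is applied, which is the effect described above for a trend starting at 11 AM.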

It's interesting that they cluster the time series data (section 2.1.2) according to slope and confidence, only to realize that they don't need to account for confidence in their model.

- greg