QuickTopic (SM) free message boards
Topic: 134A general questions, Spring 2001
fda  38
05-05-2001 06:02 PM ET (US)
Deleted by author 05-05-2001 06:03 PM
pom  37
04-28-2001 05:01 PM ET (US)
i just wanted to try quicktopic
Charles Elkan  36
04-16-2001 06:14 PM ET (US)
You should write your own "before" and "after" functions using the built-in "split" or "explode" functions.

Or, there are alternative ways to use regular expressions for information extraction. Whatever strategy you choose, it should be well-organized and efficient, and as clear and simple as possible. One part of your report should be an explanation of your strategy for information extraction.
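
As an illustration of the suggestion above, here is a minimal sketch of what such helpers might look like when built on "explode". The function names follow the thread, but the exact behavior (for example, what to return when the marker is missing) is an assumption, and the HTML fragment is made up.

```php
<?php
// Hypothetical helpers: names follow the thread, behavior is assumed.

// Return the part of $text before the first occurrence of $marker.
// If $marker does not occur, the whole text is returned.
function before($marker, $text) {
    if ($marker === "") {
        return "";              // explode() does not accept an empty delimiter
    }
    $parts = explode($marker, $text, 2);
    return $parts[0];
}

// Return the part of $text after the first occurrence of $marker.
// If $marker does not occur, an empty string is returned.
function after($marker, $text) {
    if ($marker === "") {
        return $text;           // explode() does not accept an empty delimiter
    }
    $parts = explode($marker, $text, 2);
    return count($parts) > 1 ? $parts[1] : "";
}

// Usage on a made-up fragment.
$html  = "<html><title>10646 Kemerton Rd</title><body>...</body></html>";
$title = before("</title>", after("<title>", $html));
echo $title, "\n";   // prints "10646 Kemerton Rd"
```

Limiting "explode" to two pieces keeps everything after the first marker intact, even if the marker appears again later in the page.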
curious  35
04-16-2001 02:01 PM ET (US)
The methods before and after don't seem to exist. Is there some sort of file we need to require? Or were you suggesting in the notes that we should write our own methods before and after to clean up our code?

    $text = after("",$text);
    $title = before("
",$text);
Hector Jasso (TA)  34
04-14-2001 08:16 PM ET (US)
******* PLEASE READ *******

A new discussion board replacing this one has been set up in
DISCUS. The URL is:

http://discus.ucsd.edu

DISCUS supports threads, which should make everyone's life
easier.

This board (QuickTopic) will not be erased, but please start using Discus
instead.

I have set up sub-topics for:

- General Questions
- Project 1
- Project 2
- Project 3
- Tests

Hector Jasso

******* PLEASE READ *******
Charles Elkan  33
04-14-2001 07:40 PM ET (US)
The information provided by "It's Easy" is correct. Congratulations on discovering how the Homeadvisor site works!

If you want to use this information source to do the project, that is fine. In this case, you should also pull maps from Yahoo and some interesting relevant information of your choice from an HTML source. Your project should demonstrate that you know how to do simple information extraction from HTML using regular expressions.

If a house was last sold before 1996, it is ok that you don't have the sale price for it. Your recent comparable sale should always be from 2000 or 2001, and should have a known sale price.

The Microsoft Homeadvisor site is an example of what not to do in web site design, for at least four reasons:

(1) It does not work properly in Netscape browsers. (This may be deliberate Microsoft policy, for anti-competitive reasons.)
(2) It is slow. In comparison, the basic GET operation from the server 63.78.183.70 is very fast.
(3) The combination of server-side generated Javascript with style sheets and basic HTML is bad software engineering: tricky programming, with no clearly visible benefit. For example, house info should be packaged as XML, not Javascript.
(4) Security is very weak. A determined programmer could take much more information from the server 63.78.183.70/CICS/CWBA than the owners of this data would want to release.

CICS is IBM software, over 30 years old, that manages information transactions over a wide-area network. CWBA is the CICS web interface software that allows "legacy" CICS software to be used with web sites. For more info on CICS see http://www-4.ibm.com/software/ts/cics/
Alan  32
04-14-2001 07:07 PM ET (US)
we're still in the process of setting up a web server with CURL-enabled PHP. this should be ready by Monday at the latest.
photon  31
04-14-2001 03:53 PM ET (US)
Where is the cURL library installed on ieng9? I want to set up the cURL path, so I can use the functions.
Thanks
Alan  30
04-14-2001 12:17 PM ET (US)
on the topic of the homeworth site, we now know a few things about the web server serving this content:

  • it uses SSL
  • it checks for particular "user agents" (i.e. in most cases, this means your browser)
  • it requires that the client store and return cookies


in order to handle these cases, we are in the process of making a web server with an enhanced version of PHP installed. the enhancement involves software called CURL (http://curl.haxx.se/). this added functionality in the PHP language will allow you to deal with these complications. point your browser at http://www.php.net/manual/en/ref.curl.php for information on the PHP-CURL interface. you'll need to do at least the following tasks using this interface:

  • force the SSL version to version 2; the server doesn't appear to always correctly negotiate between version 2 and 3.
  • tell CURL to follow URL redirections
  • tell CURL to masquerade as an "acceptable" browser
  • retrieve, store, and send cookies


if you take some time to learn the PHP-CURL interface, you'll be in good shape when the server is up.
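
the four tasks above map onto options in the PHP-CURL interface roughly as sketched below. this is only a sketch under assumptions: the function names, the URL, the cookie file path, and the user-agent string are placeholders chosen for illustration, not values the homeworth server is known to accept. also note that current libcurl builds generally no longer accept SSL version 2, so that option reflects the server as described in this thread.

```php
<?php
// Sketch only: function names, URL, cookie path and user-agent are placeholders.

// Build the option list for one request.
function fetch_options($url, $cookieFile) {
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,         // return the page as a string
        CURLOPT_SSLVERSION     => 2,            // 2 = SSL version 2 in libcurl's numbering
        CURLOPT_FOLLOWLOCATION => true,         // follow URL redirections
        CURLOPT_USERAGENT      => "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
        CURLOPT_COOKIEJAR      => $cookieFile,  // store received cookies here on close
        CURLOPT_COOKIEFILE     => $cookieFile,  // send back cookies read from here
    );
}

// Fetch a page; returns the body as a string, or false on failure.
function fetch_page($url, $cookieFile) {
    $ch = curl_init();
    curl_setopt_array($ch, fetch_options($url, $cookieFile));
    $page = curl_exec($ch);
    if ($page === false) {
        echo "cURL error: ", curl_error($ch), "\n";
    }
    curl_close($ch);
    return $page;
}

// Example call (placeholder URL and path):
// $page = fetch_page("https://example.com/", "/tmp/cookies.txt");
```

using the same file for the cookie jar and the cookie file means cookies stored by one request are sent back on the next one.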
Alan  29
04-14-2001 11:58 AM ET (US)
instead of office hours on Monday (4/16), i'm going to have lab hours in the second floor lab, from 1pm-2pm. if you were planning on coming to regularly scheduled office hours at 9am, and you can't come to the lab at this new time, send me email.
wondering  28
04-14-2001 01:21 AM ET (US)
for finding the sale price of the most recent sale, the yahoo site http://classifieds.yahoo.com/reinfo/homecompsquery.html only gives sale prices for records from 1996 and later. If the house was last sold before 1996, is it okay that we don't have a sale price?
Its Easy!  27
04-14-2001 01:00 AM ET (US)
Edited by author 04-14-2001 01:00 AM
Take a close look at homeadvisor's source. They simply "yank" all the data from an HTTP GET into JavaScript variables. Take a look at http:// 63.78.183.70/CICS/CWBA/HVEJS?a=STREETADDRESS&c=&s=&z=ZIP&p=&d=&t=1&pt=1 (I put a space in the url so that this board doesn't snip it) and see what you get. This is NOT html, so your browser wants to save it somewhere. Look at it via a text viewer. This is super easy to parse. If you get tricky, this is the only site that you need to pull data from. This will give you map coordinates and everything.

So to sum up, this is 100 times easier than doing SSL, 10 times easier than parsing someone else's html and makes the project quick and easy.
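
To illustrate the parsing step, here is a minimal sketch that pulls string-valued JavaScript variables out of a response with a regular expression. The sample text, the variable names, and the helper name js_string_vars are all made up for illustration; the actual format returned by the server is not shown in this thread, so the pattern would likely need adjusting.

```php
<?php
// Hypothetical response text: real variable names and layout are not known here.
$js = 'var price = "200,000"; var zip = "92126";';

// Collect assignments of the form: var name = "value";
// Returns an array mapping variable names to their string values.
function js_string_vars($js) {
    $vars = array();
    $pattern = '/var\s+(\w+)\s*=\s*"([^"]*)"\s*;/';
    if (preg_match_all($pattern, $js, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $vars[$match[1]] = $match[2];   // name => value
        }
    }
    return $vars;
}

$vars = js_string_vars($js);
echo $vars["price"], "\n";   // prints "200,000"
```

This only handles double-quoted string literals; numeric values or concatenated expressions would need additional patterns.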
Alan  26
04-13-2001 06:36 PM ET (US)
Edited by author 04-13-2001 06:49 PM
"134A Student" is correct in that the information might not be readily available at the same location in the output where one would expect to find it. however, if the information is able to be displayed in a javascript window, there must be a way to extract it from the page source. in the best case, there is some javascript array which stores all the data, which is arguably easier to parse than some HTML with the information embedded within it. in the worst case, you'd essentially need to write a javascript interpreter. i haven't looked at the referenced site, so i don't know which case is more applicable. however, if it is the latter, i humbly suggest that this is going to complicate the project more than it's worth.
134A Student  25
04-13-2001 02:11 PM ET (US)
Minh,
Go to the home advisor website: http://homeadvisor.msn.com and scroll down near the middle of the page where it asks for a street address and zip code. enter an address and zip code (for example: 10646 KEMERTON RD 92126) and view the source on the resulting page.

instead of using html to say something like:

price of home = 200,000

it uses javascript and has:

var string = "price of home = " + priceVariable;
document.write(string);
Minh  24
04-13-2001 01:46 AM ET (US)
Regarding the javascript:
Since javascript runs on the client side, it should always be possible to find the value of a variable: var price = 100;
But it is true that if they keep assigning new values to that variable (price=former_price*1.4;), it could be hard to find out what it is equal to.
I had a look at the msn website but didn't find this case. Could you provide the url, or explain how you found the page where you couldn't access the information?
134A Student  23
04-12-2001 08:15 PM ET (US)
just wanted to mention that the alternative site mentioned in class http://homeadvisor.msn.com doesn't seem to work because the information is not actually displayed in html but is in javascript variables. is there a way to access these variables with php?

i can do it if i stick the page in a frame with some javascript code looking into that page, but that wouldn't be using php.