Parsing HTML Data


dtb
02-24-2005, 11:01 PM
I'm a new AMS5 user and don't know much about programming.

I'm looking for a way to parse an HTML file in order to change data in it. In AMS4 I used the find delimited string and get delimited string functions to find the code and change it all.

That doesn't seem to work in AMS5. I've been reading this forum and found some stuff about a function [SubStringSearch(sStringToSearch, sFind)], but I can't seem to make it work.

What I basically need to do is have AMS5 find every instance of the following code in the HTML file:

<a href="bla bla bla"> and change the "bla bla bla" part. Take note that the "bla bla bla" part is not always the same link on my web page.

If anyone can help me, it would be greatly appreciated.

thanks.

Corey
02-24-2005, 11:05 PM
Hi. I'm not 100% sure about your exact context but you can read an .html file into a string using TextFile.ReadToString() and then you can use String.Replace() to replace text, and rewrite the .html file using TextFile.WriteFromString(). That's one way anyhow, hope that helps. :)

dtb
02-24-2005, 11:33 PM
Thanks Corey, but I don't really understand. Maybe this will help you help me.

What I did in AMS4 was download my web page, take only the links, and build a new web page (kind of using templates) so I could update the look of my website with the push of a button on my desktop.

Now in AMS5 I can't get it to work, since String.GetDelimitedString and String.CountDelimitedStrings don't exist.

I read the forum and found some ideas on how to do it (tables, etc.) but can't seem to make them work, so if someone can help me with some sample code I'm sure I could get it working.

thanks.

------[SAMPLE CODE FROM AMS4]------

//SUBMIT TO WEB
%HT% = Internet.SubmitToWeb (POST, "http://www.mysite.com/index.html","")

//GET ONLY THE RELATED DATA I NEED
%HT% = String.GetDelimitedString ("%ht%","<div id="Layer1"",1)
%HT% = String.GetDelimitedString ("%ht%","</div>",0)
//COUNT HOW MANY ITEMS
%total% = String.CountDelimitedStrings ("%ht%", "a href='")
//START GETTING DATA AND BUILD A STRING
%a% = Evaluate (1)
WHILE (%a% <= %total%)
%item% = String.GetDelimitedString("%ht%", "<a href='", %a%)
%item% = String.GetDelimitedString("%item%", "</a>", 0)
//GET THE URL
%url% = String.GetDelimitedString("%item%", "'", 0)
//GET THE NAME
%title% = String.GetDelimitedString("%item%", "<font color=", 1)
%title% = String.GetDelimitedString("%title%", ">", 1)
%title% = String.GetDelimitedString("%title%", "<", 0)
%a% = %a% +1
%data% = String.Insert("%data%", "%title%,%url%;;",0)
END WHILE
//WRITE TO TEMP FILE
TextFile.Write ("%tempdir%\data.dat", %data%)

//THEN I REFORMAT THE CODE
%html% = " "
%TextFile% = TextFile.Read("%tempdir%\data.dat")
%NewString% = String.CountDelimitedStrings ("%TextFile%", ";;")
%a% = Evaluate (0)
WHILE (%a% <= %NewString%)
%item% = String.GetDelimitedString ("%textFile%",";;",%a%)
%title% = String.GetDelimitedString ("%item%",",",0)
%url% = String.GetDelimitedString ("%item%",",",1)
%html% = String.Insert("%html%","<TR><TD>&nbsp;</TD><TD><img src="images/dl.gif"><A HREF="%URL%" >%title%</A></TD></TR>",0)
%a% = Evaluate (%a%+1)
END WHILE
//WRITE FINAL HTML
TextFile.Write ("%tempdir%\newhtml.html","<!DOCTYPE HTML PUBLIC ...%html%...")

------[SAMPLE CODE FROM AMS4 END]------

Corey
02-25-2005, 12:02 AM
Hi. Sure, here's the best place to get started:

http://www.indigorose.com/webhelp/ams50/How_do_I/Load_and_display_a_text_file.htm
http://www.indigorose.com/webhelp/ams50/Program_Reference/Actions/String.Replace_Examples.htm
http://www.indigorose.com/webhelp/ams50/How_do_I/Write_text_to_a_file.htm

They apply to .html files as well as text files. Simply use those three actions to:

1. Read your existing HTML file into a string.
2. Replace the necessary text/code.
3. Resave the new .html.

There are full working examples on the pages linked to above. Hope that helps. :)
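
For anyone following along, the three steps above can be sketched in plain Lua roughly as below. This is only an illustration using the standard Lua library, not the AMS5 actions themselves: `read_file`, `write_file` and `replace_plain` are made-up helper names, and the file name in the usage comment is a placeholder. Inside AMS5 you would presumably use TextFile.ReadToString(), String.Replace() and TextFile.WriteFromString() as described in the linked pages.

```lua
-- Plain-Lua sketch of read -> replace -> rewrite.
-- All three helpers are hypothetical names, not AMS5 actions.

-- Replace every literal occurrence of `find` in `s` with `repl`.
local function replace_plain(s, find, repl)
  if find == "" then return s end            -- nothing to search for
  local out, n, pos = {}, 0, 1
  while true do
    -- 4th argument true = plain text search (no Lua patterns)
    local a, b = string.find(s, find, pos, true)
    if not a then break end
    n = n + 1; out[n] = string.sub(s, pos, a - 1)  -- text before the match
    n = n + 1; out[n] = repl                       -- the replacement
    pos = b + 1
  end
  n = n + 1; out[n] = string.sub(s, pos)           -- whatever is left
  return table.concat(out)
end

-- Read a whole file into one string.
local function read_file(path)
  local f = assert(io.open(path, "r"))
  local text = f:read("*a")
  f:close()
  return text
end

-- Overwrite a file with the given string.
local function write_file(path, text)
  local f = assert(io.open(path, "w"))
  f:write(text)
  f:close()
end

-- Usage (placeholder file name and link values):
-- local html = read_file("index.html")
-- html = replace_plain(html, 'href="old.html"', 'href="new.html"')
-- write_file("index.html", html)
```

Note that this only swaps text you already know in advance; it does not discover the links for you.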

dtb
02-25-2005, 12:08 AM
Well, not really, because this will only allow me to change the text inside a string. What I need to do is get all the needed data from the file (in this case all the URLs and all the link names) and put them into another file.

The three examples you provided did not help me with getting all the data.

thanks

Protocol
02-25-2005, 12:43 AM
Hey there,

Don't worry. I was an avid AMS4 user and the switch was pretty hard for me at first, but it got a LOT better! =) Trust me when I say you can do everything in AMS5 that you could do in AMS4...and a lot more...

Download the html file from the web to your temp directory...

Read the text file (.html) to a string (read to text works on all forms of basic text documents such as html)

Do a search for <a href=" and it will return the first instance of that (by default). Save the returned location number (the number of characters it counted in the string before it reached the first letter of the substring you were searching) to another variable

Then do another search starting from that number (not the default which is 1) for ">

You will then have the starting point and ending point for <a href="bla bla bla"> (if you add 2 for the "> at the end)

With some basic arithmetic, you can take the original file string (the HTML text) and cut off everything before bla bla bla"> and everything from "> onward (roughly). You will then be left with bla bla bla.

At which point you can use the original text and do a replace function on the original text string.

Sounds difficult, but it's actually very simple and can be done using a few drag and drop lines...probably 7 or so.

It could have been written in the same time I wrote this... =(

Anyways, it is possible and I just did something similar tonight as a matter of fact. Don't give up...likely someone will post a working demo, but respect their time by learning from it... ;)
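
A rough sketch of the search-then-cut steps described above, written in plain Lua rather than with AMS5 actions. `first_href` is a made-up helper name and the sample page and URLs are invented. One deliberate difference from the description: it looks for the closing quote of the href value instead of "> so that links with extra attributes still work.

```lua
-- Hypothetical helper: returns the first href value found at or after
-- position `from`, plus the positions of its first and last character.
local function first_href(html, from)
  local open = '<a href="'
  -- true = plain text search, so the characters are matched literally
  local s = string.find(html, open, from or 1, true)
  if not s then return nil end
  local value_start = s + string.len(open)   -- first character after the quote
  local q = string.find(html, '"', value_start, true)
  if not q then return nil end
  return string.sub(html, value_start, q - 1), value_start, q - 1
end

-- Invented sample data
local page = '<p><a href="http://example.com/a.html">A</a></p>'
local url, s, e = first_href(page)
print(url)  --> http://example.com/a.html

-- Splice a new value in place of the old one
local updated = string.sub(page, 1, s - 1)
             .. "http://example.com/b.html"
             .. string.sub(page, e + 1)
print(updated)
```

Cutting by position like this avoids a blind replace hitting the same text elsewhere in the page.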

Corey
02-25-2005, 12:45 AM
You can also use String.Find to locate specific points to replace data. Sounds like your best bet is to just get in there and start experimenting with the string functions, I think it will become apparent to you quite quickly. :) Here's String.Find():

http://www.indigorose.com/webhelp/ams50/Program_Reference/Actions/String.Find_Examples.htm

And a page on string manipulation you might find helpful:

http://www.indigorose.com/webhelp/ams50/Scripting_Guide/String_Manipulation.htm

Hope that helps. :)

dtb
02-25-2005, 01:12 AM
Thanks Protocol,

I used a simple website to test your approach and I must say it works perfectly.

Here is the resulting code to get an IP from http://www.findmyip.com/

...obvious http submit here using RAWDATA as the string

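--Parse Data
-- Find the marker that comes right before the IP
start = String.Find(RAWDATA, "is:<br>", 1, false);
-- Skip past the 7 characters of "is:<br>"
start = start + 7;
-- Find the tag that follows the IP, searching from the start position
endd = String.Find(RAWDATA, "</FONT>", start, false);
-- Number of characters between the two markers
tmp = endd - start;
ip = String.Mid(RAWDATA, start, tmp);
result = Dialog.Message("And your IP is:", ip, MB_OK, MB_ICONINFORMATION, MB_DEFBUTTON1);

--End Code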

Is that it?

Now, if I add a do-while loop, how can I do a String.Insert?

thanks
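
On the loop question, one possible shape (a plain Lua sketch, not tested inside AMS5) is to keep searching from where the last match ended, store each link in a table, and join the pieces at the end instead of inserting into a string on every pass. `collect_links` and `build_rows` are made-up names, the row markup is a simplified version of the AMS4 template posted earlier, and the search only matches lowercase <a href=" with double quotes.

```lua
-- Hypothetical helper: collect the url and title of every link in `html`.
local function collect_links(html)
  local links, n, pos = {}, 0, 1
  local open = '<a href="'
  while true do
    local s = string.find(html, open, pos, true)       -- plain text search
    if not s then break end
    local url_start = s + string.len(open)
    local url_end = string.find(html, '"', url_start, true)
    if not url_end then break end
    local tag_end = string.find(html, ">", url_end, true)
    if not tag_end then break end
    local close = string.find(html, "</a>", tag_end, true)
    if not close then break end
    n = n + 1
    links[n] = {
      url = string.sub(html, url_start, url_end - 1),
      title = string.sub(html, tag_end + 1, close - 1),
    }
    pos = close + 4                                     -- continue after </a>
  end
  return links, n
end

-- Hypothetical helper: turn the collected links into table rows.
local function build_rows(links, n)
  local rows = {}
  for i = 1, n do
    rows[i] = '<TR><TD><A HREF="' .. links[i].url .. '">'
           .. links[i].title .. '</A></TD></TR>'
  end
  return table.concat(rows, "\n")                       -- join once at the end
end
```

If the link text contains nested tags such as <font>, the title will include them and would need a further trim along the same lines.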