
.. Hello Earthling .. \o/ :)

encoding urls
Tuesday, June 03, 2008

I work on an application whose code is all in C++. It needs to talk to an HTTP server a lot, so I frequently send data through the WinINet Win32 APIs (no MFC wrappers). Anyone who sends raw text data knows that some characters really mess up your requests to the server, like "#", "{", "&" etc. To turn this data into something that is legit according to the HTTP specs, you need a conversion called URL encoding (also known as percent-encoding). It's pretty simple: convert each nasty character to its two-digit hex ASCII value and prefix it with a '%' sign. For example, a line break \r\n becomes %0D%0A, which is 0D for \r and 0A for \n.

While searching for an API that could make this easy for me, I found an interestingly named one called InternetCanonicalizeUrl. I used it and it helped a little, but it does not work for all the nasty characters. The first weird thing I discovered is that it strips the \r\n from your data wherever they occur, so your data is left with no line breaks (not to mention "#").

So a friend suggested: why not write code to convert all the characters to their ASCII hex equivalent and leave the obvious ones like a-z, A-Z and 0-9 as they are? We came up with this simple code to do the conversion, and so far it works for all the data that we are sending to our server:


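#include <stdio.h>   // sprintf
#include <string.h>  // strlen, strcpy

#define ENCODE_BUF_LENGTH 10000

// Percent-encodes src into destallocatedbuf. The caller must allocate
// destallocatedbuf with room for up to ENCODE_BUF_LENGTH bytes; output that
// would not fit in ENCODE_BUF_LENGTH is truncated.
void UrlEncodePlz (const char * src, char * destallocatedbuf)
{
    char tmpbuf[ENCODE_BUF_LENGTH];
    size_t len = strlen (src);   // computed once, not on every iteration
    size_t x = 0;
    for (size_t i = 0; i < len; i++)
    {
        // unsigned char, so bytes above 0x7F do not sign-extend to %FFFFFFxx
        unsigned char c = (unsigned char) src[i];
        if ((c>='a' && c<='z') || (c>='A' && c<='Z')
            || (c>='0' && c<='9'))
        {
            if (x + 1 >= ENCODE_BUF_LENGTH) break;   // keep room for '\0'
            tmpbuf [x++] = (char) c;
        }
        else
        {
            if (x + 3 >= ENCODE_BUF_LENGTH) break;   // "%XX" plus '\0'
            sprintf (tmpbuf + x, "%%%02X", c);
            x += 3;
        }
    }
    tmpbuf[x] = '\0';
    strcpy (destallocatedbuf, tmpbuf);
}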

OK, some words about the above code. It may not seem so efficient in terms of the size of tmpbuf, which I took as 10000. That's just a number I chose; you can of course pick any number that suits you, but keep in mind that a single \n character becomes %0A, which is 3 characters, so in the worst case the output is three times as long as the input, plus one byte for the terminating null. I am really bad at naming functions and variables (5 years of programming and yes, I still don't do many of the so-called *engineering* formalities well). I think this code will get your data into good enough shape for safe delivery to the server.
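
If you would rather not pick a buffer size at all, here is a rough sketch of the same idea using std::string, which grows as needed. This is not the code we actually run, just one possible variation: the name UrlEncode is made up for this example, it assumes a C++11 compiler, and it additionally leaves "-", ".", "_" and "~" alone because RFC 3986 lists them as unreserved (an assumption on my part that your server is fine with that).

```cpp
#include <string>

// Hypothetical name; a std::string take on the UrlEncodePlz idea above.
std::string UrlEncode(const std::string& src)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(src.size() * 3);   // worst case: every byte becomes "%XX"

    for (unsigned char c : src)    // unsigned, so bytes above 0x7F stay in 0..255
    {
        bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                  || (c >= '0' && c <= '9');
        // Assumption: also pass through the other RFC 3986 unreserved characters.
        bool unreservedPunct = (c == '-' || c == '.' || c == '_' || c == '~');

        if (alnum || unreservedPunct)
        {
            out += static_cast<char>(c);   // safe character, copy as-is
        }
        else
        {
            out += '%';
            out += hex[c >> 4];            // high nibble
            out += hex[c & 0x0F];          // low nibble
        }
    }
    return out;
}
```

Calling UrlEncode("a\r\nb") should give back a%0D%0Ab, and the result can be passed to the WinINet calls through c_str().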

