Saturday, August 19, 2006

HTTP: Proxy Design Considerations.

First, a short description of what an HTTP proxy does.

Basically, it receives HTTP requests and routes them to a remote web server or to another proxy.

How does the proxy know where to send a request?
Well, to know that, the proxy has to parse each incoming HTTP request and extract the URI from its request line (and, when that URI is not absolute, the Host header).
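A minimal sketch of that routing step, assuming plain-HTTP requests (no CONNECT tunnels) and header names already lowercased. The function name `routing_target` is hypothetical. Clients talking to a proxy normally send an absolute URI such as `http://example.com:8080/a`; the sketch falls back to the Host header for the origin form (`/a`).

```python
from urllib.parse import urlsplit


def routing_target(request_line, headers):
    """Return the (host, port) a forwarding proxy would connect to.

    Simplified sketch: assumes plain HTTP, lowercase header names,
    and no IPv6 literals in the Host header.
    """
    _, uri, _ = request_line.split(" ", 2)  # METHOD URI VERSION
    parts = urlsplit(uri)
    if parts.scheme and parts.hostname:
        # Absolute URI, the form clients send to a proxy.
        return parts.hostname, parts.port or 80
    # Origin-form URI ("/path"): use the Host header instead.
    host, _, port = headers.get("host", "").partition(":")
    return host, int(port) if port else 80
```

For example, `routing_target("GET http://example.com:8080/index.html HTTP/1.1", {})` gives `("example.com", 8080)`.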

Note:
An HTTP request consists of a request line, header fields (name: value pairs) and possibly content (the message body). Each header line ends with CRLF, and the header block is terminated by an empty line, so on the wire the end of the header shows up as a double CRLF sequence (CRLF stands for carriage return and line feed, i.e. the \r\n escape characters). Content may or may not follow, depending on the request type (GET, POST etc.).
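To make that layout concrete, here is a small sketch that splits raw bytes at the double CRLF. It assumes the bytes have already been read from the socket; the name `split_head` is hypothetical, and it returns `None` while the header is still incomplete so a caller knows to keep reading.

```python
from typing import Dict, Optional, Tuple


def split_head(buf: bytes) -> Optional[Tuple[str, Dict[str, str], bytes]]:
    """Split raw request bytes into (request_line, headers, rest).

    Returns None if the double CRLF ending the header has not arrived yet.
    Header names are lowercased; 'rest' holds the first content bytes, if any.
    """
    end = buf.find(b"\r\n\r\n")
    if end < 0:
        return None  # header incomplete: read more from the socket
    head = buf[:end].decode("iso-8859-1")
    rest = buf[end + 4:]  # bytes after the double CRLF (content, if any)
    request_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, rest
```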

So the workflow is: the proxy receives an HTTP request, parses/analyzes it and routes it to the appropriate server or to another proxy.

How efficient is that?

Well, if we want a proxy that can process HTTP content, we will design it so that the whole HTTP request, content included, is received by the proxy and then parsed/analyzed (I will not cover that in this post). But if we want our proxy merely to route requests, the approach described above is very inefficient: the total size of an HTTP request can be quite large (a file upload, for example), so receiving it completely before forwarding it can lead to high memory consumption.

The solution can be quite simple. The HTTP header contains all the information the proxy needs to route the request. So the proxy can be designed to receive only the full HTTP header and parse/analyze it. If content is pending, it is then immediately routed to the destination given by the request's header.
Content is pending when, for example, we have an HTTP POST request whose Content-Length header is greater than 0.
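A minimal sketch of this header-only buffering, written against file-like byte streams so it can be tried without sockets. The name `relay_request` and the `chunk_size`/`max_head` limits are hypothetical. It treats any request with Content-Length greater than 0 as having content pending (which covers the POST case above) and does not handle `Transfer-Encoding: chunked`.

```python
import io


def relay_request(src, dst, chunk_size=4096, max_head=64 * 1024):
    """Forward one HTTP request from src to dst, buffering only the header.

    src/dst are binary streams with read()/write(). Returns bytes that were
    read past the end of this request (e.g. the start of a pipelined one).
    """
    buf = b""
    # 1. Buffer until the double CRLF that ends the header.
    while b"\r\n\r\n" not in buf:
        data = src.read(chunk_size)
        if not data:
            raise ConnectionError("connection closed before end of header")
        buf += data
        if len(buf) > max_head:
            raise ValueError("header too large")
    end = buf.index(b"\r\n\r\n") + 4
    head, extra = buf[:end], buf[end:]

    # 2. Parse Content-Length (the routing decision would also be made here).
    length = 0
    for line in head.decode("iso-8859-1").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())

    # 3. Send the header on, then stream the content in small pieces.
    dst.write(head)
    first = extra[:length]  # content bytes already read with the header
    dst.write(first)
    pending = length - len(first)
    while pending > 0:
        data = src.read(min(chunk_size, pending))
        if not data:
            raise ConnectionError("connection closed mid-content")
        dst.write(data)
        pending -= len(data)
    return extra[length:]


if __name__ == "__main__":
    raw = b"POST /up HTTP/1.1\r\nHost: a\r\nContent-Length: 3\r\n\r\nabc"
    out = io.BytesIO()
    relay_request(io.BytesIO(raw), out, chunk_size=8)
    print(out.getvalue() == raw)
```

Memory use stays bounded by the header size plus one chunk, however large the content is.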

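This approach is more efficient because only the header has to be buffered, so much less memory is allocated to process one HTTP request. It also speeds up traffic through the proxy, because forwarding can start as soon as the header has been parsed instead of waiting for the whole request to arrive.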
