Sunday, January 22, 2006

Proxy Server behavior with different HTTP protocol versions

In one of my previous posts I mentioned an HTTP tunnel application that I’m working on. Basically, it can be considered an HTTP proxy. Let’s look at the difficulties that can come up when implementing a proxy server.

At present there are two official versions of the HTTP protocol: 1.0 and 1.1. From the proxy’s point of view, the most significant difference between them is that by default 1.0 doesn’t support persistent connections with the web server, while 1.1 supports them by default. The Connection header can also be specified to control the web server’s behavior. For example, Connection: close signals the web server to close the connection once the response has been sent.
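The default rule above can be sketched in a few lines of Python. This is only an illustration: the function name wants_persistent and the convention that header names are stored in lowercase are my assumptions, and the keep-alive branch reflects the non-standard Keep-Alive extension that many HTTP 1.0 clients use.

```python
def wants_persistent(http_version, headers):
    """Return True if the sender expects the connection to stay open.

    headers is assumed to be a dict with lowercase header names.
    """
    # Connection may carry several comma-separated tokens; compare case-insensitively.
    raw = headers.get("connection", "")
    tokens = {t.strip().lower() for t in raw.split(",") if t.strip()}

    if "close" in tokens:
        return False              # explicit request to close after this response
    if http_version == "HTTP/1.1":
        return True               # HTTP 1.1 is persistent by default
    # HTTP 1.0 closes by default; some clients opt in with "keep-alive".
    return "keep-alive" in tokens
```

For example, wants_persistent("HTTP/1.0", {}) is False, while the same call with "HTTP/1.1" is True.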

In this situation a proxy server can operate in several modes:
- operate as if there were no proxy server at all: if the client uses HTTP 1.0 then the proxy also uses HTTP 1.0, and likewise for HTTP 1.1
- use the client-specified protocol on the client side (local) and maintain a different protocol on the web server connection (remote). For example, the proxy server talks HTTP 1.0 to the client and HTTP 1.1 to the server. The proxy server does not ignore the Connection header
- the same as above, except that the proxy server ignores the Connection header, that is, it tries by all means to maintain a persistent connection.

The last scenario can be used when the remote endpoint isn’t a web server but another proxy.
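To make the three modes concrete, here is a rough Python sketch of how the proxy could choose the protocol version and persistence for its connection to the server. The names ProxyMode and upstream_plan are hypothetical, headers are assumed to be a dict with lowercase names, and reading “doesn’t ignore the Connection header” as “a client’s Connection: close is also applied to the server side” is my interpretation.

```python
from enum import Enum


class ProxyMode(Enum):
    TRANSPARENT = 1        # mode 1: mirror the client's protocol version
    TRANSLATE = 2          # mode 2: own remote version, Connection header honored
    FORCE_PERSISTENT = 3   # mode 3: own remote version, Connection header ignored


def upstream_plan(mode, client_version, client_headers, upstream_version="HTTP/1.1"):
    """Return (version, keep_open) for the proxy-to-server connection."""
    raw = client_headers.get("connection", "")
    tokens = {t.strip().lower() for t in raw.split(",")}

    # Mode 1 copies the client's version; modes 2 and 3 use the proxy's own.
    version = client_version if mode is ProxyMode.TRANSPARENT else upstream_version

    if mode is ProxyMode.FORCE_PERSISTENT:
        return version, True          # always try to keep the link open
    if "close" in tokens:
        return version, False         # honor the client's request
    # Otherwise fall back to the default of the chosen version.
    return version, version == "HTTP/1.1" or "keep-alive" in tokens
```

With an HTTP 1.0 client that sends no Connection header, the first mode yields ("HTTP/1.0", False), while the second and third yield ("HTTP/1.1", True).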

In RFC 2616 [HTTP/1.1] proxy behavior is described like this:
“It is especially important that proxies correctly implement the
properties of the Connection header field as specified in section
14.10.

The proxy server MUST signal persistent connections separately with
its clients and the origin servers (or other proxy servers) that it
connects to. Each persistent connection applies to only one transport
link.

A proxy server MUST NOT establish a HTTP/1.1 persistent connection
with an HTTP/1.0 client (but see RFC 2068 for information and
discussion of the problems with the Keep-Alive header implemented by
many HTTP/1.0 clients).”

From this part of the RFC we can see that the last operating mode, in which the proxy ignores the Connection header, goes beyond standard behavior and can be considered a “hard optimization”.
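Section 14.10, which the quote points to, also requires a proxy to parse the Connection header before forwarding a message and to remove every header named in it, since those headers apply to a single link only. A minimal sketch, assuming headers are held in a dict with lowercase names (the function name strip_connection_headers is my own):

```python
def strip_connection_headers(headers):
    """Return a copy of headers that is safe to forward on the next hop."""
    raw = headers.get("connection", "")
    tokens = {t.strip().lower() for t in raw.split(",") if t.strip()}

    # Drop Connection itself plus every header it names.
    return {
        name: value
        for name, value in headers.items()
        if name != "connection" and name not in tokens
    }
```

The proxy then adds its own Connection header for the next link, depending on the mode it runs in.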

Another tricky point when implementing a proxy server for HTTP 1.1 is the Content-Length header. Generally, setting this header simplifies retrieving content from the server. However, when the content is large, modern web servers can omit the Content-Length header in order to boost performance and reduce the amount of resources allocated on the server. In practice this means the client receives data without knowing in advance how much is still to come; the end of the body is marked only by the final chunk of a chunked transfer or by the server closing the connection.

The proxy server has to know about this issue and handle it correctly. Here we also have two options. In the first, the proxy server receives all of the data from the server, sets the Content-Length header, and transmits the whole response to the client. In the second, it redirects the data stream to the client as if there were no proxy at all. The pitfall of the first option is that if the content is too large, the proxy server can consume a large amount of system resources, while in the second mode the proxy doesn’t know in advance when the data stream will finish.
First I’ll implement the first mode (that is, caching the content on the proxy and then sending it to the client), and then I’ll probably experiment with the second option.
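The two options could look roughly like this in Python. It is a sketch under several assumptions: upstream and client are file-like objects (for example, the result of socket.makefile), the body is delimited by the server closing the connection rather than by chunked encoding, the status line has already been written, and the function names are hypothetical.

```python
import io


def relay_buffered(upstream, client, headers):
    """Option 1: read the whole body, then send it with a Content-Length."""
    body = upstream.read()                      # returns once the server closes the stream
    headers = {**headers, "content-length": str(len(body))}
    for name, value in headers.items():
        client.write(f"{name}: {value}\r\n".encode("ascii"))
    client.write(b"\r\n" + body)                # blank line ends the header block


def relay_streaming(upstream, client, chunk_size=8192):
    """Option 2: forward bytes as they arrive; the body ends when the stream ends."""
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:                           # server closed the connection
            break
        client.write(chunk)
```

The buffered version holds the entire body in memory, which is exactly the resource problem described above. The streaming version uses a fixed amount of memory, but it can only signal the end of the body to the client by closing the client connection as well.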