
Computing a web fat index?

Bernard Paques -- on 16 Mar. 2004 at 11:10 GMT, from nearby-an-airport
[b]YACS Leader[/b]

This page introduces a brand new method to help people who believe that the WWW acronym means 'World Wide Wait'. Too often, bad response times are due to fat content. Keep reading, and you will discover the equivalent of the body fat index, adapted to web pages.

[title]How many bits are necessary to transmit a message?[/title]

This article is based on information theory, which states that the information carried by a message grows as the probability of receiving it shrinks: the less expected the message, the more it is worth.

Let's assume that a message has probability p of appearing on the screen of some user. Then this message conveys b = -log2(p) bits of information, where log2 is the base-2 logarithm. This means that, in theory, only b bits would be necessary to transmit this message.
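
As a small illustration of this formula, here is a minimal Python sketch. It assumes a base-2 logarithm, which is what yields a result in bits, and the function name information_bits is made up for this example.

```python
import math


def information_bits(p: float) -> float:
    """Return -log2(p), the information in bits of a message of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be greater than 0 and at most 1")
    return -math.log2(p)


# A message expected one time out of 1024 carries 10 bits;
# a message expected one time out of 2 carries a single bit.
print(information_bits(1 / 1024))
print(information_bits(0.5))
```

The rarer the message, the more bits it is worth, which is why a fully predictable page conveys almost nothing however many bytes it weighs.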

In the real world we have to add a lot to this, including human language rules (which have been optimized not for machine transmission but for the uncertainties that characterize mankind), presentation overhead (e.g., style sheets, colors, images) and, of course, network overhead (e.g., error recovery mechanisms).

[title]How does this translate to web communications?[/title]

How many bytes are necessary to display the infamous 'Hello World!' message? Well, it depends. Literally, this message has only 12 characters. Information theory suggests that an even shorter equivalent could probably be found. But if the message is downloaded from a remote server and displayed in a web browser, it will take far more than that.

Let's assume a standard HTTP request to get it, such as the following one, which is 382 bytes long:
GET /hello.html HTTP/1.1
Host: www.server.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/vnd.ms-powerpoint, application/vnd.ms-excel,
application/msword, application/pdf, */*
Accept-Encoding: gzip, deflate
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.0; T312461; .NET CLR 1.1.4322)
Connection: Keep-Alive


The message will be packaged as a valid HTML document, which is 111 bytes long:
<html>
<head><title>The infamous hello page</title></head>
<body>
<h1>Hello World!</h1>
</body>
</html>


Also, the server will prefix the HTML with some HTTP headers, meaning 417 more bytes:
HTTP/1.1 200 OK
Date: Tue, 09 Mar 2004 15:35:37 GMT
Server: Apache/1.3.20 (Win32) PHP/4.0.6
X-Powered-By: My wonderful software (http://www.mywonderfulsoftware.com/)
Set-Cookie: PHPSESSID=f1b49b404054c7afabb7f1b2d95cd5cd; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-length: 11640
Connection: close
Content-Type: text/html


This makes 382 + 111 + 417 = 910 bytes. These would be transmitted in two TCP segments, meaning an additional overhead of 80 bytes.

Now you have a better understanding of what overhead means to computers. Our message is only 12 bytes in plain English, but 990 bytes have to be exchanged to display it.
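
The arithmetic above can be replayed with a few lines of Python. This is only a sketch: the byte counts are the ones quoted in the text, while the 40 bytes per TCP segment is an assumption (a 20-byte IP header plus a 20-byte TCP header, without options) chosen because it matches the 80 bytes stated for two segments.

```python
# Byte counts quoted in the text.
REQUEST_BYTES = 382           # HTTP request sent by the browser
HTML_BYTES = 111              # HTML document carrying the message
RESPONSE_HEADER_BYTES = 417   # HTTP headers added by the server

TCP_SEGMENTS = 2
# Assumption: 20-byte IP header + 20-byte TCP header per segment, no options.
HEADER_BYTES_PER_SEGMENT = 40


def total_bytes() -> int:
    """Return the bytes exchanged to display the message, TCP/IP headers included."""
    http_bytes = REQUEST_BYTES + HTML_BYTES + RESPONSE_HEADER_BYTES
    return http_bytes + TCP_SEGMENTS * HEADER_BYTES_PER_SEGMENT


message = "Hello World!"
print(len(message.encode("ascii")))  # 12
print(total_bytes())                 # 990
```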

Of course, on a regular web site we would have this nice background image, and tons of pretty boxes around our main banner message, because this is what the corporate policy states about web sites. These days it is common to push 50 to 100 kilobytes just to shout 'hello world'.

[title]How to compute the web fat index?[/title]

Referring to the previous analysis, computers are just adding noise on top of the message actually transmitted. Therefore, we will evaluate a noise/signal ratio, expressed in dB, as a fair estimate of fat.

To evaluate the overhead we would use the following algorithm:
[list=1]
[*] Assess U, the number of readable and useful bytes displayed to the end user
[*] Assess T, the number of bytes transmitted through the network, not counting TCP, UDP, IP, or Ethernet overhead
[*] Compute F = 10 log(T/U), where log is the base-10 logarithm
[/list]

Of course, the objective is to achieve a score as low as possible. Ideally, the number of Transmitted bytes would equal the number of Useful bytes, meaning a Fat index of 10 log(1) = 0.

With the previous example, we would have:
U = 12 bytes
T = 990 bytes
F = 10 log(990/12) = 19 dB noise/signal ratio
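
For readers who prefer code to formulas, the computation can be sketched in Python as follows. The function name fat_index is invented for this illustration, and the logarithm is assumed to be base 10, as is usual for values expressed in dB.

```python
import math


def fat_index(transmitted: int, useful: int) -> float:
    """Return F = 10 * log10(T / U), the noise/signal ratio in dB."""
    if transmitted <= 0 or useful <= 0:
        raise ValueError("byte counts must be positive")
    return 10 * math.log10(transmitted / useful)


print(round(fat_index(990, 12)))  # 19
print(fat_index(500, 500))        # 0.0
```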


Of course, a single measure is not significant by itself. We need to evaluate the Fat index of several pages, at several sites, to get an idea of what is practical today.

[title]How to proceed?[/title]

Well, that sounds good, but we also need a practical method to carry out the computation. OK, here is the way to proceed:

[list=1]
[*] With Netscape Navigator, open the web page you want to assess
[*] Select all displayed text with Ctrl-A, then copy and paste it into any text editor. Save the file and check its size. This will give an approximate U value.
[*] Save the page to your hard drive, and add up the sizes of all involved objects. This will give our T value.
[*] Compute F = 10 log(T/U)
[/list]
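
Once the files are on disk, the last two steps can be automated. The sketch below is one possible way to do it in Python, under stated assumptions: the visible text has been saved to one file, the page to another, and the browser has put the images and style sheets into a companion folder. The function name measure and all file names are hypothetical. Like the manual method, it ignores HTTP headers.

```python
import math
import tempfile
from pathlib import Path


def measure(text_file: Path, page_file: Path, assets_dir: Path) -> tuple[int, int, float]:
    """Return (U, T, F) computed from files saved on disk."""
    useful = text_file.stat().st_size
    if useful == 0:
        raise ValueError("the text file is empty, so U is zero")
    # T is the page itself plus every object saved alongside it.
    transferred = page_file.stat().st_size
    if assets_dir.is_dir():
        transferred += sum(
            f.stat().st_size for f in assets_dir.rglob("*") if f.is_file()
        )
    return useful, transferred, 10 * math.log10(transferred / useful)


if __name__ == "__main__":
    # Demo on throwaway files; replace these paths with your own saved page.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "visible.txt").write_bytes(b"Hello World!")
        (root / "hello.html").write_bytes(
            b"<html><body><h1>Hello World!</h1></body></html>"
        )
        (root / "hello_files").mkdir()
        (root / "hello_files" / "logo.gif").write_bytes(b"\x00" * 1000)
        u, t, f = measure(
            root / "visible.txt", root / "hello.html", root / "hello_files"
        )
        print(f"U = {u} bytes, T = {t} bytes, F = {f:.1f} dB")
```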

The table below gives you some sample figures for well-known sites.

Address                                 | Useful (bytes) | Transferred (bytes) | Fat = 10 log(T/U)
http://www.google.com/                  |            263 |              12,562 | 17 dB
http://www.yahoo.com/                   |          4,139 |             102,810 | 14 dB
http://www.cisco.com/                   |          4,187 |             191,386 | 17 dB
http://www.allot.com/                   |          1,358 |             122,880 | 20 dB
http://www.microsoft.com/               |          3,518 |              77,619 | 13 dB
http://www.wired.com/                   |          7,773 |             117,842 | 12 dB
http://www.lemonde.fr/                  |          7,421 |             269,722 | 16 dB
http://www.w3.org/People/Raggett/tidy/  |         36,112 |              47,104 | 1.2 dB
http://www.alistapart.com/              |          1,486 |              30,566 | 13 dB
http://www.airbus.com/                  |          1,521 |             120,422 | 19 dB
http://www.sita.aero/                   |          2,256 |             203,161 | 20 dB
http://www.yacs.fr/                     |          5,429 |             115,404 | 13 dB


As a rule of thumb, your web site is OK if the noise/signal ratio is below 15 dB. Above that, it is difficult to say anything useful. How much fat is too much? Well, it depends on your way of life...
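
To see how the sample sites fare against this 15 dB mark, the figures of the table can be fed to a short Python script. This is an illustrative sketch only: the names SAMPLES, THRESHOLD_DB and rank_sites are invented here, and the byte counts are simply copied from the table.

```python
import math

# (address, useful bytes U, transferred bytes T), copied from the table.
SAMPLES = [
    ("http://www.google.com/", 263, 12562),
    ("http://www.yahoo.com/", 4139, 102810),
    ("http://www.cisco.com/", 4187, 191386),
    ("http://www.allot.com/", 1358, 122880),
    ("http://www.microsoft.com/", 3518, 77619),
    ("http://www.wired.com/", 7773, 117842),
    ("http://www.lemonde.fr/", 7421, 269722),
    ("http://www.w3.org/People/Raggett/tidy/", 36112, 47104),
    ("http://www.alistapart.com/", 1486, 30566),
    ("http://www.airbus.com/", 1521, 120422),
    ("http://www.sita.aero/", 2256, 203161),
    ("http://www.yacs.fr/", 5429, 115404),
]

THRESHOLD_DB = 15.0  # the rule of thumb given in the text


def rank_sites(samples):
    """Return (address, fat index in dB) pairs, leanest site first."""
    rows = [(addr, 10 * math.log10(t / u)) for addr, u, t in samples]
    return sorted(rows, key=lambda row: row[1])


for address, fat in rank_sites(SAMPLES):
    verdict = "ok" if fat < THRESHOLD_DB else "fat"
    print(f"{fat:5.1f} dB  {verdict:3}  {address}")
```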

Of course, this bare method should be adapted to specific conditions (e.g., whether content is transferred compressed or not), and to the cacheability of the objects that make up a web page (images, cascading style sheets, etc.). Your suggestions are welcome.
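
As a hint of how much compression can change T, here is a rough Python sketch that compares the size of a response body with and without gzip, the first encoding offered by the sample request shown earlier (Accept-Encoding: gzip, deflate). The page body is a made-up, deliberately repetitive example and the function name transfer_sizes is hypothetical; real pages will compress less spectacularly.

```python
import gzip


def transfer_sizes(body: bytes) -> tuple[int, int]:
    """Return (uncompressed size, gzip-compressed size) of a response body."""
    return len(body), len(gzip.compress(body))


# Hypothetical page body: the same table row repeated 200 times.
body = b"<tr><td>Hello World!</td></tr>\r\n" * 200
raw, packed = transfer_sizes(body)
print(f"T without compression: {raw} bytes")
print(f"T with gzip:           {packed} bytes")
```

When assessing a site that serves compressed content, the compressed size is the one to use as T for the objects concerned.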