Software Development

Monday, April 27, 2009

Discovering System Endianness

When doing network programming we send bytes to and from peers. These bytes sometimes constitute complex protocols.

Let us assume we have a simple message exchange protocol with a header and some data, like in the picture here.

The header contains the size of the message. When a peer receives bytes from the network, it reads the header first (which has a fixed length), then decodes the information in the header (data size, etc.), and finally tries to read the specified number of bytes from the network.

Network applications usually have at least two peers, and these peers can be hosted on different systems. Say, peer1 runs on Windows, while peer2 is a Java app running in a Unix environment.
Our protocol contains an integer value that holds the size of the message's data. This value is 4 bytes long.
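
The read sequence described above (fixed-length header first, then the data) can be sketched roughly as follows. This is only an illustration: the MessageReader class and its method names are made up for this example, and it assumes the 4-byte size is sent most significant byte first.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of any library
public static class MessageReader
{
    // Reads exactly count bytes; a single Read call may return fewer
    public static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0)
                throw new EndOfStreamException("Stream ended in the middle of a message.");
            total += read;
        }
        return buffer;
    }

    // Reads one message: a 4-byte size header followed by that many data bytes
    public static byte[] ReadMessage(Stream stream)
    {
        byte[] header = ReadExactly(stream, 4);
        // Assumption: the size is encoded most significant byte first
        int size = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (size < 0)
            throw new InvalidDataException("Negative message size.");
        return ReadExactly(stream, size);
    }
}
```

The same code works on a NetworkStream, where a single Read call often returns only part of the requested data, which is why the loop is needed.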

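Windows on x86 hardware stores numbers in little-endian order, that is, the least significant byte is placed at the lowest address. Java does the opposite: its standard I/O classes (DataOutputStream, for example) write multi-byte numbers in big-endian order, most significant byte first. This difference in byte ordering is known as endianness.

To transfer multi-byte values over the network in a uniform manner, a convention was established: multi-byte data is written to the network in big-endian byte order, also known as network byte order.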
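
As a small sketch of what this means for the 4-byte size field, the conversion could look like the code below. The LengthPrefix class is a made-up name for this example; IPAddress.HostToNetworkOrder and IPAddress.NetworkToHostOrder are the standard .NET helpers.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only
public static class LengthPrefix
{
    // Produces the 4 header bytes in network (big-endian) order
    public static byte[] Encode(int size)
    {
        // Swaps the bytes on little-endian hosts, does nothing on big-endian ones
        int networkOrder = IPAddress.HostToNetworkOrder(size);
        return BitConverter.GetBytes(networkOrder);
    }

    // Restores the host-order value from 4 header bytes
    public static int Decode(byte[] header)
    {
        int networkOrder = BitConverter.ToInt32(header, 0);
        return IPAddress.NetworkToHostOrder(networkOrder);
    }
}
```

Because both helpers are no-ops on a big-endian host, the bytes on the wire come out the same whatever system the code runs on.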

Earlier I said that Windows is a little-endian system. Do you really believe me?
If you do not, check for yourself. Here's how you can do this in C#.

First approach:

int number = 0x00000001;
byte[] bytes = BitConverter.GetBytes(number);
bool isBigEndian = bytes[0] == 0x00;

Second approach (for geeks):

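//compile with /unsafe and place inside an unsafe block
int number = 0x00000001;
byte* p = (byte*)&number; //byte*, so that p[0] is the byte at the lowest address
bool isBigEndian = p[0] == 0x00;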

And finally the third one:

bool isBigEndian = !BitConverter.IsLittleEndian;

When you want to write self-contained code, the above approaches can be used to determine the endianness of the system your code runs on.

P.S. The third method is the best :).

Thursday, March 26, 2009

Windows Vista Defragmentation Tools

Windows Vista uses the NTFS file system by default. Sooner or later, files on it will start to fragment.

Fragmentation can lead to a significant decrease in disk I/O performance. The common way to handle this problem is a process called defragmentation.

It turns out that Vista's defragmentation tool is a little bit oversimplified.

As you can see, the user interface lacks volume fragmentation information, so it is hard to say whether my volume needs defragmentation.

Fear not: Vista has a command-line tool, defrag.exe.

Windows Disk Defragmenter
Copyright (c) 2006 Microsoft Corp.
Description:  Locates and consolidates fragmented files on local volumes to
              improve system performance.

Syntax:  defrag <volume> -a [-v]
         defrag <volume> [{-r | -w}] [-f] [-v]
         defrag          -c [{-r | -w}] [-f] [-v]

Parameters:

Value         Description

<volume>      Specifies the drive letter or mount point path of the volume to
              be defragmented or analyzed.

-c            Defragments all volumes on this computer.

-a            Performs fragmentation analysis only.

-r            Performs partial defragmentation (default). Attempts to
              consolidate only fragments smaller than 64 megabytes (MB).

-w            Performs full defragmentation. Attempts to consolidate all file
              fragments, regardless of their size.

-f            Forces defragmentation of the volume when free space is low.

-v            Specifies verbose mode. The defragmentation and analysis output
              is more detailed.

-?            Displays this help information.

Examples:

defrag d:
defrag d:\vol\mountpoint -w -f
defrag d: -a -v
defrag -c -v

Here's sample output for my system volume (C:):
defrag c: -a
Windows Disk Defragmenter
Copyright (c) 2006 Microsoft Corp.

Analysis report for volume C: VISTA

Volume size = 70.00 GB
Free space = 21.84 GB
Largest free space extent = 6.20 GB
Percent file fragmentation = 4 %

Note: On NTFS volumes, file fragments larger than 64MB are not included in the fragmentation statistics

You do not need to defragment this volume.

It is also possible to increase the amount of information returned by the tool: just specify the -v switch.

Now we have some information about disk fragmentation and hence can decide when to start the defragmentation process.

P.S. I had a strange feeling when writing this post. In an operating system like Windows Vista, with its redesigned user interface, it is awkward to use command-line tools to perform a common task like disk defragmentation.

Sunday, March 15, 2009

Image Watermarking

We all know that once an image is posted on the internet, it no longer belongs to you.

That is arguable, but nevertheless any user with a browser can simply save it to their hard drive, and there is nothing you can do about it.

One way to control image distribution is digital watermarking.

In my case, I had about 100 images that had to be watermarked. There are two ways to do this: manually (e.g. using Photoshop) or by writing some code to do the job automatically.

Simple watermarks with horizontal or vertical text are easy, but things become harder when the watermark text has to be positioned diagonally.

After poking around the web I found this brilliant article. The code did its job well, so I wired it into a small console app, and voilà: 105 files watermarked in less than 20 seconds.

Monday, January 19, 2009

Searching for Similar Words. Similarity Metric

How can one find out whether two or more words are similar? I do not mean semantically similar (synonyms aren't taken into consideration), but visually similar.

Consider these two words: "sample1" and "sample2". Do they look similar? Well, at least they have the same start, "sample". The next two merely have some letters in common: "fox" and "wolf".

One method that can be used to measure word similarity is Euclidean distance, which measures the distance between two points in space. Here each word is treated as a point whose coordinates are its character codes.

The code to measure similarity of 2 strings:

public static double Euclidian(string v1Orig, string v2Orig)
{
    string v1, v2;
    if (v1Orig.Length > v2Orig.Length)
    {
        v1 = v1Orig; v2 = v2Orig;
    }
    else
    {
        v1 = v2Orig; v2 = v1Orig;
    }

    double d = 0.0;
    for (int i = 0; i < v1.Length; i++)
    {
        if ( i < v2.Length )
           d += Math.Pow((v1[i] - v2[i]), 2);
        else
           d += Math.Pow((v1[i] - 0.0), 2);
    }
    return Math.Sqrt(d);
}

Using the code above we can get numbers that measure word similarity:
"sample1" and "sample2" give 1.0, while "wolf" and "fox" give 104.10091258005379. Identical words give 0.0. Thus, the smaller the number, the more similar the two words are.

In terms of Euclidean distance, "fox" and "wolf" are farther apart than "sample1" and "sample2".
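
As a worked check of the second figure: subtracting two C# char values compares their character codes (w = 119, o = 111, l = 108, f = 102, x = 120), and the code above treats the missing fourth letter of "fox" as 0.

```latex
d(\text{wolf}, \text{fox})
  = \sqrt{(119-102)^2 + (111-111)^2 + (108-120)^2 + (102-0)^2}
  = \sqrt{289 + 0 + 144 + 10404}
  = \sqrt{10837} \approx 104.1009
```

Note that the last term, which comes purely from the difference in length, accounts for almost all of the distance.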

This measurement approach can be used when searching for word groups in the text.

Monday, November 03, 2008

Bit Flags: The Simple Way

From time to time we face the need or (for some of us) an opportunity to mess with bit fields. As we all know, bytes consist of bits, and a bit can have one of two values: "0" or "1".

Using this knowledge we can store several flags in a single byte. One byte can efficiently hold eight one-bit flags (1 byte has 8 bits). If we use an int, which has four bytes, we can store 32 flags (4*8=32).

So, let's define 8 bit flags. We should get the following picture.

00000001 - 1-st flag
00000010 - 2-nd flag
00000100 - 3-rd flag
00001000 - 4-th flag
00010000 - 5-th flag
00100000 - 6-th flag
01000000 - 7-th flag
10000000 - 8-th flag

There are two ways to define these flags in C#. Here is the first way: convert each binary number into its hexadecimal representation (decimal can also be used). The result will look like this:

[Flags]
public enum FirstApproach : byte
{
    First = 0x01,
    Second = 0x02,
    Third = 0x04,
    Fourth = 0x08,
    Fifth = 0x10,
    Sixth = 0x20,
    Seventh = 0x40,
    Eighth = 0x80, 
}

We obtained a rather compact definition. But anyone who cannot quickly convert a binary number (01000000) into hex (0x40) will have to use some special tool like calc.exe :). I consider this way a little bit tiresome.

Here is the second approach: we define a "base" first.

//bit positions, not combinable flags, so no [Flags] attribute here
public enum DefBase : byte
{
    First = 0,
    Second = 1,
    Third = 2,
    Fourth = 3,
    Fifth = 4,
    Sixth = 5,
    Seventh = 6,
    Eighth = 7,
}

Then, using the "base", we can define our flags:

[Flags]
public enum SecondApproach : byte
{
    First   = 1 << (byte)DefBase.First,
    Second  = 1 << (byte)DefBase.Second,
    Third   = 1 << (byte)DefBase.Third,
    Fourth  = 1 << (byte)DefBase.Fourth,
    Fifth   = 1 << (byte)DefBase.Fifth,
    Sixth   = 1 << (byte)DefBase.Sixth,
    Seventh = 1 << (byte)DefBase.Seventh,
    Eighth  = 1 << (byte)DefBase.Eighth,
}

The second approach does not involve conversion from binary to hex. We can also reuse DefBase multiple times to create other sets of bit flags. Defining the next bit flag is no longer a pain.

If an application has a lot of bit flag declarations, the second approach is more useful, as it saves the time otherwise spent on additional tools.
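
Once flags are defined, they are combined and tested with bitwise operators. Here is a minimal sketch; the Permissions enum and the FlagsDemo helper are hypothetical names introduced only for this example.

```csharp
using System;

// Hypothetical flags enum, defined with shifts as in the second approach
[Flags]
public enum Permissions : byte
{
    None    = 0,
    Read    = 1 << 0,
    Write   = 1 << 1,
    Execute = 1 << 2,
}

public static class FlagsDemo
{
    // Turns the given flag on
    public static Permissions Set(Permissions value, Permissions flag)
    {
        return value | flag;
    }

    // Turns the given flag off
    public static Permissions Clear(Permissions value, Permissions flag)
    {
        return value & ~flag;
    }

    // Checks whether every bit of the given flag is on
    public static bool IsSet(Permissions value, Permissions flag)
    {
        return (value & flag) == flag;
    }
}
```

For example, FlagsDemo.Set(Permissions.Read, Permissions.Write) yields a value with both bits on, and thanks to the [Flags] attribute its ToString() lists the names of the set flags rather than a bare number.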

Thursday, October 30, 2008

Handling Windows Operating System Version Mess

An operating system (OS), like any other software, should have a version. So do the new OSes from Microsoft.

Sometimes the OS version is crucial when developing installation software. Some products can work on XP and Vista but cannot work on Windows Server 2003. MSI (Microsoft Installer) has special properties called VersionNT and WindowsBuild for checking the OS version.

Quite logical, eh?

The official documentation gives a reference table with OS versions and build numbers.

Using information from the table we can define the following WiX condition:

<Condition Message='This software requires Windows Server 2003 to operate correctly'><![CDATA[VersionNT = 502]]>
</Condition>

At the bottom of this table we can see a strange thing: Vista SP1 and Windows Server 2008 have the same VersionNT and WindowsBuild numbers.

...
System                              VersionNT  WindowsBuild  ServicePackLevel
Windows Vista                       600        6000          Not applicable
Windows Vista Service Pack 1 (SP1)  600        6001          1
Windows Server 2008                 600        6001          Not applicable
...

So, at this time, to determine whether your installer runs on Windows Server 2008 you have to rely on the ServicePackLevel property. That is bad, very bad, because when Microsoft releases a service pack for Server 2008, your WiX condition will no longer tell the truth...

Nonetheless, here's how to include a Windows Server 2008 launch condition in a WiX script:

<Condition Message='This software requires Windows Server 2003 or 2008 to operate correctly'><![CDATA[Installed OR VersionNT = 502
 OR (VersionNT = 600 AND MsiNTProductType > 1)]]>
</Condition>

Update: In the launch condition above, the MsiNTProductType property is used to differentiate a server from a workstation (1 means workstation; 2 and 3 mean domain controller and server).

Friday, July 18, 2008

"Using" magic or working with type aliases

Generics in C# allow us to specify and construct rather complex types. For instance, we can create a dictionary that maps an Id number to a name:

Dictionary<int, string> idNameMapping;

We can create an even more complex data structure, where the Id number is mapped to another dictionary:

Dictionary<int, Dictionary<string, int>> complexIdMapping;

Nothing can stop us from putting other types into a dictionary (or any generic type) declaration.

You've already noticed that the more types we put into a declaration, the clumsier it becomes. I'm not even mentioning the typing effort :).
Here's how the "using" keyword can help us reduce typing and introduce some level of self-documentation into the code.

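using System.Collections.Generic; //needed so that Dictionary resolves in the alias inside the namespace
using Item = System.Collections.Generic.Dictionary<string, string>;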
namespace Sample
{
    using Records = Dictionary<int, Item>;
    public class Controller
    {
      Records recordDictionary = new Records();
    }
}

Isn't that cool? Instead of typing a lot of nested types with all those < and >, we get "normal" looking type names.


Friday, June 27, 2008

The basics of secure data exchange under TCP

Exchanging data in plain text is very convenient and easy to implement, but what can you do to prevent eavesdropping, tampering, and forgery of the messages you send back and forth? Here's where secure communication comes into play. At present, the most common way to communicate securely is to use Transport Layer Security (TLS) or Secure Sockets Layer (SSL). On the web you can see secure data exchange in action when browsing websites whose addresses start with the HTTPS prefix.

In the .NET Framework, secure communication can be done with the SslStream class. It can use both the TLS and SSL protocols.

For authentication, TLS and SSL use public key infrastructure (PKI), which requires certificates.

Here's a nice explanation of how to create a certificate using the makecert utility.

After reading and doing what was said in the above-mentioned blog post, we should end up with 2 installed certificates. They're depicted in the picture below.

The certificate we'll use is "vadmyst-enc".

SslStream gives you the look and feel of a common .NET stream.

So, what are the basic steps to start secure communication with SslStream?
Very often the communication happens between a server (e.g. a web server) and a client (e.g. a browser).
Here are the steps for the server side:

Start listening on a specific address and port

When a connection is accepted, wrap the obtained NetworkStream with SslStream

Call SslStream.AuthenticateAsServer

Start doing I/O (in our case that's a basic echo server)

In code it looks like this:

TcpListener listener = new TcpListener(ipEndpoint);
listener.Start(5);
TcpClient tcpClient = listener.AcceptTcpClient();

SslStream secureStream = new SslStream(tcpClient.GetStream(), false);
     
secureStream.AuthenticateAsServer(serverCertificate);
//use anonymous delegate for simplicity
ThreadPool.QueueUserWorkItem(new WaitCallback(delegate(object unused)
{
    //simple echo server
    byte[] tempBuffer = new byte[1024];
    int read = 0;
    try
    {
        while ((read = secureStream.Read(tempBuffer, 0, tempBuffer.Length)) > 0)
        {
            secureStream.Write(tempBuffer, 0, read);
        }
    }
    finally
    {
        secureStream.Close();
    }
}), null);

serverCertificate is obtained from certificate storage on the local machine:

X509Store store = new X509Store(StoreName.My, StoreLocation.LocalMachine);
store.Open(OpenFlags.ReadOnly);
X509Certificate serverCertificate = null;
//take the first certificate whose subject mentions "vadmyst-enc";
//serverCertificate stays null when nothing matches
foreach (X509Certificate2 candidate in store.Certificates)
{
    if (candidate.Subject.Contains("vadmyst-enc"))
    {
        serverCertificate = candidate;
        break;
    }
}
store.Close();

For simplicity's sake, in this post I will not cover the use of client certificates to perform client authentication. The client will only authenticate the server.
The steps required on the client side:

Open a TCP connection to the remote server

Wrap the obtained NetworkStream with an SslStream instance

Call SslStream.AuthenticateAsClient

Start doing I/O

The source code below demonstrates a basic TCP client that transfers data in a secure way.

TcpClient client = new TcpClient();
client.Connect(endPoint);
SslStream sslStream = new SslStream(client.GetStream(), false);
sslStream.AuthenticateAsClient("vadmyst-enc");

byte[] plaintext = new byte[5*1024 + 35];
byte[] validation = new byte[plaintext.Length];
            
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetNonZeroBytes(plaintext);

sslStream.Write(plaintext);
sslStream.Flush();

int read = 0;
int totalRead = 0;
while( (read = sslStream.Read(validation, totalRead, 
                              validation.Length - totalRead)) > 0)
{
    totalRead += read;
    if (totalRead == plaintext.Length)
        break; //we've received all sent data
}
//check received data
for(int i=0; i < plaintext.Length; i++)
{
    if ( validation[i] != plaintext[i] )
        throw new InvalidDataException("Data is not the same");
}
sslStream.Close();
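
The client above relies on SslStream's default validation of the server certificate. If you want to see or control that decision, SslStream also has a constructor that takes a RemoteCertificateValidationCallback. Below is a minimal sketch; the CertificateChecks class and its method name are made up for this example.

```csharp
using System;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper, for illustration only
public static class CertificateChecks
{
    // Has the signature of the RemoteCertificateValidationCallback delegate
    public static bool AcceptOnlyValid(object sender, X509Certificate certificate,
                                       X509Chain chain, SslPolicyErrors sslPolicyErrors)
    {
        // Accept the server only when the framework reported no policy errors
        // (e.g. no name mismatch and no problems with the certificate chain)
        return sslPolicyErrors == SslPolicyErrors.None;
    }
}

// Usage, where client is the connected TcpClient from the listing above:
// SslStream sslStream = new SslStream(client.GetStream(), false,
//     new RemoteCertificateValidationCallback(CertificateChecks.AcceptOnlyValid));
```

Returning false from the callback makes AuthenticateAsClient fail with an exception, so no data is exchanged with a server that did not pass the check.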

SslStream appeared in version 2.0 of the .NET Framework. As you can see, doing secure communication with it is very easy. However, there are a number of situations that require additional coding: client authentication using client certificates, or using other algorithms for secure I/O. I will cover these advanced topics in upcoming posts. Stay tuned!

Tuesday, June 17, 2008

Hashing in .NET (cryptography related battle tactics)

Those who think I'm going to talk about stuff related to hashish or hash browns are wrong. (By the way, I do like hash browns as well as this great Japanese liquor ;) )

I will be talking about hashing as it relates to cryptography and security. Hashing can be described as the process of getting a small digital "fingerprint" from any kind of data. Those interested in general information can get it here.

In .NET, security and cryptography related stuff is located in the System.Security.Cryptography namespace. Our hero of the day will be the SHA1 algorithm, which is implemented by the .NET class SHA1Managed. According to the .NET cryptography model, this class derives from the abstract class SHA1. The same, by the way, is true for other hash algorithms, e.g. MD5. Both SHA1 and MD5 inherit from the HashAlgorithm class. It is very likely that any new hashing algorithm added to the .NET Framework will inherit from HashAlgorithm too.

There are three ways to calculate a hash value for some data.

Use the ComputeHash method

Use TransformBlock/TransformFinalBlock directly

Use CryptoStream

I'll show how to use each of these approaches. Let's assume we have some application data:


RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
byte[] data = new byte[5 * 4096 + 320];
//fill data array with arbitrary data
rng.GetNonZeroBytes(data);
//initialize HashAlgorithm instance
SHA1Managed sha = new SHA1Managed();

The first way:


byte[] hash1 = sha.ComputeHash(data);

It is very straightforward and simple: pass the data and get the hash output. But this method is not suitable when the hash has to be calculated over several byte arrays or when the data is very large (e.g. calculating the hash value of a large binary file).
This leads us to the second way:


int offset = 0;
int blockSize = 64;
//reset algorithm internal state
sha.Initialize(); 
while (data.Length - offset >= blockSize)
{
    offset += sha.TransformBlock(data, offset, blockSize, 
                                 data, offset);
}
sha.TransformFinalBlock(data, offset, data.Length - offset);
//get calculated hash value
byte[] hash2 = sha.Hash;

This way is much more flexible: we can reuse the HashAlgorithm instance (using the Initialize method) and calculate hash values for large data objects.
However, to do that we still have to write additional code that reads chunks from a file and passes them to the TransformBlock method.

Finally, the third way:


//reuse SHA1Managed instance
sha.Initialize(); 
MemoryStream memoryStream = new MemoryStream(data, false);
CryptoStream cryptoStream = new CryptoStream(memoryStream, sha,
    CryptoStreamMode.Read);
//temporary array used by the CryptoStream to store 
//data chunk for which hash calculation was performed
byte[] temp = new byte[16];
while (cryptoStream.Read(temp, 0, temp.Length) > 0) { }
cryptoStream.Close();
byte[] hash3 = sha.Hash;

Isn't it beautiful? CryptoStream can read from any Stream object. Thus calculating the hash value of a large file isn't a problem: just pass a FileStream to the CryptoStream constructor!
Under the hood CryptoStream uses TransformBlock/TransformFinalBlock, so the third way is derived from the second one.
CryptoStream links data streams to cryptographic transformations: it can be chained with any object that implements Stream, so the streamed output of one object can be fed into the input of another.

The first approach is good when you calculate hash values only from time to time.
The second and third are best when a large part of your application's work is connected with hash calculations (like using cryptography in network I/O).
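
To convince yourself that the three ways are interchangeable, they can be put side by side and compared. The following is only a sketch: the HashDemo class and its method names are made up for this example, and it uses SHA1.Create() rather than SHA1Managed directly.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only
public static class HashDemo
{
    // First way: one call over the whole array
    public static byte[] OneShot(byte[] data)
    {
        using (SHA1 sha = SHA1.Create())
            return sha.ComputeHash(data);
    }

    // Second way: feed the data block by block
    public static byte[] Chunked(byte[] data, int blockSize)
    {
        using (SHA1 sha = SHA1.Create())
        {
            int offset = 0;
            while (data.Length - offset >= blockSize)
                offset += sha.TransformBlock(data, offset, blockSize, data, offset);
            sha.TransformFinalBlock(data, offset, data.Length - offset);
            return sha.Hash;
        }
    }

    // Third way: let CryptoStream pull the data from any readable stream
    public static byte[] Streamed(Stream input)
    {
        using (SHA1 sha = SHA1.Create())
        using (CryptoStream cryptoStream = new CryptoStream(input, sha, CryptoStreamMode.Read))
        {
            byte[] temp = new byte[4096];
            // Read to the end; the hash is finalized once the input runs out
            while (cryptoStream.Read(temp, 0, temp.Length) > 0) { }
            return sha.Hash;
        }
    }
}
```

All three methods should return the same 20 bytes for the same input. Passing a FileStream to Streamed gives the hash of a file without loading it into memory as a whole.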

Wednesday, May 28, 2008

Seeking CSS Enlightenment

Nowadays it is hard to find a software developer or web designer who doesn't know what CSS is all about.

I must say that until recently I didn't realize what potential CSS has. I found a place on the web that helps people become enlightened with the power of CSS.

Please welcome the Zen Garden.

In this CSS garden, the layout of the same markup and content is changed just by applying different styles.
