Sunday, March 15, 2009

Image Watermarking



We all know that once an image is posted on the internet, it no longer belongs to you.

That is arguable, but nevertheless any user with a browser can simply save it to disk and you can do nothing about it.

One of the ways to control image distribution is digital watermarking.

In my case I had ~100 images that had to be watermarked. There are two ways to do this: manually (e.g. using Photoshop) or by writing some code to do the job automatically.

Simple watermarks with horizontal or vertical text are easy, but things become harder when the watermark text has to be positioned diagonally.

After poking around the web I found this brilliant article. The code did its job well, so I wired it into a small console app, and voilà - 105 files watermarked in less than 20 seconds.
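
To give an idea of why the diagonal case takes a bit more work: the text has to be rotated by the angle of the image diagonal, and that angle depends on the image size. The sketch below is not the code from the article mentioned above; the class and method names are made up for illustration and it only does the geometry.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class DiagonalWatermarkSketch
{
    // Angle (in degrees) between the horizontal edge of the image
    // and its diagonal.
    public static double DiagonalAngleDegrees(int width, int height)
    {
        return Math.Atan2(height, width) * 180.0 / Math.PI;
    }

    // Length of the diagonal in pixels; an upper bound for the
    // width of the rotated watermark text.
    public static double DiagonalLength(int width, int height)
    {
        return Math.Sqrt((double)width * width + (double)height * height);
    }
}
```

With System.Drawing, an angle like this could then be handed to Graphics.RotateTransform (which takes degrees) before drawing the text.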

Monday, January 19, 2009

Searching for Similar Words. Similarity Metric



How can one find out whether two or more words are similar? I do not mean semantically similar (synonyms aren't taken into consideration), but visually similar.

Consider these two words: "sample1" and "sample2". Do they look similar? Well, at least they have the same start: "sample". The next two merely have some letters in common: "fox" and "wolf".

One of the methods that can be used to measure word similarity is Euclidean distance. It is normally used to measure the distance between two points in space; here each character code is treated as one coordinate.

The code to measure the similarity of two strings:

public static double Euclidian(string v1Orig, string v2Orig)
{
    //v1 is always the longer string, v2 the shorter one
    string v1, v2;
    if (v1Orig.Length > v2Orig.Length)
    {
        v1 = v1Orig; v2 = v2Orig;
    }
    else
    {
        v1 = v2Orig; v2 = v1Orig;
    }

    double d = 0.0;
    for (int i = 0; i < v1.Length; i++)
    {
        if (i < v2.Length)
            d += Math.Pow(v1[i] - v2[i], 2);
        else
            d += Math.Pow(v1[i], 2); //no counterpart character: compare with 0
    }
    return Math.Sqrt(d);
}

Using the code above we can get numbers that measure word similarity:
"sample1" and "sample2" give 1.0, while "wolf" and "fox" give 104.10091258005379. Identical words give 0.0. Thus the smaller the number, the more similar the two words are.

In terms of Euclidean distance, "fox" and "wolf" are farther apart than "sample1" and "sample2".

This measurement approach can be used when searching for groups of similar words in a text.
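
As a rough illustration of such a search, here is a minimal sketch that picks, from a list of candidates, the word closest to a given one. The class and method names are hypothetical, and Distance is just a compact restatement of the metric above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class SimilaritySketch
{
    // Same metric as above: character codes are treated as coordinates,
    // missing characters of the shorter word count as 0.
    public static double Distance(string a, string b)
    {
        string longer = a.Length >= b.Length ? a : b;
        string shorter = a.Length >= b.Length ? b : a;
        double sum = 0.0;
        for (int i = 0; i < longer.Length; i++)
        {
            double other = i < shorter.Length ? shorter[i] : 0.0;
            double diff = longer[i] - other;
            sum += diff * diff;
        }
        return Math.Sqrt(sum);
    }

    // Returns the candidate with the smallest distance to the given word,
    // or null when there are no candidates.
    public static string FindClosest(string word, IEnumerable<string> candidates)
    {
        string best = null;
        double bestDistance = double.MaxValue;
        foreach (string candidate in candidates)
        {
            double d = Distance(word, candidate);
            if (d < bestDistance)
            {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }
}
```

Keep in mind that this metric compares characters position by position, so a word that is merely shifted by one letter will look very different from the original.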

Monday, November 03, 2008

Bit Flags: The Simple Way


From time to time we face the need (or, for some of us, the opportunity) to mess with bit fields. As we all know, bytes consist of bits, and a bit can have one of two values: "0" or "1".

Using this knowledge we can store several flags in a single byte. One byte can efficiently store eight one-bit flags (1 byte has 8 bits). If we use int values, which have four bytes, we can store 32 bit flags (4*8=32).

So, let's define 8 bit flags. We should get the following picture.

00000001 - 1-st flag
00000010 - 2-nd flag
00000100 - 3-rd flag
00001000 - 4-th flag
00010000 - 5-th flag
00100000 - 6-th flag
01000000 - 7-th flag
10000000 - 8-th flag

There are two ways to define these flags in C#. Here is the first one: convert each binary number into its hexadecimal representation (decimal can also be used). The result looks like this:
[Flags]
public enum FirstApproach : byte
{
    First = 0x01,
    Second = 0x02,
    Third = 0x04,
    Fourth = 0x08,
    Fifth = 0x10,
    Sixth = 0x20,
    Seventh = 0x40,
    Eighth = 0x80,
}
We obtained a rather compact definition. But anyone who cannot quickly convert a binary number (01000000) into hex (0x40) will have to use a special tool like calc.exe :). I find this way a little bit tiresome.

Here is the second approach: first we define a "base" that simply numbers the bit positions.
public enum DefBase : byte
{
    First = 0,
    Second = 1,
    Third = 2,
    Fourth = 3,
    Fifth = 4,
    Sixth = 5,
    Seventh = 6,
    Eighth = 7,
}
Then, using the "base", we can define our flags:
[Flags]
public enum SecondApproach : byte
{
    First = 1 << (byte)DefBase.First,
    Second = 1 << (byte)DefBase.Second,
    Third = 1 << (byte)DefBase.Third,
    Fourth = 1 << (byte)DefBase.Fourth,
    Fifth = 1 << (byte)DefBase.Fifth,
    Sixth = 1 << (byte)DefBase.Sixth,
    Seventh = 1 << (byte)DefBase.Seventh,
    Eighth = 1 << (byte)DefBase.Eighth,
}
The second approach does not involve conversion from binary to hex. We can reuse DefBase multiple times to create whatever bit flags we need, so defining the next bit flag is no longer a pain.

If an application has a lot of bit flag declarations, the second approach is more useful, as it saves the time otherwise spent on additional tools.
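
Once flags are defined, they are combined and checked with the bitwise operators. The sketch below uses a hypothetical Permissions enum and helper names (none of them come from the code above) to show setting, clearing and testing a flag.

```csharp
using System;

// Hypothetical flags enum, declared with shifts as in the second approach.
[Flags]
public enum Permissions : byte
{
    None = 0,
    Read = 1 << 0,
    Write = 1 << 1,
    Execute = 1 << 2,
}

// Hypothetical helpers, for illustration only.
public static class FlagOps
{
    // Turns the given flag on (bitwise OR).
    public static Permissions Set(Permissions value, Permissions flag)
    {
        return value | flag;
    }

    // Turns the given flag off (bitwise AND with the inverted flag).
    public static Permissions Clear(Permissions value, Permissions flag)
    {
        return value & ~flag;
    }

    // Checks whether all bits of the given flag are set.
    public static bool IsSet(Permissions value, Permissions flag)
    {
        return (value & flag) == flag;
    }
}
```

For example, FlagOps.Set(Permissions.Read, Permissions.Execute) yields a value with bits 0 and 2 set, and FlagOps.IsSet can then be used to test each flag separately.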

Thursday, October 30, 2008

Handling Windows Operating System Version Mess



An operating system (OS), like any other software, should have a version. So do the new OSes from Microsoft.

Sometimes the OS version is crucial when developing installation software. Some products can work on XP and Vista, but cannot work on Windows Server 2003. MSI (Microsoft Installer) has special properties called VersionNT and WindowsBuild for checking the OS version.

Quite logical, eh?

The official documentation gives a reference table with OS versions and build numbers.

Using information from the table we can define the following WiX condition:

<Condition Message='This software requires Windows Server 2003 to operate correctly'><![CDATA[VersionNT = "502"]]>
</Condition>
At the bottom of this table we can see a strange thing: Vista (with SP1) and Windows Server 2008 have the same VersionNT and WindowsBuild numbers.

...
System                               VersionNT   WindowsBuild   ServicePackLevel
Windows Vista                        600         6000           Not applicable
Windows Vista Service Pack 1 (SP1)   600         6001           1
Windows Server 2008                  600         6001           Not applicable
...


So, at this time, to determine whether your installer runs on Windows Server 2008 you have to rely on the ServicePackLevel property. That is bad, very bad, because when Microsoft releases a service pack for Server 2008 your WiX condition will no longer tell the truth...

Nonetheless, here's how to include a Windows Server 2008 launch condition in a WiX script:
<Condition Message='This software requires Windows Server 2003 or 2008 to operate correctly'><![CDATA[Installed OR VersionNT = 502
OR (VersionNT = 600 AND MsiNTProductType > 1)]]>
</Condition>

Update: in the launch condition above the MsiNTProductType property is used to differentiate a server from a workstation (a value of 1 means workstation).

Friday, July 18, 2008

"Using" magic or working with type aliases



Generics in C# allow us to specify and construct rather complex types. For instance, we can create a dictionary that maps an Id number to a name:

Dictionary<int, string> idNameMapping;
We can create an even more complex data structure, where an Id number is mapped to another dictionary:
Dictionary<int, Dictionary<string, int>> complexIdMapping;
Nothing can stop us if we want to put other types into a dictionary (or any generic type) declaration.

You've probably noticed already that the more types we put into a declaration, the clumsier it becomes. I'm not even mentioning the typing effort :).
Here's how the "using" keyword can help us reduce typing and introduce some level of self-documentation into the code.
using System.Collections.Generic;
using Item = System.Collections.Generic.Dictionary<string, string>;

namespace Sample
{
    //the nested alias can refer to the outer alias Item
    using Records = Dictionary<int, Item>;

    public class Controller
    {
        Records recordDictionary = new Records();
    }
}
Isn't that cool? Instead of typing a lot of nested types with all those < and > we get "normal" looking type names.


Friday, June 27, 2008

The basics of secure data exchange under TCP


Doing data exchange in plain text is very convenient and easy to implement, but what can you do to prevent eavesdropping, tampering, and forgery of the data you send back and forth? Here's where secure communication comes into play. At present the most common secure communication method is Transport Layer Security (TLS) or Secure Sockets Layer (SSL). In the web context you can see secure data exchange in action when browsing web sites whose addresses start with the HTTPS prefix.

In the .NET Framework secure communication can be done with the SslStream class. It can use both the TLS and SSL protocols.

For authentication, TLS and SSL use public key infrastructure (PKI), which requires certificates.

Here's a nice explanation of how to create a certificate using the makecert utility.

After reading and doing what the above-mentioned blog post says, we should end up with 2 installed certificates. They're depicted in the picture below.

The certificate we'll use is "vadmyst-enc".

SslStream gives you the look and feel of a common .NET stream.

So, what are the basic steps to start secure communication with SslStream?
Very often the communication happens between a server (e.g. a web server) and a client (e.g. a browser).
Here are the steps for the server side:

  • Start listening on specific address and port

  • When connection is accepted wrap obtained NetworkStream with SslStream

  • Call SslStream::AuthenticateAsServer

  • Start doing I/O (in our case that's a basic echo server)

In code it looks like this:
TcpListener listener = new TcpListener(ipEndpoint);
listener.Start(5);
TcpClient tcpClient = listener.AcceptTcpClient();

SslStream secureStream = new SslStream(tcpClient.GetStream(), false);

secureStream.AuthenticateAsServer(serverCertificate);
//use anonymous delegate for simplicity
ThreadPool.QueueUserWorkItem(new WaitCallback(delegate(object unused)
{
//simple echo server
byte[] tempBuffer = new byte[1024];
int read = 0;
try
{
while ((read = secureStream.Read(tempBuffer, 0, tempBuffer.Length)) > 0)
{
secureStream.Write(tempBuffer, 0, read);
}
}
finally
{
secureStream.Close();
}
}), null);
serverCertificate is obtained from the certificate store on the local machine:
X509Store store = new X509Store(StoreName.My, StoreLocation.LocalMachine);
store.Open(OpenFlags.ReadOnly);
X509Certificate serverCertificate = null;
//pick the first certificate whose subject mentions "vadmyst-enc"
foreach (X509Certificate2 candidate in store.Certificates)
{
    if (candidate.Subject.Contains("vadmyst-enc"))
    {
        serverCertificate = candidate;
        break;
    }
}
store.Close();
if (serverCertificate == null)
    throw new InvalidOperationException("Certificate \"vadmyst-enc\" was not found");
For simplicity's sake, this post will not cover the usage of client certificates to perform client authentication. The client will only authenticate the server.
The steps required by the client:
  • Open TCP connection to the remote server

  • Wrap obtained NetworkStream with SslStream instance

  • Call SslStream::AuthenticateAsClient

  • Start doing I/O

The source code below demonstrates a basic TCP client that transfers data in a secure way.
TcpClient client = new TcpClient();
client.Connect(endPoint);
SslStream sslStream = new SslStream(client.GetStream(), false);
sslStream.AuthenticateAsClient("vadmyst-enc");

byte[] plaintext = new byte[5*1024 + 35];
byte[] validation = new byte[plaintext.Length];

RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetNonZeroBytes(plaintext);

sslStream.Write(plaintext);
sslStream.Flush();

int read = 0;
int totalRead = 0;
while( (read = sslStream.Read(validation, totalRead,
validation.Length - totalRead)) > 0)
{
totalRead += read;
if (totalRead == plaintext.Length)
break; //we've received all sent data
}
//check received data
for(int i=0; i < plaintext.Length; i++)
{
if ( validation[i] != plaintext[i] )
throw new InvalidDataException("Data is not the same");
}
sslStream.Close();
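
The client above relies on the default server certificate check. If you want to see (or log) why the check fails, SslStream also accepts a RemoteCertificateValidationCallback in its constructor. Below is a minimal sketch of such a callback; the class and method names are made up for illustration, and the policy shown (accept only when no errors are reported) is just one possible choice.

```csharp
using System;
using System.IO;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper, for illustration only.
public static class ServerValidationSketch
{
    // Matches the RemoteCertificateValidationCallback delegate signature.
    // Accepts the server certificate only when no policy errors were found.
    public static bool ValidateServerCertificate(object sender,
        X509Certificate certificate, X509Chain chain,
        SslPolicyErrors sslPolicyErrors)
    {
        if (sslPolicyErrors == SslPolicyErrors.None)
            return true;

        Console.WriteLine("Server certificate error: {0}", sslPolicyErrors);
        return false;
    }

    // Wraps an already connected stream (e.g. TcpClient.GetStream())
    // into an SslStream that uses the callback above.
    public static SslStream Wrap(Stream inner)
    {
        return new SslStream(inner, false,
            new RemoteCertificateValidationCallback(ValidateServerCertificate));
    }
}
```

The client code would then call ServerValidationSketch.Wrap(client.GetStream()) instead of constructing SslStream directly, and AuthenticateAsClient would fail whenever the callback returns false.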

SslStream appeared in .NET Framework 2.0. As you can see, doing secure communication with it is very easy. However, there are a number of situations that require additional coding: client authentication using client certificates, or using other algorithms for secure I/O. I will cover these advanced topics in upcoming posts. Stay tuned!

Tuesday, June 17, 2008

Hashing in .NET (cryptography related battle tactics)




Those who think I'm going to talk about stuff related to hashish or hash browns are totally wrong. (By the way, I do like hash browns as well as this great Japanese liquor ;) )

I will be talking about hashing that is related to cryptography and security. Hashing can be described as a process of getting a small digital "fingerprint" from any kind of data. Those interested in general information can get it here.

In .NET, security and cryptography related stuff is located in the System.Security.Cryptography namespace. Our hero of the day will be the SHA1 algorithm. The .NET class SHA1Managed implements it. According to the .NET cryptography model this class implements the abstract class SHA1. The same, by the way, is valid for other hash algorithms, e.g. MD5. They both inherit from the HashAlgorithm class. It is very likely that if a new hashing algorithm is added to the .NET Framework, it will inherit from HashAlgorithm too.

There are three ways to calculate a hash value for some data.

  1. Use ComputeHash method

  2. Use TransformBlock/TransformFinalBlock directly

  3. Use CryptoStream

I'll show how to use the above-mentioned approaches. Let's assume we have some application data:

RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
byte[] data = new byte[5 * 4096 + 320];
//fill data array with arbitrary data
rng.GetNonZeroBytes(data);
//initialize HashAlgorithm instance
SHA1Managed sha = new SHA1Managed();

The first way:

byte[] hash1 = sha.ComputeHash(data);

It's very straightforward and simple: pass data and get the hash output. But handing over one byte array is not suitable when the hash has to be calculated over several byte arrays or when the data is very large (e.g. the contents of a big binary file).
This leads us to the second way:

int offset = 0;
int blockSize = 64;
//reset algorithm internal state
sha.Initialize();
while (data.Length - offset >= blockSize)
{
offset += sha.TransformBlock(data, offset, blockSize,
data, offset);
}
sha.TransformFinalBlock(data, offset, data.Length - offset);
//get calculated hash value
byte[] hash2 = sha.Hash;

This way is much more flexible: we can reuse the HashAlgorithm instance (using the Initialize method) and calculate a hash value for large data objects.
However, to do that we still have to write additional code that reads chunks from a file and then passes them to the TransformBlock method.
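
That additional code could look roughly like the sketch below. It is only an illustration under a few assumptions: the class and method names are made up, SHA1.Create() is used instead of SHA1Managed in the usage note, and null is passed as the output buffer of TransformBlock, which HashAlgorithm allows.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only.
public static class ChunkedHashSketch
{
    // Reads the stream chunk by chunk and feeds every chunk
    // to the hash algorithm via TransformBlock.
    public static byte[] HashStream(HashAlgorithm algorithm, Stream input, int chunkSize)
    {
        byte[] buffer = new byte[chunkSize];
        int read;

        //reset algorithm internal state
        algorithm.Initialize();
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            //no output copy is needed, so the output buffer is null
            algorithm.TransformBlock(buffer, 0, read, null, 0);
        }
        //finish the calculation with an empty final block
        algorithm.TransformFinalBlock(buffer, 0, 0);
        return algorithm.Hash;
    }

    // Formats hash bytes as a lowercase hex string.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```

For a file, the call might be ChunkedHashSketch.HashStream(SHA1.Create(), File.OpenRead(path), 4096), where path is whatever file you want to fingerprint.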

Finally, the third way:

//reuse SHA1Managed instance
sha.Initialize();
MemoryStream memoryStream = new MemoryStream(data, false);
CryptoStream cryptoStream = new CryptoStream(memoryStream, sha,
CryptoStreamMode.Read);
//temporary array used by the CryptoStream to store
//data chunk for which hash calculation was performed
byte[] temp = new byte[16];
while (cryptoStream.Read(temp, 0, temp.Length) > 0) { }
cryptoStream.Close();
byte[] hash3 = sha.Hash;

Isn't it beautiful? CryptoStream can use any Stream object to read from. Thus calculating a hash value for a large file isn't a problem: just pass a FileStream to the CryptoStream constructor!
Under the hood CryptoStream uses TransformBlock/TransformFinalBlock, so the third way is derived from the second one.
CryptoStream links data streams to cryptographic transformations: it can be chained together with any objects that implement Stream, so the streamed output from one object can be fed into the input of another object.

The first approach is good when you're calculating hash values only from time to time.
The second and third are best when a large part of your application's operation is connected with hash calculations (like using cryptography in network I/O).

Wednesday, May 28, 2008

Seeking CSS Enlightenment

Nowadays it is hard to find a software developer or web designer who doesn't know what CSS is all about.

I must say that until recently I didn't realize what potential CSS has. I found a place on the web that helps people become Enlightened with CSS power.

Please welcome the Zen Garden.

In the above-mentioned CSS garden the layout of the same markup and content is changed purely by applying different styles.

Tuesday, May 06, 2008

Sample code for TCP server using completion ports

As promised in my previous post, I'm presenting sample code of a TCP server that receives variable-length messages in a specific format. The data transfer protocol implies that each network message consists of a prefix holding the body size, followed by the body itself.
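
To make the message format more tangible, here is how a sender might build such a message. This is a sketch under assumptions: the names are hypothetical, the prefix is 4 bytes long (matching PrefixSize in the server code), and both sides are assumed to use the same byte order, since the size is written with BitConverter.GetBytes and read back with BitConverter.ToInt32.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class FramingSketch
{
    public const int PrefixSize = 4;

    // Builds a network message: 4-byte body size followed by the body.
    public static byte[] Frame(byte[] body)
    {
        //host byte order; the receiver is assumed to use the same one
        byte[] prefix = BitConverter.GetBytes(body.Length);
        byte[] message = new byte[PrefixSize + body.Length];

        Buffer.BlockCopy(prefix, 0, message, 0, PrefixSize);
        Buffer.BlockCopy(body, 0, message, PrefixSize, body.Length);
        return message;
    }
}
```

The resulting array can be written to the socket in one go; the server then reads the first four bytes to learn how many body bytes to expect.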

First, the code defines an application state object:

/// <summary>
/// Server state holds current state of the client socket
/// </summary>
class AsyncServerState
{
public byte[] Buffer = new byte[512]; //buffer for network i/o
public int DataSize = 0; //data size to be received by the server

//flag that indicates whether prefix was received
public bool DataSizeReceived = false;

public MemoryStream Data = new MemoryStream(); //place where data is stored
public SocketAsyncEventArgs ReadEventArgs = new SocketAsyncEventArgs();
public Socket Client;
}

To preserve application state between async operations, SocketAsyncEventArgs.UserToken is used.
/// <summary>
/// Async server sample, demonstrates usage of XxxAsync methods
/// </summary>
class AsyncServer
{
Socket listeningSocket;
List<byte[]> messages = new List<byte[]>(); //received messages (element type assumed)
const int PrefixSize = 4;

SocketAsyncEventArgs acceptEvtArgs;

public AsyncServer()
{
this.listeningSocket = new Socket(AddressFamily.InterNetwork,
SocketType.Stream, ProtocolType.Tcp);
this.acceptEvtArgs = new SocketAsyncEventArgs();
}

public void Start(IPEndPoint listeningAddress)
{
acceptEvtArgs.Completed += new EventHandler<SocketAsyncEventArgs>(
Accept_Completed);

listeningSocket.Bind(listeningAddress);
listeningSocket.Listen(1);

ProcessAccept(acceptEvtArgs);
}

/// <summary>
/// Accept completion handler
/// </summary>
void Accept_Completed(object sender, SocketAsyncEventArgs e)
{
if (e.SocketError == SocketError.Success)
{
Socket client = e.AcceptSocket;
AsyncServerState state = new AsyncServerState();
state.ReadEventArgs.AcceptSocket = client;
state.ReadEventArgs.Completed += new EventHandler<SocketAsyncEventArgs>(
IO_Completed);
state.ReadEventArgs.UserToken = state;
state.Client = client;
state.ReadEventArgs.SetBuffer(state.Buffer, 0, state.Buffer.Length);

if (!client.ReceiveAsync(state.ReadEventArgs))
{ //call completed synchronously
ProcessReceive(state.ReadEventArgs);
}
}
ProcessAccept(e);
}

private void ProcessAccept(SocketAsyncEventArgs e)
{
e.AcceptSocket = null;
if (!listeningSocket.AcceptAsync(acceptEvtArgs))
{ //operation completed synchronously
Accept_Completed(null, acceptEvtArgs);
}
}

/// <summary>
/// Generic I/O completion handler
/// </summary>
void IO_Completed(object sender, SocketAsyncEventArgs e)
{
switch (e.LastOperation)
{
case SocketAsyncOperation.Receive:
ProcessReceive(e);
break;
case SocketAsyncOperation.Send:
ProcessSend(e);
break;
default:
throw new NotImplementedException("The code will "
+"handle only receive and send operations");
}
}

/// <summary>
/// In future will process server send operations
/// </summary>
private void ProcessSend(SocketAsyncEventArgs e) { }

/// <summary>
/// Implements server receive logic
/// </summary>
private void ProcessReceive(SocketAsyncEventArgs e)
{
//single message can be received using several receive operation
AsyncServerState state = e.UserToken as AsyncServerState;

if (e.BytesTransferred <= 0 || e.SocketError != SocketError.Success)
{ //connection was closed or failed, there is nothing to process
CloseConnection(e);
return;
}

int dataRead = e.BytesTransferred;
int dataOffset = 0;
int restOfData = 0;

while (dataRead > 0)
{
if (!state.DataSizeReceived)
{
//part of the prefix is already stored, append the rest of it
if (state.Data.Length > 0)
{
//never take more bytes than were actually received
restOfData = Math.Min(PrefixSize - (int)state.Data.Length, dataRead);
state.Data.Write(state.Buffer, dataOffset, restOfData);
dataRead -= restOfData;
dataOffset += restOfData;
}
else if (dataRead >= PrefixSize)
{ //store whole data size prefix
state.Data.Write(state.Buffer, dataOffset, PrefixSize);
dataRead -= PrefixSize;
dataOffset += PrefixSize;
}
else
{ // store only part of the size prefix
state.Data.Write(state.Buffer, dataOffset, dataRead);
dataOffset += dataRead;
dataRead = 0;
}

if (state.Data.Length == PrefixSize)
{ //we received data size prefix
state.DataSize = BitConverter.ToInt32(state.Data.GetBuffer(), 0);
state.DataSizeReceived = true;

state.Data.Position = 0;
state.Data.SetLength(0);
}
else
{ //we received just part of the headers information
//issue another read
if (!state.Client.ReceiveAsync(state.ReadEventArgs))
ProcessReceive(state.ReadEventArgs);
return;
}
}

//at this point we know the size of the pending data
if ((state.Data.Length + dataRead) >= state.DataSize)
{ //we have all the data for this message

restOfData = state.DataSize - (int)state.Data.Length;

state.Data.Write(state.Buffer, dataOffset, restOfData);
Console.WriteLine("Data message received. Size: {0}",
state.DataSize);

dataOffset += restOfData;
dataRead -= restOfData;

state.Data.SetLength(0);
state.Data.Position = 0;
state.DataSizeReceived = false;
state.DataSize = 0;

if (dataRead == 0)
{
if (!state.Client.ReceiveAsync(state.ReadEventArgs))
ProcessReceive(state.ReadEventArgs);
return;
}
else
continue;
}
else
{ //there is still data pending, store what we've
//received and issue another ReceiveAsync
state.Data.Write(state.Buffer, dataOffset, dataRead);

if (!state.Client.ReceiveAsync(state.ReadEventArgs))
ProcessReceive(state.ReadEventArgs);

dataRead = 0;
}
}
}

private void CloseConnection(SocketAsyncEventArgs e)
{
AsyncServerState state = e.UserToken as AsyncServerState;

try
{
state.Client.Shutdown(SocketShutdown.Send);
}
catch (Exception) { }

state.Client.Close();
}
}

The code sample above gives a basic idea of how the completion ports asynchronous pattern can be used in TCP server development.

High performance TCP server using completion ports

Completion ports were first introduced in Windows NT 4.0. This technology makes simultaneous asynchronous I/O possible and extremely effective. When building high performance network software one has to think about an effective threading model. Having too many or too few server threads in the system can result in poor server performance.


The goal of a server is to incur as few context switches as possible by having its threads avoid unnecessary blocking, while at the same time maximizing parallelism by using multiple threads. The ideal is for there to be a thread actively servicing a client request on every processor and for those threads not to block if there are additional requests waiting when they complete a request. For this to work correctly however, there must be a way for the application to activate another thread when one processing a client request blocks on I/O (like when it reads from a file as part of the processing). (read more...)

A smart reader may point out that I/O completion ports (IOCP) are not directly available in .NET. Well, that was true until SP1 of .NET 2.0. Starting with .NET 2.0 SP1 this marvelous technology can be accessed using the following Socket class methods:
  • AcceptAsync
  • ConnectAsync
  • DisconnectAsync
  • ReceiveAsync
  • SendAsync
  • and other XxxAsync methods
In one of my previous posts about receiving variable length messages I used the BeginXXX/EndXXX asynchronous approach. The main drawback of this approach is the repeated allocation and synchronization of objects during high-volume asynchronous socket I/O. That is because the BeginXXX/EndXXX design pattern currently implemented by the System.Net.Sockets.Socket class requires a System.IAsyncResult object to be allocated for each asynchronous socket operation.

The completion port approach, on the other hand, avoids the above-mentioned problems altogether. Asynchronous operations are described by instances of SocketAsyncEventArgs. These objects can be reused by the application; moreover, the application can create as many SocketAsyncEventArgs objects as it needs to perform well under sustained load.

The pattern for performing an asynchronous socket operation with this class consists of the following steps:
  1. Allocate a new SocketAsyncEventArgs context object, or get a free one from an application defined pool
  2. Set properties on the context object to the operation about to be performed (the completion callback method, the data buffer, the offset into the buffer, and the maximum amount of data to transfer, for example).
  3. Call the appropriate socket method (XxxAsync) to initiate the asynchronous operation
  4. If the asynchronous socket method (XxxAsync) returns true, in the callback, query the context properties for completion status
  5. If the asynchronous socket method (XxxAsync) returns false, the operation completed synchronously. The context properties may be queried for the operation result
  6. Reuse the context for another operation, put it back in the pool, or discard it
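
Steps 1 and 6 mention an application defined pool. A minimal sketch of such a pool could look like this; the class and its member names are hypothetical, and a real server would probably also pre-allocate contexts and buffers up front.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

// Hypothetical pool of reusable SocketAsyncEventArgs objects.
public sealed class SocketAsyncEventArgsPool
{
    private readonly Stack<SocketAsyncEventArgs> pool = new Stack<SocketAsyncEventArgs>();
    private readonly object sync = new object();

    // Number of free context objects currently stored in the pool.
    public int Count
    {
        get { lock (sync) { return pool.Count; } }
    }

    // Step 1: get a free context object or allocate a new one.
    public SocketAsyncEventArgs Take()
    {
        lock (sync)
        {
            if (pool.Count > 0)
                return pool.Pop();
        }
        return new SocketAsyncEventArgs();
    }

    // Step 6: put the context back so that it can be reused.
    public void Return(SocketAsyncEventArgs item)
    {
        if (item == null)
            throw new ArgumentNullException("item");

        //drop references to the previous operation's state
        item.AcceptSocket = null;
        item.UserToken = null;
        lock (sync)
        {
            pool.Push(item);
        }
    }
}
```

The lock keeps the pool safe to use from completion callbacks, which can run on different threads.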
In How to Transfer Fixed Sized Data With Async Sockets I presented server code that uses the BeginXXX/EndXXX model for handling data receive. In the next post I'll show how that code can be rewritten to use the IOCP server model.
Sample code of IOCP based TCP server