
Friday, February 03, 2012

Fast Conversion of Hex String Into Decimal Number

Today's post is about performance; more specifically, about converting a hex string into a decimal number faster than the built-in .NET Framework methods do.

I will compare the performance of three methods used to convert a hex string into a decimal number. Two of them are built into the .NET Framework.

1) Convert.ToInt32(hexNumber, 16)
2) int.Parse(hexNumber, NumberStyles.HexNumber)
3) A custom method using a pre-populated table; let us call it TableConvert.

Here's the code for the TableConvert class. It just illustrates the idea behind a pre-populated table - it is not production code.

class TableConvert
{
    // Maps the ASCII codes of '0'-'9', 'A'-'F' and 'a'-'f' to their hex digit
    // values; -1 marks characters that are not valid hex digits.
    static sbyte[] unhex_table =
    { -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
     , 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,-1,-1,-1,-1,-1,-1
     ,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
    };

    public static int Convert(string hexNumber)
    {
        // Start with the value of the first hex digit ...
        int decValue = unhex_table[(byte)hexNumber[0]];
        for (int i = 1; i < hexNumber.Length; i++)
        {
            // ... then, for each following digit, shift the accumulated value
            // by one hex position and add the digit's value.
            decValue *= 16;
            decValue += unhex_table[(byte)hexNumber[i]];
        }
        return decValue;
    }
}

The approach uses a simple technique: a pre-populated table of hex-to-decimal values and some basic math. To measure the performance of these three methods I wrote a small test application that performed the conversion in a loop and measured the elapsed time.

Tests were run on an Intel Core i7 2.80 GHz with .NET Framework 4.0. Time was measured in ticks using the Stopwatch class.
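For reference, here is a minimal sketch of the kind of benchmark loop used (the exact test harness is not shown in this post, so treat it as an illustration; it assumes the TableConvert class from above is in scope):

using System;
using System.Diagnostics;
using System.Globalization;

class HexBenchmark
{
    static void Main()
    {
        const string hex = "FF01";
        const int iterations = 1000000;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            TableConvert.Convert(hex);
        sw.Stop();
        Console.WriteLine("TableConvert:    {0} ticks", sw.ElapsedTicks);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            System.Convert.ToInt32(hex, 16);
        sw.Stop();
        Console.WriteLine("Convert.ToInt32: {0} ticks", sw.ElapsedTicks);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            int.Parse(hex, NumberStyles.HexNumber);
        sw.Stop();
        Console.WriteLine("int.Parse:       {0} ticks", sw.ElapsedTicks);
    }
}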

Results:
Hex string: FF01

Iterations    TableConvert    Convert Method    Parse Method
100           28              35                63
10000         2134            2593              5915
1000000       191935          252252            433490

Hex string: 4AB201
Iterations    TableConvert    Convert Method    Parse Method
100           26              27                47
10000         1930            2775              4284
1000000       192801          269016            481308

Conclusions
Not surprisingly, the Parse method has the worst performance of the three, while the TableConvert method is the fastest - usually about 1.2-1.4 times faster (20%-40%) than the Convert method.

If you ever need to perform a lot of hex-string-to-decimal conversions and want them to be as fast as possible, you can use the TableConvert approach.

Friday, October 14, 2011

Determining .NET Assembly Target Platform



A .NET assembly can be built with different platform targets: Any CPU, x86 or x64.
But if all you have is an already built binary file, how do you determine the target platform of the exe or dll?

In .NET, information about assemblies and types is stored as metadata. For instance, assembly-level metadata is contained in the assembly manifest.

The assembly manifest can be viewed using special tools, e.g. the IL Disassembler (ildasm.exe).

Another tool we can use to determine the CPU type of an assembly is the CorFlags Conversion Tool (CorFlags.exe). I will demonstrate how this tool works and what output it generates.

Let us assume we have 3 assemblies: CpuTypeSample_x86.exe, CpuTypeSample_x64.exe and CpuTypeSample_any.exe, with x86, x64 and "Any CPU" platform targets respectively.

In the command prompt we will execute the following commands.
The command "corflags CpuTypeSample_x86.exe" produces this output:
Microsoft (R) .NET Framework CorFlags Conversion Tool. Version 3.5.21022.8
Copyright (c) Microsoft Corporation. All rights reserved.

Version : v4.0.30319
CLR Header: 2.5
PE : PE32
CorFlags : 3
ILONLY : 1
32BIT : 1
Signed : 0


The output above shows that CpuTypeSample_x86.exe has the PE32 PE type and the 32BIT flag set to 1. If you observe such output for other assemblies, it means they were built for the x86 platform.

Here is the output for the x64 assembly:
Command: "corflags CpuTypeSample_x64.exe"
Microsoft (R) .NET Framework CorFlags Conversion Tool. Version 3.5.21022.8
Copyright (c) Microsoft Corporation. All rights reserved.

Version : v4.0.30319
CLR Header: 2.5
PE : PE32+
CorFlags : 1
ILONLY : 1
32BIT : 0
Signed : 0


In contrast to x86, it has the PE32+ PE type and the 32BIT flag turned off.

Finally, here is the output for the "Any CPU" assembly:
Command: "corflags CpuTypeSample_any.exe"
Microsoft (R) .NET Framework CorFlags Conversion Tool. Version 3.5.21022.8
Copyright (c) Microsoft Corporation. All rights reserved.

Version : v4.0.30319
CLR Header: 2.5
PE : PE32
CorFlags : 1
ILONLY : 1
32BIT : 0
Signed : 0


The PE type of this file is the same as for the x86 assembly, but the 32BIT flag is turned off.

Summary table of the above output:

Platform    PE Type    32BIT
x86         PE32       1
x64         PE32+      0
Any CPU     PE32       0
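If the check has to be done from code rather than from the command line, the reflection API offers Module.GetPEKind as an alternative. A minimal sketch (the assembly file name is just a placeholder):

using System;
using System.Reflection;

class PeKindInspector
{
    static void Main()
    {
        // Load the assembly for inspection only; the path is a placeholder.
        Assembly asm = Assembly.ReflectionOnlyLoadFrom("CpuTypeSample_any.exe");

        PortableExecutableKinds peKind;
        ImageFileMachine machine;
        asm.ManifestModule.GetPEKind(out peKind, out machine);

        // Required32Bit roughly corresponds to 32BIT = 1,
        // and PE32Plus to the PE32+ PE type.
        Console.WriteLine("PE kind: {0}, machine: {1}", peKind, machine);
    }
}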

Sunday, July 18, 2010

Fastest Way To Retrieve Custom Attributes for a Type Member

In my previous posts (Performance Issues When Comparing Strings in .NET and When string.ToLower() is Evil) string related operations were discussed.

In this post we'll examine performance issues when querying a type member for custom attributes.
Let us define two attributes and a class. The class will have its single method decorated with an attribute. Here's the code:

class FooAttribute : Attribute
{ }

class BarAttribute : FooAttribute
{ }

class Item
{
    [Bar]
    public int Action()
    {
        return 0;
    }
}
Now the question is: what is the fastest way to check the Action method for the Bar custom attribute? Custom attributes can be queried through any type that implements the ICustomAttributeProvider interface, or through the static methods of the Attribute class. In our case we shall use the static Attribute methods and a MethodInfo instance.

The code below queries custom attributes using the static Attribute methods and then using the MethodInfo instance. Each query operation executes 10000 times and its duration is measured using the Stopwatch class. The code also measures the time required to check whether the attribute is applied at all.

int count = 10000;
Type tBar = typeof(Item);
MethodInfo mInfo = tBar.GetMethod("Action");
//warm up
mInfo.IsDefined(typeof(FooAttribute), true);
object[] attribs = null;
Stopwatch sw = new Stopwatch();

sw.Start();
for (int i = 0; i < count; i++)
{
 attribs = Attribute.GetCustomAttributes(mInfo, typeof(FooAttribute), true);
}
sw.Stop();

Console.WriteLine("Attribute(specific): {0}, Found: {1}", sw.ElapsedMilliseconds, 
 attribs.Length);
sw.Reset();

sw.Start();
for (int i = 0; i < count; i++)
{
 attribs = mInfo.GetCustomAttributes(typeof(FooAttribute), true);
}
sw.Stop();
Console.WriteLine("MethodInfo: {0}, Found: {1}", sw.ElapsedMilliseconds, 
 attribs.Length);
sw.Reset();

sw.Start();
for (int i = 0; i < count; i++)
{
 attribs = Attribute.GetCustomAttributes(mInfo, true); //no attribute type filter
}
sw.Stop();

Console.WriteLine("Attribute(general): {0}, Found: {1}", sw.ElapsedMilliseconds, 
 attribs.Length);
sw.Reset();
   
sw.Start();
for (int i = 0; i < count; i++)
{
 Attribute.IsDefined(mInfo, typeof(FooAttribute), true);
}
sw.Stop();

Console.WriteLine("Attribute::IdDefined: {0}", sw.ElapsedMilliseconds);
sw.Reset();

sw.Start();
for (int i = 0; i < count; i++)
{
 mInfo.IsDefined(typeof(FooAttribute), true);
}
sw.Stop();

Console.WriteLine("MethodInfo::IdDefined: {0}", sw.ElapsedMilliseconds);
sw.Reset();
The code above produces the following output:
Attribute(specific): 137, Found: 1
MethodInfo: 130, Found: 1
Attribute(general): 569, Found: 1
Attribute::IsDefined: 40
MethodInfo::IsDefined: 33
The results indicate that the fastest method is querying custom attributes via the MethodInfo instance. To generalize: the fastest way to get custom attributes is to use the closest reflection equivalent of the type member (e.g. method - MethodInfo, property - PropertyInfo, etc.).

The last two results show the time of the IsDefined operation. Use it when you only need to check whether an attribute is applied to a type member, without retrieving its instances.

Thursday, July 08, 2010

Type inference in generic methods

Did you know that in .NET generic methods support type inference? It is also known as implicit typing.

Let's see how type inference looks in code. The sample below shows a class with generic methods:

class NonGenericType
    {
        public static int GenericMethod1<TValue>(TValue p1, int p2)
        {
            Console.WriteLine(p1.GetType().Name);
            return default(int);
        }

        public static TValue GenericMethod2<TValue>(int p1, TValue p2)
        {
            Console.WriteLine(p2.GetType().Name);
            return default(TValue);
        }

        public static TReturn GenericMethod3<TValue, TReturn>(int p1, TValue p2)
        {
            Console.WriteLine(p2.GetType().Name);
            Console.WriteLine(typeof(TReturn).Name);
            return default(TReturn);
        }
    }
Here's the traditional way of using the above defined methods:
NonGenericType.GenericMethod1<string>("test", 5);
NonGenericType.GenericMethod2<double>(1, 0.5);
Nothing fancy here: we explicitly specify the type to substitute for the TValue type parameter.
Type inference gives us the possibility to omit the type parameters altogether. Instead we just call the methods as if they were non-generic:
NonGenericType.GenericMethod1("test", 5);
NonGenericType.GenericMethod2(1, 0.5);
Type inference can come in handy as it reduces typing, but in my opinion it makes code less readable.
Also, type inference cannot "guess" the return type of the method:
NonGenericType.GenericMethod3(1, 0.5);
 //error CS0411: The type arguments for method 'TypeInference.NonGenericType.GenericMethod3<TValue,TReturn>(int, TValue)' cannot be inferred from the usage. Try specifying the type arguments explicitly.
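To make the last call compile, the type arguments have to be supplied explicitly, for example (the string return type is chosen just for illustration):
// Both type parameters are specified, so nothing has to be inferred.
string result = NonGenericType.GenericMethod3<double, string>(1, 0.5);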
A nice explanation of why inference does not work in the scenario above was given by Eric Lippert.
Happy coding :)

Friday, June 18, 2010

Thread Safe Collection Iteration Techniques

In a multithreaded environment every operation should be tested and analyzed from the viewpoint of thread safety. That is, for every data structure, check what will happen if it is accessed or changed from multiple threads.

Imagine we need to iterate over a collection of items and perform some action on each item. Since we're talking about threading, the iteration should be done in a thread-safe way: while we are iterating over the collection, no other thread is allowed to add or remove items from it.
No problemo, you may think - just do the iteration under a lock.

But it is not that simple.

The code sample below illustrates two approaches to the iteration. Both have pros and cons; more on that after the code sample.

int initialItems = 5;
ICollection<string> coll = new List<string>(initialItems);

for(int i = 1; i <= initialItems; i++)
 coll.Add("item" + i.ToString());
   
//#1 iterating with lock approach
lock(coll)
{
 foreach(string item in coll)
 {
  PerformWorkWithItem(item);    
 }
}
//

//#2 iteration over a copy 
ICollection<string> copyColl = null;
lock(coll)
{
 copyColl = new List<string>(coll);
}

foreach(string item in copyColl)
{
 PerformWorkWithItem(item);    
}
//    

void PerformWorkWithItem(string item)
{
 //
 // perform operations that can take some 
 // considereable amount of time     
 //
}

Welcome back.

Approach #1 uses a global lock for the iteration. That means the collection is protected by the lock for the whole time we iterate over it.
The pros are:
  • simplicity (just put the lock and do the job)
  • memory efficiency - no new objects are constructed
The cons are:
  • if PerformWorkWithItem takes a long time to complete or blocks (e.g. reading data from the network), access to the collection is blocked for a considerable amount of time
  • the action on each collection item also runs under the lock

Approach #2 uses a different technique. It locks the collection only to make a copy (snapshot) of it. The iteration and the PerformWorkWithItem action run over the snapshot and are not protected by the lock.
The pros are:
  • operations on collection items are done without locking the collection; if PerformWorkWithItem takes a long time to complete, the original collection is not locked as in #1
  • actions on collection items can be scheduled on a separate thread
The cons are:
  • if the original collection is large, copying it can become inefficient
  • added complexity: while actions are performed on the snapshot, items of the original collection may already have changed

Now that we know the pros and cons of these two approaches, we can deduce some hints that help choose the appropriate technique.

For instance, if the PerformWorkWithItem action is relatively fast and there is no problem with the rest of the application waiting for the iteration to finish, then approach #1 is the best.

On the other hand, if PerformWorkWithItem can take a considerable amount of time and other parts of the application frequently access the collection (i.e. it is not desirable to block access to it for long), then #2 will do.

P.S. There is also an approach #3 that uses lock-free data structures, but that is a whole new story and a topic for a separate post.

Tuesday, June 15, 2010

AesManaged class Key and KeySize properties issue

Today, when working with the AesManaged class, I encountered very strange behavior.
If you have code like this, you're in trouble:

AesManaged aes = new AesManaged();
aes.Key = key;
aes.KeySize = key.Length * 8; //the problem (note: KeySize is expressed in bits)
The problem with this code is setting KeySize after setting the Key value.
When you set KeySize after Key, the previously specified key is discarded and a brand new key is generated and placed into the Key property.

I find this behavior rather strange, especially since the documentation says nothing about what happens when you set KeySize.

I would expect that, when a Key value is already set, setting KeySize would throw an exception if the new size does not match the length of the existing key.
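A minimal sketch of the workaround: set KeySize before Key, or skip setting KeySize entirely, since assigning Key adjusts KeySize to match. The key bytes below are just a placeholder.

using System.Security.Cryptography;

class AesKeyOrder
{
    static void Main()
    {
        byte[] key = new byte[32]; // placeholder 256-bit key; use a real key in practice

        AesManaged aes = new AesManaged();
        aes.KeySize = key.Length * 8; // set the size (in bits) before the key...
        aes.Key = key;                // ...so the key we assign is the one that stays
    }
}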

Friday, April 16, 2010

Refactoring code with lambda expressions

Without much ado, let's go straight to the code that needs to be refactored:

bool SomeMethod(long param)
{
   //
   // some prefix code
   //
   try
   {
      //do specific job here
      return DoSpecificJob(param);
   }
   finally
   {
      //
      // some suffix code
      //
   }
}

Result SomeOtherMethod(string name, int count)
{
   //
   // some prefix code
   //
   try
   {
      return DoOtherSpecificJob(name, count);
   }
   finally
   {
      //
      // some suffix code
      //
   }
}
The question is how we can bring the prefix and suffix code from the example above into one place (a method) without changing the logic.
The goal is to have these two methods rewritten like this:
bool SomeMethod(long param)
{
   return DoSpecificJob(param);
}

Result SomeOtherMethod(string name, int count)
{
   return DoOtherSpecificJob(name, count);
}
Another method will be created that executes the prefix and suffix code.

There are several ways to do that:
1. Create a method that contains the prefix and suffix code and accepts a Delegate object and a params object[] array
object ExecuteCommon(Delegate d, params object[] args)
{
   //
   // prefix code
   //
   try { return d.DynamicInvoke(args); }
   finally
   {
      //
      // suffix code
      //
   }
}

bool SomeMethod_First(long param)
{
   Delegate d = new Func<long, bool>((b) => SomeMethod(b));
   return (bool)ExecuteCommon(d, new object[] {param});
}

Result SomeOtherMethod_First(string name, int count)
{
   Delegate d = new Func<string, int, Result>((n, c) => SomeOtherMethod(n, c));
   return (Result)ExecuteCommon(d, new object[] { name, count });
}
The approach looks nice but has several caveats. The problems here are boxing (wrapping value types into reference types) and casting.
2. Move the repeated code up the call stack
This approach is possible if SomeMethod and SomeOtherMethod are at the same call stack level or are called from the same method.

3. Create generic methods that accept a generic delegate and define the required parameters
TResult ExecuteCommon<T1,TResult>(Func<T1, TResult> func, T1 param1)
{
   //
   // some prefix code
   //
   try { return func(param1); }
   finally
   {
      //
      // some suffix code
      //
   }
}

TResult ExecuteCommon<T1, T2, TResult>(Func<T1, T2, TResult> func, T1 param1, T2 param2)
{
   //
   // some prefix code
   //
   try { return func(param1, param2); }
   finally
   {
      //
      // some suffix code
      //
   }
}

bool SomeMethod_Third(long param)
{
   return ExecuteCommon<long, bool>((p) => SomeMethod(p), param);
}

Result SomeOtherMethod_Third(string name, int count)
{
   return ExecuteCommon<string, int, Result>(
      (n, c) => SomeOtherMethod(n, c), name, count);
}
This approach does not use casting, and there is no boxing when value type parameters are passed. However, the caveat is that you need to define multiple overloads with a varying number of type parameters. In my opinion the third approach is the best, even though we have to define two methods that execute the prefix/suffix code.

P.S. Another way of refactoring here is code generation: emitting code on the fly or using template tools like T4 templates in Visual Studio.


Sunday, December 13, 2009

MSSQL DATEDIFF Equivalent in MySQL

Recently I was porting T-SQL (MSSQL) code into the SQL dialect used by MySQL.
The process went smoothly until I got stuck with dates, especially intervals between two dates. In T-SQL the datediff function is used to get the interval between two datetime values.

Let us consider this T-SQL sample:

declare @d1 datetime;
declare @d2 datetime;
set @d1 = '2009-01-18 15:22:01'
set @d2 = '2009-01-19 14:22:01'
select datediff(hour, @d1, @d2) as hour, 
       datediff(day, @d1, @d2) as day,
       datediff(second, @d1, @d2) as second
Query results are:

hour    day    second
23      1      82800

After doing some searching I found out that MySQL equivalent is:
set @d1 = '2009-01-18 15:22:01';
set @d2 = '2009-01-19 14:22:01';
select timestampdiff(hour, @d1, @d2) as hour,
       timestampdiff(day, @d1, @d2) as day,
       timestampdiff(second, @d1, @d2) as second;
Query results are:

hour    day    second
23      0      82800
The query results are nearly the same except for the day difference. MSSQL's datediff counts the number of day boundaries crossed, so the 23-hour interval that spans midnight counts as one day.

In general we can think of timestampdiff (MySQL) as a near 1-to-1 equivalent of datediff (MSSQL). To make the results truly equal it is better to get the difference in seconds and then calculate the required interval (hours, days) from it.

Wednesday, November 18, 2009

Local Computer Connection Failure When Using ActiveSync

Not so long ago I encountered a strange problem with Windows Mobile device connectivity when using ActiveSync.

Here is the long story cut short, along with the solution I came up with.

The server software was located on Host1. The Windows Mobile device was supposed to connect to it.

When the device was cradled and connected to Host1's ActiveSync, it was unable to open a connection to the server software. However, it could connect perfectly well to the same server software located on Host2. Strange?

It turned out that for local connections (connections to the host where ActiveSync runs) ActiveSync substitutes the remote address the device wants to connect to with 127.0.0.1, the loopback address.
The server software was listening on a particular IP address (at that time this was by design).
I discovered this by using the netstat command while connecting with the Windows Mobile browser to the local IIS web server.
The command I used was: netstat -anbp tcp

The obvious fix: listen on 0.0.0.0 (in .NET it is IPAddress.Any).
Any time a connection from a cradled device to the local computer fails, check whether the software you are connecting to listens on all IP addresses or is at least configured to listen on the loopback address (127.0.0.1) too.
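For illustration, a minimal sketch of a listener bound to all local addresses (the port number is just a placeholder):

using System.Net;
using System.Net.Sockets;

class AnyAddressListener
{
    static void Main()
    {
        // Binding to IPAddress.Any (0.0.0.0) also covers 127.0.0.1,
        // so connections redirected by ActiveSync to loopback still arrive.
        TcpListener listener = new TcpListener(IPAddress.Any, 9000);
        listener.Start();
        // ... accept clients here ...
        listener.Stop();
    }
}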

Thursday, November 12, 2009

Performance Issues When Comparing .NET Strings

Every time you want to use string.Compare("str1", "str2", true) for case-insensitive string comparison, think twice.

To illustrate my point, here is an example:


int iters = 100000;
string cmp1 = "SomeString";
string cmp2 = "Someotherstring";

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < iters; i++)
{
    int res = string.Compare(cmp1, cmp2, true);
}
sw.Stop();

Console.WriteLine("First:" + sw.ElapsedMilliseconds);
            
sw = Stopwatch.StartNew();

for (int i = 0; i < iters; i++)
{
    int res = string.Compare(cmp1, cmp2, StringComparison.OrdinalIgnoreCase);
}
sw.Stop();

Console.WriteLine("Second:" + sw.ElapsedMilliseconds);

Quick question: which method is faster, the first or the second?

...
...

Here is my result in milliseconds:
First:77
Second:26


Wow, the second sample is nearly 3 times faster!
This is because the first method uses culture-specific information to perform the comparison, while the second uses an ordinal comparison (it compares the numeric values of the string's characters).

Knowing the above, we can deduce a general rule of thumb: when culture-specific string comparison is not required, use the ordinal comparison; otherwise, use the culture-aware one.

Thursday, September 24, 2009

Howto: C++ Class Conversion Operator in .CPP file

In case someone did not know how to do this: it took me some time to figure out the right syntax for writing a conversion operator implementation in the .cpp file. Here is the definition of the conversion operator.

In a header (.h) file we have TestCase class declared.

class TestCase
{
public:
    operator std::string ();
};
In the .cpp file the definition must be qualified with the class name, i.e. written in the form TestCase::operator std::string().
The definition of the "to std::string" conversion operator will look like this:
TestCase::operator std::string()
{
   std::string msg("TestCase internals");
   return msg;
}
Now we can use this operator in the code
TestCase testClass;
std::string msg = testClass;
The msg variable will be equal to the "TestCase internals" string.

Tuesday, June 23, 2009

Complex Keys In Generic Dictionary

Let us start with a quiz about the generic dictionary.

Dictionary<string, string> simpleDict = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
simpleDict["name1"] = "value";
simpleDict["Name1"] = "value2";
What value will simpleDict["name1"] return? Since the dictionary uses a case-insensitive comparer, "name1" and "Name1" are the same key, so the second assignment overwrites the first and the indexer returns "value2".

Now let's get back to using complex keys in a generic dictionary.

The .NET Framework provides the IEqualityComparer<T> interface, which a dictionary can use to compare keys.

Imagine we have a complex class that we want to use as a key in our dictionary.
public class ComplexKey
{
    public int Part1 { get; set; }
    public string Part2 { get; set; }
}
The implementation of the comparer will be the following:
public class ComplexKeyComparer : IEqualityComparer<ComplexKey>
{
    public bool Equals(ComplexKey x, ComplexKey y)
    {
        return x.Part1.Equals(y.Part1) && x.Part2.Equals(y.Part2);
    }

    public int GetHashCode(ComplexKey obj)
    {
        return obj.Part1.GetHashCode() ^ obj.Part2.GetHashCode();
    }
}
Having created the comparer, we can now instantiate the dictionary and work with complex keys in the same way as with simple ones.
Dictionary<ComplexKey, string> complexDict =
    new Dictionary<ComplexKey, string>(new ComplexKeyComparer());

ComplexKey ck1 = new ComplexKey() { Part1 = 1, Part2 = "name1" };
ComplexKey ck2 = new ComplexKey() { Part1 = 1, Part2 = "name2" };

complexDict[ck1] = "value1";
complexDict[ck2] = "value2";

Very convenient by the way :)

Thursday, May 07, 2009

Check If Local Port Is Available For TCP Socket

From time to time we need to check whether a specified port is occupied. It can be some sort of setup action where we install a server product and want to ensure that a TCP listener will start without any problems.

How to check if a port is busy? Start listening on it.

Version #1

bool IsBusy(int port)
{
    Socket socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream,
                               ProtocolType.Tcp);
    try
    {
        socket.Bind(new IPEndPoint(IPAddress.Any, port));
        socket.Listen(5);
        return false;
    }
    catch { return true; }
    finally { if (socket != null) socket.Close(); }
}
If another process is already listening on a specific address, our code can still bind successfully and will return false. This makes us think the port is free while it is not. Remember, we are checking port availability: we need exclusive access to the port.

Version #2
bool IsBusy(int port)
{
    Socket socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream,
                               ProtocolType.Tcp);
    try
    {
        socket.SetSocketOption(SocketOptionLevel.Socket,
                               SocketOptionName.ExclusiveAddressUse, true);
        socket.Bind(new IPEndPoint(IPAddress.Any, port));
        socket.Listen(5);
        return false;
    }
    catch { return true; }
    finally { if (socket != null) socket.Close(); }
}
This version of the code is much better. It tries to bind the endpoint with exclusive access. If some other process is listening on the port, or an established connection is bound to it, an exception will be thrown.

And, of course, there is yet another way to perform the check: using classes from the System.Net.NetworkInformation namespace.
bool IsBusy(int port)
{
    IPGlobalProperties ipGlobalProperties = IPGlobalProperties.GetIPGlobalProperties();
    IPEndPoint[] endpoints = ipGlobalProperties.GetActiveTcpListeners();
    if (endpoints == null || endpoints.Length == 0) return false;
    for (int i = 0; i < endpoints.Length; i++)
        if (endpoints[i].Port == port)
            return true;
    return false;
}

Thursday, March 26, 2009

Windows Vista Defragmentation Tools

Windows Vista uses the NTFS file system by default. Sooner or later the files on it will start to fragment.

Fragmentation can lead to a significant decrease in disk I/O performance. The common way to handle this problem is a process called defragmentation.

It turns out that Vista's defragmentation tool is a little bit oversimplified.


As you can see, the user interface lacks volume fragmentation information, so it is hard to say whether my volume needs defragmentation.

Fear not, as Vista has a command-line tool - defrag.exe.

Windows Disk Defragmenter
Copyright (c) 2006 Microsoft Corp.
Description: Locates and consolidates fragmented files on local volumes to
improve system performance.

Syntax: defrag <volume> -a [-v]
        defrag <volume> [{-r | -w}] [-f] [-v]
        defrag -c [{-r | -w}] [-f] [-v]

Parameters:

    Value       Description

    <volume>    Specifies the drive letter or mount point path of the volume to
                be defragmented or analyzed.

    -c          Defragments all volumes on this computer.

    -a          Performs fragmentation analysis only.

    -r          Performs partial defragmentation (default). Attempts to
                consolidate only fragments smaller than 64 megabytes (MB).

    -w          Performs full defragmentation. Attempts to consolidate all file
                fragments, regardless of their size.

    -f          Forces defragmentation of the volume when free space is low.

    -v          Specifies verbose mode. The defragmentation and analysis output
                is more detailed.

    -?          Displays this help information.

Examples:

defrag d:
defrag d:\vol\mountpoint -w -f
defrag d: -a -v
defrag -c -v

Here's a sample output for my system volume (C:):
defrag c: -a
Windows Disk Defragmenter
Copyright (c) 2006 Microsoft Corp.

Analysis report for volume C: VISTA

Volume size = 70.00 GB
Free space = 21.84 GB
Largest free space extent = 6.20 GB
Percent file fragmentation = 4 %

Note: On NTFS volumes, file fragments larger than 64MB are not included in the fragmentation statistics

You do not need to defragment this volume.

It is also possible to increase the amount of information returned by the tool - just specify the -v switch.

Now we have some information about disk fragmentation and can decide when to start the defragmentation process.

P.S. I had a strange feeling when writing this post. In an operating system like Windows Vista, with its redesigned user interface, it is awkward to use command-line tools to perform a common task like disk defragmentation.

Sunday, March 15, 2009

Image Watermarking



We all know that once an image is posted on the internet it no longer belongs to you.

That may be arguable but, nevertheless, any user with a browser can simply save it to disk and you can do nothing about it.

One of the ways to control image distribution is digital watermarking.

In my case I had ~100 images that had to be watermarked. There are two ways to do this: manually (e.g. using Photoshop) or by writing some code to do the job automatically.

In the case of simple watermarks (horizontal or vertical text) everything is easy, but things become harder when the watermark text should be positioned diagonally.

After poking around the web I found this brilliant article. The code did its job well, so I wired it into a small console app, and voilà - 105 files watermarked in less than 20 seconds.
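The snippet below is not the code from that article, just a minimal GDI+ sketch of the idea - drawing rotated, semi-transparent text across an image (the file names and the watermark string are placeholders):

using System.Drawing;
using System.Drawing.Imaging;

class Watermarker
{
    static void Main()
    {
        using (Image image = Image.FromFile("input.jpg"))   // placeholder path
        using (Graphics g = Graphics.FromImage(image))
        using (Font font = new Font("Arial", 36, FontStyle.Bold))
        using (Brush brush = new SolidBrush(Color.FromArgb(80, Color.White)))
        {
            // Rotate the coordinate system around the image center,
            // then draw the text so it runs diagonally.
            g.TranslateTransform(image.Width / 2f, image.Height / 2f);
            g.RotateTransform(-45f);

            SizeF textSize = g.MeasureString("SAMPLE", font);
            g.DrawString("SAMPLE", font, brush,
                         -textSize.Width / 2f, -textSize.Height / 2f);

            image.Save("output.jpg", ImageFormat.Jpeg);
        }
    }
}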

Monday, January 19, 2009

Searching for Similar Words. Similarity Metric



How can one find out whether two or more words are similar? I do not mean semantically similar (synonyms aren't taken into consideration), but visually similar.

Consider these two words: "sample1" and "sample2". Do they look similar? Well, at least they have the same beginning - "sample". The next two, "fox" and "wolf", have only a couple of letters in common.

One of the methods that can be used to measure word similarity is the Euclidean distance. It is normally used to measure the distance between two points in space.

The code to measure the similarity of two strings:

public static double Euclidian(string v1Orig, string v2Orig)
{
    string v1, v2;
    // v1 is always the longer of the two strings
    if (v1Orig.Length > v2Orig.Length)
    {
        v1 = v1Orig; v2 = v2Orig;
    }
    else
    {
        v1 = v2Orig; v2 = v1Orig;
    }

    double d = 0.0;
    for (int i = 0; i < v1.Length; i++)
    {
        if (i < v2.Length)
            d += Math.Pow((v1[i] - v2[i]), 2);   // squared difference of character codes
        else
            d += Math.Pow((v1[i] - 0.0), 2);     // extra characters are compared against 0
    }
    return Math.Sqrt(d);
}

Using the code above we can get numbers that measure word similarity:
"sample1" and "sample2" give 1.0, while "wolf" and "fox" give 104.10091258005379. Identical words give 0.0. Thus, the smaller the number, the more similar the two words are.

In terms of Euclidean distance, "fox" and "wolf" are farther apart than "sample1" and "sample2".

This measurement approach can be used when searching for groups of similar words in a text.

Monday, November 03, 2008

Bit Flags: The Simple Way


From time to time we face the need, or (for some of us) an opportunity, to mess with bit fields. As we all know, bytes consist of bits, and a bit can have two values: 0 and 1.

Using this knowledge we can store several flags in a single byte. In one byte of information we can efficiently store eight one-bit flags (1 byte has 8 bits). If we use int values, which have four bytes, we can store 32 flags (4*8=32).

So, let's define 8 bit flags. We should get the following picture:

00000001 - 1-st flag
00000010 - 2-nd flag
00000100 - 3-rd flag
00001000 - 4-th flag
00010000 - 5-th flag
00100000 - 6-th flag
01000000 - 7-th flag
10000000 - 8-th flag

There are two ways to define these flags in C#. Here is the first one: convert each binary number into its hexadecimal representation (decimal can also be used). The result will look like this:
[Flags]
public enum FirstApproach : byte
{
    First   = 0x01,
    Second  = 0x02,
    Third   = 0x04,
    Fourth  = 0x08,
    Fifth   = 0x10,
    Sixth   = 0x20,
    Seventh = 0x40,
    Eighth  = 0x80,
}
We obtained a rather compact definition. But if someone cannot convert a binary number (01000000) into hex (0x40) very fast, she will have to use a special tool like calc.exe :). I consider the above-mentioned way a little bit tiresome.

Here is the second approach: we first define a "base" enum of bit positions.
[Flags]
public enum DefBase : byte
{
    First   = 0,
    Second  = 1,
    Third   = 2,
    Fourth  = 3,
    Fifth   = 4,
    Sixth   = 5,
    Seventh = 6,
    Eighth  = 7,
}
Then, using "base" we can define our flags:
[Flags]
public enum SecondApproach : byte
{
    First   = 1 << (byte)DefBase.First,
    Second  = 1 << (byte)DefBase.Second,
    Third   = 1 << (byte)DefBase.Third,
    Fourth  = 1 << (byte)DefBase.Fourth,
    Fifth   = 1 << (byte)DefBase.Fifth,
    Sixth   = 1 << (byte)DefBase.Sixth,
    Seventh = 1 << (byte)DefBase.Seventh,
    Eighth  = 1 << (byte)DefBase.Eighth,
}
The second approach does not involve conversion from binary to hex. With it we can reuse DefBase multiple times to create the required bit flags, and defining the next bit flag is no longer a pain.

If an application has a lot of bit flag declarations, the second approach is more useful, as it saves the time otherwise spent on additional tools.
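For completeness, here is a small sketch of how such flags are typically combined and tested (this usage code is mine, not part of the original definitions):

// Combine two flags into a single byte-sized value.
SecondApproach flags = SecondApproach.First | SecondApproach.Third;

// Test whether a particular flag is set.
bool hasThird = (flags & SecondApproach.Third) == SecondApproach.Third;

// Clear a flag.
flags &= ~SecondApproach.First;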

Friday, July 18, 2008

"Using" magic or working with type aliases



Generics in C# allow us to specify and construct rather complex types. For instance, we can create a dictionary that maps an Id number to a name:

Dictionary<int, string> idNameMapping;
We can create an even more complex data structure when an Id number has to be mapped to another dictionary:
Dictionary<int, Dictionary<string, int>> complexIdMapping;
Nothing stops us from putting yet more types into the dictionary (or any generic type) declaration.

You've already noticed that the more types we put into the declaration, the clumsier it becomes - and I'm not even mentioning the typing effort :).
Here's how the "using" keyword can help us reduce typing and introduce some level of self-documentation into the code.
using Item = System.Collections.Generic.Dictionary<string, string>;

namespace Sample
{
    using Records = System.Collections.Generic.Dictionary<int, Item>;

    public class Controller
    {
        Records recordDictionary = new Records();
    }
}
Isn't that cool? Instead of typing a lot of nested types with all those < and > characters, we get "normal" looking type names.


Friday, June 27, 2008

The basics of secure data exchange under TCP


Exchanging data in plain text is very convenient and easy to implement, but what can you do to prevent eavesdropping, tampering, and forgery of the data you send back and forth? Here's where secure communication comes into play. At present the most common approach is to use Transport Layer Security (TLS) or Secure Sockets Layer (SSL). In a web context you can see secure data exchange in action when browsing web sites with the HTTPS prefix.

In the .NET Framework secure communication can be done with the SslStream class. It can use both the TLS and SSL protocols.

For authentication, TLS and SSL use a public key infrastructure (PKI), which requires certificates.

Here's a nice explanation of how to create a certificate using the makecert utility.

After reading and doing what is described in the above-mentioned blog post, we should end up with 2 installed certificates.

The certificate we'll use will be "vadmyst-enc".

SslStream gives you the look and feel of a common .NET stream.

So, what are the basic steps to start secure communication with SslStream?
Very often the communication happens between a server (e.g. a web server) and a client (e.g. a browser).
Here are the steps for the server side:

  • Start listening on a specific address and port

  • When a connection is accepted, wrap the obtained NetworkStream with SslStream

  • Call SslStream::AuthenticateAsServer

  • Start doing I/O (in our case that's a basic echo server)

In code it looks like this:
TcpListener listener = new TcpListener(ipEndpoint);
listener.Start(5);
TcpClient tcpClient = listener.AcceptTcpClient();

SslStream secureStream = new SslStream(tcpClient.GetStream(), false);
secureStream.AuthenticateAsServer(serverCertificate);

//use anonymous delegate for simplicity
ThreadPool.QueueUserWorkItem(new WaitCallback(delegate(object unused)
{
    //simple echo server
    byte[] tempBuffer = new byte[1024];
    int read = 0;
    try
    {
        while ((read = secureStream.Read(tempBuffer, 0, tempBuffer.Length)) > 0)
        {
            secureStream.Write(tempBuffer, 0, read);
        }
    }
    finally
    {
        secureStream.Close();
    }
}), null);
serverCertificate is obtained from certificate storage on the local machine:
X509Store store = new X509Store(StoreName.My, StoreLocation.LocalMachine);
store.Open(OpenFlags.ReadOnly);
X509Certificate serverCertificate = null;
for (int i = 0; i < store.Certificates.Count; i++)
{
    serverCertificate = store.Certificates[i];
    if (serverCertificate.Subject.Contains("vadmyst-enc"))
        break;
}
store.Close();
In this post, for simplicity's sake, I will not cover using client certificates for client authentication. The client will only authenticate the server.
The steps required on the client side are:
  • Open a TCP connection to the remote server

  • Wrap the obtained NetworkStream with an SslStream instance

  • Call SslStream::AuthenticateAsClient

  • Begin doing I/O

The source code below demonstrates a basic TCP client that transfers data in a secure way.
TcpClient client = new TcpClient();
client.Connect(endPoint);
SslStream sslStream = new SslStream(client.GetStream(), false);
sslStream.AuthenticateAsClient("vadmyst-enc");

byte[] plaintext = new byte[5 * 1024 + 35];
byte[] validation = new byte[plaintext.Length];

RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetNonZeroBytes(plaintext);

sslStream.Write(plaintext);
sslStream.Flush();

int read = 0;
int totalRead = 0;
while ((read = sslStream.Read(validation, totalRead,
                              validation.Length - totalRead)) > 0)
{
    totalRead += read;
    if (totalRead == plaintext.Length)
        break; //we've received all sent data
}
//check received data
for (int i = 0; i < plaintext.Length; i++)
{
    if (validation[i] != plaintext[i])
        throw new InvalidDataException("Data is not the same");
}
sslStream.Close();

SslStream appeared in version 2.0 of the .NET Framework. As you can see, doing secure communication with it is very easy. However, there are a number of situations that require additional coding: client authentication using client certificates, or using other algorithms for secure I/O. I will cover these advanced topics in the next posts. Stay tuned!

Tuesday, June 17, 2008

Hashing in .NET (cryptography related battle tactics)




Those who think I'm going to talk about stuff related to hashish or hash brown are totally not right. (By the way I do like hash brown as well as this great Japanese liquor ;) )

I will be talking about hashing related to cryptography and security. Hashing can be described as the process of getting a small digital "fingerprint" of any kind of data. Those interested in general information can get it here.

In .NET, security and cryptography related functionality is located in the System.Security.Cryptography namespace. Our hero of the day will be the SHA1 algorithm; the .NET class SHA1Managed implements it. Following the .NET cryptography model, this class derives from the abstract class SHA1. The same, by the way, is true for other hash algorithms, e.g. MD5. Both SHA1 and MD5 inherit from the HashAlgorithm class, and it is very likely that any new hashing algorithm added to the .NET Framework will inherit from HashAlgorithm as well.

There are three ways how to calculate hash value for some data.

  1. Use ComputeHash method

  2. Use TransformBlock/TransformFinalBlock directly

  3. Use CryptoStream

I'll show how to use the above-mentioned approaches. Let's assume we have some application data:

RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
byte[] data = new byte[5 * 4096 + 320];
//fill data array with arbitrary data
rng.GetNonZeroBytes(data);
//initialize HashAlgorithm instance
SHA1Managed sha = new SHA1Managed();

The first way:

byte[] hash1 = sha.ComputeHash(data);

It is very straightforward and simple: pass the data in and get the hash out. But this method is not convenient when the hash has to be calculated over several byte arrays or when the data is very large (e.g. calculating the hash value of a binary file).
This leads us to the second way:

int offset = 0;
int blockSize = 64;
//reset algorithm internal state
sha.Initialize();
while (data.Length - offset >= blockSize)
{
    offset += sha.TransformBlock(data, offset, blockSize,
                                 data, offset);
}
sha.TransformFinalBlock(data, offset, data.Length - offset);
//get calculated hash value
byte[] hash2 = sha.Hash;

This way is much more flexible: we can reuse the HashAlgorithm instance (via the Initialize method) and calculate hash values for large data objects.
However, for a file we would still have to write additional code to read chunks and pass them to the TransformBlock method.

Finally, the third way:

//reuse SHA1Managed instance
sha.Initialize();
MemoryStream memoryStream = new MemoryStream(data, false);
CryptoStream cryptoStream = new CryptoStream(memoryStream, sha,
                                             CryptoStreamMode.Read);
//temporary array used by the CryptoStream to store
//data chunk for which hash calculation was performed
byte[] temp = new byte[16];
while (cryptoStream.Read(temp, 0, temp.Length) > 0) { }
cryptoStream.Close();
byte[] hash3 = sha.Hash;

Isn't it beautiful? CryptoStream can read from any Stream object, so calculating the hash value of a large file isn't a problem - just pass a FileStream to the CryptoStream constructor!
Under the hood CryptoStream uses TransformBlock/TransformFinalBlock, so the third way is derived from the second one.
CryptoStream links data streams to cryptographic transformations: it can be chained with any objects that implement Stream, so the streamed output of one object can be fed into the input of another.
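To make the file scenario concrete, here is a minimal sketch of hashing a file through CryptoStream (the file name is just a placeholder):

using System;
using System.IO;
using System.Security.Cryptography;

class FileHashing
{
    static void Main()
    {
        SHA1Managed sha = new SHA1Managed();
        byte[] temp = new byte[4096];

        // Any readable Stream works here; a FileStream makes large files easy.
        using (FileStream fileStream = File.OpenRead("data.bin")) // placeholder path
        using (CryptoStream cryptoStream = new CryptoStream(fileStream, sha,
                                                            CryptoStreamMode.Read))
        {
            // Reading the stream to the end drives the hash computation chunk by chunk.
            while (cryptoStream.Read(temp, 0, temp.Length) > 0) { }
        }

        Console.WriteLine(BitConverter.ToString(sha.Hash));
    }
}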

The first approach is good when you're calculating hash values from time to time.
The second and third are best when large part of your application's operation is connected with hash calculations (like using cryptography in network I/O).