Friday, February 03, 2012

Fast Conversion of Hex String Into Decimal Number

Today's post is about performance, more specifically about converting a hex string into a decimal number faster than the built-in .NET Framework methods do.

I will compare the performance of three methods for converting a hex string into a decimal number. Two of them are built into the .NET Framework.

1) Convert.ToInt32(hexNumber, 16)
2) int.Parse(hexNumber, NumberStyles.HexNumber)
3) A custom method that uses a pre-populated table. Let us call it TableConvert.

Here's the code for the TableConvert class. It only illustrates the idea behind the pre-populated table; it is not production code.

class TableConvert
{
    // Indexed by ASCII code (0-127): the value of a hex digit, or -1 for any other character.
    static sbyte[] unhex_table =
    { -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
     , 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,-1,-1,-1,-1,-1,-1
     ,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1
     ,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
    };

    // No validation: a non-hex character contributes -1 to the result, and a
    // character whose low byte is above 127 throws IndexOutOfRangeException.
    public static int Convert(string hexNumber)
    {
        int decValue = unhex_table[(byte)hexNumber[0]];
        for (int i = 1; i < hexNumber.Length; i++)
        {
            decValue *= 16; // shift the accumulated value left by one hex digit
            decValue += unhex_table[(byte)hexNumber[i]];
        }
        return decValue;
    }
}

The approach uses a simple technique: a pre-populated table of hex-to-decimal values plus some simple math. To measure the performance of the three methods I wrote a small test application that performs the conversion in a loop and measures the time.
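A test loop of that kind might look roughly like the sketch below. This is not the original test application: the names HexBenchSketch, TableConvertChecked and Measure are made up for illustration, and the conversion here adds the input validation that TableConvert leaves out.

```csharp
using System;
using System.Diagnostics;

public static class HexBenchSketch
{
    // Lookup table indexed by ASCII code: hex digit value, or -1 for any other character.
    static readonly sbyte[] Unhex = BuildTable();

    static sbyte[] BuildTable()
    {
        var table = new sbyte[128];
        for (int i = 0; i < table.Length; i++) table[i] = -1;
        for (char c = '0'; c <= '9'; c++) table[c] = (sbyte)(c - '0');
        for (char c = 'A'; c <= 'F'; c++) table[c] = (sbyte)(c - 'A' + 10);
        for (char c = 'a'; c <= 'f'; c++) table[c] = (sbyte)(c - 'a' + 10);
        return table;
    }

    // Same multiply-and-add loop as TableConvert, plus validation of every character.
    // Overflow is not checked, so strings longer than 8 hex digits wrap around.
    public static int TableConvertChecked(string hex)
    {
        if (string.IsNullOrEmpty(hex)) throw new FormatException("Empty hex string.");
        int value = 0;
        foreach (char c in hex)
        {
            int digit = c < Unhex.Length ? Unhex[c] : -1;
            if (digit < 0) throw new FormatException("Not a hex digit: " + c);
            value = value * 16 + digit;
        }
        return value;
    }

    // Runs 'convert' on 'hex' the given number of times and returns the elapsed Stopwatch ticks.
    public static long Measure(Func<string, int> convert, string hex, int iterations)
    {
        convert(hex); // warm-up call so that JIT compilation is not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) convert(hex);
        sw.Stop();
        return sw.ElapsedTicks;
    }
}
```

The built-in methods can be timed the same way, e.g. HexBenchSketch.Measure(s => Convert.ToInt32(s, 16), "FF01", 10000). The validation costs a few extra comparisons per character, so the numbers will not match the unchecked TableConvert exactly.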

Tests were run on an Intel Core i7 2.80 GHz with .NET Framework 4.0. Time was measured in ticks using the Stopwatch class.

Results:
Hex string: FF01

Iterations   TableConvert   Convert Method   Parse Method
100          28             35               63
10000        2134           2593             5915
1000000      191935         252252           433490

Hex string: 4AB201
Iterations   TableConvert   Convert Method   Parse Method
100          26             27               47
10000        1930           2775             4284
1000000      192801         269016           481308

Conclusions
Not surprisingly, the Parse method has the worst performance of the three, while TableConvert is the fastest: usually about 1.2 to 1.4 times faster (20%-40%) than the Convert method.

If you ever need to convert a lot of hex strings into decimal numbers as fast as possible, you can use the TableConvert method.

Friday, October 14, 2011

Determining .NET Assembly Target Platform



A .NET assembly can be built for different platform targets: Any CPU, x86 or x64.
But if all you have is an already built binary file, how do you determine the target platform of the exe or dll?

In .NET, information about assemblies and types is stored as metadata. For instance, assembly metadata is contained in the assembly manifest.

The assembly manifest can be viewed using special tools, e.g. IL Disassembler.

Another tool that we can use to determine the CPU type of an assembly is the CorFlags Conversion Tool (CorFlags.exe). I will demonstrate how this tool works and what output it generates.

Let us assume we have 3 assemblies: CpuTypeSample_x86.exe, CpuTypeSample_x64.exe and CpuTypeSample_any.exe, built with the x86, x64 and "Any CPU" platform targets respectively.

In the command prompt we will execute the following commands.
Command: "corflags CpuTypeSample_x86.exe" produces the output:
Microsoft (R) .NET Framework CorFlags Conversion Tool. Version 3.5.21022.8
Copyright (c) Microsoft Corporation. All rights reserved.

Version : v4.0.30319
CLR Header: 2.5
PE : PE32
CorFlags : 3
ILONLY : 1
32BIT : 1
Signed : 0


The output above means that CpuTypeSample_x86.exe has the PE32 executable type and the 32BIT flag set to 1. If you observe such output for other assemblies, it means they were built for the x86 platform.

Here is the output for the x64 assembly:
Command: "corflags CpuTypeSample_x64.exe"
Microsoft (R) .NET Framework CorFlags Conversion Tool. Version 3.5.21022.8
Copyright (c) Microsoft Corporation. All rights reserved.

Version : v4.0.30319
CLR Header: 2.5
PE : PE32+
CorFlags : 1
ILONLY : 1
32BIT : 0
Signed : 0


In contrast to x86, it has the PE32+ type of PE file and the 32BIT flag is turned off.

Finally, here is the output for the "Any CPU" assembly:
Command: "corflags CpuTypeSample_any.exe"
Microsoft (R) .NET Framework CorFlags Conversion Tool. Version 3.5.21022.8
Copyright (c) Microsoft Corporation. All rights reserved.

Version : v4.0.30319
CLR Header: 2.5
PE : PE32
CorFlags : 1
ILONLY : 1
32BIT : 0
Signed : 0


The PE type of this file is the same as for the x86 assembly, but the 32BIT flag is turned off.
Summary table of the above output:
Target    PE Type   32BIT
x86       PE32      1
x64       PE32+     0
Any CPU   PE32      0
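The summary table can be written down as a small decision function. The sketch below is only an illustration: CorFlagsSketch and TargetPlatform are hypothetical names, the inputs are the "PE" and "CorFlags" lines copied from the tool's output, and only the three cases from the table are covered. It relies on CorFlags being a bit field in which 1 is ILONLY and 2 is 32BIT, which is why the x86 sample shows CorFlags : 3.

```csharp
using System;

public static class CorFlagsSketch
{
    // Bit 0x2 of the CorFlags value is what CorFlags.exe prints as "32BIT".
    const int Required32Bit = 0x2;

    // peType is the "PE" line ("PE32" or "PE32+"), corFlags is the "CorFlags" number.
    public static string TargetPlatform(string peType, int corFlags)
    {
        bool is32Bit = (corFlags & Required32Bit) != 0;
        if (peType == "PE32+") return "x64";
        if (peType == "PE32") return is32Bit ? "x86" : "Any CPU";
        throw new ArgumentException("Unexpected PE type: " + peType);
    }
}
```

For example, CorFlagsSketch.TargetPlatform("PE32", 3) gives "x86" and CorFlagsSketch.TargetPlatform("PE32", 1) gives "Any CPU", matching the outputs above.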

Sunday, July 18, 2010

Fastest Way To Retrieve Custom Attributes for a Type Member

In my previous posts (Performance Issues When Comparing Strings in .NET and When string.ToLower() is Evil) string-related operations were discussed.

In this post we'll examine performance issues when querying a type member's custom attributes.
Let us define two attributes and a class. The class has a single method decorated with an attribute. Here's the code:

class FooAttribute : Attribute
{ }

class BarAttribute : FooAttribute
{ }

class Item
{
    [Bar]
    public int Action()
    {
        return 0;
    }
}
Now the question is: what is the fastest way to check the Action method for the Bar custom attribute? Custom attributes can be queried using an instance of a type that implements the ICustomAttributeProvider interface, or through the static methods of the Attribute class. In our case we shall use the Attribute class and MethodInfo.

The code below queries custom attributes using the Attribute class and then using the MethodInfo instance. Each query executes 10000 times and the duration is measured using the Stopwatch class. The code also measures the time required to check whether the attribute is applied.

// Requires: using System; using System.Diagnostics; using System.Reflection;
int count = 10000;
Type tBar = typeof(Item);
MethodInfo mInfo = tBar.GetMethod("Action");
// warm up
mInfo.IsDefined(typeof(FooAttribute), true);
object[] attribs = null;
Stopwatch sw = new Stopwatch();

sw.Start();
for (int i = 0; i < count; i++)
{
 attribs = Attribute.GetCustomAttributes(mInfo, typeof(FooAttribute), true);
}
sw.Stop();

Console.WriteLine("Attribute(specific): {0}, Found: {1}", sw.ElapsedMilliseconds, 
 attribs.Length);
sw.Reset();

sw.Start();
for (int i = 0; i < count; i++)
{
 attribs = mInfo.GetCustomAttributes(typeof(FooAttribute), true);
}
sw.Stop();
Console.WriteLine("MethodInfo: {0}, Found: {1}", sw.ElapsedMilliseconds, 
 attribs.Length);
sw.Reset();

sw.Start();
for (int i = 0; i < count; i++)
{
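 // This overload is GetCustomAttributes(MemberInfo element, bool inherit):
 // typeof(FooAttribute) is the element being inspected here, not a filter on mInfo.
 attribs = Attribute.GetCustomAttributes(typeof(FooAttribute), true);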
}
sw.Stop();

Console.WriteLine("Attribute(general): {0}, Found: {1}", sw.ElapsedMilliseconds, 
 attribs.Length);
sw.Reset();
   
sw.Start();
for (int i = 0; i < count; i++)
{
 Attribute.IsDefined(mInfo, typeof(FooAttribute), true);
}
sw.Stop();

Console.WriteLine("Attribute::IsDefined: {0}", sw.ElapsedMilliseconds);
sw.Reset();

sw.Start();
for (int i = 0; i < count; i++)
{
 mInfo.IsDefined(typeof(FooAttribute), true);
}
sw.Stop();

Console.WriteLine("MethodInfo::IsDefined: {0}", sw.ElapsedMilliseconds);
sw.Reset();
The code above produces the output:
Attribute(specific): 137, Found: 1
MethodInfo: 130, Found: 1
Attribute(general): 569, Found: 1
Attribute::IsDefined: 40
MethodInfo::IsDefined: 33
The results indicate that the fastest way is to query custom attributes via the MethodInfo class. To generalize, the fastest way to get custom attributes is to use the closest reflection equivalent of the type member (e.g. MethodInfo for a method, PropertyInfo for a property, etc.).

The last two results show the time of the IsDefined operation. Use it when you only need to check whether an attribute is applied to a type member.
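The two recommendations could be wrapped in a pair of helpers, roughly as sketched below. The names AttributeCheckSketch, Has and Find are hypothetical, and the attribute and Item classes are repeated (with an extra undecorated Plain method) only to keep the sample self-contained.

```csharp
using System;
using System.Reflection;

public static class AttributeCheckSketch
{
    public class FooAttribute : Attribute { }
    public class BarAttribute : FooAttribute { }

    public class Item
    {
        [Bar]
        public int Action() { return 0; }

        public int Plain() { return 0; }
    }

    // Only answers "is it applied?" via the member's own IsDefined.
    public static bool Has<TAttribute>(MemberInfo member) where TAttribute : Attribute
    {
        return member.IsDefined(typeof(TAttribute), true);
    }

    // Returns the first matching attribute instance, or null when none is applied.
    public static TAttribute Find<TAttribute>(MemberInfo member) where TAttribute : Attribute
    {
        object[] found = member.GetCustomAttributes(typeof(TAttribute), true);
        return found.Length > 0 ? (TAttribute)found[0] : null;
    }
}
```

Because BarAttribute derives from FooAttribute, Has<FooAttribute> on the Action method returns true, and Find<FooAttribute> returns the BarAttribute instance.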

Thursday, July 08, 2010

Type inference in generic methods

Did you know that generic methods in .NET have type inference? It is sometimes also called implicit typing.

Let's see how type inference looks in code. The sample below contains a class with generic methods:

class NonGenericType
{
    public static int GenericMethod1<TValue>(TValue p1, int p2)
    {
        Console.WriteLine(p1.GetType().Name);
        return default(int);
    }

    public static TValue GenericMethod2<TValue>(int p1, TValue p2)
    {
        Console.WriteLine(p2.GetType().Name);
        return default(TValue);
    }

    public static TReturn GenericMethod3<TValue, TReturn>(int p1, TValue p2)
    {
        Console.WriteLine(p2.GetType().Name);
        Console.WriteLine(typeof(TReturn).Name);
        return default(TReturn);
    }
}
Here's the traditional way of using the methods defined above:
NonGenericType.GenericMethod1<string>("test", 5);
NonGenericType.GenericMethod2<double>(1, 0.5);
Nothing fancy here: we specify the type to use in place of the TValue type parameter.
Type inference lets us omit the type arguments. Instead we just call the methods as if they were non-generic:
NonGenericType.GenericMethod1("test", 5);
NonGenericType.GenericMethod2(1, 0.5);
Type inference can come in handy as it reduces typing, but in my opinion it makes code less readable.
Also, type inference cannot "guess" the return type of the method:
NonGenericType.GenericMethod3(1, 0.5);
 //error CS0411: The type arguments for method 'TypeInference.NonGenericType.GenericMethod3<TValue,TReturn>(int, TValue)' cannot be inferred from the usage. Try specifying the type arguments explicitly.
A nice explanation of why inference does not work in the scenario above was given by Eric Lippert.
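Two ways around CS0411 are sketched below; both are illustrations with hypothetical names (InferenceSketch, Method3, Method3WithFallback, Demo) and are not taken from the explanation mentioned above. Either all type arguments are written out explicitly (C# does not infer only some of them), or the return type is made to appear in a parameter so that the arguments can drive inference.

```csharp
using System;

public static class InferenceSketch
{
    // Same shape as GenericMethod3: TReturn appears only in the return type.
    public static TReturn Method3<TValue, TReturn>(int p1, TValue p2)
    {
        return default(TReturn);
    }

    // Here TReturn also appears in a parameter, so both type arguments can be inferred.
    public static TReturn Method3WithFallback<TValue, TReturn>(int p1, TValue p2, TReturn fallback)
    {
        return fallback;
    }

    public static string Demo()
    {
        // Both type arguments are given explicitly.
        string a = Method3<double, string>(1, 0.5);
        // Inferred as <double, string> from the arguments.
        string b = Method3WithFallback(1, 0.5, "none");
        return (a ?? "null") + "," + b;
    }
}
```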
Happy coding :)

Friday, June 18, 2010

Thread Safe Collection Iteration Techniques

In a multithreaded environment every operation should be tested and analyzed from the viewpoint of thread safety. That is, for every data structure, check what will happen if it is accessed or changed from multiple threads.

Imagine we need to iterate over a collection of items and perform some action on each item. Since we're talking about threading, the iteration should be done in a thread-safe way: while we are iterating over the collection, no other thread is allowed to add or remove items.
No problemo, you may think: do the iteration under a lock.

But it is not that simple.

The code sample below illustrates two approaches to the iteration. Both have pros and cons. More on that after the code sample.

int initialItems = 5;
ICollection<string> coll = new List<string>(initialItems);

for(int i = 1; i <= initialItems; i++)
 coll.Add("item" + i.ToString());
   
//#1 iterating with lock approach
lock(coll)
{
 foreach(string item in coll)
 {
  PerformWorkWithItem(item);    
 }
}
//

//#2 iteration over a copy 
ICollection<string> copyColl = null;
lock(coll)
{
 copyColl = new List<string>(coll);
}

foreach(string item in copyColl)
{
 PerformWorkWithItem(item);    
}
//    

void PerformWorkWithItem(string item)
{
 //
 // perform operations that can take a
 // considerable amount of time
 //
}

Welcome back.

Approach #1 uses a global lock for the iteration. That means the collection is protected by the lock for the whole time it is being iterated.
The pros are:
  • Simplicity (just take the lock and do the job)
  • Memory efficiency: no new objects are constructed
The cons are:
  • If PerformWorkWithItem takes a long time to complete or is blocking (e.g. reading data from the network), access to the collection is blocked for a considerable amount of time
  • The action on each collection item is also performed under the lock

Approach #2 uses a different technique. It locks access to the collection only to make a copy (snapshot) of the original collection. The iteration and the PerformWorkWithItem action are performed over the snapshot and are not protected by the lock.
The pros are:
  • Operations on collection items are done without locking the collection. If PerformWorkWithItem takes a long time to complete, the original collection is not locked as in #1
  • It allows scheduling actions on collection items on a separate thread
The cons are:
  • If the original collection is large, copying the data can become inefficient
  • Added complexity: while actions are performed on the snapshot, the items of the original collection may already have changed

Now that we know the pros and cons of these two approaches, we can deduce some hints that help to choose the appropriate technique.

For instance, if the PerformWorkWithItem action is relatively fast and the rest of the application can afford to wait for the iteration to finish, then approach #1 is the best.

On the other hand, if PerformWorkWithItem can take a considerable amount of time and other parts of the application frequently access the collection (i.e. it is not desirable to block access to the collection for a long time), then #2 is the better choice.
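Both approaches can be put side by side in one small wrapper class, as in the sketch below. The class and member names (SnapshotListSketch, ForEachLocked, ForEachSnapshot) are hypothetical, and a private lock object is used here instead of locking on the collection itself, which is an assumption of this sketch rather than part of the sample above.

```csharp
using System;
using System.Collections.Generic;

public class SnapshotListSketch
{
    private readonly object sync = new object();
    private readonly List<string> items = new List<string>();

    public void Add(string item)
    {
        lock (sync) { items.Add(item); }
    }

    public int Count
    {
        get { lock (sync) { return items.Count; } }
    }

    // Approach #1: the lock is held for the whole iteration, including the work itself.
    public void ForEachLocked(Action<string> work)
    {
        lock (sync)
        {
            foreach (string item in items) work(item);
        }
    }

    // Approach #2: the lock is held only while copying; the work runs on the snapshot.
    public void ForEachSnapshot(Action<string> work)
    {
        List<string> copy;
        lock (sync) { copy = new List<string>(items); }
        foreach (string item in copy) work(item);
    }
}
```

With ForEachSnapshot the work callback (or another thread) can call Add while the iteration is running; the new items simply do not show up in the current pass. Doing the same from inside ForEachLocked on the same thread passes the reentrant lock and then fails with InvalidOperationException, because the list changes under its own enumerator.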

P.S. There is also an approach #3, which uses lock-free data structures. But that is a whole other story and a topic for a separate post.

Tuesday, June 15, 2010

AesManaged class Key and KeySize properties issue

Today, when working with the AesManaged class, I encountered very strange behavior.
If you have code like this, you're in trouble:

AesManaged aes = new AesManaged();
aes.Key = key;
aes.KeySize = key.Length * 8; //the problem (KeySize is in bits, key.Length is in bytes)
The problem with this code is that KeySize is set after Key.
When you set KeySize after Key, the previously specified key is discarded and a brand new key value is generated and put into the Key property.

I find this behavior rather strange, especially since there is no information describing what happens after setting KeySize.

I would expect that, once Key is set, setting KeySize throws an exception if the new size differs from the size of the specified key.
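A safer order is sketched below, under a couple of assumptions: Aes.Create() is used in place of new AesManaged() (both expose the same Key and KeySize properties through SymmetricAlgorithm), and the names AesKeySketch, ConfigureKey and KeySizeAfterSettingKey are made up for this example. If KeySize is set at all, it goes first; assigning Key also updates KeySize on its own, so the explicit KeySize line can usually be dropped.

```csharp
using System;
using System.Security.Cryptography;

public static class AesKeySketch
{
    // Configures the algorithm in a safe order and returns the key it ends up holding.
    public static byte[] ConfigureKey(byte[] key)
    {
        using (Aes aes = Aes.Create())
        {
            aes.KeySize = key.Length * 8; // optional; in bits, and set BEFORE Key
            aes.Key = key;                // last assignment wins, so the key is kept
            return aes.Key;               // the getter returns a copy of the stored key
        }
    }

    // Shows that assigning Key alone is enough: KeySize follows the key length.
    public static int KeySizeAfterSettingKey(byte[] key)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            return aes.KeySize;
        }
    }
}
```

The key passed in must have a length that AES accepts (16, 24 or 32 bytes); otherwise the setters throw CryptographicException.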

Wednesday, April 28, 2010

The Big Bang Theory sitcom scientific background

Usually I do not write about TV, but the series in the subject line is one of my favorites.

Recently I found the blog of the guy who provides the scientific background for that sitcom.
There are a lot of interesting scientific facts on that blog in the context of the TV show.

I totally recommend reading it even if you do not watch The Big Bang Theory.
The URL of the blog is http://thebigblogtheory.wordpress.com/

Friday, April 16, 2010

Refactoring code with lambda expressions

Without much ado, let's go straight to the code that needs to be refactored:

bool SomeMethod(long param)
{
   //
   // some prefix code
   //
   try
   {
      //do specific job here
      return DoSpecificJob(param);
   }
   finally
   {
      //
      // some suffix code
      //
   }
}

Result SomeOtherMethod(string name, int count)
{
   //
   // some prefix code
   //
   try
   {
      return DoOtherSpecificJob(name, count);
   }
   finally
   {
      //
      // some suffix code
      //
   }
}
The question is how to bring the prefix and suffix code from the example above into one place (a method) without changing the code logic.
The goal is to have these two methods rewritten like this:
bool SomeMethod(long param)
{
   return DoSpecificJob(param);
}

Result SomeOtherMethod(string name, int count)
{
   return DoOtherSpecificJob(name, count);
}
Meanwhile, another method will be created that executes the prefix and suffix code.

There are several ways to do that:
1. Create a method that contains the prefix and suffix code and accepts a Delegate object and a params object[] array
object ExecuteCommon(Delegate d, params object[] args)
{
   //
   // prefix code
   //
   try { return d.DynamicInvoke(args); }
   finally
   {
      //
      // suffix code
      //
   }
}

bool SomeMethod_First(long param)
{
   Delegate d = new Func<long, bool>((b) => SomeMethod(b));
   return (bool)ExecuteCommon(d, new object[] {param});
}

Result SomeOtherMethod_First(string name, int count)
{
   Delegate d = new Func<string, int, Result>((n, c) => SomeOtherMethod(n, c));
   return (Result)ExecuteCommon(d, new object[] { name, count });
}
The approach looks nice but has several caveats: boxing (wrapping value types into reference types) and casting.
2. Move the repeated code up the call stack
This approach is possible if SomeMethod and SomeOtherMethod are on the same call stack level or are called from the same method.

3. Create a generic method that accepts a generic delegate and defines several parameters
TResult ExecuteCommon<T1,TResult>(Func<T1, TResult> func, T1 param1)
{
   //
   // some prefix code
   //
   try { return func(param1); }
   finally
   {
      //
      // some suffix code
      //
   }
}

TResult ExecuteCommon<T1, T2, TResult>(Func<T1, T2, TResult> func, T1 param1, T2 param2)
{
   //
   // some prefix code
   //
   try { return func(param1, param2); }
   finally
   {
      //
      // some suffix code
      //
   }
}

bool SomeMethod_Third(long param)
{
   return ExecuteCommon<long, bool>((p) => SomeMethod(p), param);
}

Result SomeOtherMethod_Third(string name, int count)
{
   return ExecuteCommon<string, int, Result>(
      (n, c) => SomeOtherMethod(n, c), name, count);
}
This approach does not use casting, and no boxing occurs when value type parameters are passed. However, the caveat is that you need to define multiple methods, one per number of type parameters. In my opinion the third approach is the best, although we have to define two methods that execute the prefix/suffix code.
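To see the third approach running end to end, here is a compact sketch in which the prefix and suffix code just append to a log. Everything specific in it is an assumption made for illustration: the PrefixSuffixSketch class, the Log list, the bodies of DoSpecificJob and DoOtherSpecificJob, and string standing in for the Result type.

```csharp
using System;
using System.Collections.Generic;

public class PrefixSuffixSketch
{
    public readonly List<string> Log = new List<string>();

    // One overload per parameter count; prefix and suffix live only here.
    TResult ExecuteCommon<T1, TResult>(Func<T1, TResult> func, T1 p1)
    {
        Log.Add("prefix");
        try { return func(p1); }
        finally { Log.Add("suffix"); }
    }

    TResult ExecuteCommon<T1, T2, TResult>(Func<T1, T2, TResult> func, T1 p1, T2 p2)
    {
        Log.Add("prefix");
        try { return func(p1, p2); }
        finally { Log.Add("suffix"); }
    }

    // Placeholder jobs standing in for the real work.
    bool DoSpecificJob(long param) { Log.Add("job"); return param > 0; }
    string DoOtherSpecificJob(string name, int count) { Log.Add("other job"); return name + count; }

    // Method groups are passed directly, so no extra lambda is needed.
    public bool SomeMethod(long param)
    {
        return ExecuteCommon<long, bool>(DoSpecificJob, param);
    }

    public string SomeOtherMethod(string name, int count)
    {
        return ExecuteCommon<string, int, string>(DoOtherSpecificJob, name, count);
    }
}
```

After a call to SomeMethod(5) the log contains "prefix", "job", "suffix" in that order; the suffix entry is added from the finally block, so it would also be written if the job threw.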

P.S. Another way of refactoring here is code generation: emitting code on the fly or using template tools like T4 templates in Visual Studio.


Thursday, December 17, 2009

Mono: C# compiler bug with property inheritance

The bug appeared quite unexpectedly.
In Visual Studio the code sample below compiled fine. But compiling the same code with the Mono C# compiler results in an error: Compiler Error CS0546: 'Derived2.Accessor.set': cannot override because 'Base.Accessor' does not have an overridable set accessor.
It must be a compiler bug, I thought to myself. A search in the Mono Bugzilla confirmed that the bug was there.

Here is the code sample that produces the error (the workaround is described under it):

abstract class Base
{
    public virtual string Accessor
    {
        get { return "base"; }
        set { }
    }
}

class Derived1 : Base
{
    public override string Accessor
    {
        get { return base.Accessor; }
    }
}

class Derived2 : Derived1
{
    public override string Accessor
    {
        set { }
    }
}
The workaround for this error is to add a set accessor to the Derived1 class:
public override string Accessor
{
    get { return base.Accessor; }
    set { base.Accessor = value; }
}
Happy coding :)

Sunday, December 13, 2009

MSSQL DATEDIFF Equivalent in MySQL

Recently I was porting T-SQL (MSSQL) code to the SQL dialect used by MySQL.
The process went smoothly until I got stuck with dates, especially intervals between two dates. In T-SQL the datediff function is used to get the interval between two datetime values.

Let us consider this T-SQL sample:

declare @d1 datetime;
declare @d2 datetime;
set @d1 = '2009-01-18 15:22:01'
set @d2 = '2009-01-19 14:22:01'
select datediff(hour, @d1, @d2) as hour, 
       datediff(day, @d1, @d2) as day,
       datediff(second, @d1, @d2) as second
Query results are:
hour   day   second
23     1     82800

After some searching I found out that the MySQL equivalent is:
set @d1 = '2009-01-18 15:22:01';
set @d2 = '2009-01-19 14:22:01';
select timestampdiff(hour, @d1, @d2) as hour,
       timestampdiff(day, @d1, @d2) as day,
       timestampdiff(second, @d1, @d2) as second;
Query results are:
hour   day   second
23     0     82800
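The query results are the same except for the day difference. MSSQL returns 1 because datediff counts the datepart boundaries crossed between the two values (here, one midnight), while timestampdiff counts complete units, and 23 hours is less than one complete day.

In general we can think of timestampdiff (MySQL) as a close equivalent of datediff (MSSQL). To make them truly equal it is better to get the difference in seconds and then calculate the required interval (hours, days).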
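The difference between the two day counts can be reproduced outside the database. The C# sketch below only mirrors the two counting rules for days and is not code from either engine; DayDiffSketch, DayBoundaries and CompleteDays are hypothetical names.

```csharp
using System;

public static class DayDiffSketch
{
    // Counts the midnights crossed between the two values,
    // which is how T-SQL datediff(day, d1, d2) counts.
    public static int DayBoundaries(DateTime d1, DateTime d2)
    {
        return (d2.Date - d1.Date).Days;
    }

    // Counts complete 24-hour periods, matching the timestampdiff(day, d1, d2)
    // result above; the cast truncates the fractional day.
    public static int CompleteDays(DateTime d1, DateTime d2)
    {
        return (int)(d2 - d1).TotalDays;
    }
}
```

For the sample values, 2009-01-18 15:22:01 and 2009-01-19 14:22:01, DayBoundaries gives 1 and CompleteDays gives 0, while the difference in seconds is 82800 either way.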