Thursday, October 18, 2007

Strategy Pattern in C# 2.0

Strategy pattern is very convenient when it comes to algorithms selection. The selection is possible because algorithm is encapsulated in single class, so it is easier to handle multiple algorithms.

There is a C# sample of Strategy pattern on the Wikipedia. Strategies there are passed as objects into the context. With the emergence of generics in the .NET 2.0 it is possible to specify strategy as a type parameter.

In this post I'll show sample list class "SmartList" that can be tuned with different allocation strategies. Such list can be used when developer needs full control on how memory is consumed by the list. For simplicity, there is no error handling and list class has limited functionality.

So, at first we define our algorithm's interface or contract.


interface IAllocationSrategy<T>
{
T[] Allocate(T[] data)
;
}


Then we define two allocation strategies: DoubleAllocator and FixedAllocator.
DoubleAllocator when doing allocation doubles the initial memory count, while FixedAllocator makes constant increment.


class DoubleAllocator<T> : IAllocationSrategy<T>
{
#region IAllocationSrategy Members

public T[] Allocate(T[] data)
{
int length = data.Length * 2;
T[] newData = new T[length];
Buffer.BlockCopy(data, 0, newData, 0, data.Length);
return
newData;
}

#endregion
}

class FixedAllocator<T> : IAllocationSrategy<T>
{
int fixedIncrement = 20;
#region
IAllocationSrategy Members

public T[] Allocate(T[] data)
{
int length = data.Length + fixedIncrement;
T[] newData = new T[length];
Buffer.BlockCopy(data, 0, newData, 0, data.Length);
return
newData;
}

#endregion
}

SmartList provides list functionality. Items can be added to the list and obtained by index.


class SmartList<A, T> where A : IAllocationSrategy<T>, new()
{
A allocator
= new A();

T[] innerList;
int
size = 0;

public
SmartList() : this(1)
{ }

public SmartList(int? capacity)
{
innerList
= new T[capacity ?? 1];
}

public void Add(T item)
{
if (size == innerList.Length)
innerList
= allocator.Allocate(innerList);
innerList[size++] = item;
}

public T this[int index]
{
get { return innerList[index]; }
}

public int Size
{
get { return size; }
}

public int Capacity
{
get { return innerList.Length; }
}
}



And finally, sample code that uses SmartList class with FixedAllocator and int type as item type.


class Sample
{
public Sample()
{
SmartList<FixedAllocator<
int>, int> smartList =
new
SmartList<FixedAllocator<int>, int>();

smartList.Add(1);
smartList.Add(2);
smartList.Add(3);

Console.WriteLine(smartList.Capacity);
}
}

Thursday, October 04, 2007

String Splitting and Tokenization Techniques in .NET

At start I'll give the explanation what token and tokenization is.

According to wikipedia article: "A token is a categorized block of text, usually consisting of indivisible characters known as lexemes. A lexical analyzer processes lexemes to categorize them according to function, giving them meaning. This assignment of meaning is known as tokenization. A token can look like anything: English, gibberish symbols, anything; it just needs to be a useful part of the structured text."

In .NET string class has a Split method. It returns an array of strings that are substrings delimited by a specified separator. Split method is convenient, however somewhat inefficient. Let us consider an example.

Input string is "59;21;67;72;111". String.Split(';') will return an array of five strings. But what will happen if we need to find specific token, say "21". String.Split will return an array, and then we'll have to iterate through it and search for "21". Do you notice pitfall here? No?

String.Spilt inefficiency comes from its API contract - it returns an array of string, every item of which consumes memory. But we're searching only for "21", we do not need other tokens.

This example shows inappropriate scenario for String.Split(...) method.

Other approach for string tokenizing is using special tokenizers that are returning one token at a time. An example of such tokenizer can be StringTokenizer class from Java world.
Sample code in Java:

StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}

We can see that StringTokenizer is returning one token at a time. It is extremely efficient for token search scenarios as we may not need to iterate through all the tokens thus allocate less memory and perform faster.

In .NET base class library (BCL) there is no such class. But there are 3-d party analogs or ports of Java StringTokenizer.

.NET BCL actually, has very powerful set of classes that can cope with string tokenization scenario easily. These classes "live" in System.Text.RegularExpressions.
Good example of string tokenization using Regex class is provided by TrackerRealm design team in their blog.

So, there are at least three ways in .NET how strings can be split into tokens. There is a rule of thumb here, if algorithm doesn't need all tokens from the string - do not use String.Split.

Sunday, September 23, 2007

Non Recursive Binary Tree Traversal

In my previous post I was talking about drawbacks of recursive approach when traversing tree structures.

In comments on that post, Daemin and w-g mentioned then tail-recursion can be used to avoid function call overhead. Tail-recursion really helps if:
a) language of use supports it
b) your algorithm makes it possible to make a recursive call the last operation in the function

Since, tail-recursion approach is still a recursive one and there is no guarantee that the language of use will support it or the algorithm will allow it - best way not to go into trouble is to make no recursive calls at all.

Here, I'll present non-recursive binary tree traversal. Binary tree is not balanced. To make non-recursive approach possible - tree node was extended with special meta information. It includes reference to the node's parent and a special flag. Flag indicates if node was previously visited during traversal process.

Binary tree node definition in code



class TreeNode
    {
        public int Value;
        public TreeNode Left;
        public TreeNode Right;
        public TreeNode Parent;
        public byte Visited;
 
        public TreeNode(int value)
        {
            Value = value;
        }
    }


Next comes recursive iterator. This method demonstrates recursive approach. It is self-explanatory and very easy to implement. All job is done via recursive call to RecursiveIterator method.


private void RecursiveIterator(TreeNode node)
        {
            Console.Write(node.Value + " ");
            if (node.Left != null)
                RecursiveIterator(node.Left);
 
            if (node.Right != null)
                RecursiveIterator(node.Right);
        }


Non-recursive approach on the other hand uses iteration and additional variables to do tree traversal.


private void NonRecursive()
        {
            TreeNode currNode = root;
            while (true)
            {
                if (currNode == null)
                    break;
 
                if (currNode.Left != null && currNode.Left.Visited != 1)
                    currNode = currNode.Left;
                else if (currNode.Right != null && currNode.Right.Visited != 1)
                    currNode = currNode.Right;
                else if (currNode.Visited == 0)
                {
                    Console.Write(currNode.Value + " ");
                    currNode.Visited = 1;
                }
else
                    currNode = currNode.Parent;
            }
        }

Iterative approach is more complicated then recursive, it uses additional data (Parent reference and Visited flag).

So, generally, when you know that your tree structures won't be very deep recursive approach will do. And in the case, when tree depth is quite big, it is better to consider iterative approach.

Wednesday, September 19, 2007

Why Avoid Recursion For Tree Structures

The subject may seem little bit contradictory. Recursive approach is very convenient when doing search or traversals in a tree structure (e.g. binary tree).

Use of recursion in an algorithm has both advantages and disadvantages. The main advantage is usually simplicity. The main disadvantage is often that the algorithm may require large amounts of memory if the depth of the recursion is very large. High memory consumption is due to large function call number (recursion means that function calls itself multiple times).



Every function call results in thread's stack memory consumption. Default stack size is 1 Mb. So, with high recursion depth there is a risk to run out of memory.

Next time I'll present recursive and non-recursive approaches to binary tree traversals.

Sunday, September 16, 2007

Google Reader Increase Productivity

Nearly a month ago I've stopped using Internet Explorer 7 as my primart RSS reader. The reasons were:
- IE7 is not quite stable, after 4-5 days of uptime it hangs or crashes
- Feed items are bound to one computer
- It consumes CPU when feeds updating.

So, I've exported my IE7 feeds into an OPML file and imported them into Google Reader.
Now my feeds are all in one place, which is good, their update doesn't take my CPU cycles.

P.S. Lifehacker has some tips how to increase your productivity with Google Reader.

Tuesday, September 04, 2007

Most amazing tutorial I've ever read

Recently, I've stumbled upon this cool Ruby tutorial. It is called Poignant Guide to Ruby. It is so well written, I wish other programming languages had such tutorials. Even if you're not engaged in Ruby development this book can be a great reading.

Totally recommended! :)

Friday, January 26, 2007

Creating Critical System Process in .NET

Yestreday, my home computer was infected by a worm - "Win32/Brontok.A". While cleaning it up I detected that I have TWO lsass.exe processes in the task manager. lsass.exe is a system process of the Microsoft Windows security mechanisms. The worm created lsass.exe in the My Documents folder, launched it and was happily operating on my machine.



And here's most interesting fact, when you try to kill lsass.exe process via task manager, you'll receive warning, like in the picture below.

I used Process Exloperer tool to kill that process and desinfect my computer.
However, it was interesting to see that Task Manager checks process name and not some special things about system process ( digital signature? ).

I created simple console application in C#, named it lsass.exe and voila - I have criticall system process :8-)

Tuesday, November 07, 2006

Financial education

This subject really bugs me for some time. I'm software developer, engineer. In school and university we didn't study that subject. Now I see that its one of the most important and SHOULD be studied, since everything in our life is connected with finances.

Good start to understand why financial education is so important is the books by Robert Kiosaki. For Russian and Ukranian readers there's e-version of his books here

Rather good blog about finances is Get Rich Slowly

Friday, October 06, 2006

Interested On How These Invisible Little Things Look Like?

When I was a child I really enjoyed studying different little things like leaves, insects and spores with microscope.

To pity that there were no such sites like this in that time.

Windows Live Writer Team Appears To Be Of Non Microsoft Origin

Rob Mensching recently met Windows Live Writer team. And it appears that this team was acquired by Microsoft.

Microsoft tries its best to enhance its Live initiative. Well, with such brilliant teams we can expect more interesting stuff to be released under Live.