
March 2006 - Posts

WOW64 stands for Windows on Windows64. It emulates Win32 for programs that were compiled for 32-bit Windows. It will _even_ emulate the system-information API calls! So if you ask GetSystemInfo for the processor architecture, or query GetEnvironmentVariable(L"PROCESSOR_ARCHITECTURE"...), WOW64 will make your 32-bit program feel comfortable by saying: "Yes, I am an Intel-compatible x86 CPU!".

So I was puzzled: my SETUP needed to do some tasks, and it simply had to detect whether or not we are on an x64 Windows edition.
The solution turned out to be simple. Try it for yourselves.

Boot into Windows x64 (if you have not done so yet) and run C:\WINDOWS\SysWOW64\cmd.exe. This is the good old 32-bit command prompt from the Windows XP days.

Let's examine the environment variables you might need. The output in my case would be...

ALLUSERSPROFILE=C:\Documents and Settings\All Users
APPDATA=C:\Documents and Settings\Administrator\Application Data
CommonProgramFiles=C:\Program Files (x86)\Common Files
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramW6432=C:\Program Files\Common Files
HOMEPATH=\Documents and Settings\Administrator
PROCESSOR_IDENTIFIER=AMD64 Family 15 Model 47 Stepping 0, AuthenticAMD
ProgramFiles=C:\Program Files (x86)
ProgramFiles(x86)=C:\Program Files (x86)
ProgramW6432=C:\Program Files
USERPROFILE=C:\Documents and Settings\Administrator

Evidently, you must solve the 'detect my AMD64' challenge by reading a new variable, named 'PROCESSOR_ARCHITEW6432', and not PROCESSOR_ARCHITECTURE.
If the variable does not exist, your 32-bit process is obviously running in a native Win32 environment.
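
As a rough illustration of that check, here is a minimal sketch in portable C++. The names HostKind, ClassifyHost and ClassifyCurrentProcess are made up for this example; only the two environment variable names come from the text above, and the values "x86" and "AMD64" are what I would expect Windows to report.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type for this sketch.
enum class HostKind { Native32, Wow64, Native64OrOther };

// Pure decision logic, so it can be tried without a Windows box.
//   arch    : value of PROCESSOR_ARCHITECTURE (may be null)
//   archW64 : value of PROCESSOR_ARCHITEW6432 (may be null)
HostKind ClassifyHost(const char* arch, const char* archW64)
{
    // WOW64 only sets PROCESSOR_ARCHITEW6432 for a 32-bit process on a 64-bit OS.
    if (archW64 != nullptr && *archW64 != '\0')
        return HostKind::Wow64;
    if (arch != nullptr && std::string(arch) == "x86")
        return HostKind::Native32;
    return HostKind::Native64OrOther;
}

// Convenience wrapper that reads the real environment of this process.
HostKind ClassifyCurrentProcess()
{
    return ClassifyHost(std::getenv("PROCESSOR_ARCHITECTURE"),
                        std::getenv("PROCESSOR_ARCHITEW6432"));
}
```

A setup program would call ClassifyCurrentProcess() once at startup and branch on HostKind::Wow64.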

Update July 26, 2006: Somebody woke me up and pointed me to the Kernel32 function IsWow64Process(..) (see MSDN).
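
A minimal sketch of how that call is usually made, assuming the function is looked up at run time because older Windows versions do not export it. The wrapper name RunningUnderWow64 is made up for this example, and the non-Windows branch exists only so the snippet compiles anywhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Returns true only when this is a 32-bit process running on 64-bit Windows.
bool RunningUnderWow64()
{
#ifdef _WIN32
    typedef BOOL (WINAPI *IsWow64ProcessFn)(HANDLE, PBOOL);

    HMODULE kernel32 = ::GetModuleHandleW(L"kernel32");
    if (kernel32 == NULL)
        return false;

    // Look the function up dynamically instead of linking to it.
    IsWow64ProcessFn isWow64Process = reinterpret_cast<IsWow64ProcessFn>(
        ::GetProcAddress(kernel32, "IsWow64Process"));

    BOOL wow64 = FALSE;
    if (isWow64Process == NULL || !isWow64Process(::GetCurrentProcess(), &wow64))
        return false;   // function missing or call failed: treat as "not WOW64"
    return wow64 != FALSE;
#else
    return false;       // not Windows: WOW64 does not apply
#endif
}
```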

I had to do some maintenance on a Windows ® service and found that an ATL service works very well as a Windows service, without all the overhead.

There are some samples on the internet, but they stop where I wanted to continue. That is: running UI-less services. For instance, a job that could be done through a scheduled task, but is better done through a service.

By the way, those who are interested in a minimal Windows service and who feel a sick inclination :) to write the smallest EXE possible should use the plain C implementation provided by Microsoft's ® Platform SDK, located at [Platform SDK folder]\Samples\WinBase\Service, and they should not use MS VC 8 but version 4 or so...

Note that the grey highlighted code is my code; the normally colored code is what is left over of what MS Visual Studio created for you through the wizard. In addition, I've snipped all trace & debugging code.

The Visual Studio 2005 wizard creates an EXE that is suitable both as a COM server (DCOM) and as a Windows service. As you can guess, that brings a lot of overhead and unneeded lines.

This service does not perform anything; keep in mind that it is a template for a 'scheduled' task service, a batch job for instance. Your code goes into DoDBMSJob...


// Service_Template.cpp : Implementation of WinMain

#include "stdafx.h"
#include "resource.h"
#include "Service_Template.h"

#pragma warning(disable: 4482) // nonstandard extension used: 'enum CServiceStatus' used in qualified name

[v1_enum] // 32 bit rulez
enum CServiceStatus
{
      run = 0,
      pauze = 1,
      stop = 2,
      shutmedown
};
class CService_Module : public CAtlServiceModuleT< CService_Module, IDS_SERVICENAME >
{
      // this flag, in milliseconds, tells us how often
      // we check for tasks
      DWORD m_iPollTime;
      HANDLE m_ThreadHandle; // eventually use CHandle
      HANDLE m_hServerStopEvent;
      bool connDone;
      CServiceStatus m_ServiceStatus;
      bool stopNow;

public :
      ~CService_Module() throw()
      {
            if (m_ThreadHandle != NULL)
                  CloseHandle(m_ThreadHandle);
            if (m_hServerStopEvent != NULL)
                  CloseHandle(m_hServerStopEvent);
      }

      CService_Module() throw() : m_iPollTime(10), m_ThreadHandle(NULL), m_hServerStopEvent(NULL),
            connDone(false), stopNow(false)
      {
            m_ServiceStatus = CServiceStatus::run;
            //m_dwTimeOut = 60000; // one minute
      }

      // we can't skip this! Leave as is created by wizard
      // (the DECLARE_LIBID / DECLARE_REGISTRY_APPID_RESOURCEID lines generated by the wizard)

      HRESULT InitializeSecurity() throw()
      {
            // we don't need this here
            return S_OK;
      }



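      // ATL's Handler() calls OnPause(), so the method must carry exactly that name
      void OnPause() throw()
      {
            m_ServiceStatus = CServiceStatus::pauze;
            if (m_hServerStopEvent != NULL)
                  SetEvent(m_hServerStopEvent);
      }

      void OnStop() throw()
      {
            m_ServiceStatus = CServiceStatus::stop;
            if (m_hServerStopEvent != NULL)
            {
                  SetEvent(m_hServerStopEvent);
                  stopNow = true;
                  // be sure not to kill the thread & process
                  // before it nicely ends;
                  // otherwise, our pointers will 'Release()' while
                  // the instance has been killed already, a common bug!
                  DWORD tc = GetTickCount();
                  // wait until the worker thread has released the event, max 20 seconds
                  while (m_hServerStopEvent != NULL && GetTickCount() - tc < 20000)
                        Sleep(100);
            }
            // let ATL report SERVICE_STOP_PENDING and post WM_QUIT to the message loop
            __super::OnStop();
      }

      void OnContinue() throw()
      {
            m_ServiceStatus = CServiceStatus::run;
            if (m_hServerStopEvent != NULL)
                  SetEvent(m_hServerStopEvent);
      }

      // ATL's Handler() calls OnShutdown() (lower-case 'd')
      void OnShutdown() throw()
      {
            m_ServiceStatus = CServiceStatus::shutmedown;
            if (m_hServerStopEvent != NULL)
                  SetEvent(m_hServerStopEvent);
      }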




      HRESULT RegisterAppId(bool bService = false) throw()
      {
            // we extend our service description!
            // on W2k and higher, this is user friendly!
            // do not forget to add a string to the string table,
            // here 'IDS_Description'
            HRESULT hr = __super::RegisterAppId(bService);
            if (bService)
            {
                  if (IsInstalled())
                  {
                        // SC_MANAGER_CONNECT is enough to open an existing service
                        SC_HANDLE hSCM = ::OpenSCManagerW(NULL, NULL, SC_MANAGER_CONNECT);
                        SC_HANDLE hService = NULL;
                        if (hSCM == NULL)
                              hr = AtlHresultFromLastError();
                        else
                        {
                              hService = ::OpenServiceW(hSCM, m_szServiceName, SERVICE_CHANGE_CONFIG);
                              if (hService != NULL)
                              {
                                    const int m_szServiceDescriptionLen = 4096;
                                    WCHAR m_szServiceDescription[m_szServiceDescriptionLen] = {0};
                                    ::LoadStringW(_AtlBaseModule.GetResourceInstance(),
                                           IDS_Description, m_szServiceDescription, m_szServiceDescriptionLen);
                                    SERVICE_DESCRIPTIONW sdBuf = {m_szServiceDescription};
                                    BOOL res = ::ChangeServiceConfig2W(hService, SERVICE_CONFIG_DESCRIPTION, &sdBuf);
                                    if (res == FALSE)
                                          hr = AtlHresultFromLastError();
                                    ::CloseServiceHandle(hService);
                              }
                              else
                                    hr = AtlHresultFromLastError();
                              ::CloseServiceHandle(hSCM);
                        }
                  }
            }
            return hr;
      }


      HRESULT PreMessageLoop(int nShowCmd) throw()
      {
            // problem: how to set a timer? we must provide the proc
            // a pointer to this
            // we could hack it and use HWND instead?
#ifdef _DEBUG
            // (trace & debugging code snipped)
#endif
            HRESULT hr = __super::PreMessageLoop(nShowCmd);
            // if we don't have any COM classes, RegisterClassObjects
            // returns S_FALSE.
            // This causes the process to terminate.
            // We don't want this, so we return S_OK in this case
            if (hr == S_FALSE) hr = S_OK;
            if (m_bService == TRUE && hr == S_OK)
            {
                  m_ThreadHandle = CreateThread(NULL, 0, mainJob, this, 0, 0);
            }
            return hr;
      }


      void __stdcall set_CollectTime(INT iCollectTime) throw()
      {
            m_iPollTime = iCollectTime;
      }

      INT get_CollectTime() throw()
      {
            return m_iPollTime;
      }

      void __stdcall SetEventHandleForStop(HANDLE eventHandle) throw()
      {
            m_hServerStopEvent = eventHandle;
      }

      CServiceStatus GetServiceStatus() throw()
      {
            return this->m_ServiceStatus;
      }


      void CALLBACK TimerProc(PVOID pdata) throw()
      {
#ifdef _DEBUG
            // (trace & debugging code snipped)
#endif
            HRESULT hr;
            // your registry and initialization comes here
            hr = DoDBMSJob(iKeepConnection, bstrConnBuff, dwVersion);
      }

      // here could be your task
      STDMETHODIMP DoDBMSJob(int iAction, PCWSTR bstrConnBuff, DWORD dwVersion) throw()
      {
            /* iAction 0 do work and exit
             * iAction 1 do work, cache connection and exit
             * iAction 2 release cached connection and exit
             */
            HRESULT hr = S_OK;
            // Your task and cleanup code comes here...
            // cleanup code _COULD_ be done in the main class
            // but do not forget, if you clean up there, you might
            // fall into the Thread Apartment trap!
            // Some COM objects are not free-threaded,
            // so if you clean them up when Windows
            // tells you to stop execution, on the thread Windows
            // 'gives' to you, the instance might fail and crash
            return hr;
      }
};





CService_Module _AtlModule;

DWORD WINAPI mainJob(LPVOID lpThreadParameter) throw()
{
      LARGE_INTEGER liDueTime;
      // fetch the pointer to our main class
      CService_Module* p = static_cast<CService_Module*>(lpThreadParameter);

      DWORD dwError = ERROR_SUCCESS;
      // optimize memory usage
      // should automatically increase if needed
      //::SetProcessWorkingSetSize(GetCurrentProcess(), 1280000, 2560000);

      DWORD dwWait = 0;

      HANDLE hServerStopEvent = CreateEventW(
                                 NULL,    // no security attributes
                                 TRUE,    // manual reset event
                                 FALSE,   // not-signalled
                                 NULL);   // no name

      // array with the 2 events we want to monitor
      HANDLE hEvents[2] = { hServerStopEvent,
                            ::CreateWaitableTimerW(NULL, TRUE, L"ASP_Session_Collect") };

      // hand the event to the service class, so OnStop() and friends can signal us
      p->SetEventHandleForStop(hServerStopEvent);

      for (;;) // endless loop
      {
            INT iPollTime = p->get_CollectTime();
            // 100-nanosecond units; do the math in 64 bits to avoid overflow
            liDueTime.QuadPart = -static_cast<LONGLONG>(iPollTime) * 10000;
            // this timer is not periodic and is re-armed on each loop, so a negative value means a 'relative time'.
            if (::SetWaitableTimer(hEvents[1], &liDueTime, 0, NULL, NULL, FALSE) == FALSE)
            {
                  dwError = ::GetLastError();
                  //MessageBox(NULL, _T("CreateTimerFailed"), _T("Error"), MB_OK | MB_ICONERROR);
                  goto cleanup;
            }
            // if your Service is Apartment threaded
            // use MsgWaitForMultipleObjectsEx!
            dwWait = ::WaitForMultipleObjects(2, hEvents, FALSE, INFINITE);

            if (dwWait == WAIT_OBJECT_0 + 1)
            {
                  p->TimerProc(NULL); // no need to suspend the timer since it is re-armed each time
            }
            else // it was not the timer but a service event
            {
                  CServiceStatus status = p->GetServiceStatus();
                  if (status == CServiceStatus::stop || status == CServiceStatus::shutmedown)
                        break;

                  while (status == CServiceStatus::pauze)
                  {
                        status = p->GetServiceStatus();
                        Sleep(1000); // wait 1 second
                  }
                  if (status == CServiceStatus::stop || status == CServiceStatus::shutmedown)
                        break;
                  // not timer event - error occurred,
                  // ...
            }
      }

cleanup:
      p->DoDBMSJob(2, NULL, 0);

      if (hEvents[1])
            CloseHandle(hEvents[1]);

      if (hServerStopEvent != NULL)
      {
            CloseHandle(hServerStopEvent);
            // tells OnStop() that the worker thread is done
            p->SetEventHandleForStop(NULL);
      }

      return 0;
}




extern "C" int WINAPI _tWinMain(HINSTANCE, HINSTANCE, LPTSTR lpCmdLine, int nShowCmd) throw()
{
    return _AtlModule.WinMain(nShowCmd);
}
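
Two bits of arithmetic in this listing are easy to get wrong: the relative due time passed to SetWaitableTimer (negative, in 100-nanosecond units) and the 'max 20 seconds' wait in OnStop. Below is a small, portable sketch of both; the helper names RelativeDueTime and TimedOut are made up for this example and are not part of the service.

```cpp
#include <cstdint>

// Relative due time for a waitable timer: negative, in 100-nanosecond units.
// The multiplication is done in 64 bits so large poll times cannot overflow.
std::int64_t RelativeDueTime(std::uint32_t milliseconds)
{
    return -static_cast<std::int64_t>(milliseconds) * 10000;
}

// True once 'limitMs' milliseconds have passed since 'startTick'.
// Unsigned subtraction keeps this correct when a 32-bit tick counter
// (such as the one GetTickCount returns) wraps around.
bool TimedOut(std::uint32_t startTick, std::uint32_t nowTick, std::uint32_t limitMs)
{
    return static_cast<std::uint32_t>(nowTick - startTick) >= limitMs;
}
```

A bounded wait loop then reads: keep waiting while the worker is still busy AND TimedOut(...) is still false.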



// We would not be complete without listing stdafx.h. This time, I've set compatibility with W2k, which should be sufficient for most services.

// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently,
// but are changed infrequently

#pragma once

#define STRICT

// Modify the following defines if you have to target a platform prior to the ones specified below.
// Refer to MSDN for the latest info on corresponding values for different platforms.
#define WINVER 0x0500         // Change this to the appropriate value to target other versions of Windows.
#define _ATL_ATTRIBUTES       // allow db_command etc...
#define _WIN32_WINNT 0x0500   // Change this to the appropriate value to target other versions of Windows.
#define _WIN32_WINDOWS 0x0500 // Change this to the appropriate value to target Windows Me or later.

#ifndef _WIN32_IE             // Allow use of features specific to IE 6.0 or later.
#define _WIN32_IE 0x0550      // Change this to the appropriate value to target other versions of IE.
#endif

// explicit disable since our main service ONLY will serve as a link between windows/registration and our mainJob function
// ...

#include "resource.h"
#include <atlbase.h>
#include <atlcom.h>
#include <atldbcli.h>

using namespace ATL;



Here comes a great ATL feature: support for event log writing! Put these lines in the RGS file that Visual Studio created for you. If you don't, the event log will report this: "The description for Event ID ( 0 ) in Source ( CTemplateService ) cannot be found" (etc.).

Add this text to CTemplateService.rgs:

HKLM
{
      NoRemove SYSTEM
      {
            NoRemove CurrentControlSet
            {
                  NoRemove Services
                  {
                        NoRemove EventLog
                        {
                              NoRemove Application
                              {
                                    'CTemplate Service'    <-- this name must match IDS_SERVICENAME!
                                    {
                                          val 'EventMessageFile' = s '%MODULE_RAW%'
                                          val 'TypesSupported' = d 7
                                    }
                              }
                        }
                  }
            }
      }
}








Step 2

Add one resource file (Visual Studio does not support this!) and give it the extension .mc, for instance message.mc.

The contents will be:






Your Company: Template Service, returned the following error: %1


<-- one empty line!

In Visual Studio, open the properties of this file and enter as the command line:

mc message.mc

As outputs, enter: message.h message.rc

In your stdafx.h you now need to include "message.h".


Now you have included message.rc as a resource. A message file is a superior way to include all languages in one executable or DLL, while the active thread determines the language ID that is actually displayed in the UI. This trick already existed on NT 3.x, but not many developers / translators use it for internationalization. As you see, NT -requires- you to have a message resource file just to support logging to the Event Log store. FYI: Windows ® (Vista) will soon support W3C-'like' logging as well! Soon we can say goodbye to the rather unpleasant and difficult-to-implement event log.



According to the ATL documentation, you can deploy your compiled service as follows:

Install it with: CTemplateService.exe -Service

Uninstall it with: CTemplateService.exe -UnregServer



How easily can one port a product from Win32 to x64? It depends on whether or not you ignore what your mom told you: "don't listen to strangers..." (PS: ignoring this only applies to developing!)

You can learn from other people's mistakes, right? So in the past, years ago, when MSDN Magazine and others told us to use INT_PTR instead of INT etc. in certain code (you know the drill), I was not too lazy to modify my code. So I listened to strangers...

So today I took an old project, still running and shining, ISP Session, and wanted to offer x64 support. I can tell you this: if only I had listened a little better to the wise guys!

Just kidding. In short, everything works on x64 systems. Visual Studio 2005 is a 32-bit process, but it fluently debugs a 64-bit process (I attached to W3WP.exe, for instance). It stopped at my breakpoints and showed the source code, I stepped through, etc. It was a good experience!

So much for the MVP part, the good words for Microsoft!

Of course, you bet, my code did not 'just compile' and run. Not because I had ignored 64-bit warnings that the Visual Studio compiler might have produced, but because of some 'optimizations' I made to parts of my code in the past, which I should have done better.

Here they are...

1) If MSDN tells you that the memory you hand to CreateStreamOnHGlobal must be allocated through GlobalAlloc as 'movable memory', please do so! I had found that the code ran faster when using fixed memory (in a case where the memory stream did not have to grow). But Win x64 does not like this, and punished me by presenting a GP fault.

2) If you think that BSTRs work exactly the same as before and you don't have to modify code there, you're right, but this did not apply to me! I was managing my own BSTR allocation replacements (again for improved efficiency on IIS) and I found that a BSTR allocation is somewhat different. On 64-bit systems it is as follows:

BSTR memory layout: [4 bytes (not used)] [4 bytes (length prefix)] wchar_t[length] [\0]; the total length of the allocation is aligned on 16 bytes.
The 4 unused bytes are rather mysterious, since SysStringLen() still returns a UINT length (not a UINT_PTR). Maybe this is because of future plans, far, far away?
On 32-bit systems, the prefix is really 4 bytes, and not 8.
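
To make the layout concrete, here is a small sketch that computes the size of one such allocation from the description above. It is only arithmetic, not the real OLE Automation allocator; the names kPrefix64, kPrefix32, RoundUp and BstrAllocationSize are made up for this example, and the alignment is passed in as a parameter because the text only states the 16-byte figure for the 64-bit case.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in front of the characters, as described in the text:
// 8 on 64-bit (4 unused + 4 length prefix), 4 on 32-bit.
const std::size_t kPrefix64 = 8;
const std::size_t kPrefix32 = 4;

// Round 'n' up to the next multiple of 'alignment'.
std::size_t RoundUp(std::size_t n, std::size_t alignment)
{
    assert(alignment > 0);
    return (n + alignment - 1) / alignment * alignment;
}

// Size in bytes of one BSTR allocation holding 'charCount' UTF-16 code units.
std::size_t BstrAllocationSize(std::size_t charCount,
                               std::size_t prefixBytes,
                               std::size_t alignment)
{
    const std::size_t charBytes  = charCount * 2; // wchar_t is 2 bytes on Windows
    const std::size_t terminator = 2;             // the trailing L'\0'
    return RoundUp(prefixBytes + charBytes + terminator, alignment);
}
```

For example, BstrAllocationSize(4, kPrefix64, 16) covers 8 prefix bytes, 8 character bytes and 2 terminator bytes, which rounds up from 18 to 32.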

For those who are interested in the details:
download it: http://technolog.nl/eprogrammer/bstrnocache.zip
(In combination with my CComBSTR replacement, BSTR heap management performance is outstanding, without using any cache)

3) Some third-party code, ZLIB, had to be recompiled into a valid 64-bit lib. If you forget to do this, your DLL will link to the wrong 'guy'. By the way, zlib version 1.2.3 just recompiled perfectly (kudos to Jean-loup Gailly and Mark Adler!). To tell the C++ compiler to link to the correct lib, I had to add some conditional code in stdafx.h. Maybe there is a smarter way to do this. Let me know if so...

#ifdef _WIN64
#pragma comment(lib, "zlib64.lib")
#else
#pragma comment(lib, "zlib.lib")
#endif

After making my product strict again, it works flawlessly on x64 systems (and faster!).

I have an AMD Athlon 64 system, and of course, my preferred platform today is still Win32 on XP when I develop for server environments.

In the past, I always chose an Intel platform, because of driver madness and just the affinity that the OS seems to have with Intel. But now, on my system, I never have problems. Great. But that's not my story; I just wanted a system that could do x64 as well as 32-bit. Unfortunately, I still cannot test IA64 systems.

Question: Is going 64-bit just hype? Mmm, if I were the CEO of a CPU-baking company, I'd tell you, 'of course not!'.

As eprogrammer, I'll tell you that I agree with him :). Why? Because of hard numbers!
And if you don't love hard numbers, you're certainly not like me...
Most tests involve graphics. Well, that's important as well, but what if we skip graphics? Does a 64-bit environment really beat 32-bit even when we don't do 64-bit math?
So here we go.

I have a test system: an AMD Athlon64 3200+ with 1 gigabyte of RAM, the fastest memory that exists for it. Don't be jealous, you will soon get a better system from your mom!

Our test does the following: it opens a big file and encodes it to a base64 string. This is a good test, since both integer math and memory allocations play a role.
The encoding is done through a tiny COM component that I quickly wrote for this purpose. It just uses the ATL framework header <atlenc.h>, which has full support for base64 encoding and decoding.
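
To give an idea of the work being measured, here is a standalone base64 encoder in plain C++. It is a stand-in sketch, not the ATL routine the component calls and not the component's source; the function name Base64Encode and the std::string interface are chosen for this example only. Every 3 input bytes become 4 output characters, and the tail is padded with '='.

```cpp
#include <cstddef>
#include <string>

// Encodes 'input' as base64 with '=' padding and no line breaks.
std::string Base64Encode(const std::string& input)
{
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    out.reserve((input.size() + 2) / 3 * 4);   // one allocation up front

    std::size_t i = 0;
    // Full groups of 3 bytes -> 4 characters.
    while (i + 3 <= input.size()) {
        const unsigned v = (static_cast<unsigned char>(input[i]) << 16) |
                           (static_cast<unsigned char>(input[i + 1]) << 8) |
                            static_cast<unsigned char>(input[i + 2]);
        out.push_back(kAlphabet[(v >> 18) & 0x3F]);
        out.push_back(kAlphabet[(v >> 12) & 0x3F]);
        out.push_back(kAlphabet[(v >> 6) & 0x3F]);
        out.push_back(kAlphabet[v & 0x3F]);
        i += 3;
    }

    // Remaining 1 or 2 bytes, padded with '='.
    const std::size_t rest = input.size() - i;
    if (rest == 1) {
        const unsigned v = static_cast<unsigned char>(input[i]) << 16;
        out.push_back(kAlphabet[(v >> 18) & 0x3F]);
        out.push_back(kAlphabet[(v >> 12) & 0x3F]);
        out += "==";
    } else if (rest == 2) {
        const unsigned v = (static_cast<unsigned char>(input[i]) << 16) |
                           (static_cast<unsigned char>(input[i + 1]) << 8);
        out.push_back(kAlphabet[(v >> 18) & 0x3F]);
        out.push_back(kAlphabet[(v >> 12) & 0x3F]);
        out.push_back(kAlphabet[(v >> 6) & 0x3F]);
        out.push_back('=');
    }
    return out;
}
```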

Then I have a VBScript tester. It has the following lines...
(It opens oembios.bin; I just chose this file because it had to be big, so take your pick to redo the test.)

Dim obj, v, t
Set obj = CreateObject("NWC.Decode")
t = Timer
obj.readfile "c:\windows\system32\oembios.bin" '12.5 MB
WScript.Echo "FileRead: " & Timer - t
t = Timer
obj.ToBase64 v
WScript.Echo "Encode: " & Timer - t

On XP, the 32-bit OS, this takes a whopping 0.21 seconds.
FYI, on Windows 2000 it also takes 0.21 seconds. You see, in -this- particular test the newer OS shows no improvement in OLE Automation & COM performance.

When I compile the C++ COM object with all optimizations disabled, it takes 0.65 seconds. So you get an idea of the difference a smart compiler makes.

On Windows Server x64 edition, the same script (it opens the same file), with the COM object compiled to x64 code, takes just 0.11 seconds!

Of course, a good performance test involves pure tests; measuring a mix of operations would garble our output.
So I improved the COM object: it now uses two memory buffers (one for decoding, the other for the binary contents) and only resizes them if a bigger allocation is needed.
In addition, the function ToString(), which returns a variant, became a 'byref' method. If you do so, you can reuse and reallocate string space (not many OLE Automation programmers are aware of this efficiency step).
This makes a difference, since our file was 12.5 MB in size. An encoded base64 Unicode string needs 35 MB of RAM for string storage, on top of 12.5 MB of binary space plus a conversion buffer of 17 MB (because ATL assumes you use a non-wide string). It makes sense not to destroy that heap space and recreate it at each call.
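
Those sizes can be checked with a few lines of arithmetic. The sketch below assumes that 12.5 MB means 12.5 x 2^20 bytes, and it models optional line breaks as a CRLF after every 76 characters (MIME style); whether the component's ATL call emits exactly that is an assumption on my part. The function names are made up for this example.

```cpp
#include <cstdint>

// Characters produced by plain base64 (with '=' padding) for n input bytes.
std::uint64_t Base64Chars(std::uint64_t n)
{
    return (n + 2) / 3 * 4;
}

// Same, plus a 2-character CRLF after every line of up to 76 characters.
std::uint64_t Base64CharsWithLineBreaks(std::uint64_t n)
{
    const std::uint64_t chars = Base64Chars(n);
    const std::uint64_t lines = (chars + 75) / 76;
    return chars + 2 * lines;
}

// Bytes needed to hold that many characters as a UTF-16 (wide) string.
std::uint64_t Utf16Bytes(std::uint64_t chars)
{
    return chars * 2;
}
```

For 13,107,200 input bytes, Base64Chars gives 17,476,268 characters for the narrow conversion buffer, and Utf16Bytes of that gives 34,952,536 bytes for the wide string, which is in line with the 17 MB and 35 MB mentioned above.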
So, the first time we encode, we measure memory allocations and math; the second time, all strings and allocations are -reused-, not (re)allocated!
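
The 'only resize when a bigger allocation is needed' idea can be sketched in a few lines of portable C++. This is an illustration of the technique, not the component's code; the class name ReusableBuffer and its members are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// A buffer that keeps its memory between calls and only grows,
// so repeated work on same-sized (or smaller) inputs costs no heap traffic.
class ReusableBuffer
{
public:
    ReusableBuffer() : used_(0), reallocations_(0) {}

    // Make room for 'needed' bytes. The heap is only touched when the
    // request exceeds what is already held.
    unsigned char* Acquire(std::size_t needed)
    {
        if (needed > storage_.size()) {
            storage_.resize(needed);
            ++reallocations_;
        }
        used_ = needed;
        return storage_.empty() ? nullptr : &storage_[0];
    }

    std::size_t Used() const          { return used_; }
    std::size_t Capacity() const      { return storage_.size(); }
    std::size_t Reallocations() const { return reallocations_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;           // bytes in use by the current call
    std::size_t reallocations_;  // how often we really had to grow
};
```

With this, encoding the 12.5 MB file, then the 10.5 MB file, then the 12.5 MB file again grows the buffer only once.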

'Hard' numbers! (finally)
Here is a typical output of our script, base64-encoding big files three times. The first time (yellow), the heap cache is not effective yet; the second time (blue), I open a different file (now 10.5 MB); and the third time (yellow again), I reopen the first 12.5 MB file.

[Results table, columns: x64 environment | win32 environment. Rows: Readfile 1 (test 1), Readfile 2 (test 2, 10.5 MB) and Readfile 1 (test 3), each followed by the final base64 string length. Surviving values: 0.203 (! see remarks *) for test 1 and a final string length of 35 MB for test 3.]

Update: Added a WOW64 test, i.e. a VBS running in 32-bit mode and a 32-bit COM server in emulation mode.

Readfile 1: 0.016
Encode:      0.09

ReadFile 2: 0.015
Encode:      0.047

Readfile 1: 0.016
Encode:      0.09

On an Intel Pentium 4 with HT at 2.7 GHz, the numbers hardly differ from the results on Windows x64.

Differences measured
File I/O read performance: hardly any difference
Memory allocation: +/- 100% faster on x64
32-bit (non-64-bit) integer math: +/- 25% faster on x64.

The biggest difference in our silly, simple test was in how memory is dealt with. After all, the conclusion is fairly explainable. On the x64 system, disk I/O read performance is not measurably faster, since the real bottleneck here must be the hardware, not integer math.

Memory heap allocation speed: here we see a 100% performance boost. Or is our test on the AMD 64 running a 32-bit OS just 100% slower because of the 'AMD 32-bit CPU emulation mode'? I think that the test that also ran on an Intel P4 system answers that question.

What about the simple integer math? It performs slightly better on an x64 system, because our MS C++ compiler uses as many of the available 'spare registers' as possible, while the x32 (w)Intel CPU design only has 4 spare registers (like EAX, EBX, ECX and EDX) :).

And to those guys who blamed WOW64 mode for causing 'very slow performance' in emulation mode: this test has debunked that accusation.

  1. If you have 32-bit software that will be running in an x64 environment, it won't run slower (but on IA64 it will!). That is an argument not to port the software, isn't it? Of course, if you build good old COM servers, you might think about migrating, if you've not yet decided to migrate to .NET (MSIL), since COM components cannot be emulated (i.e. a 64-bit process cannot load a 32-bit COM server in-process). I also guess that Microsoft won't port Office to x64 so soon for these reasons.
  2. * It seems that when you run a 32-bit OS on an AMD 64-bit processor, execution and heap management are a lot slower (100% difference), as the tests show, than when you boot a 64-bit OS (Windows x64 here).
  3. On an Intel P4 with a comparable CPU performance index, the same performance was measured as in the 64-bit environment.

Our final conclusion comes down to the following.

Let's first rephrase our question. Does a standard 32-bit software package profit from being ported to 64-bit?
It depends. If your software needs 64-bit calculations, as is the case with graphical software, yes!
If the software, like our test, just does some 32-bit math and some hard disk activity, it really makes no sense to spend money on porting. Just make use of the perfect WOW64 design done by Microsoft!

Another performance chart shows that software execution is not so much faster because of a smart new set of CPU instructions, but because the 4 GB addressable memory limit is gone! See the Active Directory benchmark.
For instance, an Active Directory benchmark showed a whopping 10,473 percent performance improvement when searching a non-indexed attribute with 3,000,000 users! Why that much? Because Active Directory on a 64-bit architecture allows the database to be fully cached in memory. And there you see how the bucks spent on porting to 64-bit are earned back quickly!

Source code for the COM component (MS Visual C++ 8.0) is here...