Microsoft recently announced (October 2012) its first deployment of Hadoop for Windows Server and Windows operating systems. Previously, Microsoft had made a deployment available in Windows Azure, including machine learning demos for Mahout. This new installation extends the technology for Windows (Mahout is NOT included in this first deployment, but I believe we can reasonably expect to start having that data mining option available in future deployments).
Because this is a new deployment, the software does not yet install completely on every version of Windows; releases after October 2012 will likely resolve these installation issues. In the meantime, early adopters such as myself are willing to experiment with the installation settings to get the program working well. Debugging this October 2012 installation requires understanding:
- zip files
- Internet Information Services (IIS)
Having installed it successfully on several of my computers which have Windows 8 Professional, I describe the steps to make the software work on this operating system.
First, the software installs through Web Platform Installer. You will see this option if you have enabled Internet Information Services (go to Control Panel, Programs, Turn Windows Features on or off). Make sure the following elements are all on:
- Windows Communication Foundation HTTP Activation (under .NET Framework 3.5)
- ASP.NET 4.5 (under .NET Framework 4.5 Advanced Services)
- ASP.NET 3.5 (under Application Development Features)
- ASP.NET 4.5 (under Application Development Features)
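If you would rather script this step than click through the Windows Features dialog, the following is a minimal Python sketch that only builds the DISM command lines for the four elements listed above; it does not run anything. The DISM feature names in the table are my assumption about how these checkboxes are named on Windows 8, so confirm them with "dism /online /get-features" before relying on them.

```python
# Map each checkbox from the list above to a DISM feature name.
# ASSUMPTION: these feature names are my best guess for Windows 8;
# verify them with `dism /online /get-features` on your machine.
FEATURES = {
    "WCF HTTP Activation (.NET Framework 3.5)": "WCF-HTTP-Activation",
    "ASP.NET 4.5 (.NET Framework 4.5 Advanced Services)": "NetFx4Extended-ASPNET45",
    "ASP.NET 3.5 (Application Development Features)": "IIS-ASPNET",
    "ASP.NET 4.5 (Application Development Features)": "IIS-ASPNET45",
}


def enable_commands(features=FEATURES):
    """Return one DISM argument list per feature. Nothing is executed."""
    return [
        ["dism.exe", "/online", "/enable-feature", f"/featurename:{name}", "/all"]
        for name in features.values()
    ]


def format_commands(commands):
    """Join each argument list into a single line for pasting into a console."""
    return [" ".join(cmd) for cmd in commands]
```

Printing format_commands(enable_commands()) gives four lines that you can review and then paste into an elevated command prompt. I kept the sketch from executing DISM itself so that you can check the feature names first.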
From reading the MSDN forums, I learned that the element most often missed is “Windows Communication Foundation HTTP Activation”, which relates to how ASP.NET 3.5 requests are resolved. Internet Information Services settings are NOT standard from machine to machine; in my case, I had already tinkered with mine because of a Silverlight application. Developers should take none of these settings for granted, and should understand exactly what the minimum IIS installation settings are for each and every application deployment. I make this statement now, and you should hold all future Microsoft projects to this best-practice deployment standard. Also, if you are deploying any software project that depends on IIS, have an IIS expert look at your deployment settings so that you can best help your users and avoid undermining confidence in your work.
If this HTTP activation is not set, then you might receive the following error:
Server Error in ‘/’ Application.
Could not load type ‘System.ServiceModel.Activation.HttpModule’ from assembly ‘System.ServiceModel, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089’. Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.TypeLoadException: Could not load type ‘System.ServiceModel.Activation.HttpModule’ from assembly ‘System.ServiceModel, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089’.
An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
To prevent this error, turn on “Windows Communication Foundation HTTP Activation” in the Windows Features list described above.
To continue with the HDInsight installation, open the IIS Manager (Control Panel, System and Security, Administrative Tools, IIS Manager).
Then choose the option to get Web Platform components; this should either open “Web Platform Installer 4.0” or offer to install that software.
Search for “HDInsight” in the search box and then click “Add” and “Install”.
Click through the installation, and the software will start to download to your computer (expect to wait a few minutes).
I then received a pop-up window asking to allow the Java installation used by Hadoop through the Windows firewall. Though I chose the default settings, make sure the settings you choose are appropriate for your environment. Java is essential for Hadoop, but because writing jobs directly in Java is comparatively hard, people have created easier alternatives such as Pig and Hive for submitting commands to Hadoop.
Then the software reports that it is done, but don’t be so sure: this incomplete state is exactly why I am blogging on this topic.
On the desktop, I see three new icons, with the following names:
- Hadoop Command Line
- Hadoop Name Node Status
- Hadoop Map Reduce Status
If you click either of the “Status” icons, it opens a window that makes it look like the process is finished. I did this a few times and thought I was done too, until I started reading the MSDN forums.
What is missing is the HDInsight Dashboard shortcut icon. The files install to the “C:\” location on the hard drive, and here are the new folders you should see:
There may be conditions under which you might NOT see these four directories. Here is one:
- You are installing on a server domain controller
Sorry, there is currently no solution: you will have to wait for Microsoft to improve its installation procedure. The issue is that this installation assumes it can create a user named “hadoop” and assign user rights to that new user, whereas domain users are managed under a more complex structure. I can see that complexity in the PowerShell and command files inside the ZIP files. However, I would not recommend altering those scripts to force the installation through, since your changes might lead to a configuration that diverges from where Microsoft is going. The uninstall PowerShell modules were written to reverse the changes the installer makes; if you make your own changes, you will have to remember to reverse them yourself before installing a future HDInsight distribution. If you customize the installation too much, you will also find it harder to get help from the community at the MSDN forums when you need technical support. The best practice is to wait for Microsoft to fix such issues.
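If you are not sure whether a machine counts as a domain controller, you can check before you start. Below is a small Python sketch, not part of the HDInsight installer, that reads the DomainRole value of the Win32_ComputerSystem WMI class through the wmic command. It assumes wmic is on the path, and the function names are my own.

```python
import subprocess

# DomainRole values of the Win32_ComputerSystem WMI class.
DOMAIN_ROLES = {
    0: "Standalone Workstation",
    1: "Member Workstation",
    2: "Standalone Server",
    3: "Member Server",
    4: "Backup Domain Controller",
    5: "Primary Domain Controller",
}


def parse_domain_role(wmic_output):
    """Pull the integer out of `wmic computersystem get domainrole` output."""
    for line in wmic_output.splitlines():
        line = line.strip()
        if line.isdigit():
            return int(line)
    raise ValueError("no DomainRole value found in wmic output")


def is_domain_controller(role):
    """Roles 4 and 5 are the two domain controller roles."""
    return role in (4, 5)


def query_domain_role():
    """Ask the local machine for its role (Windows only, needs wmic)."""
    result = subprocess.run(
        ["wmic", "computersystem", "get", "domainrole"],
        capture_output=True, text=True, check=True,
    )
    return parse_domain_role(result.stdout)
```

On the machine you plan to install on, is_domain_controller(query_domain_role()) returning True means you are in the situation described above and should wait for a later release.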
Given that your installation has the proper four directories, I will now show you how to install the missing fourth icon. This procedure does NOT deviate from the original intent, and therefore remains consistent with the original deployment design.
You will need to unzip the following two files from their installation directory (if you have only one drive, it should be the “C” drive, but if you have several hard drives, it may be another drive letter):
- <Install Drive>:\HadoopFeaturePackSetup\Packages\HadoopDashboard-winpkg.zip
- <Install Drive>:\HadoopFeaturePackSetup\Packages\HadoopWebApi-winpkg.zip
Then, you need to modify the following file: Scripts\Modules\iis-prerequisitecommon.psm1, changing line 118 from “dism.exe” to “c:\windows\system32\dism.exe”. Both zip files contain an identical copy of this file, so once you have made the change in one, you can reuse the same PSM1 file in the other. There are two versions of DISM.exe on my system, one in the system32 directory and one in the syswow64 directory; although I am running 64-bit Windows 8 Professional, only the system32 version works (which I know from trying the command in a PowerShell session). I recommend trying the system32 version first, and if it does not work (you will know from an error), trying the syswow64 version.
Once you have changed the file in both packages, rezip the contents and replace the original zip packages (if you want, you can rename the originals instead of deleting them). You will also need to remove (or move) the unzipped folders, because the installation procedure will want to unzip to the same directory as the zip files.
Next, open a PowerShell session using “Run as Administrator”. Run the following commands (the install drive is “C” if you have one drive, but may be another drive if you have multiple hard drives):
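The unzip, edit, and rezip cycle can also be done in one pass. Here is a hedged Python sketch using the standard zipfile module; it is my own illustration, not something shipped with HDInsight. It assumes the PSM1 file is stored in a single-byte encoding such as ASCII or UTF-8 (it stops with an error otherwise), and that line 118 is still the line that calls dism.exe in your copy of the package.

```python
import shutil
import zipfile
from pathlib import Path

# Path of the script inside each package, compared case-insensitively.
TARGET = "scripts/modules/iis-prerequisitecommon.psm1"
FULL_DISM = rb"c:\windows\system32\dism.exe"


def patch_line(data, line_number=118, full_path=FULL_DISM):
    """Replace the bare dism.exe on one 1-indexed line of a byte string."""
    lines = data.splitlines(keepends=True)
    idx = line_number - 1
    if idx >= len(lines):
        raise ValueError(f"file has fewer than {line_number} lines")
    if full_path.lower() in lines[idx].lower():
        return data  # already patched, leave untouched
    if b"dism.exe" not in lines[idx]:
        raise ValueError(f"line {line_number} does not mention dism.exe")
    lines[idx] = lines[idx].replace(b"dism.exe", full_path, 1)
    return b"".join(lines)


def patch_package(zip_path, line_number=118):
    """Rewrite the package in place, keeping the original as <name>.orig.

    Returns the number of files patched (expected: 1 per package).
    """
    zip_path = Path(zip_path)
    backup = zip_path.with_name(zip_path.name + ".orig")
    shutil.copy2(zip_path, backup)
    patched = 0
    with zipfile.ZipFile(backup) as src, \
            zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            name = info.filename.replace("\\", "/").lower()
            if name.endswith(TARGET):
                data = patch_line(data, line_number)
                patched += 1
            dst.writestr(info, data)  # every other entry is copied as-is
    return patched
```

You would call patch_package once for each of the two packages listed above. Because the archive is rewritten without unpacking it to disk, there is no unzipped folder to clean up afterwards, and the ".orig" copy plays the role of the renamed original. To try the syswow64 version instead, pass a different full_path to patch_line.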
- Set-ExecutionPolicy RemoteSigned
- cd <Install Drive>:\HadoopFeaturePackSetup\HadoopFeaturePackSetupTools
- .\winpkg.ps1 ..\Packages\HadoopWebApi-winpkg.zip install -CredentialFilePath c:\Hadoop\singlenodecreds.xml
- .\winpkg.ps1 ..\Packages\HadoopDashboard-winpkg.zip install -CredentialFilePath c:\Hadoop\singlenodecreds.xml
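Since the install drive varies from machine to machine, it can help to generate the exact command lines for your drive letter before pasting them into PowerShell. The Python helper below is a hypothetical convenience of mine that only fills in the “<Install Drive>” placeholder; the commands themselves are the ones listed above, and the credential file path is kept on “c:” exactly as shown there.

```python
def winpkg_commands(install_drive="C",
                    cred_path=r"c:\Hadoop\singlenodecreds.xml"):
    """Return the four PowerShell lines from the steps above as strings."""
    if len(install_drive) != 1 or not install_drive.isalpha():
        raise ValueError("install_drive must be a single drive letter")
    tools = rf"{install_drive}:\HadoopFeaturePackSetup\HadoopFeaturePackSetupTools"
    packages = ["HadoopWebApi-winpkg.zip", "HadoopDashboard-winpkg.zip"]
    commands = ["Set-ExecutionPolicy RemoteSigned", f"cd {tools}"]
    for package in packages:
        commands.append(
            rf".\winpkg.ps1 ..\Packages\{package} install "
            rf"-CredentialFilePath {cred_path}"
        )
    return commands
```

For example, printing each entry of winpkg_commands("D") gives the sequence for a machine where the setup files landed on the D drive.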
Having performed these commands, you will now see a fourth shortcut icon on the desktop: “Microsoft HDInsight Dashboard”. If you click the icon, the HDInsight Dashboard opens.
If you look inside IIS, you can see two new sites:
- HadoopDashboard (running in my case at port 8085)
- HadoopWebAPI (running in my case at port 6001)
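A quick way to confirm that both sites are actually listening is to try a TCP connection to each port. The Python sketch below uses only the standard socket module; the port numbers are the ones from my machine, so treat them as assumptions and substitute whatever IIS Manager shows on yours.

```python
import socket

# Ports as observed on my machine; yours may differ.
SITES = {"HadoopDashboard": 8085, "HadoopWebAPI": 6001}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_sites(sites=SITES, host="localhost"):
    """Return a dict mapping each site name to True (listening) or False."""
    return {name: port_is_open(host, port) for name, port in sites.items()}
```

Calling check_sites() on the machine where HDInsight is installed should report True for both names. A False only tells you that nothing accepted the connection on that port, so check the site bindings in IIS Manager before concluding the install failed.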
Again, this October 2012 deployment is for developers. However, I decided to blog now since I want people to get started with this technology. You can expect Microsoft to release more stable versions in the future that address the problems on domain controllers (which is really an issue of how to grant user rights to the “hadoop” user). I am also hopeful for a future deployment which, like the Windows Azure version, includes Mahout; at minimum, we might expect someone to post directions on how to install Mahout on a future stable distribution. Because the current deployment is aimed at developers, I do not see much value in sharing how to do that now; we should instead wait for a stable deployment. When we have a stable HDInsight distribution, we can start talking about machine learning and data mining on Windows operating systems.
If you want to try Mahout on Windows Azure, click this link: https://www.windowsazure.com/en-us/develop/net/tutorials/hadoop-recommendation-engine/