
Sunday, March 29, 2015

Smurf: An ASA-based tool in the making - Part 1

This is the first post in a blog series capturing the learnings from developing a tool on Azure Stream Analytics (ASA) called Smurf. This introductory post covers the bits and pieces we need to create this tool.

I will not describe what ASA is or what it does; that is covered well on the MSDN sites. Instead, I will focus on the other tools required to build this project.

What are the components?

Azure Stream Analytics can work on three kinds of inputs:
1. Data from SQL Azure
2. Data from Azure Storage
3. Data from Event Hubs
We will touch each component as we proceed.
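To give a flavour of what an ASA job eventually does with such an input, here is a sketch of a Stream Analytics query. The input alias `input` and the DeviceId/Temperature fields are assumptions matching the telemetry we send later in this post, not the final Smurf query.

```sql
-- Average temperature per device over a 30-second tumbling window
SELECT DeviceId, AVG(Temperature) AS AvgTemperature
FROM input
GROUP BY DeviceId, TumblingWindow(second, 30)
```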

Starting with the Event Hub component

I am starting with the Event Hub component first because of its impressive success story with ASA. As we know, Event Hubs work on partitioned data and also help developers scale out easily, so the combination of Event Hubs and ASA should indeed be impressive.
To build Event Hub support, these are the components we will highlight as we go:
1. Send data to event hub
2. Process data based on user defined business logic (this seems tricky, let's see how it goes)
3. Receive data from event hub

Send data to Event hub

Sending data to an event hub is pretty much straightforward. You need to do the following:
1. Create the event hub client using event hub name and event hub connection string
2. Create list of eventhubdata objects
3. Start sending these objects asynchronously
4. Verify if all the data has been sent

1. Create the event hub client

/// <summary>
/// Gets the Event Hub Client
/// </summary>
/// <returns>EventHubClient object</returns>
private EventHubClient GetEventHubClient()
{
    return EventHubClient.CreateFromConnectionString(this.EventHubConnectionString, this.EventHubName);
}

2. Create a list of EventData objects

3. Start sending them to the Event Hub asynchronously

/// <summary>
/// Sends the batch of inputs
/// </summary>
/// <param name="eventDataList">List of EventData</param>
/// <returns>True if all events were sent successfully</returns>
public async Task<bool> SendBatchAsync(IEnumerable<EventData> eventDataList)
{
    try
    {
        var eventHubClient = this.GetEventHubClient();
        foreach (var eventData in eventDataList)
        {
            await eventHubClient.SendAsync(eventData);
            await Task.Delay(200);
        }
    }
    catch (Exception)
    {
        return false;
    }

    return true;
}

4. Verify using Xunit Test

[Fact]
public async Task SendEventsToEventHub()
{
    var random = new Random();
    var eventDataList = new List<EventData>();

    // Create the Event Hub data list
    for (var loop = 0; loop < 5; loop++)
    {
        var info = new MetricData { DeviceId = random.Next(NumberOfDevice), Temperature = random.Next(100) };
        var serializedString = JsonConvert.SerializeObject(info);
        var data = new EventData(Encoding.UTF8.GetBytes(serializedString))
        {
            PartitionKey = info.DeviceId.ToString()
        };
        data.Properties.Add("Type", "Telemetry_" + DateTime.Now.ToLongDateString());
        eventDataList.Add(data);
    }

    var sendObj = new Sender(this.EventHubName, this.EventHubConnectionString);
    var sentSuccess = await sendObj.SendBatchAsync(eventDataList);
    Assert.True(sentSuccess);
}

Now that send is working, let me capture the learnings on receive in my next blog.

Tuesday, December 16, 2014

Trace Investigation with HDInsight Hive query


This article describes how to parallelize trace investigations using Hadoop MapReduce, and thereby reduce the time and effort spent on investigations.

Current Problem

1. During on-call events there is a time and effort investment in recognizing the exact issues. There is no mechanism that can give a prediction of probable issues.
2. Traces are big data. Huge multi-GB files are scanned to filter out the exact traces. Currently we scan through traces that we download locally as a buffer, and parallel execution of multiple filters puts more load on the system.
3. We have 10-15 filter strings, but executing all of them in one go is not possible in the current setup.

Proposed Solution
We propose a solution of using HDInsight to run Hive queries and do parallel execution using MapReduce. MapReduce is a technique that divides large data into multiple chunks and sends them to mappers. Mappers are executors that each work on a small slice of the data and produce output. Outputs from different mappers are then combined into a cumulative result by the reducer.
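The map/reduce flow described above can be sketched in a few lines of Python. This is a toy word count over trace-like strings, not HDInsight code, purely to illustrate the mapper/reducer split:

```python
from collections import defaultdict

def mapper(chunk):
    # Each mapper works on a small chunk and emits (key, 1) pairs.
    return [(word, 1) for word in chunk.split()]

def reducer(pairs):
    # The reducer combines all mapper outputs into a cumulative result.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

chunks = ["error warning error", "warning info"]
mapped = [pair for chunk in chunks for pair in mapper(chunk)]
counts = reducer(mapped)
print(counts)  # {'error': 2, 'warning': 2, 'info': 1}
```

In a real cluster the chunks live on different nodes and the mappers run in parallel; the shape of the computation is the same.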

We can run Hive queries on the winfab traces to apply multiple known filters consecutively over the big data and filter out the traces in text format.

Before going forward, here is a little terminology that will help us understand the technologies used for this parsing.

What is Hive
Hive is a data warehousing infrastructure based on Hadoop. Hadoop provides massive scale out and fault tolerance capabilities for data storage and processing (using the map-reduce programming paradigm) on commodity hardware.
Hive is designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data. It provides a simple query language called Hive QL, which is based on SQL and which enables users familiar with SQL to do ad-hoc querying, summarization and data analysis easily. At the same time, Hive QL also allows traditional map/reduce programmers to be able to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.
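As a hypothetical example against a trace table like the one we create later in this post, a summarization query in Hive QL looks just like SQL:

```sql
-- Count trace lines per severity level
SELECT level, COUNT(*) AS occurrences
FROM trace
GROUP BY level;
```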

What is HDInsight

HDInsight is a Hadoop distribution powered by the cloud. This means HDInsight is architected to handle any amount of data, scaling from terabytes to petabytes on demand. You can spin up any number of nodes at any time.

Powershell Script

In PowerShell, execute Get-AzurePublishSettingsFile.
Currently we have an HDInsight cluster created in a different Azure account.

1. Save the publishsettings file to some location
2. Execute Import-AzurePublishSettingsFile -PublishSettingsFile <<filepath where you saved the publishsettings file>>
3. Verify the connection using Use-AzureHDInsightCluster -Name wftraceanalyzer
Note: There is an issue in connecting to the cluster; it might be because of an incompatible PowerShell and Hadoop SDK combination. Please read this link
4. After verification, execute the following PowerShell script

$clusterName = "<<hdinsight cluster name>>"
$subscriptionName = "<<subscription name>>"
$storageAccountName = "<<storage account name>>"
$storageAccountKey = "<<storage key>>"
$containerName = "<<container name>>"
$inputfilepath = "<<local folder containing the trace files>>"
$destfolder = "hive/warehouse/traces/"
$logoutputfolder = "F:\Hadoop\Trace"

$destContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey

$files = Get-ChildItem $inputfilepath\*.* -Include *.trace | Where-Object {$_.Name -match "fabric_trace"}
foreach($file in $files)
{
    $fileName = $file.Name
    $blobName = "$destfolder$fileName"
    Set-AzureStorageBlobContent -File $file -Container $containerName -Blob $blobName -Context $destContext -Force
}

#Hive query to create the table on the file structure foundation and load the trace data in hive table
$querystring="DROP TABLE trace; CREATE TABLE trace (date STRING, level STRING, process INT, thread INT, type STRING, text STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; LOAD DATA INPATH '/hive/warehouse/traces/' INTO TABLE trace;"
$jobDef = New-AzureHDInsightHiveJobDefinition -Query $querystring
$hivejob = Start-AzureHDInsightJob -Cluster $clusterName -JobDefinition $jobDef

#Troubleshooting 1:type~RE.PrimaryReplicate@<PartitionId>
$querystring= "SELECT * from trace where type like '%RE.PrimaryReplicate%'"
$response = Invoke-Hive $querystring | Out-File $logoutputfolder\REPrimaryReplicate.txt

#Troubleshooting 2:type~FM.*Upgrade
$querystring= "SELECT * from trace where type like '%FM%Upgrade%'"
Invoke-Hive $querystring | Out-File $logoutputfolder\FMUpgrade.txt

#Troubleshooting 3:type~FM.AppUpgrade
$querystring="SELECT * from trace where type like '%FM%AppUpgrade%'"
Invoke-Hive $querystring | Out-File $logoutputfolder\FMAppUpgrade.txt

#Troubleshooting 4:type~FabricNode.NodeOpening
$querystring="SELECT * from trace where type like '%FabricNode%NodeOpening%'"
Invoke-Hive $querystring| Out-File $logoutputfolder\FabricNodeOpening.txt

#Troubleshooting 5:type~CM.*Upgrade && level~Warning
$querystring="SELECT * from trace where type like '%CM%Upgrade%' and level like '%arning%'"
Invoke-Hive $querystring| Out-File $logoutputfolder\CMUpgrade.txt

#Troubleshooting 6:type~FM.*Upgrade && ~pending
$querystring="SELECT * from trace where type like '%FM%Upgrade%' and text like '%pending%'"
Invoke-Hive $querystring| Out-File $logoutputfolder\FMUpgradePending.txt

#Troubleshooting 7:type~FM.FabricUpgrade && ~pending
$querystring="SELECT * from trace where type like '%FM%FabricUpgrade%' and text like '%pending%'"
Invoke-Hive $querystring| Out-File $logoutputfolder\FMFabricUpgradePending.txt

#Troubleshooting 8:type~RA.Upgrade
$querystring="SELECT * from trace where type like '%RA%Upgrade%'"
Invoke-Hive $querystring| Out-File $logoutputfolder\RAUpgrade.txt

#Troubleshooting 9:type~RA.FT|RAP.FT|RAP.Api && text ~ "on node <node_id>"
$querystring="SELECT * from trace where (type like '%RA%FT' or type like '%RAP%FT%' or type like '%RAP%Api') and text like '%on node%'"
Invoke-Hive $querystring| Out-File $logoutputfolder\RAFT.txt

#Troubleshooting 10:type~RE.PrimaryCatchUpQuorum
$querystring="SELECT * from trace where type like '%RE%PrimaryCatchUpQuorum'"
Invoke-Hive $querystring| Out-File $logoutputfolder\REPrimaryCatchUpQuorum.txt

#Troubleshooting 11:type~Lease.SlowSendCompletion
$querystring="SELECT * from trace where type like '%Lease%SlowSendCompletion%'"
Invoke-Hive $querystring| Out-File $logoutputfolder\LeaseSlowSend.txt

#Troubleshooting 12:type~Config && text~value
$querystring="SELECT * from trace where type like '%Config%' and text like '%value%'"
Invoke-Hive $querystring| Out-File $logoutputfolder\Config.txt

5. A bunch of filters are applied through Hive queries, and the corresponding logs are stored in the output folder
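As an aside, the CREATE TABLE in the script assumes each trace line is comma-delimited into six fields (date, level, process, thread, type, text). A minimal Python sketch of that split, using a made-up trace line:

```python
# Hypothetical comma-delimited trace line matching the Hive table schema
line = "2014-12-16 10:02:11,Warning,1234,8,FM.AppUpgrade,upgrade pending on node 5"

fields = ["date", "level", "process", "thread", "type", "text"]
record = dict(zip(fields, line.split(",")))

print(record["level"])  # Warning
print(record["type"])   # FM.AppUpgrade
```

Real fabric traces whose text field itself contains commas would need a smarter SerDe; this only shows the field layout the table expects.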

Input folder: where the traces are kept

Output folder:

The execution takes 5-15 minutes depending on the data size, and eventually returns all the matching traces.


Sunday, March 17, 2013

3 days to create a data visualizer tool from scratch

In this blog I would like to pen down my 3-day experience of learning a technology called NodeXL from scratch and creating an app out of it, which can be downloaded for usage.
This could have been done in a shorter span, but considering my laziness, the delay was bound to happen :)
Following is the story line to the best of my knowledge.

March 11, 2013 6:00pm

One of the members of the Microsoft Most Valuable Professional group saw the Data Visualization section in NetMe and asked me about my experience with NodeXL.
NodeXL is a free, open source set of libraries that helps you create a graph out of a GraphML or GEXF file. You can also create custom graphs using the libraries. It is easy to learn and very effective for data visualization.
The objective: since NodeXL comes with a Microsoft Office Excel dependency, he was looking for an approach to remove this dependency and create a Windows Forms app that fetches Twitter-related nodes and their children and shows them in a graphical format.
NodeXL was totally new to me, and creating an app on top of it seemed challenging. I thought I would give it a try and see how it goes.

March 11, 2013 9:00 pm

I got to know the basics of what NodeXL does. I downloaded its source code, but diving into the 18 projects in the source repository was not only time consuming but also not feasible for me. So I posted a discussion in the NodeXL community forum.
I was really satisfied by the quick response given by the moderator to my query. At around 10 pm IST I received a response from him that it was possible, and that there was a library set we could download and start using.

March 13, 2013

By now I was confident enough about using NodeXL. I prepared a small POC wherein I created a graph from a sample GraphML file. The GraphML format is used to describe directed, undirected and mixed graphs, and is based on XML. A sample GraphML file can be created like this:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
    <node id="n4"/>
    <node id="n5"/>
    <node id="n6"/>
    <node id="n7"/>
    <node id="n8"/>
    <node id="n9"/>
    <node id="n10"/>
    <edge source="n0" target="n2"/>
    <edge source="n0" target="n3"/>
    <edge source="n1" target="n2"/>
    <edge source="n2" target="n3"/>
    <edge source="n3" target="n5"/>
    <edge source="n3" target="n4"/>
    <edge source="n4" target="n6"/>
    <edge source="n6" target="n5"/>
    <edge source="n5" target="n7"/>
    <edge source="n6" target="n8"/>
    <edge source="n8" target="n7"/>
    <edge source="n8" target="n9"/>
    <edge source="n8" target="n10"/>
  </graph>
</graphml>

Previous NodeXL community discussions helped me a lot here. I am not including that code, as it can easily be found in the NodeXL community discussion forum. Instead, let me focus on the major problem statement: how to make NodeXL interact with the Twitter API.
At first I thought to take the community's help on this, so I posted a second thread on the forum.

March 14, 2013

The moderator replied with a whole step-by-step approach on how to use the NodeXL libraries to fetch Twitter data. This was indeed of great help, but at that moment the Azure Mobile Services script I had written for NetMe flashed into my mind, and I decided to reuse that code.
The code fetches the top 18-20 tweets, with their usernames, for any searched keyword.
Following is the AMS Node.js code that fulfils the requirement:
function read(query, user, request) {
    console.log(request.parameters.topic);
    var res = "<xml>";
    var req = require('request');
    req.get({
        uri: "" + request.parameters.topic + "&result_type=mixed",
        headers: { 'content-type': 'application/x-www-form-urlencoded' }
    },
    function (error, response, body) {
        if (!error && response.statusCode == 200) {
            var resp = JSON.parse(body);
            for (var attribute in resp) {
                if (attribute == "results") {
                    for (var loop = 0; loop < resp[attribute].length; loop++) {
                        res += "<Tweet>";
                        res += "<UserName>" + resp[attribute][loop].from_user_name + "</UserName>";
                        res += "<UserStatus>" + resp[attribute][loop].text + "</UserStatus>";
                        res += "</Tweet>";
                    }
                }
            }
            res += "</xml>";
            request.respond(200, res);
        } else {
            request.respond(statusCodes.BAD_REQUEST, error);
        }
    });
}

The next step was to use this data and convert it into a graph using the NodeXL libraries. Here is a note of all the libraries I used:
1.       Smrf.NodeXL.Adapters
2.       Smrf.NodeXL.Control.WPF
3.       Smrf.NodeXL.Core
4.       Smrf.NodeXL.Layouts
5.       Smrf.NodeXL.Visualization.Wpf
The code for creating a graph is as follows

// The helper class contains the logic for sending the request to the Azure Mobile Service,
// which was covered in my previous blog.
HelperClass _helperClass = new HelperClass("", <<Azure Mobile Service Key>>, Guid.NewGuid().ToString());
string response = _helperClass.ReadExperts(topic);
XmlDocument doc = new XmlDocument();

// Define the NodeXL control
NodeXLControl nodexl_Graph = new NodeXLControl();
if (response != "")
{
    doc.LoadXml(response);

    IVertex headVertex = nodexl_Graph.Graph.Vertices.Add();
    headVertex.SetValue(ReservedMetadataKeys.PerVertexShape, VertexShape.Circle);
    headVertex.SetValue(ReservedMetadataKeys.PerVertexLabel, topic);
    headVertex.SetValue(ReservedMetadataKeys.PerVertexLabelFillColor, System.Drawing.Color.Blue);

    foreach (XmlNode node in doc.SelectNodes("//xml//Tweet"))
    {
        IVertex vertex = nodexl_Graph.Graph.Vertices.Add();
        foreach (System.Xml.XmlElement item in node)
        {
            if (item.Name == "UserName")
            {
                vertex.SetValue(ReservedMetadataKeys.PerVertexShape, VertexShape.Circle);
                vertex.SetValue(ReservedMetadataKeys.PerVertexLabel, item.InnerText);
            }

            if (item.Name == "UserImage")
            {
                var webClient = new WebClient();
                byte[] imageBytes = webClient.DownloadData(item.InnerText);
                BitmapImage imageSource = new BitmapImage();
                using (MemoryStream stream = new MemoryStream(imageBytes))
                {
                    stream.Seek(0, SeekOrigin.Begin);
                    imageSource.BeginInit();
                    imageSource.StreamSource = stream;
                    imageSource.CacheOption = BitmapCacheOption.OnLoad;
                    imageSource.EndInit();
                }
                vertex.SetValue(ReservedMetadataKeys.PerVertexImage, imageSource);
                vertex.SetValue(ReservedMetadataKeys.PerVertexLabelFillColor, Colors.Transparent);
            }

            if (item.Name == "UserStatus")
            {
                IVertex childvertex = nodexl_Graph.Graph.Vertices.Add();
                childvertex.SetValue(ReservedMetadataKeys.PerVertexShape, VertexShape.Square);
                childvertex.SetValue(ReservedMetadataKeys.PerVertexLabel, item.InnerText);
                childvertex.SetValue(ReservedMetadataKeys.PerColor, System.Drawing.Color.Green);
                IEdge childEdge = nodexl_Graph.Graph.Edges.Add(vertex, childvertex);
                childEdge.SetValue(ReservedMetadataKeys.PerColor, System.Drawing.Color.Red);
            }
        }

        IEdge edge = nodexl_Graph.Graph.Edges.Add(headVertex, vertex);
    }
}

// NodeXL is a WPF control. To host it in a Windows Forms app, create an ElementHost
// control and assign the NodeXL control as its child.
elementHost_Graph.Child = nodexl_Graph;


This app has been uploaded under the downloads page. To try the app, download the zip folder, extract it, and run the msi file. It will get installed on your machine and you will see the NetMe shortcut on your desktop. As this is a POC, I have not focused much on the look and feel of the app.
Once the app is installed you will be able to search for any keyword and see the corresponding graph. You can shift the nodes around as needed.
The major intent of this blog was to highlight how fast and easy it is to code with the NodeXL libraries and create your own data visualizer tool in a short time span. Hope you will like it :)