Tuesday, October 13, 2015

data tool: Microsoft Power BI (free vs pro)

Home | Power BI
"Power BI transforms your company's data into rich visuals for you to collect and organize so you can focus on what matters to you. Stay in the know, spot trends as they happen, and push your business further."

There is a Windows desktop version of Power BI, plus apps for iOS, Android and Windows.
It can also connect to SQL Server Analysis Services without moving your data to the cloud.

Pricing | Power BI

Developers Center | Power BI
"Use the Power BI REST API to push data directly from your application into a dataset in Power BI. Your dashboards will be updated in real-time when the data changes. No more waiting or having to press the Refresh button!"

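A rough sketch of what such a push could look like from Python, using only the standard library. The endpoint shape (`/datasets/{id}/tables/{name}/rows` under `https://api.powerbi.com/v1.0/myorg`) and the `{"rows": [...]}` body are my assumptions about the API, and the token, dataset id, table name and row values are placeholders, so check the API reference before relying on any of it. The function only builds the request; it does not send anything.

```python
import json
import urllib.request

# Assumed API root and path layout; verify against the current Power BI docs.
API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_push_rows_request(access_token, dataset_id, table_name, rows):
    """Build (but do not send) a POST request that adds rows to a dataset table."""
    url = f"{API_ROOT}/datasets/{dataset_id}/tables/{table_name}/rows"
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_push_rows_request(
        access_token="<access token>",  # placeholder, obtained separately
        dataset_id="<dataset id>",      # placeholder
        table_name="Sales",             # hypothetical table
        rows=[{"Product": "Widget", "Amount": 42}],
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and payload before pointing it at a real dataset.
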
"As part of the July 8, 2013, announcement of the new Power BI suite of self-service tools"

big data tool: Kudu

Resolving transactional access and analytic performance trade-offs - O'Reilly Radar
"HDFS is terrific at scans, can’t do random access at all. The idea with Kudu is, we’re building a data store that [is] pretty darn good at both. … If you’re 70-80% of the way there on both axis, then the convenience you get out of having a single system, for most people, will win out because engineering time is expensive and computers are cheap.

… In the IoT use case, you’re probably less interested in updates, but one thing that is popular is random access in that workload. You may have a bunch of time series … You do some big analytics to do some modeling..."

Kudu - Fast Analytics on Fast Data
"A new addition to the open source Apache Hadoop ecosystem, Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
Currently, a limited-functionality version of Kudu is available as a Beta."

Kudu is written in C++, not Java, to get maximum speed and avoid GC issues.

Microsoft Linux: Azure Cloud Switch

Apparently 50% of VMs on Azure are running Linux!
And there is even Microsoft's own Linux distribution,
used to run part of the Azure infrastructure: network switches.

Microsoft showcases the Azure Cloud Switch (ACS) | Microsoft Azure Blog
"The Azure Cloud Switch (ACS) is our foray into building our own software for running network devices like switches. It is a cross-platform modular operating system for data center networking built on Linux. ACS allows us to debug, fix, and test software bugs much faster. It also allows us the flexibility to scale down the software and develop features that are required for our datacenter and our networking needs.

ACS also allows us to share the same software stack across hardware from multiple switch vendors. This is done via the Switch Abstraction Interface (SAI) specification, the first open-standard C API for programming network switching ASICs, of the Open Compute Project (OCP). Microsoft was a founding member of the SAI effort and remains a leading contributor to the project as we view SAI as an instrumental piece to make the ACS a success."

Whoa. Microsoft Is Using Linux to Run Its Cloud | WIRED

Microsoft Has Created Its Own Linux Distro Using Azure | Digital Trends


U-SQL language for Azure Data Lake

Introducing U-SQL – A Language that makes Big Data Processing Easy - The Visual Studio Blog - Site Home - MSDN Blogs
"Microsoft announced the new Azure Data Lake services for analytics in the cloud that includes a hyper-scale repository, a new analytics service built on YARN that allows data developers and data scientists to analyze all data, and HDInsight, a fully managed Hadoop, Spark, Storm and HBase service. Azure Data Lake Analytics includes U-SQL, a language that unifies the benefits of SQL with the expressive power of your own code. U-SQL’s scalable distributed query capability enables you to efficiently analyze data in the store and across relational stores such as Azure SQL Database."
"U-SQL is built on the learnings from Microsoft’s internal experience with SCOPE and existing languages such as T-SQL, ANSI SQL, and Hive."

@t = EXTRACT date string
           , time string
           , author string
           , tweet string
     FROM "/input/MyTwitterHistory.csv"
     USING Extractors.Csv();

@res = SELECT author
            , COUNT(*) AS tweetcount
       FROM @t
       GROUP BY author;

OUTPUT @res TO "/output/MyTwitterAnalysis.csv"
ORDER BY tweetcount DESC
USING Outputters.Csv();
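
For readers who don't have Azure Data Lake at hand, here is roughly what the script above computes, sketched in plain Python: read the CSV, count tweets per author, and list authors by descending count. The column order follows the EXTRACT clause; the assumption that the file has no header row, the tie-breaking by author name, and the sample rows are mine, not from the MSDN post.

```python
import csv
import io
from collections import Counter


def tweet_counts(csv_file):
    """Count tweets per author, highest count first."""
    # Columns follow the EXTRACT clause: date, time, author, tweet.
    reader = csv.reader(csv_file)
    counts = Counter(row[2] for row in reader if row)
    # Sort by count descending; ties are broken by author name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up sample data standing in for /input/MyTwitterHistory.csv.
    sample = io.StringIO(
        "2015-10-01,09:00,alice,hello\n"
        "2015-10-01,09:05,bob,hi\n"
        "2015-10-02,10:00,alice,again\n"
    )
    for author, tweetcount in tweet_counts(sample):
        print(author, tweetcount)
```

This runs on a single machine, of course; the point of U-SQL is that the same declarative query is distributed across the data lake for you.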