Videos

Perform Fuzzy Lookups / Approximate String Matching in Excel

Most likely you have a love/hate relationship with spreadsheets. This highly useful tip will make you fall back in love with Excel. Microsoft provides a solid add-in that makes fuzzy lookups relatively easy to perform.

Are there more powerful approximate string matching tools out there? Of course. But if you’re already working in Excel, try this add-in before reaching for more complicated methods.
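For readers curious what approximate string matching looks like outside of Excel, here is a minimal sketch in Python using the standard library's difflib module. It is not how the Excel add-in works internally; the company names, the best_match helper, and the 0.6 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher


def best_match(name, candidates, threshold=0.6):
    """Return (candidate, score) for the closest candidate.

    The candidate is None when the best similarity score falls below
    the threshold. The 0.6 default is an arbitrary illustrative value.
    """
    scored = [
        (candidate, SequenceMatcher(None, name.lower(), candidate.lower()).ratio())
        for candidate in candidates
    ]
    best, score = max(scored, key=lambda pair: pair[1])
    return (best, score) if score >= threshold else (None, score)


# Hypothetical "clean" reference list and messy values to look up.
reference = ["Microsoft Corporation", "Apple Inc.", "Alphabet Inc."]
messy = ["Microsft Corp", "Apple Incorporated", "Zebra Holdings"]

matches = {value: best_match(value, reference) for value in messy}
for value, (match, score) in matches.items():
    print(f"{value!r} -> {match!r} (similarity {score:.2f})")
```

Values with no reasonable counterpart in the reference list come back as None, which roughly mirrors the idea of setting a similarity threshold so that poor matches are left blank for manual review.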

Everyone loves visualizations, but sometimes you have to roll up your sleeves and clean up the data.

Enjoy!

See also: Get Out of the Spreadsheet Abyss

Download: Fuzzy Lookup Add-In for Excel

Create a Well Designed Pareto Chart in Tableau

In this video I will show you how to build Vilfredo Pareto’s namesake chart in Tableau. The Pareto principle, also known as the 80/20 rule, states that roughly 80% of the effects come from 20% of the causes.

I will use sample Tableau Superstore data to determine which states are responsible for 80% of sales. I’ll start with a basic Pareto chart and then move on to a visualization with a little more flair. This video should serve you well in your future data analyses.
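The calculation behind a Pareto chart is simple enough to sketch outside of Tableau: rank the categories by sales, keep a running total, and divide by the grand total. The Python below is only an illustration of that arithmetic. The state figures are invented, not actual Superstore numbers, and the function names are hypothetical.

```python
def pareto_table(sales_by_state):
    """Rank states by sales (descending) and attach the cumulative share of total sales."""
    total = sum(sales_by_state.values())
    ranked = sorted(sales_by_state.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for state, sales in ranked:
        running += sales
        rows.append((state, sales, running / total))
    return rows


def states_reaching(rows, cutoff=0.80):
    """Return the top-ranked states needed to reach the cutoff share of sales."""
    selected = []
    for state, _sales, cumulative_share in rows:
        selected.append(state)
        if cumulative_share >= cutoff:
            break
    return selected


# Made-up sales figures for illustration only.
sample = {"California": 450, "New York": 300, "Texas": 150, "Ohio": 60, "Utah": 40}

rows = pareto_table(sample)
for state, sales, share in rows:
    print(f"{state:<12}{sales:>6}{share:>8.0%}")
print("States covering 80% of sales:", states_reaching(rows))
```

In a chart, the sales values become the bars and the cumulative share becomes the line plotted on the secondary axis.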

 

B.I. Basics Part 4: Learn the QlikView ApplyMap Function

There will come a time in your QlikView load scripting endeavors when you will need to map a single key value to a lookup table and return the lookup value. If you’ve ever wanted a QlikView function that is somewhat analogous to a CASE statement for simple lookups/transformations, then look no further than the ApplyMap function.

My video breaks down the hard-to-interpret user manual definition and provides a simple example that will have you performing QlikView lookups in no time.
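To make the idea of a single-key lookup concrete, here is a rough Python analogy. This is not QlikView script syntax. It assumes the behavior as I understand it from the documentation: a key found in the mapping table returns its mapped value, and a key that is not found returns either a supplied default or the original key unchanged. The country codes and the apply_map helper are hypothetical.

```python
_NO_DEFAULT = object()  # sentinel so that None can still be used as a real default


def apply_map(mapping, key, default=_NO_DEFAULT):
    """Look up a single key in a two-column mapping table.

    Assumed behavior, modeled on ApplyMap: return the mapped value when the
    key exists; otherwise return the default if one was given, else the key itself.
    """
    if key in mapping:
        return mapping[key]
    return key if default is _NO_DEFAULT else default


# Hypothetical mapping table: country code -> country name.
country_map = {"US": "United States", "CA": "Canada", "MX": "Mexico"}

print(apply_map(country_map, "CA"))             # found in the map
print(apply_map(country_map, "BR"))             # not found, no default: key comes back
print(apply_map(country_map, "BR", "Unknown"))  # not found, default supplied
```

The appeal over a chain of CASE-style conditions is that the lookup values live in a table rather than in the expression, so adding a new code means adding a row instead of editing logic.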

B.I. Basics Part 3: Create a Gantt Chart in Excel

If you’ve ever had to put together a quick timeline to share with someone without resorting to full-blown Microsoft Project, you will find this video helpful. I will show you how to create a very simple but effective Gantt chart that will satisfy your inner project manager. Definitely keep this tip in your Excel toolkit.

B.I. Basics Part 2: Sorting “Correctly” in Tableau

If you are familiar with Tableau, you know that sorting can be an exercise in frustration and futility. Fortunately, once you understand how Tableau intends its sort functionality to work, you’ll discover that there is a method to the madness. My video presents a simple solution that will alleviate your sorting frustration and should find a place in your Tableau toolbox.

Anthony Smoak Final Project: Data Visualization and Communication with Tableau

I recently earned a verified course certificate from Coursera in the “Data Visualization and Communication with Tableau” class. This class is the third offered in the “Excel to MySQL: Analytic Techniques for Business” Coursera Specialization. I’m looking forward to taking a couple more MOOCs dealing with Tableau and visualization to supplement and reinforce my existing knowledge. I would recommend the class to anyone looking to frame an analysis and learn a good bit about using Tableau.

Overview of Service Oriented Architecture

Service Oriented Architecture (SOA) can be described as an architectural style or strategy of “building loosely coupled distributed systems that deliver application functionality in the form of services for end-user applications” (Ho, 2003). Ho (2003) proclaims that a service can be envisioned as a simple unit of work as offered by a service provider. The service produces a desired end result for the service consumer. Another way to envision the concept of a service is to imagine a “reusable chunk of a business process that can be mixed and matched with other services” (Allen, 2006). The services either communicate with each other (i.e. pass data back and forth) or work in unison to enable or coordinate an activity.

When SOA is employed for designing applications and/or IT systems, the component services can be reused across the enterprise, which helps to lower overall development costs, among other benefits. Reuse fosters consistency across the enterprise. For example, SOA enables banks to meet the needs of small but profitable market segments without the need to redevelop new intelligence for a broad set of applications (Earley, Free & Kun, 2005). Furthermore, any number of services can be combined to mimic a business process.

“One of the most important advantages of a SOA is the ability to get away from an isolationist practice in software development, where each department builds its own system without any knowledge of what has already been done by others in the organization. This ‘silo’ approach leads to inefficient and costly situations where the same functionality is developed, deployed and maintained multiple times” (Maréchaux, 2006).

Architectural Model

Services are accessed only through a published application programming interface (API). The API, which acts as the representative of the service to other applications, services, or objects, is “loosely coupled” with its underlying development and execution code. An outside client invoking the service is not concerned with the service’s implementation code, which is hidden from the client. “This abstraction of service implementation details through interfaces insulates clients from having to change their own code whenever any changes occur in the service implementation” (Khanna, 2008). In this manner, the service acts as a “black box” whose inner workings and design are completely independent of its requestors. If the underlying code of the service were switched from Java to C++, would-be requestors of the service would be completely oblivious to the change.

Allen (2006) describes the concept of loose coupling as “a feature of software systems that allows those systems to be linked without having knowledge of the technologies used by one another.” Loosely coupled software can be configured and combined with other software at runtime. Tightly coupled software does not offer the same integration flexibility with other software, as its configuration is determined at design time. This design-time configuration significantly hinders reusability options. In addition, loosely coupled applications are much more adaptable to unforeseen changes that may occur in business environments.
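The “black box” idea described above can be illustrated with a small in-process sketch in Python. Real SOA services communicate over a network through published interfaces, so this is only an analogy for the interface/implementation split. Every name here (BalanceService, the two implementations, format_statement, the account IDs) is hypothetical.

```python
from typing import Protocol


class BalanceService(Protocol):
    """The published interface: the only thing a client is allowed to depend on."""

    def get_balance(self, account_id: str) -> float: ...


class InMemoryBalanceService:
    """One implementation: balances are stored directly."""

    def __init__(self, balances):
        self._balances = dict(balances)

    def get_balance(self, account_id: str) -> float:
        return self._balances[account_id]


class LedgerBalanceService:
    """A replacement implementation: balances are derived from a transaction ledger."""

    def __init__(self, transactions):
        self._transactions = list(transactions)

    def get_balance(self, account_id: str) -> float:
        return sum(amount for account, amount in self._transactions if account == account_id)


def format_statement(service: BalanceService, account_id: str) -> str:
    """Client code: it calls the interface and never sees how the balance is produced."""
    return f"{account_id}: {service.get_balance(account_id):.2f}"


old_service = InMemoryBalanceService({"ACC-1": 125.0})
new_service = LedgerBalanceService([("ACC-1", 100.0), ("ACC-1", 25.0), ("ACC-2", 10.0)])

# Swapping the implementation requires no change to the client function.
print(format_statement(old_service, "ACC-1"))
print(format_statement(new_service, "ACC-1"))
```

Because the client is handed the service at runtime rather than being written against a specific implementation, the provider can be replaced without touching the client, which is the property the Java-to-C++ example describes.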

In the early 1990s, some financial firms adopted an object oriented approach to their banking architecture. This approach is only superficially similar to a service oriented architecture approach. In an object oriented (OO) approach, the emphasis is on the ability to reuse objects within the source code. SOA emphasizes a “runtime reuse” philosophy in which the service itself is discoverable and reused across a network (Earley, Free & Kun, 2005). SOA also provides a solution to the lack of interoperability between legacy systems.

References:

Allen, P. (2006). Service orientation: winning strategies and best practices.

Earley, A., Free, D., & Kun, M. (2005, July 1). An SOA Approach Will Boost a Bank’s Competitiveness (ID: G00126447). Retrieved from Gartner database.

Ho, H. (2003). What is Service-Oriented Architecture? O’Reilly XML.com.

Khanna, A. (2008). Straight through processing for financial services: the complete guide.

Maréchaux, J. (2006, March 28). Combining Service-Oriented Architecture and Event-Driven Architecture using an Enterprise Service Bus. IBM developerWorks. Retrieved from http://www.ibm.com/developerworks/library/ws-soa-eda-esb/

What Exactly is Hadoop & MapReduce?

In a nutshell, Hadoop is an open source framework that enables the distributed processing of large amounts of data over multiple servers. At its core is the Hadoop Distributed File System (HDFS), a distributed file system tailored to the storage needs of big data analysis. Instead of holding all of the required data on one big expensive machine, Hadoop offers a scalable solution in which more machines and drives can be added as the need arises.

Having the storage capacity for big data analyses in place is instrumental, but equally important is having the means to process data from the distributed sources. This is where MapReduce comes into play.

MapReduce is a programming model introduced by Google for processing and generating large data sets on clusters of computers. This video from IBM Analytics does an excellent job of presenting a clear, concise description of what MapReduce accomplishes.
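The programming model itself can be sketched in a few lines of plain Python using the classic word count example. This runs in a single process and does not use Hadoop at all. On a real cluster, the map and reduce steps would run in parallel on many machines and the framework would handle the grouping in between. The function names and sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the framework does this on a cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for document in documents for pair in map_phase(document)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Each string stands in for a chunk of data stored on a different machine.
counts = word_count(["big data big", "Data tools"])
print(counts)
```

The key point is that each call to map_phase needs only its own chunk of data and each call to reduce_phase needs only the values for its own key, which is what makes the work easy to spread across a cluster.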