Why BI in the Cloud?

Business Intelligence (BI) has been around for a while, but interest in analytics and the tools that support it has grown sharply in recent years. Previously, only large enterprises were able to afford the infrastructure and license costs of implementing traditional business intelligence. However, the advent of the cloud is an opportunity for everyone to take advantage of the transformative power of data.

Traditional BI was typically a client-server setup, with companies shelling out for dedicated equipment in their own data center or a co-lo, paying license fees for the software, and employing a dedicated staff to manage the equipment and maintain the servers. The more data you had to analyze, the more equipment you had to buy and the more staff you had to hire. The massive investment in this area became a sunk cost, and some enterprises are still tied to this model despite the fact that new models have arisen to tackle the data challenge.

The Cloud

The cloud opens up new options for companies looking either to build out their own solution or to leverage new products that are native to the cloud. There are currently three generally accepted cloud delivery models: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). I consider SaaS and IaaS to be the models that will be the most effective for BI.

One of the concerns often raised regarding the cloud is security – particularly with BI, since it involves data and sometimes personal information. The question then is: why would a provider that specializes in managing a data center (in the case of IaaS) or in large-scale application and data hosting (in the case of SaaS) be any less secure than a company whose focus is on its own business and not on securing a data center? You will need to verify a few things depending on the delivery model that you are using. For SaaS, you will want to ensure that the provider can give you secure login and authentication, the ability to define granular levels of access, SSL, and the option of encryption at rest. For IaaS, the customer is typically responsible for setting up security, but you will want to see SSAE 16 and ISO 27001 certification and DDoS mitigation, as well as options to provision firewalls and load balancers. The Cloud Security Alliance (CSA) is currently working on its Security, Trust and Assurance Registry (STAR), which will make it easier to determine security criteria for a cloud provider, but it is currently only in the preview stage.

Deciding which Cloud Model

If you are interested in just providing data and then having software available to analyze it, manipulate it and create reports, SaaS products may work for you. Examples of SaaS BI companies are Birst, PivotLink, and GoodData. All of them provide the BI stack: a method for extracting, transforming and loading data, a place to store the data, and a front end for creating ad hoc reports and dashboards. The advantage of this style of BI is that all the management of the infrastructure and software is handled by the vendor. There are also minimal up-front costs – the model is to simply pay for what you use.
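To make the three layers of the BI stack concrete, here is a minimal sketch in Python using only the standard library. It is not how any of the vendors above implement their products; the sample rows, table name and function names are all made up for illustration, and an in-memory SQLite database stands in for the data store.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for an export from an operational system.
RAW_CSV = """order_id,region,amount
1,west,120.50
2,east,80.00
3,west,42.25
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize region names and convert fields to proper types."""
    return [
        (int(r["order_id"]), r["region"].strip().upper(), float(r["amount"]))
        for r in rows
    ]

def load(conn, records):
    """Load: store the cleaned records in a reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

def sales_by_region(conn):
    """Front end: an ad hoc report expressed as a query."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()

conn = sqlite3.connect(":memory:")  # stand-in for the data store
load(conn, transform(extract(RAW_CSV)))
report = sales_by_region(conn)
print(report)
```

With a SaaS product, all three of these steps run on the vendor's infrastructure; you supply the data and consume the reports.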

If you want to build your own BI platform, you can leverage IaaS and open-source software. You will need to find an IaaS vendor that best suits your needs – Amazon Web Services, GoGrid and Rackspace are the leaders in this space. Using IaaS, you will have full control of your infrastructure – you can determine how many servers to spin up, the security you want to use and how you want to store the data. You will also need to build out the software, assuming that you have that expertise in-house. Some good open-source options are Talend for data migration and manipulation, Pentaho for data integration and analytics, BIRT for reporting and MySQL or Postgres for the database. This model requires more system administration expertise and developer resources to build out a custom BI solution, but this may be worth the investment if you want tighter control of your product or have very specific custom needs. In either case, if you can leverage the features offered in open-source software, you will also minimize up-front costs and will pay only for the infrastructure that you use – with the option of spinning up servers to meet demand or removing servers when they are no longer needed. You can start off with a small 1GB server and expand to more cores, more RAM and more storage quickly and easily.

In addition, the flexibility of the cloud gives you the option to expand your infrastructure if you need to incorporate a Big Data solution to meet a particular use case. Currently, most of the popular technologies, such as Hadoop, MongoDB and Cassandra, are open source. I discuss Big Data in more depth in a previous blog post.

Ultimately, you may decide that you need all the features that are offered by a traditional BI vendor, or you may have already made the investment in a particular infrastructure or technology. After all, these companies have been around a long time and there are many talented individuals who are well-versed in these products. However, if you are interested in lowering costs and off-loading the infrastructure and software work to another vendor, or are new to BI and want to get started with minimal up-front costs, cloud-based BI solutions might be the right option for you. Instead of having to project growth in order to buy hardware up-front, you will be able to pay as you go, adding infrastructure and cost only as your growth demands. DASHbay is experienced in both delivery models discussed here and we can provide the right analytics expertise and development experience for your BI needs. Considering the growth in data, the flexibility of the cloud and the much-needed analytic features of a BI solution work well together to provide a powerful, low-cost and scalable solution. Make sure that you work with the right vendors and partners to make your project a success!

The Personal Analytics System

In a previous post, I discussed the explosion of data due to the growth in compute power coupled with the advent of Web 2.0 and social networks. However, this is not the only source of new and interesting data. Not only do people generate and contribute data via check-ins and tweets, they also generate data by simply going about their everyday lives. Until recently, there was no way to capture that information easily. Now, new gadgets have arrived that contribute to Big Data by capturing data from human activity.

The ones that are out currently are the Nike+ Fuelband, the Jawbone Up, and the Fitbit Ultra. They are all essentially pedometers on steroids – a fun way to track your activity throughout the day and integrate it with a web dashboard and your social networks. These differ from the more serious Nike+ Sportswatch or the Garmin Forerunner, which include GPS and are designed for serious athletes. The more lifestyle-oriented devices are intended as a low-cost way to track activity during the day as well as sleep efficiency at night. The idea is that if you can collect data on your various activities throughout the day, you will have data points from which to make a better health plan or plan for a certain goal. They are also typically tied to other apps or websites where you can track food consumption in order to have a fuller picture of caloric gains and losses.

Personal Metrics Driven Management

While the intent is to improve health, this trend towards tracking gadgets for casual use is interesting since it generates data that pertains to a specific individual. You now have a way of tracking very detailed information about your activities, habits, and sleep cycles over time. Successful companies have long used metrics to drive decisions and improve performance and efficiency. The tools are now starting to filter down to individuals, who can take advantage of analytics to improve their lives.

While all three trackers have their merits, the one that I have been testing is the Fitbit Ultra. I integrate it with MyFitnessPal since it has a superior food tracking database and a better mobile app. The product is easy enough to set up and it tracks things like steps taken during the day, floors climbed, and of course calories burned. It’s all interesting information that you can use to determine your current activity levels and to gain insight into whether you should make changes in order to meet your goals. There are some pre-set goals like achieving 10,000 steps in a day, but you can modify them to your liking.

The amount of data collected is astonishing, but the real power is not in the point-in-time snapshot of activity but rather in the accumulation of data over time to determine patterns, and in the integration with mobile and other apps (in particular, through the Fitbit API). You may see that your activity drops significantly on days after you have low “sleep efficiency” (this is a metric that Fitbit uses to determine how much sleep you obtain during the night without waking up). Or you may realize that your workout routine isn’t active enough to offset your time in front of a computer. Sadly, I determined that most of my day was sedentary since I spend most of my time at work behind a desk. The integration is also key since it increases the amount of quality data that you can collect, for example caloric intake in the case of MyFitnessPal. Integration with mobile apps also ensures that you always have a mechanism for recording data that the Fitbit does not capture, since most people carry their smartphone with them everywhere.
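As an illustration of the kind of pattern that only shows up over time, here is a minimal Python sketch that compares step counts on days following a poor night's sleep with step counts on other days. The daily records are made-up numbers, not real Fitbit output, and the 85 percent threshold and function names are assumptions for the example; in practice the records would come from an export or from the Fitbit API.

```python
from statistics import mean

# Made-up daily records: (day label, sleep efficiency in percent, steps that day).
DAILY = [
    ("day1", 92, 9800),
    ("day2", 78, 10400),
    ("day3", 90, 6100),
    ("day4", 75, 9900),
    ("day5", 94, 5800),
    ("day6", 91, 10100),
]

def average(values):
    """Return the mean of the values, or None when there is nothing to average."""
    return mean(values) if values else None

def steps_after_sleep(records, threshold=85):
    """Average next-day steps, split by whether the prior day's sleep
    efficiency was below the (assumed) threshold."""
    after_poor, after_good = [], []
    # Pair each day with the day that follows it.
    for previous, following in zip(records, records[1:]):
        bucket = after_poor if previous[1] < threshold else after_good
        bucket.append(following[2])
    return average(after_poor), average(after_good)

poor_avg, good_avg = steps_after_sleep(DAILY)
print("after low sleep efficiency:", poor_avg)
print("after normal sleep efficiency:", good_avg)
```

A handful of days proves nothing, of course; the point is that a comparison like this only becomes possible once the data has been accumulating for a while.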

Integration is Key

All these websites also include the requisite integration with social networks like Facebook and Twitter. Although I believe that goals are best achieved when announced and through the support of friends, it does seem a bit creepy to be announcing when you go to bed and wake up and how many calories you consumed that day.

Corporations aren’t the only ones who can benefit from better data collection and analysis methods. Personal activity trackers now give the power of automated data collection and analysis to consumers. The websites even follow the Metrics Driven Management technique of a dashboard that displays all your pertinent metrics, with the ability to drill down for additional details. Data is now everywhere, even in your everyday activities. Companies are now using data collection techniques and business intelligence technology to bring analysis to all aspects of our lives.

Customer retention metrics

Last night (Tue July 19th), I was fortunate to be able to speak to the SVForum Business Intelligence special interest group (SIG).

After introducing the audience to DASHbay, I took them through an implementation we did using our Quick Analysis practice, which leverages open source software (especially BIRT and PostgreSQL), cloud computing (on AWS), and rapid, iterative development.

The implementation itself was a dashboard, built with BIRT in less than a week and showing metrics for account acquisition and retention. The metrics help any business track not just how well they are acquiring customers, but how well they are keeping them.
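For readers who have not worked with these metrics, here is a minimal Python sketch of one common way to define them. It is not the BIRT dashboard described above; the account records are made up, months are plain integers, and the retention definition (the share of accounts active at the start of a month that did not cancel during it) is an assumption chosen for illustration.

```python
# Made-up accounts: (account id, signup month, cancel month or None if still active).
ACCOUNTS = [
    ("a1", 1, None),
    ("a2", 1, 2),
    ("a3", 2, None),
    ("a4", 2, 3),
    ("a5", 3, None),
]

def acquired(accounts, month):
    """Acquisition: number of accounts that signed up in the given month."""
    return sum(1 for _, signup, _ in accounts if signup == month)

def retention_rate(accounts, month):
    """Retention: share of accounts active at the start of the month
    that did not cancel during that month. Returns None if no accounts
    were active at the start of the month."""
    at_start = [
        a for a in accounts
        if a[1] < month and (a[2] is None or a[2] >= month)
    ]
    if not at_start:
        return None
    kept = [a for a in at_start if a[2] != month]
    return len(kept) / len(at_start)

for m in (1, 2, 3):
    print(m, acquired(ACCOUNTS, m), retention_rate(ACCOUNTS, m))
```

Tracking both numbers side by side is what shows whether growth comes from keeping customers or merely from replacing the ones who leave.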

Account retention dashboard
Our customer was able to get at the metrics via a URL to a server running in the cloud, set up just for them. It’s a great way to leverage cloud computing: no IT procurement costs or delays, and you only pay for it while you need it.

We talked about DASHbay’s Report Server product, which, among other features, allows us to capture any useful piece of a report and include it in any web page. It also provides permissioning and authentication, a taxonomy for organizing reports, and more.

I got an excellent reception from the audience, and was pleased with the reaction and discussions afterwards. Thanks to all who attended!

If you didn’t get a chance to be there, please get in touch so we can tell you more about our Report Server for BIRT, our Quick Analysis Service, or our many custom BI and Data Analytics services. Customer retention is one very useful application that we can provide, but our tools and techniques are applicable to most common business analysis problems.

Terry

Why DASHbay didn’t crash when Amazon-East did

By now, we’ve all heard about the Great Cloudburst of 2011. On April 21, Amazon’s Virginia-based data center experienced a huge reduction in service, triggered by what the company called “a networking event” and subsequent “re-mirroring of EBS volumes”.

I’ll leave examinations of the cause and response to other websites, and discuss the impact to DASHbay.

DASHbay builds and supports data-centric applications, focusing on open source software solutions, and often using cloud deployments. Amazon is our most frequently-used cloud data services provider.

At the time of the crash, several of our customers had mission-critical DASHbay-deployed applications running in the cloud. How did those customers fare, and therefore, how did DASHbay fare, since our customers’ problems are our problems?

I’m pleased to report that none of our customers were severely impacted by the outage.

Why not?

Here are some case studies of apps we built for clients, and the mitigation strategies that saved our bacon during Amazon’s failure.

The first is a real-time, high-availability mobile analytics collection application we deployed for Nielsen Mobile. Because this app’s continuous availability is mission critical, it was designed to not be dependent on any one AWS region. It failed over seamlessly to Amazon-West, and data-gathering continued normally. According to Brian Edgar, Group Program Manager at The Nielsen Company’s Telecom Practice: “While the outage at Amazon East was certainly bad news for Amazon and many of its clients, it was a great example of why the technology choices DASHbay recommended for us were ideal for this application. Our application was architected for geographic redundancy and fully leveraged the cloud model with dynamic DNS routing and load balancing using servers in multiple zones and regions. Our mission-critical, highly-available application experienced no outage at all. The Amazon regional failure proves we did the right thing.”

Another is a data acquisition app built for our client Credit.com. Unstructured data is gathered and marshaled into transaction reports. Data loss can directly impact Credit.com’s ability to monitor its own financial performance. This app was deployed only in the Amazon-East region, and was not available for over 24 hours. However, we anticipated the possibility of an outage, and had offshore staff in Nagpur, India, trained to perform manual workarounds for as long as necessary. These manual processes kicked in, and kept the data flowing. According to Credit.com’s CEO Ian Cohen, “We’ve been working with Dashbay for the last year and were really pleased with the measures they put in place to provide redundancies for our data acquisition applications. They positioned an offshore failsafe that allowed us to operate without interruption.”

What’s the message here? I think it’s this: data centers can fail! Design operational processes and real-time architectures with failover in mind. We used a variety of approaches, from human-intensive procedures that were nevertheless ready to go, to automated failover.
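To illustrate the idea of designing with failover in mind, here is a minimal Python sketch that tries a preferred region first, falls back to a second region, and finally queues the work for a manual process if nothing is reachable. This is a simplified client-side selection, not the dynamic DNS routing and load balancing used in the Nielsen deployment; the endpoint URLs, function names and the health check are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical endpoints, listed in order of preference.
ENDPOINTS = [
    "https://collector.us-east.example.com",
    "https://collector.us-west.example.com",
]

def pick_endpoint(
    endpoints: Sequence[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first endpoint that passes the health check, or None."""
    for url in endpoints:
        try:
            if is_healthy(url):
                return url
        except Exception:
            # A failing health check counts as an unhealthy endpoint.
            continue
    return None

def dispatch(
    record: dict,
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
    manual_queue: List[dict],
) -> Tuple[str, Optional[str]]:
    """Route a record to a healthy endpoint, or queue it for manual handling."""
    url = pick_endpoint(endpoints, is_healthy)
    if url is None:
        manual_queue.append(record)  # the human-intensive fallback
        return ("manual", None)
    return ("automated", url)

# Example: simulate an outage in the preferred region.
def east_is_down(url: str) -> bool:
    if "us-east" in url:
        raise ConnectionError("region unavailable")
    return True

queue: List[dict] = []
print(dispatch({"id": 1}, ENDPOINTS, east_is_down, queue))
```

The health check is passed in as a function so the routing logic can be exercised without a real outage, which is also a cheap way to rehearse the fallback path before it is needed.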

Which risk-mitigation strategies are right for a particular app? That depends on factors such as the volume of data and the tolerable latency of gathering and moving that data. We’re committed to thinking those factors through with our clients, and designing applications and processes with failover in mind.

One more important thing: let’s all keep the mindset of learning from mistakes, and if necessary changing architectures and backup procedures to keep our businesses running.

Terry Joyce, DASHbay founder