Archive for April, 2014

The third coming of Natural Language Processing

    “SAP tried to introduce natural language processing based BI tools about five years ago and failed. Why would I use yours?”

Yesterday I was explaining to a customer that the QuickLogix natural language query engine would make it easier for his business users to ask questions and make meaning out of their data. He is the IT Director of a multi-billion-dollar company, so this was a question I was expecting him to ask, and he didn’t disappoint! So why indeed does Gartner project that natural language processing is the next big thing in the world of data analytics and business intelligence? Why, if it was tried not 30 years ago but barely 5 years ago and didn’t really take off then?

It boils down to one major tech trend of the past 10 or so years and one major event of the past 3 years.

THE TREND
I remember the days when leading-edge innovation happened in the enterprise world and the benefits bled into the consumer market. Sometime around the mid-2000s, perhaps with the introduction of the iPhone, that trend started reversing. Consumer products and requirements moved to the leading edge of technological creativity. All things new and exciting in the enterprise world (cloud computing, SaaS products, etc.) are now bleed-overs dictated by the consumer market. More mobile devices meant more data being transferred (volume), more content being generated (variety) and more demand for quick turnaround on data accessibility and processing (velocity). Yes, the familiar 3Vs of Big Data are a direct result of demands in the consumer market.

THE EVENT
When it comes to natural language processing, I like to think of the world as pre-Siri and post-Siri. Apple introduced Siri to the world with the iPhone 4S in October 2011. Ever since, there has been a renewed focus among all the other phone OS makers on providing (or improving upon) a similar service. Google has been around for a long time with its ground-breaking natural language search. However, it is the advent of Siri that has set the average consumer's expectation that all interactions, personal or otherwise, can (and should) work in simple English.

Together, the Trend and the Event have subliminally changed the mindset of the workforce. More and more business specialists and users are inclined to use natural language in their work. The mobile evolution will serve as a potent catalyst for the acceptance of NLP by business users in their everyday functional tasks. The challenges of training them to ask the right questions and make meaning out of the results will remain. But the adoption of the technology itself? It was tried in the 1950s and 60s, and again in the late 1990s and early 2000s, but in this third coming natural language processing is here to stay.


5 things you need to know about Data Lakes

    Here are five important things to know about data lakes:

1. What is a data lake? That’s a good place to start any conversation! A data lake is essentially a landing zone that stores all the data an organization collects. The main advantage over a traditional enterprise data warehouse (EDW) is that no extract-transform-load (ETL) processes are needed to ingest data from operational systems or to access data in the data lake itself. In addition, it is relatively inexpensive and massively scalable.

2. Traditional EDW systems also restrict the data types that they can support. Most enterprise organizations today collect more data than they process. The data lake can be used to store data of any type and in any format. As a result, the cost of transforming hitherto inaccessible information (such as text, images and other unstructured data) is eliminated or at least substantially reduced. What this really means for an organization is that new operational systems can easily be added to the data lake and users can start deriving insights from them almost immediately.

3. Why isn’t everyone adopting data lakes? There are a couple of pertinent reasons. To begin with, a lot of organizations have invested heavily in the infrastructure, support and services offered by the large EDW solution providers (IBM, SAP, Oracle, Microsoft), and making a transition needs many levels of business justification. Also, data lake technology (and the Enterprise Hadoop ecosystem) is new and evolving. As a result, early adopters will only include organizations that want to be on the cutting edge of technological advances, those that would like to capitalize on the financial advantages of the data lake, and those that are willing to bet on revolutionary solutions offered by up-and-coming players like QuickLogix (www.quicklogix.com; full disclosure: I am affiliated with this organization).

4. Data governance has been a challenge with EDW systems. It is only going to gain more prominence with the advent of data lakes. Gaps in data quality and reliability will be more easily exposed. We should collectively be applauding this development. IT teams can shift their emphasis from building ETL processes that move data into the common store to ensuring that the data collection (operational) systems meet stringent quality standards.

5. Data lakes are not for everyone. One of the common complaints from data architects and technologists is that their organization is simply not suited for a shift to scale-out, parallel, NoSQL systems. That is true. To dig a hole, you might just need a spade, not a jackhammer. However, it is important to assess the current and future technological requirements of the organization when making these choices.
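
To make the "landing zone" idea from points 1 and 2 a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: a local temporary folder stands in for the lake, and the names `land`, `read_orders` and `demo`, the folder layout and the sample records are all hypothetical, not part of any Hadoop or QuickLogix API. Data of any format is stored byte-for-byte on the way in, and structure is applied only when someone reads it.

```python
from __future__ import annotations

import json
import shutil
import tempfile
from pathlib import Path


def land(lake: Path, source: str, name: str, payload: bytes) -> Path:
    """Store a payload unchanged under <lake>/<source>/<name> (no transform step)."""
    target = lake / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_orders(lake: Path) -> list[dict]:
    """Interpret the raw order files only at read time (JSON lines is an assumption)."""
    rows = []
    for path in sorted((lake / "orders").glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                rows.append(json.loads(line))
    return rows


def demo() -> tuple[list[dict], bytes]:
    """Land one structured and one binary file, then read both back."""
    lake = Path(tempfile.mkdtemp(prefix="lake_"))  # hypothetical stand-in for the lake
    try:
        land(lake, "orders", "2014-04-01.jsonl",
             b'{"id": 1, "total": 19.5}\n{"id": 2, "total": 7.0}\n')
        scan = land(lake, "scans", "invoice.png", b"\x89PNG\r\n\x1a\n")
        return read_orders(lake), scan.read_bytes()
    finally:
        shutil.rmtree(lake)  # clean up the temporary folder


if __name__ == "__main__":
    orders, scan_bytes = demo()
    print(orders)
    print(len(scan_bytes), "bytes of image data stored untouched")
```

Nothing here enforces quality on the way in, which is exactly why point 4 argues that governance has to move upstream to the systems that produce the data.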
