
Social & Organizational Success With Data



The pace of digital change is creating new opportunities for customers, and these opportunities demand quick responses. The role of the digital executive or officer is one of the most demanding in business: they must be strategic, creative, growth-minded and cost-conscious.

Business success in the digital age will require organizations to take bold actions, including inventing new business models and changing the way they function. By 2017, 70% of successful digital business models will rely on deliberately unstable processes designed to shift with customers' needs.

Many organizations are either beginning, or in the midst of, digital business transformation initiatives. The prediction is that only 30% of these efforts will succeed. To be part of that 30%, business and IT leaders must be ready and willing to innovate rapidly from a business model, business process and technology perspective.


As a result of business model innovation, some business processes must become deliberately unstable. Deliberately unstable processes are designed for change and can dynamically adjust to customers' needs; they are agile, adaptable and "supermanoeuvrable" as those needs shift. They are also competitive differentiators, because they support customer interactions that are unpredictable and require ad hoc decision making, enabling larger, more stable processes to continue.

It is imperative in 2016 to break away from linear business processes and deploy a spectrum of standardized and variable processes to reap the benefits of digital business. The need for this shift is intensified by new factors entering the business environment, such as unmeasured KPIs and Internet-connected 'things'. Smart machines, for example, generate real-time information for other machines, and business processes must be designed for change if organizations are to exploit that information.

There are many aspects to consider in harnessing big data and advanced analytics and becoming an insights-driven organization. To help data professionals and IT leaders on that journey, here are a few guiding principles that will not only drive value from big data and analytics but also put insights at the heart of your enterprise.


Guiding Principles

Principle 1:
Embark on the journey to insights, within your business and technology context



The starting point must be your digital business objectives. Design your roadmap to harness new data sources based on how they will help achieve these objectives. Equally importantly, your journey must be dictated by where you start, not only in terms of data maturity but also technology.



Principle 2:
Enable your data landscape for the flood coming from connected people and things


There are many new technologies that enable the capture and management of the data flood. Your new data landscape should be a mix of these technologies, chosen to provide the right solution in terms of cost, flexibility and speed to suit each specific data set and meet the insight needs of the business.


Principle 3:
Master governance, security and privacy of your data assets







Insights from unreliable data are worse than no insights at all. Equally, programs fail and businesses leave themselves exposed if data is not handled securely and with due consideration of privacy. Maturing and industrializing the organization's production of value from data is a key lever for success.

Principle 4: 
Develop an enterprise data science culture

Data science unlocks insights. Appreciating and understanding how value is derived from data needs to become part of the culture of the organization. Only by embedding it throughout the enterprise, and systematically making all decisions better informed, can organizations achieve the transformation to becoming insights-driven.


Principle 5: 
Unleash data- and insights-as-a-service

The demand from business users for information and data-driven insights is ever increasing across all organizations. To harness this, business users must feel that they can rapidly access the insights they need, where and when they need them. Setting up a powerful platform that delivers these insights on demand is the ultimate goal.


Principle 6: 
Make insight-driven value a crucial business KPI



Measure your measurement. Apply data science to your data science to see where you are adding value and where you are not. If data is becoming one of your most valuable assets, then treat it as such – include it in KPIs and business reviews.

Principle 7: 
Empower your people with Insights at the point of action



All functions in an organization are faced daily with a series of decision points and actions, both at the macro and micro level. Whether you are in Supply Chain, Finance, Procurement, Marketing or other parts of the business, empowering your business teams with real-time insights at the point of action makes the crucial difference.

From marketing to medicine, personalized treatment is taking hold. Customers across all industries expect more these days, and they will go elsewhere if they don't get what they want. The most advanced organizations are actively addressing this dynamic by blending traditional customer data with big data, then using analytics to fine-tune their products and services.

______________________________________________________



Big Data Is Improving Lives of Americans: A White House Report








The White House has just issued a report looking at four of the top areas where big data has the potential to greatly improve the lives and safety of Americans. But there are just as many pitfalls as promises to be aware of.

Big Data’s Opportunities & Challenges for All Americans

The Obama Administration’s Big Data Working Group has just issued its comprehensive report looking at the opportunities and challenges around big data in four key areas of society:

Personal credit • Employment • Higher education • Law enforcement.

Entitled Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights, the report notes that “big data and associated technologies have enormous potential for positive impact in the United States.” 

But big data also has the potential to create unintended discriminatory consequences if not used correctly. Here we look at the Problem the government is trying to solve; the Opportunity that big data presents; and the Challenge that will need to be overcome in order for a big data solution to work.

The Big Data Challenge: 
Expanding access to affordable credit while preserving consumer rights that protect against discrimination in credit eligibility decisions.

The right to be informed about and to dispute the accuracy of the underlying data used to create a credit score is particularly important because credit bureaus have significant data accuracy issues, which are likely to be exacerbated by the use of new, fast-changing data sources.

The Problem: Traditional hiring practices may unnecessarily filter out applicants whose skills match the job opening.


Even as recruiting and hiring managers look to make greater use of algorithmic systems and automation, the inclination remains for individuals to hire someone similar to themselves, an unconscious phenomenon often referred to as “like me” bias, which can impede diversity. Algorithmic systems can be designed to help prevent this bias and increase diversity in the hiring process.

The Big Data Opportunity: 
Big data can be used to uncover or possibly reduce employment discrimination.

Companies can use data-driven approaches to find potential employees who otherwise might have been overlooked based on traditional educational or workplace-experience requirements. Data-analytics systems allow companies to objectively consider experiences and skill sets that have a proven correlation with success.


The Big Data Challenge: 
Promoting fairness, ethics, and mechanisms for mitigating discrimination in employment opportunity.


Data-analytics companies are creating new kinds of “candidate scores” by using diverse and novel sources of information on job candidates. These sources, and the algorithms used to develop them, sometimes use factors that could closely align with race or other protected characteristics, or may be unreliable in predicting success of an individual at a job.
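One concrete mechanism for mitigating such discrimination is to audit candidate scores for disparate impact, for instance with the "four-fifths rule" from US employment-selection guidance. Below is a minimal Python sketch of that kind of audit; the data and field names are hypothetical, and the 0.8 threshold follows the common rule of thumb rather than anything the report prescribes.

```python
# A minimal disparate-impact ("four-fifths rule") audit sketch for a
# candidate-scoring system. Field names and data are hypothetical.

def selection_rates(candidates, group_key="group", selected_key="selected"):
    """Return the fraction of candidates selected, per group."""
    totals, chosen = {}, {}
    for c in candidates:
        g = c[group_key]
        totals[g] = totals.get(g, 0) + 1
        chosen[g] = chosen.get(g, 0) + (1 if c[selected_key] else 0)
    return {g: chosen[g] / totals[g] for g in totals}

def adverse_impact(candidates, threshold=0.8):
    """Flag groups whose selection rate falls below `threshold` times the
    highest group's rate -- the conventional four-fifths rule of thumb."""
    rates = selection_rates(candidates)
    best = max(rates.values())
    if best == 0:
        return {}
    return {g: {"ratio": r / best, "flagged": r / best < threshold}
            for g, r in rates.items()}

# Hypothetical example: group B is selected at half of group A's rate.
candidates = (
    [{"group": "A", "selected": True}] * 40
    + [{"group": "A", "selected": False}] * 60
    + [{"group": "B", "selected": True}] * 20
    + [{"group": "B", "selected": False}] * 80
)
print(adverse_impact(candidates))
# {'A': {'ratio': 1.0, 'flagged': False}, 'B': {'ratio': 0.5, 'flagged': True}}
```

An audit like this does not explain why a score disadvantages a group, but it gives hiring teams a simple, repeatable trigger for deeper review of the underlying factors.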

The Problem: Students often face challenges accessing higher education, finding information to help choose the right college, and staying enrolled.

Differences in the price of attendance across institutions affect financial returns and may lead to differences in the amount that students have to borrow, which may also affect their career decisions and personal lives in meaningful ways. Despite the importance of this decision, there is a surprising lack of clear, easy-to-use, and accessible information to guide students making these choices. The opportunities to use big data in higher education can either produce or prevent discrimination—the same technology that can help identify and serve students who are more likely to need extra help can also be used to deny admissions or other opportunities based on the very same characteristics.

The Big Data Opportunity: 
Using big data can increase educational opportunities for the students who most need them.

To address the lack of information about college quality and costs, the Obama Administration has created a new College Scorecard to provide reliable information about college performance. The College Scorecard is a large step toward helping students and their families evaluate college choices. Never-before-released national data about post-college outcomes—including the most comparable and reliable data on the earnings of colleges’ alumni, new data on student debt, and student-loan repayment—provides students, families, and their advisers with a more accurate picture of college cost and value.

The Big Data Challenge: 
Administrators must be careful to address the possibility of discrimination in higher education admissions decisions.

In making admissions decisions, institutions of higher education may use big data techniques to try to predict the likelihood that an applicant will graduate before they ever set foot on campus. Using these types of data practices, some students could face barriers to admission because they are statistically less likely to graduate. Institutions could also deny students from low-income families, or other students who face unique challenges in graduating, the financial support that they deserve or need to afford college.

The Problem: In a rapidly evolving world, law enforcement officials are looking for smart ways to use new technologies to increase community safety and trust.

Local, state, and federal law enforcement agencies are increasingly drawing on data analytics and algorithmic systems to further their mission of protecting America. Using information gathered from the field and through the use of new technologies, law enforcement officials are analyzing situations in order to determine the appropriate response.

The Big Data Opportunity:
 Data and algorithms can potentially help law enforcement become more transparent, effective, and efficient.

New technologies are replacing manual techniques, and many police departments now use sophisticated computer modeling systems to refine their understanding of crime hot spots, linking offense data to patterns in temperature, time of day, proximity to other structures and facilities, and other variables. Some of the newest analytical modeling techniques, often called “predictive policing,” might provide greater precision in predicting locations and times at which criminal activity is likely to occur. One such method, known as “near-repeat modeling,” attempts to predict crimes based on the insight that crime tends to recur close in time and place to recent incidents.
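Near-repeat modeling rests on the documented pattern that, once a crime occurs, nearby locations face an elevated risk of a similar crime for a short period. The Python sketch below illustrates the idea only; the radius, time window, and recency weighting are made-up parameters, not calibrated values from any real system.

```python
# Toy near-repeat risk score: past incidents close in space and time to a
# candidate location raise its score, with newer incidents weighted more.
# All parameters here are illustrative assumptions.
from math import hypot

def near_repeat_risk(location, now, incidents,
                     radius_km=0.5, window_days=14):
    x, y = location
    score = 0.0
    for ix, iy, day in incidents:          # incident = (x_km, y_km, day_number)
        age = now - day
        if 0 <= age <= window_days and hypot(x - ix, y - iy) <= radius_km:
            score += 1.0 - age / window_days   # linear recency decay
    return score

incidents = [(1.0, 1.0, 10), (1.2, 0.9, 12), (5.0, 5.0, 11)]
print(near_repeat_risk((1.1, 1.0), now=13, incidents=incidents))  # ~1.71
```

Real predictive-policing deployments use calibrated statistical models and, as the next section stresses, need careful auditing of their inputs.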

The Big Data Challenge: 

The law enforcement community can use new technologies to enhance trust and public safety in the community, especially through measures that promote transparency and accountability and mitigate risks of disparities in treatment and outcomes based on individual characteristics.

Those leading efforts to use data analytics to create and implement predictive tools must work hard to ensure that such algorithms are not dependent on factors that disproportionately single out particular communities based on characteristics such as race, religion, income level, education, or other data inputs that may serve as proxies for characteristics with little or no bearing on an individual’s likelihood of association with criminal activity.

Looking to the Future


The use of big data can create great value for the American people, but as these technologies expand in reach throughout society, we must uphold our fundamental values so these systems are neither destructive nor opportunity limiting. Moving forward, it is essential that the public and private sectors continue to have collaborative conversations about how to achieve the most out of big data technologies while deliberately applying these tools to avoid—and when appropriate, address—discrimination.




Key Big Data Challenges

Sixty-one percent of IT leaders expect spending on big data initiatives to increase, while only 5% expect decreases. The challenge: Finding the right big data talent to fulfill those initiatives, according to a recent survey.

Nearly 60% of the respondents are confident that their IT department can satisfy the big data demands of the business; 14% are not.
"The data indicates current expectations of big data are still somewhat unrealistic due to market hype,” the report states. “Despite IT leaders expecting spending to increase, the confidence level in their department’s ability to meet big data demands in comparison to broader IT initiatives is lower.”


  
About two thirds of the IT executives rank big data architects as the most difficult role to fill. Data scientists (48%) and data modelers (43%) round out the top three most difficult positions to fill. More technical big data positions are ranked less difficult to fill.

Not by coincidence, big data companies are introducing online and face-to-face training programs and certifications for Hadoop and other related software platforms.
Still, other big data challenges remain. Variety—the dimension of big data dealing with the different forms of data—is the biggest obstacle to deriving value from big data, cited by 45% of those surveyed. Speed of data is next, at 31%, followed by the amount of data, at 24%.

The application of big data is happening in a number of business areas, according to the study, with 81% of organizations viewing operations and fulfillment as priority areas within the next 12 months. This was followed by customer satisfaction (53%), business strategy (52%), governance/risk/compliance (51%) and sales/marketing (49%).
More than 200 IT leaders participated in the February 2015 survey.

ADDRESSING FIVE EMERGING CHALLENGES OF BIG DATA


Introduction - Big Data Challenges 
Challenge #1: Uncertainty of the Data Management Landscape 
Challenge #2: The Big Data Talent Gap 
Challenge #3: Getting Data into the Big Data Platform 
Challenge #4: Synchronization across the Data Sources 
Challenge #5: Getting Useful Information out of the Big Data Platform 
Considerations: What Risks Do These Challenges Really Pose? 
Conclusion: Addressing the Challenge with a Big Data Integration Strategy 


INTRODUCTION - BIG DATA CHALLENGES
Big data technologies are maturing to a point in which more organizations are prepared to pilot and adopt big data as a core component of the information management and analytics infrastructure. Big data, as a compendium of emerging disruptive tools and technologies, is positioned as the next great step in enabling integrated analytics in many common business scenarios.
As big data wends its inextricable way into the enterprise, information technology (IT) practitioners and business sponsors alike will bump up against a number of challenges that must be addressed before any big data program can be successful. Five of those challenges are:
1. Uncertainty of the Data Management Landscape – There are many competing technologies, and within each technical area there are numerous rivals. Our first challenge is making the best choices while not introducing additional unknowns and risk to big data adoption.
2. The Big Data Talent Gap – The excitement around big data applications seems to imply that there is a broad community of experts available to help in implementation. However, this is not yet the case, and the talent gap poses our second challenge.
3. Getting Data into the Big Data Platform – The scale and variety of data to be absorbed into a big data environment can overwhelm the unprepared data practitioner, making data accessibility and integration our third challenge.
4. Synchronization Across the Data Sources – As more data sets from diverse sources are incorporated into an analytical platform, the potential for time lags to impact data currency and consistency becomes our fourth challenge.
5. Getting Useful Information out of the Big Data Platform – Lastly, using big data for different purposes ranging from storage augmentation to enabling high-performance analytics is impeded if the information cannot be adequately provisioned back within the other components of the enterprise information architecture, making big data syndication our fifth challenge.

In this paper, we examine these challenges and consider the requirements for tools to help address them. First, we discuss each of the challenges in greater detail, and then we look at understanding and then quantifying the risks of not addressing these issues. Finally, we explore how a strategy for data integration can be crafted to manage those risks.

CHALLENGE #1: UNCERTAINTY OF THE DATA MANAGEMENT LANDSCAPE
One disruptive facet of big data is the use of a variety of innovative data management frameworks designed to support both operational and, to a greater extent, analytical processing. These approaches are generally lumped into a category referred to as NoSQL (that is, “not only SQL”) frameworks, which differ from the conventional relational database management system paradigm in storage model and data access methodology, and which are largely designed to meet the performance demands of big data applications, such as managing massive amounts of data with rapid response times.
There are a number of different NoSQL approaches. Some employ the paradigm of a document store that maintains a hierarchical object representation (using standard encoding methods such as XML, JSON, or BSON) for each managed data object or entity. Others are based on the concept of a key-value store, in which applications attach values to named attribute “keys” on each managed object, basically enabling a schema-less model. Graph databases maintain the interconnected relationships among different objects, simplifying social network analyses. And other paradigms are continuing to evolve.
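To make the contrast concrete, here is a single hypothetical customer record expressed under each paradigm just described. The snippet uses plain Python for illustration; real stores such as MongoDB, Redis, or Neo4j each have their own APIs.

```python
import json

# Document store view: each entity is one self-describing hierarchical
# object, typically encoded as JSON (or XML/BSON).
customer_doc = {
    "_id": "cust-1001",
    "name": {"first": "Ada", "last": "Lovelace"},
    "orders": [{"order_id": "o-1", "total": 120.50},
               {"order_id": "o-2", "total": 42.00}],
}
print(json.dumps(customer_doc, indent=2))

# Key-value store view: the same entity flattened into named keys, each
# mapped to a value -- effectively schema-less, since every object may
# carry a different set of keys.
customer_kv = {
    "cust-1001:name": "Ada Lovelace",
    "cust-1001:order:o-1:total": "120.50",
    "cust-1001:order:o-2:total": "42.00",
}

# Graph view: objects plus explicit relationships between them, which is
# what simplifies social network analyses.
edges = [("cust-1001", "PLACED", "o-1"),
         ("cust-1001", "PLACED", "o-2")]
```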
We are still in the relatively early stages of this evolution, with many competing approaches and companies. In fact, within each of these NoSQL categories, there are dozens of models being developed by a wide contingent of organizations, both commercial and non-commercial. Each approach is suited differently to key performance dimensions—some models provide great flexibility, others are eminently scalable in terms of performance while others support a wider range of functionality.

   
In other words, the wide variety of NoSQL tools and developers, and the state of the market, lend a great degree of uncertainty to the data management landscape. Choosing a NoSQL tool can be difficult, and committing to the wrong core data management technology can prove a costly error if the selected vendor’s tool does not live up to expectations, the vendor company fails, or third-party application development tends to adopt different data management schemes. For any organization seeking to institute big data, the challenge is to select among the NoSQL alternatives while mitigating the technology risk.
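One common way to mitigate that technology risk is to isolate the chosen store behind a thin abstraction layer, so that changing engines later means writing one new adapter rather than rewriting application code. A minimal sketch of the pattern, with a hypothetical interface and a stand-in backend:

```python
from abc import ABC, abstractmethod
from typing import Optional

class DataStore(ABC):
    """Application-facing interface: business code depends only on this,
    never on a particular NoSQL vendor's client API."""
    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...
    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...

class InMemoryStore(DataStore):
    """Stand-in backend for tests; a real deployment would wrap a vendor
    client (document, key-value, or graph) behind the same interface."""
    def __init__(self) -> None:
        self._data: dict = {}
    def put(self, key: str, value: dict) -> None:
        self._data[key] = value
    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

store: DataStore = InMemoryStore()
store.put("cust-1001", {"name": "Ada Lovelace"})
print(store.get("cust-1001"))
```

The abstraction does not remove the cost of choosing badly, but it caps it: a vendor failure becomes an adapter rewrite instead of an application rewrite.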


CHALLENGE #2: THE BIG DATA TALENT GAP

It is difficult to peruse the analyst and high-tech media without being bombarded with content touting the value of big data analytics and the corresponding reliance on a wide variety of disruptive technologies. These new tools range from traditional relational database tools with alternative data layouts designed to increase access speed while decreasing the storage footprint, to in-memory analytics, NoSQL data management frameworks, and the broad Hadoop ecosystem.

There is a growing community of application developers who are increasing their knowledge of tools like those comprising the Hadoop ecosystem. That being said, despite the promotion of these big data technologies, the reality is that there is not yet a wealth of such skills in the market. The typical expert has gained experience through tool implementation and its use as a programming model rather than through the data management aspects. That suggests that many big data tools experts remain somewhat naïve about the practical aspects of data modeling, data architecture, and data integration, which in turn can lead to less-than-successful implementations whose performance is negatively impacted by issues of data accessibility.

And the talent gap is real—consider these statistics: According to consulting firm McKinsey & Company, “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” And in a report from 2012, “Gartner analysts predicted that by 2015, 4.4 million IT jobs globally will be created to support big data with 1.9 million of those jobs in the United States. … However, while the jobs will be created, there is no assurance that there will be employees to fill those positions.”



There is no doubt that as more data practitioners become engaged, the talent gap will eventually close. But when developers are not adept at addressing these fundamental data architecture and data management challenges, the ability to achieve and maintain a competitive edge through technology adoption will be severely impaired. In essence, for an organization seeking to deploy a big data framework, the challenge lies in ensuring a level of usability for the big data ecosystem while the proper expertise is brought on board.


CHALLENGE #3: GETTING DATA INTO THE BIG DATA PLATFORM



It might seem obvious that the intent of a big data program involves processing or analyzing massive amounts of data. Yet while many people have raised expectations regarding analyzing massive data sets sitting in a big data platform, they may not be aware of the complexity of facilitating the access, transmission, and delivery of data from the numerous sources and then loading those various data sets into the big data platform.
The impulse toward establishing the ability to manage and analyze data sets of potentially gargantuan size can overshadow the practical steps needed to seamlessly provision data to the big data environment. The intricate aspects of data access, movement, and loading are only part of the challenge. The need to navigate extraction and transformation is not limited to structured conventional relational data sets. Analysts increasingly want to import older mainframe data sets (in VSAM files or IMS structures, for example) and at the same time want to absorb meaningful representations of objects and concepts refined out of different types of unstructured data sources such as emails, texts, tweets, images, graphics, audio files, and videos, all accompanied by their corresponding metadata.
An additional challenge is meeting response-time expectations for loading data into the platform. Trying to squeeze massive data volumes through “data pipes” of limited bandwidth will degrade performance and may even impact data currency. This implies two challenges for any organization starting a big data program. The first is to catalog the numerous data source types expected to be incorporated into the analytical framework and to ensure that there are methods for universal data accessibility; the second is to understand the performance expectations and to ensure that the tools and infrastructure can handle the volume transfers in a timely manner.
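A sketch of those two steps follows: a catalog of heterogeneous sources behind a uniform reader interface, plus loading in bounded chunks so transfers stay within bandwidth and latency budgets. The source names, readers, and chunk size are illustrative assumptions.

```python
import csv, json
from typing import Callable, Iterator

def read_csv(path: str) -> Iterator[dict]:
    """Structured, relational-style extract."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def read_jsonl(path: str) -> Iterator[dict]:
    """Semi-structured feed, one JSON record per line."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)

# One registered reader per source type gives the pipeline universal data
# accessibility; mainframe extracts (VSAM/IMS), emails, images, and so on
# would each get their own reader in the same shape.
SOURCE_CATALOG: dict = {
    "sales.csv": read_csv,
    "tweets.jsonl": read_jsonl,
}

def load_in_chunks(source: str, sink: Callable, chunk_size: int = 10_000) -> None:
    """Stream records into the platform in bounded batches rather than one
    enormous transfer, keeping latency and memory use predictable."""
    batch = []
    for record in SOURCE_CATALOG[source](source):
        batch.append(record)
        if len(batch) >= chunk_size:
            sink(batch)
            batch = []
    if batch:
        sink(batch)

# Usage (hypothetical sink): load_in_chunks("tweets.jsonl", platform.bulk_insert)
```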

CHALLENGE #4: SYNCHRONIZATION ACROSS THE DATA SOURCES
Once you have figured out how to get data into the big data platform, you begin to realize that data copies migrated from different sources on different schedules and at different rates can rapidly get out of synchronization with the originating systems. There are different aspects of synchrony. From a data currency perspective, synchrony implies that the data coming from one source is not out of date with data coming from another source. From a semantics perspective, synchronization implies commonality of data concepts, definitions, metadata, and the like.
With conventional data marts and data warehouses, sequences of data extractions, transformations, and migrations all create situations in which information can become unsynchronized. But as data volumes explode and updates are expected ever faster, ensuring the level of governance typically applied in conventional data management environments becomes much more difficult.
The inability to ensure synchrony for big data poses the risk of analyses that use inconsistent or potentially even invalid information. If inconsistent data in a conventional data warehouse poses a risk of forwarding faulty analytical results to downstream information consumers, allowing more rampant inconsistencies and asynchrony in a big data environment can have a much more disastrous effect.
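One lightweight guard is to track a currency "watermark" (the last successfully loaded source timestamp) for each feed, and to hold back analyses whose inputs have drifted apart by more than a tolerance. A sketch under those assumptions, with hypothetical feed names:

```python
from datetime import datetime, timedelta

# Last successfully loaded source timestamp ("watermark") per feed.
watermarks = {
    "crm_extract": datetime(2015, 3, 1, 6, 0),
    "web_clickstream": datetime(2015, 3, 1, 6, 5),
    "mainframe_orders": datetime(2015, 2, 28, 22, 0),
}

def out_of_sync(watermarks: dict, tolerance: timedelta) -> list:
    """Return feeds lagging the freshest feed by more than `tolerance`;
    these should be re-extracted before any cross-source analysis runs."""
    freshest = max(watermarks.values())
    return [name for name, ts in watermarks.items()
            if freshest - ts > tolerance]

print(out_of_sync(watermarks, timedelta(hours=2)))
# ['mainframe_orders'] -> this feed must catch up before analysis proceeds
```

This checks only data currency; semantic synchrony (common definitions and metadata) still needs governance processes rather than code.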


CHALLENGE #5: GETTING USEFUL INFORMATION OUT OF THE BIG DATA PLATFORM


Many of the most practical use cases for big data involve data availability: augmenting existing data storage as well as providing access to end users employing business intelligence tools for data discovery. These BI tools must not only be able to connect to one or more big data platforms, they must also provide transparency to the data consumers to reduce or eliminate the need for custom coding. At the same time, as the number of data consumers grows, we can anticipate a need to support a rapidly expanding collection of simultaneous user accesses. That demand may spike at different times of day or in reaction to different aspects of business process cycles. Ensuring right-time data availability to the community of data consumers becomes a critical success factor.
This frames our fifth and final challenge: enabling a means of making data accessible to the different types of downstream applications in a way that is seamless and transparent to the consuming applications while elastically supporting demand.
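A common pattern for meeting this challenge is a thin data-service facade that hides the platform behind a stable query interface and caches hot results to absorb demand spikes. The sketch below is illustrative; the backend callable stands in for whatever query engine or connector an implementation would actually wrap.

```python
import time

class InsightService:
    """Facade between data consumers and the big data platform: consumers
    call one stable method, and a TTL cache absorbs repeated queries when
    demand spikes at peaks in the business cycle."""
    def __init__(self, query_backend, ttl_seconds: int = 300):
        self._backend = query_backend      # hypothetical query client
        self._ttl = ttl_seconds
        self._cache = {}                   # query -> (expiry, result)

    def query(self, q: str):
        now = time.monotonic()
        hit = self._cache.get(q)
        if hit and hit[0] > now:
            return hit[1]                  # hot result: platform untouched
        result = self._backend(q)
        self._cache[q] = (now + self._ttl, result)
        return result

# Usage with a stand-in backend:
svc = InsightService(lambda q: f"rows for: {q}")
print(svc.query("SELECT region, SUM(sales) FROM orders GROUP BY region"))
```

A production layer would add eviction, concurrency control, and per-user entitlements, but the shape (stable interface in front, elastic platform behind) is the point.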
CONSIDERATIONS: WHAT RISKS DO THESE CHALLENGES REALLY POSE?
Considering the business impacts of these challenges suggests some serious risks to successfully deploying a big data program. In Table 1, we reflect on the impacts of our challenges and corresponding risks to success.

Table 1: Challenges, impacts, and risks

Challenge: Uncertainty of the market landscape
Impact: Difficulty in choosing technology components
Risk: Vendor lock-in; committing to a failing product or failing vendor

Challenge: Big data talent gap
Impact: Steep learning curve; extended time for design, development, and implementation
Risk: Delayed time to value

Challenge: Big data loading
Impact: Increased cycle time for analytical platform data population
Risk: Inability to actualize the program due to unmanageable data latencies

Challenge: Synchronization
Impact: Data that is inconsistent or out of date
Risk: Flawed decisions based on flawed data

Challenge: Big data accessibility
Impact: Increased complexity in syndicating data to end-user discovery tools
Risk: Inability to appropriately satisfy the growing community of data consumers