The Key to Successful Big Data: Agile Methodology in Four Sprints

By Pedro Bonillo, PhD, Machine Learning and Big Data Consultant.


Introduction

A common mistake in software development is to think that methodology gets in the way: that it does not work, that it only hinders, and that since life is chaos it is best to build the software without so much as a plan, running projects on the idea of "when the time comes, we'll see." The truth is that no quality software product can ignore either the process by which it is made or the product obtained.

Although agile methodologies are in fashion, they are ultimately a response to the speed with which the market requires software to reach production. It is important, however, to be able to distinguish between tools, techniques, methods, and methodologies.

A tool is something used in a workplace, for example Apache Hadoop. A technique is the efficient or effective way to use the tool, for example using Apache Hadoop with a columnar format and writing the files compressed with Snappy. A method is a path, a way to get from one point to another, for example MapReduce with YARN, or Spark on top of Apache Hadoop. A methodology is a set of methods that has been reflected upon after its application and that, when put into practice under the same premises, yields similar results.
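To make the distinction concrete, here is a minimal sketch of the "technique" example above: using PySpark on top of Hadoop to write data in a columnar format (Parquet) compressed with Snappy. The paths, session setup, and schema are illustrative assumptions, not part of the original text.

```python
# Illustrative sketch: columnar storage (Parquet) with Snappy compression
# on Hadoop, via PySpark. Paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snappy-parquet-demo").getOrCreate()

# Read raw data (hypothetical location and schema).
df = spark.read.csv("hdfs:///raw/transactions.csv", header=True, inferSchema=True)

# The "technique": write in columnar format, compressed with Snappy.
(df.write
   .option("compression", "snappy")
   .mode("overwrite")
   .parquet("hdfs:///curated/transactions"))
```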

In Big Data, many have tried to apply orthodox methodologies and swung between extremes: "now everything is Big Data" or "Big Data does not work for me." The reality is that a Big Data methodology must be agile and must start from the selection of a use case that solves a problem which cannot be solved with the current tools, techniques, methods, and methodologies.

In this way, the proposed methodology for Big Data projects is an adaptation of Scrum (an agile, flexible methodology for managing software development) with 4 sprints: the first two of 2 weeks each, the third of 4 weeks, and the last of 6 weeks, for a total of 12 weeks, with daily reviews (see Figure 1).

Figure 1: Methodology of Big Data projects

Quality software development requires more than just speed; it demands a structured approach. In Big Data, adopting an agile methodology tailored to unique challenges allows us to innovate effectively and solve complex problems that traditional methods can't address.


Sprint 1: Ingredients Search

In the first sprint, which we will call Ingredients Search, the possible use cases to improve through Big Data management are identified. This sprint lasts 2 weeks. A candidate use case must meet at least three selection criteria: volume (more than a million records), variability (many columns, some with null or NaN values), and speed (queries that traditional relational database systems either cannot answer or take more than 5 minutes to answer).
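As a rough illustration, the three criteria can be screened with a short PySpark script. This is a sketch under assumptions: the dataset location, the aggregation used as the "slow query" proxy, and the column names are all hypothetical.

```python
# Hypothetical screening of a candidate use case against the three
# Sprint 1 criteria: volume, variability, and speed.
import time
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("use-case-screening").getOrCreate()
df = spark.read.parquet("hdfs:///candidates/claims")  # hypothetical dataset

# Volume: more than a million records?
n_rows = df.count()

# Variability: many columns, and how many of them contain nulls.
# (NaN checks for numeric columns are omitted for brevity.)
null_counts = df.select(
    [F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in df.columns]
).first()
cols_with_nulls = sum(1 for c in df.columns if null_counts[c] > 0)

# Speed: time a representative aggregation that strains a relational system.
start = time.time()
df.groupBy("customer_id").agg(F.sum("amount")).count()
elapsed = time.time() - start

print(f"rows={n_rows}, columns={len(df.columns)}, "
      f"columns_with_nulls={cols_with_nulls}, query_seconds={elapsed:.1f}")
```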

Sprint 2: Prepare Food

In the second sprint (Prepare Food), it is necessary to identify, for each of the use cases described in the previous sprint, the variables we want to calculate (key performance indicators) or predict (the distillation tier) in each case. This sprint lasts 2 weeks.
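For example, a single use case may yield both kinds of variables. The sketch below shows one KPI to calculate and one target to predict; the column names and the 90-day churn rule are illustrative assumptions, not prescriptions from the methodology.

```python
# Hypothetical Sprint 2 output for one use case: a KPI we can calculate
# directly, and a target variable we want to predict later.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sprint2-variables").getOrCreate()
tx = spark.read.parquet("hdfs:///curated/transactions")

# KPI (calculate): average ticket per customer per month.
kpi = (tx.groupBy("customer_id",
                  F.date_trunc("month", "tx_date").alias("month"))
         .agg(F.avg("amount").alias("avg_ticket")))

# Prediction target (distillation tier): flag customers inactive for 90+
# days as churned, to be predicted from behavioral features in Sprint 4.
last_seen = tx.groupBy("customer_id").agg(F.max("tx_date").alias("last_tx"))
labels = last_seen.withColumn(
    "churned",
    (F.datediff(F.current_date(), F.col("last_tx")) >= 90).cast("int"))
```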

Sprint 3: Food Plating and Presentation

In the third sprint (Food Plating and Presentation), the goal is to select the single most relevant use case for the organization, what we will call the Golden Goal: a use case that could not be solved with traditional relational database systems or with the available business intelligence tools, and that can be addressed faster, at lower cost, and with less effort using Big Data.

For this selected use case, it is necessary to obtain the support of senior management, which means evangelizing them about the new architecture and the management of large volumes of data. It is also necessary to secure the investment and expense budget to execute the next sprint and develop the use case with the necessary Big Data components.

The word necessary is emphasized, since a common mistake is to try to implement every component of the Big Data architecture for this single use case. This sprint (the third) lasts 4 weeks.

Sprint 4: Food Delivery

The last sprint (Food Delivery) is divided into 3 stages:

PACKAGING:

In this stage, the components of the proposed Big Data architecture for the selected use case are installed and configured. The extraction, transformation, and loading of the data is carried out (according to the information-acquisition types defined: real-time ingestion, micro-batch ingestion, batch ingestion), along with its storage and use. Opportunities for improvement must be identified, whether in increasing sales or in optimizing processes. This stage lasts 4 weeks.
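As one concrete example of the ingestion styles named above, the sketch below wires up micro-batch ingestion with Spark Structured Streaming, reading from a Kafka topic into the data lake. The broker address, topic, and paths are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
# Hypothetical micro-batch ingestion: Kafka -> Parquet every 5 minutes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("microbatch-ingestion").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "sales-events")
          .load())

query = (events.selectExpr("CAST(value AS STRING) AS payload")
         .writeStream
         .format("parquet")
         .option("path", "hdfs:///lake/sales_events")
         .option("checkpointLocation", "hdfs:///checkpoints/sales_events")
         .trigger(processingTime="5 minutes")  # micro-batch cadence
         .start())

query.awaitTermination()
```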

PROOF OF DELIVERY:

This second stage consists of verifying that the specifications of the use case were met, in order to correct assumptions in the strategy and refine the Return on Investment. It lasts one week. Between this stage and the next, it is customary to consolidate the learnings and select another use case in which to replicate them, in a new cycle of the methodology (sprint 5, Figure 1).
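To make "refining the Return on Investment" concrete, a toy calculation follows; all figures are hypothetical and only illustrate the arithmetic.

```python
# Toy ROI refinement after Proof of Delivery; figures are hypothetical.
cost = 120_000      # infrastructure plus team for the 12-week cycle
benefit = 150_000   # measured uplift attributed to the use case
roi = (benefit - cost) / cost
print(f"ROI = {roi:.0%}")  # ROI = 25%
```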

DISPOSABLE SERVICEWARE WASTE:

Finally, it is necessary to measure the results and get rid of the waste: document everything that worked, and discard whatever did not work or could not be applied. This stage lasts 1 week.

Final Thoughts

This methodology promotes the evolutionary, incremental adaptation of the organization through the use of agile methods. This is why iterating, by selecting the next use case and starting a new cycle, is suggested.

Through this methodology we have implemented successful use cases in international banks, payment processors for Uber and Google, and fintech companies, demonstrating that a period of twelve weeks, or three months, is enough to implement a successful Big Data use case. As a point of comparison, the AWS team runs its well-known Data Lab with a methodology very similar to the one described here.
