Weekly outline

  • General

    • Collaborative filtering in the lambda architecture

    • Generating real-time recommendations

    • Big data analytics in Spark

    • Complex event processing with Proton

      The basic principle of complex event processing is to derive complex events from a potentially large number of simple events by applying event processing logic. Proton on Storm makes it possible to run an open-source complex event processing engine in a distributed manner across multiple machines, using the Apache Storm infrastructure. Event processing networks provide a conceptual model that describes how the event processing flow is executed. Such a network comprises a collection of event processing agents, event producers, and event consumers, all linked by channels.

      You can learn more about Proton on Storm at http://github.com/ishkin/Proton/tree/master/IBM%20Proactive%20Technology%20Online%20on%20STORM
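      The event processing network described above can be sketched in a few lines of Python. This is a minimal, single-process illustration, not Proton's or Storm's API: the names `Event`, `Channel` and `ThresholdSequenceAgent` and the pattern itself (three consecutive readings above 80 from the same source yield an "overheating" event) are hypothetical choices made for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass(frozen=True)
class Event:
    """A simple or derived event: a type name, its source and a value."""
    kind: str
    source: str
    value: float


class Channel:
    """Links producers or agents to downstream agents or consumers."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, event: Event) -> None:
        for handler in self.subscribers:
            handler(event)


class ThresholdSequenceAgent:
    """Hypothetical agent: emits a derived "overheating" event when the last
    `count` readings from one source all exceed `threshold`."""

    def __init__(self, out: Channel, threshold: float, count: int) -> None:
        self.out = out
        self.threshold = threshold
        self.count = count
        self.windows: Dict[str, Deque[float]] = {}  # per-source sliding window

    def on_event(self, event: Event) -> None:
        if event.kind != "reading":
            return
        window = self.windows.setdefault(event.source, deque(maxlen=self.count))
        window.append(event.value)
        if len(window) == self.count and all(v > self.threshold for v in window):
            self.out.publish(Event("overheating", event.source, max(window)))
            window.clear()  # start a fresh window to avoid duplicate alerts


# Wire the network: producer -> inbound channel -> agent -> outbound channel -> consumer
inbound, outbound = Channel(), Channel()
agent = ThresholdSequenceAgent(outbound, threshold=80.0, count=3)
inbound.subscribe(agent.on_event)
alerts: List[Event] = []  # the consumer simply collects derived events
outbound.subscribe(alerts.append)

# Producer: a stream of simple events from two hypothetical machines
stream = [("m1", 82.0), ("m2", 75.0), ("m1", 85.0), ("m1", 90.0), ("m2", 81.0), ("m1", 70.0)]
for source, value in stream:
    inbound.publish(Event("reading", source, value))

print(alerts)
```

      A distributed engine such as Proton on Storm would spread agents and their state across machines; here everything runs in one process, so the sketch only shows how producers, channels, agents and consumers fit together.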

    • SQL operators for MapReduce with Teradata

      Database management system vendors seek to enhance their traditional databases and make them applicable to big data use cases. A basic concept for achieving this is table partitioning, which leads to massively parallel databases. Table operators exploit this partitioning to run distributed algorithms using MapReduce. One commercial tool offering these approaches is the Teradata Aster solution.

      You can learn more about this solution at http://www.teradata.de/products-and-services/analytics-from-aster-overview
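      The interplay of table partitioning and MapReduce-style table operators can be illustrated with a small Python sketch. It is a hypothetical, sequential simulation and does not use Teradata Aster's syntax or API: a made-up sales table is hash-partitioned on its `region` column, a per-partition operator computes partial sums, and a reduce step merges them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (region, amount) in a hypothetical sales table


def hash_partition(rows: Iterable[Row], n: int) -> List[List[Row]]:
    """Distribute rows across n partitions by hashing the partitioning key."""
    partitions: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        partitions[hash(row[0]) % n].append(row)
    return partitions


def map_partition(partition: List[Row]) -> Dict[str, float]:
    """Table operator run locally on one partition: partial sums per region."""
    partial: Dict[str, float] = defaultdict(float)
    for region, amount in partition:
        partial[region] += amount
    return dict(partial)


def reduce_partials(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Merge the per-partition results into the final aggregate."""
    total: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for region, amount in partial.items():
            total[region] += amount
    return dict(total)


sales = [("north", 10.0), ("south", 5.0), ("north", 2.5), ("east", 7.0), ("south", 1.0)]
parts = hash_partition(sales, n=3)
result = reduce_partials(map_partition(p) for p in parts)
print(sorted(result.items()))
```

      Because rows are partitioned on the same key that is aggregated, each region ends up in exactly one partition, so the merge step could simply concatenate results; it is written generally so that it also works when the table is partitioned on a different column. In a massively parallel database, each `map_partition` call would run on the node holding that partition.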

    • In-memory processing

      Ever larger amounts of main memory are becoming available at reasonable prices. Because access becomes significantly slower as soon as data outside main memory must be read, high-performance applications aim to keep as much data as possible in main memory. A wide variety of in-memory database systems is available. Central performance and applicability criteria to keep in mind when choosing such a system include operating system compatibility, hardware requirements, licensing and support, runtime monitoring capabilities, memory utilisation, database interface standards, extensibility, portability, integration of open-source big data technologies, local and distributed scaling and elasticity, available analytics functionality, persistence, availability, and security.

      A first overview of available products can be found at http://en.wikipedia.org/wiki/List_of_in-memory_databases

      A more in-depth study, available in German only, can be found at http://s.fhg.de/in-memory-systems
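      As a small, hedged illustration of an in-memory database and of the persistence criterion, the sketch below uses SQLite from Python's standard library. SQLite is not one of the systems discussed here and is chosen only because it can run entirely in main memory (`":memory:"`) and can snapshot that state to disk with `Connection.backup`; the table and its values are made up.

```python
import os
import sqlite3
import tempfile

# In-memory database: its tables live only in main memory and are lost
# when the connection closes unless they are explicitly persisted.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
mem.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 20.5), ("s2", 22.0), ("s1", 21.5)],  # made-up sensor readings
)
mem.commit()

# Queries are answered from main memory.
avg_s1 = mem.execute("SELECT AVG(value) FROM readings WHERE sensor = 's1'").fetchone()[0]
print(avg_s1)

# Persistence: copy the whole in-memory database into a file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot.db")
    disk = sqlite3.connect(path)
    mem.backup(disk)
    count = disk.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
    disk.close()  # close before the temporary directory is removed
mem.close()
print(count)
```

      Data written only to the in-memory database disappears when the connection closes, which is why persistence and availability appear among the selection criteria; dedicated in-memory systems typically address this with logging, snapshots or replication rather than a manual backup call.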