|Welcome to issue 237 of NoSQL Weekly. Let's get straight to the links this week.
Articles, Tutorials and Talks
Petascale Key Value Stores & App Driven Analytics
Recent innovations in storage, like SanDisk's new InfiniFlash offering, are driving higher storage densities and lower prices. Flash-optimized NoSQL databases have the scalability to handle InfiniFlash's 512G of Flash, and a two-node HA database cluster capable of over 1M TPS could fit in only 5U. These solutions enable analytics with Java: a new programming model that moves compute into an easy-to-program Java application layer, removing the need to outguess a SQL optimizer or debug stored procedures. The talk discusses this emerging architecture, hardware options and prices, along with example use cases.
Retail Reference Architecture Part 4: Recommendations and Personalizations
In this part, we'll look at a very different application of MongoDB in the retail space, one that even those familiar with MongoDB might not think it well suited for: logging a high volume of user activity data and performing analytics on it. This final use case demonstrates how MongoDB can enable scalable insights, including recommendations and personalization for your customers.
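The pattern the post describes — log raw activity events, then aggregate them for insight — can be sketched quickly. This is an illustrative sample only; the event fields and pipeline below are assumptions, not taken from the article, and against a live server the pipeline dicts would be passed to `collection.aggregate(...)`:

```python
from collections import Counter

# Hypothetical user-activity events, as they might be logged to MongoDB
# (field names here are invented for illustration).
events = [
    {"userId": "u1", "type": "view", "productId": "p1"},
    {"userId": "u1", "type": "view", "productId": "p2"},
    {"userId": "u2", "type": "view", "productId": "p1"},
    {"userId": "u2", "type": "cart", "productId": "p1"},
]

# An aggregation pipeline that would count views per product on the server;
# built as plain dicts so it could be handed to collection.aggregate(pipeline).
pipeline = [
    {"$match": {"type": "view"}},
    {"$group": {"_id": "$productId", "views": {"$sum": 1}}},
    {"$sort": {"views": -1}},
]

# The same reduction in pure Python, to show the expected result shape.
views = Counter(e["productId"] for e in events if e["type"] == "view")
print(views.most_common())  # [('p1', 2), ('p2', 1)]
```

Keeping events as small immutable documents is what makes this kind of write-heavy logging workload a fit for MongoDB.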
Inside Apache HBase's New Support for MOBs
Learn about the design decisions behind HBase's new support for MOBs.
Native multi-model can compete with pure document and graph databases
Native multi-model databases combine different data models, such as documents and graphs, in one tool and even allow you to mix them in a single query. How can this concept compete with a pure document store like MongoDB or a graph database like Neo4j? I, and a lot of folks in the community, have asked that question. So here are some benchmark results: 100k reads -> competitive; 100k writes -> competitive; friends-of-friends -> outstanding; shortest path -> superior; aggregation -> superior.
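For readers unfamiliar with the graph workloads in that list, here is a minimal pure-Python sketch of what "friends-of-friends" and "shortest path" compute (the sample graph is invented; the databases benchmarked run these as native graph traversals, not application code):

```python
from collections import deque

# A tiny friendship graph (illustrative data, not from the benchmark).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}

def friends_of_friends(g, person):
    """Friends-of-friends: people at distance exactly 2."""
    direct = g[person]
    return set().union(*(g[f] for f in direct)) - direct - {person}

def shortest_path_len(g, src, dst):
    """Unweighted shortest path length via breadth-first search."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in g[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # unreachable

print(friends_of_friends(friends, "alice"))       # {'dave'}
print(shortest_path_len(friends, "alice", "erin"))  # 3
```

These traversals are exactly where document stores struggle and graph-capable engines shine, which is why the multi-model results stand out there.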
Saving CPU! Using Native Hadoop Libraries for CRC computation in HBase
Use the Hadoop native library to calculate CRCs and save CPU!
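The post is about swapping Java checksum code for Hadoop's native (JNI) implementation. As a language-neutral illustration of the underlying pattern, here is a small Python sketch of incremental CRC computation over data-block chunks (illustrative only: HBase runs on the JVM and defaults to CRC32C, a different polynomial than zlib's CRC-32, but the chunked-update pattern is the same):

```python
import zlib

def crc32_of_chunks(chunks):
    """Fold a CRC-32 across chunks without concatenating them,
    the way a store checksums a block as it streams through."""
    crc = 0  # zlib's initial CRC value
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # feed prior CRC back in
    return crc & 0xFFFFFFFF

block = b"hbase-data-block"
chunked = [block[i:i + 4] for i in range(0, len(block), 4)]

# Incremental result matches a one-shot CRC over the whole block.
assert crc32_of_chunks(chunked) == (zlib.crc32(block) & 0xFFFFFFFF)
```

Because this inner loop runs on every block read and write, moving it from managed code to an optimized native routine is where the CPU savings come from.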
The Improved Job Scheduling Algorithm of Hadoop Platform
This paper discusses several job scheduling algorithms for the Hadoop platform and, in view of their shortcomings, proposes a job scheduling optimization algorithm based on Bayes classification.
Flying faster with Twitter Heron
We process billions of events on Twitter every day. As you might guess, analyzing these events in real time presents a massive challenge. Our main system for such analysis has been Storm, a distributed stream computation system we've open-sourced. But as the scale and diversity of Twitter data have increased, our requirements have evolved. So we've designed a new system, Heron -- a real-time analytics platform that is fully API-compatible with Storm.
Using Jil for custom JSON Serialization in the Couchbase .NET SDK
One of the new features introduced in version 2.1.0 of the Couchbase .NET SDK is support for overriding the default serializer with a custom one: you implement the ITypeSerializer interface and plug in your own or favorite serializer. In this article we'll show you how to do this, with an example (clonable from the couchbase-net-contrib project on Couchbaselabs) that uses a very fast JSON serializer called Jil.
GraphConnect Europe 2015 Videos
Building an S3 object store with Docker, Cassandra and Kubernetes
Pitfalls and Workarounds for Tailing the Oplog on a MongoDB Sharded Cluster
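One pitfall the post covers is that on a sharded cluster each shard's oplog also records chunk-migration writes (flagged `fromMigrate`), so a naive tailer sees the same document twice. A minimal filter over mock oplog entries is sketched below; the entry shapes are simplified, and against a live replica set you would read `local.oplog.rs` with a PyMongo `TAILABLE_AWAIT` cursor rather than a list:

```python
def user_ops(oplog_entries):
    """Yield only real user writes: skip no-ops and migration traffic."""
    for entry in oplog_entries:
        if entry.get("fromMigrate"):
            continue  # internal chunk-migration copy of a write
        if entry["op"] == "n":
            continue  # periodic no-op entry
        yield entry

# Simplified mock oplog entries for illustration.
mock_oplog = [
    {"op": "i", "ns": "shop.orders", "o": {"_id": 1}},
    {"op": "n", "ns": "", "o": {"msg": "periodic noop"}},
    {"op": "i", "ns": "shop.orders", "o": {"_id": 1}, "fromMigrate": True},
    {"op": "u", "ns": "shop.orders", "o": {"$set": {"paid": True}}},
]
print([e["op"] for e in user_ops(mock_oplog)])  # ['i', 'u']
```

On a sharded cluster you must run one such tailer per shard, since mongos does not expose a merged oplog.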
Books
Build reliable, robust, and high-performance big data applications efficiently using the Cascading application development framework. This book is intended for software developers, system architects and analysts, big data project managers, and data scientists who wish to deploy big data solutions using the Cascading framework.
Interesting Projects, Tools and Libraries
Rozu is a webhook API server that uses MongoDB for persistent storage and Redis for pub/sub of inbound events.
Deploy CouchDB documents from a directory, JSON file or CommonJS module, via the API or the command-line client.
Bootstrap projects: configure CouchDB, set up security, deploy design documents and create users.
Upcoming Events and Webinars
Webinar: Best Practices for Upgrading to MongoDB 3.0
MongoDB 3.0 brings major enhancements. Write performance has improved by 7-10x with WiredTiger and document-level concurrency control. Compression reduces storage needs by up to 80%. To take advantage of these features, your team needs an upgrade plan. In this session, we'll walk you through how to build an upgrade plan. We'll show you how to validate your existing deployment, build a test environment with a representative workload, and detail how to carry out the upgrade. You'll walk away confident that you're prepared to upgrade.
Webinar: Simplicity Scales - Big Data Application Management & Operations
In this presentation, Tyler Hannan discusses how you can simplify the management of the technologies required to support your Big Data applications, and offers practical considerations for choosing the right tools for the job.
New York MongoDB Meetup June 2015 - New York, NY
MongoDB Scout (code name) is a new GUI tool to help you see and understand your data. Scout samples collection data from MongoDB, infers your schema, then displays the schema along with visual summaries of the sample data. We previewed Scout at MongoDB World, and we're planning to ship it with MongoDB Enterprise 3.2. Join us to see the current working version, kick the tires, and find out what's next on our roadmap.
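The sample-then-infer step described above can be sketched in a few lines. This is a toy illustration of the idea, not Scout's actual algorithm: for each field in a sample of documents, record which types occur and how often the field is present.

```python
from collections import defaultdict

def infer_schema(sample):
    """Toy schema inference: per-field type sets and presence rates."""
    stats = defaultdict(lambda: {"types": set(), "count": 0})
    for doc in sample:
        for field, value in doc.items():
            stats[field]["types"].add(type(value).__name__)
            stats[field]["count"] += 1
    n = len(sample)
    return {f: {"types": sorted(s["types"]), "probability": s["count"] / n}
            for f, s in stats.items()}

# Illustrative sampled documents with an inconsistent "age" field.
sample = [
    {"_id": 1, "name": "ok", "age": 31},
    {"_id": 2, "name": "hm"},
    {"_id": 3, "name": "yo", "age": "n/a"},
]
print(infer_schema(sample)["age"])
# e.g. {'types': ['int', 'str'], 'probability': 0.666...}
```

Mixed types and partially-present fields like `age` here are exactly what such a tool surfaces in a schemaless store.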
Completely real-time recommendations - New York, NY
Currently deployed recommendation technology almost always provides real-time recommendations based on a model that is developed using off-line techniques. The use of off-line training severely limits the ability to deal with fast-changing content such as news or auctions. I will describe techniques which make it possible to move this training load into true real time without loss of accuracy. Real-time training of recommendations is rarely done, partly because existing algorithms require periodic off-line training to correct accumulating inaccuracies. The techniques that I will describe do not require off-line training of any kind. In addition, the techniques described in this talk are easy to implement using stream processing, NoSQL databases and search engines.
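To make the idea of training on the stream concrete, here is a stripped-down sketch of streaming co-occurrence counting, the kind of statistic indicator-based recommenders build on. The talk's actual techniques, and their accuracy guarantees, are more involved; the data and scoring below are invented for illustration:

```python
from collections import Counter, defaultdict

cooccur = defaultdict(Counter)   # item -> Counter of items seen with it
history = defaultdict(set)       # user -> items interacted with so far

def observe(user, item):
    """Update co-occurrence counts as each event streams in -- no batch job."""
    for prior in history[user]:
        cooccur[prior][item] += 1
        cooccur[item][prior] += 1
    history[user].add(item)

def recommend(user, k=2):
    """Score unseen items by co-occurrence with the user's history."""
    scores = Counter()
    for item in history[user]:
        scores.update(cooccur[item])
    for seen in history[user]:
        scores.pop(seen, None)
    return [item for item, _ in scores.most_common(k)]

for user, item in [("u1", "a"), ("u1", "b"), ("u2", "a"),
                   ("u2", "b"), ("u2", "c"), ("u3", "a")]:
    observe(user, item)
print(recommend("u3"))  # ['b', 'c']
```

In a production version, the co-occurrence indicators would live in a NoSQL store or a search index, which is what makes the stream-processing deployment straightforward.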
Relational to graph: A worked example - London, United Kingdom
In this session we'll take an existing relational database and port it into a Neo4j graph. We'll start off by coming up with some 'graphy' questions that we'd be able to answer more easily if the data was structured as a graph. Having done that we'll design a graph model, export the appropriate relational tables and import them into Neo4j.
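The core of such a port — rows become nodes, foreign keys become relationships — can be sketched in a few lines. The table and column names below are invented for illustration, and a real import would emit Cypher (or CSV for Neo4j's import tooling) rather than printing:

```python
# Illustrative relational rows with an orders -> customers foreign key.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]
orders = [{"id": 10, "customer_id": 1, "total": 25.0},
          {"id": 11, "customer_id": 1, "total": 5.5}]

# Each row becomes a labelled node...
nodes = ([("Customer", c) for c in customers] +
         [("Order", o) for o in orders])

# ...and each foreign-key value becomes an explicit PLACED relationship,
# which is what makes "graphy" questions cheap to ask in Neo4j.
rels = [("Customer", o["customer_id"], "PLACED", "Order", o["id"])
        for o in orders]

for label, cid, rel, tlabel, tid in rels:
    print(f"(:{label} {{id: {cid}}})-[:{rel}]->(:{tlabel} {{id: {tid}}})")
# (:Customer {id: 1})-[:PLACED]->(:Order {id: 10})
# (:Customer {id: 1})-[:PLACED]->(:Order {id: 11})
```

Join tables get the same treatment: a many-to-many link table typically disappears into a relationship (with the table's extra columns becoming relationship properties).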
Share NoSQL Weekly