Database design - data modelling and queries in Cassandra


     id | events     | timestamp
    ----+------------+---------------------
      1 | inprogress | 2010-03-31 15:59:42
      1 | awaiting   | 2010-04-31 15:59:42
      1 | resolved   | 2010-05-31 15:59:42
      1 | closed     | 2010-06-31 15:59:42
      2 | awaiting   | 2010-07-31 15:59:42
      2 | inprogress | 2010-08-31 15:59:42
      2 | wait       | 2010-09-31 15:59:42
      2 | closed     | 2010-10-31 15:59:42

I have the table above in Cassandra. From it I need to extract 2 tables: one containing the first event for each id, and the other containing the last event for each id. So the output should be these 2 tables:

    initial
    -----------------------------
    inprogress
    awaiting

    final
    -----------------------------
    closed

I need to know how this can be done in CQL (Cassandra Query Language) alone, or whether there is a way to model the data so that I can obtain the desired results.

You could use a schema like:

    CREATE TABLE event (
        id int,
        ts timestamp,
        event text,         -- "desc" is a reserved CQL keyword, so use another column name
        PRIMARY KEY (id, ts)
    );

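Because `id` is the partition key and `ts` is the clustering column, this lets you fetch by id, order by timestamp (ASC for the first event, DESC for the last), and LIMIT 1.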
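To make the access pattern concrete, here is a small in-memory Python model. It is not Cassandra itself. Rows are grouped into one partition per `id`, and each partition is kept sorted by `ts`, the way the clustering column orders rows on disk. Taking the head of a partition corresponds to `SELECT * FROM event WHERE id = ? ORDER BY ts ASC LIMIT 1`, and taking the tail corresponds to the same query with `ORDER BY ts DESC`. The helper names (`build_partitions`, `first_event`, `last_event`) are hypothetical.

```python
from bisect import insort
from collections import defaultdict

# Rows from the question as (id, event, timestamp). The timestamps are kept as
# the question's strings; in "YYYY-MM-DD hh:mm:ss" form they sort chronologically.
ROWS = [
    (1, "inprogress", "2010-03-31 15:59:42"),
    (1, "awaiting", "2010-04-31 15:59:42"),
    (1, "resolved", "2010-05-31 15:59:42"),
    (1, "closed", "2010-06-31 15:59:42"),
    (2, "awaiting", "2010-07-31 15:59:42"),
    (2, "inprogress", "2010-08-31 15:59:42"),
    (2, "wait", "2010-09-31 15:59:42"),
    (2, "closed", "2010-10-31 15:59:42"),
]


def build_partitions(rows):
    """Group rows by partition key (id) and keep each partition sorted by ts,
    mimicking PRIMARY KEY (id, ts) with the default ascending clustering order."""
    partitions = defaultdict(list)
    for row_id, event, ts in rows:
        insort(partitions[row_id], (ts, event))
    return partitions


def first_event(partitions, row_id):
    # Analogue of: SELECT * FROM event WHERE id = ? ORDER BY ts ASC LIMIT 1
    return partitions[row_id][0][1]


def last_event(partitions, row_id):
    # Analogue of: SELECT * FROM event WHERE id = ? ORDER BY ts DESC LIMIT 1
    return partitions[row_id][-1][1]


parts = build_partitions(ROWS)
for row_id in sorted(parts):
    print(row_id, first_event(parts, row_id), last_event(parts, row_id))
# prints:
# 1 inprogress closed
# 2 awaiting closed
```

To get the first event for every id in one query instead of one id at a time, Cassandra 3.6 and later also accept `SELECT * FROM event PER PARTITION LIMIT 1;`. That query returns the first clustering row of each partition. It scans the whole table, so treat it as an occasional query. Getting the last event for every id the same way would need a second table declared `WITH CLUSTERING ORDER BY (ts DESC)`.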

However, check how many events per id you're expecting. If there could be enough events to push a single id's partition beyond about 100 MB, you need to start considering bucketing, or some other approach.
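One common bucketing scheme adds a time bucket, such as the month, to the partition key, as in `PRIMARY KEY ((id, bucket), ts)`. This is an assumption here, since the answer doesn't prescribe a scheme. With that layout, the first event for an id is the first row of its earliest bucket, and the last event is the last row of its latest bucket. The client therefore has to know, or track, which buckets exist for each id. The Python model below sketches that idea; `month_bucket` and the "YYYY-MM" bucket format are hypothetical choices.

```python
from collections import defaultdict

# Same sample rows as in the question: (id, event, timestamp string).
ROWS = [
    (1, "inprogress", "2010-03-31 15:59:42"),
    (1, "awaiting", "2010-04-31 15:59:42"),
    (1, "resolved", "2010-05-31 15:59:42"),
    (1, "closed", "2010-06-31 15:59:42"),
    (2, "awaiting", "2010-07-31 15:59:42"),
    (2, "inprogress", "2010-08-31 15:59:42"),
    (2, "wait", "2010-09-31 15:59:42"),
    (2, "closed", "2010-10-31 15:59:42"),
]


def month_bucket(ts):
    # Hypothetical bucket: the "YYYY-MM" prefix of the timestamp string.
    return ts[:7]


def build_bucketed_partitions(rows):
    # The partition key is now (id, bucket), mimicking PRIMARY KEY ((id, bucket), ts).
    partitions = defaultdict(list)
    for row_id, event, ts in rows:
        partitions[(row_id, month_bucket(ts))].append((ts, event))
    for rows_in_partition in partitions.values():
        rows_in_partition.sort()  # clustering order: ts ascending
    return partitions


def first_and_last(partitions, row_id):
    # Buckets holding data for this id. Cassandra cannot range-query partition
    # keys by value, so a real client would need to know or track these itself.
    buckets = sorted(bucket for (i, bucket) in partitions if i == row_id)
    first = partitions[(row_id, buckets[0])][0][1]    # earliest bucket, ts ASC LIMIT 1
    last = partitions[(row_id, buckets[-1])][-1][1]   # latest bucket, ts DESC LIMIT 1
    return first, last


bucketed = build_bucketed_partitions(ROWS)
for row_id in (1, 2):
    print(row_id, *first_and_last(bucketed, row_id))
```

Because each bucket's partition holds only a slice of an id's history, no partition grows without bound. The cost is that first and last lookups now depend on knowing the bucket range.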

Another alternative is to use Spark for the analytical query and store the result in a table that holds it in the format you want. That means running an external job periodically, or a Spark Streaming app that runs a few seconds to minutes behind live data, but it would work.
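The core of such a periodic job is a per-id minimum and maximum over the timestamp. In the sketch below, plain Python stands in for Spark; in Spark itself this would be a group-by on `id`. The two returned dicts represent the hypothetical `initial` and `final` tables that the job would write back to Cassandra. The function name `summarize` is illustrative.

```python
# Same sample rows as in the question: (id, event, timestamp string).
ROWS = [
    (1, "inprogress", "2010-03-31 15:59:42"),
    (1, "awaiting", "2010-04-31 15:59:42"),
    (1, "resolved", "2010-05-31 15:59:42"),
    (1, "closed", "2010-06-31 15:59:42"),
    (2, "awaiting", "2010-07-31 15:59:42"),
    (2, "inprogress", "2010-08-31 15:59:42"),
    (2, "wait", "2010-09-31 15:59:42"),
    (2, "closed", "2010-10-31 15:59:42"),
]


def summarize(rows):
    """Return ({id: first_event}, {id: last_event}) in one pass over the rows,
    regardless of the order in which the rows arrive."""
    first, last = {}, {}
    for row_id, event, ts in rows:
        if row_id not in first or ts < first[row_id][0]:
            first[row_id] = (ts, event)
        if row_id not in last or ts > last[row_id][0]:
            last[row_id] = (ts, event)
    initial_table = {row_id: event for row_id, (_, event) in first.items()}
    final_table = {row_id: event for row_id, (_, event) in last.items()}
    return initial_table, final_table


initial_table, final_table = summarize(ROWS)
print(initial_table)
print(final_table)
```

Because the per-id min and max are order-independent, the same logic can be applied to each new micro-batch in a streaming job and merged with the stored results.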

