python - Using Pandas GroupBy and size()/count() to generate an aggregated DataFrame


So I have a DataFrame called df that goes like this:

date                       tag
2011-02-18 12:57:00-07:00  a
2011-02-19 12:57:00-07:00  a
2011-03-18 12:57:00-07:00  b
2011-04-01 12:57:00-07:00  c
2011-05-19 12:57:00-07:00  z
2011-06-03 12:57:00-07:00  a
2011-06-05 12:57:00-07:00  a
...

I'm trying to group by tag and date (year/month) and count the rows in each group, so that the result looks like this:

date     a  b  c  z
2011-02  2  0  0  0
2011-03  0  1  0  0
2011-04  0  0  1  0
2011-05  0  0  0  1
2011-06  2  0  0  0
...

I've tried the following, but it doesn't quite give me what I want:

grouped_series = df.groupby([["%s-%s" % (d.year, d.month) for d in df.date], df.tag]).size()

I know the tag column exists, etc. Any help is appreciated.

Update (for people looking in the future):

I ended up keeping the dates as datetime objects instead of formatting them as strings. Trust me, this works better when plotting:

import datetime

grouped_df = df.groupby([[datetime.datetime(d.year, d.month, 1, 0, 0) for d in df.date], df.tag]).size()
grouped_df = grouped_df.unstack().fillna(0)
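The list comprehension in the update builds one datetime per row in plain Python. As a minimal sketch of a vectorized variant, the month start can be derived with the `.dt` accessor instead. The sample frame, the tag value "a", and the name `month_start` are assumptions made for illustration, and the timezone offset is left out to keep the example short.

```python
import pandas as pd

# Sample data modeled on the question; the tag "a" is an assumed placeholder.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-02-18 12:57:00", "2011-02-19 12:57:00", "2011-03-18 12:57:00",
        "2011-04-01 12:57:00", "2011-05-19 12:57:00", "2011-06-03 12:57:00",
        "2011-06-05 12:57:00",
    ]),
    "tag": ["a", "a", "b", "c", "z", "a", "a"],
})

# Snap each timestamp to the first day of its month; the dtype stays datetime64.
month_start = df["date"].dt.to_period("M").dt.to_timestamp()

# Count rows per (month start, tag) and pivot the tags into columns.
grouped_df = df.groupby([month_start, "tag"]).size().unstack(fill_value=0)
print(grouped_df)
```

Because the index holds real timestamps, a call such as `grouped_df.plot()` can place the months on a proper time axis rather than treating them as arbitrary labels.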

You can use the unstack() and fillna() methods:

>>> g = df.groupby([["%s-%s" % (d.year, d.month) for d in df.date], df.tag]).size()
>>> g
        tag
2011-2  a      2
2011-3  b      1
2011-4  c      1
2011-5  z      1
2011-6  a      2
dtype: int64
>>> g.unstack().fillna(0)
tag     a  b  c  z
2011-2  2  0  0  0
2011-3  0  1  0  0
2011-4  0  0  1  0
2011-5  0  0  0  1
2011-6  2  0  0  0
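For anyone who wants to run this end to end, here is a self-contained sketch of the same size() and unstack() approach. The sample frame and the tag value "a" are assumptions modeled on the question, the timezone offset is omitted, and the month key is zero-padded with `strftime` so the labels sort chronologically. Passing `fill_value=0` to `unstack()` is used here in place of `fillna(0)`, which should keep the counts as integers.

```python
import pandas as pd

# Sample data modeled on the question; the tag "a" is an assumed placeholder.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-02-18 12:57:00", "2011-02-19 12:57:00", "2011-03-18 12:57:00",
        "2011-04-01 12:57:00", "2011-05-19 12:57:00", "2011-06-03 12:57:00",
        "2011-06-05 12:57:00",
    ]),
    "tag": ["a", "a", "b", "c", "z", "a", "a"],
})

# Zero-padded "YYYY-MM" key, one label per row.
month = df["date"].dt.strftime("%Y-%m")

# size() counts rows per (month, tag); unstack() moves the tags into columns.
counts = df.groupby([month, "tag"]).size().unstack(fill_value=0)
print(counts)

# pd.crosstab builds the same kind of frequency table in a single call.
same = pd.crosstab(month, df["tag"])
```

Either form gives one row per month and one column per tag, with 0 wherever a tag did not occur in that month.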
