python - Using Pandas GroupBy and size()/count() to generate an aggregated DataFrame
So I have a DataFrame called df that goes like this:

    date                       tag
    2011-02-18 12:57:00-07:00  a
    2011-02-19 12:57:00-07:00  a
    2011-03-18 12:57:00-07:00  b
    2011-04-01 12:57:00-07:00  c
    2011-05-19 12:57:00-07:00  z
    2011-06-03 12:57:00-07:00  a
    2011-06-05 12:57:00-07:00  a
    ...
I'm trying to group by tag and by date (year/month), so that the result looks like this:

    date     a  b  c  z
    2011-02  2  0  0  0
    2011-03  0  1  0  0
    2011-04  0  0  1  0
    2011-05  0  0  0  1
    2011-06  2  0  0  0
    ...
I've tried the following, but it doesn't quite give me what I want:

    grouped_series = df.groupby([["%s-%s" % (d.year, d.month) for d in df.date], df.tag]).size()
I know the tag exists, etc. Any help would be appreciated.
Update (for people looking in the future):
I ended up keeping the dates as datetime objects instead of formatting them as strings. Trust me, this works better when plotting:
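    import datetime

    # Use the first day of each month as the group key, then pivot tags into columns.
    grouped_df = df.groupby([[datetime.datetime(d.year, d.month, 1, 0, 0) for d in df.date], df.tag]).size()
    grouped_df = grouped_df.unstack().fillna(0)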
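The month-start key in the update can also be built without a Python-level loop. The following is a minimal sketch, not the author's code: the sample rows and tag values are illustrative, and the name month_start is made up for this example. It drops the UTC offset (keeping the local wall-clock time), converts to monthly periods, and converts back to timestamps at the start of each month.

```python
import pandas as pd

# Illustrative sample data shaped like the question's df (tag values assumed).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-02-18 12:57:00-07:00",
        "2011-02-19 12:57:00-07:00",
        "2011-03-18 12:57:00-07:00",
        "2011-04-01 12:57:00-07:00",
        "2011-05-19 12:57:00-07:00",
        "2011-06-03 12:57:00-07:00",
        "2011-06-05 12:57:00-07:00",
    ]),
    "tag": ["a", "a", "b", "c", "z", "a", "a"],
})

# Drop the offset (local wall time is kept), then snap each date to the
# first day of its month. The result is a naive datetime64 Series.
month_start = df["date"].dt.tz_localize(None).dt.to_period("M").dt.to_timestamp()

# Count rows per (month, tag) and pivot tags into columns, filling gaps with 0.
grouped_df = df.groupby([month_start, "tag"]).size().unstack(fill_value=0)
print(grouped_df)
```

Because the index holds real timestamps rather than strings, plotting libraries can treat the x-axis as a time axis, which is the benefit the update describes.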
You can use the unstack() and fillna() methods:
    >>> g = df.groupby([["%s-%s" % (d.year, d.month) for d in df.date], df.tag]).size()
    >>> g
            tag
    2011-2  a      2
    2011-3  b      1
    2011-4  c      1
    2011-5  z      1
    2011-6  a      2
    dtype: int64
    >>> g.unstack().fillna(0)
    tag     a  b  c  z
    2011-2  2  0  0  0
    2011-3  0  1  0  0
    2011-4  0  0  1  0
    2011-5  0  0  0  1
    2011-6  2  0  0  0
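For anyone who wants to run this end to end, here is a self-contained sketch under a few assumptions: the sample rows and tag values are illustrative, and the names month, counts and counts_xtab are invented for the example. It swaps the "%s-%s" formatting for strftime("%Y-%m"), which zero-pads the month so the labels read 2011-02 as in the question's desired output. It also shows pd.crosstab, an alternative that is not part of the answer above, which produces the same count table in one call.

```python
import pandas as pd

# Illustrative sample data shaped like the question's df (tag values assumed).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-02-18 12:57:00-07:00",
        "2011-02-19 12:57:00-07:00",
        "2011-03-18 12:57:00-07:00",
        "2011-04-01 12:57:00-07:00",
        "2011-05-19 12:57:00-07:00",
        "2011-06-03 12:57:00-07:00",
        "2011-06-05 12:57:00-07:00",
    ]),
    "tag": ["a", "a", "b", "c", "z", "a", "a"],
})

# Zero-padded year-month label for every row, e.g. "2011-02".
month = df["date"].dt.strftime("%Y-%m").rename("month")

# groupby + size + unstack; fill_value=0 fills missing combinations
# during the reshape, so no separate fillna() step is needed.
counts = df.groupby([month, "tag"]).size().unstack(fill_value=0)

# Alternative: crosstab counts co-occurrences of the two keys directly.
counts_xtab = pd.crosstab(month, df["tag"])

print(counts)
```

Both tables have one row per month and one column per tag; which form to use is mostly a matter of taste.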