python - What is the point of indexing in pandas?
Can someone point me to a link or provide an explanation of the benefits of indexing in pandas? I routinely deal with tables and join them based on columns, and this joining/merging process seems to re-index things anyway, so it's a bit cumbersome to apply index criteria considering I don't think I need to.
Any thoughts on best practices around indexing?
Like a dict, a DataFrame's index is backed by a hash table. Looking up rows based on index values is like looking up dict values based on a key.
In contrast, the values in a column are like values in a list.
Looking up rows based on index values is faster than looking up rows based on column values.
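To make the dict/list analogy concrete, here is a minimal pure-Python sketch. It is not how pandas is implemented internally; the names `keys`, `values`, `lookup_by_scan` and `lookup_by_hash` are made up for illustration.

```python
# Toy data: 10,000 keys with one value each (illustrative names only).
keys = list(range(10000))
values = [k * 2 for k in keys]


def lookup_by_scan(target):
    """Column-style lookup: walk the list until a matching key is found."""
    for position, key in enumerate(keys):
        if key == target:
            return values[position]
    raise KeyError(target)


# Index-style lookup: build a hash table once, then jump straight to the entry.
table = dict(zip(keys, values))


def lookup_by_hash(target):
    """Dict-style lookup: one hash computation instead of a scan."""
    return table[target]


scan_result = lookup_by_scan(999)
hash_result = lookup_by_hash(999)
print(scan_result, hash_result)
```

Both functions return the same value; the difference is that the scan touches up to every element, while the dict lookup does not depend on how many entries there are.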
For example, consider

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'foo': np.random.random(), 'index': range(10000)})
    df_with_index = df.set_index(['index'])
Here is how you could look up the row where the df['index'] column equals 999. Pandas has to loop through every value in the column to find the ones equal to 999.

    df[df['index'] == 999]
    #           foo  index
    # 999  0.375489    999
Here is how you could look up the row where the index equals 999. With an index, Pandas uses the hash value to find the rows:

    df_with_index.loc[999]
    # foo    0.375489
    # Name: 999, dtype: float64
Looking up rows by index is much faster than looking up rows by column value:

    In [254]: %timeit df[df['index'] == 999]
    1000 loops, best of 3: 368 µs per loop

    In [255]: %timeit df_with_index.loc[999]
    10000 loops, best of 3: 57.7 µs per loop
Note, however, that it takes time to build the index:

    In [220]: %timeit df.set_index(['index'])
    1000 loops, best of 3: 330 µs per loop
So having the index is only advantageous when you have many lookups of this type to perform.
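A rough sketch of that trade-off: pay the cost of building the index once, then reuse it for every lookup. The `wanted` list is a hypothetical set of keys, and unlike the example above, `foo` holds a different random value per row here so that the results are distinguishable.

```python
import numpy as np
import pandas as pd

# Assumption: one random value per row, so each lookup returns a distinct number.
df = pd.DataFrame({'foo': np.random.random(10000), 'index': range(10000)})

wanted = [5, 999, 4321]  # hypothetical keys we need to look up

# Without an index: every lookup compares the whole column against the key.
by_scan = [df[df['index'] == key]['foo'].iloc[0] for key in wanted]

# With an index: build it once, then each lookup goes through the index.
df_with_index = df.set_index('index')
by_index = [df_with_index.loc[key, 'foo'] for key in wanted]

print(by_scan == by_index)
```

With only a handful of lookups, the time spent in `set_index` may outweigh the savings; the more lookups you perform against the same frame, the more the one-off cost is spread out.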
Sometimes the index plays a role in reshaping the DataFrame. Many functions, such as set_index, stack, unstack, pivot, pivot_table, melt, lreshape, and crosstab, all use or manipulate the index. Sometimes we want the DataFrame in a different shape for presentation purposes, or for join, merge or groupby operations. (As you note, joining can also be done based on column values, but joining based on the index is faster.) Behind the scenes, join, merge and groupby take advantage of fast index lookups when possible.
Time series have resample, asfreq and interpolate methods whose underlying implementations take advantage of fast index lookups too.
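For instance, here is a minimal sketch of those three methods on a Series with a DatetimeIndex. The dates and values are invented for illustration; the point is only that each method works off the index rather than a regular column.

```python
import pandas as pd

# Hypothetical series with a gap: no observation on 2021-01-03.
idx = pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-04'])
ts = pd.Series([1.0, 2.0, 4.0], index=idx)

# asfreq: conform to a daily index; the missing day appears as NaN.
daily = ts.asfreq('D')

# interpolate: fill the gap linearly between the neighbouring days.
filled = daily.interpolate()

# resample: group the original observations into 2-day bins and sum them.
two_day = ts.resample('2D').sum()

print(daily)
print(filled)
print(two_day)
```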
So in the end, I think the origin of the index's usefulness, and why it shows up in so many functions, is its ability to perform fast hash lookups.