python - What is the point of indexing in pandas?


Can someone point me to a link or provide an explanation of the benefits of indexing in pandas? I routinely deal with tables and join them based on columns, and this joining/merging process seems to re-index things anyway, so it's a bit cumbersome to apply index criteria considering I don't think I need to.

Any thoughts on best practices around indexing?

Like a dict, a DataFrame's index is backed by a hash table. Looking up rows based on index values is like looking up dict values based on a key.

In contrast, the values in a column are like values in a list.

Looking up rows based on index values is faster than looking up rows based on column values.
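To make the dict/list analogy concrete, here is a minimal sketch. The toy data and all variable names are made up for illustration; it shows only the two access patterns and says nothing about pandas internals.

```python
import pandas as pd

# Hypothetical toy data: three labelled values.
keys = ['a', 'b', 'c']
values = [10, 20, 30]

# dict-style lookup: the key is hashed to find the value directly.
as_dict = dict(zip(keys, values))
dict_hit = as_dict['b']

# list-style lookup: scan the keys until the matching one is found.
list_hit = values[keys.index('b')]

# The same two access patterns on a DataFrame.
df = pd.DataFrame({'key': keys, 'value': values})
by_column = df.loc[df['key'] == 'b', 'value'].iloc[0]  # compares every row
by_index = df.set_index('key').loc['b', 'value']       # label lookup via the index

print(dict_hit, list_hit, by_column, by_index)
```

All four lookups return the same value; they differ only in how the row is found.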

For example, consider

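    import numpy as np
    import pandas as pd

    # The scalar 'foo' value is broadcast to all 10000 rows.
    df = pd.DataFrame({'foo': np.random.random(), 'index': range(10000)})
    # Same data, with the 'index' column promoted to the row index.
    df_with_index = df.set_index(['index'])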

Here is how you could look up the row where the df['index'] column equals 999. Pandas has to loop through every value in the column to find the ones equal to 999.

    df[df['index'] == 999]
    #           foo  index
    # 999  0.375489    999

Here is how you could look up the row where the index equals 999. With an index, pandas uses the hash value to find the rows:

    df_with_index.loc[999]
    # foo    0.375489
    # Name: 999, dtype: float64

Looking up rows by index is much faster than looking up rows by column value:

    In [254]: %timeit df[df['index'] == 999]
    1000 loops, best of 3: 368 µs per loop

    In [255]: %timeit df_with_index.loc[999]
    10000 loops, best of 3: 57.7 µs per loop

Note, however, that it takes time to build the index:

    In [220]: %timeit df.set_index(['index'])
    1000 loops, best of 3: 330 µs per loop

So having the index is only advantageous when you have many lookups of this type to perform.
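A rough way to see this trade-off is to time a batch of lookups both ways, building the index once and reusing it. This is only a sketch: the lookup labels and function names are invented for illustration, and the timings will vary by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': np.random.random(10000), 'index': range(10000)})
labels = [5, 999, 4321]  # hypothetical lookup keys

def scan_lookups():
    # Each lookup compares the whole column against the label.
    return [df[df['index'] == label]['foo'].iloc[0] for label in labels]

def indexed_lookups():
    # Pay the set_index cost once, then reuse the index for every lookup.
    indexed = df.set_index('index')
    return [indexed.loc[label, 'foo'] for label in labels]

# Both approaches return the same values; only the cost profile differs.
same_result = scan_lookups() == indexed_lookups()

scan_time = timeit.timeit(scan_lookups, number=20)
indexed_time = timeit.timeit(indexed_lookups, number=20)
print(f"column scans: {scan_time:.4f}s, set_index + .loc: {indexed_time:.4f}s")
```

Increasing the number of labels should tilt the comparison further toward the indexed version, since the one-off set_index cost is spread over more lookups.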

Sometimes the index plays a role in reshaping the DataFrame. Many functions, such as set_index, stack, unstack, pivot, pivot_table, melt, lreshape, and crosstab, all use or manipulate the index. Sometimes we want the DataFrame in a different shape for presentation purposes, or for join, merge or groupby operations. (As you note, joining can also be done based on column values, but joining based on the index is faster.) Behind the scenes, join, merge and groupby take advantage of fast index lookups when possible.
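As a small illustration of the index in joining and reshaping, the sketch below merges two made-up tables on a column, joins the same tables on their index, and then uses a two-level index to pivot a long table into a wide one. The frames and column names are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'c'], 'y': [10, 20, 30]})

# Column-based: merge matches rows on the values of the 'key' column.
merged = left.merge(right, on='key')

# Index-based: move 'key' into the index on both sides, then join on it.
joined = left.set_index('key').join(right.set_index('key'))

# Reshaping: a two-level index lets unstack turn one level into columns.
sales = pd.DataFrame({
    'store': ['n', 'n', 's', 's'],
    'year': [2020, 2021, 2020, 2021],
    'amount': [1, 2, 3, 4],
})
wide = sales.set_index(['store', 'year'])['amount'].unstack('year')

print(merged)
print(joined)
print(wide)
```

The merge and the join produce the same rows; the join simply keeps 'key' as the index rather than as a column.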

Time series have resample, asfreq and interpolate methods whose underlying implementations take advantage of fast index lookups too.
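For instance, these time-series methods all operate on a DatetimeIndex. A minimal sketch with a made-up four-point series:

```python
import pandas as pd

# Hypothetical series sampled every 12 hours over two days.
ts = pd.Series(
    [1.0, 3.0, 5.0, 7.0],
    index=pd.date_range('2024-01-01', periods=4, freq='12h'),
)

# Downsample: group the timestamps by calendar day and average each group.
daily_mean = ts.resample('D').mean()

# Upsample: reindex onto a 6-hour grid; the new timestamps hold NaN.
six_hourly = ts.asfreq('6h')

# Fill the gaps by linear interpolation between neighbouring points.
filled = six_hourly.interpolate()

print(daily_mean)
print(filled)
```

None of these calls takes a column name: the timestamps in the index are what the methods align and group on.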

So in the end, I think the origin of the index's usefulness, and the reason it shows up in so many functions, is its ability to perform fast hash lookups.

