14 changed files with 336 additions and 4 deletions
@ -0,0 +1,61 @@
****************
Drawing barplots
****************

The ``incenp.plotting.bar`` module provides a ``barplot`` function
intended to facilitate the creation of bar plots from multi-indexed data.

The module uses the same notion of *tracks* and *subtracks* as the
``incenp.plotting.scatter`` module.


Sample data
===========

Let’s create a multi-indexed `DataFrame` which we will use in the
examples below:

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Two-level index: each value of 'first' is paired with 'one' and 'two'
    index = pd.MultiIndex.from_arrays(
        [
            ['foo'] * 2 + ['bar'] * 2 + ['baz'] * 2 + ['qux'] * 2,
            ['one', 'two'] * 4,
        ],
        names=['first', 'second'],
    )
    df = pd.DataFrame(np.random.randint(0, 100, size=(8, 2)),
                      index=index, columns=['A', 'B'])

This creates a `DataFrame` with 2 columns (``A`` and ``B``) and 8
rows, indexed in two levels (level ``first``, with 4 distinct values
``foo``, ``bar``, ``baz``, and ``qux``; and level ``second``, with 2
distinct values ``one`` and ``two``).


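
The structure described above can be checked directly with pandas. The
following is only a sketch: it rebuilds the sample data with a seeded
random generator (an addition made here so that the values are
reproducible) and the variable names ``shape``, ``first_values``, and
``second_values`` are illustrative.

.. code-block:: python

    import numpy as np
    import pandas as pd

    index = pd.MultiIndex.from_arrays(
        [
            ['foo'] * 2 + ['bar'] * 2 + ['baz'] * 2 + ['qux'] * 2,
            ['one', 'two'] * 4,
        ],
        names=['first', 'second'],
    )
    # Seeded generator: an assumption of this sketch, not part of the
    # sample above
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.integers(0, 100, size=(8, 2)),
                      index=index, columns=['A', 'B'])

    # Number of rows and columns
    shape = df.shape
    # Distinct values of each index level, in order of appearance
    first_values = list(df.index.get_level_values('first').unique())
    second_values = list(df.index.get_level_values('second').unique())
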
Quick start
===========

Here is a quick example illustrating the main points of the ``barplot``
function (``ax`` is assumed to be a `matplotlib.axes.Axes` object):

.. code-block:: python

    from incenp.plotting.bar import barplot

    barplot(ax, df, column='A',
            tracks=['foo', 'bar', 'baz'],
            subtracks=['one', 'two'], subtrackname='second',
            ncolumn='B')
    ax.legend(['one', 'two'], loc='upper center')

.. figure:: barplot1.png

    A sample bar plot.

The ``column`` parameter indicates which column in the `DataFrame`
contains the values to plot. The ``tracks`` and ``subtracks`` parameters
are used to select the rows and distribute them along the tracks and
subtracks, in a similar way to the ``scatterplot`` function of the
``incenp.plotting.scatter`` module.

The ``ncolumn`` parameter, if included, indicates which column in the
`DataFrame` contains the *number of samples*, to be displayed on top of
every bar.
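
In practice, a table holding both a value to plot and a number of
samples would often be derived from raw measurements. The sketch below
shows one possible way of doing so with pandas; the names ``raw``,
``summary``, ``value``, ``mean``, and ``n`` are hypothetical, and nothing
here is required by ``barplot`` itself.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Hypothetical raw data: 3 measurements per (first, second) pair
    index = pd.MultiIndex.from_arrays(
        [
            ['foo'] * 6 + ['bar'] * 6,
            ['one', 'two'] * 6,
        ],
        names=['first', 'second'],
    )
    rng = np.random.default_rng(0)
    raw = pd.DataFrame({'value': rng.normal(size=12)}, index=index)

    # One row per (first, second) pair, with the mean of the measurements
    # and the number of samples behind that mean
    summary = (raw.groupby(level=['first', 'second'])['value']
                  .agg(['mean', 'count'])
                  .rename(columns={'count': 'n'}))

Such a table could then presumably be plotted with ``column='mean'`` and
``ncolumn='n'``.
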
@ -0,0 +1,31 @@
# -*- coding: utf-8 -*-

source_suffix = '.rst'
master_doc = 'index'

copyright = u'2020 Damien Goutte-Gattat'
author = u'Damien Goutte-Gattat <dgouttegattat@incenp.org>'

language = 'en'

exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

pygments_style = 'sphinx'

extensions = ['sphinx.ext.intersphinx']
intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}

# HTML output
html_theme = 'alabaster'
html_static_path = ['_static']

# LaTeX output
latex_engine = 'lualatex'
latex_elements = {
    'papersize': 'a4paper',
    'pointsize': '10pt'
}
latex_documents = [
    (master_doc, 'Pyplot.tex', u'Pyplot Documentation',
     u'Damien Goutte-Gattat', 'manual')
]
@ -0,0 +1,15 @@
Pyplot Manual
=============

Pyplot, or *incenp.plotting*, is a Python library providing helper
functions to facilitate the creation of special kinds of plots with
`Matplotlib`_.

.. _Matplotlib: https://matplotlib.org/

.. toctree::
    :maxdepth: 2

    install
    scatterplots
    barplots
@ -0,0 +1,40 @@
*****************
Installing Pyplot
*****************

Installing from PyPI
====================

Packages for Pyplot are published on the `Python Package Index`_ under
the name ``incenp.plotting``. To install the latest version from PyPI:

.. _Python Package Index: https://pypi.org/project/incenp.plotting/

.. code-block:: console

    $ pip install -U incenp.plotting


Installing from source
======================

You may download a release tarball from the `homepage`_ or from the
`release page`_, and then proceed to a manual installation:

.. _homepage: https://incenp.org/dvlpt/pyplot.html
.. _release page: https://git.incenp.org/damien/pyplot/releases

.. code-block:: console

    $ tar zxvf incenp.plotting-0.1.0.tar.gz
    $ cd incenp.plotting-0.1.0
    $ python setup.py build
    $ python setup.py install

You may also clone the repository:

.. code-block:: console

    $ git clone https://git.incenp.org/damien/pyplot.git

and then proceed as above.
@ -0,0 +1,176 @@
********************
Drawing scatterplots
********************

The ``incenp.plotting.scatter`` module provides a ``scatterplot``
function to facilitate the creation of scatter plots.

Note that what I call a “scatter plot” here may not match the most
common meaning of the term. I do *not* mean the 2-dimensional plotting
of two variables (one on the x-axis, the other on the y-axis). Rather, I
mean the plotting of a single variable on the y-axis, akin to a bar
chart, but with all data points depicted as scattered dots.

.. figure:: scatterplot1.png

    A sample scatter plot.

The figure above is a sample “scatter plot”. The orange boxes are not
part of the plot, but have been added to illustrate what *tracks* and
*subtracks* are in the context of the ``incenp.plotting.scatter``
module.


Sample data
===========

The module is intended to work with indexed `DataFrame` objects
(including multi-indexed ones). Let’s create such an object, which we
will use throughout this page:

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Two-level index: 40 rows per value of 'first', alternating
    # between 'one' and 'two' on the 'second' level
    index = pd.MultiIndex.from_arrays(
        [
            ['foo'] * 40 + ['bar'] * 40 + ['baz'] * 40 + ['qux'] * 40,
            ['one', 'two'] * 80,
        ],
        names=['first', 'second'],
    )
    df = pd.DataFrame(np.random.randn(160, 4), index=index,
                      columns=['A', 'B', 'C', 'D'])

This creates a `DataFrame` with 4 columns (``A`` to ``D``) and 160
rows, indexed in two levels (level ``first``, with 4 distinct values
``foo``, ``bar``, ``baz``, and ``qux``; and level ``second``, with 2
distinct values ``one`` and ``two``).


Quick start
===========

As an initial example, here is the call to ``scatterplot`` that draws
the graph above (``ax`` is assumed to be a `matplotlib.axes.Axes`
object):

.. code-block:: python

    scatterplot(ax, df, columns='A',
                tracks=['foo', 'bar', 'baz'], trackname='first',
                subtracks=['one', 'two'], subtrackname='second')
    ax.legend(['one', 'two'])

The ``columns`` parameter indicates that the values to be plotted come
from the column named ``A``.

The ``tracks`` parameter gives the index values used to distribute the
values of column ``A`` into three different tracks (one track for rows
with index value ``foo``, one track for rows with index value ``bar``,
and so on); the associated ``trackname`` parameter indicates which index
level to use to look up the values given in ``tracks``, if ``df`` is a
multi-indexed `DataFrame`.

The ``subtracks`` and ``subtrackname`` parameters are similar to the
``tracks`` and ``trackname`` parameters above, but for subtracks instead
of tracks. Here, they are used to say that values from rows with index
value ``one`` are to be plotted on one subtrack, while values from rows
with index value ``two`` are to be plotted on another subtrack.


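
To make the notion of tracks and subtracks more concrete, the sketch
below selects, with plain pandas, the values that would fall into each
(track, subtrack) pair for column ``A``. This only illustrates the
selection logic described above and is not meant to reflect how
``scatterplot`` is actually implemented; the names ``groups``,
``level_first``, and ``level_second`` are hypothetical, and the data are
generated with a seeded generator for reproducibility.

.. code-block:: python

    import numpy as np
    import pandas as pd

    index = pd.MultiIndex.from_arrays(
        [
            ['foo'] * 40 + ['bar'] * 40 + ['baz'] * 40 + ['qux'] * 40,
            ['one', 'two'] * 80,
        ],
        names=['first', 'second'],
    )
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.standard_normal((160, 4)), index=index,
                      columns=['A', 'B', 'C', 'D'])

    level_first = df.index.get_level_values('first')
    level_second = df.index.get_level_values('second')

    # One Series of values per (track, subtrack) pair
    groups = {
        (track, subtrack): df.loc[(level_first == track)
                                  & (level_second == subtrack), 'A']
        for track in ['foo', 'bar', 'baz']
        for subtrack in ['one', 'two']
    }

With the sample data, each of the six pairs holds 20 values; rows
indexed with ``qux`` are left out because ``qux`` is not listed among
the tracks.
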
Playing with tracks, subtracks, columns
=======================================

The following code will plot the same values as above, but will invert
the tracks and the subtracks: the second-level index (``second``) will
be used to distribute values along tracks while the first-level index
(``first``) will be used to distribute values along subtracks:

.. code-block:: python

    scatterplot(ax, df, columns='A',
                tracks=['one', 'two'], trackname='second',
                subtracks=['foo', 'bar', 'baz'], subtrackname='first')
    ax.legend(['foo', 'bar', 'baz'])

.. figure:: scatterplot2.png

    A scatterplot with inverted tracks and subtracks.


Values from several columns in the source `DataFrame` can be plotted at
once, by giving a list of column names (instead of a single name) to the
``columns`` parameter. By default, values from each column are plotted
in a different track. In the following example, values from the columns
``A``, ``B``, and ``C`` are plotted; the first-level index is used to
distribute values along three different subtracks; the second-level
index is used to filter the `DataFrame` prior to plotting so that only
rows with the index value ``one`` are plotted.

.. code-block:: python

    scatterplot(ax, df.xs('one', level='second'),
                columns=['A', 'B', 'C'],
                subtracks=['foo', 'baz', 'qux'], subtrackname='first')
    ax.legend(['foo', 'baz', 'qux'])

.. figure:: scatterplot3.png

    A scatterplot with values from several columns of the source
    DataFrame.


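
The filtering step relies on the standard pandas ``DataFrame.xs``
method. As a brief sketch of what it returns on the sample data (the
variable name ``filtered`` is illustrative, and a seeded generator is
used here for reproducibility):

.. code-block:: python

    import numpy as np
    import pandas as pd

    index = pd.MultiIndex.from_arrays(
        [
            ['foo'] * 40 + ['bar'] * 40 + ['baz'] * 40 + ['qux'] * 40,
            ['one', 'two'] * 80,
        ],
        names=['first', 'second'],
    )
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.standard_normal((160, 4)), index=index,
                      columns=['A', 'B', 'C', 'D'])

    # Keep only the rows whose 'second' level is 'one'; by default, xs
    # also drops that level from the resulting index
    filtered = df.xs('one', level='second')

The result keeps half of the rows and is indexed by the ``first`` level
only, which is why ``subtrackname='first'`` can still be used on it.
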
To plot values from several columns as different subtracks rather than
different tracks, use the ``subtrackcolumns`` parameter as in the
example below. The ``tracks`` and ``trackname`` parameters may then be
used to define what goes into the tracks.

.. code-block:: python

    scatterplot(ax, df.xs('one', level='second'),
                columns=['A', 'B', 'C'], subtrackcolumns=True,
                tracks=['foo', 'baz', 'qux'], trackname='first')
    ax.legend(['A', 'B', 'C'])

.. figure:: scatterplot4.png

    A scatterplot with values from several columns of the source
    DataFrame, plotted as separate subtracks.


Miscellaneous features
======================

When plotting *two* subtracks, the ``testfunc`` parameter may be used to
have the ``scatterplot`` function draw the result of a statistical test
comparing the values of the two subtracks within each track.

The value of the ``testfunc`` parameter should be a function accepting
two `Series` and returning a P-value, such as the following wrapper
around SciPy’s ``mannwhitneyu`` function:

.. code-block:: python

    from scipy.stats import mannwhitneyu

    def do_mannwhitney(a, b):
        # Run the Mann-Whitney U test and keep only the P-value
        result = mannwhitneyu(a, b)
        return result.pvalue

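
Such a wrapper can be tried on its own, independently of any plotting.
The sketch below calls it on two synthetic samples; the sample names,
sizes, and distributions are arbitrary choices made for illustration.

.. code-block:: python

    import numpy as np
    import pandas as pd
    from scipy.stats import mannwhitneyu

    def do_mannwhitney(a, b):
        # Run the Mann-Whitney U test and keep only the P-value
        result = mannwhitneyu(a, b)
        return result.pvalue

    rng = np.random.default_rng(0)
    # Two samples drawn from clearly different distributions
    control = pd.Series(rng.normal(0.0, 1.0, size=50))
    shifted = pd.Series(rng.normal(3.0, 1.0, size=50))

    p_shifted = do_mannwhitney(control, shifted)

The returned value is a plain number between 0 and 1, which is what the
``testfunc`` parameter is described as expecting.
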
Below is an example of using such a wrapper, with the resulting plot:

.. code-block:: python

    scatterplot(ax, df, columns='B',
                tracks=['foo', 'baz', 'qux'], trackname='first',
                subtracks=['one', 'two'], subtrackname='second',
                testfunc=do_mannwhitney,
                colors='cm')
    ax.legend(['one', 'two'])

.. figure:: scatterplot5.png

    A scatterplot with results of statistical tests between subtracks.

The example above also shows the ``colors`` parameter, used to change
the colors for the different subtracks. It can either be a string
containing one-letter color codes, or a list of Matplotlib colors. The
string or the list must be at least as long as the number of subtracks
to plot.
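
A color specification can be checked before it is passed to
``scatterplot``. The helper below is a hypothetical sketch (the name
``check_colors`` is not part of the library); it relies only on
Matplotlib’s ``is_color_like`` function and on the two constraints
stated above.

.. code-block:: python

    from matplotlib.colors import is_color_like

    def check_colors(colors, n_subtracks):
        """Return True if `colors` can cover `n_subtracks` subtracks."""
        # A string is read as a sequence of one-letter color codes; a
        # list is taken as is
        specs = list(colors)
        return (len(specs) >= n_subtracks
                and all(is_color_like(spec) for spec in specs))

    ok_string = check_colors('cm', 2)
    ok_list = check_colors(['tab:blue', '#ff7f0e'], 2)
    too_short = check_colors('c', 2)
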