Compare commits

7 Commits

Author SHA1 Message Date
Damien Goutte-Gattat 0ac012c1e1 Add the `annotate_bars` function. 3 weeks ago
Damien Goutte-Gattat 5b62023202 Switch to Black-enforced code style. 2 months ago
Damien Goutte-Gattat 49356aded0 Fix long lines. 2 years ago
Damien Goutte-Gattat 1c430a98e8 Add draft documentation. 2 years ago
Damien Goutte-Gattat 3c30328a8c Add a link to the online documentation. 2 years ago
Damien Goutte-Gattat 79f7f70f34 Do not assume the format of color parameters. 2 years ago
Damien Goutte-Gattat 194e507671 Bump version number. 3 years ago
  1. MANIFEST.in (1)
  2. README.md (6)
  3. docs/barplot1.png (BIN)
  4. docs/barplots.rst (61)
  5. docs/conf.py (31)
  6. docs/index.rst (15)
  7. docs/install.rst (40)
  8. docs/scatterplot1.png (BIN)
  9. docs/scatterplot2.png (BIN)
  10. docs/scatterplot3.png (BIN)
  11. docs/scatterplot4.png (BIN)
  12. docs/scatterplot5.png (BIN)
  13. docs/scatterplots.rst (176)
  14. incenp/plotting/__init__.py (2)
  15. incenp/plotting/bar.py (36)
  16. incenp/plotting/scatter.py (55)
  17. incenp/plotting/util.py (108)
  18. pyproject.toml (2)
  19. setup.py (16)

@ -1 +1,2 @@
include AUTHORS COPYING README.md
graft docs

@ -18,6 +18,12 @@ Available modules
but which can also be used directly. This module provides notably a
*xdistr* function to facilitate creating scatter plots.
Samples
-------
The `docs` directory contains draft documentation with some examples
of what the library can do. An online copy of the documentation
is available at <https://incenp.org/dvlpt/pyplot/>.
Copying
-------
Incenp.plotting is distributed under the terms of the GNU General Public

Binary file not shown (size: 9.7 KiB).

@ -0,0 +1,61 @@
****************
Drawing barplots
****************
The ``incenp.plotting.bar`` module provides a ``barplot`` function
intended to facilitate the creation of bar plots from multi-indexed data.
The module uses the same notion of *tracks* and *subtracks* as the
``incenp.plotting.scatter`` module.
Sample data
===========
Let’s create a multi-indexed `DataFrame` which we will use in the
examples below:
.. code-block:: python
index = pd.MultiIndex.from_arrays([
['foo'] * 2 + ['bar'] * 2 + ['baz'] * 2 + ['qux'] * 2,
['one', 'two'] * 4
],
names=['first', 'second']
)
df = pd.DataFrame(np.random.randint(0, 100, size=(8,2)),
index = index, columns=['A', 'B'])
This creates a `DataFrame` with 2 columns (``A`` and ``B``) and 8
rows, indexed in two levels (level ``first``, with 4 distinct values
``foo``, ``bar``, ``baz``, and ``qux``; and level ``second``, with 2
distinct values ``one`` and ``two``).
Quick start
===========
Here is a quick example illustrating the main points of the ``barplot``
function (``ax`` is supposed to be a `matplotlib.axes.Axes` object):
.. code-block:: python
barplot(ax, df, column='A',
tracks=['foo', 'bar', 'baz'],
subtracks=['one', 'two'], subtrackname='second',
ncolumn='B')
ax.legend(['one', 'two'], loc='upper center')
.. figure:: barplot1.png
A sample bar plot.
The ``column`` parameter indicates which column in the `DataFrame`
contains the value to plot. The ``tracks`` and ``subtracks`` parameters
are used to select and distribute the rows along the tracks and
subtracks, in a similar way to the ``scatterplot`` function of the
``incenp.plotting.scatter`` module.
The ``ncolumn`` parameter, if included, indicates which column in the
`DataFrame` contains the *number of samples*, to be displayed on top of
every bar.
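As the ``barplot`` signature in ``incenp/plotting/bar.py`` shows, the label drawn on top of each bar is produced by ``nformat.format(n)``, with ``nformat`` defaulting to ``"n={}"``. A minimal illustration of that formatting step follows; the sample counts and the alternative format string are made up for the example.

```python
# Default label format used by barplot (taken from its signature).
nformat = "n={}"

# Hypothetical sample counts, as they might appear in the ``ncolumn`` column.
counts = [12, 7, 30]

labels = [nformat.format(n) for n in counts]
print(labels)

# Any str.format-style template with one placeholder works the same way.
custom_labels = ["({} cells)".format(n) for n in counts]
print(custom_labels)
```

Passing a different ``nformat`` value should therefore be enough to change the wording of the labels without touching the data.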

@ -0,0 +1,31 @@
# -*- coding: utf-8 -*-
source_suffix = '.rst'
master_doc = 'index'
copyright = u'2020 Damien Goutte-Gattat'
author = u'Damien Goutte-Gattat <dgouttegattat@incenp.org>'
language = 'en'
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
pygments_style = 'sphinx'
extensions = ['sphinx.ext.intersphinx']
intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
# HTML output
html_theme = 'alabaster'
html_static_path = ['_static']
# LaTeX output
latex_engine = 'lualatex'
latex_elements = {
'papersize': 'a4paper',
'pointsize': '10pt'
}
latex_documents = [
(master_doc, 'Pyplot.tex', u'Pyplot Documentation',
u'Damien Goutte-Gattat', 'manual')
]

@ -0,0 +1,15 @@
Pyplot Manual
=============
Pyplot, or *incenp.plotting*, is a Python library providing helper
functions to facilitate the creation of special kinds of plots with
`Matplotlib`_.
.. _Matplotlib: https://matplotlib.org/
.. toctree::
:maxdepth: 2
install
scatterplots
barplots

@ -0,0 +1,40 @@
*****************
Installing Pyplot
*****************
Installing from PyPI
====================
Packages for Pyplot are published on the `Python Package Index`_ under
the name ``incenp.pyplot``. To install the latest version from PyPI:
.. _Python Package Index: https://pypi.org/project/incenp.pebble/
.. code-block:: console
$ pip install -U incenp.pyplot
Installing from source
======================
You may download a release tarball from the `homepage`_ or from the
`release page`_, and then proceed to a manual installation:
.. _homepage: https://incenp.org/dvlpt/pyplot.html
.. _release page: https://git.incenp.org/damien/pyplot/releases
.. code-block:: console
$ tar zxvf incenp.plotting-0.1.0.tar.gz
$ cd incenp.plotting-0.1.0
$ python setup.py build
$ python setup.py install
You may also clone the repository:
.. code-block:: console
$ git clone https://git.incenp.org/damien/pyplot.git
and then proceed as above.
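After either installation route, a quick check from Python can confirm that the package is importable. The sketch below assumes that the package exposes ``__version__`` in ``incenp/plotting/__init__.py``, as the version bump in this comparison suggests; ``installed_version`` is a hypothetical helper, not part of the library.

```python
import importlib


def installed_version(module_name="incenp.plotting"):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers a missing package as well as a missing parent namespace.
        return None
    return getattr(module, "__version__", None)


print(installed_version())
```

A ``None`` result means the import failed or no version attribute was found, which usually points to an installation into a different Python environment.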

Binary file not shown (size: 18 KiB).

Binary file not shown (size: 10 KiB).

Binary file not shown (size: 11 KiB).

Binary file not shown (size: 11 KiB).

Binary file not shown (size: 10 KiB).

@ -0,0 +1,176 @@
********************
Drawing scatterplots
********************
The ``incenp.plotting.scatter`` module provides a ``scatterplot``
function to facilitate the creation of scatter plots.
Note that what I call a “scatter plot” here may not match the most common
meaning of the term. I do *not* mean the 2-dimensional plotting of
two variables (one on the x-axis, the other on the y-axis). Rather, I
mean the plotting of a single variable on the y-axis, akin to a bar
chart, but with all data points depicted as scattered dots.
.. figure:: scatterplot1.png
A sample scatter plot.
The figure above is a sample “scatter plot”. The orange boxes are not
part of the plot, but have been added to illustrate what *tracks*
and *subtracks* are in the context of the ``incenp.plotting.scatter``
module.
Sample data
===========
The module is intended to work with indexed `DataFrame` objects
(including multi-indexed `DataFrame`). Let’s create such an object,
which we will use throughout this page:
.. code-block:: python
index = pd.MultiIndex.from_arrays([
['foo'] * 40 + ['bar'] * 40 + ['baz'] * 40 + ['qux'] * 40,
['one', 'two'] * 80
],
names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(160,4), index = index,
columns=['A', 'B', 'C', 'D'])
This creates a `DataFrame` with 4 columns (``A`` to ``D``) and 160
rows, indexed in two levels (level ``first``, with 4 distinct values
``foo``, ``bar``, ``baz``, and ``qux``; and level ``second``, with 2
distinct values ``one`` and ``two``).
Quick start
===========
As an initial example, here is the call to ``scatterplot`` to draw the
graph above (``ax`` is supposed to be a `matplotlib.axes.Axes` object):
.. code-block:: python
scatterplot(ax, df, columns='A',
tracks=['foo', 'bar', 'baz'], trackname='first',
subtracks=['one', 'two'], subtrackname='second')
ax.legend(['one', 'two'])
The ``columns`` parameter indicates that the values to be plotted come
from the column named ``A``.
The ``tracks`` parameter gives the index values used to distribute the
values of column ``A`` into three different tracks (one track for rows
with index value ``foo``, one track for rows with index value ``bar``,
and so on); the associated ``trackname`` parameter indicates which index
level to use to look up the values given in ``tracks``,
if ``df`` is a multi-indexed `DataFrame`.
The ``subtracks`` and ``subtrackname`` parameters are similar to the
``tracks`` and ``trackname`` parameters above, but for subtracks instead
of tracks. Here, they are used to say that values from rows with index
value ``one`` are to be plotted on one subtrack, while values from rows
with index value ``two`` are to be plotted on another subtrack.
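To make the layout concrete: in ``incenp/plotting/scatter.py``, each subtrack is drawn at the x offset ``n_track * max_subtrack + n_subtrack``, and the tick for track ``i`` is placed at ``0.5 * (n - 2) + n * i``. The sketch below only replays that arithmetic in plain Python. The helper names are hypothetical, and the choice ``n = 3`` for two subtracks (one empty slot between tracks) is an assumption that happens to centre the ticks; the value the library actually uses is not shown in this comparison.

```python
def subtrack_offsets(n_tracks, n_subtracks, n):
    """X offsets per track, following offset = n_track * n + n_subtrack."""
    return [[t * n + s for s in range(n_subtracks)] for t in range(n_tracks)]


def tick_positions(n_tracks, n):
    """Tick positions, following 0.5 * (n - 2) + n * i."""
    return [0.5 * (n - 2) + n * i for i in range(n_tracks)]


# Three tracks (foo, bar, baz), two subtracks (one, two), assumed n = 3.
offsets = subtrack_offsets(3, 2, n=3)
ticks = tick_positions(3, n=3)

print(offsets)  # [[0, 1], [3, 4], [6, 7]]
print(ticks)    # [0.5, 3.5, 6.5]
```

Under that assumption each tick falls midway between the two subtracks of its track, which matches the appearance of the sample figure.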
Playing with tracks, subtracks, columns
=======================================
The following code will plot the same values as above, but will invert
the tracks and the subtracks: the second-level index (``second``) will
be used to distribute values along tracks while the first-level index
(``first``) will be used to distribute values along subtracks:
.. code-block:: python
scatterplot(ax, df, columns='A',
tracks=['one', 'two'], trackname='second',
subtracks=['foo', 'bar', 'baz'], subtrackname='first')
ax.legend(['foo', 'bar', 'baz'])
.. figure:: scatterplot2.png
A scatterplot with inverted tracks and subtracks.
Values from several columns in the source `DataFrame` can be plotted at
once, by giving a list of column names (instead of a single name) to the
``columns`` parameter. By default, values from each column are plotted
in a different track. In the following examples, values from the columns
``A``, ``B``, and ``C`` are plotted; the first-level index is used to
distribute values along three different subtracks; the second-level
index is used to filter the `DataFrame` prior to plotting so that only
rows with the index value ``one`` are plotted.
.. code-block:: python
scatterplot(ax, df.xs('one', level='second'),
columns=['A', 'B', 'C'],
subtracks=['foo', 'baz', 'qux'], subtrackname='first')
ax.legend(['foo', 'baz', 'qux'])
.. figure:: scatterplot3.png
A scatterplot with values from several columns of the source
DataFrame.
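The filtering step relies on the standard pandas ``DataFrame.xs`` method rather than on anything specific to this library. The sketch below rebuilds the sample data with a seeded generator (the seed and the ``rng`` name are choices made for this example only) and shows what ``scatterplot`` receives after the filter.

```python
import numpy as np
import pandas as pd

# Same two-level index as in the "Sample data" section.
index = pd.MultiIndex.from_arrays(
    [
        ['foo'] * 40 + ['bar'] * 40 + ['baz'] * 40 + ['qux'] * 40,
        ['one', 'two'] * 80,
    ],
    names=['first', 'second'],
)
rng = np.random.default_rng(0)  # seeded so the example is reproducible
df = pd.DataFrame(
    rng.standard_normal((160, 4)), index=index, columns=['A', 'B', 'C', 'D']
)

# Keep only the rows whose second-level index value is 'one'.
filtered = df.xs('one', level='second')

print(filtered.shape)       # (80, 4)
print(filtered.index.name)  # first
```

Because ``xs`` drops the selected level by default, the filtered frame is indexed by ``first`` alone, which is why only ``subtrackname='first'`` is needed in the call.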
To plot values from several columns as different subtracks rather than
different tracks, use the ``subtrackcolumns`` parameter as in the
example below. The ``tracks`` and ``trackname`` parameters may then be
used to define what goes into the tracks.
.. code-block:: python
scatterplot(ax, df.xs('one', level='second'),
columns=['A', 'B', 'C'], subtrackcolumns=True,
tracks=['foo', 'baz', 'qux'], trackname='first')
ax.legend(['A', 'B', 'C'])
.. figure:: scatterplot4.png
A scatterplot with values from several columns of the source
DataFrame, plotted as separate subtracks.
Miscellaneous features
======================
When plotting *two* subtracks, the ``testfunc`` parameter may be used to
have the ``scatterplot`` function draw the result of a statistical test
comparing the values from each subtrack in each track.
The value of the ``testfunc`` parameter should be a function accepting
two pandas `Series` and returning a P-value, such as the following
wrapper around SciPy’s ``mannwhitneyu`` function:
.. code-block:: python
from scipy.stats import mannwhitneyu
def do_mannwhitney(a, b):
result = mannwhitneyu(a, b)
return result.pvalue
Below is an example of using such a wrapper, with the resulting plot:
.. code-block:: python
scatterplot(ax, df, columns='B',
tracks=['foo', 'baz', 'qux'], trackname='first',
subtracks=['one', 'two'], subtrackname='second',
testfunc=do_mannwhitney,
colors='cm')
ax.legend(['one', 'two'])
.. figure:: scatterplot5.png
A scatterplot with results of statistical tests between subtracks.
The example above also shows the ``colors`` parameter, used to change
the colors for the different subtracks. It can either be a string
containing one-letter color codes, or a list of Matplotlib colors. The
string or the list must be at least as long as the number of subtracks
to plot.
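Both forms work because the plotting code only indexes into ``colors`` (``colors[j % n]`` in ``incenp/plotting/scatter.py``), and a string indexes to one-letter codes just as a list indexes to its items. The sketch below illustrates this in plain Python; ``subtrack_colors`` is a hypothetical helper and the length check is added here for illustration, not taken from the library.

```python
def subtrack_colors(colors, n_subtracks):
    """Return one color per subtrack from a string or a list of colors."""
    if len(colors) < n_subtracks:
        raise ValueError("need at least one color per subtrack")
    return [colors[j] for j in range(n_subtracks)]


# A string of one-letter codes: cyan and magenta.
print(subtrack_colors('cm', 2))  # ['c', 'm']

# A list of arbitrary Matplotlib color specifications.
print(subtrack_colors(['tab:cyan', '#ff00ff'], 2))  # ['tab:cyan', '#ff00ff']
```

This is presumably also why the ``ax.plot`` call was changed from ``color + '.'`` to ``color=color, marker='.', linestyle=''``: concatenating a marker onto the color only yields a valid format string for short color codes.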

@ -1 +1 @@
-__version__ = '0.1.0'
+__version__ = '0.1.1'

@ -26,7 +26,7 @@ from numpy import arange as _arange
def get_subtrack_offset(n_subtrack, max_subtrack, width):
"""Get the X offset to center a subtrack around an origin.
:param n_subtrack: The 0-based subtrack index
:param max_subtrack: The number of subtracks
:param width: The width of a single subtrack
@ -40,10 +40,20 @@ def get_subtrack_offset(n_subtrack, max_subtrack, width):
return base_offset + subtrack_offset + center_offset
-def barplot(ax, data, column, tracks, subtracks=[None], subtrackname=1,
-ncolumn=None, nformat="n={}", colors='rgb', width=.25):
+def barplot(
+ax,
+data,
+column,
+tracks,
+subtracks=[None],
+subtrackname=1,
+ncolumn=None,
+nformat="n={}",
+colors='rgb',
+width=0.25,
+):
"""Create a barplot from multi-indexed data.
:param ax: The matplotlib axis to draw on
:param data: The data from which the values to plot are taken
:param column: The name of the column in the data frame containing
@ -57,7 +67,7 @@ def barplot(ax, data, column, tracks, subtracks=[None], subtrackname=1,
:param nformat: Format string for the number of samples
:param colors: A list of matplotlib color specifications (one for
each subtrack)
-:param width: The width of a single subtrack:
+:param width: The width of a single subtrack
"""
for s, subtrack in enumerate(subtracks):
@ -76,12 +86,14 @@ def barplot(ax, data, column, tracks, subtracks=[None], subtrackname=1,
for i, rect in enumerate(rects):
height = rect.get_height()
n = subset.loc[tracks, ncolumn][i]
-ax.annotate(nformat.format(n),
-xy=(rect.get_x() + rect.get_width() / 2, height),
-xytext=(0, 5),
-textcoords='offset points',
-ha='center', va='center',
-fontsize='xx-small')
+ax.annotate(
+nformat.format(n),
+xy=(rect.get_x() + rect.get_width() / 2, height),
+xytext=(0, 5),
+textcoords='offset points',
+ha='center',
+va='center',
+fontsize='xx-small',
+)
ax.set_xticklabels(tracks)

@ -24,10 +24,11 @@ plots from multi-indexed Panda datasets.
from .util import xdistr, get_stars
-def scatterplot_subtrack(ax, data, n_track, n_subtrack, max_subtrack,
-color, width=.7, min_sep=-.05):
+def scatterplot_subtrack(
+ax, data, n_track, n_subtrack, max_subtrack, color, width=0.7, min_sep=-0.05
+):
"""Plot a single subtrack.
:param ax: The matplotlib axis to draw on
:param data: The data series from which the values to plot are taken
:param n_track: Offset of the current track
@ -42,16 +43,26 @@ def scatterplot_subtrack(ax, data, n_track, n_subtrack, max_subtrack,
offset = n_track * max_subtrack + n_subtrack
xs = xdistr(data.values, width, center=True, min_sep=min_sep, offset=offset)
-ax.plot(xs, data.values, color + '.')
-ax.hlines(data.mean(), offset - .5, offset + .5, color)
-def scatterplot(ax, data, columns, subtrackcolumns=False,
-tracks=[None], trackname=0,
-subtracks=[None], subtrackname=1,
-colors='rgb', width=.7, min_sep=-.05, testfunc=None):
+ax.plot(xs, data.values, color=color, marker='.', linestyle='')
+ax.hlines(data.mean(), offset - 0.5, offset + 0.5, color)
+def scatterplot(
+ax,
+data,
+columns,
+subtrackcolumns=False,
+tracks=[None],
+trackname=0,
+subtracks=[None],
+subtrackname=1,
+colors='rgb',
+width=0.7,
+min_sep=-0.05,
+testfunc=None,
+):
"""Create a scatterplot from multi-indexed data.
:param ax: The matplotlib axis to draw on
:param data: The data frame from which the values to plot are taken
:param columns: The name of the column(s) in the data frame
@ -122,16 +133,23 @@ def scatterplot(ax, data, columns, subtrackcolumns=False,
for column in columns:
subset = _get_dataseries(data, column, track, trackname)
testset.append(subset)
-scatterplot_subtrack(ax, subset, i, j, n, colors[j % n], width, min_sep)
+scatterplot_subtrack(
+ax, subset, i, j, n, colors[j % n], width, min_sep
+)
j += 1
else:
for subtrack in subtracks:
indexer = [track, subtrack] if subtrack else [track]
-level = [trackname, subtrackname] if subtrack else [trackname]
+if subtrack:
+level = [trackname, subtrackname]
+else:
+level = [trackname]
subset = _get_dataseries(data, columns, indexer, level=level)
testset.append(subset)
-scatterplot_subtrack(ax, subset, i, j, n, colors[j % n], width, min_sep)
+scatterplot_subtrack(
+ax, subset, i, j, n, colors[j % n], width, min_sep
+)
j += 1
if testfunc and len(testset) == 2:
@ -139,7 +157,7 @@ def scatterplot(ax, data, columns, subtrackcolumns=False,
i += 1
-ax.set_xticks([(.5 * (n - 2)) + n * i for i in range(len(labels))])
+ax.set_xticks([(0.5 * (n - 2)) + n * i for i in range(len(labels))])
ax.set_xticklabels(labels)
@ -156,6 +174,5 @@ def _do_test(ax, testset, testfunc, n_track, max_subtrack):
if pvalue:
y = max(testset[0].max(), testset[1].max()) * 1.1
offset = n_track * max_subtrack
-ax.hlines(y, offset - .5, offset + 1.5)
-ax.text(offset + .5, y, get_stars(pvalue), ha='center', va='bottom')
+ax.hlines(y, offset - 0.5, offset + 1.5)
+ax.text(offset + 0.5, y, get_stars(pvalue), ha='center', va='bottom')
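
As far as this hunk shows, `_do_test` only needs `testfunc` to return a P-value for the two samples in `testset`, so any callable with that shape should be usable. Below is a minimal, stdlib-only sketch of a two-sided permutation test on the difference of means. The names `permutation_pvalue`, `n_resamples` and `seed` are hypothetical, and this is not the test used in the documentation, which wraps SciPy's `mannwhitneyu`.

```python
import random
from statistics import mean


def permutation_pvalue(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    a, b = list(a), list(b)  # also accepts pandas Series or NumPy arrays
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    rng = random.Random(seed)  # fixed seed for reproducible results
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


p_far = permutation_pvalue([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])
p_same = permutation_pvalue([1, 2, 3], [1, 2, 3])
print(p_far, p_same)
```

Since `_do_test` guards the annotation with `if pvalue:`, a function that can return exactly `0` would silently skip the annotation; the correction term above avoids that case.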

@ -17,13 +17,15 @@
"""Miscellaneous utility functions for plotting tasks."""
from matplotlib.transforms import Bbox
-def xdistr(values, width, offset=0, even_max=10, center=False, min_sep=-.05):
+def xdistr(values, width, offset=0, even_max=10, center=False, min_sep=-0.05):
"""Distribute coordinates around an axis.
Given a list of values, return an array of x coordinates so that
the plotted values are equally distributed.
:param values: The list of values to distribute
:param width: The distance along which to distribute the values
:param offset: An offset to apply to each returned coordinates
@ -88,3 +90,103 @@ def get_stars(pvalue):
return '*'
else:
return 'ns'
def annotate_bars(
ax, bars, annotations=None, fmt=None, space=0, text_kw={}, text_kw_reversed={}
):
"""Add text annotations to a set of bars.
This function draws text annotations near the end of previously drawn bars.
The text labels are automatically positioned to take into account the available
space between the bars and the edge of the plot.
:param ax: The plot to draw on
:param bars: The bars to annotate
:param annotations: A list containing the text annotations to draw; if None,
the annotations will be inferred from the bars' values
:param fmt: A format string to apply to the annotations when drawing them
:param space: An extra space to insert between the edge of the bars and the
text labels, in plot units (default to zero)
:param text_kw: Extra text parameters
:param text_kw_reversed: Extra text parameters for labels that end up being
placed inside a bar
"""
renderer = ax.figure.canvas.get_renderer()
trans = ax.transData.inverted()
is_vertical = bars.orientation == 'vertical'
if is_vertical:
ymin, ymax = ax.get_ylim()
is_inverted = ymin > ymax
else:
xmin, xmax = ax.get_xlim()
is_inverted = xmin > xmax
if annotations is None:
if is_vertical:
annotations = [bar.get_height() for bar in bars]
else:
annotations = [bar.get_width() for bar in bars]
if fmt is not None:
annotations = [fmt.format(a) for a in annotations]
for i, annotation in enumerate(annotations):
bar = bars[i]
if is_vertical:
x = bar.get_x() + bar.get_width() / 2
y = bar.get_height()
valign = 'bottom'
halign = 'center'
else:
x = bar.get_width()
y = bar.get_y() + bar.get_height() / 2
valign = 'center'
halign = 'left'
text = ax.text(x, y, annotation, va=valign, ha=halign, **text_kw)
box = Bbox(trans.transform(text.get_window_extent(renderer)))
reverse = False
if is_vertical:
if is_inverted:
if box.y0 - box.height + space < ymin:
# We have enough space below the bars, shift the labels down
ny = box.y0 - box.height + space
else:
# Keep the labels above the bottom of the bars, shift them up a bit
ny = box.y0 - space
reverse = True
else:
if box.y0 + box.height + space > ymax:
# Not enough space above the bars, shift the labels down
# We need to apply a correction to factor in the font depth
ny = box.y0 - box.height - (2 / 12 * box.height) - space
reverse = True
else:
# Keep the labels above the top of the bars, shift them up a bit
ny = box.y0 + space
text.set_y(ny)
else:
# Horizontal bars
if is_inverted:
if box.x0 - box.width + space < xmin:
# We have enough space on the left of the bars,
# shift the labels to the left
nx = box.x0 - box.width + space
else:
# Keep the labels inside the bars, shift them a bit to the right
nx = box.x0 - space
reverse = True
else:
if box.x0 + box.width + space > xmax:
# Not enough space to the right of the bars,
# shift the labels inside
nx = box.x0 - box.width - space
reverse = True
else:
# Keep the labels on the right of the bars, shift them a bit further
nx = box.x0 + space
text.set_x(nx)
if reverse and len(text_kw_reversed) > 0:
text.set(**text_kw_reversed)
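
The non-inverted vertical branch of `annotate_bars` above reduces to a small decision rule: keep the label above the bar if it fits below `ymax`, otherwise move it inside the bar and flag it as reversed. The sketch below restates that rule as a standalone function. `place_vertical_label` and its argument names are hypothetical, and the real function works on a Matplotlib `Bbox` in data coordinates rather than on bare numbers.

```python
def place_vertical_label(y0, height, space, ymax):
    """Return (new_y, reversed) for a label sitting on top of a vertical bar.

    y0 is the bottom of the label box (the top of the bar), height is the
    label height, space is the extra gap, and ymax is the upper axis limit.
    """
    if y0 + height + space > ymax:
        # Not enough room above the bar: move the label inside it.
        # The 2/12 term mirrors the font-depth correction in the diff.
        return y0 - height - (2 / 12 * height) - space, True
    # Enough room: keep the label above the bar, shifted by the gap.
    return y0 + space, False


print(place_vertical_label(5, 2, 0, 10))  # (5, False)
print(place_vertical_label(9, 2, 0, 10))
```

The `reversed` flag is what triggers `text_kw_reversed` in the original code, which allows, for example, a different text color for labels drawn inside a bar.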

@ -0,0 +1,2 @@
[tool.black]
skip-string-normalization = true

@ -36,9 +36,15 @@ setup(
'License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)',
'Programming Language :: Python :: 3.7',
'Topic :: Scientific/Engineering :: Visualization',
-'Topic :: Software Development :: Libraries :: Python Modules'
-],
+'Topic :: Software Development :: Libraries :: Python Modules',
+],
packages=find_packages(),
-include_package_data=True
-)
+include_package_data=True,
+command_options={
+'build_sphinx': {
+'project': ('setup.py', 'Pyplot'),
+'version': ('setup.py', __version__),
+'release': ('setup.py', __version__),
+}
+},
+)
