
Add draft documentation.

Start writing a more complete documentation for the library.
master
Damien Goutte-Gattat 3 months ago
parent
commit
1c430a98e8
14 changed files with 336 additions and 4 deletions
  1. MANIFEST.in (+1, -0)
  2. README.md (+3, -2)
  3. docs/barplot1.png (binary)
  4. docs/barplots.rst (+61, -0)
  5. docs/conf.py (+31, -0)
  6. docs/index.rst (+15, -0)
  7. docs/install.rst (+40, -0)
  8. docs/scatterplot1.png (binary)
  9. docs/scatterplot2.png (binary)
  10. docs/scatterplot3.png (binary)
  11. docs/scatterplot4.png (binary)
  12. docs/scatterplot5.png (binary)
  13. docs/scatterplots.rst (+176, -0)
  14. setup.py (+9, -2)

MANIFEST.in (+1, -0)

@ -1 +1,2 @@
include AUTHORS COPYING README.md
graft docs

README.md (+3, -2)

@ -20,8 +20,9 @@ Available modules
Samples
-------
-A draft documentation with some examples of what the library allows to
-is available on <https://incenp.org/notes/2020/pyplot-examples.html>.
+The `docs` directory contains a draft documentation with some examples
+of what the library allows to do. An online copy of the documentation
+is available on <https://incenp.org/dvlpt/pyplot/>.
Copying
-------


docs/barplot1.png (binary; 400 × 400 px, 9.7 KiB)

docs/barplots.rst (+61, -0)

@ -0,0 +1,61 @@
****************
Drawing barplots
****************
The ``incenp.plotting.bar`` module provides a ``barplot`` function
intended to facilitate the creation of bar plots from multi-indexed data.
The module uses the same notion of *tracks* and *subtracks* as the
``incenp.plotting.scatter`` module.
Sample data
===========
Let’s create a multi-indexed `DataFrame` which we will use in the
examples below:
.. code-block:: python

    import numpy as np
    import pandas as pd

    index = pd.MultiIndex.from_arrays([
            ['foo'] * 2 + ['bar'] * 2 + ['baz'] * 2 + ['qux'] * 2,
            ['one', 'two'] * 4
        ],
        names=['first', 'second']
    )
    df = pd.DataFrame(np.random.randint(0, 100, size=(8, 2)),
                      index=index, columns=['A', 'B'])
This creates a `DataFrame` with 2 columns (``A`` and ``B``) and 8
rows, indexed in two levels (level ``first``, with 4 distinct values
``foo``, ``bar``, ``baz``, and ``qux``; and level ``second``, with 2
distinct values ``one`` and ``two``).
Quick start
===========
Here is a quick example illustrating the main points of the ``barplot``
function (``ax`` is assumed to be a `matplotlib.axes.Axes` object):
.. code-block:: python

    barplot(ax, df, column='A',
            tracks=['foo', 'bar', 'baz'],
            subtracks=['one', 'two'], subtrackname='second',
            ncolumn='B')
    ax.legend(['one', 'two'], loc='upper center')

.. figure:: barplot1.png

    A sample bar plot.
The ``column`` parameter indicates which column in the `DataFrame`
contains the value to plot. The ``tracks`` and ``subtracks`` parameters
are used to select and distribute the rows along the tracks and
subtracks, in a similar way to the ``scatterplot`` function of the
``incenp.plotting.scatter`` module.
The ``ncolumn`` parameter, if included, indicates which column in the
`DataFrame` contains the *number of samples*, to be displayed on top of
every bar.

docs/conf.py (+31, -0)

@ -0,0 +1,31 @@
# -*- coding: utf-8 -*-
source_suffix = '.rst'
master_doc = 'index'
copyright = u'2020 Damien Goutte-Gattat'
author = u'Damien Goutte-Gattat <dgouttegattat@incenp.org>'
language = 'en'
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
pygments_style = 'sphinx'
extensions = ['sphinx.ext.intersphinx']
intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
# HTML output
html_theme = 'alabaster'
html_static_path = ['_static']
# LaTeX output
latex_engine = 'lualatex'
latex_elements = {
    'papersize': 'a4paper',
    'pointsize': '10pt'
}
latex_documents = [
    (master_doc, 'Pyplot.tex', u'Pyplot Documentation',
     u'Damien Goutte-Gattat', 'manual')
]

docs/index.rst (+15, -0)

@ -0,0 +1,15 @@
Pyplot Manual
=============
Pyplot, or *incenp.plotting*, is a Python library providing helper
functions to facilitate the creation of special kinds of plots with
`Matplotlib`_.
.. _Matplotlib: https://matplotlib.org/
.. toctree::
    :maxdepth: 2

    install
    scatterplots
    barplots

docs/install.rst (+40, -0)

@ -0,0 +1,40 @@
*****************
Installing Pyplot
*****************
Installing from PyPI
====================
Packages for Pyplot are published on the `Python Package Index`_ under
the name ``incenp.pyplot``. To install the latest version from PyPI:
.. _Python Package Index: https://pypi.org/project/incenp.pebble/
.. code-block:: console

    $ pip install -U incenp.pyplot
Installing from source
======================
You may download a release tarball from the `homepage`_ or from the
`release page`_, and then proceed to a manual installation:
.. _homepage: https://incenp.org/dvlpt/pyplot.html
.. _release page: https://git.incenp.org/damien/pyplot/releases
.. code-block:: console

    $ tar zxvf incenp.plotting-0.1.0.tar.gz
    $ cd incenp.plotting-0.1.0
    $ python setup.py build
    $ python setup.py install
You may also clone the repository:
.. code-block:: console

    $ git clone https://git.incenp.org/damien/pyplot.git
and then proceed as above.

docs/scatterplot1.png (binary; 400 × 400 px, 18 KiB)

docs/scatterplot2.png (binary; 400 × 400 px, 10 KiB)

docs/scatterplot3.png (binary; 400 × 400 px, 11 KiB)

docs/scatterplot4.png (binary; 400 × 400 px, 11 KiB)

docs/scatterplot5.png (binary; 400 × 400 px, 10 KiB)

docs/scatterplots.rst (+176, -0)

@ -0,0 +1,176 @@
********************
Drawing scatterplots
********************
The ``incenp.plotting.scatter`` module provides a ``scatterplot``
function to facilitate the creation of scatter plots.

Note that what I call a “scatter plot” here may not match the most
common meaning of the term. I do *not* mean the 2-dimensional plotting of
two variables (one on the x-axis, the other on the y-axis). Rather, I
mean the plotting of a single variable on the y-axis, akin to a bar
chart, but with all data points depicted as scattered dots.

.. figure:: scatterplot1.png

    A sample scatter plot.

The figure above is a sample “scatter plot”. The orange boxes are not
part of the plot, but have been added to illustrate what *tracks* and
*subtracks* are in the context of the ``incenp.plotting.scatter``
module.
Sample data
===========
The module is intended to work with indexed `DataFrame` objects
(including multi-indexed `DataFrame`). Let’s create such an object,
which we will use throughout this page:
.. code-block:: python

    import numpy as np
    import pandas as pd

    index = pd.MultiIndex.from_arrays([
            ['foo'] * 40 + ['bar'] * 40 + ['baz'] * 40 + ['qux'] * 40,
            ['one', 'two'] * 80
        ],
        names=['first', 'second']
    )
    df = pd.DataFrame(np.random.randn(160, 4), index=index,
                      columns=['A', 'B', 'C', 'D'])
This creates a `DataFrame` with 4 columns (``A`` to ``D``) and 160
rows, indexed in two levels (level ``first``, with 4 distinct values
``foo``, ``bar``, ``baz``, and ``qux``; and level ``second``, with 2
distinct values ``one`` and ``two``).
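
As a quick sanity check (a sketch using plain pandas, not part of the library), the layout described above can be verified directly. The seeded random generator is an assumption added here only to make the run reproducible; the original example uses ``np.random.randn``.

```python
import numpy as np
import pandas as pd

# Same layout as the sample data above: 4 first-level values, 40 rows each,
# alternating between the two second-level values.
index = pd.MultiIndex.from_arrays(
    [
        ['foo'] * 40 + ['bar'] * 40 + ['baz'] * 40 + ['qux'] * 40,
        ['one', 'two'] * 80,
    ],
    names=['first', 'second'],
)

# Seeded generator: an assumption for reproducibility, not in the original.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((160, 4)), index=index,
                  columns=['A', 'B', 'C', 'D'])

print(df.shape)  # (160, 4)

# Number of rows for each (first, second) combination.
rows_per_group = df.groupby(level=['first', 'second']).size()
print(rows_per_group)
```

Each of the 8 combinations of ``first`` and ``second`` holds 20 rows, so every subtrack in the examples below is drawn from 20 values.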
Quick start
===========
As an initial example, here is the call to ``scatterplot`` to draw the
graph above (``ax`` is assumed to be a `matplotlib.axes.Axes` object):
.. code-block:: python

    scatterplot(ax, df, columns='A',
                tracks=['foo', 'bar', 'baz'], trackname='first',
                subtracks=['one', 'two'], subtrackname='second')
    ax.legend(['one', 'two'])
The ``columns`` parameter indicates that the values to be plotted come
from the column named ``A``.

The ``tracks`` parameter gives the index values used to distribute the
values of column ``A`` into three different tracks (one track for rows
with index value ``foo``, one track for rows with index value ``bar``,
and so on); the associated ``trackname`` parameter indicates which index
level to use to look up the values specified in the previous parameter,
if ``df`` is a multi-indexed `DataFrame`.

The ``subtracks`` and ``subtrackname`` parameters are similar to the
``tracks`` and ``trackname`` parameters above, but for subtracks instead
of tracks. Here, they are used to say that values from rows with index
value ``one`` are to be plotted on one subtrack, while values from rows
with index value ``two`` are to be plotted on another subtrack.
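
The quick-start call assumes that an ``ax`` object already exists. One possible way to obtain it with plain Matplotlib is sketched below; the figure size, the resolution and the non-interactive ``Agg`` backend are assumptions chosen for illustration, not requirements of the library.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Create a figure holding a single Axes object (size chosen arbitrarily).
fig, ax = plt.subplots(figsize=(4, 4))

# ... call scatterplot(ax, df, ...) here, as in the quick-start example ...

# Render the figure as PNG into an in-memory buffer;
# pass a file name instead of the buffer to write an image file.
buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=100)
```

The same ``ax`` can then be passed to ``ax.legend`` after plotting, as done in the examples on this page.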
Playing with tracks, subtracks, columns
=======================================
The following code will plot the same values as above, but will invert
the tracks and the subtracks: the second-level index (``second``) will
be used to distribute values along tracks while the first-level index
(``first``) will be used to distribute values along subtracks:
.. code-block:: python

    scatterplot(ax, df, columns='A',
                tracks=['one', 'two'], trackname='second',
                subtracks=['foo', 'bar', 'baz'], subtrackname='first')
    ax.legend(['foo', 'bar', 'baz'])

.. figure:: scatterplot2.png

    A scatterplot with inverted tracks and subtracks.
Values from several columns in the source `DataFrame` can be plotted at
once, by giving a list of column names (instead of a single name) to the
``columns`` parameter. By default, values from each column are plotted
in a different track. In the following example, values from the columns
``A``, ``B``, and ``C`` are plotted; the first-level index is used to
distribute values along three different subtracks; the second-level
index is used to filter the `DataFrame` prior to plotting so that only
rows with the index value ``one`` are plotted.
.. code-block:: python

    scatterplot(ax, df.xs('one', level='second'),
                columns=['A', 'B', 'C'],
                subtracks=['foo', 'baz', 'qux'], subtrackname='first')
    ax.legend(['foo', 'baz', 'qux'])

.. figure:: scatterplot3.png

    A scatterplot with values from several columns of the source
    DataFrame.
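
The filtering step in this example relies only on pandas. The sketch below (independent of the library) shows what ``df.xs('one', level='second')`` returns for the sample data: the rows whose second-level index is ``one``, with that level removed from the index.

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame used throughout this page.
index = pd.MultiIndex.from_arrays(
    [
        ['foo'] * 40 + ['bar'] * 40 + ['baz'] * 40 + ['qux'] * 40,
        ['one', 'two'] * 80,
    ],
    names=['first', 'second'],
)
df = pd.DataFrame(np.random.randn(160, 4), index=index,
                  columns=['A', 'B', 'C', 'D'])

# Keep only the rows whose 'second' level is 'one'.
# By default, xs() drops the level used for the selection.
subset = df.xs('one', level='second')

print(subset.shape)          # (80, 4)
print(subset.index.name)     # first
print(subset.index.nlevels)  # 1
```

Because the result is indexed by the ``first`` level only, the ``subtrackname='first'`` argument in the call above still refers to an existing index level.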
To plot values from several columns as different subtracks rather than
different tracks, use the ``subtrackcolumns`` parameter as in the
example below. The ``tracks`` and ``trackname`` parameters may then be
used to define what goes into the tracks.
.. code-block:: python

    scatterplot(ax, df.xs('one', level='second'),
                columns=['A', 'B', 'C'], subtrackcolumns=True,
                tracks=['foo', 'baz', 'qux'], trackname='first')
    ax.legend(['A', 'B', 'C'])

.. figure:: scatterplot4.png

    A scatterplot with values from several columns of the source
    DataFrame, plotted as separate subtracks.
Miscellaneous features
======================
When plotting *two* subtracks, the ``testfunc`` parameter may be used to
have the ``scatterplot`` function draw the result of a statistical test
comparing the values from each subtrack in each track.

The value of the ``testfunc`` parameter should be a function accepting
two pandas `Series` and returning a P-value, such as the following
wrapper around SciPy’s ``mannwhitneyu`` function:
.. code-block:: python

    from scipy.stats import mannwhitneyu

    def do_mannwhitney(a, b):
        result = mannwhitneyu(a, b)
        return result.pvalue
Below is an example of using such a wrapper, with the resulting plot:
.. code-block:: python

    scatterplot(ax, df, columns='B',
                tracks=['foo', 'baz', 'qux'], trackname='first',
                subtracks=['one', 'two'], subtrackname='second',
                testfunc=do_mannwhitney,
                colors='cm')
    ax.legend(['one', 'two'])

.. figure:: scatterplot5.png

    A scatterplot with results of statistical tests between subtracks.
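
To see what such a ``testfunc`` wrapper returns on its own, it can be called directly on two pandas ``Series``. The sketch below is independent of ``scatterplot``; the two synthetic samples and the seed are assumptions made up for the illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def do_mannwhitney(a, b):
    """Return the P-value of a Mann-Whitney U test between two samples."""
    return mannwhitneyu(a, b).pvalue


# Two synthetic samples (hypothetical data); the second is shifted by 3 units.
rng = np.random.default_rng(0)  # seeded only for reproducibility
a = pd.Series(rng.standard_normal(20))
b = pd.Series(rng.standard_normal(20) + 3.0)

p = do_mannwhitney(a, b)
print(f"P = {p:.3g}")
```

Any other function with the same shape (two samples in, one P-value out) should be usable in the same way, for instance a wrapper around ``scipy.stats.ttest_ind``.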
The example above also shows the ``colors`` parameter, used to change
the colors for the different subtracks. It can either be a string
containing one-letter color codes, or a list of Matplotlib colors. The
string or the list must be at least as long as the number of subtracks
to plot.

setup.py (+9, -2)

@ -38,7 +38,14 @@ setup(
'Topic :: Scientific/Engineering :: Visualization',
'Topic :: Software Development :: Libraries :: Python Modules'
],
packages=find_packages(),
include_package_data=True
include_package_data=True,
command_options={
'build_sphinx': {
'project': ('setup.py', 'Pyplot'),
'version': ('setup.py', __version__),
'release': ('setup.py', __version__)
}
}
)
