ntv-pandas.ntv_pandas
NTV-pandas Package
Created on Sept 2023
@author: philippe@loco-labs.io
This package contains the following classes and functions:
- ntv-pandas.ntv_pandas.pandas_ntv_connector:
    - ntv-pandas.ntv_pandas.pandas_ntv_connector.DataFrameConnec
    - ntv-pandas.ntv_pandas.pandas_ntv_connector.SeriesConnec
    - ntv-pandas.ntv_pandas.pandas_ntv_connector.to_json
    - ntv-pandas.ntv_pandas.pandas_ntv_connector.read_json
    - ntv-pandas.ntv_pandas.pandas_ntv_connector.as_def_type
NTV-pandas: a semantic, compact and reversible JSON-pandas converter
Why an NTV-pandas converter?
pandas provides a JSON converter, but it has three limitations:
- the JSON-pandas converter takes only a few data types into account,
- the JSON-pandas converter is not always reversible (round trip),
- external data types (e.g. TableSchema types) are not included.
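The reversibility limitation can be seen with pandas' own `to_json` / `read_json` pair. The sketch below is a minimal illustration, assuming a recent pandas version; the column name `value32` is borrowed from the example later in this document, and the exact dtype recovered on reading may depend on the pandas version and platform.

```python
from io import StringIO

import pandas as pd

# A single int32 column, like the 'value32' column used in the example below.
df = pd.DataFrame({'value32': pd.Series([12, 22, 32], dtype='int32')})

# Round trip through the native pandas JSON interface.
json_text = df.to_json()
df_back = pd.read_json(StringIO(json_text))

dtype_before = df['value32'].dtype
dtype_after = df_back['value32'].dtype

# The values survive, but the dtype is re-inferred when the JSON is read back.
print(dtype_before, dtype_after)
print('same dtype after round trip?', dtype_before == dtype_after)
```

The JSON text produced here carries no dtype information, which is why the reader has to infer one.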
main features
The NTV-pandas converter uses the semantic NTV format to include a large set of data types in a JSON representation. The NTV format is described at https://loco-philippe.github.io/ES/JSON%20semantic%20format%20(JSON-NTV).htm. The converter integrates:
- all the pandas dtype and the data types associated with a JSON representation,
- an always reversible conversion,
- full compatibility with the TableSchema specification.
example
In the example below, a DataFrame with several data types is converted to JSON.
The DataFrame resulting from this JSON is identical to the initial DataFrame (reversibility).
With the existing JSON interface, this conversion is not possible.
data example
In [1]: from pprint import pprint
        from datetime import date
        from shapely.geometry import Point
        import pandas as pd
        import ntv_pandas as npd

In [2]: data = {'index': [100, 200, 300, 400, 500, 600],
                'dates::date': pd.Series([date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5), date(2022,1,21)]),
                'value': [10, 10, 20, 20, 30, 30],
                'value32': pd.Series([12, 12, 22, 22, 32, 32], dtype='int32'),
                'res': [10, 20, 30, 10, 20, 30],
                'coord::point': pd.Series([Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4), Point(5,6)]),
                'names': pd.Series(['john', 'eric', 'judith', 'mila', 'hector', 'maria'], dtype='string'),
                'unique': True }

In [3]: df = pd.DataFrame(data).set_index('index')
In [4]: df
Out[4]:
       dates::date  value  value32  res coord::point   names  unique
index
100     1964-01-01     10       12   10  POINT (1 2)    john    True
200     1985-02-05     10       12   20  POINT (3 4)    eric    True
300     2022-01-21     20       22   30  POINT (5 6)  judith    True
400     1964-01-01     20       22   10  POINT (7 8)    mila    True
500     1985-02-05     30       32   20  POINT (3 4)  hector    True
600     2022-01-21     30       32   30  POINT (5 6)   maria    True
JSON representation
In [5]: df_to_json = npd.to_json(df)
        pprint(df_to_json, compact=True, width=120)
Out[5]:
        {':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],
                  'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'],
                  'index': [100, 200, 300, 400, 500, 600],
                  'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
                  'res': [10, 20, 30, 10, 20, 30],
                  'unique': [True, True, True, True, True, True],
                  'value': [10, 10, 20, 20, 30, 30],
                  'value32::int32': [12, 12, 22, 22, 32, 32]}}
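In the output above, several keys carry a type after a `::` separator (for example `value32::int32`), while others are plain names. As a rough illustration of how a reader of this JSON might separate the two parts, here is a stdlib-only sketch. `split_ntv_name` is a hypothetical helper written for this example, not a function of ntv_pandas, and it only handles the `name::type` form seen in this output.

```python
def split_ntv_name(key):
    """Split 'name::type' into (name, type); type is None when absent.

    Hypothetical helper for illustration only, not part of ntv_pandas.
    """
    name, sep, ntv_type = key.partition('::')
    return name, (ntv_type if sep else None)


# Keys copied from the JSON representation shown above.
keys = ['coord::point', 'dates::date', 'index', 'names::string',
        'res', 'unique', 'value', 'value32::int32']

columns = {key: split_ntv_name(key) for key in keys}
for name, ntv_type in columns.values():
    print(f'{name!r:10} -> {ntv_type}')
```

Keys without a `::` part, such as `value`, come back with `None` as their type in this sketch.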
Reversibility
In [6]: df_from_json = npd.read_json(df_to_json)
        print('df created from JSON is equal to initial df ? ', df_from_json.equals(df))
Out[6]: df created from JSON is equal to initial df ? True