python.observation.dataset

Created on Thu May 26 20:30:00 2022

@author: philippe@loco-labs.io

The python.observation.dataset module contains the Dataset class.

Documentation is available in other pages:


  1# -*- coding: utf-8 -*-
  2"""
  3Created on Thu May 26 20:30:00 2022
  4
  5@author: philippe@loco-labs.io
  6
  7The `python.observation.dataset` module contains the `Dataset` class.
  8
  9Documentation is available in other pages:
 10
 11- The Json Standard for Dataset is defined
 12[here](https://github.com/loco-philippe/Environmental-Sensing/tree/main/documentation/DatasetJSON-Standard.pdf)
 13- The concept of 'indexed list' is described in
 14[this page](https://github.com/loco-philippe/Environmental-Sensing/wiki/Indexed-list).
 15- The non-regression tests are at
 16[this page](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Tests/test_dataset.py)
 17- The [examples](https://github.com/loco-philippe/Environmental-Sensing/tree/main/python/Examples/Dataset)
 18 are:
 19    - [creation](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_creation.ipynb)
 20    - [variable](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_variable.ipynb)
 21    - [update](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_update.ipynb)
 22    - [structure](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_structure.ipynb)
 23    - [structure-analysis](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_structure-analysis.ipynb)
 24
 25---
 26"""
 27# %% declarations
 28from collections import Counter
 29from copy import copy
 30from abc import ABC
 31import math
 32import json
 33import csv
 34
 35from observation.fields import Nfield
 36from observation.util import util
 37from observation.dataset_interface import DatasetInterface, DatasetError
 38from observation.dataset_structure import DatasetStructure
 39from observation.dataset_analysis import Analysis
 40from json_ntv.ntv import Ntv, NtvConnector
 41
 42class Dataset(DatasetStructure, DatasetInterface, ABC):
 43    # %% intro
 44    '''
 45    A `Dataset` is a representation of an indexed list.
 46
 47    *Attributes (for @property see methods)* :
 48
 49    - **lindex** : list of Field
 50    - **analysis** : Analysis object (data structure)
 51
 52    The methods defined in this class are :
 53
 54    *constructor (@classmethod)*
 55
 56    - `Dataset.ntv`
 57    - `Dataset.from_csv`
 58    - `Dataset.from_ntv`
 59    - `Dataset.from_file`
 60    - `Dataset.merge`
 61
 62    *abstract static methods (@abstractmethod, @staticmethod)*
 63
 64    - `Dataset.field_class`
 65    
 66    *dynamic value - module analysis (getters @property)*
 67
 68    - `Dataset.extidx`
 69    - `Dataset.extidxext`
 70    - `Dataset.groups`
 71    - `Dataset.idxname`
 72    - `Dataset.idxlen`
 73    - `Dataset.iidx`
 74    - `Dataset.lenidx`
 75    - `Dataset.lidx`
 76    - `Dataset.lidxrow`
 77    - `Dataset.lisvar`
 78    - `Dataset.lvar`
 79    - `Dataset.lvarname`
 80    - `Dataset.lvarrow`
 81    - `Dataset.lunicname`
 82    - `Dataset.lunicrow`
 83    - `Dataset.primaryname`
 84    - `Dataset.setidx`
 85    - `Dataset.zip`
 86
 87    *dynamic value (getters @property)*
 88
 89    - `Dataset.keys`
 90    - `Dataset.iindex`
 91    - `Dataset.indexlen`
 92    - `Dataset.lenindex`
 93    - `Dataset.lname`
 94    - `Dataset.tiindex`
 95
 96    *global value (getters @property)*
 97
 98    - `Dataset.category`
 99    - `Dataset.complete`
100    - `Dataset.consistent`
101    - `Dataset.dimension`
102    - `Dataset.lencomplete`
103    - `Dataset.primary`
104    - `Dataset.secondary`
105
106    *selecting - infos methods (`observation.dataset_structure.DatasetStructure`)*
107
108    - `Dataset.couplingmatrix`
109    - `Dataset.idxrecord`
110    - `Dataset.indexinfos`
111    - `Dataset.indicator`
112    - `Dataset.iscanonorder`
113    - `Dataset.isinrecord`
114    - `Dataset.keytoval`
115    - `Dataset.loc`
116    - `Dataset.nindex`
117    - `Dataset.record`
118    - `Dataset.recidx`
119    - `Dataset.recvar`
120    - `Dataset.tree`
121    - `Dataset.valtokey`
122
123    *add - update methods (`observation.dataset_structure.DatasetStructure`)*
124
125    - `Dataset.add`
126    - `Dataset.addindex`
127    - `Dataset.append`
128    - `Dataset.delindex`
129    - `Dataset.delrecord`
130    - `Dataset.orindex`
131    - `Dataset.renameindex`
132    - `Dataset.setvar`
133    - `Dataset.setname`
134    - `Dataset.updateindex`
135
136    *structure management - methods (`observation.dataset_structure.DatasetStructure`)*
137
138    - `Dataset.applyfilter`
139    - `Dataset.coupling`
140    - `Dataset.full`
141    - `Dataset.getduplicates`
142    - `Dataset.mix`
143    - `Dataset.merging`
144    - `Dataset.reindex`
145    - `Dataset.reorder`
146    - `Dataset.setfilter`
147    - `Dataset.sort`
148    - `Dataset.swapindex`
149    - `Dataset.setcanonorder`
150    - `Dataset.tostdcodec`
151
152    *exports methods (`observation.dataset_interface.DatasetInterface`)*
153
154    - `Dataset.json`
155    - `Dataset.plot`
156    - `Dataset.to_obj`
157    - `Dataset.to_csv`
158    - `Dataset.to_dataframe`
159    - `Dataset.to_file`
160    - `Dataset.to_ntv`
162    - `Dataset.to_xarray`
163    - `Dataset.view`
164    - `Dataset.vlist`
165    - `Dataset.voxel`
166    '''
167
168    field_class = None
169    
170    def __init__(self, listidx=None, reindex=True):
171        '''
172        Dataset constructor.
173
174        *Parameters*
175
176        - **listidx** :  list (default None) - list of Field data
177        - **reindex** : boolean (default True) - if True, default codec for each Field'''
178
179        self.name     = self.__class__.__name__
180        self.field    = self.field_class
181        self.analysis = Analysis(self)
182        self.lindex   = []
183        if listidx.__class__.__name__ in ['Dataset', 'Observation', 'Ndataset', 'Sdataset']:
184            self.lindex = [copy(idx) for idx in listidx.lindex]
185            return
186        if not listidx:
187            return
188        self.lindex   = listidx
189        if reindex:
190            self.reindex()
191        self.analysis.actualize()
192        return
193
194    """@classmethod
195    def dic(cls, idxdic=None, reindex=True):
196        '''
 197        Dataset constructor (external dictionary).
198
199        *Parameters*
200
201        - **idxdic** : {name : values}  (see data model)
202        if not idxdic:
203            return cls.ext(idxval=None, idxname=None, reindex=reindex)
204        if isinstance(idxdic, Dataset):
205            return idxdic
206        if not isinstance(idxdic, dict):
207            raise DatasetError("idxdic not dict")
208        return cls.ext(idxval=list(idxdic.values()), idxname=list(idxdic.keys()),
209                       reindex=reindex)"""
210
211    """@classmethod
212    def ext(cls, idxval=None, idxname=None, reindex=True):
213        '''
214        Dataset constructor (external index).
215
216        *Parameters*
217
218        - **idxval** : list of Field or list of values (see data model)
219        - **idxname** : list of string (default None) - list of Field name (see data model)
220        if idxval is None:
221            idxval = []
222        if not isinstance(idxval, list):
223            return None
224        val = [ [idx] if not isinstance(idx, list) else idx for idx in idxval]
225        lenval = [len(idx) for idx in val]
226        if lenval and max(lenval) != min(lenval):
227            raise DatasetError('the length of Field are different')
228        length = lenval[0] if lenval else 0
229        if idxname is None:
230            idxname = [None] * len(val)
231        for ind, name in enumerate(idxname):
232            if name is None or name == ES.defaultindex:
233                idxname[ind] = 'i'+str(ind)
234        lidx = [list(FieldInterface.decodeobj(
235            idx, typevalue, context=False)) for idx in val]
236        lindex = [Field(idx[2], name, list(range(length)), idx[1],
237                         lendefault=length, reindex=reindex)
238                  for idx, name in zip(lidx, idxname)]
239        return cls(lindex, reindex=False)"""
240
241    @classmethod
242    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
243                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
244        '''
245        Dataset constructor (from a csv file). Each column represents index values.
246
247        *Parameters*
248
249        - **filename** : string (default 'dataset.csv'), name of the file to read
 250        - **header** : boolean (default True). If True, the first row is dedicated to names
 251        - **nrow** : integer (default None). Number of rows to read. If None, all rows are read
252        - **optcsv** : dict (default : quoting) - see csv.reader options'''
253        if not optcsv:
254            optcsv = {}
255        if not nrow:
256            nrow = -1
257        with open(filename, newline='', encoding="utf-8") as file:
258            reader = csv.reader(file, **optcsv)
259            irow = 0
260            for row in reader:
261                if irow == nrow:
262                    break
263                if irow == 0:
264                    idxval = [[] for i in range(len(row))]
265                    idxname = [''] * len(row)
266                if irow == 0 and header:
267                    idxname = row
268                else:
269                    for i in range(len(row)):
270                        if decode_json:
271                            try:
272                                idxval[i].append(json.loads(row[i]))
 273                            except ValueError:
274                                idxval[i].append(row[i])
275                        else:
276                            idxval[i].append(row[i])
277                irow += 1
278        lindex = [cls.field_class.from_ntv({name:idx}, decode_str=decode_str) for idx, name in zip(idxval, idxname)]
279        return cls(listidx=lindex, reindex=True)
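The cell-decoding loop in `from_csv` (try `json.loads` on each cell, keep the raw string on failure) can be sketched standalone with only the standard library; `decode_csv_columns` is a hypothetical helper for illustration, not part of the module:

```python
import csv
import io
import json

def decode_csv_columns(text, header=True, decode_json=True):
    """Read CSV text into per-column value lists, decoding each cell as
    JSON when possible and falling back to the raw string otherwise."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {}
    names = rows[0] if header else ['i' + str(i) for i in range(len(rows[0]))]
    data = rows[1:] if header else rows
    columns = [[] for _ in names]
    for row in data:
        for i, cell in enumerate(row):
            if decode_json:
                try:
                    # numbers and JSON containers come back typed
                    columns[i].append(json.loads(cell))
                    continue
                except ValueError:
                    pass
            # plain text stays a string
            columns[i].append(cell)
    return dict(zip(names, columns))
```

With this fallback, a column can mix typed JSON values and plain strings, which is exactly why the constructor wraps each `json.loads` in a try/except.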
280
281    @classmethod
282    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
283        '''
284        Generate Object from file storage.
285
 286        *Parameters*
 287
 288        - **filename** : string - file name (with path)
 289        - **forcestring** : boolean (default False) - if True,
 290        forces the UTF-8 text format, else the format is inferred from the first byte
 291        - **reindex** : boolean (default True) - if True, default codec for each Field
 292        - **decode_str**: boolean (default False) - if True, strings are loaded as json data
293
294        *Returns* : new Object'''
295        with open(filename, 'rb') as file:
296            btype = file.read(1)
 297        if btype in (b'[', b'{') or forcestring:
298            with open(filename, 'r', newline='', encoding="utf-8") as file:
299                bjson = file.read()
300        else:
301            with open(filename, 'rb') as file:
302                bjson = file.read()
303        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)
304
305    """@classmethod
306    def obj(cls, bsd=None, reindex=True, context=True):
307        '''
308        Generate a new Object from a bytes, string or list value
309
310        *Parameters*
311
312        - **bsd** : bytes, string or list data to convert
313        - **reindex** : boolean (default True) - if True, default codec for each Field
314        - **context** : boolean (default True) - if False, only codec and keys are included'''
315        return cls.from_obj(bsd, reindex=reindex, context=context)"""
316
317    @classmethod
318    def ntv(cls, ntv_value, reindex=True):
 319        '''Generate a Dataset object from a ntv_value
320
321        *Parameters*
322
323        - **ntv_value** : bytes, string, Ntv object to convert
324        - **reindex** : boolean (default True) - if True, default codec for each Field'''
325        return cls.from_ntv(ntv_value, reindex=reindex)
326    
327    @classmethod
328    def from_ntv(cls, ntv_value, reindex=True, decode_str=False):
 329        '''Generate a Dataset object from a ntv_value
 330
 331        *Parameters*
 332
 333        - **ntv_value** : bytes, string, Ntv object to convert
 334        - **reindex** : boolean (default True) - if True, default codec for each Field
 335        - **decode_str**: boolean (default False) - if True, strings are loaded as json data'''
336        ntv = Ntv.obj(ntv_value, decode_str=decode_str)
337        if len(ntv) == 0:
338            return cls()
339        lidx = [list(cls.field_class.decode_ntv(ntvf)) for ntvf in ntv]
340        leng = max([idx[6] for idx in lidx])
341        for ind in range(len(lidx)):
342            if lidx[ind][0] == '':
343                lidx[ind][0] = 'i'+str(ind)
344            NtvConnector.init_ntv_keys(ind, lidx, leng)
345            #Dataset._init_ntv_keys(ind, lidx, leng)
 346        lindex = [cls.field_class(idx[2], idx[0], idx[4], None, # idx[1] for the type,
347                     reindex=reindex) for idx in lidx]
348        return cls(lindex, reindex=reindex)
349
350    """@classmethod
351    def from_obj(cls, bsd=None, reindex=True, context=True):
352        '''
353        Generate an Dataset Object from a bytes, string or list value
354
355        *Parameters*
356
357        - **bsd** : bytes, string, DataFrame or list data to convert
358        - **reindex** : boolean (default True) - if True, default codec for each Field
359        - **context** : boolean (default True) - if False, only codec and keys are included'''
360        if isinstance(bsd, cls):
361            return bsd
362        if bsd is None:
363            bsd = []
364        if isinstance(bsd, bytes):
365            lis = cbor2.loads(bsd)
366        elif isinstance(bsd, str):
367            lis = json.loads(bsd, object_hook=CborDecoder().codecbor)
368        elif isinstance(bsd, (list, dict)) or bsd.__class__.__name__ == 'DataFrame':
369            lis = bsd
370        else:
371            raise DatasetError("the type of parameter is not available")
372        return cls._init_obj(lis, reindex=reindex, context=context)"""
373
374    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
375        '''
 376        The merge method replaces nested Dataset objects with their constituent Fields.
377
378        *Parameters*
379
380        - **fillvalue** : object (default nan) - value used for the additional data
381        - **reindex** : boolean (default False) - if True, set default codec after transformation
 382        - **simplename** : boolean (default False) - if True, new Field names are
 383        the same as the merged Field names, else composed names are used.
384
385        *Returns*: merged Dataset '''
386        ilc = copy(self)
387        delname = []
388        row = ilc[0]
389        if not isinstance(row, list):
390            row = [row]
391        merged, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
392                                                      simplename=simplename)
 393        if oldname and oldname not in merged.lname:
394            delname.append(oldname)
395        for ind in range(1, len(ilc)):
396            oldidx = ilc.nindex(oldname)
397            for name in newname:
398                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
399            row = ilc[ind]
400            if not isinstance(row, list):
401                row = [row]
402            rec, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
403                                                       simplename=simplename)
404            if oldname and newname != [oldname]:
405                delname.append(oldname)
406            for name in newname:
407                oldidx = merged.nindex(oldname)
408                fillval = self.field.s_to_i(fillvalue)
409                merged.addindex(
410                    self.field([fillval] * len(merged), name, oldidx.keys))
411            merged += rec
412        for name in set(delname):
413            if name:
414                merged.delindex(name)
415        if reindex:
416            merged.reindex()
417        ilc.lindex = merged.lindex
418        return ilc
419
420    @classmethod
421    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
422        '''
423        Dataset constructor (external index).
424
425        *Parameters*
426
427        - **idxval** : list of Field or list of values (see data model)
428        - **idxname** : list of string (default None) - list of Field name (see data model)'''
429        if idxval is None:
430            idxval = []
431        if not isinstance(idxval, list):
432            return None
433        val = []
434        for idx in idxval:
435            if not isinstance(idx, list):
436                val.append([idx])
437            else:
438                val.append(idx)
439        lenval = [len(idx) for idx in val]
440        if lenval and max(lenval) != min(lenval):
 441            raise DatasetError('the lengths of the Fields are different')
442        length = lenval[0] if lenval else 0
443        idxname = [None] * len(val) if idxname is None else idxname
444        for ind, name in enumerate(idxname):
445            if name is None or name == '$default':
446                idxname[ind] = 'i'+str(ind)
447        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
448                                  fast=fast) for codec, name in zip(val, idxname)]
449        return cls(lindex, reindex=False)
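The preparation steps in `ext` (wrap scalars into one-element lists, check that all columns have one length, fill in default names) can be shown with plain lists; `normalize_columns` is a hypothetical stdlib-only helper:

```python
def normalize_columns(idxval, idxname=None):
    """Wrap scalar entries into one-element lists, check that all
    columns share one length, and fill in default names 'i0', 'i1', ...
    - the same preparation steps as the `ext` constructor."""
    val = [idx if isinstance(idx, list) else [idx] for idx in idxval]
    lengths = {len(idx) for idx in val}
    if len(lengths) > 1:
        raise ValueError('the lengths of the columns are different')
    names = list(idxname) if idxname is not None else [None] * len(val)
    # missing or '$default' names become positional names
    names = [name if name not in (None, '$default') else 'i' + str(ind)
             for ind, name in enumerate(names)]
    return dict(zip(names, val))
```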
450    
451# %% internal
452
453    """@staticmethod
454    def _init_ntv_keys(ind, lidx, leng):
455        ''' initialization of explicit keys data in lidx object'''
456        # name: 0, type: 1, codec: 2, parent: 3, keys: 4, coef: 5, leng: 6
457        name, typ, codec, parent, keys, coef, length = lidx[ind]
458        if (keys, parent, coef) == (None, None, None):  # full or unique
459            if len(codec) == 1: # unique
460                lidx[ind][4] = [0] * leng
461            elif len(codec) == leng:    # full
462                lidx[ind][4] = list(range(leng))
463            else:
464                raise DatasetError('impossible to generate keys')
465            return
466        if keys and len(keys) > 1 and parent is None:  #complete
467            return
468        if coef:  #primary
469            lidx[ind][4] = [(ikey % (coef * len(codec))) // coef for ikey in range(leng)]
470            lidx[ind][3] = None
471            return  
472        if parent is None:
473            raise DatasetError('keys not referenced')          
474        if not lidx[parent][4] or len(lidx[parent][4]) != leng:
475            Dataset._init_ntv_keys(parent, lidx, leng)
476        if not keys and len(codec) == len(lidx[parent][2]):    # implicit
477            lidx[ind][4] = lidx[parent][4]
478            lidx[ind][3] = None
479            return
480        lidx[ind][4] = Nfield.keysfromderkeys(lidx[parent][4], keys)  # relative
481        lidx[ind][3] = None
482        return"""
483
484    @staticmethod
485    def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False):
486        #row = rec[0] if isinstance(rec, list) else rec
487        row = rec[0]
488        if not isinstance(row, list):
489            row = [row]
490        var = -1
491        for ind, val in enumerate(row):
492            if val.__class__.__name__ in ['Sdataset', 'Ndataset', 'Observation']:
493                var = ind
494                break
495        if var < 0:
496            return (rec, None, [])
497        ilis = row[var]
498        oldname = rec.lname[var]
499        if ilis.lname == ['i0']:
500            newname = [oldname]
501            ilis.setname(newname)
502        elif not simplename:
503            newname = [oldname + '_' + name for name in ilis.lname]
504            ilis.setname(newname)
505        else:
506            newname = copy(ilis.lname)
507        for name in rec.lname:
508            if name in newname:
509                newname.remove(name)
510            else:
511                updidx = name in ilis.lname and not updateidx
512                ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)},
513                              merge=mergeidx, update=updidx)
514                #ilis.addindex([name, [rec.nindex(name)[0]] * len(ilis)],
515                #              merge=mergeidx, update=updidx)
516        return (ilis, oldname, newname)
517
518# %% special
519    def __str__(self):
520        '''return string format for var and lidx'''
521        stri = ''
522        if self.lvar:
523            stri += 'variables :\n'
524            for idx in self.lvar:
525                stri += '    ' + str(idx) + '\n'
526        if self.lidx:
527            stri += 'index :\n'
528            for idx in self.lidx:
529                stri += '    ' + str(idx) + '\n'
530        return stri
531
532    def __repr__(self):
533        '''return classname, number of value and number of indexes'''
534        return self.__class__.__name__ + '[' + str(len(self)) + ', ' + str(self.lenindex) + ']'
535
536    def __len__(self):
537        ''' len of values'''
538        if not self.lindex:
539            return 0
540        return len(self.lindex[0])
541
542    def __contains__(self, item):
 543        ''' True if item is one of the lindex Fields'''
544        return item in self.lindex
545
546    def __getitem__(self, ind):
547        ''' return value record (value conversion)'''
548        res = [idx[ind] for idx in self.lindex]
549        if len(res) == 1:
550            return res[0]
551        return res
552
553    def __setitem__(self, ind, item):
554        ''' modify the Field values for each Field at the row ind'''
555        if not isinstance(item, list):
556            item = [item]
557        for val, idx in zip(item, self.lindex):
558            idx[ind] = val
559
560    def __delitem__(self, ind):
561        ''' remove all Field item at the row ind'''
562        for idx in self.lindex:
563            del idx[ind]
564
565    def __hash__(self):
566        '''return sum of all hash(Field)'''
567        return sum([hash(idx) for idx in self.lindex])
568
569    def _hashi(self):
570        '''return sum of all hashi(Field)'''
571        return sum([idx._hashi() for idx in self.lindex])
572
573    def __eq__(self, other):
574        ''' equal if hash values are equal'''
575        return hash(self) == hash(other)
576
577    def __add__(self, other):
578        ''' Add other's values to self's values in a new Dataset'''
579        newil = copy(self)
580        newil.__iadd__(other)
581        return newil
582
583    def __iadd__(self, other):
584        ''' Add other's values to self's values'''
585        return self.add(other, name=True, solve=False)
586
587    def __or__(self, other):
588        ''' Add other's index to self's index in a new Dataset'''
589        newil = copy(self)
590        newil.__ior__(other)
591        return newil
592
593    def __ior__(self, other):
594        ''' Add other's index to self's index'''
595        return self.orindex(other, first=False, merge=True, update=False)
596
597    def __copy__(self):
598        ''' Copy all the data '''
599        return self.__class__(self)
600
601# %% property
602    @property
603    def complete(self):
604        '''return a boolean (True if Dataset is complete and consistent)'''
605        return self.lencomplete == len(self) and self.consistent
606
607    @property
608    def consistent(self):
 609        ''' True if all the records are different'''
610        if not self.iidx:
611            return True
612        return max(Counter(zip(*self.iidx)).values()) == 1
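The `consistent` test above reduces to: no tuple of keys occurs twice across the records. A stdlib-only sketch (`records_unique` is a hypothetical name):

```python
from collections import Counter

def records_unique(key_columns):
    """True if no two records share the same key tuple - the same
    Counter-over-zipped-keys test as the `consistent` property."""
    if not key_columns:
        return True
    # zip(*columns) yields one key tuple per record
    return max(Counter(zip(*key_columns)).values()) == 1
```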
613
614    @property
615    def category(self):
616        ''' dict with category for each Field'''
617        return {field['name']: field['cat'] for field in self.indexinfos()}
618
619    @property
620    def dimension(self):
621        ''' integer : number of primary Field'''
622        return len(self.primary)
623
624    @property
625    def extidx(self):
626        '''idx values (see data model)'''
627        return [idx.values for idx in self.lidx]
628
629    @property
630    def extidxext(self):
631        '''idx val (see data model)'''
632        return [idx.val for idx in self.lidx]
633
634    @property
635    def groups(self):
636        ''' list with crossed Field groups'''
637        return self.analysis.getgroups()
638
639    @property
640    def idxname(self):
641        ''' list of idx name'''
642        return [idx.name for idx in self.lidx]
643
644    @property
645    def idxlen(self):
646        ''' list of idx codec length'''
647        return [len(idx.codec) for idx in self.lidx]
648
649    @property
650    def indexlen(self):
651        ''' list of index codec length'''
652        return [len(idx.codec) for idx in self.lindex]
653
654    @property
655    def iidx(self):
656        ''' list of keys for each idx'''
657        return [idx.keys for idx in self.lidx]
658
659    @property
660    def iindex(self):
661        ''' list of keys for each index'''
662        return [idx.keys for idx in self.lindex]
663
664    @property
665    def keys(self):
666        ''' list of keys for each index'''
667        return [idx.keys for idx in self.lindex]
668
669    @property
670    def lencomplete(self):
671        '''number of values if complete (prod(idxlen primary))'''
672        primary = self.primary
673        return util.mul([self.idxlen[i] for i in primary])
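`lencomplete` is the product of the primary codec lengths; with the stdlib this is just `math.prod` (hypothetical helper name):

```python
import math

def expected_full_length(primary_codec_lengths):
    """Record count of a complete (fully crossed) dataset: the product
    of the primary codec lengths, as computed by `lencomplete`."""
    return math.prod(primary_codec_lengths)
```

Comparing this value to `len(self)` is how the `complete` property decides whether every combination of primary values is present.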
674
675    @property
676    def lenindex(self):
677        ''' number of indexes'''
678        return len(self.lindex)
679
680    @property
681    def lenidx(self):
682        ''' number of idx'''
683        return len(self.lidx)
684
685    @property
686    def lidx(self):
687        '''list of idx'''
688        return [self.lindex[i] for i in self.lidxrow]
689
690    @property
691    def lisvar(self):
692        '''list of boolean : True if Field is var'''
693        return [name in self.lvarname for name in self.lname]
694
695    @property
696    def lvar(self):
697        '''list of var'''
698        return [self.lindex[i] for i in self.lvarrow]
699
700    @property
701    def lvarname(self):
702        ''' list of variable Field name'''
703        return self.analysis.getvarname()
704
705    @property
706    def lunicrow(self):
 707        '''list of unique idx rows'''
708        return [self.lname.index(name) for name in self.lunicname]
709
710    @property
711    def lvarrow(self):
712        '''list of var row'''
713        return [self.lname.index(name) for name in self.lvarname]
714
715    @property
716    def lidxrow(self):
717        '''list of idx row'''
718        return [i for i in range(self.lenindex) if i not in self.lvarrow]
719
720    @property
721    def lunicname(self):
722        ''' list of unique index name'''
723        return [idx.name for idx in self.lindex if len(idx.codec) == 1]
724
725    @property
726    def lname(self):
727        ''' list of index name'''
728        return [idx.name for idx in self.lindex]
729
730    @property
731    def primary(self):
732        ''' list of primary idx'''
733        return self.analysis.getprimary()
734
735    @property
736    def primaryname(self):
737        ''' list of primary name'''
738        return [self.lidx[idx].name for idx in self.primary]
739
740    @property
741    def secondary(self):
742        ''' list of secondary idx'''
743        return self.analysis.getsecondary()
744
745    @property
746    def secondaryname(self):
747        ''' list of secondary name'''
748        return [self.lindex[idx].name for idx in self.secondary]
749
750    @property
751    def setidx(self):
752        '''list of codec for each idx'''
753        return [idx.codec for idx in self.lidx]
754
755    @property
756    def tiindex(self):
757        ''' list of keys for each record'''
758        return util.list(list(zip(*self.iindex)))
759
760    @property
761    def zip(self):
762        '''return a zip format for transpose(extidx) : tuple(tuple(rec))'''
763        textidx = util.transpose(self.extidx)
764        if not textidx:
765            return None
766        return tuple(tuple(idx) for idx in textidx)
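The `zip` property transposes column-oriented values into record tuples; a stdlib sketch of the same transformation (hypothetical name):

```python
def to_records(extidx):
    """Transpose column-oriented values into a tuple of record tuples,
    as the `zip` property does via util.transpose."""
    if not extidx:
        return None
    # zip(*columns) walks the columns in lockstep, one record at a time
    return tuple(zip(*extidx))
```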
class Dataset(observation.dataset_structure.DatasetStructure, observation.dataset_interface.DatasetInterface, abc.ABC):
 43class Dataset(DatasetStructure, DatasetInterface, ABC):
 44    # %% intro
 45    '''
 46    An `Dataset` is a representation of an indexed list.
 47
 48    *Attributes (for @property see methods)* :
 49
 50    - **lindex** : list of Field
 51    - **analysis** : Analysis object (data structure)
 52
 53    The methods defined in this class are :
 54
 55    *constructor (@classmethod))*
 56
 57    - `Dataset.ntv`
 58    - `Dataset.from_csv`
 59    - `Dataset.from_ntv`
 60    - `Dataset.from_file`
 61    - `Dataset.merge`
 62
 63    *abstract static methods (@abstractmethod, @staticmethod)*
 64
 65    - `Dataset.field_class`
 66    
 67    *dynamic value - module analysis (getters @property)*
 68
 69    - `Dataset.extidx`
 70    - `Dataset.extidxext`
 71    - `Dataset.groups`
 72    - `Dataset.idxname`
 73    - `Dataset.idxlen`
 74    - `Dataset.iidx`
 75    - `Dataset.lenidx`
 76    - `Dataset.lidx`
 77    - `Dataset.lidxrow`
 78    - `Dataset.lisvar`
 79    - `Dataset.lvar`
 80    - `Dataset.lvarname`
 81    - `Dataset.lvarrow`
 82    - `Dataset.lunicname`
 83    - `Dataset.lunicrow`
 84    - `Dataset.primaryname`
 85    - `Dataset.setidx`
 86    - `Dataset.zip`
 87
 88    *dynamic value (getters @property)*
 89
 90    - `Dataset.keys`
 91    - `Dataset.iindex`
 92    - `Dataset.indexlen`
 93    - `Dataset.lenindex`
 94    - `Dataset.lname`
 95    - `Dataset.tiindex`
 96
 97    *global value (getters @property)*
 98
 99    - `Dataset.category`
100    - `Dataset.complete`
101    - `Dataset.consistent`
102    - `Dataset.dimension`
103    - `Dataset.lencomplete`
104    - `Dataset.primary`
105    - `Dataset.secondary`
106
107    *selecting - infos methods (`observation.dataset_structure.DatasetStructure`)*
108
109    - `Dataset.couplingmatrix`
110    - `Dataset.idxrecord`
111    - `Dataset.indexinfos`
112    - `Dataset.indicator`
113    - `Dataset.iscanonorder`
114    - `Dataset.isinrecord`
115    - `Dataset.keytoval`
116    - `Dataset.loc`
117    - `Dataset.nindex`
118    - `Dataset.record`
119    - `Dataset.recidx`
120    - `Dataset.recvar`
121    - `Dataset.tree`
122    - `Dataset.valtokey`
123
124    *add - update methods (`observation.dataset_structure.DatasetStructure`)*
125
126    - `Dataset.add`
127    - `Dataset.addindex`
128    - `Dataset.append`
129    - `Dataset.delindex`
130    - `Dataset.delrecord`
131    - `Dataset.orindex`
132    - `Dataset.renameindex`
133    - `Dataset.setvar`
134    - `Dataset.setname`
135    - `Dataset.updateindex`
136
137    *structure management - methods (`observation.dataset_structure.DatasetStructure`)*
138
139    - `Dataset.applyfilter`
140    - `Dataset.coupling`
141    - `Dataset.full`
142    - `Dataset.getduplicates`
143    - `Dataset.mix`
144    - `Dataset.merging`
145    - `Dataset.reindex`
146    - `Dataset.reorder`
147    - `Dataset.setfilter`
148    - `Dataset.sort`
149    - `Dataset.swapindex`
150    - `Dataset.setcanonorder`
151    - `Dataset.tostdcodec`
152
153    *exports methods (`observation.dataset_interface.DatasetInterface`)*
154
155    - `Dataset.json`
156    - `Dataset.plot`
157    - `Dataset.to_obj`
158    - `Dataset.to_csv`
159    - `Dataset.to_dataframe`
160    - `Dataset.to_file`
161    - `Dataset.to_ntv`
162    - `Dataset.to_obj`
163    - `Dataset.to_xarray`
164    - `Dataset.view`
165    - `Dataset.vlist`
166    - `Dataset.voxel`
167    '''
168
169    field_class = None
170    
171    def __init__(self, listidx=None, reindex=True):
172        '''
173        Dataset constructor.
174
175        *Parameters*
176
177        - **listidx** :  list (default None) - list of Field data
 178        - **reindex** : boolean (default True) - if True, a default codec is computed for each Field'''
179
180        self.name     = self.__class__.__name__
181        self.field    = self.field_class
182        self.analysis = Analysis(self)
183        self.lindex   = []
184        if listidx.__class__.__name__ in ['Dataset', 'Observation', 'Ndataset', 'Sdataset']:
185            self.lindex = [copy(idx) for idx in listidx.lindex]
186            return
187        if not listidx:
188            return
189        self.lindex   = listidx
190        if reindex:
191            self.reindex()
192        self.analysis.actualize()
193        return
194
195    """@classmethod
196    def dic(cls, idxdic=None, reindex=True):
197        '''
 198        Dataset constructor (external dictionary).
199
200        *Parameters*
201
 202        - **idxdic** : {name : values}  (see data model)'''
203        if not idxdic:
204            return cls.ext(idxval=None, idxname=None, reindex=reindex)
205        if isinstance(idxdic, Dataset):
206            return idxdic
207        if not isinstance(idxdic, dict):
208            raise DatasetError("idxdic not dict")
209        return cls.ext(idxval=list(idxdic.values()), idxname=list(idxdic.keys()),
210                       reindex=reindex)"""
211
212    """@classmethod
213    def ext(cls, idxval=None, idxname=None, reindex=True):
214        '''
215        Dataset constructor (external index).
216
217        *Parameters*
218
219        - **idxval** : list of Field or list of values (see data model)
 220        - **idxname** : list of string (default None) - list of Field name (see data model)'''
221        if idxval is None:
222            idxval = []
223        if not isinstance(idxval, list):
224            return None
225        val = [ [idx] if not isinstance(idx, list) else idx for idx in idxval]
226        lenval = [len(idx) for idx in val]
227        if lenval and max(lenval) != min(lenval):
 228            raise DatasetError('the lengths of the Fields are different')
229        length = lenval[0] if lenval else 0
230        if idxname is None:
231            idxname = [None] * len(val)
232        for ind, name in enumerate(idxname):
233            if name is None or name == ES.defaultindex:
234                idxname[ind] = 'i'+str(ind)
235        lidx = [list(FieldInterface.decodeobj(
236            idx, typevalue, context=False)) for idx in val]
237        lindex = [Field(idx[2], name, list(range(length)), idx[1],
238                         lendefault=length, reindex=reindex)
239                  for idx, name in zip(lidx, idxname)]
240        return cls(lindex, reindex=False)"""
241
242    @classmethod
243    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
244                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
245        '''
246        Dataset constructor (from a csv file). Each column represents index values.
247
248        *Parameters*
249
250        - **filename** : string (default 'dataset.csv'), name of the file to read
 251        - **header** : boolean (default True). If True, the first row is dedicated to names
 252        - **nrow** : integer (default None). Maximum number of rows to read. If None, all rows are read
 253        - **optcsv** : dict (default : QUOTE_NONNUMERIC quoting) - see csv.reader options'''
254        if not optcsv:
255            optcsv = {}
256        if not nrow:
257            nrow = -1
258        with open(filename, newline='', encoding="utf-8") as file:
259            reader = csv.reader(file, **optcsv)
260            irow = 0
261            for row in reader:
262                if irow == nrow:
263                    break
264                if irow == 0:
265                    idxval = [[] for i in range(len(row))]
266                    idxname = [''] * len(row)
267                if irow == 0 and header:
268                    idxname = row
269                else:
 270                    for i, cell in enumerate(row):
 271                        if decode_json:
 272                            try:
 273                                idxval[i].append(json.loads(cell))
 274                            except (json.JSONDecodeError, TypeError):
 275                                idxval[i].append(cell)
 276                        else:
 277                            idxval[i].append(cell)
278                irow += 1
279        lindex = [cls.field_class.from_ntv({name:idx}, decode_str=decode_str) for idx, name in zip(idxval, idxname)]
280        return cls(listidx=lindex, reindex=True)
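The `decode_json` branch above tries `json.loads` on every cell and keeps the raw value when decoding fails (with QUOTE_NONNUMERIC quoting, unquoted cells arrive as floats, hence the TypeError case). A stand-alone sketch of that per-cell rule — `decode_cell` is an illustrative name, not part of the module:

```python
import json

def decode_cell(cell, decode_json=True):
    """Decode one csv cell as json, falling back to the raw value.

    Mirrors the decode_json branch of Dataset.from_csv (sketch only)."""
    if not decode_json:
        return cell
    try:
        return json.loads(cell)
    except (json.JSONDecodeError, TypeError):
        return cell

# numeric and structured cells are decoded, free text stays a string
print([decode_cell(cell) for cell in ['21.5', '{"unit": "C"}', 'paris']])
```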
281
282    @classmethod
283    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
284        '''
 285        Generate an Object from file storage.
 286
 287        *Parameters*
288
289        - **filename** : string - file name (with path)
290        - **forcestring** : boolean (default False) - if True,
291        forces the UTF-8 data format, else the format is calculated
 292        - **reindex** : boolean (default True) - if True, a default codec is computed for each Field
 293        - **decode_str** : boolean (default False) - if True, strings are loaded as json data
294
295        *Returns* : new Object'''
296        with open(filename, 'rb') as file:
297            btype = file.read(1)
 298        if btype in (b'[', b'{') or forcestring:
299            with open(filename, 'r', newline='', encoding="utf-8") as file:
300                bjson = file.read()
301        else:
302            with open(filename, 'rb') as file:
303                bjson = file.read()
304        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)
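The first byte read above decides whether the file is a UTF-8 json text (starting with '[' or '{') or binary content. The same sniffing test in isolation — `sniff_is_json_text` is an illustrative helper, not part of the module:

```python
import os
import tempfile

def sniff_is_json_text(filename):
    """True if the file starts like a json text ('[' or '{'), else binary.

    Same first-byte test as Dataset.from_file (sketch only)."""
    with open(filename, 'rb') as file:
        return file.read(1) in (b'[', b'{')

# a file beginning with '{' is treated as utf-8 json text
with tempfile.NamedTemporaryFile('w', encoding='utf-8', delete=False) as tmp:
    tmp.write('{"idx": [1, 2, 3]}')
print(sniff_is_json_text(tmp.name))
os.remove(tmp.name)
```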
305
306    """@classmethod
307    def obj(cls, bsd=None, reindex=True, context=True):
308        '''
309        Generate a new Object from a bytes, string or list value
310
311        *Parameters*
312
313        - **bsd** : bytes, string or list data to convert
314        - **reindex** : boolean (default True) - if True, default codec for each Field
315        - **context** : boolean (default True) - if False, only codec and keys are included'''
316        return cls.from_obj(bsd, reindex=reindex, context=context)"""
317
318    @classmethod
319    def ntv(cls, ntv_value, reindex=True):
 320        '''Generate a Dataset Object from an ntv_value
321
322        *Parameters*
323
324        - **ntv_value** : bytes, string, Ntv object to convert
 325        - **reindex** : boolean (default True) - if True, a default codec is computed for each Field'''
326        return cls.from_ntv(ntv_value, reindex=reindex)
327    
328    @classmethod
329    def from_ntv(cls, ntv_value, reindex=True, decode_str=False):
 330        '''Generate a Dataset Object from an ntv_value
331
332        *Parameters*
333
334        - **ntv_value** : bytes, string, Ntv object to convert
 335        - **reindex** : boolean (default True) - if True, a default codec is computed for each Field
 336        - **decode_str** : boolean (default False) - if True, strings are loaded as json data'''
337        ntv = Ntv.obj(ntv_value, decode_str=decode_str)
338        if len(ntv) == 0:
339            return cls()
340        lidx = [list(cls.field_class.decode_ntv(ntvf)) for ntvf in ntv]
341        leng = max([idx[6] for idx in lidx])
342        for ind in range(len(lidx)):
343            if lidx[ind][0] == '':
344                lidx[ind][0] = 'i'+str(ind)
345            NtvConnector.init_ntv_keys(ind, lidx, leng)
346            #Dataset._init_ntv_keys(ind, lidx, leng)
 347        lindex = [cls.field_class(idx[2], idx[0], idx[4], None,  # idx[1] for the type,
348                     reindex=reindex) for idx in lidx]
349        return cls(lindex, reindex=reindex)
350
351    """@classmethod
352    def from_obj(cls, bsd=None, reindex=True, context=True):
353        '''
 354        Generate a Dataset Object from a bytes, string or list value
355
356        *Parameters*
357
358        - **bsd** : bytes, string, DataFrame or list data to convert
359        - **reindex** : boolean (default True) - if True, default codec for each Field
360        - **context** : boolean (default True) - if False, only codec and keys are included'''
361        if isinstance(bsd, cls):
362            return bsd
363        if bsd is None:
364            bsd = []
365        if isinstance(bsd, bytes):
366            lis = cbor2.loads(bsd)
367        elif isinstance(bsd, str):
368            lis = json.loads(bsd, object_hook=CborDecoder().codecbor)
369        elif isinstance(bsd, (list, dict)) or bsd.__class__.__name__ == 'DataFrame':
370            lis = bsd
371        else:
372            raise DatasetError("the type of parameter is not available")
373        return cls._init_obj(lis, reindex=reindex, context=context)"""
374
375    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
376        '''
 377        The merge method replaces the Dataset objects included in the Dataset with their constituents.
378
379        *Parameters*
380
381        - **fillvalue** : object (default nan) - value used for the additional data
 382        - **reindex** : boolean (default False) - if True, set a default codec after the transformation
 383        - **simplename** : boolean (default False) - if True, new Field names are
 384        the same as the merged Field names, else composed names are used.
385
386        *Returns*: merged Dataset '''
387        ilc = copy(self)
388        delname = []
389        row = ilc[0]
390        if not isinstance(row, list):
391            row = [row]
392        merged, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
393                                                      simplename=simplename)
 394        if oldname and oldname not in merged.lname:
395            delname.append(oldname)
396        for ind in range(1, len(ilc)):
397            oldidx = ilc.nindex(oldname)
398            for name in newname:
399                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
400            row = ilc[ind]
401            if not isinstance(row, list):
402                row = [row]
403            rec, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
404                                                       simplename=simplename)
405            if oldname and newname != [oldname]:
406                delname.append(oldname)
407            for name in newname:
408                oldidx = merged.nindex(oldname)
409                fillval = self.field.s_to_i(fillvalue)
410                merged.addindex(
411                    self.field([fillval] * len(merged), name, oldidx.keys))
412            merged += rec
413        for name in set(delname):
414            if name:
415                merged.delindex(name)
416        if reindex:
417            merged.reindex()
418        ilc.lindex = merged.lindex
419        return ilc
420
421    @classmethod
422    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
423        '''
424        Dataset constructor (external index).
425
426        *Parameters*
427
428        - **idxval** : list of Field or list of values (see data model)
429        - **idxname** : list of string (default None) - list of Field name (see data model)'''
430        if idxval is None:
431            idxval = []
432        if not isinstance(idxval, list):
433            return None
434        val = []
435        for idx in idxval:
436            if not isinstance(idx, list):
437                val.append([idx])
438            else:
439                val.append(idx)
440        lenval = [len(idx) for idx in val]
441        if lenval and max(lenval) != min(lenval):
 442            raise DatasetError('the lengths of the Fields are different')
443        length = lenval[0] if lenval else 0
444        idxname = [None] * len(val) if idxname is None else idxname
445        for ind, name in enumerate(idxname):
446            if name is None or name == '$default':
447                idxname[ind] = 'i'+str(ind)
448        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
449                                  fast=fast) for codec, name in zip(val, idxname)]
450        return cls(lindex, reindex=False)
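The loop above normalizes scalar entries into singleton lists before checking that every Field has the same length. The same normalization in isolation — `normalize` is an illustrative name, not part of the module:

```python
def normalize(idxval):
    """Wrap scalar entries into singleton lists and check equal lengths.

    Sketch of the value normalization done by Dataset.ext."""
    val = [idx if isinstance(idx, list) else [idx] for idx in idxval]
    lenval = [len(idx) for idx in val]
    if lenval and max(lenval) != min(lenval):
        raise ValueError('the lengths of the Fields are different')
    return val

print(normalize([[10, 20], ['a', 'b']]))   # already aligned lists
print(normalize([5, 7]))                   # scalars become singleton Fields
```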
451    
452# %% internal
453
454    """@staticmethod
455    def _init_ntv_keys(ind, lidx, leng):
456        ''' initialization of explicit keys data in lidx object'''
457        # name: 0, type: 1, codec: 2, parent: 3, keys: 4, coef: 5, leng: 6
458        name, typ, codec, parent, keys, coef, length = lidx[ind]
459        if (keys, parent, coef) == (None, None, None):  # full or unique
460            if len(codec) == 1: # unique
461                lidx[ind][4] = [0] * leng
462            elif len(codec) == leng:    # full
463                lidx[ind][4] = list(range(leng))
464            else:
465                raise DatasetError('impossible to generate keys')
466            return
467        if keys and len(keys) > 1 and parent is None:  #complete
468            return
469        if coef:  #primary
470            lidx[ind][4] = [(ikey % (coef * len(codec))) // coef for ikey in range(leng)]
471            lidx[ind][3] = None
472            return  
473        if parent is None:
474            raise DatasetError('keys not referenced')          
475        if not lidx[parent][4] or len(lidx[parent][4]) != leng:
476            Dataset._init_ntv_keys(parent, lidx, leng)
477        if not keys and len(codec) == len(lidx[parent][2]):    # implicit
478            lidx[ind][4] = lidx[parent][4]
479            lidx[ind][3] = None
480            return
481        lidx[ind][4] = Nfield.keysfromderkeys(lidx[parent][4], keys)  # relative
482        lidx[ind][3] = None
483        return"""
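The commented-out rules above now live in `NtvConnector.init_ntv_keys`; the two key expansions worth spelling out are the primary (coef) case and the relative (derived keys) case. A sketch, under the assumption that `Nfield.keysfromderkeys` composes parent keys with relative keys:

```python
def keys_from_coef(coef, codec_len, leng):
    # primary field: keys cycle through the codec with period coef * codec_len
    return [(ikey % (coef * codec_len)) // coef for ikey in range(leng)]

def keys_from_derkeys(parentkeys, derkeys):
    # derived field: each parent key selects a relative key
    # (assumed behaviour of Nfield.keysfromderkeys, sketch only)
    return [derkeys[key] for key in parentkeys]

# two primary fields with codec lengths 2 and 3 crossed over 6 records
print(keys_from_coef(3, 2, 6))   # [0, 0, 0, 1, 1, 1]
print(keys_from_coef(1, 3, 6))   # [0, 1, 2, 0, 1, 2]
```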
484
485    @staticmethod
486    def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False):
487        #row = rec[0] if isinstance(rec, list) else rec
488        row = rec[0]
489        if not isinstance(row, list):
490            row = [row]
491        var = -1
492        for ind, val in enumerate(row):
493            if val.__class__.__name__ in ['Sdataset', 'Ndataset', 'Observation']:
494                var = ind
495                break
496        if var < 0:
497            return (rec, None, [])
498        ilis = row[var]
499        oldname = rec.lname[var]
500        if ilis.lname == ['i0']:
501            newname = [oldname]
502            ilis.setname(newname)
503        elif not simplename:
504            newname = [oldname + '_' + name for name in ilis.lname]
505            ilis.setname(newname)
506        else:
507            newname = copy(ilis.lname)
508        for name in rec.lname:
509            if name in newname:
510                newname.remove(name)
511            else:
512                updidx = name in ilis.lname and not updateidx
513                ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)},
514                              merge=mergeidx, update=updidx)
515                #ilis.addindex([name, [rec.nindex(name)[0]] * len(ilis)],
516                #              merge=mergeidx, update=updidx)
517        return (ilis, oldname, newname)
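`_mergerecord` renames the Fields of an embedded Dataset: a lone default 'i0' Field takes the parent name, otherwise names are composed as parent_child unless `simplename` is set. The naming rule in isolation — `composed_names` is an illustrative name:

```python
def composed_names(oldname, inner_names, simplename=False):
    """Name the Fields of a merged (embedded) Dataset (sketch only)."""
    if inner_names == ['i0']:      # single default Field: keep the parent name
        return [oldname]
    if simplename:                 # keep the embedded names unchanged
        return list(inner_names)
    return [oldname + '_' + name for name in inner_names]

print(composed_names('location', ['city', 'country']))
```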
518
519# %% special
520    def __str__(self):
521        '''return string format for var and lidx'''
522        stri = ''
523        if self.lvar:
524            stri += 'variables :\n'
525            for idx in self.lvar:
526                stri += '    ' + str(idx) + '\n'
527        if self.lidx:
528            stri += 'index :\n'
529            for idx in self.lidx:
530                stri += '    ' + str(idx) + '\n'
531        return stri
532
533    def __repr__(self):
534        '''return classname, number of value and number of indexes'''
535        return self.__class__.__name__ + '[' + str(len(self)) + ', ' + str(self.lenindex) + ']'
536
537    def __len__(self):
538        ''' len of values'''
539        if not self.lindex:
540            return 0
541        return len(self.lindex[0])
542
543    def __contains__(self, item):
 544        ''' True if item is in lindex'''
545        return item in self.lindex
546
547    def __getitem__(self, ind):
548        ''' return value record (value conversion)'''
549        res = [idx[ind] for idx in self.lindex]
550        if len(res) == 1:
551            return res[0]
552        return res
553
554    def __setitem__(self, ind, item):
555        ''' modify the Field values for each Field at the row ind'''
556        if not isinstance(item, list):
557            item = [item]
558        for val, idx in zip(item, self.lindex):
559            idx[ind] = val
560
561    def __delitem__(self, ind):
 562        ''' remove the item of each Field at the row ind'''
563        for idx in self.lindex:
564            del idx[ind]
565
566    def __hash__(self):
567        '''return sum of all hash(Field)'''
568        return sum([hash(idx) for idx in self.lindex])
569
570    def _hashi(self):
571        '''return sum of all hashi(Field)'''
572        return sum([idx._hashi() for idx in self.lindex])
573
574    def __eq__(self, other):
575        ''' equal if hash values are equal'''
576        return hash(self) == hash(other)
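Because `__hash__` sums the Field hashes and `__eq__` compares hashes, two Datasets holding the same Fields in a different order compare equal. Illustrated with plain hashable tuples standing in for Field objects:

```python
# plain tuples stand in for Field objects (same name, same values)
fields_a = [('city', ('paris', 'lyon')), ('temp', (20, 25))]
fields_b = [('temp', (20, 25)), ('city', ('paris', 'lyon'))]

# summed hashes are order-independent, so the two 'Datasets' compare equal
print(sum(hash(f) for f in fields_a) == sum(hash(f) for f in fields_b))
```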
577
578    def __add__(self, other):
579        ''' Add other's values to self's values in a new Dataset'''
580        newil = copy(self)
581        newil.__iadd__(other)
582        return newil
583
584    def __iadd__(self, other):
585        ''' Add other's values to self's values'''
586        return self.add(other, name=True, solve=False)
587
588    def __or__(self, other):
589        ''' Add other's index to self's index in a new Dataset'''
590        newil = copy(self)
591        newil.__ior__(other)
592        return newil
593
594    def __ior__(self, other):
595        ''' Add other's index to self's index'''
596        return self.orindex(other, first=False, merge=True, update=False)
597
598    def __copy__(self):
599        ''' Copy all the data '''
600        return self.__class__(self)
601
602# %% property
603    @property
604    def complete(self):
605        '''return a boolean (True if Dataset is complete and consistent)'''
606        return self.lencomplete == len(self) and self.consistent
607
608    @property
609    def consistent(self):
 610        ''' True if all the records are different'''
611        if not self.iidx:
612            return True
613        return max(Counter(zip(*self.iidx)).values()) == 1
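The consistency test transposes the keys and counts identical records: a Dataset is consistent when no tuple of keys occurs twice. Stand-alone form of that check (`is_consistent` is an illustrative name):

```python
from collections import Counter

def is_consistent(iidx):
    """True if no record (one key per idx) appears more than once.

    Same Counter test as the Dataset.consistent property (sketch only)."""
    if not iidx:
        return True
    return max(Counter(zip(*iidx)).values()) == 1

print(is_consistent([[0, 1, 0], [0, 1, 1]]))   # records (0,0) (1,1) (0,1)
print(is_consistent([[0, 1, 0], [0, 1, 0]]))   # record (0,0) appears twice
```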
614
615    @property
616    def category(self):
617        ''' dict with category for each Field'''
618        return {field['name']: field['cat'] for field in self.indexinfos()}
619
620    @property
621    def dimension(self):
622        ''' integer : number of primary Field'''
623        return len(self.primary)
624
625    @property
626    def extidx(self):
627        '''idx values (see data model)'''
628        return [idx.values for idx in self.lidx]
629
630    @property
631    def extidxext(self):
632        '''idx val (see data model)'''
633        return [idx.val for idx in self.lidx]
634
635    @property
636    def groups(self):
637        ''' list with crossed Field groups'''
638        return self.analysis.getgroups()
639
640    @property
641    def idxname(self):
642        ''' list of idx name'''
643        return [idx.name for idx in self.lidx]
644
645    @property
646    def idxlen(self):
647        ''' list of idx codec length'''
648        return [len(idx.codec) for idx in self.lidx]
649
650    @property
651    def indexlen(self):
652        ''' list of index codec length'''
653        return [len(idx.codec) for idx in self.lindex]
654
655    @property
656    def iidx(self):
657        ''' list of keys for each idx'''
658        return [idx.keys for idx in self.lidx]
659
660    @property
661    def iindex(self):
662        ''' list of keys for each index'''
663        return [idx.keys for idx in self.lindex]
664
665    @property
666    def keys(self):
667        ''' list of keys for each index'''
668        return [idx.keys for idx in self.lindex]
669
670    @property
671    def lencomplete(self):
672        '''number of values if complete (prod(idxlen primary))'''
673        primary = self.primary
674        return util.mul([self.idxlen[i] for i in primary])
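`lencomplete` is the product of the codec lengths of the primary idx; the Dataset is complete when it holds exactly that many distinct records. Assuming `util.mul` is a plain product, the arithmetic is:

```python
import math
from itertools import product

idxlen_primary = [2, 3, 4]               # codec lengths of the primary idx
lencomplete = math.prod(idxlen_primary)  # 2 * 3 * 4 = 24
crossed = list(product(*(range(n) for n in idxlen_primary)))
print(lencomplete, len(crossed))         # a complete Dataset has 24 records
```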
675
676    @property
677    def lenindex(self):
678        ''' number of indexes'''
679        return len(self.lindex)
680
681    @property
682    def lenidx(self):
683        ''' number of idx'''
684        return len(self.lidx)
685
686    @property
687    def lidx(self):
688        '''list of idx'''
689        return [self.lindex[i] for i in self.lidxrow]
690
691    @property
692    def lisvar(self):
693        '''list of boolean : True if Field is var'''
694        return [name in self.lvarname for name in self.lname]
695
696    @property
697    def lvar(self):
698        '''list of var'''
699        return [self.lindex[i] for i in self.lvarrow]
700
701    @property
702    def lvarname(self):
703        ''' list of variable Field name'''
704        return self.analysis.getvarname()
705
706    @property
707    def lunicrow(self):
 708        '''list of unique idx rows'''
709        return [self.lname.index(name) for name in self.lunicname]
710
711    @property
712    def lvarrow(self):
713        '''list of var row'''
714        return [self.lname.index(name) for name in self.lvarname]
715
716    @property
717    def lidxrow(self):
718        '''list of idx row'''
719        return [i for i in range(self.lenindex) if i not in self.lvarrow]
720
721    @property
722    def lunicname(self):
723        ''' list of unique index name'''
724        return [idx.name for idx in self.lindex if len(idx.codec) == 1]
725
726    @property
727    def lname(self):
728        ''' list of index name'''
729        return [idx.name for idx in self.lindex]
730
731    @property
732    def primary(self):
733        ''' list of primary idx'''
734        return self.analysis.getprimary()
735
736    @property
737    def primaryname(self):
738        ''' list of primary name'''
739        return [self.lidx[idx].name for idx in self.primary]
740
741    @property
742    def secondary(self):
743        ''' list of secondary idx'''
744        return self.analysis.getsecondary()
745
746    @property
747    def secondaryname(self):
748        ''' list of secondary name'''
749        return [self.lindex[idx].name for idx in self.secondary]
750
751    @property
752    def setidx(self):
753        '''list of codec for each idx'''
754        return [idx.codec for idx in self.lidx]
755
756    @property
757    def tiindex(self):
758        ''' list of keys for each record'''
759        return util.list(list(zip(*self.iindex)))
760
761    @property
762    def zip(self):
763        '''return a zip format for transpose(extidx) : tuple(tuple(rec))'''
764        textidx = util.transpose(self.extidx)
765        if not textidx:
766            return None
767        return tuple(tuple(idx) for idx in textidx)
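The `zip` property is the transpose of `extidx`: a tuple of records built from the per-idx value lists. Equivalent stdlib form:

```python
extidx = [['paris', 'lyon', 'paris'], [20, 25, 20]]   # one value list per idx
records = tuple(zip(*extidx))
print(records)   # (('paris', 20), ('lyon', 25), ('paris', 20))
```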

An Dataset is a representation of an indexed list.

Attributes (for @property see methods) :

  • lindex : list of Field
  • analysis : Analysis object (data structure)

The methods defined in this class are :

constructor (@classmethod))

abstract static methods (@abstractmethod, @staticmethod)

dynamic value - module analysis (getters @property)

dynamic value (getters @property)

global value (getters @property)

selecting - infos methods (observation.dataset_structure.DatasetStructure)

add - update methods (observation.dataset_structure.DatasetStructure)

structure management - methods (observation.dataset_structure.DatasetStructure)

exports methods (observation.dataset_interface.DatasetInterface)

Dataset(listidx=None, reindex=True)
171    def __init__(self, listidx=None, reindex=True):
172        '''
173        Dataset constructor.
174
175        *Parameters*
176
177        - **listidx** :  list (default None) - list of Field data
178        - **reindex** : boolean (default True) - if True, default codec for each Field'''
179
180        self.name     = self.__class__.__name__
181        self.field    = self.field_class
182        self.analysis = Analysis(self)
183        self.lindex   = []
184        if listidx.__class__.__name__ in ['Dataset', 'Observation', 'Ndataset', 'Sdataset']:
185            self.lindex = [copy(idx) for idx in listidx.lindex]
186            return
187        if not listidx:
188            return
189        self.lindex   = listidx
190        if reindex:
191            self.reindex()
192        self.analysis.actualize()
193        return

Dataset constructor.

Parameters

  • listidx : list (default None) - list of Field data
  • reindex : boolean (default True) - if True, default codec for each Field
field_class = None
name
field
analysis
lindex
@classmethod
def from_csv( cls, filename='dataset.csv', header=True, nrow=None, decode_str=True, decode_json=True, optcsv={'quoting': 2}):
242    @classmethod
243    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
244                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
245        '''
246        Dataset constructor (from a csv file). Each column represents index values.
247
248        *Parameters*
249
250        - **filename** : string (default 'dataset.csv'), name of the file to read
251        - **header** : boolean (default True). If True, the first raw is dedicated to names
252        - **nrow** : integer (default None). Number of row. If None, all the row else nrow
253        - **optcsv** : dict (default : quoting) - see csv.reader options'''
254        if not optcsv:
255            optcsv = {}
256        if not nrow:
257            nrow = -1
258        with open(filename, newline='', encoding="utf-8") as file:
259            reader = csv.reader(file, **optcsv)
260            irow = 0
261            for row in reader:
262                if irow == nrow:
263                    break
264                if irow == 0:
265                    idxval = [[] for i in range(len(row))]
266                    idxname = [''] * len(row)
267                if irow == 0 and header:
268                    idxname = row
269                else:
270                    for i in range(len(row)):
271                        if decode_json:
272                            try:
273                                idxval[i].append(json.loads(row[i]))
274                            except:
275                                idxval[i].append(row[i])
276                        else:
277                            idxval[i].append(row[i])
278                irow += 1
279        lindex = [cls.field_class.from_ntv({name:idx}, decode_str=decode_str) for idx, name in zip(idxval, idxname)]
280        return cls(listidx=lindex, reindex=True)

Dataset constructor (from a csv file). Each column represents index values.

Parameters

  • filename : string (default 'dataset.csv'), name of the file to read
  • header : boolean (default True). If True, the first raw is dedicated to names
  • nrow : integer (default None). Number of row. If None, all the row else nrow
  • optcsv : dict (default : quoting) - see csv.reader options
@classmethod
def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
282    @classmethod
283    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
284        '''
285        Generate Object from file storage.
286
287         *Parameters*
288
289        - **filename** : string - file name (with path)
290        - **forcestring** : boolean (default False) - if True,
291        forces the UTF-8 data format, else the format is calculated
292        - **reindex** : boolean (default True) - if True, default codec for each Field
293        - **decode_str**: boolean (default False) - if True, string are loaded in json data
294
295        *Returns* : new Object'''
296        with open(filename, 'rb') as file:
297            btype = file.read(1)
298        if btype == bytes('[', 'UTF-8') or btype == bytes('{', 'UTF-8') or forcestring:
299            with open(filename, 'r', newline='', encoding="utf-8") as file:
300                bjson = file.read()
301        else:
302            with open(filename, 'rb') as file:
303                bjson = file.read()
304        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)

Generate Object from file storage.

Parameters

  • filename : string - file name (with path)
  • forcestring : boolean (default False) - if True, forces the UTF-8 data format, else the format is calculated
  • reindex : boolean (default True) - if True, default codec for each Field
  • decode_str: boolean (default False) - if True, string are loaded in json data

Returns : new Object

@classmethod
def ntv(cls, ntv_value, reindex=True):
318    @classmethod
319    def ntv(cls, ntv_value, reindex=True):
320        '''Generate an Dataset Object from a ntv_value
321
322        *Parameters*
323
324        - **ntv_value** : bytes, string, Ntv object to convert
325        - **reindex** : boolean (default True) - if True, default codec for each Field'''
326        return cls.from_ntv(ntv_value, reindex=reindex)

Generate an Dataset Object from a ntv_value

Parameters

  • ntv_value : bytes, string, Ntv object to convert
  • reindex : boolean (default True) - if True, default codec for each Field
@classmethod
def from_ntv(cls, ntv_value, reindex=True, decode_str=False):
328    @classmethod
329    def from_ntv(cls, ntv_value, reindex=True, decode_str=False):
330        '''Generate an Dataset Object from a ntv_value
331
332        *Parameters*
333
334        - **ntv_value** : bytes, string, Ntv object to convert
335        - **reindex** : boolean (default True) - if True, default codec for each Field
336        - **decode_str**: boolean (default False) - if True, string are loaded in json data'''
337        ntv = Ntv.obj(ntv_value, decode_str=decode_str)
338        if len(ntv) == 0:
339            return cls()
340        lidx = [list(cls.field_class.decode_ntv(ntvf)) for ntvf in ntv]
341        leng = max([idx[6] for idx in lidx])
342        for ind in range(len(lidx)):
343            if lidx[ind][0] == '':
344                lidx[ind][0] = 'i'+str(ind)
345            NtvConnector.init_ntv_keys(ind, lidx, leng)
346            #Dataset._init_ntv_keys(ind, lidx, leng)
347        lindex = [cls.field_class(idx[2], idx[0], idx[4], None, # idx[1] pour le type,
348                     reindex=reindex) for idx in lidx]
349        return cls(lindex, reindex=reindex)

Generate an Dataset Object from a ntv_value

Parameters

  • ntv_value : bytes, string, Ntv object to convert
  • reindex : boolean (default True) - if True, default codec for each Field
  • decode_str: boolean (default False) - if True, string are loaded in json data
def merge(self, fillvalue=nan, reindex=False, simplename=False):
375    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
376        '''
377        Merge method replaces Dataset objects included into its constituents.
378
379        *Parameters*
380
381        - **fillvalue** : object (default nan) - value used for the additional data
382        - **reindex** : boolean (default False) - if True, set default codec after transformation
383        - **simplename** : boolean (default False) - if True, new Field name are
384        the same as merged Field name else it is a composed name.
385
386        *Returns*: merged Dataset '''
387        ilc = copy(self)
388        delname = []
389        row = ilc[0]
390        if not isinstance(row, list):
391            row = [row]
392        merged, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
393                                                      simplename=simplename)
394        if oldname and not oldname in merged.lname:
395            delname.append(oldname)
396        for ind in range(1, len(ilc)):
397            oldidx = ilc.nindex(oldname)
398            for name in newname:
399                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
400            row = ilc[ind]
401            if not isinstance(row, list):
402                row = [row]
403            rec, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
404                                                       simplename=simplename)
405            if oldname and newname != [oldname]:
406                delname.append(oldname)
407            for name in newname:
408                oldidx = merged.nindex(oldname)
409                fillval = self.field.s_to_i(fillvalue)
410                merged.addindex(
411                    self.field([fillval] * len(merged), name, oldidx.keys))
412            merged += rec
413        for name in set(delname):
414            if name:
415                merged.delindex(name)
416        if reindex:
417            merged.reindex()
418        ilc.lindex = merged.lindex
419        return ilc

Merge method: replaces the Dataset objects included in the Dataset with their constituents.

Parameters

  • fillvalue : object (default nan) - value used for the additional data
  • reindex : boolean (default False) - if True, set the default codec after transformation
  • simplename : boolean (default False) - if True, new Field names are the same as the merged Field names, else composed names are used.

Returns: merged Dataset

@classmethod
def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
421    @classmethod
422    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
423        '''
424        Dataset constructor (external index).
425
426        *Parameters*
427
428        - **idxval** : list of Field or list of values (see data model)
429        - **idxname** : list of string (default None) - list of Field name (see data model)'''
430        if idxval is None:
431            idxval = []
432        if not isinstance(idxval, list):
433            return None
434        val = []
435        for idx in idxval:
436            if not isinstance(idx, list):
437                val.append([idx])
438            else:
439                val.append(idx)
440        lenval = [len(idx) for idx in val]
441        if lenval and max(lenval) != min(lenval):
442            raise DatasetError('the lengths of Iindex are different')
443        length = lenval[0] if lenval else 0
444        idxname = [None] * len(val) if idxname is None else idxname
445        for ind, name in enumerate(idxname):
446            if name is None or name == '$default':
447                idxname[ind] = 'i'+str(ind)
448        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
449                                  fast=fast) for codec, name in zip(val, idxname)]
450        return cls(lindex, reindex=False)

Dataset constructor (external index).

Parameters

  • idxval : list of Field or list of values (see data model)
  • idxname : list of string (default None) - list of Field name (see data model)
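A standalone sketch of the input normalisation performed by `ext`, following the source above: scalar values are wrapped in one-element lists, all Fields must have the same length, and missing or `'$default'` names become `'i<n>'`. `ext_check` is a hypothetical helper for illustration, not a library function:

```python
def ext_check(idxval, idxname=None):
    """Normalise ext's inputs: wrap scalars, check lengths, default names."""
    # wrap scalar entries into one-element lists
    val = [idx if isinstance(idx, list) else [idx] for idx in idxval]
    lenval = [len(idx) for idx in val]
    if lenval and max(lenval) != min(lenval):
        raise ValueError('the lengths of the Fields are different')
    idxname = [None] * len(val) if idxname is None else list(idxname)
    # missing or '$default' names become 'i<position>'
    names = ['i' + str(ind) if name in (None, '$default') else name
             for ind, name in enumerate(idxname)]
    return val, names
```

For example, `ext_check([[10, 20], ['a', 'b']], [None, 'tag'])` yields the values unchanged with names `['i0', 'tag']`, while two Fields of different lengths raise an error, as in the source.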
Properties

  • complete : returns a boolean (True if the Dataset is complete and consistent)
  • consistent : True if all the records are different
  • category : dict with the category of each Field
  • dimension : integer - number of primary Fields
  • extidx : idx values (see data model)
  • extidxext : idx val (see data model)
  • groups : list with crossed Field groups
  • idxname : list of idx names
  • idxlen : list of idx codec lengths
  • indexlen : list of index codec lengths
  • iidx : list of keys for each idx
  • iindex : list of keys for each index
  • keys : list of keys for each index
  • lencomplete : number of values if complete (prod(idxlen primary))
  • lenindex : number of indexes
  • lenidx : number of idx
  • lidx : list of idx
  • lisvar : list of booleans - True if the Field is var
  • lvar : list of var
  • lvarname : list of variable Field names
  • lunicrow : list of unique idx rows
  • lvarrow : list of var rows
  • lidxrow : list of idx rows
  • lunicname : list of unique index names
  • lname : list of index names
  • primary : list of primary idx
  • primaryname : list of primary names
  • secondary : list of secondary idx
  • secondaryname : list of secondary names
  • setidx : list of codecs for each idx
  • tiindex : list of keys for each record
  • zip : returns a zip format for transpose(extidx) : tuple(tuple(rec))
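The `lencomplete` and `complete` properties relate as described above: `lencomplete` is the product of the codec lengths of the primary Fields, and a Dataset is complete when its length equals that product. A minimal sketch (the helper names are assumptions, not the library API):

```python
import math

def lencomplete(idxlen, primary):
    # product of the codec lengths of the primary Fields: prod(idxlen primary)
    return math.prod(idxlen[i] for i in primary)

def is_complete(length, idxlen, primary):
    # a Dataset is complete when its length equals lencomplete
    return length == lencomplete(idxlen, primary)
```

For instance, with codec lengths `[3, 4, 2]` and primary Fields `[0, 1]`, `lencomplete` is `12`, so a Dataset of 12 records is complete.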

Inherited Members

  • observation.dataset_structure.DatasetStructure : add, addindex, append,
    applyfilter, couplingmatrix, coupling, delrecord, delindex, full,
    getduplicates, iscanonorder, isinrecord, idxrecord, indexinfos, indicator,
    keytoval, loc, mix, merging, nindex, orindex, record, recidx, recvar,
    reindex, renameindex, reorder, setcanonorder, setfilter, setname, sort,
    swapindex, tostdcodec, tree, updateindex, valtokey
  • observation.dataset_interface.DatasetInterface : json, plot, to_csv,
    to_dataframe, to_file, to_ntv, to_xarray, voxel, view, vlist