# -*- coding: utf-8 -*-
"""
Created on Thu May 26 20:30:00 2022

@author: philippe@loco-labs.io

The `python.observation.dataset` module contains the `Dataset` class.

Documentation is available in other pages:

- The JSON standard for Dataset is defined
[here](https://github.com/loco-philippe/Environmental-Sensing/tree/main/documentation/DatasetJSON-Standard.pdf)
- The concept of 'indexed list' is described in
[this page](https://github.com/loco-philippe/Environmental-Sensing/wiki/Indexed-list).
- The non-regression tests are in
[this page](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Tests/test_dataset.py)
- The [examples](https://github.com/loco-philippe/Environmental-Sensing/tree/main/python/Examples/Dataset)
are:
    - [creation](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_creation.ipynb)
    - [variable](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_variable.ipynb)
    - [update](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_update.ipynb)
    - [structure](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_structure.ipynb)
    - [structure-analysis](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_structure-analysis.ipynb)

---
"""
# %% declarations
from collections import Counter
from copy import copy
from abc import ABC
import math
import json
import csv

from observation.fields import Nfield
from observation.util import util
from observation.dataset_interface import DatasetInterface, DatasetError
from observation.dataset_structure import DatasetStructure
from observation.dataset_analysis import Analysis
from json_ntv.ntv import Ntv, NtvConnector


class Dataset(DatasetStructure, DatasetInterface, ABC):
    # %% intro
    '''
    A `Dataset` is a representation of an indexed list.

    *Attributes (for @property see methods)* :

    - **lindex** : list of Field
    - **analysis** : Analysis object (data structure)

    The methods defined in this class are :

    *constructor (@classmethod)*

    - `Dataset.ntv`
    - `Dataset.from_csv`
    - `Dataset.from_ntv`
    - `Dataset.from_file`
    - `Dataset.merge`

    *abstract static methods (@abstractmethod, @staticmethod)*

    - `Dataset.field_class`

    *dynamic value - module analysis (getters @property)*

    - `Dataset.extidx`
    - `Dataset.extidxext`
    - `Dataset.groups`
    - `Dataset.idxname`
    - `Dataset.idxlen`
    - `Dataset.iidx`
    - `Dataset.lenidx`
    - `Dataset.lidx`
    - `Dataset.lidxrow`
    - `Dataset.lisvar`
    - `Dataset.lvar`
    - `Dataset.lvarname`
    - `Dataset.lvarrow`
    - `Dataset.lunicname`
    - `Dataset.lunicrow`
    - `Dataset.primaryname`
    - `Dataset.setidx`
    - `Dataset.zip`

    *dynamic value (getters @property)*

    - `Dataset.keys`
    - `Dataset.iindex`
    - `Dataset.indexlen`
    - `Dataset.lenindex`
    - `Dataset.lname`
    - `Dataset.tiindex`

    *global value (getters @property)*

    - `Dataset.category`
    - `Dataset.complete`
    - `Dataset.consistent`
    - `Dataset.dimension`
    - `Dataset.lencomplete`
    - `Dataset.primary`
    - `Dataset.secondary`

    *selecting - infos methods (`observation.dataset_structure.DatasetStructure`)*

    - `Dataset.couplingmatrix`
    - `Dataset.idxrecord`
    - `Dataset.indexinfos`
    - `Dataset.indicator`
    - `Dataset.iscanonorder`
    - `Dataset.isinrecord`
    - `Dataset.keytoval`
    - `Dataset.loc`
    - `Dataset.nindex`
    - `Dataset.record`
    - `Dataset.recidx`
    - `Dataset.recvar`
    - `Dataset.tree`
    - `Dataset.valtokey`

    *add - update methods (`observation.dataset_structure.DatasetStructure`)*

    - `Dataset.add`
    - `Dataset.addindex`
    - `Dataset.append`
    - `Dataset.delindex`
    - `Dataset.delrecord`
    - `Dataset.orindex`
    - `Dataset.renameindex`
    - `Dataset.setvar`
    - `Dataset.setname`
    - `Dataset.updateindex`

    *structure management - methods (`observation.dataset_structure.DatasetStructure`)*

    - `Dataset.applyfilter`
    - `Dataset.coupling`
    - `Dataset.full`
    - `Dataset.getduplicates`
    - `Dataset.mix`
    - `Dataset.merging`
    - `Dataset.reindex`
    - `Dataset.reorder`
    - `Dataset.setfilter`
    - `Dataset.sort`
    - `Dataset.swapindex`
    - `Dataset.setcanonorder`
    - `Dataset.tostdcodec`

    *exports methods (`observation.dataset_interface.DatasetInterface`)*

    - `Dataset.json`
    - `Dataset.plot`
    - `Dataset.to_obj`
    - `Dataset.to_csv`
    - `Dataset.to_dataframe`
    - `Dataset.to_file`
    - `Dataset.to_ntv`
    - `Dataset.to_xarray`
    - `Dataset.view`
    - `Dataset.vlist`
    - `Dataset.voxel`
    '''

    field_class = None

    def __init__(self, listidx=None, reindex=True):
        '''
        Dataset constructor.

        *Parameters*

        - **listidx** : list (default None) - list of Field data
        - **reindex** : boolean (default True) - if True, set the default codec for each Field'''
        self.name = self.__class__.__name__
        self.field = self.field_class
        self.analysis = Analysis(self)
        self.lindex = []
        if listidx.__class__.__name__ in ['Dataset', 'Observation', 'Ndataset', 'Sdataset']:
            self.lindex = [copy(idx) for idx in listidx.lindex]
            return
        if not listidx:
            return
        self.lindex = listidx
        if reindex:
            self.reindex()
        self.analysis.actualize()
        return

    """@classmethod
    def dic(cls, idxdic=None, reindex=True):
        '''
        Dataset constructor (external dictionary).

        *Parameters*

        - **idxdic** : {name : values} (see data model)
        if not idxdic:
            return cls.ext(idxval=None, idxname=None, reindex=reindex)
        if isinstance(idxdic, Dataset):
            return idxdic
        if not isinstance(idxdic, dict):
            raise DatasetError("idxdic not dict")
        return cls.ext(idxval=list(idxdic.values()), idxname=list(idxdic.keys()),
                       reindex=reindex)"""

    """@classmethod
    def ext(cls, idxval=None, idxname=None, reindex=True):
        '''
        Dataset constructor (external index).

        *Parameters*

        - **idxval** : list of Field or list of values (see data model)
        - **idxname** : list of string (default None) - list of Field name (see data model)
        if idxval is None:
            idxval = []
        if not isinstance(idxval, list):
            return None
        val = [[idx] if not isinstance(idx, list) else idx for idx in idxval]
        lenval = [len(idx) for idx in val]
        if lenval and max(lenval) != min(lenval):
            raise DatasetError('the lengths of the Fields are different')
        length = lenval[0] if lenval else 0
        if idxname is None:
            idxname = [None] * len(val)
        for ind, name in enumerate(idxname):
            if name is None or name == ES.defaultindex:
                idxname[ind] = 'i' + str(ind)
        lidx = [list(FieldInterface.decodeobj(
            idx, typevalue, context=False)) for idx in val]
        lindex = [Field(idx[2], name, list(range(length)), idx[1],
                        lendefault=length, reindex=reindex)
                  for idx, name in zip(lidx, idxname)]
        return cls(lindex, reindex=False)"""

    @classmethod
    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
        '''
        Dataset constructor (from a csv file). Each column represents index values.

        *Parameters*

        - **filename** : string (default 'dataset.csv'), name of the file to read
        - **header** : boolean (default True).
          If True, the first row is dedicated to names
        - **nrow** : integer (default None). Number of rows to read. If None, all rows are read
        - **decode_str** : boolean (default True) - forwarded to Field.from_ntv
        - **decode_json** : boolean (default True) - if True, try to decode each cell as JSON
        - **optcsv** : dict (default : quoting) - see csv.reader options'''
        if not optcsv:
            optcsv = {}
        if not nrow:
            nrow = -1
        with open(filename, newline='', encoding="utf-8") as file:
            reader = csv.reader(file, **optcsv)
            irow = 0
            for row in reader:
                if irow == nrow:
                    break
                if irow == 0:
                    idxval = [[] for i in range(len(row))]
                    idxname = [''] * len(row)
                if irow == 0 and header:
                    idxname = row
                else:
                    for i in range(len(row)):
                        if decode_json:
                            try:
                                idxval[i].append(json.loads(row[i]))
                            except (json.JSONDecodeError, TypeError):
                                idxval[i].append(row[i])
                        else:
                            idxval[i].append(row[i])
                irow += 1
        lindex = [cls.field_class.from_ntv({name: idx}, decode_str=decode_str)
                  for idx, name in zip(idxval, idxname)]
        return cls(listidx=lindex, reindex=True)

    @classmethod
    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
        '''
        Generate Object from file storage.

        *Parameters*

        - **filename** : string - file name (with path)
        - **forcestring** : boolean (default False) - if True,
          forces the UTF-8 data format, else the format is detected
        - **reindex** : boolean (default True) - if True, set the default codec for each Field
        - **decode_str** : boolean (default False) - if True, strings are loaded as json data

        *Returns* : new Object'''
        with open(filename, 'rb') as file:
            btype = file.read(1)
        if btype == bytes('[', 'UTF-8') or btype == bytes('{', 'UTF-8') or forcestring:
            with open(filename, 'r', newline='', encoding="utf-8") as file:
                bjson = file.read()
        else:
            with open(filename, 'rb') as file:
                bjson = file.read()
        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)

    """@classmethod
    def obj(cls, bsd=None, reindex=True, context=True):
        '''
        Generate a new Object from a bytes, string or list value

        *Parameters*

        - **bsd** : bytes, string or list data to convert
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **context** : boolean (default True) - if False, only codec and keys are included'''
        return cls.from_obj(bsd, reindex=reindex, context=context)"""

    @classmethod
    def ntv(cls, ntv_value, reindex=True):
        '''Generate a Dataset Object from a ntv_value

        *Parameters*

        - **ntv_value** : bytes, string, Ntv object to convert
        - **reindex** : boolean (default True) - if True, set the default codec for each Field'''
        return cls.from_ntv(ntv_value, reindex=reindex)

    @classmethod
    def from_ntv(cls, ntv_value, reindex=True, decode_str=False):
        '''Generate a Dataset Object from a ntv_value

        *Parameters*

        - **ntv_value** : bytes, string, Ntv object to convert
        - **reindex** : boolean (default True) - if True, set the default codec for each Field
        - **decode_str** : boolean (default False) - if True, strings are loaded as json data'''
        ntv = Ntv.obj(ntv_value, decode_str=decode_str)
        if len(ntv) == 0:
            return cls()
        lidx = [list(cls.field_class.decode_ntv(ntvf)) for ntvf in ntv]
        leng = max([idx[6] for idx in lidx])
        for ind in range(len(lidx)):
            if lidx[ind][0] == '':
                lidx[ind][0] = 'i' + str(ind)
            NtvConnector.init_ntv_keys(ind, lidx, leng)
            # Dataset._init_ntv_keys(ind, lidx, leng)
        lindex = [cls.field_class(idx[2], idx[0], idx[4], None,  # idx[1] for the type
                                  reindex=reindex) for idx in lidx]
        return cls(lindex, reindex=reindex)

    """@classmethod
    def from_obj(cls, bsd=None, reindex=True, context=True):
        '''
        Generate a Dataset Object from a bytes, string or list value

        *Parameters*

        - **bsd** : bytes, string, DataFrame or list data to convert
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **context** : boolean (default True) - if False, only codec and keys are included'''
        if isinstance(bsd, cls):
            return bsd
        if bsd is None:
            bsd = []
        if isinstance(bsd, bytes):
            lis = cbor2.loads(bsd)
        elif isinstance(bsd, str):
            lis = json.loads(bsd, object_hook=CborDecoder().codecbor)
        elif isinstance(bsd, (list, dict)) or bsd.__class__.__name__ == 'DataFrame':
            lis = bsd
        else:
            raise DatasetError("the type of parameter is not available")
        return cls._init_obj(lis, reindex=reindex, context=context)"""

    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
        '''
        The merge method replaces nested Dataset objects with their constituent Fields.

        *Parameters*

        - **fillvalue** : object (default nan) - value used for the additional data
        - **reindex** : boolean (default False) - if True, set the default codec after transformation
        - **simplename** : boolean (default False) - if True, new Field names are
          the same as the merged Field names, else a composed name is used.

        *Returns*: merged Dataset'''
        ilc = copy(self)
        delname = []
        row = ilc[0]
        if not isinstance(row, list):
            row = [row]
        merged, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
                                                        simplename=simplename)
        if oldname and oldname not in merged.lname:
            delname.append(oldname)
        for ind in range(1, len(ilc)):
            oldidx = ilc.nindex(oldname)
            for name in newname:
                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
            row = ilc[ind]
            if not isinstance(row, list):
                row = [row]
            rec, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
                                                         simplename=simplename)
            if oldname and newname != [oldname]:
                delname.append(oldname)
                for name in newname:
                    oldidx = merged.nindex(oldname)
                    fillval = self.field.s_to_i(fillvalue)
                    merged.addindex(
                        self.field([fillval] * len(merged), name, oldidx.keys))
            merged += rec
        for name in set(delname):
            if name:
                merged.delindex(name)
        if reindex:
            merged.reindex()
        ilc.lindex = merged.lindex
        return ilc

    @classmethod
    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
        '''
        Dataset constructor (external index).

        *Parameters*

        - **idxval** : list of Field or list of values (see data model)
        - **idxname** : list of string (default None) - list of Field name (see data model)'''
        if idxval is None:
            idxval = []
        if not isinstance(idxval, list):
            return None
        val = []
        for idx in idxval:
            if not isinstance(idx, list):
                val.append([idx])
            else:
                val.append(idx)
        lenval = [len(idx) for idx in val]
        if lenval and max(lenval) != min(lenval):
            raise DatasetError('the lengths of the Fields are different')
        length = lenval[0] if lenval else 0
        idxname = [None] * len(val) if idxname is None else idxname
        for ind, name in enumerate(idxname):
            if name is None or name == '$default':
                idxname[ind] = 'i' + str(ind)
        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
                                  fast=fast) for codec, name in zip(val, idxname)]
        return cls(lindex, reindex=False)

# %% internal

    """@staticmethod
    def _init_ntv_keys(ind, lidx, leng):
        ''' initialization of explicit keys data in lidx object'''
        # name: 0, type: 1, codec: 2, parent: 3, keys: 4, coef: 5, leng: 6
        name, typ, codec, parent, keys, coef, length = lidx[ind]
        if (keys, parent, coef) == (None, None, None):  # full or unique
            if len(codec) == 1:  # unique
                lidx[ind][4] = [0] * leng
            elif len(codec) == leng:  # full
                lidx[ind][4] = list(range(leng))
            else:
                raise DatasetError('impossible to generate keys')
            return
        if keys and len(keys) > 1 and parent is None:  # complete
            return
        if coef:  # primary
            lidx[ind][4] = [(ikey % (coef * len(codec))) // coef for ikey in range(leng)]
            lidx[ind][3] = None
            return
        if parent is None:
            raise DatasetError('keys not referenced')
        if not lidx[parent][4] or len(lidx[parent][4]) != leng:
            Dataset._init_ntv_keys(parent, lidx, leng)
        if not keys and len(codec) == len(lidx[parent][2]):  # implicit
            lidx[ind][4] = lidx[parent][4]
            lidx[ind][3] = None
            return
        lidx[ind][4] = Nfield.keysfromderkeys(lidx[parent][4], keys)  # relative
        lidx[ind][3] = None
        return"""

    @staticmethod
    def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False):
        # row = rec[0] if isinstance(rec, list) else rec
        row = rec[0]
        if not isinstance(row, list):
            row = [row]
        var = -1
        for ind, val in enumerate(row):
            if val.__class__.__name__ in ['Sdataset', 'Ndataset', 'Observation']:
                var = ind
                break
        if var < 0:
            return (rec, None, [])
        ilis = row[var]
        oldname = rec.lname[var]
        if ilis.lname == ['i0']:
            newname = [oldname]
            ilis.setname(newname)
        elif not simplename:
            newname = [oldname + '_' + name for name in ilis.lname]
            ilis.setname(newname)
        else:
            newname = copy(ilis.lname)
        for name in rec.lname:
            if name in newname:
                newname.remove(name)
            else:
                updidx = name in ilis.lname and not updateidx
                ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)},
                              merge=mergeidx, update=updidx)
                # ilis.addindex([name, [rec.nindex(name)[0]] * len(ilis)],
                #               merge=mergeidx, update=updidx)
        return (ilis, oldname, newname)

# %% special
    def __str__(self):
        '''return string format for var and lidx'''
        stri = ''
        if self.lvar:
            stri += 'variables :\n'
            for idx in self.lvar:
                stri += ' ' + str(idx) + '\n'
        if self.lidx:
            stri += 'index :\n'
            for idx in self.lidx:
                stri += ' ' + str(idx) + '\n'
        return stri

    def __repr__(self):
        '''return classname, number of values and number of indexes'''
        return self.__class__.__name__ + '[' + str(len(self)) + ', ' + str(self.lenindex) + ']'

    def __len__(self):
        ''' len of values'''
        if not self.lindex:
            return 0
        return len(self.lindex[0])

    def __contains__(self, item):
        ''' list of lindex values'''
        return item in self.lindex

    def __getitem__(self, ind):
        ''' return value record (value conversion)'''
        res = [idx[ind] for idx in self.lindex]
        if len(res) == 1:
            return res[0]
        return res

    def __setitem__(self, ind, item):
        ''' modify the Field values for each Field at the row ind'''
        if not isinstance(item, list):
            item = [item]
        for val, idx in zip(item, self.lindex):
            idx[ind] = val

    def __delitem__(self, ind):
        ''' remove all Field items at the row ind'''
        for idx in self.lindex:
            del idx[ind]

    def __hash__(self):
        '''return the sum of all hash(Field)'''
        return sum([hash(idx) for idx in self.lindex])

    def _hashi(self):
        '''return the sum of all hashi(Field)'''
        return sum([idx._hashi() for idx in self.lindex])

    def __eq__(self, other):
        ''' equal if hash values are equal'''
        return hash(self) == hash(other)

    def __add__(self, other):
        ''' Add other's values to self's values in a new Dataset'''
        newil = copy(self)
        newil.__iadd__(other)
        return newil

    def __iadd__(self, other):
        ''' Add other's values to self's values'''
        return self.add(other, name=True, solve=False)

    def __or__(self, other):
        ''' Add other's index to self's index in a new Dataset'''
        newil = copy(self)
        newil.__ior__(other)
        return newil

    def __ior__(self, other):
        ''' Add other's index to self's index'''
        return self.orindex(other, first=False, merge=True, update=False)

    def __copy__(self):
        ''' Copy all the data '''
        return self.__class__(self)

# %% property
    @property
    def complete(self):
        '''return a boolean (True if Dataset is complete and consistent)'''
        return self.lencomplete == len(self) and self.consistent

    @property
    def consistent(self):
        ''' True if all the records are different'''
        if not self.iidx:
            return True
        return max(Counter(zip(*self.iidx)).values()) == 1

    @property
    def category(self):
        ''' dict with the category of each Field'''
        return {field['name']: field['cat'] for field in self.indexinfos()}

    @property
    def dimension(self):
        ''' integer : number of primary Fields'''
        return len(self.primary)

    @property
    def extidx(self):
        '''idx values (see data model)'''
        return [idx.values for idx in self.lidx]

    @property
    def extidxext(self):
        '''idx val (see data model)'''
        return [idx.val for idx in self.lidx]

    @property
    def groups(self):
        ''' list of crossed Field groups'''
        return self.analysis.getgroups()

    @property
    def idxname(self):
        ''' list of idx names'''
        return [idx.name for idx in self.lidx]

    @property
    def idxlen(self):
        ''' list of idx codec lengths'''
        return [len(idx.codec) for idx in self.lidx]

    @property
    def indexlen(self):
        ''' list of index codec lengths'''
        return [len(idx.codec) for idx in self.lindex]

    @property
    def iidx(self):
        ''' list of keys for each idx'''
        return [idx.keys for idx in self.lidx]

    @property
    def iindex(self):
        ''' list of keys for each index'''
        return [idx.keys for idx in self.lindex]

    @property
    def keys(self):
        ''' list of keys for each index'''
        return [idx.keys for idx in self.lindex]

    @property
    def lencomplete(self):
        '''number of values if complete (product of the primary idxlen)'''
        primary = self.primary
        return util.mul([self.idxlen[i] for i in primary])

    @property
    def lenindex(self):
        ''' number of indexes'''
        return len(self.lindex)

    @property
    def lenidx(self):
        ''' number of idx'''
        return len(self.lidx)

    @property
    def lidx(self):
        '''list of idx'''
        return [self.lindex[i] for i in self.lidxrow]

    @property
    def lisvar(self):
        '''list of boolean : True if the Field is a variable'''
        return [name in self.lvarname for name in self.lname]

    @property
    def lvar(self):
        '''list of variables'''
        return [self.lindex[i] for i in self.lvarrow]

    @property
    def lvarname(self):
        ''' list of variable Field names'''
        return self.analysis.getvarname()

    @property
    def lunicrow(self):
        '''list of unique idx rows'''
        return [self.lname.index(name) for name in self.lunicname]

    @property
    def lvarrow(self):
        '''list of variable rows'''
        return [self.lname.index(name) for name in self.lvarname]

    @property
    def lidxrow(self):
        '''list of idx rows'''
        return [i for i in range(self.lenindex) if i not in self.lvarrow]

    @property
    def lunicname(self):
        ''' list of unique index names'''
        return [idx.name for idx in self.lindex if len(idx.codec) == 1]

    @property
    def lname(self):
        ''' list of index names'''
        return [idx.name for idx in self.lindex]

    @property
    def primary(self):
        ''' list of primary idx'''
        return self.analysis.getprimary()

    @property
    def primaryname(self):
        ''' list of primary names'''
        return [self.lidx[idx].name for idx in self.primary]

    @property
    def secondary(self):
        ''' list of secondary idx'''
        return self.analysis.getsecondary()

    @property
    def secondaryname(self):
        ''' list of secondary names'''
        return [self.lindex[idx].name for idx in self.secondary]

    @property
    def setidx(self):
        '''list of codecs for each idx'''
        return [idx.codec for idx in self.lidx]

    @property
    def tiindex(self):
        ''' list of keys for each record'''
        return util.list(list(zip(*self.iindex)))

    @property
    def zip(self):
        '''return a zip format for transpose(extidx) : tuple(tuple(rec))'''
        textidx = util.transpose(self.extidx)
        if not textidx:
            return None
        return tuple(tuple(idx) for idx in textidx)
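The `consistent` property above counts duplicate key tuples across records with a `Counter` over `zip(*self.iidx)`. A minimal self-contained sketch of the same check, written outside the class on plain key lists (hypothetical data, not tied to the Field data model):

```python
from collections import Counter

def consistent(iidx):
    # iidx: one list of integer keys per index; a record is the tuple of
    # keys taken at the same position in every list.
    if not iidx:
        return True
    # the dataset is consistent when no record (key tuple) appears twice
    return max(Counter(zip(*iidx)).values()) == 1

# two indexes, four records: all (key0, key1) pairs are distinct
print(consistent([[0, 0, 1, 1], [0, 1, 0, 1]]))   # True
# the first and last records share the same key pair (0, 0)
print(consistent([[0, 1, 0, 0], [0, 1, 1, 0]]))   # False
```

This is the invariant that `complete` combines with `lencomplete == len(self)` to decide whether the Dataset is a full crossed product with no duplicated record.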
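In the commented-out `_init_ntv_keys`, a primary field's keys are regenerated from a coefficient: key `i` is `(i % (coef * len(codec))) // coef`. A standalone illustration of that formula (the codec sizes and coefficients here are invented for the example), showing how two primary fields with matching coefficients enumerate a full crossed product:

```python
def primary_keys(coef, codec_len, leng):
    # rebuild the key list of a primary field from its coefficient:
    # the key cycles through codec_len values, each repeated coef times
    return [(i % (coef * codec_len)) // coef for i in range(leng)]

# a 2 x 3 crossed product (leng = 6): the first field varies slowly
# (coef = 3), the second varies quickly (coef = 1)
slow = primary_keys(3, 2, 6)   # [0, 0, 0, 1, 1, 1]
fast = primary_keys(1, 3, 6)   # [0, 1, 2, 0, 1, 2]
print(list(zip(slow, fast)))   # the 6 distinct (slow, fast) pairs
```

This is why a primary field only needs its coefficient and codec stored: the explicit key list is fully reconstructible.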
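`from_csv` accumulates values column by column and, when `decode_json` is set, tries to interpret each cell as JSON before falling back to the raw string. The decode-or-fallback step in isolation (a sketch with a hypothetical `decode_cell` helper, not the class method itself):

```python
import json

def decode_cell(cell):
    # interpret the cell as JSON when possible, otherwise keep the raw value
    try:
        return json.loads(cell)
    except (json.JSONDecodeError, TypeError):
        return cell

row = ['21', '{"unit": "kg"}', 'paris', 'true']
# numbers, objects and booleans are decoded; plain text stays a string
print([decode_cell(cell) for cell in row])
```

`TypeError` is caught as well because, with the default `csv.QUOTE_NONNUMERIC` quoting, unquoted cells arrive as floats rather than strings.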
425 426 *Parameters* 427 428 - **idxval** : list of Field or list of values (see data model) 429 - **idxname** : list of string (default None) - list of Field name (see data model)''' 430 if idxval is None: 431 idxval = [] 432 if not isinstance(idxval, list): 433 return None 434 val = [] 435 for idx in idxval: 436 if not isinstance(idx, list): 437 val.append([idx]) 438 else: 439 val.append(idx) 440 lenval = [len(idx) for idx in val] 441 if lenval and max(lenval) != min(lenval): 442 raise DatasetError('the length of Iindex are different') 443 length = lenval[0] if lenval else 0 444 idxname = [None] * len(val) if idxname is None else idxname 445 for ind, name in enumerate(idxname): 446 if name is None or name == '$default': 447 idxname[ind] = 'i'+str(ind) 448 lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex, 449 fast=fast) for codec, name in zip(val, idxname)] 450 return cls(lindex, reindex=False) 451 452# %% internal 453 454 """@staticmethod 455 def _init_ntv_keys(ind, lidx, leng): 456 ''' initialization of explicit keys data in lidx object''' 457 # name: 0, type: 1, codec: 2, parent: 3, keys: 4, coef: 5, leng: 6 458 name, typ, codec, parent, keys, coef, length = lidx[ind] 459 if (keys, parent, coef) == (None, None, None): # full or unique 460 if len(codec) == 1: # unique 461 lidx[ind][4] = [0] * leng 462 elif len(codec) == leng: # full 463 lidx[ind][4] = list(range(leng)) 464 else: 465 raise DatasetError('impossible to generate keys') 466 return 467 if keys and len(keys) > 1 and parent is None: #complete 468 return 469 if coef: #primary 470 lidx[ind][4] = [(ikey % (coef * len(codec))) // coef for ikey in range(leng)] 471 lidx[ind][3] = None 472 return 473 if parent is None: 474 raise DatasetError('keys not referenced') 475 if not lidx[parent][4] or len(lidx[parent][4]) != leng: 476 Dataset._init_ntv_keys(parent, lidx, leng) 477 if not keys and len(codec) == len(lidx[parent][2]): # implicit 478 lidx[ind][4] = lidx[parent][4] 479 lidx[ind][3] 
= None 480 return 481 lidx[ind][4] = Nfield.keysfromderkeys(lidx[parent][4], keys) # relative 482 lidx[ind][3] = None 483 return""" 484 485 @staticmethod 486 def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False): 487 #row = rec[0] if isinstance(rec, list) else rec 488 row = rec[0] 489 if not isinstance(row, list): 490 row = [row] 491 var = -1 492 for ind, val in enumerate(row): 493 if val.__class__.__name__ in ['Sdataset', 'Ndataset', 'Observation']: 494 var = ind 495 break 496 if var < 0: 497 return (rec, None, []) 498 ilis = row[var] 499 oldname = rec.lname[var] 500 if ilis.lname == ['i0']: 501 newname = [oldname] 502 ilis.setname(newname) 503 elif not simplename: 504 newname = [oldname + '_' + name for name in ilis.lname] 505 ilis.setname(newname) 506 else: 507 newname = copy(ilis.lname) 508 for name in rec.lname: 509 if name in newname: 510 newname.remove(name) 511 else: 512 updidx = name in ilis.lname and not updateidx 513 ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)}, 514 merge=mergeidx, update=updidx) 515 #ilis.addindex([name, [rec.nindex(name)[0]] * len(ilis)], 516 # merge=mergeidx, update=updidx) 517 return (ilis, oldname, newname) 518 519# %% special 520 def __str__(self): 521 '''return string format for var and lidx''' 522 stri = '' 523 if self.lvar: 524 stri += 'variables :\n' 525 for idx in self.lvar: 526 stri += ' ' + str(idx) + '\n' 527 if self.lidx: 528 stri += 'index :\n' 529 for idx in self.lidx: 530 stri += ' ' + str(idx) + '\n' 531 return stri 532 533 def __repr__(self): 534 '''return classname, number of value and number of indexes''' 535 return self.__class__.__name__ + '[' + str(len(self)) + ', ' + str(self.lenindex) + ']' 536 537 def __len__(self): 538 ''' len of values''' 539 if not self.lindex: 540 return 0 541 return len(self.lindex[0]) 542 543 def __contains__(self, item): 544 ''' list of lindex values''' 545 return item in self.lindex 546 547 def __getitem__(self, ind): 548 ''' return value record (value 
conversion)''' 549 res = [idx[ind] for idx in self.lindex] 550 if len(res) == 1: 551 return res[0] 552 return res 553 554 def __setitem__(self, ind, item): 555 ''' modify the Field values for each Field at the row ind''' 556 if not isinstance(item, list): 557 item = [item] 558 for val, idx in zip(item, self.lindex): 559 idx[ind] = val 560 561 def __delitem__(self, ind): 562 ''' remove all Field item at the row ind''' 563 for idx in self.lindex: 564 del idx[ind] 565 566 def __hash__(self): 567 '''return sum of all hash(Field)''' 568 return sum([hash(idx) for idx in self.lindex]) 569 570 def _hashi(self): 571 '''return sum of all hashi(Field)''' 572 return sum([idx._hashi() for idx in self.lindex]) 573 574 def __eq__(self, other): 575 ''' equal if hash values are equal''' 576 return hash(self) == hash(other) 577 578 def __add__(self, other): 579 ''' Add other's values to self's values in a new Dataset''' 580 newil = copy(self) 581 newil.__iadd__(other) 582 return newil 583 584 def __iadd__(self, other): 585 ''' Add other's values to self's values''' 586 return self.add(other, name=True, solve=False) 587 588 def __or__(self, other): 589 ''' Add other's index to self's index in a new Dataset''' 590 newil = copy(self) 591 newil.__ior__(other) 592 return newil 593 594 def __ior__(self, other): 595 ''' Add other's index to self's index''' 596 return self.orindex(other, first=False, merge=True, update=False) 597 598 def __copy__(self): 599 ''' Copy all the data ''' 600 return self.__class__(self) 601 602# %% property 603 @property 604 def complete(self): 605 '''return a boolean (True if Dataset is complete and consistent)''' 606 return self.lencomplete == len(self) and self.consistent 607 608 @property 609 def consistent(self): 610 ''' True if all the record are different''' 611 if not self.iidx: 612 return True 613 return max(Counter(zip(*self.iidx)).values()) == 1 614 615 @property 616 def category(self): 617 ''' dict with category for each Field''' 618 return 
{field['name']: field['cat'] for field in self.indexinfos()} 619 620 @property 621 def dimension(self): 622 ''' integer : number of primary Field''' 623 return len(self.primary) 624 625 @property 626 def extidx(self): 627 '''idx values (see data model)''' 628 return [idx.values for idx in self.lidx] 629 630 @property 631 def extidxext(self): 632 '''idx val (see data model)''' 633 return [idx.val for idx in self.lidx] 634 635 @property 636 def groups(self): 637 ''' list with crossed Field groups''' 638 return self.analysis.getgroups() 639 640 @property 641 def idxname(self): 642 ''' list of idx name''' 643 return [idx.name for idx in self.lidx] 644 645 @property 646 def idxlen(self): 647 ''' list of idx codec length''' 648 return [len(idx.codec) for idx in self.lidx] 649 650 @property 651 def indexlen(self): 652 ''' list of index codec length''' 653 return [len(idx.codec) for idx in self.lindex] 654 655 @property 656 def iidx(self): 657 ''' list of keys for each idx''' 658 return [idx.keys for idx in self.lidx] 659 660 @property 661 def iindex(self): 662 ''' list of keys for each index''' 663 return [idx.keys for idx in self.lindex] 664 665 @property 666 def keys(self): 667 ''' list of keys for each index''' 668 return [idx.keys for idx in self.lindex] 669 670 @property 671 def lencomplete(self): 672 '''number of values if complete (prod(idxlen primary))''' 673 primary = self.primary 674 return util.mul([self.idxlen[i] for i in primary]) 675 676 @property 677 def lenindex(self): 678 ''' number of indexes''' 679 return len(self.lindex) 680 681 @property 682 def lenidx(self): 683 ''' number of idx''' 684 return len(self.lidx) 685 686 @property 687 def lidx(self): 688 '''list of idx''' 689 return [self.lindex[i] for i in self.lidxrow] 690 691 @property 692 def lisvar(self): 693 '''list of boolean : True if Field is var''' 694 return [name in self.lvarname for name in self.lname] 695 696 @property 697 def lvar(self): 698 '''list of var''' 699 return [self.lindex[i] for 
i in self.lvarrow] 700 701 @property 702 def lvarname(self): 703 ''' list of variable Field name''' 704 return self.analysis.getvarname() 705 706 @property 707 def lunicrow(self): 708 '''list of unic idx row''' 709 return [self.lname.index(name) for name in self.lunicname] 710 711 @property 712 def lvarrow(self): 713 '''list of var row''' 714 return [self.lname.index(name) for name in self.lvarname] 715 716 @property 717 def lidxrow(self): 718 '''list of idx row''' 719 return [i for i in range(self.lenindex) if i not in self.lvarrow] 720 721 @property 722 def lunicname(self): 723 ''' list of unique index name''' 724 return [idx.name for idx in self.lindex if len(idx.codec) == 1] 725 726 @property 727 def lname(self): 728 ''' list of index name''' 729 return [idx.name for idx in self.lindex] 730 731 @property 732 def primary(self): 733 ''' list of primary idx''' 734 return self.analysis.getprimary() 735 736 @property 737 def primaryname(self): 738 ''' list of primary name''' 739 return [self.lidx[idx].name for idx in self.primary] 740 741 @property 742 def secondary(self): 743 ''' list of secondary idx''' 744 return self.analysis.getsecondary() 745 746 @property 747 def secondaryname(self): 748 ''' list of secondary name''' 749 return [self.lindex[idx].name for idx in self.secondary] 750 751 @property 752 def setidx(self): 753 '''list of codec for each idx''' 754 return [idx.codec for idx in self.lidx] 755 756 @property 757 def tiindex(self): 758 ''' list of keys for each record''' 759 return util.list(list(zip(*self.iindex))) 760 761 @property 762 def zip(self): 763 '''return a zip format for transpose(extidx) : tuple(tuple(rec))''' 764 textidx = util.transpose(self.extidx) 765 if not textidx: 766 return None 767 return tuple(tuple(idx) for idx in textidx)
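In the commented-out `_init_ntv_keys` above, a primary Field's keys are generated purely from its `coef` and codec length. The formula can be checked in isolation (a standalone sketch; `primary_keys` is a hypothetical name, not a library function):

```python
def primary_keys(coef, codec_len, leng):
    """Keys of a primary field in canonical order: each codec value is held
    for `coef` consecutive records, and the cycle repeats up to `leng`."""
    return [(ikey % (coef * codec_len)) // coef for ikey in range(leng)]

# 2 codec values, each repeated 3 times, cycling over 12 records
print(primary_keys(3, 2, 12))  # [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
```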
A Dataset is a representation of an indexed list.
Attributes (for @property see methods) :
- lindex : list of Field
- analysis : Analysis object (data structure)
The methods defined in this class are :
constructor (@classmethod)
abstract static methods (@abstractmethod, @staticmethod)
dynamic value - module analysis (getters @property)
Dataset.extidx
Dataset.extidxext
Dataset.groups
Dataset.idxname
Dataset.idxlen
Dataset.iidx
Dataset.lenidx
Dataset.lidx
Dataset.lidxrow
Dataset.lisvar
Dataset.lvar
Dataset.lvarname
Dataset.lvarrow
Dataset.lunicname
Dataset.lunicrow
Dataset.primaryname
Dataset.setidx
Dataset.zip
dynamic value (getters @property)
global value (getters @property)
Dataset.category
Dataset.complete
Dataset.consistent
Dataset.dimension
Dataset.lencomplete
Dataset.primary
Dataset.secondary
selecting - infos methods (observation.dataset_structure.DatasetStructure)
Dataset.couplingmatrix
Dataset.idxrecord
Dataset.indexinfos
Dataset.indicator
Dataset.iscanonorder
Dataset.isinrecord
Dataset.keytoval
Dataset.loc
Dataset.nindex
Dataset.record
Dataset.recidx
Dataset.recvar
Dataset.tree
Dataset.valtokey
add - update methods (observation.dataset_structure.DatasetStructure)
Dataset.add
Dataset.addindex
Dataset.append
Dataset.delindex
Dataset.delrecord
Dataset.orindex
Dataset.renameindex
Dataset.setvar
Dataset.setname
Dataset.updateindex
structure management - methods (observation.dataset_structure.DatasetStructure)
Dataset.applyfilter
Dataset.coupling
Dataset.full
Dataset.getduplicates
Dataset.mix
Dataset.merging
Dataset.reindex
Dataset.reorder
Dataset.setfilter
Dataset.sort
Dataset.swapindex
Dataset.setcanonorder
Dataset.tostdcodec
exports methods (observation.dataset_interface.DatasetInterface)
Dataset.json
Dataset.plot
Dataset.to_obj
Dataset.to_csv
Dataset.to_dataframe
Dataset.to_file
Dataset.to_ntv
Dataset.to_xarray
Dataset.view
Dataset.vlist
Dataset.voxel
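Several of the properties above (`keys`, `setidx`, `extidx`) expose the two layers of the indexed-list model: each Field stores a `codec` of distinct values plus integer `keys` pointing into it. A toy factorization in plain Python (not the library's Field class) illustrates the idea:

```python
def encode(values):
    """Factor values into (codec, keys) with values[i] == codec[keys[i]]."""
    codec = list(dict.fromkeys(values))        # distinct values, first-seen order
    keys = [codec.index(val) for val in values]
    return codec, keys

def decode(codec, keys):
    """Rebuild the full value list from the factorized form."""
    return [codec[key] for key in keys]

codec, keys = encode(['paris', 'lyon', 'paris', 'paris'])
print(codec, keys)  # ['paris', 'lyon'] [0, 1, 0, 0]
```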
```python
    def __init__(self, listidx=None, reindex=True):
        '''
        Dataset constructor.

        *Parameters*

        - **listidx** : list (default None) - list of Field data
        - **reindex** : boolean (default True) - if True, default codec for each Field'''

        self.name = self.__class__.__name__
        self.field = self.field_class
        self.analysis = Analysis(self)
        self.lindex = []
        if listidx.__class__.__name__ in ['Dataset', 'Observation', 'Ndataset', 'Sdataset']:
            self.lindex = [copy(idx) for idx in listidx.lindex]
            return
        if not listidx:
            return
        self.lindex = listidx
        if reindex:
            self.reindex()
        self.analysis.actualize()
        return
```
Dataset constructor.
Parameters
- listidx : list (default None) - list of Field data
- reindex : boolean (default True) - if True, default codec for each Field
```python
    @classmethod
    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
        '''
        Dataset constructor (from a csv file). Each column represents index values.

        *Parameters*

        - **filename** : string (default 'dataset.csv'), name of the file to read
        - **header** : boolean (default True). If True, the first row is dedicated to names
        - **nrow** : integer (default None). Number of rows to read. If None, all rows are read
        - **optcsv** : dict (default : quoting) - see csv.reader options'''
        if not optcsv:
            optcsv = {}
        if not nrow:
            nrow = -1
        with open(filename, newline='', encoding="utf-8") as file:
            reader = csv.reader(file, **optcsv)
            irow = 0
            for row in reader:
                if irow == nrow:
                    break
                if irow == 0:
                    idxval = [[] for i in range(len(row))]
                    idxname = [''] * len(row)
                if irow == 0 and header:
                    idxname = row
                else:
                    for i in range(len(row)):
                        if decode_json:
                            try:
                                idxval[i].append(json.loads(row[i]))
                            except (json.JSONDecodeError, TypeError):
                                idxval[i].append(row[i])
                        else:
                            idxval[i].append(row[i])
                irow += 1
        lindex = [cls.field_class.from_ntv({name: idx}, decode_str=decode_str)
                  for idx, name in zip(idxval, idxname)]
        return cls(listidx=lindex, reindex=True)
```
Dataset constructor (from a csv file). Each column represents index values.
Parameters
- filename : string (default 'dataset.csv'), name of the file to read
- header : boolean (default True). If True, the first row is dedicated to names
- nrow : integer (default None). Number of rows to read. If None, all rows are read
- optcsv : dict (default : quoting) - see csv.reader options
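The column-wise reading with per-cell JSON decoding that `from_csv` performs can be sketched with the stdlib only (`columns_from_csv` is a hypothetical helper, not part of the library API):

```python
import csv
import io
import json

def columns_from_csv(text, header=True):
    """Read csv text into (names, columns), json-decoding each cell when possible."""
    rows = list(csv.reader(io.StringIO(text)))
    names = rows[0] if header else ['i' + str(i) for i in range(len(rows[0]))]
    data = rows[1:] if header else rows
    cols = [[] for _ in names]
    for row in data:
        for i, cell in enumerate(row):
            try:
                cols[i].append(json.loads(cell))   # numbers, true/false, null, ...
            except json.JSONDecodeError:
                cols[i].append(cell)               # otherwise keep the raw string
    return names, cols

names, cols = columns_from_csv('city,pop\nparis,2.2\nlyon,0.5\n')
print(names, cols)  # ['city', 'pop'] [['paris', 'lyon'], [2.2, 0.5]]
```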
```python
    @classmethod
    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
        '''
        Generate Object from file storage.

        *Parameters*

        - **filename** : string - file name (with path)
        - **forcestring** : boolean (default False) - if True,
        forces the UTF-8 data format, else the format is calculated
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **decode_str**: boolean (default False) - if True, strings are loaded as json data

        *Returns* : new Object'''
        with open(filename, 'rb') as file:
            btype = file.read(1)
        if btype == bytes('[', 'UTF-8') or btype == bytes('{', 'UTF-8') or forcestring:
            with open(filename, 'r', newline='', encoding="utf-8") as file:
                bjson = file.read()
        else:
            with open(filename, 'rb') as file:
                bjson = file.read()
        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)
```
Generate Object from file storage.
Parameters
- filename : string - file name (with path)
- forcestring : boolean (default False) - if True, forces the UTF-8 data format, else the format is calculated
- reindex : boolean (default True) - if True, default codec for each Field
- decode_str: boolean (default False) - if True, strings are loaded as json data
Returns : new Object
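`from_file` chooses between a text read and a binary read by peeking at the first byte (JSON data starts with '[' or '{'). The sniffing logic in isolation (a standalone sketch; `read_payload` is a hypothetical name):

```python
import os
import tempfile

def read_payload(filename, forcestring=False):
    """Return the file content as str if it looks like JSON text, else as bytes."""
    with open(filename, 'rb') as file:
        btype = file.read(1)
    if btype in (b'[', b'{') or forcestring:
        with open(filename, 'r', newline='', encoding='utf-8') as file:
            return file.read()
    with open(filename, 'rb') as file:
        return file.read()

# usage: a JSON file comes back as str, any other content as bytes
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    tmp.write('{"i0": [1, 2]}')
payload = read_payload(tmp.name)
os.remove(tmp.name)
print(type(payload).__name__)  # str
```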
```python
    @classmethod
    def ntv(cls, ntv_value, reindex=True):
        '''Generate a Dataset Object from an ntv_value

        *Parameters*

        - **ntv_value** : bytes, string, Ntv object to convert
        - **reindex** : boolean (default True) - if True, default codec for each Field'''
        return cls.from_ntv(ntv_value, reindex=reindex)
```
Generate a Dataset Object from an ntv_value
Parameters
- ntv_value : bytes, string, Ntv object to convert
- reindex : boolean (default True) - if True, default codec for each Field
```python
    @classmethod
    def from_ntv(cls, ntv_value, reindex=True, decode_str=False):
        '''Generate a Dataset Object from an ntv_value

        *Parameters*

        - **ntv_value** : bytes, string, Ntv object to convert
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **decode_str**: boolean (default False) - if True, strings are loaded as json data'''
        ntv = Ntv.obj(ntv_value, decode_str=decode_str)
        if len(ntv) == 0:
            return cls()
        lidx = [list(cls.field_class.decode_ntv(ntvf)) for ntvf in ntv]
        leng = max([idx[6] for idx in lidx])
        for ind in range(len(lidx)):
            if lidx[ind][0] == '':
                lidx[ind][0] = 'i' + str(ind)
            NtvConnector.init_ntv_keys(ind, lidx, leng)
            # Dataset._init_ntv_keys(ind, lidx, leng)
        lindex = [cls.field_class(idx[2], idx[0], idx[4], None,  # idx[1] for the type
                                  reindex=reindex) for idx in lidx]
        return cls(lindex, reindex=reindex)
```
Generate a Dataset Object from an ntv_value
Parameters
- ntv_value : bytes, string, Ntv object to convert
- reindex : boolean (default True) - if True, default codec for each Field
- decode_str: boolean (default False) - if True, strings are loaded as json data
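For a Field stored relative to a parent, `from_ntv` (via `Nfield.keysfromderkeys`) expands the derived keys through the parent's keys. Assuming the derived-index model of the Dataset JSON standard, the expansion amounts to a simple lookup (a sketch; the real method may differ in details):

```python
def keys_from_derkeys(parentkeys, derkeys):
    """Full-length keys of a derived field: each parent key selects a derived key."""
    return [derkeys[pkey] for pkey in parentkeys]

# parent field with 3 codec values over 5 records; the derived field
# groups those 3 values into 2 categories
print(keys_from_derkeys([0, 1, 2, 1, 0], [0, 0, 1]))  # [0, 0, 1, 0, 0]
```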
```python
    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
        '''
        Merge method replaces Dataset objects included in the Dataset with their
        constituent Fields.

        *Parameters*

        - **fillvalue** : object (default nan) - value used for the additional data
        - **reindex** : boolean (default False) - if True, set default codec after transformation
        - **simplename** : boolean (default False) - if True, new Field names are
        the same as the merged Field names, else a composed name is used.

        *Returns*: merged Dataset'''
        ilc = copy(self)
        delname = []
        row = ilc[0]
        if not isinstance(row, list):
            row = [row]
        merged, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
                                                        simplename=simplename)
        if oldname and oldname not in merged.lname:
            delname.append(oldname)
        for ind in range(1, len(ilc)):
            oldidx = ilc.nindex(oldname)
            for name in newname:
                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
            row = ilc[ind]
            if not isinstance(row, list):
                row = [row]
            rec, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
                                                         simplename=simplename)
            if oldname and newname != [oldname]:
                delname.append(oldname)
                for name in newname:
                    oldidx = merged.nindex(oldname)
                    fillval = self.field.s_to_i(fillvalue)
                    merged.addindex(
                        self.field([fillval] * len(merged), name, oldidx.keys))
            merged += rec
        for name in set(delname):
            if name:
                merged.delindex(name)
        if reindex:
            merged.reindex()
        ilc.lindex = merged.lindex
        return ilc
```
Merge method replaces Dataset objects included in the Dataset with their constituent Fields.
Parameters
- fillvalue : object (default nan) - value used for the additional data
- reindex : boolean (default False) - if True, set default codec after transformation
- simplename : boolean (default False) - if True, new Field names are the same as the merged Field names, else a composed name is used.
Returns: merged Dataset
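When merge flattens a nested Dataset, `_mergerecord` renames the inner Fields: an anonymous single Field ('i0') takes the outer name, and with `simplename=False` the other names are composed with the outer name. The naming rule alone (a hypothetical helper mirroring that logic, not a library call):

```python
def compose_names(oldname, inner_names, simplename=False):
    """New Field names for a nested dataset merged under Field `oldname`."""
    if inner_names == ['i0']:      # anonymous single field takes the outer name
        return [oldname]
    if not simplename:             # composed names: outer name + '_' + inner name
        return [oldname + '_' + name for name in inner_names]
    return list(inner_names)       # simplename: keep inner names as-is

print(compose_names('measure', ['value', 'unit']))  # ['measure_value', 'measure_unit']
```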
```python
    @classmethod
    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
        '''
        Dataset constructor (external index).

        *Parameters*

        - **idxval** : list of Field or list of values (see data model)
        - **idxname** : list of string (default None) - list of Field names (see data model)'''
        if idxval is None:
            idxval = []
        if not isinstance(idxval, list):
            return None
        val = []
        for idx in idxval:
            if not isinstance(idx, list):
                val.append([idx])
            else:
                val.append(idx)
        lenval = [len(idx) for idx in val]
        if lenval and max(lenval) != min(lenval):
            raise DatasetError('the length of Iindex are different')
        length = lenval[0] if lenval else 0
        idxname = [None] * len(val) if idxname is None else idxname
        for ind, name in enumerate(idxname):
            if name is None or name == '$default':
                idxname[ind] = 'i' + str(ind)
        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
                                  fast=fast) for codec, name in zip(val, idxname)]
        return cls(lindex, reindex=False)
```
Dataset constructor (external index).
Parameters
- idxval : list of Field or list of values (see data model)
- idxname : list of string (default None) - list of Field names (see data model)
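Before building the Fields, `ext` normalizes its inputs: scalar columns are wrapped into one-element lists, column lengths are checked, and missing or '$default' names become 'i0', 'i1', ... The normalization step alone (a standalone sketch; `normalize_ext` is a hypothetical name):

```python
def normalize_ext(idxval, idxname=None):
    """Wrap scalars, check equal column lengths, fill default names."""
    val = [idx if isinstance(idx, list) else [idx] for idx in idxval]
    lenval = [len(idx) for idx in val]
    if lenval and max(lenval) != min(lenval):
        raise ValueError('the lengths of the Fields are different')
    idxname = [None] * len(val) if idxname is None else list(idxname)
    names = ['i' + str(ind) if name in (None, '$default') else name
             for ind, name in enumerate(idxname)]
    return names, val

print(normalize_ext([[10, 20], ['a', 'b']], [None, 'tag']))
# (['i0', 'tag'], [[10, 20], ['a', 'b']])
```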