# -*- coding: utf-8 -*-
"""
The `dataset` module is part of the `tab-dataset` package.

It contains the classes `Sdataset` and `Ndataset` for Dataset entities.

For more information, see the
[user guide](https://loco-philippe.github.io/tab-dataset/docs/user_guide.html)
or the [github repository](https://github.com/loco-philippe/tab-dataset).
"""
from collections import Counter
from copy import copy
import math
import json
import csv


from tab_dataset.cfield import Cutil
from tab_dataset.dataset_interface import DatasetInterface
from tab_dataset.field import Nfield, Sfield
from tab_dataset.cdataset import Cdataset, DatasetError

FILTER = '$filter'


class Sdataset(DatasetInterface, Cdataset):
    # %% intro
    '''
    `Sdataset` is a child class of `Cdataset` whose internal values can differ
    from the external values (a list is converted into a tuple, a dict into a
    json-object).

    One attribute is added: 'field', which defines the Field class.

    The methods defined in this class are:

    *constructor (@classmethod)*

    - `Sdataset.from_csv`
    - `Sdataset.from_file`
    - `Sdataset.merge`
    - `Sdataset.ext`
    - `Cdataset.ntv`
    - `Cdataset.from_ntv`

    *dynamic value - module analysis (getters @property)*

    - `DatasetAnalysis.analysis`
    - `DatasetAnalysis.anafields`
    - `Sdataset.extidx`
    - `Sdataset.extidxext`
    - `DatasetAnalysis.field_partition`
    - `Sdataset.idxname`
    - `Sdataset.idxlen`
    - `Sdataset.iidx`
    - `Sdataset.lenidx`
    - `Sdataset.lidx`
    - `Sdataset.lidxrow`
    - `Sdataset.lisvar`
    - `Sdataset.lvar`
    - `DatasetAnalysis.lvarname`
    - `Sdataset.lvarrow`
    - `Cdataset.lunicname`
    - `Cdataset.lunicrow`
    - `DatasetAnalysis.partitions`
    - `DatasetAnalysis.primaryname`
    - `DatasetAnalysis.relation`
    - `DatasetAnalysis.secondaryname`
    - `Sdataset.setidx`
    - `Sdataset.zip`

    *dynamic value (getters @property)*

    - `Cdataset.keys`
    - `Cdataset.iindex`
    - `Cdataset.indexlen`
    - `Cdataset.lenindex`
    - `Cdataset.lname`
    - `Cdataset.tiindex`

    *global value (getters @property)*

    - `DatasetAnalysis.complete`
    - `Sdataset.consistent`
    - `DatasetAnalysis.dimension`
    - `Sdataset.primary`
    - `Sdataset.secondary`

    *selecting - infos methods*

    - `Sdataset.idxrecord`
    - `DatasetAnalysis.indexinfos`
    - `DatasetAnalysis.indicator`
    - `Sdataset.iscanonorder`
    - `Sdataset.isinrecord`
    - `Sdataset.keytoval`
    - `Sdataset.loc`
    - `Cdataset.nindex`
    - `Sdataset.record`
    - `Sdataset.recidx`
    - `Sdataset.recvar`
    - `Cdataset.to_analysis`
    - `DatasetAnalysis.tree`
    - `Sdataset.valtokey`

    *add - update methods*

    - `Cdataset.add`
    - `Sdataset.addindex`
    - `Sdataset.append`
    - `Cdataset.delindex`
    - `Sdataset.delrecord`
    - `Sdataset.orindex`
    - `Cdataset.renameindex`
    - `Cdataset.setname`
    - `Sdataset.updateindex`

    *structure management - methods*

    - `Sdataset.applyfilter`
    - `Cdataset.check_relation`
    - `Cdataset.check_relationship`
    - `Sdataset.coupling`
    - `Sdataset.full`
    - `Sdataset.getduplicates`
    - `Sdataset.mix`
    - `Sdataset.merging`
    - `Cdataset.reindex`
    - `Cdataset.reorder`
    - `Sdataset.setfilter`
    - `Sdataset.sort`
    - `Cdataset.swapindex`
    - `Sdataset.setcanonorder`
    - `Sdataset.tostdcodec`

    *exports methods (`observation.dataset_interface.DatasetInterface`)*

    - `Dataset.json`
    - `Dataset.plot`
    - `Dataset.to_obj`
    - `Dataset.to_csv`
    - `Dataset.to_dataframe`
    - `Dataset.to_file`
    - `Dataset.to_ntv`
    - `Dataset.to_xarray`
    - `Dataset.view`
    - `Dataset.vlist`
    - `Dataset.voxel`
    '''

    field_class = Sfield

    def __init__(self, listidx=None, name=None, reindex=True):
        '''
        Dataset constructor.

        *Parameters*

        - **listidx** : list (default None) - list of Field data
        - **name** : string (default None) - name of the dataset
        - **reindex** : boolean (default True) - if True, default codec for each Field'''

        self.field = self.field_class
        Cdataset.__init__(self, listidx, name, reindex=reindex)

    @classmethod
    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
                 decode_json=True, optcsv=None):
        '''
        Dataset constructor (from a csv file). Each column represents index values.

        *Parameters*

        - **filename** : string (default 'dataset.csv') - name of the file to read
        - **header** : boolean (default True) - if True, the first row is dedicated to names
        - **nrow** : integer (default None) - number of rows to read (all rows if None)
        - **decode_str** : boolean (default True) - if True, strings are loaded as json data
        - **decode_json** : boolean (default True) - if True, each cell is decoded as a
        json value (kept as a plain string if decoding fails)
        - **optcsv** : dict (default: non-numeric quoting) - see csv.reader options'''
        if optcsv is None:
            # avoid a mutable default argument; keep the historical default
            optcsv = {'quoting': csv.QUOTE_NONNUMERIC}
        if not nrow:
            nrow = -1
        with open(filename, newline='', encoding="utf-8") as file:
            reader = csv.reader(file, **optcsv)
            irow = 0
            for row in reader:
                if irow == nrow:
                    break
                if irow == 0:
                    idxval = [[] for i in range(len(row))]
                    idxname = [''] * len(row)
                if irow == 0 and header:
                    idxname = row
                else:
                    for i in range(len(row)):
                        if decode_json:
                            try:
                                idxval[i].append(json.loads(row[i]))
                            except (json.JSONDecodeError, TypeError):
                                idxval[i].append(row[i])
                        else:
                            idxval[i].append(row[i])
                irow += 1
        lindex = [cls.field_class.from_ntv(
            {name: idx}, decode_str=decode_str) for idx, name in zip(idxval, idxname)]
        return cls(listidx=lindex, reindex=True)

    @classmethod
    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
        '''
        Generate Object from file storage.

        *Parameters*

        - **filename** : string - file name (with path)
        - **forcestring** : boolean (default False) - if True,
        forces the UTF-8 data format, else the format is detected
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **decode_str** : boolean (default False) - if True, strings are loaded as json data

        *Returns* : new Object'''
        with open(filename, 'rb') as file:
            btype = file.read(1)
        if btype in (bytes('[', 'UTF-8'), bytes('{', 'UTF-8')) or forcestring:
            with open(filename, 'r', newline='', encoding="utf-8") as file:
                bjson = file.read()
        else:
            with open(filename, 'rb') as file:
                bjson = file.read()
        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)

    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
        '''
        Merge replaces the Dataset objects included in self by their
        constituent Fields.

        *Parameters*

        - **fillvalue** : object (default nan) - value used for the additional data
        - **reindex** : boolean (default False) - if True, set default codec after transformation
        - **simplename** : boolean (default False) - if True, new Field names are
        the same as the merged Field names, else composed names are used.

        *Returns*: merged Dataset '''
        ilc = copy(self)
        delname = []
        row = ilc[0]
        if not isinstance(row, list):
            row = [row]
        merged, oldname, newname = self.__class__._mergerecord(
            self.ext(row, ilc.lname), simplename=simplename, fillvalue=fillvalue,
            reindex=reindex)
        delname.append(oldname)
        for ind in range(1, len(ilc)):
            oldidx = ilc.nindex(oldname)
            for name in newname:
                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
            row = ilc[ind]
            if not isinstance(row, list):
                row = [row]
            rec, oldname, newname = self.__class__._mergerecord(
                self.ext(row, ilc.lname), simplename=simplename)
            if oldname and newname != [oldname]:
                delname.append(oldname)
                for name in newname:
                    oldidx = merged.nindex(oldname)
                    fillval = self.field.s_to_i(fillvalue)
                    merged.addindex(
                        self.field([fillval] * len(merged), name, oldidx.keys))
            merged += rec
        for name in set(delname):
            if name:
                merged.delindex(name)
        if reindex:
            merged.reindex()
        ilc.lindex = merged.lindex
        return ilc

    @classmethod
    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
        '''
        Dataset constructor (external index).

        *Parameters*

        - **idxval** : list of Field or list of values (see data model)
        - **idxname** : list of string (default None) - list of Field names (see data model)'''
        if idxval is None:
            idxval = []
        if not isinstance(idxval, list):
            return None
        val = []
        for idx in idxval:
            if not isinstance(idx, list):
                val.append([idx])
            else:
                val.append(idx)
        lenval = [len(idx) for idx in val]
        if lenval and max(lenval) != min(lenval):
            raise DatasetError('the lengths of the Iindex are different')
        length = lenval[0] if lenval else 0
        idxname = [None] * len(val) if idxname is None else idxname
        for ind, name in enumerate(idxname):
            if name is None or name == '$default':
                idxname[ind] = 'i' + str(ind)
        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
                                  fast=fast) for codec, name in zip(val, idxname)]
        return cls(lindex, reindex=False)

# %% internal
    @staticmethod
    def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False,
                     fillvalue=math.nan, reindex=False):
        row = rec[0]
        if not isinstance(row, list):
            row = [row]
        var = -1
        for ind, val in enumerate(row):
            if val.__class__.__name__ in ['Sdataset', 'Ndataset']:
                var = ind
                break
        if var < 0:
            return (rec, None, [])
        #ilis = row[var]
        ilis = row[var].merge(simplename=simplename, fillvalue=fillvalue,
                              reindex=reindex)
        oldname = rec.lname[var]
        if ilis.lname == ['i0']:
            newname = [oldname]
            ilis.setname(newname)
        elif not simplename:
            newname = [oldname + '_' + name for name in ilis.lname]
            ilis.setname(newname)
        else:
            newname = copy(ilis.lname)
        for name in rec.lname:
            if name in newname:
                newname.remove(name)
            else:
                updidx = name in ilis.lname and not updateidx
                #ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)},
                ilis.addindex(ilis.field([rec.nindex(name)[0]] * len(ilis), name),
                              merge=mergeidx, update=updidx)
        return (ilis, oldname, newname)

# %% special
    def __str__(self):
        '''return string format for var and lidx'''
        stri = ''
        if self.lvar:
            stri += 'variables :\n'
            for idx in self.lvar:
                stri += '    ' + str(idx) + '\n'
        if self.lidx:
            stri += 'index :\n'
            for idx in self.lidx:
                stri += '    ' + str(idx) + '\n'
        return stri

    def __add__(self, other):
        ''' Add other's values to self's values in a new Dataset'''
        newil = copy(self)
        newil.__iadd__(other)
        return newil

    def __iadd__(self, other):
        ''' Add other's values to self's values'''
        return self.add(other, name=True, solve=False)

    def __or__(self, other):
        ''' Add other's index to self's index in a new Dataset'''
        newil = copy(self)
        newil.__ior__(other)
        return newil

    def __ior__(self, other):
        ''' Add other's index to self's index'''
        return self.orindex(other, first=False, merge=True, update=False)

# %% property
    @property
    def consistent(self):
        ''' True if all the records are different'''
        selfiidx = self.iidx
        if not selfiidx:
            return True
        return max(Counter(zip(*selfiidx)).values()) == 1

    @property
    def extidx(self):
        '''idx values (see data model)'''
        return [idx.values for idx in self.lidx]

    @property
    def extidxext(self):
        '''idx val (see data model)'''
        return [idx.val for idx in self.lidx]

    @property
    def idxname(self):
        ''' list of idx names'''
        return [idx.name for idx in self.lidx]

    @property
    def idxlen(self):
        ''' list of idx codec lengths'''
        return [len(idx.codec) for idx in self.lidx]

    @property
    def iidx(self):
        ''' list of keys for each idx'''
        return [idx.keys for idx in self.lidx]

    @property
    def lenidx(self):
        ''' number of idx'''
        return len(self.lidx)

    @property
    def lidx(self):
        '''list of idx'''
        return [self.lindex[i] for i in self.lidxrow]

    @property
    def lisvar(self):
        '''list of boolean : True if the Field is a var'''
        return [name in self.lvarname for name in self.lname]

    @property
    def lvar(self):
        '''list of var'''
        return [self.lindex[i] for i in self.lvarrow]

    @property
    def lvarrow(self):
        '''list of var rows'''
        return [self.lname.index(name) for name in self.lvarname]

    @property
    def lidxrow(self):
        '''list of idx rows'''
        return [i for i in range(self.lenindex) if i not in self.lvarrow]

    @property
    def primary(self):
        ''' list of primary idx'''
        return [self.lidxrow.index(self.lname.index(name)) for name in self.primaryname]

    @property
    def secondary(self):
        ''' list of secondary idx'''
        return [self.lidxrow.index(self.lname.index(name)) for name in self.secondaryname]

    @property
    def setidx(self):
        '''list of codec for each idx'''
        return [idx.codec for idx in self.lidx]

    @property
    def zip(self):
        '''return a zip format for transpose(extidx) : tuple(tuple(rec))'''
        textidx = Cutil.transpose(self.extidx)
        if not textidx:
            return None
        return tuple(tuple(idx) for idx in textidx)

    # %% structure
    def addindex(self, index, first=False, merge=False, update=False):
        '''add a new index.

        *Parameters*

        - **index** : Field - index to add (can be an index Ntv representation)
        - **first** : if True, insert the index at the first row, else at the end
        - **merge** : if False, create a new index when the name already exists
        - **update** : if True, update the actual values if the index name is present (and merge is True)

        *Returns* : none '''
        idx = self.field.ntv(index)
        idxname = self.lname
        if len(idx) != len(self) and len(self) > 0:
            raise DatasetError('sizes are different')
        if idx.name not in idxname:
            if first:
                self.lindex.insert(0, idx)
            else:
                self.lindex.append(idx)
        elif not merge:  # idx.name in idxname
            while idx.name in idxname:
                idx.name += '(2)'
            if first:
                self.lindex.insert(0, idx)
            else:
                self.lindex.append(idx)
        elif update:  # merge and idx.name in idxname
            self.lindex[idxname.index(idx.name)].setlistvalue(idx.values)

    def append(self, record, unique=False):
        '''add a new record.

        *Parameters*

        - **record** : list of new index values to add to the Dataset
        - **unique** : boolean (default False) - the append is not done if unique
        is True and the record is already present

        *Returns* : list - record keys'''
        if self.lenindex != len(record):
            raise DatasetError('len(record) not consistent')
        record = self.field.l_to_i(record)
        if self.isinrecord(self.idxrecord(record), False) and unique:
            return None
        return [self.lindex[i].append(record[i]) for i in range(self.lenindex)]

    def applyfilter(self, reverse=False, filtname=FILTER, delfilter=True, inplace=True):
        '''delete the records with a defined filter value.
        The filter is deleted after record filtering.

        *Parameters*

        - **reverse** : boolean (default False) - if True, delete the records
        whose filter value is reversed
        - **filtname** : string (default FILTER) - name of the added filter Field
        - **delfilter** : boolean (default True) - if True, delete the filter Field
        - **inplace** : boolean (default True) - if True, the filter is applied to self

        *Returns* : self or new Dataset'''
        if filtname not in self.lname:
            return None
        if inplace:
            ilis = self
        else:
            ilis = copy(self)
        ifilt = ilis.lname.index(filtname)
        ilis.sort([ifilt], reverse=not reverse, func=None)
        lisind = ilis.lindex[ifilt].recordfromvalue(reverse)
        if lisind:
            minind = min(lisind)
            for idx in ilis.lindex:
                del idx.keys[minind:]
        if inplace:
            self.delindex(filtname)
        else:
            ilis.delindex(filtname)
            if delfilter:
                self.delindex(filtname)
        ilis.reindex()
        return ilis

    def coupling(self, derived=True, level=0.1):
        '''Transform idx with a low dist into coupled or derived indexes (codec extension).

        *Parameters*

        - **level** : float (default 0.1) - threshold to apply the coupling.
        - **derived** : boolean (default True) - if True, the indexes are derived,
        else coupled.

        *Returns* : None'''
        ana = self.analysis
        # one independent child list per field ([[]] * n would alias a single list)
        child = [[] for _ in range(len(ana))]
        childroot = []
        level = level * len(self)
        for idx in range(self.lenindex):
            if derived:
                iparent = ana.fields[idx].p_distomin.index
            else:
                iparent = ana.fields[idx].p_distance.index
            if iparent == -1:
                childroot.append(idx)
            else:
                child[iparent].append(idx)
        for idx in childroot:
            self._couplingidx(idx, child, derived, level, ana)

    def _couplingidx(self, idx, child, derived, level, ana):
        ''' Field coupling (including the children of the Field)'''
        fields = ana.fields
        if derived:
            iparent = fields[idx].p_distomin.index
            dparent = ana.get_relation(*sorted([idx, iparent])).distomin
        else:
            iparent = fields[idx].p_distance.index
            dparent = ana.get_relation(*sorted([idx, iparent])).distance
        # if fields[idx].category in ('coupled', 'unique') or iparent == -1\
        if fields[idx].category in ('coupled', 'unique') \
                or dparent >= level or dparent == 0:
            return
        if child[idx]:
            for childidx in child[idx]:
                self._couplingidx(childidx, child, derived, level, ana)
        self.lindex[iparent].coupling(self.lindex[idx], derived=derived,
                                      duplicate=False)
        return

    def delrecord(self, record, extern=True):
        '''remove a record.

        *Parameters*

        - **record** : list - index values to remove from the Dataset
        - **extern** : if True, compare the record values to the external
        representation of self.value, else to the internal one

        *Returns* : deleted row'''
        self.reindex()
        reckeys = self.valtokey(record, extern=extern)
        if None in reckeys:
            return None
        row = self.tiindex.index(reckeys)
        for idx in self:
            del idx[row]
        return row

    def _fullindex(self, ind, keysadd, indexname, varname, leng, fillvalue, fillextern):
        if not varname:
            varname = []
        idx = self.lindex[ind]
        lenadd = len(keysadd[0])
        if len(idx) == leng:
            return
        #inf = self.indexinfos()
        ana = self.anafields
        parent = ana[ind].p_derived.view('index')
        # if inf[ind]['cat'] == 'unique':
        if ana[ind].category == 'unique':
            idx.set_keys(idx.keys + [0] * lenadd)
        elif self.lname[ind] in indexname:
            idx.set_keys(idx.keys + keysadd[indexname.index(self.lname[ind])])
        # elif inf[ind]['parent'] == -1 or self.lname[ind] in varname:
        elif parent == -1 or self.lname[ind] in varname:
            fillval = fillvalue
            if fillextern:
                fillval = self.field.s_to_i(fillvalue)
            idx.set_keys(idx.keys + [len(idx.codec)] * len(keysadd[0]))
            idx.set_codec(idx.codec + [fillval])
        else:
            #parent = inf[ind]['parent']
            if len(self.lindex[parent]) != leng:
                self._fullindex(parent, keysadd, indexname, varname, leng,
                                fillvalue, fillextern)
            # if inf[ind]['cat'] == 'coupled':
            if ana[ind].category == 'coupled':
                idx.tocoupled(self.lindex[parent], coupling=True)
            else:
                idx.tocoupled(self.lindex[parent], coupling=False)

    def full(self, reindex=False, idxname=None, varname=None, fillvalue='-',
             fillextern=True, inplace=True, canonical=True):
        '''transform a list of indexes into crossed indexes (value extension).

        *Parameters*

        - **idxname** : list of string - names of the indexes to transform
        - **varname** : string - names of the indexes to use
        - **reindex** : boolean (default False) - if True, set the default codec
        before transformation
        - **fillvalue** : object - value used for the var extension
        - **fillextern** : boolean (default True) - if True, fillvalue is converted
        to an internal value
        - **inplace** : boolean (default True) - if True, the transformation is applied to self
        - **canonical** : boolean (default True) - if True, the Fields are ordered
        in canonical order

        *Returns* : self or new Dataset'''
        ilis = self if inplace else copy(self)
        if not idxname:
            idxname = ilis.primaryname
        if reindex:
            ilis.reindex()
        keysadd = Cutil.idxfull([ilis.nindex(name) for name in idxname])
        if keysadd and len(keysadd) != 0:
            newlen = len(keysadd[0]) + len(ilis)
            for ind in range(ilis.lenindex):
                ilis._fullindex(ind, keysadd, idxname, varname, newlen,
                                fillvalue, fillextern)
        if canonical:
            ilis.setcanonorder()
        return ilis

    def getduplicates(self, indexname=None, resindex=None, indexview=None):
        '''check for duplicate codecs in a list of indexes. The result is added to a new
        index or returned.

        *Parameters*

        - **indexname** : list of string (default None) - names of the indexes to check
        (all Fields if None)
        - **resindex** : string (default None) - add a new index named resindex
        with the check result (False if duplicated)
        - **indexview** : list of str (default None) - list of fields to return

        *Returns* : list of int - rows with duplicated codecs '''
        if not indexname:
            indexname = self.lname
        duplicates = []
        for name in indexname:
            duplicates += self.nindex(name).getduplicates()
        if resindex and isinstance(resindex, str):
            newidx = self.field([True] * len(self), name=resindex)
            for item in duplicates:
                newidx[item] = False
            self.addindex(newidx)
        dupl = tuple(set(duplicates))
        if not indexview:
            return dupl
        return [tuple(self.record(ind, indexview)) for ind in dupl]

    def iscanonorder(self):
        '''return True if the primary indexes have canonically ordered keys'''
        primary = self.primary
        canonorder = Cutil.canonorder(
            [len(self.lidx[idx].codec) for idx in primary])
        return canonorder == [self.lidx[idx].keys for idx in primary]

    def isinrecord(self, record, extern=True):
        '''Check if a record is present in self.

        *Parameters*

        - **record** : list - a value for each Field
        - **extern** : if True, compare the record values to the external
        representation of self.value, else to the internal one

        *Returns boolean* : True if found'''
        if extern:
            return record in Cutil.transpose(self.extidxext)
        return record in Cutil.transpose(self.extidx)

    def idxrecord(self, record):
        '''return the rec array (without variable) from a complete record (with variable)'''
        return [record[self.lidxrow[i]] for i in range(len(self.lidxrow))]

    def keytoval(self, listkey, extern=True):
        '''
        convert a list of keys (one per index) into a list of values (one per index).

        *Parameters*

        - **listkey** : a key for each index
        - **extern** : boolean (default True) - if True, compare rec to val, else to values

        *Returns*

        - **list** : a value for each index'''
        return [idx.keytoval(key, extern=extern) for idx, key in zip(self.lindex, listkey)]

    def loc(self, rec, extern=True, row=False):
        '''
        Return the record or the row corresponding to a list of idx values.

        *Parameters*

        - **rec** : list - a value for each idx
        - **extern** : boolean (default True) - if True, compare rec to val,
        else to values
        - **row** : boolean (default False) - if True, return a list of rows,
        else a list of records

        *Returns*

        - **object** : variable value or None if not found'''
        locrow = None
        try:
            if len(rec) == self.lenindex:
                locrow = list(set.intersection(*[set(self.lindex[i].loc(rec[i], extern))
                                                 for i in range(self.lenindex)]))
            elif len(rec) == self.lenidx:
                locrow = list(set.intersection(*[set(self.lidx[i].loc(rec[i], extern))
                                                 for i in range(self.lenidx)]))
        except Exception:
            pass
        if locrow is None:
            return None
        if row:
            return locrow
        return [self.record(locr, extern=extern) for locr in locrow]

    def mix(self, other, fillvalue=None):
        '''add the other's Fields not included in self and add the other's values'''
        sname = set(self.lname)
        oname = set(other.lname)
        newself = copy(self)
        copother = copy(other)
        for nam in oname - sname:
            newself.addindex({nam: [fillvalue] * len(newself)})
        for nam in sname - oname:
            copother.addindex({nam: [fillvalue] * len(copother)})
        return newself.add(copother, name=True, solve=False)

    def merging(self, listname=None):
        ''' add a new Field built from the Fields defined in listname.
        The values of the new Field are sets of the values of the listname Fields'''
        #self.addindex(Field.merging([self.nindex(name) for name in listname]))
        self.addindex(Sfield.merging([self.nindex(name) for name in listname]))

    def orindex(self, other, first=False, merge=False, update=False):
        ''' Add the other's indexes to self's indexes (with the same length)

        *Parameters*

        - **other** : self class - object to add
        - **first** : boolean (default False) - if True, insert the indexes
        at the first row, else at the end
        - **merge** : boolean (default False) - create a new index
        if merge is False
        - **update** : boolean (default False) - if True, update the actual
        values if the index name is present (and merge is True)

        *Returns* : none '''
        if len(self) != 0 and len(self) != len(other) and len(other) != 0:
            raise DatasetError("the sizes are not equal")
        otherc = copy(other)
        for idx in otherc.lindex:
            self.addindex(idx, first=first, merge=merge, update=update)
        return self

    def record(self, row, indexname=None, extern=True):
        '''return the record at the given row

        *Parameters*

        - **row** : int - row of the record
        - **extern** : boolean (default True) - if True, return the val record,
        else the value record
        - **indexname** : list of str (default None) - list of fields to return

        *Returns*

        - **list** : val record or value record'''
        if indexname is None:
            indexname = self.lname
        if extern:
            record = [idx.val[row] for idx in self.lindex]
            #record = [idx.values[row].to_obj() for idx in self.lindex]
            #record = [idx.valrow(row) for idx in self.lindex]
        else:
            record = [idx.values[row] for idx in self.lindex]
        return [record[self.lname.index(name)] for name in indexname]

    def recidx(self, row, extern=True):
        '''return the list of idx val or values at the given row

        *Parameters*

        - **row** : int - row of the record
        - **extern** : boolean (default True) - if True, return val rec, else value rec

        *Returns*

        - **list** : val or value for idx'''
        if extern:
            return [idx.values[row].to_obj() for idx in self.lidx]
        # return [idx.valrow(row) for idx in self.lidx]
        return [idx.values[row] for idx in self.lidx]

    def recvar(self, row, extern=True):
        '''return the list of var val or values at the given row

        *Parameters*

        - **row** : int - row of the record
        - **extern** : boolean (default True) - if True, return val rec, else value rec

        *Returns*

        - **list** : val or value for var'''
        if extern:
            return [idx.values[row].to_obj() for idx in self.lvar]
        # return [idx.valrow(row) for idx in self.lvar]
        return [idx.values[row] for idx in self.lvar]

    def setcanonorder(self, reindex=False):
        '''Set the canonical index order : primary - secondary/unique - variable.
        Set the canonical keys order : ordered keys in the first columns.

        *Parameters*

        - **reindex** : boolean (default False) - if True, set the default codec after
        transformation

        *Return* : self'''
        order = self.primaryname
        order += self.secondaryname
        order += self.lvarname
        order += self.lunicname
        self.swapindex(order)
        self.sort(reindex=reindex)
        # self.analysis.actualize()
        return self

    def setfilter(self, filt=None, first=False, filtname=FILTER, unique=False):
        '''Add a filter index with boolean values

        - **filt** : list of boolean - values of the filter idx to add
        - **first** : boolean (default False) - if True, insert the index at the
        first row, else at the end
        - **filtname** : string (default FILTER) - name of the added filter Field

        *Returns* : self'''
        if not filt:
            filt = [True] * len(self)
        idx = self.field(filt, name=filtname)
        idx.reindex()
        if idx.cod not in ([True, False], [False, True], [True], [False]):
            raise DatasetError('filt is not consistent')
        if unique:
            for name in self.lname:
                if name[:len(FILTER)] == FILTER:
                    self.delindex(FILTER)
        self.addindex(idx, first=first)
        return self

    def sort(self, order=None, reverse=False, func=str, reindex=True):
        '''Sort the data following the index order and apply the ascending or
        descending sort function to the values.

        *Parameters*

        - **order** : list (default None) - new index order to apply. If None or [],
        the sort function is applied to the existing index order.
        - **reverse** : boolean (default False) - descending if True, ascending if False
        - **func** : function (default str) - key parameter passed to the sorted function
        - **reindex** : boolean (default True) - if True, apply a new codec order (key = func)

        *Returns* : self'''
        if not order:
            order = list(range(self.lenindex))
        orderfull = order + list(set(range(self.lenindex)) - set(order))
        if reindex:
            for i in order:
                self.lindex[i].reindex(codec=sorted(
                    self.lindex[i].codec, key=func))
        newidx = Cutil.transpose(sorted(Cutil.transpose(
            [self.lindex[orderfull[i]].keys for i in range(self.lenindex)]),
            reverse=reverse))
        for i in range(self.lenindex):
            self.lindex[orderfull[i]].set_keys(newidx[i])
        return self

    """
    def swapindex(self, order):
        '''
        Change the order of the indexes.

        *Parameters*

        - **order** : list of int or list of name - new index order to apply.

        *Returns* : self '''
        if self.lenindex != len(order):
            raise DatasetError('length of order and Dataset different')
        if not order or isinstance(order[0], int):
            self.lindex = [self.lindex[ind] for ind in order]
        elif isinstance(order[0], str):
            self.lindex = [self.nindex(name) for name in order]
        return self
    """

    def tostdcodec(self, inplace=False, full=True):
        '''Transform all the codecs into full or default codecs.

        *Parameters*

        - **inplace** : boolean (default False) - if True, apply the transformation
        to self, else to a new Dataset
        - **full** : boolean (default True) - full codec if True, default codec if False

        *Return Dataset* : self or new Dataset'''
        lindex = [idx.tostdcodec(inplace=False, full=full)
                  for idx in self.lindex]
        if inplace:
            self.lindex = lindex
            return self
        return self.__class__(lindex, self.lvarname)

    def updateindex(self, listvalue, index, extern=True):
        '''update the values of an index.

        *Parameters*

        - **listvalue** : list - index values to replace
        - **index** : integer - row of the index to update
        - **extern** : if True, listvalue has an external representation, else internal

        *Returns* : none '''
        self.lindex[index].setlistvalue(listvalue, extern=extern)

    def valtokey(self, rec, extern=True):
        '''convert a record list (value or val for each idx) into a key list
        (a key for each index).

        *Parameters*

        - **rec** : list of value or val for each index
        - **extern** : if True, the rec values have an external representation, else internal

        *Returns*

        - **list of int** : record key for each index'''
        return [idx.valtokey(val, extern=extern) for idx, val in zip(self.lindex, rec)]


class Ndataset(Sdataset):
    # %% Ndataset
    '''
    `Ndataset` is a child class of `Sdataset` whose internal values are NTV entities.

    All the methods are the same as `Sdataset`.
    '''
    field_class = Nfield
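The per-cell decoding rule used by `Sdataset.from_csv` (try each cell as a JSON value, keep the raw string on failure, then transpose rows into one value list per Field) can be illustrated with a stdlib-only sketch. The `decode_cell` helper below is hypothetical, written for illustration, and is not part of the package:

```python
import json


def decode_cell(cell, decode_json=True):
    # Mirror of the from_csv cell rule: try to parse the cell as a JSON
    # value; on failure, keep the cell unchanged.
    if not decode_json:
        return cell
    try:
        return json.loads(cell)
    except (json.JSONDecodeError, TypeError):
        return cell


# Two csv rows become three columns (one list of values per Field).
rows = [['1', '{"a": 2}', 'plain text'],
        ['3.5', 'null', 'true']]
columns = list(zip(*([decode_cell(cell) for cell in row] for row in rows)))
# columns[0] -> (1, 3.5)            numbers decoded from JSON
# columns[1] -> ({'a': 2}, None)    json-object and null decoded
# columns[2] -> ('plain text', True)  non-JSON cell kept as a string
```

With `decode_json=False` every cell would stay a plain string, which matches the `else` branch of the reading loop in `from_csv`.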
class Sdataset(DatasetInterface, Cdataset):
    # %% intro
    '''
    `Sdataset` is a child class of Cdataset where internal values can differ
    from external values (a list is converted into a tuple and a dict into a json-object).

    One attribute is added: 'field' to define the 'field' class.

    The methods defined in this class are:

    *constructor (@classmethod)*

    - `Sdataset.from_csv`
    - `Sdataset.from_file`
    - `Sdataset.merge`
    - `Sdataset.ext`
    - `Cdataset.ntv`
    - `Cdataset.from_ntv`

    *dynamic value - module analysis (getters @property)*

    - `DatasetAnalysis.analysis`
    - `DatasetAnalysis.anafields`
    - `Sdataset.extidx`
    - `Sdataset.extidxext`
    - `DatasetAnalysis.field_partition`
    - `Sdataset.idxname`
    - `Sdataset.idxlen`
    - `Sdataset.iidx`
    - `Sdataset.lenidx`
    - `Sdataset.lidx`
    - `Sdataset.lidxrow`
    - `Sdataset.lisvar`
    - `Sdataset.lvar`
    - `DatasetAnalysis.lvarname`
    - `Sdataset.lvarrow`
    - `Cdataset.lunicname`
    - `Cdataset.lunicrow`
    - `DatasetAnalysis.partitions`
    - `DatasetAnalysis.primaryname`
    - `DatasetAnalysis.relation`
    - `DatasetAnalysis.secondaryname`
    - `Sdataset.setidx`
    - `Sdataset.zip`

    *dynamic value (getters @property)*

    - `Cdataset.keys`
    - `Cdataset.iindex`
    - `Cdataset.indexlen`
    - `Cdataset.lenindex`
    - `Cdataset.lname`
    - `Cdataset.tiindex`

    *global value (getters @property)*

    - `DatasetAnalysis.complete`
    - `Sdataset.consistent`
    - `DatasetAnalysis.dimension`
    - `Sdataset.primary`
    - `Sdataset.secondary`

    *selecting - infos methods*

    - `Sdataset.idxrecord`
    - `DatasetAnalysis.indexinfos`
    - `DatasetAnalysis.indicator`
    - `Sdataset.iscanonorder`
    - `Sdataset.isinrecord`
    - `Sdataset.keytoval`
    - `Sdataset.loc`
    - `Cdataset.nindex`
    - `Sdataset.record`
    - `Sdataset.recidx`
    - `Sdataset.recvar`
    - `Cdataset.to_analysis`
    - `DatasetAnalysis.tree`
    - `Sdataset.valtokey`

    *add - update methods*

    - `Cdataset.add`
    - `Sdataset.addindex`
    - `Sdataset.append`
    - `Cdataset.delindex`
    - `Sdataset.delrecord`
    - `Sdataset.orindex`
    - `Cdataset.renameindex`
    - `Cdataset.setname`
    - `Sdataset.updateindex`

    *structure management - methods*

    - `Sdataset.applyfilter`
    - `Cdataset.check_relation`
    - `Cdataset.check_relationship`
    - `Sdataset.coupling`
    - `Sdataset.full`
    - `Sdataset.getduplicates`
    - `Sdataset.mix`
    - `Sdataset.merging`
    - `Cdataset.reindex`
    - `Cdataset.reorder`
    - `Sdataset.setfilter`
    - `Sdataset.sort`
    - `Cdataset.swapindex`
    - `Sdataset.setcanonorder`
    - `Sdataset.tostdcodec`

    *exports methods (`observation.dataset_interface.DatasetInterface`)*

    - `Dataset.json`
    - `Dataset.plot`
    - `Dataset.to_obj`
    - `Dataset.to_csv`
    - `Dataset.to_dataframe`
    - `Dataset.to_file`
    - `Dataset.to_ntv`
    - `Dataset.to_xarray`
    - `Dataset.view`
    - `Dataset.vlist`
    - `Dataset.voxel`
    '''

    field_class = Sfield

    def __init__(self, listidx=None, name=None, reindex=True):
        '''
        Dataset constructor.

        *Parameters*

        - **listidx** : list (default None) - list of Field data
        - **name** : string (default None) - name of the dataset
        - **reindex** : boolean (default True) - if True, default codec for each Field'''

        self.field = self.field_class
        Cdataset.__init__(self, listidx, name, reindex=reindex)

    @classmethod
    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
        '''
        Dataset constructor (from a csv file). Each column represents index values.

        *Parameters*

        - **filename** : string (default 'dataset.csv') - name of the file to read
        - **header** : boolean (default True). If True, the first row is dedicated to names
        - **nrow** : integer (default None). Number of rows to read. If None, all the rows
        - **decode_str** : boolean (default True) - forwarded to the field constructor
        - **decode_json** : boolean (default True) - if True, try to decode each value as json
        - **optcsv** : dict (default : quoting) - see csv.reader options'''
        if not optcsv:
            optcsv = {}
        if not nrow:
            nrow = -1
        with open(filename, newline='', encoding="utf-8") as file:
            reader = csv.reader(file, **optcsv)
            irow = 0
            for row in reader:
                if irow == nrow:
                    break
                if irow == 0:
                    idxval = [[] for i in range(len(row))]
                    idxname = [''] * len(row)
                if irow == 0 and header:
                    idxname = row
                else:
                    for i in range(len(row)):
                        if decode_json:
                            try:
                                idxval[i].append(json.loads(row[i]))
                            except (json.JSONDecodeError, TypeError):
                                idxval[i].append(row[i])
                        else:
                            idxval[i].append(row[i])
                irow += 1
        lindex = [cls.field_class.from_ntv(
            {name: idx}, decode_str=decode_str) for idx, name in zip(idxval, idxname)]
        return cls(listidx=lindex, reindex=True)

    @classmethod
    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
        '''
        Generate Object from file storage.

        *Parameters*

        - **filename** : string - file name (with path)
        - **forcestring** : boolean (default False) - if True,
        forces the UTF-8 data format, else the format is calculated
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **decode_str** : boolean (default False) - if True, strings are loaded as json data

        *Returns* : new Object'''
        with open(filename, 'rb') as file:
            btype = file.read(1)
        if btype == bytes('[', 'UTF-8') or btype == bytes('{', 'UTF-8') or forcestring:
            with open(filename, 'r', newline='', encoding="utf-8") as file:
                bjson = file.read()
        else:
            with open(filename, 'rb') as file:
                bjson = file.read()
        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)

    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
        '''
        Merge replaces the Dataset objects included in the values by their constituents.

        *Parameters*

        - **fillvalue** : object (default nan) - value used for the additional data
        - **reindex** : boolean (default False) - if True, set default codec after transformation
        - **simplename** : boolean (default False) - if True, new Field names are
        the same as the merged Field names, else a composed name is used.

        *Returns* : merged Dataset'''
        ilc = copy(self)
        delname = []
        row = ilc[0]
        if not isinstance(row, list):
            row = [row]
        merged, oldname, newname = self.__class__._mergerecord(
            self.ext(row, ilc.lname), simplename=simplename, fillvalue=fillvalue,
            reindex=reindex)
        delname.append(oldname)
        for ind in range(1, len(ilc)):
            oldidx = ilc.nindex(oldname)
            for name in newname:
                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
            row = ilc[ind]
            if not isinstance(row, list):
                row = [row]
            rec, oldname, newname = self.__class__._mergerecord(
                self.ext(row, ilc.lname), simplename=simplename)
            if oldname and newname != [oldname]:
                delname.append(oldname)
                for name in newname:
                    oldidx = merged.nindex(oldname)
                    fillval = self.field.s_to_i(fillvalue)
                    merged.addindex(
                        self.field([fillval] * len(merged), name, oldidx.keys))
            merged += rec
        for name in set(delname):
            if name:
                merged.delindex(name)
        if reindex:
            merged.reindex()
        ilc.lindex = merged.lindex
        return ilc

    @classmethod
    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
        '''
        Dataset constructor (external index).

        *Parameters*

        - **idxval** : list of Field or list of values (see data model)
        - **idxname** : list of string (default None) - list of Field name (see data model)'''
        if idxval is None:
            idxval = []
        if not isinstance(idxval, list):
            return None
        val = []
        for idx in idxval:
            if not isinstance(idx, list):
                val.append([idx])
            else:
                val.append(idx)
        lenval = [len(idx) for idx in val]
        if lenval and max(lenval) != min(lenval):
            raise DatasetError('the lengths of the Iindex are different')
        length = lenval[0] if lenval else 0
        idxname = [None] * len(val) if idxname is None else idxname
        for ind, name in enumerate(idxname):
            if name is None or name == '$default':
                idxname[ind] = 'i' + str(ind)
        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
                                  fast=fast) for codec, name in zip(val, idxname)]
        return cls(lindex, reindex=False)

    # %% internal
    @staticmethod
    def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False,
                     fillvalue=math.nan, reindex=False):
        row = rec[0]
        if not isinstance(row, list):
            row = [row]
        var = -1
        for ind, val in enumerate(row):
            if val.__class__.__name__ in ['Sdataset', 'Ndataset']:
                var = ind
                break
        if var < 0:
            return (rec, None, [])
        #ilis = row[var]
        ilis = row[var].merge(simplename=simplename, fillvalue=fillvalue, reindex=reindex)
        oldname = rec.lname[var]
        if ilis.lname == ['i0']:
            newname = [oldname]
            ilis.setname(newname)
        elif not simplename:
            newname = [oldname + '_' + name for name in ilis.lname]
            ilis.setname(newname)
        else:
            newname = copy(ilis.lname)
        for name in rec.lname:
            if name in newname:
                newname.remove(name)
            else:
                updidx = name in ilis.lname and not updateidx
                #ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)},
                ilis.addindex(ilis.field([rec.nindex(name)[0]] * len(ilis), name),
                              merge=mergeidx, update=updidx)
        return (ilis, oldname, newname)

    # %% special
    def __str__(self):
        '''return string format for var and lidx'''
        stri = ''
        if self.lvar:
            stri += 'variables :\n'
            for idx in self.lvar:
                stri += '    ' + str(idx) + '\n'
        if self.lidx:
            stri += 'index :\n'
            for idx in self.lidx:
                stri += '    ' + str(idx) + '\n'
        return stri

    def __add__(self, other):
        ''' Add other's values to self's values in a new Dataset'''
        newil = copy(self)
        newil.__iadd__(other)
        return newil

    def __iadd__(self, other):
        ''' Add other's values to self's values'''
        return self.add(other, name=True, solve=False)

    def __or__(self, other):
        ''' Add other's index to self's index in a new Dataset'''
        newil = copy(self)
        newil.__ior__(other)
        return newil

    def __ior__(self, other):
        ''' Add other's index to self's index'''
        return self.orindex(other, first=False, merge=True, update=False)

    # %% property
    @property
    def consistent(self):
        ''' True if all the records are different'''
        selfiidx = self.iidx
        if not selfiidx:
            return True
        return max(Counter(zip(*selfiidx)).values()) == 1

    @property
    def extidx(self):
        '''idx values (see data model)'''
        return [idx.values for idx in self.lidx]

    @property
    def extidxext(self):
        '''idx val (see data model)'''
        return [idx.val for idx in self.lidx]

    @property
    def idxname(self):
        ''' list of idx name'''
        return [idx.name for idx in self.lidx]

    @property
    def idxlen(self):
        ''' list of idx codec length'''
        return [len(idx.codec) for idx in self.lidx]

    @property
    def iidx(self):
        ''' list of keys for each idx'''
        return [idx.keys for idx in self.lidx]

    @property
    def lenidx(self):
        ''' number of idx'''
        return len(self.lidx)

    @property
    def lidx(self):
        '''list of idx'''
        return [self.lindex[i] for i in self.lidxrow]

    @property
    def lisvar(self):
        '''list of boolean : True if Field is var'''
        return [name in self.lvarname for name in self.lname]

    @property
    def lvar(self):
        '''list of var'''
        return [self.lindex[i] for i in self.lvarrow]

    @property
    def lvarrow(self):
        '''list of var row'''
        return [self.lname.index(name) for name in self.lvarname]

    @property
    def lidxrow(self):
        '''list of idx row'''
        return [i for i in range(self.lenindex) if i not in self.lvarrow]

    @property
    def primary(self):
        ''' list of primary idx'''
        return [self.lidxrow.index(self.lname.index(name)) for name in self.primaryname]

    @property
    def secondary(self):
        ''' list of secondary idx'''
        return [self.lidxrow.index(self.lname.index(name)) for name in self.secondaryname]

    @property
    def setidx(self):
        '''list of codec for each idx'''
        return [idx.codec for idx in self.lidx]

    @property
    def zip(self):
        '''return a zip format for transpose(extidx) : tuple(tuple(rec))'''
        textidx = Cutil.transpose(self.extidx)
        if not textidx:
            return None
        return tuple(tuple(idx) for idx in textidx)

    # %% structure
    def addindex(self, index, first=False, merge=False, update=False):
        '''add a new index.

        *Parameters*

        - **index** : Field - index to add (can be an index Ntv representation)
        - **first** : If True insert index at the first row, else at the end
        - **merge** : create a new index if merge is False
        - **update** : if True, update actual values if index name is present (and merge is True)

        *Returns* : none '''
        idx = self.field.ntv(index)
        idxname = self.lname
        if len(idx) != len(self) and len(self) > 0:
            raise DatasetError('sizes are different')
        if not idx.name in idxname:
            if first:
                self.lindex.insert(0, idx)
            else:
                self.lindex.append(idx)
        elif not merge:  # if idx.name in idxname
            while idx.name in idxname:
                idx.name += '(2)'
            if first:
                self.lindex.insert(0, idx)
            else:
                self.lindex.append(idx)
        elif update:  # if merge and idx.name in idxname
            self.lindex[idxname.index(idx.name)].setlistvalue(idx.values)

    def append(self, record, unique=False):
        '''add a new record.

        *Parameters*

        - **record** : list of new index values to add to Dataset
        - **unique** : boolean (default False) - Append isn't done if unique
        is True and the record is already present

        *Returns* : list - key record'''
        if self.lenindex != len(record):
            raise DatasetError('len(record) not consistent')
        record = self.field.l_to_i(record)
        if self.isinrecord(self.idxrecord(record), False) and unique:
            return None
        return [self.lindex[i].append(record[i]) for i in range(self.lenindex)]

    def applyfilter(self, reverse=False, filtname=FILTER, delfilter=True, inplace=True):
        '''delete records with defined filter value.
        The filter is deleted after record filtering.

        *Parameters*

        - **reverse** : boolean (default False) - delete records whose filter
        value equals reverse (by default, records with a False filter value)
        - **filtname** : string (default FILTER) - Name of the filter Field added
        - **delfilter** : boolean (default True) - If True, delete filter's Field
        - **inplace** : boolean (default True) - if True, the filter is applied to self

        *Returns* : self or new Dataset'''
        if not filtname in self.lname:
            return None
        if inplace:
            ilis = self
        else:
            ilis = copy(self)
        ifilt = ilis.lname.index(filtname)
        ilis.sort([ifilt], reverse=not reverse, func=None)
        lisind = ilis.lindex[ifilt].recordfromvalue(reverse)
        if lisind:
            minind = min(lisind)
            for idx in ilis.lindex:
                del idx.keys[minind:]
        if inplace:
            self.delindex(filtname)
        else:
            ilis.delindex(filtname)
            if delfilter:
                self.delindex(filtname)
        ilis.reindex()
        return ilis

    def coupling(self, derived=True, level=0.1):
        '''Transform idx with low dist in coupled or derived indexes (codec extension).

        *Parameters*

        - **level** : float (default 0.1) - threshold used to apply coupling.
        - **derived** : boolean (default True). If True, indexes are derived,
        else coupled.

        *Returns* : None'''
        ana = self.analysis
        child = [[] for _ in range(len(ana))]  # independent lists (avoid [[]] * n aliasing)
        childroot = []
        level = level * len(self)
        for idx in range(self.lenindex):
            if derived:
                iparent = ana.fields[idx].p_distomin.index
            else:
                iparent = ana.fields[idx].p_distance.index
            if iparent == -1:
                childroot.append(idx)
            else:
                child[iparent].append(idx)
        for idx in childroot:
            self._couplingidx(idx, child, derived, level, ana)

    def _couplingidx(self, idx, child, derived, level, ana):
        ''' Field coupling (including the children of the Field)'''
        fields = ana.fields
        if derived:
            iparent = fields[idx].p_distomin.index
            dparent = ana.get_relation(*sorted([idx, iparent])).distomin
        else:
            iparent = fields[idx].p_distance.index
            dparent = ana.get_relation(*sorted([idx, iparent])).distance
        # if fields[idx].category in ('coupled', 'unique') or iparent == -1\
        if fields[idx].category in ('coupled', 'unique') \
                or dparent >= level or dparent == 0:
            return
        if child[idx]:
            for childidx in child[idx]:
                self._couplingidx(childidx, child, derived, level, ana)
        self.lindex[iparent].coupling(self.lindex[idx], derived=derived,
                                      duplicate=False)
        return

    def delrecord(self, record, extern=True):
        '''remove a record.

        *Parameters*

        - **record** : list - index values to remove from the Dataset
        - **extern** : if True, compare record values to the external representation
        of self.value, else internal

        *Returns* : row deleted'''
        self.reindex()
        reckeys = self.valtokey(record, extern=extern)
        if None in reckeys:
            return None
        row = self.tiindex.index(reckeys)
        for idx in self:
            del idx[row]
        return row

    def _fullindex(self, ind, keysadd, indexname, varname, leng, fillvalue, fillextern):
        if not varname:
            varname = []
        idx = self.lindex[ind]
        lenadd = len(keysadd[0])
        if len(idx) == leng:
            return
        #inf = self.indexinfos()
        ana = self.anafields
        parent = ana[ind].p_derived.view('index')
        # if inf[ind]['cat'] == 'unique':
        if ana[ind].category == 'unique':
            idx.set_keys(idx.keys + [0] * lenadd)
        elif self.lname[ind] in indexname:
            idx.set_keys(idx.keys + keysadd[indexname.index(self.lname[ind])])
        # elif inf[ind]['parent'] == -1 or self.lname[ind] in varname:
        elif parent == -1 or self.lname[ind] in varname:
            fillval = fillvalue
            if fillextern:
                fillval = self.field.s_to_i(fillvalue)
            idx.set_keys(idx.keys + [len(idx.codec)] * len(keysadd[0]))
            idx.set_codec(idx.codec + [fillval])
        else:
            #parent = inf[ind]['parent']
            if len(self.lindex[parent]) != leng:
                self._fullindex(parent, keysadd, indexname, varname, leng,
                                fillvalue, fillextern)
            # if inf[ind]['cat'] == 'coupled':
            if ana[ind].category == 'coupled':
                idx.tocoupled(self.lindex[parent], coupling=True)
            else:
                idx.tocoupled(self.lindex[parent], coupling=False)

    def full(self, reindex=False, idxname=None, varname=None, fillvalue='-',
             fillextern=True, inplace=True, canonical=True):
        '''transform a list of indexes in crossed indexes (value extension).

        *Parameters*

        - **idxname** : list of string - name of indexes to transform
        - **varname** : string - name of indexes to use
        - **reindex** : boolean (default False) - if True, set default codec
        before transformation
        - **fillvalue** : object value used for var extension
        - **fillextern** : boolean (default True) - if True, fillvalue is converted
        to internal value
        - **inplace** : boolean (default True) - if True, the transformation is applied to self
        - **canonical** : boolean (default True) - if True, Fields are ordered
        in canonical order

        *Returns* : self or new Dataset'''
        ilis = self if inplace else copy(self)
        if not idxname:
            idxname = ilis.primaryname
        if reindex:
            ilis.reindex()
        keysadd = Cutil.idxfull([ilis.nindex(name) for name in idxname])
        if keysadd and len(keysadd) != 0:
            newlen = len(keysadd[0]) + len(ilis)
            for ind in range(ilis.lenindex):
                ilis._fullindex(ind, keysadd, idxname, varname, newlen,
                                fillvalue, fillextern)
        if canonical:
            ilis.setcanonorder()
        return ilis

    def getduplicates(self, indexname=None, resindex=None, indexview=None):
        '''check duplicate codec in a list of indexes. The result is added in a new
        index or returned.

        *Parameters*

        - **indexname** : list of string (default None) - name of indexes to check
        (if None, all Fields)
        - **resindex** : string (default None) - Add a new index named resindex
        with the check result (False if duplicate)
        - **indexview** : list of str (default None) - list of fields to return

        *Returns* : list of int - list of rows with duplicate codec '''
        if not indexname:
            indexname = self.lname
        duplicates = []
        for name in indexname:
            duplicates += self.nindex(name).getduplicates()
        if resindex and isinstance(resindex, str):
            newidx = self.field([True] * len(self), name=resindex)
            for item in duplicates:
                newidx[item] = False
            self.addindex(newidx)
        dupl = tuple(set(duplicates))
        if not indexview:
            return dupl
        return [tuple(self.record(ind, indexview)) for ind in dupl]

    def iscanonorder(self):
        '''return True if primary indexes have canonical ordered keys'''
        primary = self.primary
        canonorder = Cutil.canonorder(
            [len(self.lidx[idx].codec) for idx in primary])
        return canonorder == [self.lidx[idx].keys for idx in primary]

    def isinrecord(self, record, extern=True):
        '''Check if record is present in self.

        *Parameters*

        - **record** : list - value for each Field
        - **extern** : if True, compare record values to the external representation
        of self.value, else internal

        *Returns boolean* : True if found'''
        if extern:
            return record in Cutil.transpose(self.extidxext)
        return record in Cutil.transpose(self.extidx)

    def idxrecord(self, record):
        '''return rec array (without variable) from complete record (with variable)'''
        return [record[self.lidxrow[i]] for i in range(len(self.lidxrow))]

    def keytoval(self, listkey, extern=True):
        '''
        convert a keys list (key for each index) to a values list (value for each index).

        *Parameters*

        - **listkey** : key for each index
        - **extern** : boolean (default True) - if True, compare rec to val else to values

        *Returns*

        - **list** : value for each index'''
        return [idx.keytoval(key, extern=extern) for idx, key in zip(self.lindex, listkey)]

    def loc(self, rec, extern=True, row=False):
        '''
        Return record or row corresponding to a list of idx values.

        *Parameters*

        - **rec** : list - value for each idx
        - **extern** : boolean (default True) - if True, compare rec to val,
        else to values
        - **row** : boolean (default False) - if True, return list of rows,
        else list of records

        *Returns*

        - **object** : variable value or None if not found'''
        locrow = None
        try:
            if len(rec) == self.lenindex:
                locrow = list(set.intersection(*[set(self.lindex[i].loc(rec[i], extern))
                                                 for i in range(self.lenindex)]))
            elif len(rec) == self.lenidx:
                locrow = list(set.intersection(*[set(self.lidx[i].loc(rec[i], extern))
                                                 for i in range(self.lenidx)]))
        except Exception:
            pass
        if locrow is None:
            return None
        if row:
            return locrow
        return [self.record(locr, extern=extern) for locr in locrow]

    def mix(self, other, fillvalue=None):
        '''add other Fields not included in self and add other's values'''
        sname = set(self.lname)
        oname = set(other.lname)
        newself = copy(self)
        copother = copy(other)
        for nam in oname - sname:
            newself.addindex({nam: [fillvalue] * len(newself)})
        for nam in sname - oname:
            copother.addindex({nam: [fillvalue] * len(copother)})
        return newself.add(copother, name=True, solve=False)

    def merging(self, listname=None):
        ''' add a new Field built with the Fields defined in listname.
        Values of the new Field are sets of values of the listname Fields'''
        #self.addindex(Field.merging([self.nindex(name) for name in listname]))
        self.addindex(Sfield.merging([self.nindex(name) for name in listname]))

    def orindex(self, other, first=False, merge=False, update=False):
        ''' Add other's index to self's index (with same length)

        *Parameters*

        - **other** : self class - object to add
        - **first** : boolean (default False) - If True insert indexes
        at the first row, else at the end
        - **merge** : boolean (default False) - create a new index
        if merge is False
        - **update** : boolean (default False) - if True, update actual
        values if index name is present (and merge is True)

        *Returns* : none '''
        if len(self) != 0 and len(self) != len(other) and len(other) != 0:
            raise DatasetError("the sizes are not equal")
        otherc = copy(other)
        for idx in otherc.lindex:
            self.addindex(idx, first=first, merge=merge, update=update)
        return self

    def record(self, row, indexname=None, extern=True):
        '''return the record at the row

        *Parameters*

        - **row** : int - row of the record
        - **extern** : boolean (default True) - if True, return val record else
        value record
        - **indexname** : list of str (default None) - list of fields to return

        *Returns*

        - **list** : val record or value record'''
        if indexname is None:
            indexname = self.lname
        if extern:
            record = [idx.val[row] for idx in self.lindex]
            #record = [idx.values[row].to_obj() for idx in self.lindex]
            #record = [idx.valrow(row) for idx in self.lindex]
        else:
            record = [idx.values[row] for idx in self.lindex]
        return [record[self.lname.index(name)] for name in indexname]

    def recidx(self, row, extern=True):
        '''return the list of idx val or values at the row

        *Parameters*

        - **row** : int - row of the record
        - **extern** : boolean (default True) - if True, return val rec else value rec

        *Returns*

        - **list** : val or value for idx'''
        if extern:
            return [idx.values[row].to_obj() for idx in self.lidx]
            # return [idx.valrow(row) for idx in self.lidx]
        return [idx.values[row] for idx in self.lidx]

    def recvar(self, row, extern=True):
        '''return the list of var val or values at the row

        *Parameters*

        - **row** : int - row of the record
        - **extern** : boolean (default True) - if True, return val rec else value rec

        *Returns*

        - **list** : val or value for var'''
        if extern:
            return [idx.values[row].to_obj() for idx in self.lvar]
            # return [idx.valrow(row) for idx in self.lvar]
        return [idx.values[row] for idx in self.lvar]

    def setcanonorder(self, reindex=False):
        '''Set the canonical index order : primary - secondary/unique - variable.
        Set the canonical keys order : ordered keys in the first columns.

        *Parameters*

        - **reindex** : boolean (default False) - if True, set default codec after
        transformation

        *Return* : self'''
        order = self.primaryname
        order += self.secondaryname
        order += self.lvarname
        order += self.lunicname
        self.swapindex(order)
        self.sort(reindex=reindex)
        # self.analysis.actualize()
        return self

    def setfilter(self, filt=None, first=False, filtname=FILTER, unique=False):
        '''Add a filter index with boolean values

        *Parameters*

        - **filt** : list of boolean - values of the filter idx to add
        - **first** : boolean (default False) - If True insert index at the first row,
        else at the end
        - **filtname** : string (default FILTER) - Name of the filter Field added

        *Returns* : self'''
        if not filt:
            filt = [True] * len(self)
        idx = self.field(filt, name=filtname)
        idx.reindex()
        if not idx.cod in ([True, False], [False, True], [True], [False]):
            raise DatasetError('filt is not consistent')
        if unique:
            for name in self.lname:
                if name[:len(FILTER)] == FILTER:
                    self.delindex(FILTER)
        self.addindex(idx, first=first)
        return self

    def sort(self, order=None, reverse=False, func=str, reindex=True):
        '''Sort data following the index order and apply the ascending or descending
        sort function to values.

        *Parameters*

        - **order** : list (default None) - new order of indexes to apply. If None or [],
        the sort function is applied to the existing order of indexes.
        - **reverse** : boolean (default False) - descending if True, ascending if False
        - **func** : function (default str) - key function used in the sorted function
        - **reindex** : boolean (default True) - if True, apply a new codec order (key = func)

        *Returns* : self'''
        if not order:
            order = list(range(self.lenindex))
        orderfull = order + list(set(range(self.lenindex)) - set(order))
        if reindex:
            for i in order:
                self.lindex[i].reindex(codec=sorted(
                    self.lindex[i].codec, key=func))
        newidx = Cutil.transpose(sorted(Cutil.transpose(
            [self.lindex[orderfull[i]].keys for i in range(self.lenindex)]),
            reverse=reverse))
        for i in range(self.lenindex):
            self.lindex[orderfull[i]].set_keys(newidx[i])
        return self

    """
    def swapindex(self, order):
        '''
        Change the order of the index.

        *Parameters*

        - **order** : list of int or list of name - new order of index to apply.

        *Returns* : self '''
        if self.lenindex != len(order):
            raise DatasetError('length of order and Dataset different')
        if not order or isinstance(order[0], int):
            self.lindex = [self.lindex[ind] for ind in order]
        elif isinstance(order[0], str):
            self.lindex = [self.nindex(name) for name in order]
        return self
    """

    def tostdcodec(self, inplace=False, full=True):
        '''Transform all codecs in full or default codec.

        *Parameters*

        - **inplace** : boolean (default False) - if True apply transformation
        to self, else to a new Dataset
        - **full** : boolean (default True) - full codec if True, default if False

        *Return Dataset* : self or new Dataset'''
        lindex = [idx.tostdcodec(inplace=False, full=full)
                  for idx in self.lindex]
        if inplace:
            self.lindex = lindex
            return self
        return self.__class__(lindex, self.lvarname)

    def updateindex(self, listvalue, index, extern=True):
        '''update values of an index.

        *Parameters*

        - **listvalue** : list - index values to replace
        - **index** : integer - row of the index to update
        - **extern** : if True, the listvalue has external representation, else internal

        *Returns* : none '''
        self.lindex[index].setlistvalue(listvalue, extern=extern)

    def valtokey(self, rec, extern=True):
        '''convert a record list (value or val for each idx) to a key list
        (key for each index).

        *Parameters*

        - **rec** : list of value or val for each index
        - **extern** : if True, the rec value has external representation, else internal

        *Returns*

        - **list of int** : record key for each index'''
        return [idx.valtokey(val, extern=extern) for idx, val in zip(self.lindex, rec)]
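`Sdataset.sort` orders whole records by transposing the per-field key columns into rows, sorting the rows lexicographically, and transposing back (via `Cutil.transpose`). A standalone sketch of that idea with plain lists (the helper names below are illustrative, not the package API):

```python
# Sketch of the transpose-sort-transpose idea used by Sdataset.sort.
# Each field is a column of integer keys; sorting whole records means
# turning columns into rows, sorting the rows, and turning them back.

def transpose(columns):
    """Turn a list of columns into a list of rows (and back)."""
    return [list(row) for row in zip(*columns)]

def sort_keys(columns, reverse=False):
    """Sort all columns together, record by record (lexicographic order)."""
    rows = sorted(transpose(columns), reverse=reverse)
    return transpose(rows)

# two fields of keys for a 4-record dataset
keys = [[1, 0, 1, 0],   # field 'month'
        [2, 0, 1, 0]]   # field 'city'
print(sort_keys(keys))  # [[0, 0, 1, 1], [0, 0, 1, 2]]
```

Because every column is transposed together, each record stays intact; only the order of records changes, which is exactly the invariant `sort` maintains across `self.lindex`.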
Sdataset is a child class of Cdataset where internal value can be different
from external value (list is converted in tuple and dict in json-object).
One attribute is added: 'field' to define the 'field' class.
The methods defined in this class are :
constructor (@classmethod)
Sdataset.from_csvSdataset.from_fileSdataset.mergeSdataset.extCdataset.ntvCdataset.from_ntv
dynamic value - module analysis (getters @property)
DatasetAnalysis.analysisDatasetAnalysis.anafieldsSdataset.extidxSdataset.extidxextDatasetAnalysis.field_partitionSdataset.idxnameSdataset.idxlenSdataset.iidxSdataset.lenidxSdataset.lidxSdataset.lidxrowSdataset.lisvarSdataset.lvarDatasetAnalysis.lvarnameSdataset.lvarrowCdataset.lunicnameCdataset.lunicrowDatasetAnalysis.partitionsDatasetAnalysis.primarynameDatasetAnalysis.relationDatasetAnalysis.secondarynameSdataset.setidxSdataset.zip
dynamic value (getters @property)
- Cdataset.keys
- Cdataset.iindex
- Cdataset.indexlen
- Cdataset.lenindex
- Cdataset.lname
- Cdataset.tiindex
global value (getters @property)
- DatasetAnalysis.complete
- Sdataset.consistent
- DatasetAnalysis.dimension
- Sdataset.primary
- Sdataset.secondary
selecting - infos methods
- Sdataset.idxrecord
- DatasetAnalysis.indexinfos
- DatasetAnalysis.indicator
- Sdataset.iscanonorder
- Sdataset.isinrecord
- Sdataset.keytoval
- Sdataset.loc
- Cdataset.nindex
- Sdataset.record
- Sdataset.recidx
- Sdataset.recvar
- Cdataset.to_analysis
- DatasetAnalysis.tree
- Sdataset.valtokey
add - update methods
- Cdataset.add
- Sdataset.addindex
- Sdataset.append
- Cdataset.delindex
- Sdataset.delrecord
- Sdataset.orindex
- Cdataset.renameindex
- Cdataset.setname
- Sdataset.updateindex
structure management - methods
- Sdataset.applyfilter
- Cdataset.check_relation
- Cdataset.check_relationship
- Sdataset.coupling
- Sdataset.full
- Sdataset.getduplicates
- Sdataset.mix
- Sdataset.merging
- Cdataset.reindex
- Cdataset.reorder
- Sdataset.setfilter
- Sdataset.sort
- Cdataset.swapindex
- Sdataset.setcanonorder
- Sdataset.tostdcodec
exports methods (observation.dataset_interface.DatasetInterface)
- Dataset.json
- Dataset.plot
- Dataset.to_obj
- Dataset.to_csv
- Dataset.to_dataframe
- Dataset.to_file
- Dataset.to_ntv
- Dataset.to_xarray
- Dataset.view
- Dataset.vlist
- Dataset.voxel
def __init__(self, listidx=None, name=None, reindex=True):
    '''
    Dataset constructor.

    *Parameters*

    - **listidx** : list (default None) - list of Field data
    - **name** : string (default None) - name of the dataset
    - **reindex** : boolean (default True) - if True, default codec for each Field'''

    self.field = self.field_class
    Cdataset.__init__(self, listidx, name, reindex=reindex)
Dataset constructor.
Parameters
- listidx : list (default None) - list of Field data
- name : string (default None) - name of the dataset
- reindex : boolean (default True) - if True, default codec for each Field
@classmethod
def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
             decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
    '''
    Dataset constructor (from a csv file). Each column represents index values.

    *Parameters*

    - **filename** : string (default 'dataset.csv'), name of the file to read
    - **header** : boolean (default True). If True, the first row is dedicated to names
    - **nrow** : integer (default None). Number of rows to read. If None, all rows
    - **optcsv** : dict (default : quoting) - see csv.reader options'''
    if not optcsv:
        optcsv = {}
    if not nrow:
        nrow = -1
    with open(filename, newline='', encoding="utf-8") as file:
        reader = csv.reader(file, **optcsv)
        irow = 0
        for row in reader:
            if irow == nrow:
                break
            if irow == 0:
                idxval = [[] for i in range(len(row))]
                idxname = [''] * len(row)
            if irow == 0 and header:
                idxname = row
            else:
                for i in range(len(row)):
                    if decode_json:
                        try:
                            idxval[i].append(json.loads(row[i]))
                        except (ValueError, TypeError):
                            idxval[i].append(row[i])
                    else:
                        idxval[i].append(row[i])
            irow += 1
    lindex = [cls.field_class.from_ntv(
        {name: idx}, decode_str=decode_str) for idx, name in zip(idxval, idxname)]
    return cls(listidx=lindex, reindex=True)
Dataset constructor (from a csv file). Each column represents index values.
Parameters
- filename : string (default 'dataset.csv'), name of the file to read
- header : boolean (default True). If True, the first row is dedicated to names
- nrow : integer (default None). Number of rows to read. If None, all rows are read
- optcsv : dict (default : quoting) - see csv.reader options
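The column-wise parse that `from_csv` performs can be sketched with the standard library alone. This is a minimal illustration of the behaviour described above, not the library API: `columns_from_csv` is a hypothetical helper, and each cell is JSON-decoded when possible and kept as text otherwise.

```python
import csv
import io
import json

def columns_from_csv(text, header=True, decode_json=True):
    """Return (names, columns) from CSV text, one value list per column."""
    rows = list(csv.reader(io.StringIO(text)))
    names = rows[0] if header else ['i' + str(i) for i in range(len(rows[0]))]
    data = rows[1:] if header else rows
    columns = [[] for _ in names]
    for row in data:
        for i, cell in enumerate(row):
            if decode_json:
                try:
                    columns[i].append(json.loads(cell))  # numbers, true/false, null
                except ValueError:
                    columns[i].append(cell)              # plain string
            else:
                columns[i].append(cell)
    return names, columns

names, cols = columns_from_csv('name,score\nana,10\nbob,12\n')
```

With `decode_json=True`, the `score` column comes back as integers while `name` stays textual, mirroring the per-cell decoding loop in the source.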
@classmethod
def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
    '''
    Generate Object from file storage.

    *Parameters*

    - **filename** : string - file name (with path)
    - **forcestring** : boolean (default False) - if True,
    forces the UTF-8 data format, else the format is calculated
    - **reindex** : boolean (default True) - if True, default codec for each Field
    - **decode_str**: boolean (default False) - if True, strings are loaded as JSON data

    *Returns* : new Object'''
    with open(filename, 'rb') as file:
        btype = file.read(1)
    if btype == bytes('[', 'UTF-8') or btype == bytes('{', 'UTF-8') or forcestring:
        with open(filename, 'r', newline='', encoding="utf-8") as file:
            bjson = file.read()
    else:
        with open(filename, 'rb') as file:
            bjson = file.read()
    return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)
Generate Object from file storage.
Parameters
- filename : string - file name (with path)
- forcestring : boolean (default False) - if True, forces the UTF-8 data format, else the format is calculated
- reindex : boolean (default True) - if True, default codec for each Field
- decode_str : boolean (default False) - if True, strings are loaded as JSON data
Returns : new Object
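The format detection described above can be sketched as follows. This is an illustration of the assumed sniffing logic (peek at the first byte; `[` or `{` means UTF-8 JSON text, anything else is read as a binary payload), with `read_payload` as a hypothetical stand-in for the file-reading step of `from_file`:

```python
import os
import tempfile

def read_payload(filename, forcestring=False):
    """Return str for a JSON text file, bytes for a binary file."""
    with open(filename, 'rb') as file:
        first = file.read(1)          # peek at the first byte
    if first in (b'[', b'{') or forcestring:
        with open(filename, encoding='utf-8') as file:
            return file.read()        # UTF-8 JSON text
    with open(filename, 'rb') as file:
        return file.read()            # raw binary payload

# demo: a JSON file comes back as str
path = os.path.join(tempfile.mkdtemp(), 'data.json')
with open(path, 'w', encoding='utf-8') as f:
    f.write('{"idx": [1, 2]}')
payload = read_payload(path)
```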
def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
    '''
    Merge method replaces Dataset objects included into its constituents.

    *Parameters*

    - **fillvalue** : object (default nan) - value used for the additional data
    - **reindex** : boolean (default False) - if True, set default codec after transformation
    - **simplename** : boolean (default False) - if True, new Field name are
    the same as merged Field name else it is a composed name.

    *Returns*: merged Dataset '''
    ilc = copy(self)
    delname = []
    row = ilc[0]
    if not isinstance(row, list):
        row = [row]
    merged, oldname, newname = self.__class__._mergerecord(
        self.ext(row, ilc.lname), simplename=simplename, fillvalue=fillvalue,
        reindex=reindex)
    delname.append(oldname)
    for ind in range(1, len(ilc)):
        oldidx = ilc.nindex(oldname)
        for name in newname:
            ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
        row = ilc[ind]
        if not isinstance(row, list):
            row = [row]
        rec, oldname, newname = self.__class__._mergerecord(
            self.ext(row, ilc.lname), simplename=simplename)
        if oldname and newname != [oldname]:
            delname.append(oldname)
            for name in newname:
                oldidx = merged.nindex(oldname)
                fillval = self.field.s_to_i(fillvalue)
                merged.addindex(
                    self.field([fillval] * len(merged), name, oldidx.keys))
        merged += rec
    for name in set(delname):
        if name:
            merged.delindex(name)
    if reindex:
        merged.reindex()
    ilc.lindex = merged.lindex
    return ilc
The merge method replaces the Dataset objects included in self with their constituents.
Parameters
- fillvalue : object (default nan) - value used for the additional data
- reindex : boolean (default False) - if True, set default codec after transformation
- simplename : boolean (default False) - if True, new Field names are the same as the merged Field names, else a composed name is used.
Returns: merged Dataset
@classmethod
def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
    '''
    Dataset constructor (external index).

    *Parameters*

    - **idxval** : list of Field or list of values (see data model)
    - **idxname** : list of string (default None) - list of Field name (see data model)'''
    if idxval is None:
        idxval = []
    if not isinstance(idxval, list):
        return None
    val = []
    for idx in idxval:
        if not isinstance(idx, list):
            val.append([idx])
        else:
            val.append(idx)
    lenval = [len(idx) for idx in val]
    if lenval and max(lenval) != min(lenval):
        raise DatasetError('the length of Iindex are different')
    length = lenval[0] if lenval else 0
    idxname = [None] * len(val) if idxname is None else idxname
    for ind, name in enumerate(idxname):
        if name is None or name == '$default':
            idxname[ind] = 'i' + str(ind)
    lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
                              fast=fast) for codec, name in zip(val, idxname)]
    return cls(lindex, reindex=False)
Dataset constructor (external index).
Parameters
- idxval : list of Field or list of values (see data model)
- idxname : list of string (default None) - list of Field name (see data model)
def addindex(self, index, first=False, merge=False, update=False):
    '''add a new index.

    *Parameters*

    - **index** : Field - index to add (can be index Ntv representation)
    - **first** : If True insert index at the first row, else at the end
    - **merge** : create a new index if merge is False
    - **update** : if True, update actual values if index name is present (and merge is True)

    *Returns* : none '''
    idx = self.field.ntv(index)
    idxname = self.lname
    if len(idx) != len(self) and len(self) > 0:
        raise DatasetError('sizes are different')
    if not idx.name in idxname:
        if first:
            self.lindex.insert(0, idx)
        else:
            self.lindex.append(idx)
    elif not merge:  # if idx.name in idxname
        while idx.name in idxname:
            idx.name += '(2)'
        if first:
            self.lindex.insert(0, idx)
        else:
            self.lindex.append(idx)
    elif update:  # if merge and idx.name in idxname
        self.lindex[idxname.index(idx.name)].setlistvalue(idx.values)
add a new index.
Parameters
- index : Field - index to add (can be index Ntv representation)
- first : If True insert index at the first row, else at the end
- merge : boolean (default False) - if False, the index is added under a new name when the name is already present
- update : if True, update actual values if index name is present (and merge is True)
Returns : none
def append(self, record, unique=False):
    '''add a new record.

    *Parameters*

    - **record** : list of new index values to add to Dataset
    - **unique** : boolean (default False) - Append isn't done if unique
    is True and record present

    *Returns* : list - key record'''
    if self.lenindex != len(record):
        raise DatasetError('len(record) not consistent')
    record = self.field.l_to_i(record)
    if self.isinrecord(self.idxrecord(record), False) and unique:
        return None
    return [self.lindex[i].append(record[i]) for i in range(self.lenindex)]
add a new record.
Parameters
- record : list of new index values to add to Dataset
- unique : boolean (default False) - the append is not performed if unique is True and the record is already present
Returns : list - key record
def applyfilter(self, reverse=False, filtname=FILTER, delfilter=True, inplace=True):
    '''delete records with defined filter value.
    Filter is deleted after record filtering.

    *Parameters*

    - **reverse** : boolean (default False) - if True, the filter is reversed
    before deleting records
    - **filtname** : string (default FILTER) - Name of the filter Field added
    - **delfilter** : boolean (default True) - If True, delete filter's Field
    - **inplace** : boolean (default True) - if True, filter is applied to self,
    else to a new Dataset

    *Returns* : self or new Dataset'''
    if not filtname in self.lname:
        return None
    if inplace:
        ilis = self
    else:
        ilis = copy(self)
    ifilt = ilis.lname.index(filtname)
    ilis.sort([ifilt], reverse=not reverse, func=None)
    lisind = ilis.lindex[ifilt].recordfromvalue(reverse)
    if lisind:
        minind = min(lisind)
        for idx in ilis.lindex:
            del idx.keys[minind:]
    if inplace:
        self.delindex(filtname)
    else:
        ilis.delindex(filtname)
        if delfilter:
            self.delindex(filtname)
    ilis.reindex()
    return ilis
delete records with defined filter value. Filter is deleted after record filtering.
Parameters
- reverse : boolean (default False) - if True, the filter is reversed before deleting records
- filtname : string (default FILTER) - Name of the filter Field added
- delfilter : boolean (default True) - If True, delete filter's Field
- inplace : boolean (default True) - if True, the filter is applied to self, else to a new Dataset
Returns : self or new Dataset
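On plain columns, the combined effect of `setfilter` followed by `applyfilter` can be sketched like this. This is an illustration under the assumed semantics (rows whose filter value equals `reverse` are deleted), not the library API; `apply_filter` is a hypothetical helper:

```python
def apply_filter(columns, filt, reverse=False):
    """Keep rows whose boolean filter value differs from `reverse`."""
    keep = [i for i, ok in enumerate(filt) if ok != reverse]
    return [[col[i] for i in keep] for col in columns]

# default: rows with a False filter value are dropped
filtered = apply_filter([[10, 20, 30], ['a', 'b', 'c']], [True, False, True])
```

With `reverse=True` the selection is inverted, matching the role of the `reverse` parameter above.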
def coupling(self, derived=True, level=0.1):
    '''Transform idx with low dist in coupled or derived indexes (codec extension).

    *Parameters*

    - **level** : float (default 0.1) - param threshold to apply coupling.
    - **derived** : boolean (default : True). If True, indexes are derived,
    else coupled.

    *Returns* : None'''
    ana = self.analysis
    child = [[] for _ in range(len(ana))]  # independent sublists (not [[]] * n)
    childroot = []
    level = level * len(self)
    for idx in range(self.lenindex):
        if derived:
            iparent = ana.fields[idx].p_distomin.index
        else:
            iparent = ana.fields[idx].p_distance.index
        if iparent == -1:
            childroot.append(idx)
        else:
            child[iparent].append(idx)
    for idx in childroot:
        self._couplingidx(idx, child, derived, level, ana)
Transform indexes with a low distance into coupled or derived indexes (codec extension).
Parameters
- level : float (default 0.1) - param threshold to apply coupling.
- derived : boolean (default : True). If True, indexes are derived, else coupled.
Returns : None
def delrecord(self, record, extern=True):
    '''remove a record.

    *Parameters*

    - **record** : list - index values to remove to Dataset
    - **extern** : if True, compare record values to external representation
    of self.value, else, internal

    *Returns* : row deleted'''
    self.reindex()
    reckeys = self.valtokey(record, extern=extern)
    if None in reckeys:
        return None
    row = self.tiindex.index(reckeys)
    for idx in self:
        del idx[row]
    return row
remove a record.
Parameters
- record : list - index values to remove to Dataset
- extern : if True, compare record values to external representation of self.value, else, internal
Returns : row deleted
def full(self, reindex=False, idxname=None, varname=None, fillvalue='-',
         fillextern=True, inplace=True, canonical=True):
    '''transform a list of indexes in crossed indexes (value extension).

    *Parameters*

    - **idxname** : list of string - name of indexes to transform
    - **varname** : string - name of indexes to use
    - **reindex** : boolean (default False) - if True, set default codec
    before transformation
    - **fillvalue** : object value used for var extension
    - **fillextern** : boolean (default True) - if True, fillvalue is converted
    to internal value
    - **inplace** : boolean (default True) - if True, the transformation is applied to self
    - **canonical** : boolean (default True) - if True, Field are ordered
    in canonical order

    *Returns* : self or new Dataset'''
    ilis = self if inplace else copy(self)
    if not idxname:
        idxname = ilis.primaryname
    if reindex:
        ilis.reindex()
    keysadd = Cutil.idxfull([ilis.nindex(name) for name in idxname])
    if keysadd and len(keysadd) != 0:
        newlen = len(keysadd[0]) + len(ilis)
        for ind in range(ilis.lenindex):
            ilis._fullindex(ind, keysadd, idxname, varname, newlen,
                            fillvalue, fillextern)
    if canonical:
        ilis.setcanonorder()
    return ilis
Transform a list of indexes into crossed indexes (value extension).
Parameters
- idxname : list of string - name of indexes to transform
- varname : string - name of indexes to use
- reindex : boolean (default False) - if True, set default codec before transformation
- fillvalue : object value used for var extension
- fillextern : boolean(default True) - if True, fillvalue is converted to internal value
- inplace : boolean (default True) - if True, the transformation is applied to self, else to a new Dataset
- canonical : boolean (default True) - if True, Field are ordered in canonical order
Returns : self or new Dataset
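The "crossed indexes" extension above amounts to completing the Cartesian product of the primary-index values. A minimal sketch of that assumed logic, with `idxfull` as a hypothetical helper (not the library's `Cutil.idxfull`):

```python
from itertools import product

def idxfull(columns):
    """Return the value combinations missing from the full cross product."""
    existing = set(zip(*columns))                     # row tuples already present
    domains = [sorted(set(col)) for col in columns]   # distinct values per column
    return [combo for combo in product(*domains) if combo not in existing]

# (2021, 'b') is the only missing (year, category) pair
missing = idxfull([[2020, 2020, 2021], ['a', 'b', 'a']])
```

Appending the missing combinations (with `fillvalue` for the variable fields) is what turns the indexes into a complete grid.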
def getduplicates(self, indexname=None, resindex=None, indexview=None):
    '''check duplicate values in a list of indexes. The result is added in a new
    index or returned.

    *Parameters*

    - **indexname** : list of string (default None) - name of indexes to check
    (if None, all Field)
    - **resindex** : string (default None) - Add a new index named resindex
    with check result (False if duplicate)
    - **indexview** : list of str (default None) - list of fields to return

    *Returns* : list of int - list of rows with duplicate values'''
    if not indexname:
        indexname = self.lname
    duplicates = []
    for name in indexname:
        duplicates += self.nindex(name).getduplicates()
    if resindex and isinstance(resindex, str):
        newidx = self.field([True] * len(self), name=resindex)
        for item in duplicates:
            newidx[item] = False
        self.addindex(newidx)
    dupl = tuple(set(duplicates))
    if not indexview:
        return dupl
    return [tuple(self.record(ind, indexview)) for ind in dupl]
Check duplicate values in a list of indexes. The result is added in a new index or returned.
Parameters
- indexname : list of string (default none) - name of indexes to check (if None, all Field)
- resindex : string (default None) - Add a new index named resindex with check result (False if duplicate)
- indexview : list of str (default None) - list of fields to return
Returns : list of int - list of rows with duplicate values
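Duplicate detection across several fields can be sketched as counting row tuples. This is a generic illustration of the idea, not the library implementation; `duplicate_rows` is a hypothetical helper:

```python
from collections import Counter

def duplicate_rows(columns):
    """Return the row indexes whose value tuple appears more than once."""
    rows = list(zip(*columns))
    counts = Counter(rows)
    return [i for i, row in enumerate(rows) if counts[row] > 1]

# rows 0 and 2 carry the same (1, 'x') pair
dups = duplicate_rows([[1, 2, 1], ['x', 'y', 'x']])
```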
def iscanonorder(self):
    '''return True if primary indexes have canonical ordered keys'''
    primary = self.primary
    canonorder = Cutil.canonorder(
        [len(self.lidx[idx].codec) for idx in primary])
    return canonorder == [self.lidx[idx].keys for idx in primary]
return True if primary indexes have canonical ordered keys
def isinrecord(self, record, extern=True):
    '''Check if record is present in self.

    *Parameters*

    - **record** : list - value for each Field
    - **extern** : if True, compare record values to external representation
    of self.value, else, internal

    *Returns boolean* : True if found'''
    if extern:
        return record in Cutil.transpose(self.extidxext)
    return record in Cutil.transpose(self.extidx)
Check if record is present in self.
Parameters
- record : list - value for each Field
- extern : if True, compare record values to external representation of self.value, else, internal
Returns boolean : True if found
def idxrecord(self, record):
    '''return rec array (without variable) from complete record (with variable)'''
    return [record[self.lidxrow[i]] for i in range(len(self.lidxrow))]
return rec array (without variable) from complete record (with variable)
def keytoval(self, listkey, extern=True):
    '''
    convert a keys list (key for each index) to a values list (value for each index).

    *Parameters*

    - **listkey** : key for each index
    - **extern** : boolean (default True) - if True, compare rec to val else to values

    *Returns*

    - **list** : value for each index'''
    return [idx.keytoval(key, extern=extern) for idx, key in zip(self.lindex, listkey)]
convert a keys list (key for each index) to a values list (value for each index).
Parameters
- listkey : key for each index
- extern : boolean (default True) - if True, compare rec to val else to values
Returns
- list : value for each index
def loc(self, rec, extern=True, row=False):
    '''
    Return record or row corresponding to a list of idx values.

    *Parameters*

    - **rec** : list - value for each idx
    - **extern** : boolean (default True) - if True, compare rec to val,
    else to values
    - **row** : Boolean (default False) - if True, return list of row,
    else list of records

    *Returns*

    - **object** : variable value or None if not found'''
    locrow = None
    try:
        if len(rec) == self.lenindex:
            locrow = list(set.intersection(*[set(self.lindex[i].loc(rec[i], extern))
                                             for i in range(self.lenindex)]))
        elif len(rec) == self.lenidx:
            locrow = list(set.intersection(*[set(self.lidx[i].loc(rec[i], extern))
                                             for i in range(self.lenidx)]))
    except Exception:
        pass
    if locrow is None:
        return None
    if row:
        return locrow
    return [self.record(locr, extern=extern) for locr in locrow]
Return record or row corresponding to a list of idx values.
Parameters
- rec : list - value for each idx
- extern : boolean (default True) - if True, compare rec to val, else to values
- row : Boolean (default False) - if True, return list of row, else list of records
Returns
- object : variable value or None if not found
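The lookup above intersects, per field, the set of rows holding the requested value. A minimal sketch of that strategy on plain columns (`loc_rows` is a hypothetical helper, not the library API):

```python
def loc_rows(columns, rec):
    """Return the sorted rows where every column matches its rec value."""
    row_sets = [{i for i, v in enumerate(col) if v == val}
                for col, val in zip(columns, rec)]
    return sorted(set.intersection(*row_sets))

# only row 0 holds both 1 and 'a'
rows = loc_rows([[1, 1, 2], ['a', 'b', 'a']], [1, 'a'])
```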
def mix(self, other, fillvalue=None):
    '''add other Field not included in self and add other's values'''
    sname = set(self.lname)
    oname = set(other.lname)
    newself = copy(self)
    copother = copy(other)
    for nam in oname - sname:
        newself.addindex({nam: [fillvalue] * len(newself)})
    for nam in sname - oname:
        copother.addindex({nam: [fillvalue] * len(copother)})
    return newself.add(copother, name=True, solve=False)
Add the other's Fields not included in self, then add the other's values.
def merging(self, listname=None):
    '''add a new Field build with Field define in listname.
    Values of the new Field are set of values in listname Field'''
    # self.addindex(Field.merging([self.nindex(name) for name in listname]))
    self.addindex(Sfield.merging([self.nindex(name) for name in listname]))
Add a new Field built from the Fields named in listname. Values of the new Field are the sets of values of the listname Fields.
def orindex(self, other, first=False, merge=False, update=False):
    ''' Add other's index to self's index (with same length)

    *Parameters*

    - **other** : self class - object to add
    - **first** : Boolean (default False) - If True insert indexes
    at the first row, else at the end
    - **merge** : Boolean (default False) - create a new index
    if merge is False
    - **update** : Boolean (default False) - if True, update actual
    values if index name is present (and merge is True)

    *Returns* : none '''
    if len(self) != 0 and len(self) != len(other) and len(other) != 0:
        raise DatasetError("the sizes are not equal")
    otherc = copy(other)
    for idx in otherc.lindex:
        self.addindex(idx, first=first, merge=merge, update=update)
    return self
Add other's index to self's index (with same length)
Parameters
- other : self class - object to add
- first : Boolean (default False) - If True insert indexes at the first row, else at the end
- merge : Boolean (default False) - create a new index if merge is False
- update : Boolean (default False) - if True, update actual values if index name is present (and merge is True)
Returns : none
def record(self, row, indexname=None, extern=True):
    '''return the record at the row

    *Parameters*

    - **row** : int - row of the record
    - **extern** : boolean (default True) - if True, return val record else
    value record
    - **indexname** : list of str (default None) - list of fields to return

    *Returns*

    - **list** : val record or value record'''
    if indexname is None:
        indexname = self.lname
    if extern:
        record = [idx.val[row] for idx in self.lindex]
        # record = [idx.values[row].to_obj() for idx in self.lindex]
        # record = [idx.valrow(row) for idx in self.lindex]
    else:
        record = [idx.values[row] for idx in self.lindex]
    return [record[self.lname.index(name)] for name in indexname]
return the record at the row
Parameters
- row : int - row of the record
- extern : boolean (default True) - if True, return val record else value record
- indexname : list of str (default None) - list of fields to return
Returns
- list : val record or value record
def recidx(self, row, extern=True):
    '''return the list of idx val or values at the row

    *Parameters*

    - **row** : int - row of the record
    - **extern** : boolean (default True) - if True, return val rec else value rec

    *Returns*

    - **list** : val or value for idx'''
    if extern:
        return [idx.values[row].to_obj() for idx in self.lidx]
        # return [idx.valrow(row) for idx in self.lidx]
    return [idx.values[row] for idx in self.lidx]
return the list of idx val or values at the row
Parameters
- row : int - row of the record
- extern : boolean (default True) - if True, return val rec else value rec
Returns
- list : val or value for idx
def recvar(self, row, extern=True):
    '''return the list of var val or values at the row

    *Parameters*

    - **row** : int - row of the record
    - **extern** : boolean (default True) - if True, return val rec else value rec

    *Returns*

    - **list** : val or value for var'''
    if extern:
        return [idx.values[row].to_obj() for idx in self.lvar]
        # return [idx.valrow(row) for idx in self.lvar]
    return [idx.values[row] for idx in self.lvar]
return the list of var val or values at the row
Parameters
- row : int - row of the record
- extern : boolean (default True) - if True, return val rec else value rec
Returns
- list : val or value for var
def setcanonorder(self, reindex=False):
    '''Set the canonical index order : primary - secondary/unique - variable.
    Set the canonical keys order : ordered keys in the first columns.

    *Parameters*

    - **reindex** : boolean (default False) - if True, set default codec after
    transformation

    *Return* : self'''
    order = self.primaryname
    order += self.secondaryname
    order += self.lvarname
    order += self.lunicname
    self.swapindex(order)
    self.sort(reindex=reindex)
    # self.analysis.actualize()
    return self
Set the canonical index order : primary - secondary/unique - variable. Set the canonical keys order : ordered keys in the first columns.
Parameters
- reindex : boolean (default False) - if True, set default codec after transformation
Return : self
def setfilter(self, filt=None, first=False, filtname=FILTER, unique=False):
    '''Add a filter index with boolean values

    - **filt** : list of boolean - values of the filter idx to add
    - **first** : boolean (default False) - If True insert index at the first row,
    else at the end
    - **filtname** : string (default FILTER) - Name of the filter Field added

    *Returns* : self'''
    if not filt:
        filt = [True] * len(self)
    idx = self.field(filt, name=filtname)
    idx.reindex()
    if not idx.cod in ([True, False], [False, True], [True], [False]):
        raise DatasetError('filt is not consistent')
    if unique:
        for name in self.lname:
            if name[:len(FILTER)] == FILTER:
                self.delindex(FILTER)
    self.addindex(idx, first=first)
    return self
Add a filter index with boolean values
- filt : list of boolean - values of the filter idx to add
- first : boolean (default False) - If True insert index at the first row, else at the end
- filtname : string (default FILTER) - Name of the filter Field added
- unique : boolean (default False) - if True, existing filter Fields are deleted before the new one is added
Returns : self
def sort(self, order=None, reverse=False, func=str, reindex=True):
    '''Sort data following the index order and apply the ascending or descending
    sort function to values.

    *Parameters*

    - **order** : list (default None) - new order of index to apply. If None or [],
    the sort function is applied to the existing order of indexes.
    - **reverse** : boolean (default False) - descending if True, ascending if False
    - **func** : function (default str) - parameter key used in the sorted function
    - **reindex** : boolean (default True) - if True, apply a new codec order (key = func)

    *Returns* : self'''
    if not order:
        order = list(range(self.lenindex))
    orderfull = order + list(set(range(self.lenindex)) - set(order))
    if reindex:
        for i in order:
            self.lindex[i].reindex(codec=sorted(
                self.lindex[i].codec, key=func))
    newidx = Cutil.transpose(sorted(Cutil.transpose(
        [self.lindex[orderfull[i]].keys for i in range(self.lenindex)]),
        reverse=reverse))
    for i in range(self.lenindex):
        self.lindex[orderfull[i]].set_keys(newidx[i])
    return self
Sort data following the index order and apply the ascending or descending sort function to values.
Parameters
- order : list (default None) - new order of index to apply. If None or [], the sort function is applied to the existing order of indexes.
- reverse : boolean (default False) - descending if True, ascending if False
- func : function (default str) - parameter key used in the sorted function
- reindex : boolean (default True) - if True, apply a new codec order (key = func)
Returns : self
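The transpose-sort-transpose strategy used by `sort` can be sketched on plain columns: rows are compared as tuples, then each column is rebuilt in the new row order. This is an illustration of the idea only; `sort_columns` is a hypothetical helper:

```python
def sort_columns(columns, reverse=False):
    """Sort rows lexicographically and return the reordered columns."""
    rows = sorted(zip(*columns), reverse=reverse)   # compare rows as tuples
    return [list(col) for col in zip(*rows)]        # transpose back to columns

out = sort_columns([[2, 1, 2], ['b', 'a', 'a']])
```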
def tostdcodec(self, inplace=False, full=True):
    '''Transform all codec in full or default codec.

    *Parameters*

    - **inplace** : boolean (default False) - if True apply transformation
    to self, else to a new Dataset
    - **full** : boolean (default True) - full codec if True, default if False

    *Return Dataset* : self or new Dataset'''
    lindex = [idx.tostdcodec(inplace=False, full=full)
              for idx in self.lindex]
    if inplace:
        self.lindex = lindex
        return self
    return self.__class__(lindex, self.lvarname)
Transform all codec in full or default codec.
Parameters
- inplace : boolean (default False) - if True apply transformation to self, else to a new Dataset
- full : boolean (default True) - full codec if True, default if False
Return Dataset : self or new Dataset
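The difference between the two codec forms can be sketched as follows, under the assumed meaning: a full codec stores one entry per row (keys become 0..n-1), while the default codec stores each distinct value once. `to_full` and `to_default` are hypothetical helpers, not the library API:

```python
def to_full(codec, keys):
    """Expand to one codec entry per row."""
    return [codec[k] for k in keys], list(range(len(keys)))

def to_default(codec, keys):
    """Shrink back to one codec entry per distinct value."""
    newcodec = list(dict.fromkeys(codec[k] for k in keys))
    return newcodec, [newcodec.index(codec[k]) for k in keys]

full_codec, full_keys = to_full(['x', 'y'], [0, 1, 0])
```

The two transformations are inverses: converting a full codec back with `to_default` recovers the original (codec, keys) pair.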
def updateindex(self, listvalue, index, extern=True):
    '''update values of an index.

    *Parameters*

    - **listvalue** : list - index values to replace
    - **index** : integer - index row to update
    - **extern** : if True, the listvalue has external representation, else internal

    *Returns* : none '''
    self.lindex[index].setlistvalue(listvalue, extern=extern)
update values of an index.
Parameters
- listvalue : list - index values to replace
- index : integer - index row to update
- extern : if True, the listvalue has external representation, else internal
Returns : none
def valtokey(self, rec, extern=True):
    '''convert a record list (value or val for each idx) to a key list
    (key for each index).

    *Parameters*

    - **rec** : list of value or val for each index
    - **extern** : if True, the rec value has external representation, else internal

    *Returns*

    - **list of int** : record key for each index'''
    return [idx.valtokey(val, extern=extern) for idx, val in zip(self.lindex, rec)]
convert a record list (value or val for each idx) to a key list (key for each index).
Parameters
- rec : list of value or val for each index
- extern : if True, the rec value has external representation, else internal
Returns
- list of int : record key for each index
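The key/value conversions above rest on each field holding a codec (its distinct values) plus one integer key per row. A minimal sketch of that assumed representation, with `build_field`, `valtokey` and `keytoval` as hypothetical helpers:

```python
def build_field(values):
    """Split a value list into (codec, keys): distinct values + one key per row."""
    codec = list(dict.fromkeys(values))      # distinct values, first-seen order
    keys = [codec.index(v) for v in values]  # integer key per row
    return codec, keys

def valtokey(codec, val):
    """Value -> key, or None when the value is absent (as in Sdataset.valtokey)."""
    return codec.index(val) if val in codec else None

def keytoval(codec, key):
    """Key -> value (inverse direction, as in Sdataset.keytoval)."""
    return codec[key]

codec, keys = build_field(['red', 'blue', 'red'])
```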
Inherited Members
- tab_dataset.dataset_interface.DatasetInterface
- json
- plot
- to_csv
- to_dataframe
- to_file
- to_ntv
- to_xarray
- voxel
- view
- vlist
- tab_dataset.cdataset.Cdataset
- indexlen
- iindex
- keys
- lenindex
- lunicname
- lunicrow
- lname
- tiindex
- ntv
- from_ntv
- add
- to_analysis
- reindex
- delindex
- nindex
- renameindex
- reorder
- setname
- swapindex
- check_relation
- check_relationship
- tab_dataset.cdataset.DatasetAnalysis
- analysis
- anafields
- partitions
- complete
- dimension
- lvarname
- primaryname
- secondaryname
- indexinfos
- field_partition
- relation
- tree
- indicator
class Ndataset(Sdataset):
    # %% Ndataset
    '''
    `Ndataset` is a child class of Cdataset where internal values are NTV entities.

    All the methods are the same as `Sdataset`.
    '''
    field_class = Nfield
Ndataset is a child class of Cdataset where internal values are NTV entities.
All the methods are the same as Sdataset.
Inherited Members
- Sdataset
- Sdataset
- from_csv
- from_file
- merge
- ext
- consistent
- extidx
- extidxext
- idxname
- idxlen
- iidx
- lenidx
- lidx
- lisvar
- lvar
- lvarrow
- lidxrow
- primary
- secondary
- setidx
- zip
- addindex
- append
- applyfilter
- coupling
- delrecord
- full
- getduplicates
- iscanonorder
- isinrecord
- idxrecord
- keytoval
- loc
- mix
- merging
- orindex
- record
- recidx
- recvar
- setcanonorder
- setfilter
- sort
- tostdcodec
- updateindex
- valtokey
- tab_dataset.dataset_interface.DatasetInterface
- json
- plot
- to_csv
- to_dataframe
- to_file
- to_ntv
- to_xarray
- voxel
- view
- vlist
- tab_dataset.cdataset.Cdataset
- indexlen
- iindex
- keys
- lenindex
- lunicname
- lunicrow
- lname
- tiindex
- ntv
- from_ntv
- add
- to_analysis
- reindex
- delindex
- nindex
- renameindex
- reorder
- setname
- swapindex
- check_relation
- check_relationship
- tab_dataset.cdataset.DatasetAnalysis
- analysis
- anafields
- partitions
- complete
- dimension
- lvarname
- primaryname
- secondaryname
- indexinfos
- field_partition
- relation
- tree
- indicator