wiki:Dev/Technical/DB

Version 71 (modified by dgalimbe, 3 years ago)

Under Construction

1. Introduction

1.1. What is the ZODB

2. Related Modules

2.1. Persistent Mapping

2.2. Persistent List

2.3. BTrees

3. Persistency

With persistency, the programmer doesn't have to worry about the life cycle of objects. A persistent object is an object which, once created, is stored in the Zope object database (ZODB) with its own OID, a unique persistent object identifier. If the value of an attribute is changed, what is stored in the ZODB is a new, complete serialization of the class instance (including, among other things, all of its attributes that are not themselves persistent objects), under the same OID. The change is detected because attribute assignment goes through the __setattr__ method defined in the Persistent class.

In theory every object can be persistent, but sometimes it is useful to keep some attributes non-persistent, i.e. volatile. Such attributes are used by a class for its internal computations and do not need to be stored in the database.

The name of a volatile attribute has to start with _v_:

def __init__(self, id):
    self._v_config = self.getSomeTmpConfig()

When a persistent object is retrieved from the ZODB, its __init__ method is not called again, so the programmer has to make sure that all volatile attributes are given a value. This can be done in the __setstate__ method, which is called each time the object is loaded into memory.

from Globals import Persistent

def __setstate__(self, state):
    # Restore the stored attributes first, then rebuild the volatile one.
    Persistent.__setstate__(self, state)
    self._v_config = self.getSomeTmpConfig()
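The effect can be reproduced without a ZODB at hand. The following is a minimal sketch using only the standard library. The class Settings and the helpers save and load are hypothetical: they mimic what a storage does (store the state, then rebuild the instance without calling __init__), and the __getstate__ shown here imitates the way Persistent leaves _v_ attributes out of the stored state. It illustrates the mechanism and is not ZODB code.

```python
import pickle


class Settings(object):
    """Toy class with one stored attribute and one volatile attribute."""

    def __init__(self, name):
        self.name = name
        self._v_config = self.getSomeTmpConfig()

    def getSomeTmpConfig(self):
        # Derived data that can always be recomputed from stored attributes.
        return {"label": self.name.upper()}

    def __getstate__(self):
        # Imitates Persistent: attributes starting with _v_ are not stored.
        return dict((key, value) for key, value in self.__dict__.items()
                    if not key.startswith("_v_"))

    def __setstate__(self, state):
        self.__dict__.update(state)
        # __init__ is not called on load, so rebuild the volatile attribute.
        self._v_config = self.getSomeTmpConfig()


def save(obj):
    # Hypothetical helper: serialize only the state, as a storage would.
    return pickle.dumps(obj.__getstate__())


def load(cls, data):
    # Hypothetical helper: create the instance without calling __init__.
    obj = cls.__new__(cls)
    obj.__setstate__(pickle.loads(data))
    return obj


stored = save(Settings("indico"))
restored = load(Settings, stored)
print(restored._v_config)
```

Removing the last line of __setstate__ makes the final print fail with an AttributeError, which is the situation the paragraph above warns about.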

As already mentioned, the serialization process serializes the object together with (recursively) all its attributes that are not persistent. This means that a simple modification of an instance causes the entire object to be copied. Careful thought is therefore needed to decide which attributes should be persistent and which should not.
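To get a feeling for the cost, the two designs can be compared with plain pickle. This is only a rough sketch under stated assumptions: record_size is a hypothetical stand-in for the size of one ZODB data record, the item data is invented, and integers play the role of OIDs.

```python
import pickle


def record_size(state):
    # Size in bytes of one serialized record (stand-in for a ZODB record).
    return len(pickle.dumps(state, 2))


# 1000 small items (invented sample data).
items = [{"id": i, "title": "item %d" % i} for i in range(1000)]

# Design A: the items are plain objects, so they are serialized
# inside the record of the container that holds them.
inline_container = {"items": items}

# Design B: the items are persistent, so the container only stores
# references to them (integers stand in for OIDs here).
referencing_container = {"items": list(range(1000))}

# Bytes written again when a single item changes:
bytes_rewritten_a = record_size(inline_container)  # the whole container
bytes_rewritten_b = record_size(items[0])          # only the changed item
print(bytes_rewritten_a, bytes_rewritten_b)
```

In design A every small change rewrites all the items; in design B it rewrites one item, at the price of one record (and one header) per item.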

4.1 When to inherit from Persistent

When implementing a class whose instances will often be modified, you should make it persistent, to avoid rewriting the container at each change.

4.2 When not to inherit from Persistent

Do not inherit from Persistent when implementing a class whose instances are, and will remain, small compared to the size of the ZODB object header (which is basically the class name). Only reading the pickle from the ZODB can tell you whether an object is small. Making such a class persistent hurts information density: the ZODB will contain more object header data than actual object payload.
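The notion of information density can be approximated with a few lines of Python. In this sketch the header is approximated by the pickled (module, class name) pair, which is an assumption made for illustration, and the class path mypackage.models.Flag is hypothetical.

```python
import pickle


def payload_ratio(module, class_name, state):
    """Fraction of a record taken by the state rather than by the header."""
    header = len(pickle.dumps((module, class_name), 2))
    payload = len(pickle.dumps(state, 2))
    return float(payload) / (header + payload)


# A tiny state is dwarfed by its header; a large one is not.
small = payload_ratio("mypackage.models", "Flag", {"on": True})
large = payload_ratio("mypackage.models", "Flag", {"text": "x" * 5000})
print(small, large)
```

A low ratio suggests that the object is better kept as a plain attribute of its container.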

5. Transactions

The ZODB is transactional.

Transactions in a database environment have two main purposes:

1. To provide reliable units of work that allow correct recovery from failures and keep a database 
consistent even in cases of system failure, when execution stops (completely or partially) and many
operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. Without isolation the 
programs' outcomes are possibly erroneous.
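Both purposes can be illustrated on an ordinary dictionary. The sketch below is a toy model, not the ZODB transaction machinery: unit_of_work is a hypothetical helper and the accounts data is invented. Changes are made on a private copy and published all at once, or not at all if an exception interrupts the work.

```python
import copy
from contextlib import contextmanager


@contextmanager
def unit_of_work(database):
    """Apply every change made inside the block, or none of them."""
    working_copy = copy.deepcopy(database)  # isolated from other readers
    yield working_copy
    # Reached only if the block raised no exception: publish everything.
    database.clear()
    database.update(working_copy)


accounts = {"a": 100, "b": 0}

# A complete unit of work: both changes become visible together.
with unit_of_work(accounts) as tx:
    tx["a"] -= 30
    tx["b"] += 30

# A unit of work interrupted half-way: nothing is published.
try:
    with unit_of_work(accounts) as tx:
        tx["a"] -= 500
        raise ValueError("simulated failure")
except ValueError:
    pass

print(accounts)
```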

The data.fs file, the ZODB's FileStorage, is roughly a big pile of transactions. Every time an object (or a small set of objects, TODO) is modified, a connection to the ZODB is opened and a new transaction, containing all the modified serialized instances, is put on top of the pile. Simply retrieving an object does not create a new transaction. As a consequence of this implementation choice, the whole history of each object is stored in the database.
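The "pile of transactions" idea can be modelled in a few lines. AppendOnlyLog is a hypothetical toy class written for this page: the real FileStorage is a binary file and keeps an index instead of scanning, but the append-only behaviour is the same in spirit.

```python
class AppendOnlyLog(object):
    """Toy model of a FileStorage: a list of transactions that only grows."""

    def __init__(self):
        self.transactions = []  # oldest first, newest last

    def commit(self, changes):
        # changes maps an oid to its new serialized state;
        # one commit adds exactly one transaction on top of the pile.
        self.transactions.append(dict(changes))

    def load(self, oid):
        # Reading looks for the newest state and never adds a transaction.
        for changes in reversed(self.transactions):
            if oid in changes:
                return changes[oid]
        raise KeyError(oid)

    def history(self, oid):
        # Every past state is still there, oldest first.
        return [changes[oid] for changes in self.transactions
                if oid in changes]


log = AppendOnlyLog()
log.commit({1: "title=Draft"})
log.commit({1: "title=Final", 2: "room=A"})
current = log.load(1)
print(current, log.history(1))
```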

6. Tools

6.1 Stats on Objects

This script returns the list of the n biggest objects present in the ZODB. Setting the 'n' option is strongly recommended, because it greatly improves both the performance of the script and the readability of the output.

Usage:

python objects_stats.py -f data.fs -n 100

If one wants to save the output list in a file results.out simply use redirection:

python objects_stats.py -f data.fs -n 100 > results.out

Options:

  -h, --help            show this help message and exit
  -n NUM, --number=NUM  display only the n biggest objects
  -f FILENAME, --file=FILENAME
                        the FileStorage

Output:

#The list is ordered by decreasing size
Module.ClassName | Oid | Percentage | Size

# - Oid: the object identification 
# - Percentage: the percentage that the class instances take of the entire database
# - Size: the size of the object
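For readers who want to post-process similar data themselves, the selection of the n biggest objects can be sketched with heapq. This is not the implementation of objects_stats.py: the function n_biggest, the sample records and the choice to compute the percentage as the object's share of the total size are all assumptions.

```python
import heapq


def n_biggest(records, n):
    """records: iterable of (class_name, oid, size) tuples.

    Returns one formatted row per object, largest first.
    """
    records = list(records)
    total = float(sum(size for _, _, size in records))
    top = heapq.nlargest(n, records, key=lambda record: record[2])
    return ["%s | %s | %.1f%% | %d" % (cls, oid, 100 * size / total, size)
            for cls, oid, size in top]


# Invented sample data.
sample = [("pkg.Conference", "0x01", 600),
          ("pkg.Session", "0x02", 300),
          ("pkg.Note", "0x03", 100)]
rows = n_biggest(sample, 2)
for row in rows:
    print(row)
```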

6.2 Stats on Classes

This script returns a list of the n classes ranked by the total size of their instances. For each class, the output shows the class name, the percentage of space that it takes, and the minimum, maximum and average object size.

Usage:

python class_stats.py -f data.fs -n 100   

If one wants to save the output list in a file results.out simply use redirection:

python class_stats.py -f data.fs -n 100 > results.out

Options:

  -h, --help            show this help message and exit
  -n NUM, --number=NUM  display only the n biggest classes
  -f FILENAME, --file=FILENAME
                        your FileStorage

Output:

#The list is ordered by decreasing size
Module.ClassName | Percentage | Min | Max | Size

# - Percentage: the percentage that the class instances take of the entire database
# - Min: the min instance's size
# - Max: the max instance's size
# - Size: total size of all instances of the class ClassName
# - Objects: The total number of instances of that class 
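The per-class figures listed above can be derived from a flat list of (class name, size) pairs. The following is a sketch of that aggregation, not the code of class_stats.py; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def class_stats(records):
    """records: iterable of (class_name, size) pairs.

    Returns one dict per class, sorted by decreasing total size.
    """
    sizes = defaultdict(list)
    for class_name, size in records:
        sizes[class_name].append(size)
    grand_total = float(sum(sum(values) for values in sizes.values()))
    rows = []
    for class_name, values in sizes.items():
        total = sum(values)
        rows.append({
            "class": class_name,
            "percentage": 100 * total / grand_total,
            "min": min(values),
            "max": max(values),
            "size": total,
            "objects": len(values),
        })
    return sorted(rows, key=lambda row: row["size"], reverse=True)


# Invented sample data.
stats = class_stats([("pkg.A", 10), ("pkg.A", 30), ("pkg.B", 60)])
for row in stats:
    print(row)
```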

6.4 Stats on Transactions

This script outputs the list of the 'n' busiest days in terms of number of transactions. For each day, the output shows the number of transactions and the average time between two of them. To look up a specific day, use the option -d dd-mm-yyyy.

Usage:

python transactions_stats.py -f data.fs -n 100

If one wants to save the output list in a file results.out simply use redirection:

python transactions_stats.py -f data.fs -n 100 > results.out

If one wants to show statistics for one precise date (e.g. 01-01-2010):

python transactions_stats.py -f data.fs -d 01-01-2010

Options:

  -h, --help            show this help message and exit
  -n NUM, --number=NUM  display only the n busiest days
  -f FILENAME, --file=FILENAME
                        your FileStorage
  -d DATE, --date=DATE  show the transactions only for the date d (format
                        dd-mm-yyyy)

Output:

#The list is ordered by decreasing number of transactions
Date | Transactions | Average interval

# - Transactions: the number of transactions made during the day Date
# - Average interval: average time between two transactions in that day
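The two figures reported per day can be computed from a list of commit times as follows. This is a sketch under assumptions, not the code of the script: busiest_days and parse_day are hypothetical names, commit times are taken to be datetime objects, and the average interval is expressed in seconds.

```python
from collections import defaultdict
from datetime import datetime


def busiest_days(timestamps, n):
    """timestamps: commit times as datetime objects.

    Returns up to n tuples (date, transactions, average_interval_seconds),
    busiest day first.
    """
    per_day = defaultdict(list)
    for timestamp in timestamps:
        per_day[timestamp.date()].append(timestamp)
    rows = []
    for day, times in per_day.items():
        times.sort()
        if len(times) > 1:
            span = (times[-1] - times[0]).total_seconds()
            average = span / (len(times) - 1)
        else:
            average = None  # undefined with a single transaction
        rows.append((day, len(times), average))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


def parse_day(text):
    # Parses a date given in the dd-mm-yyyy format used by the -d option.
    return datetime.strptime(text, "%d-%m-%Y").date()


# Invented sample data.
sample = [datetime(2010, 1, 1, 10, 0), datetime(2010, 1, 1, 10, 10),
          datetime(2010, 1, 1, 10, 30), datetime(2010, 1, 2, 9, 0)]
top = busiest_days(sample, 1)
print(top)
```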

6.3 ZODB Viewer

This script provides a graphical tree view of the content of the database. It is possible to browse through OOBTree, IOBTree and dictionary objects. It is useful for small databases, or for a bunch of transactions belonging to a bigger one (e.g. the Indico ZODB).

Usage:

sudo python zodb_viewer.py

Then select File -> Open and choose your data.fs FileStorage.

6.5 Show information for the last 'n' transactions

This script outputs the list of the last 'n' transactions present in the FileStorage. For each of them, the list of objects that it modified is also shown. By default the list is ordered by decreasing date, but when the '-o' option is set the last 'n' transactions are ordered by size.

Usage:

python last_transactions_information.py -n 100 data.fs 

If one wants to save the output list in a file results.out simply use redirection:

python last_transactions_information.py -n 100 data.fs  > results.out

Options:

  -h, --help            show this help message and exit
  -n NUM, --number=NUM  display the last n transactions (Default 100)
  -o, --order           order the transactions by size

Output:

#By default the list is ordered by decreasing date
TRANSACTION: ${Tid} ${Size}
#The list of objects modified by the transaction
 - ${counter1} ${Object1} ${Object1Size}
 - ${counter2} ${Object2} ${Object2Size}
...

TRANSACTION: ${Tid} ${Size}
...

# - Tid: the transaction id, TimeStamp string holding the time at which the transaction was committed. 
# - Size: the size of the transaction
# - counter: the number of objects belonging to the same class that have been modified by this transaction
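The grouping by class described in the legend can be sketched with collections.Counter. The function below is hypothetical and is not taken from last_transactions_information.py; in particular, printing the summed size of all modified objects of a class next to the counter is an assumption about what the size column means. The tid value is the example used elsewhere on this page; the class names and sizes are invented.

```python
from collections import Counter


def summarize_transaction(tid, modified):
    """modified: list of (class_name, size) for the objects one
    transaction wrote. Returns the lines of a report for that transaction.
    """
    counts = Counter(class_name for class_name, _ in modified)
    sizes = Counter()
    for class_name, size in modified:
        sizes[class_name] += size
    lines = ["TRANSACTION: %s %d" % (tid, sum(sizes.values()))]
    for class_name, count in counts.most_common():
        lines.append(" - %d %s %d" % (count, class_name, sizes[class_name]))
    return lines


report = summarize_transaction(
    "2010-04-12 14:16:12.121825",
    [("pkg.Session", 120), ("pkg.Session", 80), ("pkg.Conference", 500)])
for line in report:
    print(line)
```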

6.6 Show details for one single transaction

Once one has the 'tid' (found with the script "Show information for the last 'n' transactions" in §6.5), this script shows in detail every object that the transaction modified.

Usage:

In order to use this script one has to do two things:

Add Indico to the PYTHONPATH:

export PYTHONPATH=${PYTHONPATH}:/home/davide/indico/cds-indico/indico

Change the permissions of the file "indico.log":

sudo chmod a+w /home/davide/indico/log/indico.log

And now the script can be run:

python showTransactionDetails.py -t '2010-04-12 14:16:12.121825' data.fs 

Options:

  -h, --help         show this help message and exit
  -t TID, --tid=TID  the tid of the transaction to look up

6.7 Find path from oid

Sometimes the oid by itself is meaningless. This script returns the list of objects encountered on the path from the root to the searched object in the ZODB.

Usage:

python last_transactions_information.py -n 100 data.fs 
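Finding such a path amounts to a search in the graph of object references. The sketch below uses a breadth-first search over a plain dictionary; find_path and the sample graph are hypothetical, and extracting the references from the real database is deliberately left out.

```python
from collections import deque


def find_path(references, root, target):
    """references maps an oid to the list of oids it points to.

    Returns the list of oids from root to target, or None if the
    target cannot be reached from the root.
    """
    parents = {root: None}
    queue = deque([root])
    while queue:
        oid = queue.popleft()
        if oid == target:
            # Walk back through the parents to rebuild the path.
            path = []
            while oid is not None:
                path.append(oid)
                oid = parents[oid]
            return path[::-1]
        for child in references.get(oid, ()):
            if child not in parents:
                parents[child] = oid
                queue.append(child)
    return None


# Invented reference graph; 0 plays the role of the root object.
graph = {0: [1, 2], 1: [3], 2: [3, 4], 4: [5]}
path = find_path(graph, 0, 5)
print(path)
```

Because the search is breadth-first, the path returned is one of the shortest ones when the object can be reached in several ways.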

6.8 Consistency Checker

The fstest tool will scan all the data in a FileStorage and report an error if it finds any corrupt transaction data. The tool will print a message when the first error is detected, then exit.

The tool accepts one or more -v arguments. If a single -v is used, it will print a line of text for each transaction record it encounters. If two -v arguments are used, it will also print a line of text for each object. The objects for a transaction will be printed before the transaction itself.

Note: It does not check the consistency of the object pickles. It is possible for the damage to occur only in the part of the file that stores object pickles. Those errors will go undetected.

Usage:

python fstest.py [-v] data.fs

10 References

Some links
