Justifying the need for a data warehouse
Let us consider the main reasons that compel enterprises to implement data warehouse technology. In the literature these reasons are very often confused with the "secondary benefits" that the technology provides. Promotional brochures on data warehouses invariably say that they are used for "transforming data for business analysis", that they "help base decisions on facts rather than intuition" and "let you get to know the customer better", and, of course, a phrase about achieving "competitive advantage" is inserted everywhere. But in 99% of cases a data warehouse is only a first step toward all these far-reaching goals.
Now let us list what a company may need a data warehouse for:
To run the server and disk workloads associated with queries and reports on servers and disks that are not used by transaction processing systems (OLTP, online transaction processing)
Most firms try to tune their transaction processing systems so that all operations complete in acceptable time. Reports and queries, however, require far more of the limited resources than transaction processing does; when they run on the same servers and disks, they get in the way of timely transaction execution. Moreover, when queries and reports run on servers and disks allocated to transaction processing, resource management can become complicated, and the desired query response time is unlikely to be achieved. For this reason it is recommended to implement a data warehouse architecture that uses separate servers and disks for queries and reports. This makes acceptable transaction processing times achievable and is sensible from both the financial and the organizational point of view.
To use data models and/or server technologies that speed up queries and reports but are not intended for transaction processing
There are data modeling methods that substantially reduce the execution time of queries and reports (for example, the star schema) but are not intended for transaction processing, because the techniques underlying them slow down and complicate OLTP processes. In addition, some server technologies speed up queries and reports but slow down transaction processing (for example, bitmap indexing), while others do the opposite (for example, transaction recovery). The effect of a particular modeling method or server technology also varies from vendor to vendor and depends on the situation in which it is applied.
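As a rough illustration of the star schema mentioned above, the following sketch builds one fact table and two dimension tables in an in-memory SQLite database and runs a typical reporting query. All table names, column names, and sample rows (fact_sales, dim_product, dim_date) are hypothetical and chosen only for illustration; a real warehouse would differ in scale and detail.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table referencing two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            amount     REAL
        );
    """)
    # Hypothetical sample data.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "books"), (2, "music")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 1997, 1), (2, 1997, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 10.0), (1, 1, 4.0), (1, 2, 5.0), (2, 1, 7.5)])


def sales_by_category_and_month(conn):
    """Typical report: join the fact table to its dimensions and aggregate."""
    return conn.execute("""
        SELECT p.category, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
for row in sales_by_category_and_month(conn):
    print(row)
```

The report needs only one join per dimension, which is what makes this layout convenient for queries; the same layout is awkward for frequent small updates, which is why it is kept out of the OLTP system.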
To provide an environment in which writing and maintaining queries and reports does not require extensive knowledge of database technology, and to provide tools that let technical specialists write and maintain queries and reports faster
A data warehouse is often set up so that simple queries and reports can be written without any serious technical knowledge. Nevertheless, such users still run into difficulties and have to turn to information systems staff for help. The latter will probably also find it more convenient to work with the warehouse. It should be noted that running reports and queries in a data warehouse reduces the number of bureaucratic procedures, which also raises the productivity of technical staff.
To create a repository of "cleaned" transaction processing data and then report on that data without changing the OLTP system itself
The types of errors that must be removed to "clean" the data are described in the short article "An informal taxonomy of data warehouse data errors". A warehouse makes it possible to clean the data of transaction processing systems without changing those systems. Note, however, that some implementations provide for recording the corrections and then transferring them back to the OLTP systems. Sometimes this way of correcting errors is more convenient than changing the transaction processing system directly.
To simplify queries and reports on data from several transaction processing systems, from external data sources, and/or on data that is stored only for reporting
To report on data from several of an organization's systems, people have long had to write special data extraction procedures, run sort and merge operations, and then produce reports from the sorted (and/or merged) extracts. In many cases this strategy is perfectly adequate. But if the organization holds a large volume of data that needs frequent sorting (merging) as well as cleaning, a data warehouse is the better approach.
To create a repository of OLTP data that holds long-term information which is inefficient to keep in the transaction processing system, or to generate reports that reflect the situation in previous periods
To avoid slowing down operations, old data is often removed from transaction processing systems. For reporting and queries, however, it makes sense to keep this information in a warehouse, where response time is less critical. Reports on past periods are often difficult to produce, or even impossible. Suppose, for example, that you need the salaries of grade-three employees under the pay scale as of the start of each month of 1997. This turns out to be impossible, because the database stores only each employee's current grade. For problems of this kind it is convenient to use a data warehouse that supports so-called "slowly changing dimensions".
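One common way to support a slowly changing dimension is to keep every past version of a record together with its validity period (often called a "Type 2" dimension). The sketch below assumes that variant; the GradeVersion class, the function names, and the sample employee and dates are all hypothetical. It shows how the grade of an employee at the start of each month of 1997 can be recovered once history is kept.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class GradeVersion:
    employee_id: int
    grade: int
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def change_grade(history: List[GradeVersion], employee_id: int,
                 new_grade: int, change_date: date) -> None:
    """Close the employee's current version and append a new one."""
    for version in history:
        if version.employee_id == employee_id and version.valid_to is None:
            version.valid_to = change_date
    history.append(GradeVersion(employee_id, new_grade, change_date))


def grade_on(history: List[GradeVersion], employee_id: int,
             day: date) -> Optional[int]:
    """Return the grade that was valid on `day`, or None if unknown."""
    for version in history:
        if (version.employee_id == employee_id
                and version.valid_from <= day
                and (version.valid_to is None or day < version.valid_to)):
            return version.grade
    return None


# Hypothetical history for employee 7.
history: List[GradeVersion] = []
change_grade(history, 7, 2, date(1996, 5, 1))
change_grade(history, 7, 3, date(1997, 3, 1))
change_grade(history, 7, 4, date(1997, 10, 1))

grades_1997 = [grade_on(history, 7, date(1997, month, 1)) for month in range(1, 13)]
print(grades_1997)
```

An OLTP table that overwrites the grade in place can answer only the question "what is the grade now"; the versioned rows make the historical report possible at the cost of extra storage.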
To keep people who use OLTP data only for reports and queries away from the system's database and the program logic that manages it
Here the main goal is protecting information. For example, if an organization offers reporting and querying over the Internet, it makes sense to use a data warehouse.
Some firms build data warehouses for all of the reasons described above; for others, one of them is enough.
This is not to say that implementing data warehouse technology does not pursue commercial goals. Those goals, however, can be reached only by solving one or more of the problems listed above.
A close look at these reasons makes it clear that the need for a data warehouse often stems from shortcomings in transaction processing systems. Such limitations, however, are not inherent in all systems of this class, and they are not always critical.
In conclusion, let us repeat what was said above. To realize the potential of business intelligence, to obtain more detailed information about customers, and to gain real "competitive advantage", it is not enough for an organization simply to build a data warehouse. It must also solve (usually by trial and error) the no less difficult problem of making the best use of the warehouse and then changing its business practices.
How to keep a data warehouse from turning into a garbage dump
When data warehouses are built, too little attention is paid to cleaning the information that goes into them. Apparently the assumption is that the bigger the warehouse, the better. This is bad practice and the surest way to turn a data warehouse into a garbage dump. Data must be cleaned. The information is heterogeneous and is collected from various sources, and having many collection points makes cleaning all the more important.
By and large, errors will always occur, and getting rid of them completely is not possible. Sometimes it may make more sense to live with them than to spend money and time eliminating them. In general, though, one should try by whatever means to reduce the number of errors to an acceptable level. The methods used for analysis are imprecise enough already, so why make matters worse?
There is also a psychological side to the problem. If analysts are not confident in the figures they get from the data warehouse, they will try not to use them and will rely on data from other sources. What, then, is such a warehouse for?
Types of errors
We will not consider errors such as type mismatches or differences in input formats and encodings, that is, cases where information arrives from different sources that use different conventions to denote the same fact. A typical example is the recording of a person's sex: in one place it is coded as M/F, in another as 1/0, in a third as True/False. Such errors are handled by defining recoding and type conversion rules. Problems of this kind are, one way or another, solved today. We are interested in higher-order problems that cannot be solved by such elementary means.
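For completeness, a recoding rule of the kind just described can be as simple as a lookup table. The sketch below is a minimal example that maps letter codes, 1/0, and True/False to a single convention. Which numeric or boolean value stands for which sex is an assumption made here for illustration; in practice it must be taken from each source system's documentation.

```python
# Assumed conventions (hypothetical): 1 / True mean male, 0 / False mean female.
RECODING_RULES = {
    "m": "M", "f": "F",
    "1": "M", "0": "F",
    "true": "M", "false": "F",
}


def normalize_sex(value):
    """Map a source-specific code to 'M' or 'F'; return None if unrecognized."""
    key = str(value).strip().lower()
    return RECODING_RULES.get(key)


for raw in ["M", "f", 1, 0, True, False, "unknown"]:
    print(repr(raw), "->", normalize_sex(raw))
```

Unrecognized codes are returned as None rather than guessed, so that they can be reported and handled separately.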
There are many such kinds of errors, including some that are specific to a particular subject area or task. But let us consider those that do not depend on the task:
1. Contradictory information;
2. Missing values;
3. Anomalous values;
4. Noise;
5. Data entry errors.
There are well-established methods for each of these problems. Errors can of course be corrected by hand, but with large volumes of data this becomes quite difficult. We will therefore consider ways of solving these problems automatically, with minimal human involvement.
Contradictory information
First one must decide what exactly counts as a contradiction. Strange as it may seem, this is not a trivial task. For example, in Russia a pension card must be replaced when the holder's surname, first name, patronymic, or sex changes. So a person can be born a woman and retire as a man, and there is no contradiction!
Once we have decided what counts as a contradiction and have found the contradictions, there are several options.
1. When several contradictory records are found, delete them. The method is simple and therefore easy to implement. Sometimes it is quite sufficient. The important thing is not to overdo it, or we may throw the baby out with the bathwater.
2. Correct the contradictory data. One can calculate the probability of each of the contradictory events and choose the most probable one. This is the most competent and correct way of handling contradictions.
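The two options above can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes that records are dictionaries, that a contradiction means "the same key with different values of one field", and that the probability of each value is estimated simply by its relative frequency among the conflicting records. The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def drop_contradictory(records, key, field):
    """Option 1: delete every record whose key has more than one distinct value."""
    values_per_key = defaultdict(set)
    for record in records:
        values_per_key[record[key]].add(record[field])
    return [r for r in records if len(values_per_key[r[key]]) == 1]


def resolve_contradictions(records, key, field):
    """Option 2: for each key keep the most frequent value and its share.

    Relative frequency is used as a crude probability estimate (an assumption).
    Ties go to the value seen first, which is an arbitrary choice.
    """
    values_per_key = defaultdict(list)
    for record in records:
        values_per_key[record[key]].append(record[field])
    resolved = {}
    for k, values in values_per_key.items():
        value, count = Counter(values).most_common(1)[0]
        resolved[k] = (value, count / len(values))
    return resolved


# Hypothetical sample: customer 1 has conflicting city values.
records = [
    {"customer": 1, "city": "Moscow"},
    {"customer": 1, "city": "Moscow"},
    {"customer": 1, "city": "Moskow"},
    {"customer": 2, "city": "Kazan"},
]

print(drop_contradictory(records, "customer", "city"))
print(resolve_contradictions(records, "customer", "city"))
```

Note how the first option discards all three records for customer 1, including the two that agree, which is exactly the risk of overdoing deletion; the second option keeps the customer and records how strongly the chosen value is supported.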
