Missing data


This is a very serious problem and a scourge for most data warehouses. Most forecasting methods assume that data arrive in a uniform, steady stream. In practice this is extremely rare. As a result, forecasting, one of the most in-demand applications of data warehouses, ends up implemented poorly or with significant limitations. The following methods can be used to deal with the problem:


1. Approximation and extrapolation. If there is no data at some point, we take its neighborhood, calculate the value at that point using known formulas, and add the corresponding record to the warehouse. This works well for ordered data, for example daily product sales.

2. Determining the most plausible value. Here we take not the neighborhood of the point but all of the data. This method is applied to unordered information, i.e. the case where we cannot determine what the neighborhood of the point in question is.
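The two approaches above can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: it assumes gaps are marked with None, uses linear interpolation as the "known formula", and takes the median (for numbers) or the most frequent value (otherwise) as the "most plausible" one. The function names and sample data are hypothetical.

```python
from statistics import median, mode


def fill_by_interpolation(values):
    """Fill None gaps in an ordered series by linear interpolation between
    the nearest known neighbours; gaps at the edges copy the nearest value."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return filled
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def fill_by_most_plausible(values):
    """Fill None gaps in unordered data using all known values:
    the median for numeric data, the most frequent value otherwise."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)
    numeric = all(isinstance(v, (int, float)) for v in known)
    plausible = median(known) if numeric else mode(known)
    return [plausible if v is None else v for v in values]


# Ordered data: daily sales with missing days.
daily_sales = [100, None, 120, None, None, 150]
print(fill_by_interpolation(daily_sales))

# Unordered data: a categorical field with a missing entry.
regions = ["north", None, "south", "north"]
print(fill_by_most_plausible(regions))
```

Interpolation only makes sense when the order of the records carries meaning; for unordered records the second function ignores position entirely.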



Anomalous values


Quite often there are events that stand out sharply from the general picture. It is best to adjust such values. The reason is that forecasting tools know nothing about the nature of the underlying processes, so any anomaly is treated as a perfectly normal value. This badly distorts the picture of the future: any random failure or success is taken for a pattern.


There is a way to deal with this problem too: robust estimates. These are methods that are resistant to strong disturbances. We evaluate the available data, and for every value that falls outside the permissible bounds we take one of the following actions:


1. The value is removed;

2. The value is replaced with the nearest boundary value.
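As an illustration of both actions, here is a minimal sketch. The article does not say how the permissible bounds are chosen; this example assumes one common robust choice, the median plus or minus k scaled median absolute deviations (MAD). The function names, the factor k=3 and the sample data are assumptions made for the example.

```python
from statistics import median


def robust_bounds(values, k=3.0):
    """Permissible bounds as median +/- k * scaled MAD.
    The factor 1.4826 makes the MAD comparable to a standard deviation
    for normally distributed data."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    spread = 1.4826 * mad
    return med - k * spread, med + k * spread


def drop_outliers(values, k=3.0):
    """Action 1: remove every value outside the bounds."""
    lo, hi = robust_bounds(values, k)
    return [v for v in values if lo <= v <= hi]


def clip_outliers(values, k=3.0):
    """Action 2: replace every out-of-bounds value with the nearest bound."""
    lo, hi = robust_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]


sales = [10, 12, 11, 13, 12, 95, 11]  # 95 is a one-off spike
print(drop_outliers(sales))
print(clip_outliers(sales))
```

Unlike the mean and standard deviation, the median and MAD are barely affected by the spike itself, which is why the bounds stay tight. Note that if more than half of the values are identical the MAD is zero and the bounds collapse to the median, so real data may need a fallback.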



Noise


During analysis we almost always run into noise. Noise carries no useful information and only makes it harder to see the picture clearly. There are several methods for dealing with it.


1. Spectral analysis. With it we can cut off the high-frequency components of the data, in other words the frequent and insignificant fluctuations around the main signal. By changing the width of the spectrum, we can choose what kind of noise to remove.

2. Autoregressive methods. This fairly widespread approach is actively used in time series analysis. It comes down to finding a function that describes the process plus noise. The noise can then be removed, leaving the main signal.
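Both ideas can be illustrated with a small sketch. It is deliberately simple and rests on assumptions not made in the article: the spectral part uses a naive discrete Fourier transform (fine for short series, slow for long ones) and zeroes every component above a chosen cutoff, and the autoregressive part fits only the simplest first-order model x[t] = a * x[t-1] + noise by least squares. All names and the sample signal are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def low_pass(x, keep):
    """Zero every component whose frequency exceeds `keep` cycles per series
    length. A larger `keep` means a wider spectrum and less noise removed."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        freq = min(k, n - k)  # bins k and n - k hold the same frequency
        if freq > keep:
            spectrum[k] = 0
    return inverse_dft(spectrum)


def fit_ar1(x):
    """Least-squares estimate of a in x[t] = a * x[t-1] + noise
    (assumes a zero-mean series)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


# A slow base signal plus a fast, small fluctuation standing in for noise.
n = 64
base = [math.sin(2 * math.pi * t / n) for t in range(n)]
noisy = [base[t] + 0.3 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
restored = low_pass(noisy, keep=3)
print(max(abs(r - b) for r, b in zip(restored, base)))

# With the coefficient fitted, a * x[t-1] is the model's estimate of the
# signal at t and x[t] - a * x[t-1] is the part attributed to noise.
decay = [0.5 ** t for t in range(10)]
print(fit_ar1(decay))
```

In practice a library FFT and a higher-order autoregressive model would normally replace both hand-written pieces; the sketch only shows the mechanics.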



Data entry errors


This is really a topic for a separate discussion, because there are so many kinds of such errors: typing errors, deliberate distortion of data, format mismatches, and that is before counting the typical errors caused by the way the data entry application works. Well-established methods exist for dealing with most of them. Some are obvious: for example, formats can be checked before the data are loaded into the warehouse. Others are more refined: for example, typing errors can be corrected with the help of various kinds of thesauruses. In any case, these errors have to be cleaned up as well.
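A minimal sketch of the two checks mentioned above, using only the Python standard library. The date format, the vocabulary and the similarity cutoff are assumptions chosen for illustration; a real thesaurus would be domain-specific and much larger.

```python
import difflib
import re

# Assumed format for this example: ISO dates such as 2007-03-15.
DATE_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}")

# Hypothetical thesaurus of valid product names.
KNOWN_PRODUCTS = ["apple", "banana", "cherry"]


def check_format(value):
    """Return True if the whole string matches the expected date format."""
    return DATE_FORMAT.fullmatch(value) is not None


def correct_typo(word, vocabulary, cutoff=0.8):
    """Replace a word with its closest vocabulary entry if one is similar
    enough; otherwise return the word unchanged."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


print(check_format("2007-03-15"), check_format("15.03.2007"))
print(correct_typo("banan", KNOWN_PRODUCTS))
print(correct_typo("xyz", KNOWN_PRODUCTS))
```

The cutoff trades missed typos against false corrections: set too low, it will silently rewrite valid but unfamiliar values, so unmatched words are better flagged for review than forced into the vocabulary.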



Summary


Dirty data are a very big problem. They can in fact bring to nothing all the effort spent on building a data warehouse. Moreover, this is not a one-off operation but ongoing work. It is clean not where nobody litters, but where someone cleans up. The ideal option would be to create a gateway through which all data entering the warehouse must pass.
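One way to picture such a gateway is as a chain of cleaning steps that every record must pass before it is stored. The sketch below is only an illustration under assumed conventions: records are dictionaries, a step returns the cleaned record or None to reject it, and the field name "amount" is hypothetical.

```python
def strip_strings(record):
    """Trim surrounding whitespace from every string field."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def reject_negative_amount(record):
    """Reject records whose 'amount' field is missing or negative."""
    amount = record.get("amount")
    if amount is None or amount < 0:
        return None
    return record


def make_gateway(*steps):
    """Build a gateway that applies the steps in order and stops as soon as
    one of them rejects the record."""
    def gateway(record):
        for step in steps:
            record = step(record)
            if record is None:
                return None
        return record
    return gateway


gateway = make_gateway(strip_strings, reject_negative_amount)
raw = [{"product": " tea ", "amount": 3}, {"product": "coffee", "amount": -1}]
warehouse = [clean for clean in map(gateway, raw) if clean is not None]
print(warehouse)
```

Keeping each step as a separate function makes it easy to add domain-specific checks without touching the rest of the chain.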


The solutions described above are not the only ones. There are many more processing methods, ranging from expert systems to neural networks. Many of these technologies are implemented as freeware components and programs which you can download from our site. The main thing is to be able to use them competently. Keep in mind that cleaning methods are tightly bound to the subject domain. Practically everything depends on the organization's field of activity and the purpose of the data warehouse. What is very valuable information for one is noise for another. If we have a priori information about the task, the quality of data cleaning can be improved by orders of magnitude. In addition, the gateway must be well integrated with the existing data sources.


Cleaning mechanisms should become as integral a part of data warehouses as OLAP. Otherwise it will be practically impossible to find a useful grain in the mountain of accumulated dust. Perhaps not everyone shares this opinion yet, but it is only a matter of warehouse size. As the warehouse grows, users will inevitably come to the same opinion.


Compacting and repairing an Access database

{
  Here is a function I have made to compact and repair an Access database.
  Exclusive access to the database is required!
}

uses
  SysUtils,  { DeleteFile, RenameFile }
  Variants,  { Unassigned (Delphi 6 and later) }
  ComObj;    { CreateOleObject }

function CompactAndRepair(db: string): Boolean; { db = path to the Access database }
var
  v: OleVariant;
begin
  Result := True;
  try
    v := CreateOleObject('JRO.JetEngine');
    try
      { Compact into a temporary copy named db + 'x' (Engine Type=5 is the Jet 4.x format) }
      v.CompactDatabase('Provider=Microsoft.Jet.OLEDB.4.0;Data Source=' + db,
        'Provider=Microsoft.Jet.OLEDB.4.0;Data Source=' + db + 'x;Jet OLEDB:Engine Type=5');
      { Replace the original with the compacted copy; report failure if either step fails }
      Result := DeleteFile(db) and RenameFile(db + 'x', db);
    finally
      v := Unassigned;
    end;
  except
    Result := False;
  end;
end;