Oracle database profiler tool
The Pattern tab displays information discovered about patterns within the attribute. Pattern discovery is the profiler's attempt to generate regular expressions that describe the data discovered for a specific attribute.
Non-English characters are not supported in the pattern discovery process. The Pattern tab lists several measurement results for the discovered patterns. Domain discovery is the profiler's attempt to add semantic understanding to the data that it sees: based on patterns and some other techniques, it tries to determine which domain bucket a certain attribute's data belongs to.
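To illustrate what a discovered pattern represents, the following sketch counts rows that comply with a regular expression of the kind that pattern discovery produces. The table name, column name, and pattern are hypothetical, and this is not the profiler's internal query:

    -- Compliance with a discovered pattern; NULL values fall into
    -- neither bucket because the regular expression test is unknown.
    SELECT
      SUM(CASE WHEN REGEXP_LIKE(phone_number, '^[0-9]{3}-[0-9]{4}$')
               THEN 1 ELSE 0 END) AS compliant_rows,
      SUM(CASE WHEN NOT REGEXP_LIKE(phone_number, '^[0-9]{3}-[0-9]{4}$')
               THEN 1 ELSE 0 END) AS noncompliant_rows
    FROM customers;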
The Unique Key tab provides information about the existing unique keys that were documented in the data dictionary, and possible unique keys or key combinations that were detected by the data profiling operation. The unique keys that have No in the Documented column are the ones discovered by data profiling. A discovered key is a unique key candidate, and you can cleanse the noncompliant records before enforcing it.
You can also use the drill-down feature to view the rows that cause duplicate values in the Data Drill panel. The Unique Key tab lists several measurement results. A value of Yes in the Documented column indicates that a unique key exists in the data dictionary.
A value of No indicates that the unique key was discovered by data profiling. From analyzing the column values, data profiling determines whether a unique key should be created on the column listed in the Local Attribute(s) column. Two related measurements are the number of rows in the source object in which the attribute represented by Local Attribute(s) is unique, and the percentage of rows in the source object for which that attribute is unique.
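As a rough sketch of the underlying measurement (not the profiler's actual implementation; the EMP table and empno column are assumptions), the number and percentage of rows in which a candidate key attribute is unique can be computed as follows:

    -- Rows whose empno occurs exactly once, and their share of all rows.
    SELECT COUNT(*) AS total_rows,
           SUM(CASE WHEN cnt = 1 THEN 1 ELSE 0 END) AS unique_rows,
           ROUND(100 * SUM(CASE WHEN cnt = 1 THEN 1 ELSE 0 END)
                     / COUNT(*), 2) AS pct_unique
    FROM (
      SELECT COUNT(*) OVER (PARTITION BY empno) AS cnt
      FROM emp
    );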
The Functional Dependency tab displays information about the attribute or attributes that seem to depend on or determine other attributes. You can use the Show list to change the focus of the report. Unique keys defined in the database are not discovered as functional dependencies during data profiling.
The Functional Dependency tab lists several measurement results, such as the number of values in the Determinant attribute that were not determined by the Dependent attribute.
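As a sketch of what that noncompliance measurement captures (the table and column names are assumptions, and the profiler's real algorithm is more sophisticated), determinant values that map to more than one dependent value can be found with a grouped query:

    -- Determinant values (deptno) that fail to uniquely determine
    -- the dependent attribute (dname).
    SELECT deptno,
           COUNT(DISTINCT dname) AS distinct_dependents
    FROM emp_dept_snapshot
    GROUP BY deptno
    HAVING COUNT(DISTINCT dname) > 1;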
If an attribute is always dependent on another attribute, then it is a candidate for removal to a reference table. Suggestions are shown in the Type column.
Moving the attribute into a separate reference table normalizes the schema. The Functional Dependency tab also has a Graphical subtab so that you can view the information graphically: select a dependency and a property from the lists. By switching the Property to Non-Compliant, you can view the exceptions to the discovery. The Referential tab displays information about foreign keys that were documented in the data dictionary, as well as relationships discovered during profiling.
For each relationship, you can see the level of compliance. Information can be viewed from both the Tabular and Graphical subtabs. In addition, two other subtabs are available only in the Referential tab: Joins and Redundant Columns.
The Referential tab lists several measurement results. Yes in the Documented column indicates that a foreign key on the column exists in the data dictionary; No indicates that the foreign key was discovered by data profiling. From analyzing the column values, data profiling determines whether a foreign key should be created on the column represented by Local Attribute(s).
For example, the EMP table contains five rows of employee data: two employees each in departments 10 and 20, and one employee in department 30. The DEPT table contains three rows of department data, with deptno values 10, 20, and 30. The cardinality range is (1-2):1, because in EMP the number of rows per distinct value ranges from 1 (deptno 30) to 2 (deptnos 10 and 20), while in DEPT there is only one row for each distinct value (10, 20, and 30).
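Continuing the EMP and DEPT example, a query of the following form (a sketch, not the profiler's internal query) shows where the child side of the cardinality range comes from:

    -- Rows per distinct deptno in EMP: returns 2 for 10, 2 for 20,
    -- and 1 for 30, which yields the (1-2) side of the range.
    SELECT deptno, COUNT(*) AS rows_per_value
    FROM emp
    GROUP BY deptno
    ORDER BY deptno;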
As another example, suppose you are analyzing two tables for referential relationships: the Employees table and the Departments table. You can click the hyperlinked Yes in the Discovered column to view the rows that did not follow the discovered foreign key relationship. You can also select the Graphical subtab to view the information graphically. This view is effective for viewing noncompliant records, such as orphans.
To use the Graphical subtab, make a selection from the Reference and Property lists. The Joins subtab displays a join analysis on the reference selected in the Reference list. The results show the relative size and exact counts of the three possible outcomes for referential relationships: joins, orphans, and childless objects. Joins represent values that are present in both tables. You can drill into values on the diagram to view more details in the Data Drill panel.
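A full outer join between the child and parent tables separates the three outcomes. This is a hedged sketch reusing the EMP and DEPT example, not the query the profiler runs internally:

    -- joins:     child rows with a matching parent value
    -- orphans:   child rows with no matching parent value
    -- childless: parent rows with no matching child value
    SELECT
      SUM(CASE WHEN e.deptno IS NOT NULL AND d.deptno IS NOT NULL
               THEN 1 ELSE 0 END) AS joins,
      SUM(CASE WHEN d.deptno IS NULL THEN 1 ELSE 0 END) AS orphans,
      SUM(CASE WHEN e.deptno IS NULL THEN 1 ELSE 0 END) AS childless
    FROM emp e
    FULL OUTER JOIN dept d
      ON e.deptno = d.deptno;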
The Redundant Columns subtab displays information about columns in the child table that are also contained in the parent table. Redundant column results are only available when perfectly unique columns are found during profiling. For example, a Location column in the EMP table would be redundant because you can get the same information from the join with DEPT.
The Data Rule tab displays the data rules that are defined as a result of data profiling for the table selected in the object tree. The details for each data rule include its Origin, which represents where the data rule came from; for example, a value of Derived indicates that the data rule was derived from profiling results. The data rules on this tab reflect the active data rules in the Data Rule panel.
You do not directly create data rules on this tab. Attribute sets enable you to restrict a data profiling operation to a subset of columns from a table, view, or materialized view. You can decrease profiling time by excluding columns for which you do not need profiling results. Data profiling can profile only a limited number of columns from a table, view, or materialized view at a time.
You can use an attribute set to select a smaller set of columns to profile from the object. See "Viewing Profile Results" for more information about viewing profiling results. In the Attribute Sets section, click a blank area in the Name column, enter the name of the attribute set that you want to create, and press the Enter key. On the Select Objects page, select the data object to profile and use the shuttle arrows to move the data object to the Selected list.
When the selected data object contains attribute sets, the Choose Attribute Set dialog box is displayed. On the Summary page, review the options that you chose on the previous wizard pages and click Finish. After you create a data profile, you can use the Data Profile Editor to modify its definition. You can either modify profile settings or add objects to and remove objects from the data profile.
To add objects, you can use either the menu bar options or the Select Objects tab of the Edit Data Profile dialog box. On the Edit Data Profile dialog box, edit any of the following properties of the data profile and click OK.
To modify the name or description of the data profile, on the Name tab, select the name or description and enter the new value. To add or remove objects, on the Select Objects tab, use the arrows to add objects to or remove objects from the data profile. To change the location that is used as the data profiling staging area, use the Data Locations tab. Use the arrows to move the new location to the Selected Locations section.
Ensure that you select New Configuration Default to set this location as the default profiling location for the data profile. Right-click the data profile in the Projects Navigator and select Open. On the Add Profile Tables dialog box, select the objects to add to the data profile.
Use the arrows to move them to the Selected section. You can achieve the best possible data profiling performance by ensuring that the appropriate data profiling configuration parameters are used when you perform data profiling. You can configure a data profile to optimize data profiling results. If you know that certain types of analysis are not required for the objects that you are profiling, use the configuration parameters to turn off those types of data profiling.
If the source data for profiling is stored in an Oracle Database, it is recommended that the source schema be located on the same database instance as the profiling workspace. You can do this by installing the profiling workspace into the same Oracle Database instance as the source schema location. This avoids using a database link to move data from the source to the profiling workspace. To ensure good data profiling performance, the computer that runs Oracle Database must have certain hardware capabilities.
In addition, you must optimize the Oracle Database instance on which you are performing data profiling. The computer that runs Oracle Database needs multiple processors, because data profiling has been designed and tuned to take maximum advantage of the parallelism provided by Oracle Database. When profiling large tables (more than 10 million rows), it is highly recommended that you use a computer with multiple processors.
Hints are used in the queries required to perform data profiling, and data profiling picks up the degree of parallelism from the initialization parameter file of the Oracle Database. The default initialization parameter file contains parameters that take advantage of parallelism.
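To inspect the parallelism-related settings that data profiling inherits (a sketch; the exact parameter set varies by Oracle Database version), you can query v$parameter:

    -- Parallelism-related initialization parameters.
    SELECT name, value
    FROM v$parameter
    WHERE name IN ('cpu_count',
                   'parallel_max_servers',
                   'parallel_threads_per_cpu');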
It is also important to ensure a high memory hit ratio during data profiling. You can ensure this by assigning a larger size to the System Global Area (SGA); if possible, configure it to 2 gigabytes (GB) or 3 GB. For advanced database users, Oracle recommends observing the buffer cache hit ratio and library cache hit ratio.
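One commonly used query for the buffer cache hit ratio (an illustration; recent releases also expose this through workload reports) is:

    -- Classic buffer cache hit ratio calculation from v$sysstat.
    SELECT 1 - (phy.value / (cur.value + con.value)) AS buffer_cache_hit_ratio
    FROM v$sysstat phy, v$sysstat cur, v$sysstat con
    WHERE phy.name = 'physical reads'
      AND cur.name = 'db block gets'
      AND con.name = 'consistent gets';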
Data profiling processing frequently performs full table scans and massive joins, so it is recommended to have a minimum of two disks for each CPU and to configure the disks for striping. Oracle recommends that you create logical stripe volumes on the existing disks, each striping across all available disks. To create and maintain logical volumes, you need volume management software such as Veritas Volume Manager or Sun Storage Manager. Calculate the stripe width with the formula below, or use a stripe width that is a multiple of the value returned by the formula.
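The following formulation is an assumption rather than a quotation of the exact documented formula; it derives the stripe width from two initialization parameters:

    stripe width = DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT

For example, with an 8 KB block size and a multiblock read count of 16, this yields a 128 KB stripe width, or a multiple of it.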
If you are using Oracle Database 10g or later and you do not have any volume management software, then you can use the Automatic Storage Management (ASM) feature of Oracle Database to spread the workload across disks. Create different stripe volumes for different tablespaces; it is possible for some tablespaces to occupy the same set of disks.
MDM applications must provide a single consolidated view of data. You must first clean up a system's master data before it can be shared with multiple connected entities. Oracle Warehouse Builder provides data profiling and data correction functionality that enables MDM applications to cleanse and consolidate data. Data profiling is a data analysis method that enables you to detect and measure defects in your source data.
For more information about data profiling, see "Overview of Data Profiling". Data rules help ensure data quality by determining the legal data and relationships in the source data. You can import MDM-specific data rules, define your own data rules before you perform data profiling, or derive data rules based on the data profiling results. For more information about data rules, see "Overview of Data Rules".
Data correction enables you to correct any inconsistencies, redundancies, and inaccuracies in both the data and the metadata. You can automatically create correction mappings to cleanse source data. Data Watch and Repair (DWR) enables you to measure crucial business rules regularly. As you discover inconsistencies in the data, you can define and apply new data rules to ensure data quality. However, for MDM applications that do not run on an Oracle Database, you must set up a gateway to the third-party database.
To create a location, use the Oracle node under the Databases node in the Locations Navigator. Specify the details of the MDM database, such as the user name, password, host name, port, service name, and database version.
Ensure that you select the location created in step 1 when creating the module. Import metadata from your MDM application into the module created in step 3: right-click the module and select Import. Applying data rules to tables enables you to determine whether your table data complies with the business rules defined by those data rules.
You can apply data rules you imported in step 5 or other data rules that you created. For more information about creating data profiles, see "Creating Data Profiles". View the data profiling results as described in "Viewing Profile Results".
Create correction mappings as described in "Steps to Create Correction Objects". Correct data and metadata using the correction mappings generated by Oracle Warehouse Builder, as described in "Deploying Schema Corrections" and "Deploying Correction Mappings". For more information about using the Metadata Import dialog box, click Help in the dialog box. The cleansed and corrected data is contained in the correction objects created as a result of data profiling.
To be more efficient, you can write back only those rows that must be corrected. You can achieve this by modifying the generated correction mapping: delete the branch that passes through the compliant rows unchanged (this is the branch that contains the minus filter and the minus set operators), and retain only the corrected-rows processing branch in the correction mapping.
For UCM, drag and drop the interface table that corresponds to the base table with which you are working. Use the MDM application tools and documentation to determine the base table for a particular interface table.
Update the base table with the changes made to the interface table.

Sources Supported by Oracle Warehouse Builder for Data Profiling

Oracle Warehouse Builder data profiling supports the following source types:
- Oracle databases
- Data sources accessed through Oracle gateways or ODBC
- Flat file sources

To profile flat files, you must import them into Oracle Warehouse Builder, create external tables based on the flat files, and then profile the external tables.
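A minimal external table sketch follows; the file name employees.csv, the directory object DATA_DIR, and the column list are all hypothetical. Once created, the external table can be added to a data profile like any other table:

    -- Hypothetical external table over a comma-delimited flat file.
    CREATE TABLE employees_ext (
      empno  NUMBER,
      ename  VARCHAR2(30),
      deptno NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY data_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
      )
      LOCATION ('employees.csv')
    );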
In scenarios where you use Oracle Warehouse Builder for data profiling alongside an existing ETL solution, the usage model is:
- Leave your existing ETL solution in place.
- In Oracle Warehouse Builder, create a workspace and locations so that you can connect to your data sources.
- Create a data profile, add the objects to be profiled, and set any profiling parameters.
- Run your profiling jobs.
- Explore the results in the Data Profile Editor.
- Derive data rules based on the profiling results, and better understand and document the patterns in your data.
About the Data Profile Editor

The Data Profile Editor provides the primary user interface for most of the data profiling, data rules, and data cleansing functionality of Oracle Warehouse Builder. From the Data Profile Editor, you can set up and run data profiling, that is, attribute analysis and structural analysis of selected objects.

Performing Data Profiling

Data profiling is resource intensive, and profile execution times can be quite long, especially on large data volumes.
The following guidelines can increase the usefulness of your data profiling results while reducing profiling time and effort: Profile sources that impact numerous downstream objects, or where the objects affected are the most important outputs of your system. Tip: Use the data lineage and impact analysis features of Oracle Warehouse Builder to identify sources with wide impact in your system.
Prerequisites for Data Profiling

Before profiling data objects, ensure that your project contains the correct metadata for the source objects on which you are performing data profiling.

Steps to Perform Data Profiling

To prepare for data profiling, you must create one or more data profile objects in your project. A data profile object stores the following:
- The set of objects to be profiled
- The types of profiling to perform on this set of objects
- Settings, such as thresholds, that control the profiling operations
- Results returned by the most recent execution of data profiling using these settings
- Metadata about data corrections generated from profiling results

After you have decided which objects to profile, use the following steps to guide you through the profiling process. In the Projects Navigator, expand a project node and import all objects to profile into this project.
See "Prerequisites for Data Profiling". See "Creating Data Profiles". See "Configuring Data Profiles". Profile the data to perform the types of analysis specified in the previous step. See "Profiling Data". View and analyze the data profiling results. See "Viewing Profile Results". Creating Data Profiles Once your locations are prepared and the data is available, you must create a data profile object in Design Center.
To create a data profile: From the Projects Navigator, expand the project node in which you want to create a data profile. The Welcome page of the Data Profile Wizard is displayed.
On the Welcome page, click Next. The new data profile is added to the Data Profiles node in the navigation tree.

Configuring Data Profiles

You can, and should, configure a data profile before running it if there are specific types of analysis that you do, or do not, want to run. You can configure a data profile at one of the following levels:
- The entire profile (all the objects contained in the profile)
- An individual object in the data profile

For example, if a data profile contains three tables and you want to perform certain types of analysis on only one of them, you can configure that table individually.
You can then analyze the data.

The remainder of this discussion covers the DBMS_PROFILER package, which profiles PL/SQL execution. The proftab.sql script creates the profiler tables (plsql_profiler_runs, plsql_profiler_units, and plsql_profiler_data) with the columns, datatypes, and definitions that the profiler and its analysis tools expect. One unit-level column records the total time spent in the unit, in nanoseconds; the profiler does not set this field, but it is provided for the convenience of analysis tools. In general, if a user can debug a unit, the same user can profile it. Oracle advises that modules being profiled be compiled with DEBUG, since this provides additional information about the unit in the database.
Improving application performance is an iterative process. Each iteration involves a run, which means running the application through benchmark tests with profiler data collection enabled, and then stopping profiler data collection, which writes the collected data for the run into database tables. As the application executes, profiler data is collected in memory data structures that last for the duration of the run. Flushing the collected data involves storing the collected data in database tables.
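A minimal run sketch using the documented DBMS_PROFILER calls follows; the benchmark procedure name is hypothetical:

    BEGIN
      -- Start collecting profiler data for this session.
      DBMS_PROFILER.START_PROFILER(run_comment => 'benchmark run 1');

      -- Execute the code being benchmarked (hypothetical procedure).
      run_benchmark_tests;

      -- Stop collection, then flush the in-memory data to the tables.
      DBMS_PROFILER.STOP_PROFILER;
      DBMS_PROFILER.FLUSH_DATA;
    END;
    /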
The tables should already exist in the profiler user's schema. The proftab.sql script creates the tables and other data structures required for persistently storing the profiler data; note that proftab.sql drops the current tables. Also, it may take some time to execute package initialization code the first time a package procedure or function is called. To avoid timing this overhead, "warm up" the database before collecting profile data.
To do this, run the application once without gathering profiler data. You can allow profiling across all users of a system, for example, to profile all users of a package, independent of who is using it. In this case, create the profiler tables centrally by running the proftab.sql script from a suitably privileged schema, and grant all users access to the resulting tables and sequences.
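After the data is flushed, you can query the profiler tables directly. For example, the following sketch ranks the units in one run by time; because the unit-level total time field is left to analysis tools, it sums the line-level times (the run identifier is supplied as a bind variable):

    -- Most expensive program units in a given run.
    SELECT u.unit_owner, u.unit_type, u.unit_name,
           SUM(d.total_time) AS unit_time
    FROM plsql_profiler_units u
    JOIN plsql_profiler_data d
      ON d.runid = u.runid
     AND d.unit_number = u.unit_number
    WHERE u.runid = :run_id
    GROUP BY u.unit_owner, u.unit_type, u.unit_name
    ORDER BY unit_time DESC;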
The line value specifies which source line. If the row doesn't exist in the table, no code was generated for that line, and therefore it should not be mentioned in reports. If the source of a single statement is on a single line, any code generated for that statement will be attributed to that line number. In some cases, such as a simple declaration, or because of optimization, no code will be needed. If a statement spans multiple lines, any code generated for that statement will be attributed to some line in the range, but it is not guaranteed that every line in the range will have code attributed to it.
In such a case, there will be gaps in the set of line values. In particular, multi-line SQL-related statements may appear to be on a single line (usually the first).
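To relate the collected line numbers back to source text, the profiler data can be joined to the all_source view; this is a sketch, and the unit name is hypothetical. Lines absent from the result are exactly those for which no code was generated:

    -- Per-line execution counts and times for one unit in one run.
    SELECT d.line#, d.total_occur, d.total_time, s.text
    FROM plsql_profiler_units u
    JOIN plsql_profiler_data d
      ON d.runid = u.runid
     AND d.unit_number = u.unit_number
    JOIN all_source s
      ON s.owner = u.unit_owner
     AND s.name  = u.unit_name
     AND s.type  = u.unit_type
     AND s.line  = d.line#
    WHERE u.runid = :run_id
      AND u.unit_name = 'MY_PACKAGE'   -- hypothetical unit name
    ORDER BY d.line#;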
When multiple statements are on the same line, the profiler combines the occurrences for each statement. This may be confusing if a line has embedded control flow; for example, if the 'then' and 'else' parts of an IF statement are on the same line, their counts are merged. In general, profiler and coverage reports are most easily interpreted if each statement is on its own line. Each routine in this package has two versions that allow you to determine how errors are reported. In each case, the parameters of the function and procedure are identical.
Only the method by which errors are reported differs. If there is an error, there is a correspondence between the error codes that the functions return and the exceptions that the procedures raise. A 0 return value from any function denotes successful completion; a nonzero return value denotes an error condition. One possible error is a failure to flush the data; in that case, check whether the profiler tables have been created, are accessible, and that there is adequate space.
Another possible error is a mismatch between the package and the database implementation; the only recovery is to install the correct version of the package. The FLUSH_DATA function flushes the profiler data collected in the user's session.
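A sketch of calling the function form of FLUSH_DATA, which reports errors through its return value rather than by raising an exception:

    DECLARE
      status BINARY_INTEGER;
    BEGIN
      -- The function version returns 0 on success, nonzero on error.
      status := DBMS_PROFILER.FLUSH_DATA;
      IF status <> 0 THEN
        DBMS_OUTPUT.PUT_LINE('flush_data failed with status ' || status);
      END IF;
    END;
    /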