Working with Oracle
Abstract
This article collects practical tips for working with Oracle: paging through query results, working with large objects (LOBs) and LONG datatypes from JDBC, Unicode and national-language support, and optimizing NOT IN queries.
Paging
The objective is to retrieve blocks of n rows from the results. If the page is not the first one, then the first skip rows must be discarded before the n rows of the page of interest are read. The skip rows could be read and then discarded, but that would be an expensive operation if thousands of rows were to be skipped. A better way is to have the SQL discard them so that only the rows of the page of interest are returned. Below are the two most common SQL forms to do so in Oracle, the first form being the fastest according to AskTom [2].
-- Form 1
SELECT *
FROM (SELECT Q.*, rownum AS rnum
      FROM (THE_QUERY) Q
      WHERE rownum <= (skip+n))
WHERE rnum > skip

-- Form 2
SELECT *
FROM (SELECT Q.*, rownum AS rnum
      FROM (THE_QUERY) Q)
WHERE rnum BETWEEN skip+1 AND skip+n
Though the first form is faster, the second form is more generic. Experience has shown that the first form does not work well when THE_QUERY has JOINs or an ORDER BY clause without a unique column. In these cases, records could be repeated from page to page, sometimes resulting in identical pages.
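As an illustration, here is a minimal JDBC sketch of Form 2 with bind variables. The open connection con and the table and column names are assumptions for the example, and the inner query is ordered by a unique column to avoid the repeated-record problem mentioned above.

// Sketch: fetch one page of n rows using Form 2 (names are illustrative).
int n = 20;    // page size
int skip = 40; // rows to skip, i.e. the third page when n = 20
PreparedStatement pstmt = con.prepareStatement(
    "SELECT * FROM (SELECT Q.*, rownum AS rnum "
  + "FROM (SELECT id, name FROM atable ORDER BY id) Q) "
  + "WHERE rnum BETWEEN ? AND ?");
pstmt.setInt(1, skip + 1);
pstmt.setInt(2, skip + n);
ResultSet rs = pstmt.executeQuery();
while (rs.next()) {
    // process one row of the current page
}
rs.close();
pstmt.close();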
Large Objects (LOBs)
Large objects are a set of datatypes designed to hold large amounts of data (on the order of terabytes). There are three types of LOBs in Oracle: binary (BLOB), character (CLOB) and BFILE. The latter is a read-only datatype that points to an external file.
Restrictions
There are some restrictions on using LOBs; Table 1 below summarizes them. Oracle also requires that all columns in the ORDER BY clause appear in the SELECT clause, so the following statement yields an error: SELECT col1, col3 FROM atable ORDER BY col2.
Unsupported SQL Operations | Example of unsupported usage
---|---
SELECT DISTINCT/UNIQUE | SELECT DISTINCT clobCol FROM ...
ORDER BY | SELECT ... ORDER BY clobCol
GROUP BY | SELECT avg(num) FROM ... GROUP BY clobCol
UNION, INTERSECT, MINUS (note that UNION ALL works for LOBs) | SELECT clobCol1 FROM table1 UNION SELECT clobCol2 FROM table2
JOIN queries | SELECT ... FROM ... WHERE table1.clobCol = table2.clobCol
INDEX columns | CREATE INDEX clobIndx ON atable(clobCol) ...
Functions and Operations on LOBs
Aggregate functions (count, min, max, grouping) and Unicode functions are not supported for use with LOBs. Table 2 lists the functions and operations by category and specifies whether they are supported in SQL and PL/SQL. "cnv" indicates that the corresponding function is performed by converting the CLOB to a character datatype such as VARCHAR2. In a SQL environment, only the first 4K bytes of the CLOB are converted and used in the operation; in a PL/SQL environment, only the first 32K bytes are converted and used.
Category | Operation / Function | SQL Example / Comments | SQL | PL/SQL
---|---|---|---|---
Concatenation | \|\|, concat() | SELECT clobCol1 \|\| clobCol2 FROM atable | Yes | Yes
Comparison | =, !=, >, >=, <, <=, <>, ^= | IF clobCol1 = clobCol2 THEN ... | No | Yes
 | IN, NOT IN | IF clobCol NOT IN (clob1, clob2, clob3) THEN ... | No | Yes
 | BETWEEN | IF clobCol BETWEEN clobCol2 AND clobCol3 THEN ... | No | Yes
 | LIKE [ESCAPE] | IF clobCol LIKE '%pattern%' THEN ... | Yes | Yes
 | IS [NOT] NULL | ... WHERE clobCol IS NOT NULL | Yes | Yes
Character Functions | lower, nls_lower, upper, nls_upper | ... WHERE upper(clobCol1) = upper(clobCol2) | Yes | Yes
 | lpad, rpad | SELECT rpad(clobCol, 10000, 'a') FROM ... | Yes | Yes
 | trim, ltrim, rtrim | ... WHERE ltrim(clobCol, '0') = '100' | Yes | Yes
 | replace | SELECT replace(clobCol, 'old', 'new') FROM ... | Yes | Yes
 | substr | ... WHERE substr(clobCol, 1, 4) = 'this' | Yes | Yes
 | translate | SELECT translate(clobCol, '123', 'abc') FROM ... | cnv | cnv
 | ascii | SELECT ascii(clobCol) FROM ... | cnv | cnv
 | instr | ... WHERE instr(clobCol, 'book') = 11 | Yes | Yes
 | length | ... WHERE length(clobCol) < 50 | Yes | Yes
 | nlssort | ... WHERE nlssort(clobCol, 'NLS_SORT = French') > nlssort('S', 'NLS_SORT = French') | cnv | cnv
Regular Expression | regexp_like, regexp_replace, regexp_instr, regexp_substr | ... WHERE NOT regexp_like(clobCol, '%pattern%') | Yes | Yes
Conversion | chartorowid | chartorowid(clobCol) | cnv | cnv
 | hextoraw | hextoraw(clobCol) | No | cnv
 | to_date | to_date(clobCol) | cnv | cnv
 | to_number | to_number(clobCol) | cnv | cnv
 | to_timestamp | to_timestamp(clobCol) | No | cnv
 | to_char, to_nchar | to_char(clobCol) | Yes | Yes
 | to_clob, to_nclob | to_clob(varchar2Col) | Yes | Yes
LOBs and JDBC
Data interface for LOBs
The usual way to set LOB values is using the java.sql.Blob and java.sql.Clob interfaces, or Oracle's extensions oracle.sql.BLOB and oracle.sql.CLOB.
Connection con = DriverManager.getConnection(
    "jdbc:oracle:thin:@localhost:1521:orcl", dbprops);
PreparedStatement pstmt =
    con.prepareStatement("INSERT INTO blobtable VALUES (?)");
Blob blob = BLOB.createTemporary(con, true, BLOB.DURATION_SESSION);
OutputStream out = blob.setBinaryStream(1); // stream positions are 1-based
out.write(byteValues);
out.close();
pstmt.setBlob(1, blob);
pstmt.executeUpdate();
In Oracle 10g, the setBytes, setBinaryStream, setString, setCharacterStream and setAsciiStream methods of PreparedStatement have been extended to work on BLOB and CLOB columns. According to Oracle's documentation, setBytes automatically switches to setBinaryStream for data larger than 2000 bytes (not 2KB), and setString automatically switches to setCharacterStream for data larger than 32766 characters.
Conversely, the getter methods have been extended to work on LOB columns. The same auto-switching rules apply to the getters as well.
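As an illustration, here is a minimal sketch of this data interface, assuming a hypothetical table clobtable(text CLOB) and an open connection con:

// Write a CLOB column without going through the Clob interface; the
// driver switches to setCharacterStream internally for strings longer
// than 32766 characters.
PreparedStatement ins = con.prepareStatement("INSERT INTO clobtable VALUES (?)");
ins.setString(1, someLargeString);
ins.executeUpdate();

// Read the CLOB content back directly as a string.
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("SELECT text FROM clobtable");
if (rs.next()) {
    String text = rs.getString(1);
}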
However, there are bugs in the 10.1 driver that prevent some of the auto-switching from working properly. Even worse, setBinaryStream and setCharacterStream do not work when calling stored procedures with large amounts of data. Fortunately, these bugs have been fixed in 10.2, but there are still some bugs related to using national characters (see Oracle & Unicode). Therefore, the safest way to work with LOBs is still to use the Blob and Clob interfaces.
Passing NULL values
It is good practice to use PreparedStatement.setNull to set a null value on a LOB column; otherwise an ArrayIndexOutOfBoundsException may occur. You can download test 1 and run it to reproduce the error, or just look at its source. The test uses the lobtest table below and a stored procedure c_lobtest to insert a record into the table.
CREATE TABLE lobtest (id NUMBER, sz INTEGER, text CLOB, binary BLOB);
The test consists of inserting two records into the lobtest table: the first record with text and no binary data, and the second record with no text and some binary data. Note the use of setBytes and setString on lines 78 and 90 respectively to set null values. The test works fine if only the second record is inserted, if data files bigger than 100 KB are used, or if the stored procedure is replaced with a regular SQL INSERT statement.
According to the folks at Oracle, the error comes from the fact that the PL/SQL procedure is invoked the first time with an assignment of a CLOB, which places the SQL in the shared pool with the CLOB parameter, and the second time with an assignment of a String, which is inconsistent with what is now in the shared pool. The workaround is shown below:
// insert text
pstmt.setLong(1, id);
pstmt.setInt(2, size);
pstmt.setCharacterStream(3, new InputStreamReader(bain), size);
pstmt.setNull(4, java.sql.Types.BLOB);
// ...
// insert binary
pstmt.setLong(1, id);
pstmt.setInt(2, size);
pstmt.setNull(3, java.sql.Types.CLOB);
pstmt.setBinaryStream(4, new FileInputStream(fis), size);
LONG Datatypes
LONG and LONG RAW are older datatypes used to store large data. Starting with 8i, Oracle deprecated the LONG datatypes and encourages the use of LOBs instead. There are several reasons for doing so:
- LONG and LONG RAW are limited to 2GB versus 4GB for LOBs.
- LONG and LONG RAW require sequential access methods. (More on this later).
- A table can only have one LONG or LONG RAW column.
However, if you work with legacy applications or if you have to access the system tables, then you would probably run into LONG datatypes.
Streaming LONG Datatypes
The getBytes and getString methods of ResultSet can be used to read the content of a LONG or LONG RAW column. However, to conserve memory, you can also read the content as a stream using getBinaryStream, getAsciiStream and getUnicodeStream.
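For example, here is a minimal sketch of streaming a LONG RAW column, assuming a hypothetical table rawtable(data LONG RAW) and an open connection con:

// Stream the LONG RAW content instead of materializing it in memory.
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("SELECT data FROM rawtable");
while (rs.next()) {
    InputStream in = rs.getBinaryStream(1); // raw bytes, no conversion
    byte[] buf = new byte[4096];
    for (int len; (len = in.read(buf)) != -1; ) {
        // process buf[0..len)
    }
    in.close(); // finish the stream before reading any other column
}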
LONG RAW Data Conversions
- A call to getBinaryStream returns raw data, that is, the bytes in their raw form.
- A call to getAsciiStream converts the raw bytes to hexadecimal and returns the ASCII representation.
- A call to getUnicodeStream converts the raw bytes to hexadecimal and returns the Unicode characters.
LONG Data Conversions
When getAsciiStream is used, the driver assumes that the underlying data in the database uses the US7ASCII or WE8ISO8859P1 character set, and simply returns the bytes as the corresponding ASCII characters. If the assumption is false, the returned data could be meaningless.
When getUnicodeStream is used, the driver returns a stream of Unicode characters in UTF-16 encoding. This applies to all underlying character sets that Oracle supports.
When getBinaryStream is used, there are two possible cases:
- If the character set of the database is US7ASCII or WE8ISO8859P1, then the driver returns a UTF-8 stream of bytes.
- Otherwise, the driver returns a US7ASCII stream of bytes.
Sequential Access
If a query fetches multiple columns, one of which is a LONG or LONG RAW column, then the columns must be read sequentially: any attempt to read a column beyond the LONG or LONG RAW column closes the latter. Reading the LONG or LONG RAW column afterwards (in Java) results in a "java.sql.SQLException: Stream has already been closed" error, as shown in the example below.
// supposing we have the following query
ResultSet rs = stmt.executeQuery("SELECT col1, longCol, col3 FROM atable");
// reading in this order is ok
rs.getString( "col1" );
rs.getString( "longCol" );
rs.getString( "col3" );
// reading "col3" before "longCol" results in
// "java.sql.SQLException: Stream has already been closed"
rs.getString( "col1" );
rs.getString( "col3" );
rs.getString( "longCol" );
In the above example, if col3 must be read before longCol (for instance, when the processing of longCol depends on the value of col3), the order of the columns in the query can simply be rearranged so that col3 comes first, as shown below.
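Continuing the example above:

// Rearranged query: col3 can now be read before the LONG column.
ResultSet rs = stmt.executeQuery("SELECT col1, col3, longCol FROM atable");
rs.getString( "col1" );
rs.getString( "col3" );
rs.getString( "longCol" ); // the LONG column is still read last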
Oracle & Unicode
Oracle supports storing characters in encodings other than ASCII. This is commonly known as National Language Support (NLS) or Globalization Support.
NLS Parameters
A set of NLS parameters specifies, among other things, the character set used by Oracle (i.e. the encoding used for storing characters in the database). The following SQL command displays all NLS parameters:
SQL> SELECT * FROM nls_database_parameters;
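The same parameters can be read through JDBC; a small sketch, assuming an open connection con:

// List all NLS parameters of the database.
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(
    "SELECT parameter, value FROM nls_database_parameters");
while (rs.next()) {
    System.out.println(rs.getString(1) + " = " + rs.getString(2));
}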
The ones that are of interest to us, as far as Unicode support is concerned, are:
- NLS_CHARACTERSET, or database character set, defines the character set for the entire database (i.e. the encoding for the CHAR, VARCHAR2 and CLOB datatypes).
- NLS_NCHAR_CHARACTERSET, or national character set, defines the character set for national-language columns (i.e. NCHAR, NVARCHAR2 and NCLOB).
A Unicode encoding can be specified as the database character set if all applications using the database support Unicode. Otherwise, using national-language columns allows Unicode to be used alongside existing applications that only support ASCII; it also allows existing applications to be converted to Unicode gradually.
Setting Character Set
In Oracle 10g, the default value for NLS_CHARACTERSET is an 8-bit West European encoding (the exact encoding depends on the platform the Oracle server is installed on) and the default value for NLS_NCHAR_CHARACTERSET is AL16UTF16, a UTF-16 encoding. Follow the steps below to change their values:
Log into the database using SQL*Plus with sysdba privilege:
% sqlplus sys/<password>@<oracle_sid> as sysdba
Run the database in restricted mode in order to alter it:
SQL> shutdown immediate -- or normal
SQL> startup mount
SQL> alter system enable restricted session;
SQL> alter system set job_queue_processes=0;
SQL> alter database open;
SQL> alter database national character set UTF8;
SQL> shutdown immediate -- or normal
SQL> startup
Altering the database national character set (NLS_NCHAR_CHARACTERSET) is only allowed when no table in the database is defined with a national-language column. Some default schemas in Oracle already have tables with such columns, but they can be dropped as follows:
SQL> DROP USER oe CASCADE;
SQL> DROP USER pm CASCADE;
I18n and JDBC
The Oracle JDBC driver (ojdbc14.jar) only contains the US7ASCII, WE8DEC, WE8ISO8859P1, UTF8 and AL16UTF16 character sets. If other character sets are to be used, then orai18n.jar must be included as well.
By default, Oracle's JDBC driver treats all character columns as CHAR. In order to insert Java strings into NCHAR, NVARCHAR2 and NCLOB columns, OraclePreparedStatement.setFormOfUse has to be invoked for each national-language column, as in the code example below:
PreparedStatement pstmt = con.prepareStatement("INSERT INTO i18n VALUES (?,?,?)");
pstmt.setLong(1, id); // number column
pstmt.setString(2, "name"); // VARCHAR2 column
((OraclePreparedStatement)pstmt)
.setFormOfUse(3, OraclePreparedStatement.FORM_NCHAR);
pstmt.setString(3, "some unicode string"); // NVARCHAR2 column
Alternatively, if the system property oracle.jdbc.defaultNChar or the connection property defaultNChar is set to true, then JDBC treats all character columns as national-language columns, so there is no need to call setFormOfUse on each one.
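For instance, a sketch of setting the connection property (the URL and credentials are illustrative):

// Treat all character columns as national-language columns.
Properties props = new Properties();
props.put("user", "scott");
props.put("password", "tiger");
props.put("defaultNChar", "true");
Connection con = DriverManager.getConnection(
    "jdbc:oracle:thin:@localhost:1521:orcl", props);

The system property can instead be set on the command line with java -Doracle.jdbc.defaultNChar=true.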
However, if defaultNChar is set to true, the database will implicitly convert all CHAR data into NCHAR, and this conversion has a substantial performance impact. In that case, setFormOfUse(OraclePreparedStatement.FORM_CHAR) can be used to tell the database not to convert CHAR datatypes.
Issues with defaultNChar
It appears that an "ORA-01460: unimplemented or unreasonable conversion requested" error occurs when using setCharacterStream or setString to pass strings (both ASCII and Unicode) of lengths between 2K and 16K as national-language parameters to a stored procedure. The error occurs when using the thin JDBC 10.x driver against 10.x and 9.x databases; it does not occur when using JDBC 9.2.0.6 against a 9.2.0.6 database.
If the parameter is an NCLOB, then setClob may be used as a workaround, as sketched below. However, I don't know of a workaround when the parameter is an NVARCHAR2.
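A sketch of the NCLOB workaround, assuming a stored procedure a_proc (the name is illustrative) that takes a single NCLOB parameter:

// Pass the large national-character string as a temporary CLOB.
CLOB clob = CLOB.createTemporary(con, true, CLOB.DURATION_SESSION);
Writer w = clob.setCharacterStream(1); // stream positions are 1-based
w.write(largeString);
w.close();
CallableStatement cstmt = con.prepareCall("{call a_proc(?)}");
((OraclePreparedStatement) cstmt)
    .setFormOfUse(1, OraclePreparedStatement.FORM_NCHAR);
cstmt.setClob(1, clob);
cstmt.execute();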
Test 2 is a set of two tests that illustrate the issue.
Unicode and LIKE ESCAPE
Wildcard characters such as '%' and '_' may need to be escaped in the pattern of a LIKE clause. Oracle allows you to specify the escape character with the ESCAPE clause. For instance, the following query uses '\' to escape '_', thus looking for all texts that start with "SYS_".
SELECT * FROM atable WHERE textcol LIKE 'SYS\_%' ESCAPE '\';
For national-character columns, Oracle seems to implicitly convert the pattern to the encoding of the column. It seems to do the same with the escape character, except when the column is an NVARCHAR2. That is, the above SQL query works fine if textcol is an NCLOB, but an error (ORA-01425: escape character must be character string of length 1) occurs if it is an NVARCHAR2. In that case, the escape character must be converted into the right encoding with to_nchar, as in the sketch below.
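For example, a sketch against a hypothetical NVARCHAR2 column nvc2Col; note the doubled backslashes required by Java string literals:

// The escape character goes through to_nchar for an NVARCHAR2 column.
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(
    "SELECT * FROM atable WHERE nvc2Col LIKE 'SYS\\_%' ESCAPE to_nchar('\\')");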
Optimizing NOT IN
Oracle is very slow when a sub-query is used in a NOT IN clause [3]. (This behavior is specific to Oracle; other databases handle it fine.) One optimization of NOT IN consists of converting the query to use NOT EXISTS, as shown below.
-- This query uses NOT IN and a sub-query
SELECT name FROM Foo F1
WHERE F1.id NOT IN
  (SELECT id FROM Bar B1 WHERE B1.priority > 9)

-- But it's much faster when using NOT EXISTS
SELECT name FROM Foo F1
WHERE NOT EXISTS
  (SELECT id FROM Bar B1 WHERE B1.priority > 9 AND B1.id = F1.id)

-- Also faster by adding NOT NULL
SELECT name FROM Foo F1
WHERE F1.id IS NOT NULL
  AND F1.id NOT IN
  (SELECT id FROM Bar B1 WHERE B1.id IS NOT NULL AND B1.priority > 9)
References
- [1] Oracle Database Online Documentation: online documentation for Oracle 10g Release 2 (requires TechNet login).
- [2] Paging in Oracle: discussion on paging in Oracle on AskTom.
- [3] Speeding up NOT IN: Roger Schrag, "Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN".
Copyright © 2005-2006, Northwest Summit. All rights reserved.