Why Oracle 10g XE sometimes sucks big time

A few days ago, one of my web applications that uses Oracle as its backend (I'm running oracle-xe-10.2.0.1-0.1) hung and refused to work.
I was very surprised by this, because I was more than sure that I had not made any modifications on my web server.
The first thing I looked into was the Apache web server. I tried restarting the web server, checked whether the PHP lib file was corrupted or missing, checked the ORA_HOME environment, and checked whether the DMZ server where Oracle 10XE is running responded to my pings and to telnet on port 1521. Everything was working as expected, except my web application. I was very frustrated and disappointed.
After a while, it hit me: why not check the sqlplus client from the ORA_HOME path?
Ta-daaa... this was the problem. The sqlplus client refused to work or connect to the Oracle server.
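
A hung client like this is easy to miss, because it never returns an error; it just never returns. Here is a minimal Python sketch of a probe that runs a command with a time limit and reports whether it got stuck. The function name command_hangs and the stand-in commands are my own illustration, not anything from Oracle; in practice you would pass whatever sqlplus invocation you normally use.

```python
import subprocess
import sys


def command_hangs(argv, timeout_seconds=10.0):
    """Return True if the command is still running after timeout_seconds."""
    try:
        subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising this.
        return True
    return False


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere (hypothetical examples):
    # one exits immediately, the other spins forever like the stuck client.
    quick = [sys.executable, "-c", "pass"]
    stuck = [sys.executable, "-c", "while True: pass"]
    print("quick command hangs:", command_hangs(quick, 5.0))
    print("stuck command hangs:", command_hangs(stuck, 1.0))
```

A probe like this only tells you that the client is stuck, not why; for the why you still need something like strace.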


This is the output of running strace on sqlplus:

\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256
13100 lseek(6, 512, SEEK_SET) = 512
13100 read(6, "\337y
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
13100 lseek(6, 1024, SEEK_SET) = 1024
13100 read(6, "\25\7\'\0072\7>\7j\7\276\17$\'\6K5S\24TfT\307T(VsV
\222"..., 86) = 86
13100 brk(0x80af000) = 0x80af000
13100 times(NULL) = -2058704771
13100 times(NULL) = -2058704771
13100 times(NULL) = -2058704771
13100 times(NULL) = -2058704771
13100 times(NULL) = -2058704771
13100 times(NULL) = -2058704771


That is, the sqlplus process does nothing but call the "times" system call in a loop (in
fact, an infinite loop).

After a bit of googling, I found where the problem is.

...in fact, after ~200 days of uptime on your server, the Oracle 10XE sqlplus client hangs in an infinite loop.

This is known Oracle bug 4612267.
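
The negative value returned by times() in the strace output hints at why uptime matters. What follows is only a back-of-the-envelope sketch in Python, under two assumptions of mine that the bug description above does not state: that the tick counter is a signed 32-bit integer, and that it advances 100 times per second. Under those assumptions the counter turns negative after roughly 248 days, which is in the same ballpark as the "~200 days" above.

```python
TICKS_PER_SECOND = 100  # assumption: clock ticks per second reported by times()
SECONDS_PER_DAY = 86_400


def days_until_signed_wrap(ticks_per_second=TICKS_PER_SECOND):
    """Days until a counter starting at 0 no longer fits in a signed 32-bit int."""
    return 2**31 / ticks_per_second / SECONDS_PER_DAY


def as_unsigned_32(value):
    """Reinterpret a signed 32-bit value as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def ticks_to_days(ticks, ticks_per_second=TICKS_PER_SECOND):
    """Convert a tick count to days."""
    return ticks / ticks_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    print("days until the counter goes negative:", round(days_until_signed_wrap(), 2))

    observed = -2058704771  # return value of times() in the strace output
    unsigned_ticks = as_unsigned_32(observed)
    print("observed value as unsigned ticks:", unsigned_ticks)
    print("which corresponds to days:", round(ticks_to_days(unsigned_ticks), 2))
```

If the assumptions hold, the value in my strace corresponds to a counter that has been running for about 259 days, so a client that expects this number to be positive would be past the point where it goes wrong.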

This is the most stupid bug I have ever found in commercial or non-commercial Linux software.
Because of this stupid bug (I am more than sure that it is not so stupid, I will explain later) I had to rebuild my php5 lib file.

To fix this issue you have 4 solutions:

1- restart (reboot) the web server; this is a big NO NO NO
2- upgrade to Oracle 11g; this is another NO NO NO, because 11g is not free
3- use sqlplus from instantclient_11 and rebuild the PHP lib against instantclient_11
4- apply patch 4612267 to the Oracle 10XE install
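
Whichever option you pick, it helps to know how close a machine is to the danger zone. Here is a small Python sketch that reads the uptime from /proc/uptime on Linux and compares it to a threshold. The threshold of about 248.5 days is my own assumption (a signed 32-bit counter at 100 ticks per second), not a number taken from the Oracle bug report, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400
# Assumption: signed 32-bit tick counter advancing 100 times per second.
WRAP_DAYS = 2**31 / 100 / SECONDS_PER_DAY


def uptime_days(proc_uptime_text):
    """Parse the contents of /proc/uptime and return the uptime in days."""
    # The first field of /proc/uptime is the uptime in seconds.
    seconds = float(proc_uptime_text.split()[0])
    return seconds / SECONDS_PER_DAY


def days_left(proc_uptime_text, wrap_days=WRAP_DAYS):
    """Days remaining before the assumed threshold; negative once it has passed."""
    return wrap_days - uptime_days(proc_uptime_text)


if __name__ == "__main__":
    try:
        with open("/proc/uptime") as handle:
            text = handle.read()
    except OSError:
        text = "0.0 0.0"  # fallback so the sketch still runs on non-Linux systems
    print(f"uptime: {uptime_days(text):.1f} days")
    print(f"margin before the assumed threshold: {days_left(text):.1f} days")
```

Run from cron on the web server, something like this could warn you before the client stops working instead of after.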

First I tried solution number 4. For this I used the OPatch tool from Oracle. I did not succeed in installing the patch, because the patch was built for another Oracle server version. This is very strange, because on the Oracle page where I downloaded the patch, I had selected my Oracle version.

Hehe, this is not a stupid bug ;)

The only solution that worked for me was number 3. I downloaded instantclient11 from the Oracle download page (you need an account for this) and recompiled the PHP lib file:

'./configure' '--with-apxs2=/usr/local/apache2/bin/apxs' '--with-pgsql' '--with-gd' '--with-zlib' '--with-oci8=instantclient,/opt/instantclient_11' '--enable-bcmath' '--with-sqlite' '--with-mysql' '--with-mysql-sock=/var/lib/mysql/mysql.sock' '--enable-mbstring' '--with-mcrypt'


--with-oci8=instantclient,/opt/instantclient_11 is the option that did the trick for me.
Before this, I used to compile the PHP lib file against the client from ORA_HOME (/usr/lib/oracle/xe/app/oracle/product/10.2.0/server).

Now, let me tell you why this bug is not so stupid ;)

The evil logic is: if your server has an uptime of more than 200 days, it is surely a production database server, not a home computer. So the Oracle developers (or project managers) decided to include this kinky, not-so-stupid bug to force you to upgrade your Oracle 10XE server, to buy a new version, to give them money :) Another thing: why did the patch refuse to work? It makes me wonder.

Until I reach "the" 4GB data limit of 10XE, I'll stick with it for a while :)

Comments

Unknown said…
Thank you very much for your post!!! I've been in the same problem and followed your instructions....and it's solved :)!
