Character set migration

Recently there was a problem with a development database in which Polish characters did not work. I checked the database's NLS parameters with the following query

select d.parameter Dparameter,
       d.value     Dvalue,
       i.value     Ivalue,
       s.value     Svalue
  from nls_database_parameters d, nls_instance_parameters i,
       nls_session_parameters s
 where d.parameter = i.parameter (+)
   and d.parameter = s.parameter (+)
order by 1

The query comes from: http://www.databasejournal.com/features/oracle/article.php/3485216/The-Globalization-of-Language-in-Oracle—National-Language-Support.htm

The value of the NLS_CHARACTERSET parameter showed that the database was set to the Western European character set WE8ISO8859P1, i.e. ISO 8859-1. It had to be changed either to the Eastern European EE8ISO8859P2 (ISO 8859-2) or to the Unicode AL32UTF8. To perform the migration I followed the instructions from http://download.oracle.com/docs/cd/B10500_01/server.920/a96529/ch10.htm
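If you only need the character set itself, a single-row query against the same dictionary view is enough. A minimal sketch (NLS_DATABASE_PARAMETERS is a standard Oracle data dictionary view):

-- Returns WE8ISO8859P1 here; after a successful migration
-- it should show the new character set.
SELECT value
  FROM nls_database_parameters
 WHERE parameter = 'NLS_CHARACTERSET';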

SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET AL32UTF8;
SHUTDOWN IMMEDIATE;
STARTUP;

All of these operations had to be performed on the machine hosting the database. I started SQL*Plus with the command

sqlplus / as sysdba

Two problems came up, both caused by the fact that during a character set migration Oracle does not modify the stored data, it only changes its own settings:

  1. The WE8ISO8859P1 and EE8ISO8859P2 encodings always use one byte per character, while AL32UTF8 uses 1 to 4 bytes. This, however, applies only to the CHAR and VARCHAR types; when a multibyte character set is configured, CLOBs are stored in a 2-byte encoding. Changing the character set to AL32UTF8 would therefore cause chaos, because every pair of ISO 8859-1 characters stored in a CLOB would be read as a single Unicode character. For this reason such a conversion is impossible. The only way out is to migrate the character set by exporting and importing the data, but that route is not described here. (The byte-versus-character difference is illustrated in the sketch after this list.)
  2. It is only possible to migrate to a character set that is a strict superset of the current one. This condition means that every byte sequence valid in the current encoding must map to the same character in the new encoding. The 7-bit US7ASCII satisfies this condition with respect to both EE8ISO8859P2 and AL32UTF8, but WE8ISO8859P1 does not. For example, æ (ae) has code E6 in WE8ISO8859P1, yet in AL32UTF8 the byte E6 never appears on its own, and in EE8ISO8859P2 there is no æ at all: the byte E6 means ć there.
     A direct migration from WE8ISO8859P1 to EE8ISO8859P2 was therefore impossible, so, connected as SYS, I changed Oracle's internal settings:
update sys.props$ set VALUE$='US7ASCII' where NAME='NLS_CHARACTERSET';
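The byte-versus-character difference from point 1 can be checked directly. A minimal sketch using the standard LENGTH and LENGTHB functions (the literal 'ą' is just an example of a character outside 7-bit ASCII):

-- In a single-byte character set both columns show 1;
-- in AL32UTF8 the character count is still 1, but the byte count is 2.
SELECT LENGTH('ą')  AS char_count,
       LENGTHB('ą') AS byte_count
  FROM dual;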

I do not recommend this method on production databases, because tampering with system settings like this can be dangerous. I had a backup of the database, though, so I took the risk. After changing the value in the table, the command

ALTER DATABASE CHARACTER SET EE8ISO8859P2;

went through without a problem. If there had been any characters with codes above 127, many of them would have been displayed as different characters, but there were no such cases.
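Before trusting that, it is worth scanning the data for codes above 127 first. A minimal sketch, assuming a hypothetical table documents with a VARCHAR2 column title; CONVERT is a standard Oracle function that replaces characters with no mapping in the target character set, so any changed value betrays a non-ASCII byte:

-- Hypothetical table and column names; lists rows containing at least
-- one byte outside 7-bit ASCII, i.e. exactly the values that would be
-- displayed differently after the migration.
SELECT id, title
  FROM documents
 WHERE title <> CONVERT(title, 'US7ASCII');

For a complete audit, Oracle also ships the Character Set Scanner (csscan) utility.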

Recently at storm-users

I've been reading through the storm-users Google Group recently. This resolution was heavily inspired by Adam Kawa's post "Football zero, Apache Pig hero". Since I've encountered a lot of insightful and very interesting information, I've decided to describe some of it in this post.

  • nimbus will work in HA mode - There's a pull request open for it already... but some recent work (distributing topology files via Bittorrent) will greatly simplify the implementation. Once the Bittorrent work is done we'll look at reworking the HA pull request. (storm’s pull request)

  • pig on storm - Pig on Trident would be a cool and welcome project. Join and groupBy have very clear semantics there, as those concepts exist directly in Trident. The extensions needed to Pig are the concept of incremental, persistent state across batches (mirroring those concepts in Trident). You can read a complete proposal.

  • implementing topologies in pure python with petrel looks like this:

import storm

class Bolt(storm.BasicBolt):
    def initialize(self, conf, context):
        '''Executed only once, when the bolt starts up.'''
        storm.log('initializing bolt')

    def process(self, tup):
        '''Executed every time a new tuple arrives.'''
        msg = tup.values[0]
        storm.log('Got tuple %s' % msg)

if __name__ == "__main__":
    Bolt().run()

  • Fliptop is happy with storm - see their presentation here

  • topology metrics in 0.9.0: The new metrics feature allows you to collect arbitrary custom metrics over fixed windows. Those metrics are exported to a metrics stream that you can consume by implementing IMetricsConsumer and configuring it with Config.java#L473. Use TopologyContext#registerMetric to register new metrics.

  • storm vs flume - some users' point of view: I use Storm and Flume and find that they are better at different things - it really depends on your use case as to which one is better suited. First and foremost, they were originally designed to do different things: Flume is a reliable service for collecting, aggregating, and moving large amounts of data from source to destination (e.g. log data from many web servers to HDFS). Storm is more for real-time computation (e.g. streaming analytics) where you analyse data in flight and don't necessarily land it anywhere. Having said that, Storm is also fault-tolerant and can write to external data stores (e.g. HBase), and you can do real-time computation in Flume (using interceptors).

That's all for today - however, I'll keep on reading through storm-users, so watch this space for more info on storm development.
