Import from a database with a broken encoding

Recently I had to import data from a database that used a Western European encoding but stored Polish characters in it. There were a few problems along the way, so here is the solution I ended up with.

The problem

Database A – the source, at the client's site. Its character set is the Western European WE8DEC. It nevertheless stores Polish characters in the EE8ISO8859P2 encoding, i.e. ISO-8859-2, because that is the encoding used by the database clients. Oracle allows such combinations. I once dealt with a US7ASCII database that also stored Polish characters, even though that character set only covers codes up to 127 and therefore has no national characters at all. Data can be written and read from the client side in a way that avoids any conversion, so the mismatched encoding is not a problem by itself. The problem appears when the data is accessed over DB links, but more on that in a moment.

Database B – the target, on our side. Its character set is AL32UTF8, the optimal choice, as it can store all characters, including East Asian ones.
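
A quick way to check what is really stored in such a database is Oracle's DUMP function, which prints the raw bytes of a value. This is only a minimal sketch, assuming a hypothetical table klienci with a text column nazwa (the names are not from the original system):

-- Run on database A; klienci and nazwa are illustrative names.
-- DUMP(..., 16) shows the stored bytes in hex, so you can verify that the
-- column really holds ISO-8859-2 bytes (e.g. 0xA1 for 'Ą') even though the
-- database character set is declared as WE8DEC.
SELECT nazwa,
       DUMP(nazwa, 16) AS raw_bytes
  FROM klienci
 WHERE ROWNUM <= 10;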

First attempt

The first attempt at the import was a plain insert into /table in B/ select … from /table in A/. During such an operation Oracle converts the characters from the Western European character set to Unicode. The result was mojibake, but at first it looked as if each Polish letter produced a different garbled character. The letter Ą has code 0xA1 in ISO 8859-2, and under that code ISO 8859-1 has the character ¡, so for example ZWIĄZEK was imported as ZWI¡ZEK. It seemed that a function mapping each garbled character back to the corresponding Polish letter would end the trouble. It turned out, however, that the letters Ś and Ż are both converted to the same symbol �, denoting an unknown character, so such a mapping cannot be inverted.

Solution

The idea was to convert the original VARCHAR2 in system A to RAW with utl_raw.cast_to_raw, import that RAW into B, and there turn it back into VARCHAR2 with utl_raw.cast_to_varchar2, bypassing the character set conversion along the way. It turned out, however, that when B accesses A over a dblink, the conversion happens before utl_raw.cast_to_raw is called, even when the function is forced to execute on the remote server. It was therefore necessary to create views on A that return the result of utl_raw.cast_to_raw applied to the columns with Polish characters. Thanks to that, RAW byte arrays are imported from A to B over the dblink. In B the values can then be converted from RAW back to VARCHAR2, but first the received byte array has to be converted so that text stored in EE8ISO8859P2 becomes text in the character set of database B, AL32UTF8. The full expression is therefore utl_raw.cast_to_varchar2(utl_raw.convert(kolumna_z_polskimi_znakami, 'AMERICAN_AMERICA.AL32UTF8', 'AMERICAN_AMERICA.EE8ISO8859P2')) and it solved the conversion problem.
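
Put together, the whole path looks more or less like the sketch below. It is only an illustration: the table klienci, the column nazwa and the database link a_link are hypothetical names, while the utl_raw calls are exactly those described above.

-- On database A: expose the column as RAW, so that no character set
-- conversion can take place on the way through the DB link.
CREATE OR REPLACE VIEW klienci_raw AS
SELECT id,
       utl_raw.cast_to_raw(nazwa) AS nazwa_raw
  FROM klienci;

-- On database B: fetch the RAW bytes over the DB link, convert them from
-- ISO-8859-2 to the local AL32UTF8 character set and only then cast back
-- to VARCHAR2.
INSERT INTO klienci (id, nazwa)
SELECT id,
       utl_raw.cast_to_varchar2(
         utl_raw.convert(nazwa_raw,
                         'AMERICAN_AMERICA.AL32UTF8',
                         'AMERICAN_AMERICA.EE8ISO8859P2'))
  FROM klienci_raw@a_link;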

You May Also Like

Sample for lift-ng: Micro-burn 1.0.0 released

Over the last few evenings, in my free time, I've been working on a mini-application called micro-burn. The idea for it came from working with Agile Jira in our commercial project. Jira is a great tool for agile project management: it has inline task editing, a drag & drop board, reports and much more, but it also has a few drawbacks that hurt our team's motivation.

Motivation

From time to time our sprint scope changes. That's not a big deal, because we are trying to be agile :-), but in this situation Jira's burndown chart draws a peak, because it actually shows scope changes rather than the real burndown. It also means the chart cannot go below the x-axis when we do more than we planned – it always stops at zero.

Also, for better progress monitoring, we've started splitting our user stories into technical tasks and estimating them. The original burndown chart doesn't show points from technical tasks. I can see the reasoning behind this – an almost-finished user story isn't finished at all until the user can use it. On the other hand, if we know which task is problematic, we can do some teamwork to push it forward.

So I realized it was a good opportunity to try some new approaches and tools.

Tools

I started with the Lift framework. In the world of Single Page Applications, this framework offers more than a simple interface for serving REST services. It comes with awesome Comet support. Comet is a replacement for WebSockets that runs on all browsers: it supports long polling and falls back transparently to short polling if the limit of client connections is exceeded. On the backend you handle pushes in a CometActor. For further reading take a look at Roundtrip promises.

But Lift is also a kind of framework of frameworks. You can build your own abstractions on top of CometActors and push JavaScript to the client, which shortens the way from server to client. That was the trigger for the author of lift-ng to create a Lift–Angular integration built on top of Lift. It provides AngularActors, from which you can emit/broadcast events to a controller's scope, and NgModelBinders that synchronize your backend model with the client scope in a few lines! I've used them to send the project state (all sprints and their details) to the client and to notify it about scrum board changes. My actor doing all of this hard work looks pretty small:

Lift-ng also provides factories for creating Angular services. Services can respond with futures that are transformed into Angular promises on the fly. This is all that was needed to serve the sprint history:

And on the client side – usage of the service:


In my opinion these two frameworks give a huge boost to web application development. You have the power of strong typing with Scala, you can design your domain with actors, and all of this comes with the simplicity of node.js – no JSON-transforming boilerplate and dynamic application reload.

DDD + Event Sourcing

I've also tried a few fresh approaches to DDD. I've organized the domain objects into actors. There are SprintActors that encapsulate the sprint aggregate root. Task changes are stored as events, computed as the difference between two board states. When the sprint history is requested, the successive board states are computed from the initial state and the sequence of events. I realized that the best way to keep this kind of event sourcing approach tested is randomized testing. This is a test doing random changes on the board, calculating the events and checking that the initial state plus the events equals the previously created state:



First look

A screenshot of the first version:


If you want to take a closer look, check out the source code or download a ready-to-run fat jar on github.

How to automate tests with Groovy 2.0, Spock and Gradle

This is the launch of the first blog in my life, so cheers and have a nice read!

y u no test?

A couple of years ago I wasn't a big fan of unit testing. It was obvious to me that well-prepared unit tests are crucial, though I didn't yet know exactly why – I just felt they were important. My dislike of writing automated tests was mostly related to the effort necessary to prepare them. Spaghetti code was also easily spotted in test sources.

Some goodies at hand

Now I know! Tests are crucial for getting a better design and confidence – confidence to improve without hesitation. Moreover, now I have a tool that makes test automation easy as Sunday morning... I'm talking about the Spock Framework. If you got here, you probably already know what Spock is, so I won't introduce it. Enough to say that Spock is an awesome unit testing tool which, thanks to Groovy AST transformations, greatly simplifies the creation of tests.

An obstacle

The point is that a new major version of Groovy (2.0) has been released, but there is no matching version of Spock available yet.

What now?

Well, as a matter of fact there is such a version. It's still under development, though. It can be obtained from this Maven repository. We could of course use Maven to build the project and run the tests, but why not go an even more "groovy" way? XML is not for humans, is it? Let's use Gradle.

The build file

Update: an updated version of the build file can be found at the end of the post.
apply plugin: 'groovy'
apply plugin: 'idea'

def langLevel = 1.7

sourceCompatibility = langLevel
targetCompatibility = langLevel

group = 'com.tamashumi.example.testwithspock'
version = '0.1'

repositories {
    mavenLocal()
    mavenCentral()
    maven { url 'http://oss.sonatype.org/content/repositories/snapshots/' }
}

dependencies {
    groovy 'org.codehaus.groovy:groovy-all:2.0.1'
    testCompile 'org.spockframework:spock-core:0.7-groovy-2.0-SNAPSHOT'
}

idea {
    project {
        jdkName = langLevel
        languageLevel = langLevel
    }
}
As you can see, the build.gradle file is almost self-explanatory. The Groovy plugin is applied to compile Groovy code. It needs groovy-all.jar, declared in version 2.0 in the dependencies block right next to Spock in version 0.7. Most importantly, the aforementioned Maven repository URL is added in the repositories block.

Project structure and execution

Gradle's default project directory structure is similar to Maven's. Unfortunately there is no 'create project' task and you have to create it by hand. It's not a big obstacle, though. The structure you create will look more or less as follows:
<project root>
├── build.gradle
└── src
    ├── main
    │   └── groovy
    └── test
        └── groovy
To build the project you can now type gradle build, or gradle test to run only the tests.

How about Java?

You can test plain Java code with Spock. Just add a src/main/java directory and the following line to build.gradle:
apply plugin: 'java'
This way, if you don't want to, or for any reason just can't, deploy Groovy-compiled classes into your production JVM, the whole goodness of testing with Spock and Groovy is still at hand.

A silly-simple example

Just to show that it works, here you go with a basic example.

Java simple example class:

public class SimpleJavaClass {

    public int sumAll(int... args) {
        int sum = 0;
        for (int arg : args) {
            sum += arg;
        }
        return sum;
    }
}

Groovy simple example class:

class SimpleGroovyClass {

    String concatenateAll(char separator, String... args) {
        args.join(separator as String)
    }
}

The test, uhm... I mean the Specification:

import spock.lang.Specification
import spock.lang.Unroll

class JustASpecification extends Specification {

    @Unroll('Sums integers #integers into: #expectedResult')
    def "Can sum different amount of integers"() {

        given:
        def instance = new SimpleJavaClass()

        when:
        def result = instance.sumAll(*integers)

        then:
        result == expectedResult

        where:
        expectedResult | integers
        11             | [3, 3, 5]
        8              | [3, 5]
        254            | [2, 4, 8, 16, 32, 64, 128]
        22             | [7, 5, 6, 2, 2]
    }

    @Unroll('Concatenates strings #strings with separator "#separator" into: #expectedResult')
    def "Can concatenate different amount of strings with a specified separator"() {

        given:
        def instance = new SimpleGroovyClass()

        when:
        def result = instance.concatenateAll(separator, *strings)

        then:
        result == expectedResult

        where:
        expectedResult     | separator   | strings
        'Whasup dude?'     | ' ' as char | ['Whasup', 'dude?']
        '2012/09/15'       | '/' as char | ['2012', '09', '15']
        'nice-to-meet-you' | '-' as char | ['nice', 'to', 'meet', 'you']
    }
}
To run the tests with Gradle simply execute gradle test. Test reports can be found at <project root>/build/reports/tests/index.html and look kind of like this.


Please note that, thanks to the @Unroll annotation, the test is executed once per parameter row in the 'table' in the specification's where: block. This isn't a Java label, but AST transformation magic.

IDE integration

Gradle's plugin for IntelliJ IDEA

I've also added the IntelliJ IDEA plugin for IDE project generation and some configuration for it (the IDE's JDK name). To generate IDEA's project files just run the command gradle idea. Eclipse and NetBeans plugins are available too, but I haven't tested them. The IDEA one works well.

IntelliJ IDEA's plugins for Gradle

IDEA itself has light Gradle support built in. To avoid confusion: Gradle has a plugin for IDEA, and IDEA has a plugin for Gradle. To get even more 'pluginated', there is also the JetGradle plugin within IDEA. However, I haven't found a good reason for its existence – well, maybe except one: it shows the dependency tree. There is a bug, though – JetGradle works fine only for language level 1.6. Strangely, all the plugins together do not conflict with each other. They even give a complementary, quite useful tool set.

Running tests under IDE

Just to add something sweet, this is how the Specification looks when run with the JUnit runner under IntelliJ IDEA (right-click the JustASpecification class or a whole folder of Specification-extending classes and select "Run ..."). You'll see a nice view like this.

Building web application

If you need to build a Java web application and bundle it as a war archive, just add the plugin by typing the line
apply plugin: 'war'
in the build.gradle file and create a directory src/main/webapp.

Want to know more?

If you haven't heard about Spock or Gradle before, or are just curious, check the following links:

What next?

The last thing left is to write the real production code you are about to test. Whether it will be Groovy or Java, I leave to your needs and invention. Of course, you are welcome to post comments here. I'll answer, or even write some more posts on the subject.

Important update

Spock version 0.7 has been released, so the build file above doesn't work anymore. It's easy to fix, though: just remove the last dash and the word SNAPSHOT from the Spock dependency declaration. Another important thing is that spock-core now depends on groovy-all-2.0.5, so to avoid a dependency conflict the Groovy dependency should be changed from version 2.0.1 to 2.0.5.
Besides that, the oss.sonatype.org snapshots Maven repository can be removed. No obstacles any more, and the build file now looks as follows:
apply plugin: 'groovy'
apply plugin: 'idea'

def langLevel = 1.7

sourceCompatibility = langLevel
targetCompatibility = langLevel

group = 'com.tamashumi.example.testwithspock'
version = '0.1'

repositories {
    mavenLocal()
    mavenCentral()
}

dependencies {
    groovy 'org.codehaus.groovy:groovy-all:2.0.5'
    testCompile 'org.spockframework:spock-core:0.7-groovy-2.0'
}

idea {
    project {
        jdkName = langLevel
        languageLevel = langLevel
    }
}

Need to make quick JSON fixes – JSONPath to the rescue

From time to time I need to make some fixes in my JSON data. In the world of flat files I do this with the grep/sed/awk tool chain. How to handle it for JSON? Searching for a solution I came across JSONPath. It's quite a mature tool (from 2007), but I hadn't heard about it before, so I decided to share my experience with others.

First of all, you can try it painlessly online at http://jsonpath.curiousconcept.com/. The full syntax is described at http://goessner.net/articles/JsonPath/



You can also download the Python bindings and run it from the command line:
$ sudo apt-get install python-jsonpath-rw
$ sudo apt-get install python-setuptools
$ sudo easy_install -U jsonpath

After that you can use it inside Python or with a simple CLI wrapper:
#!/usr/bin/python
import sys, json, jsonpath

path = sys.argv[1]

result = jsonpath.jsonpath(json.load(sys.stdin), path)
print json.dumps(result, indent=2)

… and you can use it in your shell, e.g. for the following JSON:
{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      {
        "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      {
        "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}

You can print only the book nodes with a price lower than 10 (assuming the JSON above is saved as a file, here called store.json, and piped to the wrapper on stdin):
$ jsonpath '$..book[?(@.price < 10)]' < store.json

Result:
[
  {
    "category": "reference",
    "price": 8.95,
    "title": "Sayings of the Century",
    "author": "Nigel Rees"
  },
  {
    "category": "fiction",
    "price": 8.99,
    "title": "Moby Dick",
    "isbn": "0-553-21311-3",
    "author": "Herman Melville"
  }
]

Have a nice JSON hacking!