Поиск по этому блогу

среда, 10 июля 2019 г.

Pentaho BI Server and timestamp(datepicker) field in report

I was fighting with this problem entire week(with switching to different projects/tasks) and I didn't find solutions in internet/docs.

The Problem

  1. Report created in Pentaho Report Designer
  2. Parameters - timestamp fields(date picker view)
  3. Use API for downloading reports


API call(using curl)

 curl "https://pentaho.server.internal.org/pentaho/api/repos/blablabla/prpt.report?date_from=2019-07-08T00:00:00.000-0700&date_till=2019-07-08T00:59:59.999-0700&output-target=table/csv;page-mode=stream" -o output.csv


So correct format is: 2019-07-08T00:00:00.000-0700. May be this information will help to save some time to the person who finds this article.

вторник, 12 марта 2019 г.

Pentaho PDI logging: fixing concurrency issues

If you are running single thread logging for PDI transformation to database you are good.
I am using MySQL (Percona cluster)
I got multiple  issues with concurrent inserts/updates to this table:

ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : Unable to write log record to log table
or
Deadlock found when trying to get lock; try restarting transaction

What major issues with DB schema I see:
1) ID_BATCH  column is generated on Pentaho side, it locks tables "LOCK TABLE WRITE" to do update, this is why I got Deadlock. - REMOVE ID
2)  CHANNEL_ID - is not indexed by default - ADD Index:
alter table log_DB.TABLE_NAME add index idx_2 (CHANNEL_ID);
Every time you need to comment out drop index if you do SQL schema changes from PDI otherwise PDI will remove it.
3) If possible decrease number of logged lines(it will also speed up insert).
4) Don't use "Log record timeout(in days)". Solution: do delete from table in separate Job/transformation.
5) It's better to create primary key for table:
alter table log_DB.TABLE_NAME add column id int primary key auto_increment;


To Pentaho team: looks like it's very old problem. I spent 1 hour to find solution that work for me.... Why can't you fix it?

Bug:
https://jira.pentaho.com/browse/PDI-2054

My version is 8.2 0_o





суббота, 23 февраля 2019 г.

renaming column in ClickHouse

If your CH server supports ALTER TABLE UPDATE(if no, sorry but it's time to migrate to new version)  here is trick:

!Before doing it please test on your test data!



create new column

ALTER TABLE my_super_table
    ADD COLUMN
    new_name UInt32 DEFAULT CAST(0, 'UInt32');


later set value

ALTER TABLE my_super_table
    UPDATE new_name = old_name WHERE 1;

Not sure but I think it's better to wait until you merges are finished

drop column

ALTER TABLE my_super_table
    DROP COLUMN old_name;



That's it!



пятница, 3 августа 2018 г.

четверг, 26 апреля 2018 г.

Percona Live 2018


This year I visited US Percona Live 2018. A lot of amazing topics, from my perspective the best one was from ClickHouse Team: "ClickHouse Deep Dive".

Also good ones:
  • "Designing and launching the next-generation database system @ Slack: from whiteboard to production"
  • "The Evolution of Backup/Restore System in Facebook" 



вторник, 27 февраля 2018 г.

среда, 7 февраля 2018 г.

surprise with "Start slave" on MySQL from python

I worked on some tool that checks slave status and starts it in some cases automatically. Here is strange MySQL(Percona in my case) behaviour that I stepped on. 
Topology is: Master->Slave.(Multi-threaded replication).
Important slave configuration: Master_Info_File: mysql.slave_master_info,
Script logic is pretty simple:
  1. Connect to DB
  2. Get Slave status
  3. Check if it's my case
  4. Start slave
  5. Get Slave status & and return a result.
From first glance it looks correct and I started to test it and got some unexpected results: start slave got locked and script was waiting, at this point of time it was not clear... I started to investigate the issue.
See part of innodb status:
------- TRX HAS BEEN WAITING 20 SEC FOR THIS LOCK TO BE GRANTED:
RECORD
LOCKS space id 5 page no 3 n bits 88 index `PRIMARY` of table
`mysql`.`slave_worker_info` trx id 1687457 lock mode S waiting
------------------
---TRANSACTION
1687450, ACTIVE 262 sec
4
lock struct(s), heap size 1184, 10 row lock(s)
MySQL
thread id 55, OS thread handle 0x7f4748d43700, query id 1260 XX.XX.XX.XX
[username] Waiting for slave thread to start
text unavailable>
Thread #55 is a connect from my tool. First one is system slave thread. It looking as Deadlock, let me explain my thoughts:
By default at least python(MySQL-python)works with autocommit=OFF(I think most of MySQL libs for various scripting languages works so). When script gets slave status it holds a lock because master_info_repository=TABLE, and if you check engine of the tables(mysql.slave_master_info) they are InnoDB.
Connect #55 was trying to start slave, system tread waited for releasing lock `mysql`.`slave_worker_info` table, that was taken by #55.
Was fixed by enabling autocommit for connection.
#MySQL #Percona #Python