Metadata handling error when uploading digital files with Korean filenames (filenames displayed as underscores)

edited April 2014 in AtoM
When a digital file is imported into an AtoM archival description, a Korean filename is replaced with underscores (_).
For example, '안대진 기증기록물 1.pdf' is displayed as '__________1.pdf'.

[Screenshot: filename_korean_2014-03-28 오전 3.26.07]


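Because of this problem, I was considering whether to rename every Korean filename into English.
After asking on the AtoM Google forum, I confirmed that the same error occurs for multi-byte characters such as Korean and Japanese.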



Artefactual has registered this issue as a bug.




Below is Artefactual's reply.

Hi Daejin,


I have tested with a Korean filename in AtoM 2.0.1 and I get the same error, an underline.


I have created a bug report: https://projects.artefactual.com/issues/6545.


I am hoping that other users that are working in multi-byte character languages, like Thai and Japanese might respond to this post and let us know if they have created a solution or workaround.


Jessica



On Friday, March 28, 2014 2:24:59 PM UTC-7, Dan Gillean wrote:


Hi Daejin,


Interesting - it appears that the Korean characters are displaying fine in AtoM - just not in the file name.


My theory is that either this has to do with specific character-encoding issues in your local machine prior to upload, or that some library in the application is not using UTF-8 encoding. I will ask a developer to take a look and respond.


Regards,


Dan Gillean, MAS, MLIS


AtoM Product Manager / Systems Analyst,


Artefactual Systems, Inc.


604-527-2056


@accesstomemory



On Thu, Mar 27, 2014 at 11:36 AM, djahhn <djahhn@gmail.com> wrote:


Hi,


This is Daejin from Korea.


When I import digital objects, filename didn't show exactly.


We got many digital file with korean filename.


Please, give me a solution.


Regards.

Daejin

Comments

  • edited April 2014
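    This is a follow-up reply from Artefactual on this issue.
    They say the problem can be resolved by removing the code that sanitizes the filename on upload.
    I tried editing QubitDigitalObject.php as described in the linked mail below, but that caused an error in which the upload itself failed.
    The code I removed is as follows.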

      protected static function sanitizeFilename($filename)
      {
        return preg_replace('/[^a-z0-9_\.-]/i', '_', $filename);
      }
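To see what the pattern above does in isolation, here is a minimal standalone sketch. The function name sanitizeFilenameAscii is hypothetical (it only wraps the same regular expression outside of AtoM), and the script assumes it is saved as UTF-8.

```php
<?php
// Standalone copy of the regex quoted above; the function name is
// hypothetical and this is not AtoM's actual class method.
function sanitizeFilenameAscii($filename)
{
    // Anything that is not an ASCII letter, digit, underscore, dot or
    // hyphen becomes '_'. Without the /u modifier the pattern is applied
    // byte by byte, so every byte of a multi-byte UTF-8 character is replaced.
    return preg_replace('/[^a-z0-9_\.-]/i', '_', $filename);
}

// A plain ASCII name passes through unchanged.
echo sanitizeFilenameAscii('report-2014_v1.pdf'), PHP_EOL;

// A Korean name keeps only its ASCII tail ("1.pdf"); the rest is underscores.
echo sanitizeFilenameAscii('안대진 기증기록물 1.pdf'), PHP_EOL;
```

This suggests the underscores come from the ASCII-only whitelist itself rather than from anything specific to Korean, which matches the report that other multi-byte scripts are affected too.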



    Artefactual's reply is as follows.






    Hi Daejin,

    This appears to be occurring because the filename is being sanitized on upload - you can see where in the code, here: https://github.com/artefactual/atom/blob/2.x/lib/model/QubitDigitalObject.php#L1427-L1430


    Sanitizing is done to remove invalid characters that cannot appear in filenames or URLs - but I believe that in the past, when this was first implemented, multi-byte characters were not supported in file names at the time, and therefore were also stripped. This has changed now, I believe, so it could be fixed in AtoM if the bug were sponsored for development.


    You could always try removing the lines of code sanitizing the filename, or editing the function itself.


    Regards,


    Dan Gillean, MAS, MLIS

    AtoM Product Manager / Systems Analyst,

    Artefactual Systems, Inc.

    604-527-2056

    @accesstomemory
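As a rough illustration of the "editing the function itself" option mentioned in the reply, the sketch below keeps a sanitizing step but makes the whitelist Unicode-aware instead of deleting it. This is only an assumption about one possible edit, not Artefactual's fix: the name sanitizeFilenameUnicode is hypothetical, and it has not been tested inside AtoM, where the filesystem and web server would also need to handle UTF-8 filenames.

```php
<?php
// Hypothetical Unicode-aware variant; not AtoM's actual code.
function sanitizeFilenameUnicode(string $filename): string
{
    // /u makes PCRE treat the subject as UTF-8.
    // \p{L} = letters, \p{M} = combining marks, \p{N} = numbers.
    // Everything else (spaces, slashes, control characters, ...) becomes '_'.
    $clean = preg_replace('/[^\p{L}\p{M}\p{N}_.\-]/u', '_', $filename);

    if ($clean === null) {
        // preg_replace returns null when the input is not valid UTF-8;
        // fall back to the original ASCII-only rule in that case.
        $clean = preg_replace('/[^a-z0-9_.\-]/i', '_', $filename);
    }

    return $clean;
}

echo sanitizeFilenameUnicode('안대진 기증기록물 1.pdf'), PHP_EOL;
// 안대진_기증기록물_1.pdf
```

Keeping the function in place, rather than removing it, also means any code that still calls it continues to work, and path separators are still replaced.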

