When I used RAGFlow to parse an 800 MB .docx document, it failed with the error: Error file size exceeds(<=128Mb) page(1~100000001):no chunk built form xxx.docx
None of the configuration files set a 128 MB limit anywhere. Digging through the source code, I found this check in the parsing code:
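```python
# Excerpt from the parsing code; DOC_MAXIMUM_SIZE is imported from rag/settings.py
def build(row):
    # Reject the document before any chunking if it is larger than the limit
    if row["size"] > DOC_MAXIMUM_SIZE:
        # prog=-1 reports the task as failed; the message gives the limit in MB
        set_progress(row["id"], prog=-1, msg="File size exceeds( <= %dMb )" %
                     (int(DOC_MAXIMUM_SIZE / 1024 / 1024)))
        return []
```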
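To check a file before uploading it, the comparison in build() can be reproduced locally. The sketch below uses hypothetical helpers (size_error, check_file) that are not part of RAGFlow; it assumes the 128 MB fallback and keeps the same strict `>` comparison, so a file of exactly the limit still passes.

```python
import os
from typing import Optional

# Assumed default: the fallback used when MAX_CONTENT_LENGTH is not set
DEFAULT_MAX_SIZE = 128 * 1024 * 1024


def size_error(size_bytes: int, max_size: int = DEFAULT_MAX_SIZE) -> Optional[str]:
    """Return the same message build() reports, or None if the file fits."""
    if size_bytes > max_size:
        return "File size exceeds( <= %dMb )" % (int(max_size / 1024 / 1024))
    return None


def check_file(path: str, max_size: int = DEFAULT_MAX_SIZE) -> Optional[str]:
    """Hypothetical pre-flight check for a local file."""
    return size_error(os.path.getsize(path), max_size)


print(size_error(800 * 1024 * 1024))              # an 800 MB .docx against 128 MB
print(size_error(800 * 1024 * 1024, 1073741824))  # the same file against 1 GB
```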
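The constant is defined in ragflow/rag/settings.py:

```python
# Falls back to 128 MB when MAX_CONTENT_LENGTH is not set in the environment
DOC_MAXIMUM_SIZE = int(os.environ.get("MAX_CONTENT_LENGTH", 128 * 1024 * 1024))
```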
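The fallback behaviour of that line can be checked in isolation. This is a minimal sketch: resolve_max_size is a hypothetical name, and it accepts a plain dict in place of os.environ so that both the unset and the set case can be shown.

```python
import os
from typing import Mapping


def resolve_max_size(env: Mapping[str, str] = os.environ) -> int:
    """Same fallback logic as the DOC_MAXIMUM_SIZE line in rag/settings.py."""
    return int(env.get("MAX_CONTENT_LENGTH", 128 * 1024 * 1024))


print(resolve_max_size({}))                                   # 134217728 (128 MB), variable unset
print(resolve_max_size({"MAX_CONTENT_LENGTH": "1073741824"}))  # 1073741824 (1 GB)
```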
Then I looked at $ragflow/docker/.env and found this:
```
# The maximum file size limit (in bytes) for each upload to your knowledge base or File Management.
# To change the 1GB file size limit, uncomment the line below and update as needed.
# MAX_CONTENT_LENGTH=1073741824
# After updating, ensure `client_max_body_size` in nginx/nginx.conf is updated accordingly.
# Note that neither `MAX_CONTENT_LENGTH` nor `client_max_body_size` sets the maximum size for files uploaded to an agent.
# See https://ragflow.io/docs/dev/begin_component for details.
```
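Since the comment asks for client_max_body_size to be kept in step with MAX_CONTENT_LENGTH, it can help to derive both from one number. A sketch with a hypothetical helper, limit_settings, assuming the nginx directive is written with an M suffix (nginx treats M as 1024 * 1024 bytes):

```python
from typing import Tuple


def limit_settings(size_mib: int) -> Tuple[str, str]:
    """Return a .env line and an nginx directive expressing the same limit."""
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    env_line = f"MAX_CONTENT_LENGTH={size_mib * 1024 * 1024}"  # bytes, for .env
    nginx_line = f"client_max_body_size {size_mib}M;"           # for nginx.conf
    return env_line, nginx_line


env_line, nginx_line = limit_settings(1024)
print(env_line)    # MAX_CONTENT_LENGTH=1073741824
print(nginx_line)  # client_max_body_size 1024M;
```

Deriving both lines from one input avoids the case where nginx still rejects an upload that RAGFlow itself would accept, or the reverse.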
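Because MAX_CONTENT_LENGTH is commented out by default, the environment variable is never set, so DOC_MAXIMUM_SIZE falls back to 128 MB, even though the comment speaks of a 1GB limit. Uncommenting that line raises the limit to 1 GB (1073741824 bytes), which is enough for an 800 MB document. The value can be changed as needed; as the comment says, update client_max_body_size in nginx/nginx.conf to match.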
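Uncommenting the line can also be scripted. The sketch below is a hypothetical helper, set_max_content_length, that works on the text of the .env file; it assumes the variable appears as a single `MAX_CONTENT_LENGTH=value` line, commented out or not, and appends it if it is missing.

```python
import re

# Matches the setting whether it is commented out ("# MAX_CONTENT_LENGTH=...") or active.
# [ \t]* is used instead of \s* so the match never swallows preceding blank lines.
_PATTERN = re.compile(r"^[ \t]*#?[ \t]*MAX_CONTENT_LENGTH=.*$", re.MULTILINE)


def set_max_content_length(env_text: str, size_bytes: int) -> str:
    """Uncomment (or overwrite) MAX_CONTENT_LENGTH in .env text."""
    new_line = f"MAX_CONTENT_LENGTH={size_bytes}"
    updated, count = _PATTERN.subn(new_line, env_text, count=1)
    if count == 0:
        # The setting is not present at all: append it at the end
        updated = env_text.rstrip("\n") + "\n" + new_line + "\n"
    return updated


sample = (
    "# To change the 1GB file size limit, uncomment the line below and update as needed.\n"
    "# MAX_CONTENT_LENGTH=1073741824\n"
)
print(set_max_content_length(sample, 2 * 1024 * 1024 * 1024))
```

After editing the file, the containers have to pick up the new environment, for example by recreating them with `docker compose up -d`; treat the exact restart step as an assumption about a standard Docker Compose deployment.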