#archiveteam-bs 2015-10-10,Sat


Time Nickname Message
00:06 🔗 godane anyways looking at the size of archiveteam's dumps for thingiverse
00:06 🔗 godane 38G per a dump
00:07 🔗 godane i'm now thinking my dumps may still be needed if possible
00:07 🔗 godane if anything else just to be more downloadable
00:14 🔗 godane i made my script faster by blocking /images/ui-
00:14 🔗 godane they're images for the interface that 404 on me
00:18 🔗 arkiver I think we also are getting everything with the warrior grab
00:22 🔗 JesseW has joined #archiveteam-bs
01:20 🔗 primus104 has quit IRC (Leaving.)
02:11 🔗 JesseW has quit IRC (Read error: Operation timed out)
02:25 🔗 Start i've uploaded english, french, russian, and japanese (demo) versions of the legoland game to IA: https://archive.org/search.php?query=subject%3A%22legoland%22%20AND%20subject%3A%22video%20game%22
02:31 🔗 fie__ has quit IRC (Quit: Leaving)
02:38 🔗 Rotab ooh, i had that game.
02:39 🔗 Start has quit IRC (Read error: Connection reset by peer)
02:40 🔗 Start has joined #archiveteam-bs
02:56 🔗 zenguy_pc has quit IRC (Read error: Connection reset by peer)
03:08 🔗 Ctrl-S what sort of compression and detail should i use for the scan of CD cover inserts and CDs
03:09 🔗 Ctrl-S I'm guessing you'd rather me not upload 300mb BMP scans of them
03:11 🔗 zenguy_pc has joined #archiveteam-bs
03:22 🔗 JesseW has joined #archiveteam-bs
03:35 🔗 JesseW Ctrl-S: As long as it's lossless, and as large as you have, I think any format is fine. The derive task should handle it. https://archive.org/help/derivatives.php
03:36 🔗 Ctrl-S okay
03:37 🔗 JesseW The BMPs are probably just fine
03:37 🔗 JesseW but I'm hardly knowledgeable about this.
03:39 🔗 aaaaaaaaa well, maybe do png or something like that at a minimum
03:42 🔗 zenguy_pc has quit IRC (Read error: Connection reset by peer)
03:43 🔗 JesseW is png lossless?
03:46 🔗 aaaaaaaaa It uses deflate, I believe.
03:48 🔗 aaaaaaaaa ah, it uses deflate with a prefilter. And yes, most libraries use the lossless mode by default.
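A minimal sketch of the point made above, that deflate with a prefilter loses nothing. It is not a PNG encoder: it applies a PNG-style "Sub" filter to a synthetic 8-bit grayscale image, compresses it with zlib's deflate, and checks that the round trip restores every byte. The image data and the helper names `sub_filter` and `sub_unfilter` are made up for illustration.

```python
import zlib

WIDTH, HEIGHT = 256, 64

def sub_filter(row: bytes, bpp: int = 1) -> bytes:
    """PNG-style 'Sub' filter: each byte minus the byte bpp positions to its left, mod 256."""
    return bytes((row[i] - (row[i - bpp] if i >= bpp else 0)) & 0xFF
                 for i in range(len(row)))

def sub_unfilter(filtered: bytes, bpp: int = 1) -> bytes:
    """Inverse of sub_filter: rebuild each byte from the filtered value and its left neighbour."""
    out = bytearray(len(filtered))
    for i, value in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out[i] = (value + left) & 0xFF
    return bytes(out)

# Synthetic "scan": a smooth gradient, one bytes object per row (illustrative data only).
rows = [bytes((x + y) % 256 for x in range(WIDTH)) for y in range(HEIGHT)]

# Filter every row, then deflate the whole filtered stream.
compressed = zlib.compress(b"".join(sub_filter(r) for r in rows), 9)

# Inflate and undo the filter row by row.
inflated = zlib.decompress(compressed)
restored = [sub_unfilter(inflated[i * WIDTH:(i + 1) * WIDTH]) for i in range(HEIGHT)]

print("bit-identical:", restored == rows)
print("raw bytes:", WIDTH * HEIGHT, "compressed bytes:", len(compressed))
```

The filter only rewrites each byte as a difference from its neighbour, which is exactly reversible, so the scan comes back unchanged while the file gets smaller.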
03:52 🔗 closure or you could use xpm, which is both lossless and ascii, so you can see the pictures in your text editor in 100 years ;)
03:52 🔗 closure (a little large files though)
03:52 🔗 aaaaaaaaa I thought that was black and white only
03:53 🔗 closure not at all
03:53 🔗 aaaaaaaaa oops, I mistook it for xbm
03:53 🔗 closure also it's valid C code, just because
03:54 🔗 aaaaaaaaa oh nope, I was actually thinking of PBM
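To illustrate closure's remark that XPM is plain ASCII and also valid C, here is a small sketch that writes a tiny image in XPM3 layout. The function name `to_xpm`, the one-character-per-pixel restriction, and the 2x2 sample image are assumptions made for this example.

```python
def to_xpm(name, pixels, palette):
    """Render an image as XPM3 text.

    pixels:  list of equal-length strings, one character per pixel
    palette: dict mapping each pixel character to a '#RRGGBB' colour
    """
    height = len(pixels)
    width = len(pixels[0])
    # First string: "<width> <height> <number of colours> <chars per pixel>"
    entries = [f"{width} {height} {len(palette)} 1"]
    # One string per colour: "<char> c <colour>"
    entries += [f"{ch} c {colour}" for ch, colour in palette.items()]
    # Then one string per pixel row
    entries += pixels
    lines = ["/* XPM */", f"static char *{name}[] = {{"]
    lines += [f'"{entry}",' for entry in entries[:-1]]
    lines.append(f'"{entries[-1]}"')
    lines.append("};")
    return "\n".join(lines) + "\n"

# A 2x2 checkerboard (illustrative data only).
xpm_text = to_xpm("checker", ["ab", "ba"], {"a": "#000000", "b": "#FFFFFF"})
print(xpm_text)
```

The output is a C array of strings, so the pixel grid is readable in any text editor, at the cost of file size, as closure notes.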
03:54 🔗 Ctrl-S so i scan at the highest res bitmap and you guys can convert it to something sane?
03:58 🔗 zenguy_pc has joined #archiveteam-bs
04:03 🔗 zenguy_pc has quit IRC (Read error: Connection reset by peer)
04:06 🔗 aaaaaaaa_ has joined #archiveteam-bs
04:06 🔗 aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
04:06 🔗 swebb sets mode: +o aaaaaaaa_
04:07 🔗 aaaaaaaa_ is now known as aaaaaaaaa
04:07 🔗 aaaaaaaaa has quit IRC (Client Quit)
04:19 🔗 zenguy_pc has joined #archiveteam-bs
06:10 🔗 PurpleSym has joined #archiveteam-bs
07:05 🔗 lbft has quit IRC (Read error: Operation timed out)
07:10 🔗 lbft has joined #archiveteam-bs
07:11 🔗 Aranje has quit IRC (Quit: Three sheets to the wind)
07:50 🔗 primus104 has joined #archiveteam-bs
07:51 🔗 JesseW has quit IRC (Read error: Operation timed out)
08:03 🔗 fie has joined #archiveteam-bs
08:27 🔗 primus104 has quit IRC (Leaving.)
09:25 🔗 arkiver I'd scan it in the highest resolution possible, don't worry about the size
09:25 🔗 arkiver so bmp would be fine
09:26 🔗 arkiver if IA doesn't derive the bmp images, I think you should also upload a converted version of the bmp as preview
10:34 🔗 schbirid has joined #archiveteam-bs
10:39 🔗 primus104 has joined #archiveteam-bs
10:58 🔗 fie_ has joined #archiveteam-bs
11:00 🔗 fie has quit IRC (Read error: Operation timed out)
11:01 🔗 Infreq has quit IRC (Read error: Operation timed out)
11:01 🔗 Baljem_ has joined #archiveteam-bs
11:01 🔗 Infreq has joined #archiveteam-bs
11:02 🔗 Baljem has quit IRC (Read error: Operation timed out)
11:05 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
11:21 🔗 DopefishJ has joined #archiveteam-bs
11:21 🔗 swebb sets mode: +o DopefishJ
11:22 🔗 DFJustin has quit IRC (Read error: Operation timed out)
14:26 🔗 zenguy_pc has quit IRC (Ping timeout: 252 seconds)
14:28 🔗 vitzli has joined #archiveteam-bs
14:38 🔗 zenguy_pc has joined #archiveteam-bs
14:48 🔗 Start has quit IRC (Quit: Disconnected.)
14:54 🔗 Smiley has quit IRC (Read error: Connection reset by peer)
14:54 🔗 Stiletto has joined #archiveteam-bs
14:54 🔗 RichardG_ has joined #archiveteam-bs
14:54 🔗 logan has joined #archiveteam-bs
14:55 🔗 Rai-chan has quit IRC (Ping timeout: 268 seconds)
14:55 🔗 kniffy has quit IRC (Ping timeout: 268 seconds)
14:55 🔗 Fusl has quit IRC (Ping timeout: 268 seconds)
14:55 🔗 joepie91 has quit IRC (Ping timeout: 268 seconds)
14:56 🔗 matthusby has quit IRC (Ping timeout: 268 seconds)
14:56 🔗 RichardG has quit IRC (Ping timeout: 268 seconds)
14:56 🔗 espes__ has quit IRC (Ping timeout: 268 seconds)
14:56 🔗 jk[SVP] has quit IRC (Ping timeout: 268 seconds)
14:56 🔗 logan2 has quit IRC (Ping timeout: 268 seconds)
14:56 🔗 matthusby has joined #archiveteam-bs
14:56 🔗 jk[SVP] has joined #archiveteam-bs
14:57 🔗 edsu has joined #archiveteam-bs
14:57 🔗 swebb sets mode: +o edsu
14:57 🔗 ohhdemgir has quit IRC (Ping timeout: 268 seconds)
14:57 🔗 zhongfu has quit IRC (Ping timeout: 268 seconds)
14:57 🔗 edsu_ has quit IRC (Ping timeout: 268 seconds)
14:58 🔗 wp494_ has joined #archiveteam-bs
14:58 🔗 joepie91 has joined #archiveteam-bs
14:59 🔗 zhongfu has joined #archiveteam-bs
15:00 🔗 will- has joined #archiveteam-bs
15:01 🔗 ohhdemgir has joined #archiveteam-bs
15:01 🔗 Stilett0 has quit IRC (Ping timeout: 268 seconds)
15:01 🔗 vtyl has quit IRC (Ping timeout: 268 seconds)
15:01 🔗 pwnsrv has quit IRC (Ping timeout: 268 seconds)
15:01 🔗 wp494 has quit IRC (Ping timeout: 268 seconds)
15:02 🔗 vtyl has joined #archiveteam-bs
15:02 🔗 goekesmi has quit IRC (Ping timeout: 268 seconds)
15:02 🔗 no2pencil has quit IRC (Ping timeout: 268 seconds)
15:02 🔗 will has quit IRC (Ping timeout: 268 seconds)
15:02 🔗 will- is now known as will
15:03 🔗 wednesda- has joined #archiveteam-bs
15:03 🔗 primus104 has quit IRC (Leaving.)
15:04 🔗 goekesmi has joined #archiveteam-bs
15:04 🔗 Fusl has joined #archiveteam-bs
15:05 🔗 no2pencil has joined #archiveteam-bs
15:07 🔗 zenguy_pc has quit IRC (Ping timeout: 310 seconds)
15:07 🔗 SimpBrain has joined #archiveteam-bs
15:08 🔗 wacky_ has quit IRC (Ping timeout: 268 seconds)
15:08 🔗 wednesday has quit IRC (Ping timeout: 268 seconds)
15:15 🔗 zenguy_pc has joined #archiveteam-bs
15:24 🔗 wacky_ has joined #archiveteam-bs
15:32 🔗 espes__ has joined #archiveteam-bs
15:36 🔗 vitzli has quit IRC (Quit: Leaving)
15:36 🔗 Smiley has joined #archiveteam-bs
15:39 🔗 Rye has joined #archiveteam-bs
15:39 🔗 kniffy has joined #archiveteam-bs
16:45 🔗 primus104 has joined #archiveteam-bs
16:55 🔗 godane has quit IRC (Ping timeout: 492 seconds)
17:10 🔗 RichardG_ is now known as RichardG
17:23 🔗 JesseW has joined #archiveteam-bs
17:37 🔗 godane has joined #archiveteam-bs
17:41 🔗 JesseW has quit IRC (Read error: Operation timed out)
17:42 🔗 Start has joined #archiveteam-bs
17:50 🔗 aaaaaaaaa has joined #archiveteam-bs
17:50 🔗 swebb sets mode: +o aaaaaaaaa
18:34 🔗 JesseW has joined #archiveteam-bs
19:04 🔗 wp494_ is now known as wp494
19:04 🔗 garyrh has quit IRC (Remote host closed the connection)
19:06 🔗 dashcloud has quit IRC (Read error: Operation timed out)
19:11 🔗 dashcloud has joined #archiveteam-bs
19:42 🔗 garyrh has joined #archiveteam-bs
19:49 🔗 primus has joined #archiveteam-bs
19:55 🔗 primus Hi all, I need help with archiving a website. The site is not big, so i thought i'd try downloading it with wget.
19:56 🔗 anomie Try using wpull instead.
19:57 🔗 anomie Here's the last wpull query I ran.
19:57 🔗 anomie wpull 'http://www.musictheory.net/lessons' --no-check-certificate --user-agent "wpull-web-archiver" --page-requisites --recursive --level inf --span-hosts-allow linked-pages,page-requisites --recursive --level inf --escaped-fragment --strip-session-id --sitemaps --reject-regex "/login\.php" --tries 3 --retry-connrefused --retry-dns-error --timeout 60 --session-timeout 21600 --database "music-theory.db" -np
19:57 🔗 anomie --header="Contact: acstubbins@openmailbox.org" --warc-file "music-theory" --warc-max-size 9999999999999999999 -np
19:57 🔗 primus Thanks, does it also make browsable local website?
19:58 🔗 anomie Yeah.
19:58 🔗 primus Hmm, it doesn't seem to be available in yum for centos
19:59 🔗 anomie It isn't. You should install it with pip, or from the github.
19:59 🔗 primus What is its main advantage over wget if you don't mind me asking?
20:00 🔗 primus Actually, I was only looking for advice on whether the command from the wget tutorial would work for my purpose: wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
20:03 🔗 anomie Okay. Let me go over the options slowly.
20:03 🔗 anomie --no-check-certificate means don't check the https certificate.
20:04 🔗 anomie --user-agent "wpull-web-archiver" really isn't necessary, but it essentially lets web masters know they're being archived, if they ever check their logs
20:05 🔗 anomie --page-requisites is so that it archives all the images and other stuff on the page.
20:05 🔗 anomie --recursive --level inf means to recurse infinitely.
20:06 🔗 anomie --span-hosts-allow linked-pages,page-requisites tells wpull to download elements from different hosts, as long as it's a linked page, or something that falls under page-requisites
20:07 🔗 anomie Actually, you might want to remove linked-pages.
20:07 🔗 anomie I forget what --escaped-fragment, --strip-session-id are for
20:08 🔗 primus Wouldn't infinite level of recursion mean it falls into a loop?
20:09 🔗 primus The site i'm going to archive is very badly maintained so i'm quite concerned about loops.
20:16 🔗 anomie primus: No. It won't archive anything twice.
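Why unlimited depth need not mean an endless loop can be sketched with a toy crawler that remembers every URL it has already queued. This illustrates the general idea only and is not wpull's actual implementation; the `crawl` function and the in-memory `site` link graph are stand-ins for real page fetching.

```python
from collections import deque

def crawl(start, links):
    """Breadth-first crawl with no depth limit.

    links: dict mapping each URL to the URLs it links to
           (a stand-in for downloading and parsing pages).
    """
    seen = {start}           # every URL ever queued
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)    # "archive" the page
        for target in links.get(url, []):
            if target not in seen:   # skip anything already queued or fetched
                seen.add(target)
                queue.append(target)
    return order

# A badly linked site: pages point back at each other and at themselves.
site = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/"],
    "/b": ["/a", "/b"],
}
visited = crawl("/", site)
print(visited)
```

Each URL enters the queue at most once, so cycles between pages cannot keep the crawl running. A site that keeps generating new, distinct URLs (an endless calendar, for instance) is a different problem and would still need a depth limit or a reject pattern.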
20:18 🔗 primus Thank you for all your advice, I appreciate it
20:18 🔗 anomie You're welcome.
20:18 🔗 JesseW has quit IRC (Read error: Operation timed out)
20:36 🔗 aaaaaaaaa you know, the point of the --warc-max-size is to set it to something reasonable, not ~10 exabytes
20:37 🔗 anomie aaaaaaaaa: That was to work around an annoying bug in webarchiveplayer.
20:39 🔗 aaaaaaaaa primus: wpull supports some options that wget doesn't (and vice versa: https://wpull.readthedocs.org/en/master/differences.html) but the biggest benefit is if you spot a bug, the maintainer is on this channel.
20:40 🔗 aaaaaaaaa oh and it doesn't try to keep everything in memory as well.
20:40 🔗 primus That's great. At the moment wget is already running. After it's finished I'll also run wpull, just to be on the safe side.
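For anyone reusing anomie's command, here is a hedged sketch that assembles it as an argument list with the duplicated flags removed. It uses only options that appear in the log above. The function name `build_wpull_command` and its parameters are invented for this example. Two things are deliberately left out and can be added back by hand: `--no-check-certificate` and `--warc-max-size` (see aaaaaaaaa's comment on the latter). Following anomie's second thought, `linked-pages` is off by default.

```python
import shlex

def build_wpull_command(start_url, warc_name, contact=None, span_linked_pages=False):
    """Return the wpull invocation as a list suitable for subprocess.run()."""
    span = ["page-requisites"]
    if span_linked_pages:
        span.insert(0, "linked-pages")
    cmd = [
        "wpull", start_url,
        "--user-agent", "wpull-web-archiver",
        "--page-requisites",
        "--recursive", "--level", "inf",
        "-np",
        "--span-hosts-allow", ",".join(span),
        "--escaped-fragment", "--strip-session-id", "--sitemaps",
        "--reject-regex", r"/login\.php",
        "--tries", "3", "--retry-connrefused", "--retry-dns-error",
        "--timeout", "60", "--session-timeout", "21600",
        "--database", f"{warc_name}.db",
        "--warc-file", warc_name,
    ]
    if contact:
        # Optional contact header, as in the original command.
        cmd.append(f"--header=Contact: {contact}")
    return cmd

# Print a shell-quoted version for inspection; nothing is executed here.
command = build_wpull_command("http://www.musictheory.net/lessons", "music-theory")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would start the crawl, assuming wpull is installed (via pip, as suggested above).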
20:48 🔗 JesseW has joined #archiveteam-bs
20:58 🔗 BiggieJon has joined #archiveteam-bs
21:11 🔗 PurpleSym has quit IRC (Remote host closed the connection)
21:40 🔗 phiren has quit IRC (Ping timeout: 252 seconds)
21:40 🔗 phiren has joined #archiveteam-bs
21:45 🔗 JesseW has quit IRC (Leaving.)
21:45 🔗 JesseW has joined #archiveteam-bs
22:39 🔗 schbirid has quit IRC (Quit: Leaving)
22:39 🔗 BlueMaxim has joined #archiveteam-bs
23:58 🔗 JesseW has quit IRC (Read error: Operation timed out)
