Time |
Nickname |
Message |
00:06
🔗
|
godane |
anyways looking at the size of archiveteam's dumps for thingiverse |
00:06
🔗
|
godane |
38G per dump |
00:07
🔗
|
godane |
i'm now thinking my dumps may still be needed if possible |
00:07
🔗
|
godane |
if anything else just to be more downloadable |
00:14
🔗
|
godane |
i made my script faster by blocking /images/ui- |
00:14
🔗
|
godane |
they're images for the interface that 404 on me |
00:18
🔗
|
arkiver |
I think we also are getting everything with the warrior grab |
00:22
🔗
|
|
JesseW has joined #archiveteam-bs |
01:20
🔗
|
|
primus104 has quit IRC (Leaving.) |
02:11
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
02:25
🔗
|
Start |
i've uploaded english, french, russian, and japanese (demo) versions of the legoland game to IA: https://archive.org/search.php?query=subject%3A%22legoland%22%20AND%20subject%3A%22video%20game%22 |
02:31
🔗
|
|
fie__ has quit IRC (Quit: Leaving) |
02:38
🔗
|
Rotab |
ooh, i had that game. |
02:39
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
02:40
🔗
|
|
Start has joined #archiveteam-bs |
02:56
🔗
|
|
zenguy_pc has quit IRC (Read error: Connection reset by peer) |
03:08
🔗
|
Ctrl-S |
what sort of compression and detail should i use for the scan of CD cover inserts and CDs |
03:09
🔗
|
Ctrl-S |
I'm guessing you'd rather me not upload 300mb BMP scans of them |
03:11
🔗
|
|
zenguy_pc has joined #archiveteam-bs |
03:22
🔗
|
|
JesseW has joined #archiveteam-bs |
03:35
🔗
|
JesseW |
Ctrl-S: As long as it's lossless, and as large as you have, I think any format is fine. The derive task should handle it. https://archive.org/help/derivatives.php |
03:36
🔗
|
Ctrl-S |
okay |
03:37
🔗
|
JesseW |
The BMPs are probably just fine |
03:37
🔗
|
JesseW |
but I'm hardly knowledgeable about this. |
03:39
🔗
|
aaaaaaaaa |
well, maybe do png or something like that at a minimum |
03:42
🔗
|
|
zenguy_pc has quit IRC (Read error: Connection reset by peer) |
03:43
🔗
|
JesseW |
is png lossless? |
03:46
🔗
|
aaaaaaaaa |
It uses deflate, I believe. |
03:48
🔗
|
aaaaaaaaa |
ah, it uses deflate with a prefilter. And yes, most libraries use the lossless mode by default. |
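The "deflate with a prefilter" idea can be sketched in a few lines. This is only an illustration, assuming a single 8-bit greyscale row and the PNG "Sub" filter (each byte stored as its difference from the byte to its left); the function names are made up for this sketch and it is not a PNG encoder. It shows that both steps are reversible, so nothing is lost.

```python
import zlib


def sub_filter(row: bytes, bpp: int = 1) -> bytes:
    """Store each byte as the difference from the byte `bpp` positions to its left."""
    out = bytearray(len(row))
    for i, value in enumerate(row):
        left = row[i - bpp] if i >= bpp else 0
        out[i] = (value - left) & 0xFF
    return bytes(out)


def sub_unfilter(filtered: bytes, bpp: int = 1) -> bytes:
    """Undo sub_filter by adding back the already-reconstructed left neighbour."""
    out = bytearray(len(filtered))
    for i, value in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out[i] = (value + left) & 0xFF
    return bytes(out)


# Hypothetical scanline: a smooth gradient, the kind of data the filter helps with.
row = bytes(range(256)) * 4

compressed = zlib.compress(sub_filter(row))           # prefilter, then deflate
restored = sub_unfilter(zlib.decompress(compressed))  # inflate, then undo the filter

assert restored == row  # byte-for-byte identical, i.e. lossless
```

The filter does not shrink anything by itself; it only turns smooth gradients into runs of small, repetitive values that deflate handles well.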
03:52
🔗
|
closure |
or you could use xpm, which is both lossless and ascii, so you can see the pictures in your text editor in 100 years ;) |
03:52
🔗
|
closure |
(the files are a little large, though) |
03:52
🔗
|
aaaaaaaaa |
I thought that was black and white only |
03:53
🔗
|
closure |
not at all |
03:53
🔗
|
aaaaaaaaa |
oops, I mistook it for xbm |
03:53
🔗
|
closure |
also it's valid C code, just because |
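To make the "lossless, ASCII, and valid C" point concrete, here is a rough sketch that writes a tiny image as XPM3 text. The helper name `to_xpm`, the 3x3 image, and the colours are made up for illustration, and the sketch assumes one character per pixel.

```python
def to_xpm(name, rows, colors):
    """Serialize a tiny image as XPM3 text.

    rows:   list of equal-length strings, one character per pixel
    colors: dict mapping each pixel character to a colour such as "#RRGGBB"
    """
    width, height = len(rows[0]), len(rows)
    strings = [f"{width} {height} {len(colors)} 1"]            # values line
    strings += [f"{ch} c {colour}" for ch, colour in colors.items()]  # colour table
    strings += rows                                            # pixel data
    body = ",\n".join(f'"{s}"' for s in strings)
    # The result is a C array of strings, so a C compiler accepts it too.
    return f"/* XPM */\nstatic char *{name}[] = {{\n{body}\n}};"


# Hypothetical 3x3 image: a red plus sign on a white background.
xpm_text = to_xpm("cover", [".X.", "XXX", ".X."], {".": "#FFFFFF", "X": "#CC0000"})
print(xpm_text)
```

Every pixel costs at least one character of text plus quoting, which is where the large file sizes come from.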
03:54
🔗
|
aaaaaaaaa |
oh nope, I was actually thinking of PBM |
03:54
🔗
|
Ctrl-S |
so i scan at the highest res bitmap and you guys can convert it to something sane? |
03:58
🔗
|
|
zenguy_pc has joined #archiveteam-bs |
04:03
🔗
|
|
zenguy_pc has quit IRC (Read error: Connection reset by peer) |
04:06
🔗
|
|
aaaaaaaa_ has joined #archiveteam-bs |
04:06
🔗
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
04:06
🔗
|
|
swebb sets mode: +o aaaaaaaa_ |
04:07
🔗
|
|
aaaaaaaa_ is now known as aaaaaaaaa |
04:07
🔗
|
|
aaaaaaaaa has quit IRC (Client Quit) |
04:19
🔗
|
|
zenguy_pc has joined #archiveteam-bs |
06:10
🔗
|
|
PurpleSym has joined #archiveteam-bs |
07:05
🔗
|
|
lbft has quit IRC (Read error: Operation timed out) |
07:10
🔗
|
|
lbft has joined #archiveteam-bs |
07:11
🔗
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
07:50
🔗
|
|
primus104 has joined #archiveteam-bs |
07:51
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
08:03
🔗
|
|
fie has joined #archiveteam-bs |
08:27
🔗
|
|
primus104 has quit IRC (Leaving.) |
09:25
🔗
|
arkiver |
I'd scan it in the highest resolution possible, don't worry about the size |
09:25
🔗
|
arkiver |
so bmp would be fine |
09:26
🔗
|
arkiver |
if IA doesn't derive the bmp images, I think you should also upload a converted version of the bmp as preview |
10:34
🔗
|
|
schbirid has joined #archiveteam-bs |
10:39
🔗
|
|
primus104 has joined #archiveteam-bs |
10:58
🔗
|
|
fie_ has joined #archiveteam-bs |
11:00
🔗
|
|
fie has quit IRC (Read error: Operation timed out) |
11:01
🔗
|
|
Infreq has quit IRC (Read error: Operation timed out) |
11:01
🔗
|
|
Baljem_ has joined #archiveteam-bs |
11:01
🔗
|
|
Infreq has joined #archiveteam-bs |
11:02
🔗
|
|
Baljem has quit IRC (Read error: Operation timed out) |
11:05
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
11:21
🔗
|
|
DopefishJ has joined #archiveteam-bs |
11:21
🔗
|
|
swebb sets mode: +o DopefishJ |
11:22
🔗
|
|
DFJustin has quit IRC (Read error: Operation timed out) |
14:26
🔗
|
|
zenguy_pc has quit IRC (Ping timeout: 252 seconds) |
14:28
🔗
|
|
vitzli has joined #archiveteam-bs |
14:38
🔗
|
|
zenguy_pc has joined #archiveteam-bs |
14:48
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
14:54
🔗
|
|
Smiley has quit IRC (Read error: Connection reset by peer) |
14:54
🔗
|
|
Stiletto has joined #archiveteam-bs |
14:54
🔗
|
|
RichardG_ has joined #archiveteam-bs |
14:54
🔗
|
|
logan has joined #archiveteam-bs |
14:55
🔗
|
|
Rai-chan has quit IRC (Ping timeout: 268 seconds) |
14:55
🔗
|
|
kniffy has quit IRC (Ping timeout: 268 seconds) |
14:55
🔗
|
|
Fusl has quit IRC (Ping timeout: 268 seconds) |
14:55
🔗
|
|
joepie91 has quit IRC (Ping timeout: 268 seconds) |
14:56
🔗
|
|
matthusby has quit IRC (Ping timeout: 268 seconds) |
14:56
🔗
|
|
RichardG has quit IRC (Ping timeout: 268 seconds) |
14:56
🔗
|
|
espes__ has quit IRC (Ping timeout: 268 seconds) |
14:56
🔗
|
|
jk[SVP] has quit IRC (Ping timeout: 268 seconds) |
14:56
🔗
|
|
logan2 has quit IRC (Ping timeout: 268 seconds) |
14:56
🔗
|
|
matthusby has joined #archiveteam-bs |
14:56
🔗
|
|
jk[SVP] has joined #archiveteam-bs |
14:57
🔗
|
|
edsu has joined #archiveteam-bs |
14:57
🔗
|
|
swebb sets mode: +o edsu |
14:57
🔗
|
|
ohhdemgir has quit IRC (Ping timeout: 268 seconds) |
14:57
🔗
|
|
zhongfu has quit IRC (Ping timeout: 268 seconds) |
14:57
🔗
|
|
edsu_ has quit IRC (Ping timeout: 268 seconds) |
14:58
🔗
|
|
wp494_ has joined #archiveteam-bs |
14:58
🔗
|
|
joepie91 has joined #archiveteam-bs |
14:59
🔗
|
|
zhongfu has joined #archiveteam-bs |
15:00
🔗
|
|
will- has joined #archiveteam-bs |
15:01
🔗
|
|
ohhdemgir has joined #archiveteam-bs |
15:01
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 268 seconds) |
15:01
🔗
|
|
vtyl has quit IRC (Ping timeout: 268 seconds) |
15:01
🔗
|
|
pwnsrv has quit IRC (Ping timeout: 268 seconds) |
15:01
🔗
|
|
wp494 has quit IRC (Ping timeout: 268 seconds) |
15:02
🔗
|
|
vtyl has joined #archiveteam-bs |
15:02
🔗
|
|
goekesmi has quit IRC (Ping timeout: 268 seconds) |
15:02
🔗
|
|
no2pencil has quit IRC (Ping timeout: 268 seconds) |
15:02
🔗
|
|
will has quit IRC (Ping timeout: 268 seconds) |
15:02
🔗
|
|
will- is now known as will |
15:03
🔗
|
|
wednesda- has joined #archiveteam-bs |
15:03
🔗
|
|
primus104 has quit IRC (Leaving.) |
15:04
🔗
|
|
goekesmi has joined #archiveteam-bs |
15:04
🔗
|
|
Fusl has joined #archiveteam-bs |
15:05
🔗
|
|
no2pencil has joined #archiveteam-bs |
15:07
🔗
|
|
zenguy_pc has quit IRC (Ping timeout: 310 seconds) |
15:07
🔗
|
|
SimpBrain has joined #archiveteam-bs |
15:08
🔗
|
|
wacky_ has quit IRC (Ping timeout: 268 seconds) |
15:08
🔗
|
|
wednesday has quit IRC (Ping timeout: 268 seconds) |
15:15
🔗
|
|
zenguy_pc has joined #archiveteam-bs |
15:24
🔗
|
|
wacky_ has joined #archiveteam-bs |
15:32
🔗
|
|
espes__ has joined #archiveteam-bs |
15:36
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
15:36
🔗
|
|
Smiley has joined #archiveteam-bs |
15:39
🔗
|
|
Rye has joined #archiveteam-bs |
15:39
🔗
|
|
kniffy has joined #archiveteam-bs |
16:45
🔗
|
|
primus104 has joined #archiveteam-bs |
16:55
🔗
|
|
godane has quit IRC (Ping timeout: 492 seconds) |
17:10
🔗
|
|
RichardG_ is now known as RichardG |
17:23
🔗
|
|
JesseW has joined #archiveteam-bs |
17:37
🔗
|
|
godane has joined #archiveteam-bs |
17:41
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
17:42
🔗
|
|
Start has joined #archiveteam-bs |
17:50
🔗
|
|
aaaaaaaaa has joined #archiveteam-bs |
17:50
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
18:34
🔗
|
|
JesseW has joined #archiveteam-bs |
19:04
🔗
|
|
wp494_ is now known as wp494 |
19:04
🔗
|
|
garyrh has quit IRC (Remote host closed the connection) |
19:06
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
19:11
🔗
|
|
dashcloud has joined #archiveteam-bs |
19:42
🔗
|
|
garyrh has joined #archiveteam-bs |
19:49
🔗
|
|
primus has joined #archiveteam-bs |
19:55
🔗
|
primus |
Hi all, I need help with archiving a website. The site is not big, so i thought i'd try downloading it with wget. |
19:56
🔗
|
anomie |
Try using wpull instead. |
19:57
🔗
|
anomie |
Here's the last wpull query I ran. |
19:57
🔗
|
anomie |
wpull 'http://www.musictheory.net/lessons' --no-check-certificate --user-agent "wpull-web-archiver" --page-requisites --recursive --level inf --span-hosts-allow linked-pages,page-requisites --recursive --level inf --escaped-fragment --strip-session-id --sitemaps --reject-regex "/login\.php" --tries 3 --retry-connrefused --retry-dns-error --timeout 60 --session-timeout 21600 --database "music-theory.db" -np |
19:57
🔗
|
anomie |
--header="Contact: acstubbins@openmailbox.org" --warc-file "music-theory" --warc-max-size 9999999999999999999 -np |
19:57
🔗
|
primus |
Thanks, does it also make a browsable local website? |
19:58
🔗
|
anomie |
Yeah. |
19:58
🔗
|
primus |
Hmm, it doesn't seem to be available in yum for centos |
19:59
🔗
|
anomie |
It isn't. You should install it with pip, or from the github. |
19:59
🔗
|
primus |
What is its main advantage over wget if you don't mind me asking? |
20:00
🔗
|
primus |
actually I was only looking for advice on whether the command from the wget tutorial would work for my purpose: wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL |
20:03
🔗
|
anomie |
Okay. Let me go over the options slowly. |
20:03
🔗
|
anomie |
--no-check-certificate means don't check the https certificate. |
20:04
🔗
|
anomie |
--user-agent "wpull-web-archiver" really isn't necessary, but it essentially lets web masters know they're being archived, if they ever check their logs |
20:05
🔗
|
anomie |
--page-requisites is so that it archives all the images and other stuff on the page. |
20:05
🔗
|
anomie |
--recursive --level inf means to recurse infinitely. |
20:06
🔗
|
anomie |
--span-hosts-allow linked-pages,page-requisites tells wpull to download elements from different hosts, as long as it's a linked page, or something that falls under page-requisites |
20:07
🔗
|
anomie |
Actually, you might want to remove linked-pages. |
20:07
🔗
|
anomie |
I forget what --escaped-fragment, --strip-session-id are for |
20:08
🔗
|
primus |
Wouldn't infinite level of recursion mean it falls into a loop? |
20:09
🔗
|
primus |
The site i'm going to archive is very badly maintained so i'm quite concerned about loops. |
20:16
🔗
|
anomie |
primus: No. It won't archive anything twice. |
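Why "infinite" recursion does not loop can be sketched with a toy crawler that remembers every URL it has already queued. This is an assumption-laden illustration, not wpull's actual code: the link graph, the `crawl` function, and the in-memory set are all invented for the example.

```python
from collections import deque

# Hypothetical site where every page links back to the others (lots of cycles).
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/"],
    "/b": ["/a", "/b"],
}


def crawl(start, links):
    """Breadth-first crawl with no depth limit that visits each URL exactly once."""
    seen = {start}           # every URL ever queued
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)    # stand-in for "download and archive this page"
        for target in links.get(url, []):
            if target not in seen:   # the check that breaks cycles
                seen.add(target)
                queue.append(target)
    return order


fetched = crawl("/", LINKS)
print(fetched)  # ['/', '/a', '/b']
```

The crawl ends once no unseen URL is left, however the pages link to each other. A site that keeps generating new, distinct URLs (for example an endless calendar) would still never run out, so a reject pattern can be worth having on badly maintained sites.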
20:18
🔗
|
primus |
Thank you for all your advice, I appreciate it |
20:18
🔗
|
anomie |
You're welcome. |
20:18
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
20:36
🔗
|
aaaaaaaaa |
you know, the point of the --warc-max-size is to set it to something reasonable, not ~10 exabytes |
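A quick check of the "~10 exabytes" figure, taking the value passed to --warc-max-size as a byte count; the variable names are just for this sketch.

```python
warc_max_size = 9999999999999999999  # bytes, copied from the command above

exabytes = warc_max_size / 10**18    # decimal exabytes (EB)
exbibytes = warc_max_size / 2**60    # binary exbibytes (EiB)
fits_in_int64 = warc_max_size <= 2**63 - 1

print(f"{exabytes:.2f} EB, {exbibytes:.2f} EiB, fits in signed 64-bit: {fits_in_int64}")
# prints "10.00 EB, 8.67 EiB, fits in signed 64-bit: False"
```

With a limit that large, the output is effectively never split into multiple WARC files.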
20:37
🔗
|
anomie |
aaaaaaaaa: That was to work around an annoying bug in webarchiveplayer. |
20:39
🔗
|
aaaaaaaaa |
primus: wpull supports some options that wget doesn't (and vice versa: https://wpull.readthedocs.org/en/master/differences.html) but the biggest benefit is if you spot a bug, the maintainer is on this channel. |
20:40
🔗
|
aaaaaaaaa |
oh and it doesn't try to keep everything in memory as well. |
20:40
🔗
|
primus |
That's great. At the moment wget is already running. After it's finished I'll also run wpull, just to be on the safe side. |
20:48
🔗
|
|
JesseW has joined #archiveteam-bs |
20:58
🔗
|
|
BiggieJon has joined #archiveteam-bs |
21:11
🔗
|
|
PurpleSym has quit IRC (Remote host closed the connection) |
21:40
🔗
|
|
phiren has quit IRC (Ping timeout: 252 seconds) |
21:40
🔗
|
|
phiren has joined #archiveteam-bs |
21:45
🔗
|
|
JesseW has quit IRC (Leaving.) |
21:45
🔗
|
|
JesseW has joined #archiveteam-bs |
22:39
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
22:39
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
23:58
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |