Time |
Nickname |
Message |
00:24
|
|
antomati_ has joined #archiveteam-bs |
00:24
|
|
swebb sets mode: +o antomati_ |
00:25
|
|
antomatic has quit IRC (Read error: Operation timed out) |
01:53
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
02:04
|
|
wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES) |
02:08
|
|
wp494 has joined #archiveteam-bs |
02:48
|
godane |
so i added some better code for grabbing Arirang streams |
02:49
|
godane |
i had to change it cause index_3_av.m3u8 was not always the 850x480 stream |
02:49
|
godane |
as i just curl -s "$masterurl" | grep -A1 850x480 | grep m3u8 |
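A rough Python equivalent of that grep pipeline, as a sketch only: it assumes the master playlist follows the usual HLS layout, where each `#EXT-X-STREAM-INF` line carries a `RESOLUTION=` attribute and the variant URI sits on the next line. The function name `pick_variant` and the sample playlist are made up for illustration; only the 850x480 resolution and the `index_N_av.m3u8` naming come from the messages above.

```python
import re


def pick_variant(master_text, resolution):
    """Return the URI of the first variant whose RESOLUTION matches, or None."""
    lines = [ln.strip() for ln in master_text.splitlines()]
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        match = re.search(r"RESOLUTION=(\d+x\d+)", line)
        if match is None or match.group(1) != resolution:
            continue
        # The variant URI is the next non-empty line that is not a tag or comment.
        for candidate in lines[i + 1:]:
            if candidate and not candidate.startswith("#"):
                return candidate
        return None
    return None


# Hypothetical playlist, not the real Arirang one.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
index_1_av.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=850x480
index_2_av.m3u8
"""

print(pick_variant(SAMPLE_MASTER, "850x480"))
```

Matching on the resolution attribute rather than on a fixed file name is the same idea as the grep: the index number can change between playlists, the resolution label does not.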
02:58
|
|
wp494_ has joined #archiveteam-bs |
03:03
|
|
wp494 has quit IRC (Read error: Operation timed out) |
03:03
|
|
wp494_ is now known as wp494 |
03:12
|
|
ravetcofx has joined #archiveteam-bs |
03:47
|
|
Mayeau has joined #archiveteam-bs |
03:56
|
|
Mayonaise has quit IRC (Ping timeout: 864 seconds) |
03:56
|
|
Mayeau is now known as Mayonaise |
04:08
|
|
ravetcofx has quit IRC (Ping timeout: 506 seconds) |
05:09
|
|
Stilett0 has joined #archiveteam-bs |
05:12
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
05:52
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
05:59
|
|
Sk1d has joined #archiveteam-bs |
06:49
|
|
Start has joined #archiveteam-bs |
07:37
|
|
GE has joined #archiveteam-bs |
10:47
|
Medowar |
FYI: arkiver Sketchcow: bayimg is now done trackerside, I am currently syncing everything from my target over to fos. |
11:09
|
|
GE has quit IRC (Remote host closed the connection) |
11:19
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
11:48
|
|
ravetcofx has joined #archiveteam-bs |
12:15
|
|
ravetcofx has quit IRC (Ping timeout: 260 seconds) |
12:35
|
|
GE has joined #archiveteam-bs |
12:54
|
|
vitzli has joined #archiveteam-bs |
12:58
|
vitzli |
Feel like shit, 4 years late to archive the wiki :[ The Wayback Machine luckily has it, but still. (it's a small udev rule, nothing super-important) |
12:59
|
arkiver |
what wiki? |
12:59
|
vitzli |
wiki.countercaster.com |
13:00
|
vitzli |
last update and online in 2012 |
13:01
|
arkiver |
:( |
13:01
|
arkiver |
that sucks |
13:06
|
vitzli |
After I first met the wikiteam tools, I began grabbing any small-ish or remotely interesting/useful wikis. It's awesome, but sometimes it's like reading one book, liking it, reading another, liking it, I WANT MORE! HA! NO! fuck you! Author Existence Failure. |
13:09
|
vitzli |
and that is all for drunk Friday confessions, sorry about that |
13:37
|
Whopper |
Kaz: similar thing happened in Australia with metadata. It was originally to 'fight terrorists' and then we have https://www.crikey.com.au/2016/01/18/over-60-agencies-apply-to-snoop-into-your-metadata/ . The majority might have a legitimate use for the information but that's not the point. Race fixing, polluting, work health safety violations / fraud, mislabelling fruit? etc. ≠ terrorism |
13:40
|
arkiver |
Medowar: :D nice! |
13:53
|
|
vitzli has quit IRC (Quit: Leaving) |
15:28
|
|
superkuh has quit IRC (Remote host closed the connection) |
15:29
|
|
Shakespea has joined #archiveteam-bs |
15:30
|
Shakespea |
FYI- http://www.panoramio.com/maps-faq |
15:30
|
|
superkuh has joined #archiveteam-bs |
15:30
|
|
Shakespea has left |
15:44
|
|
Start has quit IRC (Quit: Disconnected.) |
15:52
|
|
atrocity has quit IRC (Ping timeout: 260 seconds) |
17:28
|
|
Stilett0 has quit IRC (Read error: Connection reset by peer) |
18:03
|
|
Stiletto has joined #archiveteam-bs |
18:06
|
|
ndiddy has joined #archiveteam-bs |
19:33
|
|
RichardG_ has joined #archiveteam-bs |
19:36
|
|
RichardG has quit IRC (Ping timeout: 250 seconds) |
19:49
|
|
RichardG has joined #archiveteam-bs |
19:50
|
|
RichardG_ has quit IRC (Ping timeout: 250 seconds) |
19:51
|
tapedrive |
I want to download a site (for a personal archive at the moment, so not with archivebot) but I'm not quite sure what options to use with wget. I want to download all pages under a specific domain, and all the resources (css, images, javascripts, etc) which will be under different domains. I also want to download all linked pages - so if the main site links to external.com/foo.html then I want that page and all its requisites downloaded |
19:51
|
tapedrive |
too. Is this possible with wget, or am I going to have to write something custom for this? |
20:02
|
|
ndiddy has quit IRC (Ping timeout: 633 seconds) |
20:04
|
|
ndiddy has joined #archiveteam-bs |
20:09
|
Kaz |
yes |
20:10
|
Kaz |
one sec |
20:10
|
Kaz |
tapedrive: http://archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget |
20:11
|
Kaz |
or you can use https://github.com/ludios/grab-site |
20:11
|
tapedrive |
I've read that (the wiki one), but I'm confused as to what options I should use for downloading all page dependencies from other domains. |
20:13
|
tapedrive |
Ah, that grab-site one looks perfect. Thanks! |
21:32
|
|
VADemon has joined #archiveteam-bs |
21:35
|
|
sep332_ has quit IRC (Konversation terminated!) |
21:35
|
|
jrwr has joined #archiveteam-bs |
21:50
|
|
kristian_ has joined #archiveteam-bs |
22:02
|
tapedrive |
Okay, I'm using grab-site, but there's an issue. The site I'm archiving has imgur images linked, but imgur's doing some weird redirect thing. |
22:02
|
tapedrive |
From the logs: 302 Moved Temporarily http://i.imgur.com/LrowbFM.jpg |
22:02
|
tapedrive |
200 OK http://imgur.com/LrowbFM |
22:03
|
tapedrive |
So it's not archiving the image, rather a stupid imgur page. |
22:03
|
tapedrive |
Any ideas to get round this? |
22:22
|
|
Stilett0 has joined #archiveteam-bs |
22:28
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
22:28
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
22:29
|
ae_g_i_s |
yeah, imgur needs you to follow the redirect |
22:29
|
ae_g_i_s |
i don't know exactly what weird magic they're using atm |
22:29
|
ae_g_i_s |
but if you essentially do the request twice (not with the same url, with the one it redirects you to) you'll have a page that has the "real" source image |
22:39
|
ae_g_i_s |
second pitfall (since we're in -bs anyway and people might fall into that trap): if you upload an image nowadays, you cannot copy the image link from the image it presents because they use a base64 blob (IIRC) in there |
22:39
|
ae_g_i_s |
you have to open the "sharing link" on the right...and _there_, the image source is as usual |
22:40
|
tapedrive |
So any way to add that rule into grab-site? Or will I just have to manually get them afterwards? |
22:47
|
ae_g_i_s |
< too noob to know |
22:50
|
|
Start has joined #archiveteam-bs |
22:54
|
tapedrive |
It's not really a problem, as I can go through the log after it's complete, getting all the i.imgur.com images. |
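A minimal sketch of that log pass in Python, assuming the log lines look like the two quoted earlier (status code, status text, URL). The helper name `imgur_image_urls` and the in-memory `SAMPLE_LOG` are illustrative only; a real run would read the lines from wherever grab-site wrote its log. Refetching the images is left out on purpose, since the discussion above never settles how imgur decides when to redirect.

```python
import re

# Direct image links live on i.imgur.com and end in a file extension;
# the imgur.com/<id> page URLs are deliberately not matched.
IMGUR_DIRECT = re.compile(r"https?://i\.imgur\.com/[A-Za-z0-9]+\.[A-Za-z0-9]+")


def imgur_image_urls(log_lines):
    """Collect unique i.imgur.com image URLs in first-seen order."""
    seen = set()
    urls = []
    for line in log_lines:
        for url in IMGUR_DIRECT.findall(line):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Sample lines modelled on the log excerpt quoted in the channel.
SAMPLE_LOG = [
    "302 Moved Temporarily http://i.imgur.com/LrowbFM.jpg",
    "200 OK http://imgur.com/LrowbFM",
    "302 Moved Temporarily http://i.imgur.com/LrowbFM.jpg",
]

print(imgur_image_urls(SAMPLE_LOG))
```

The resulting list can then be fed to whatever fetcher ends up handling the redirect correctly.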
23:03
|
|
BlueMaxim has joined #archiveteam-bs |
23:32
|
|
ravetcofx has joined #archiveteam-bs |
23:34
|
|
ravetcofx has quit IRC (Remote host closed the connection) |
23:54
|
|
Yoshimura has quit IRC (Remote host closed the connection) |