Time |
Nickname |
Message |
00:07
🔗
|
|
Ctrl has joined #archiveteam |
00:07
🔗
|
DFJustin |
https://archive.org/details/jstor_ejc |
00:10
🔗
|
DFJustin |
dunno about the open access ebooks part though |
00:20
🔗
|
|
Martle_ has joined #archiveteam |
00:22
🔗
|
|
Martle has quit IRC (Read error: Operation timed out) |
00:26
🔗
|
|
j08nY has quit IRC (Read error: Operation timed out) |
00:26
🔗
|
|
j08nY has joined #archiveteam |
00:29
🔗
|
|
Martle__ has joined #archiveteam |
00:33
🔗
|
|
Ctrl has quit IRC (Remote host closed the connection) |
00:33
🔗
|
|
Ctrl has joined #archiveteam |
00:33
🔗
|
|
Ctrl has quit IRC (Excess Flood) |
00:35
🔗
|
SketchCow |
The SILK guy got back to me with a .csv of subdomains. |
00:35
🔗
|
SketchCow |
I've forwarded the list to arkiver to process. |
00:36
🔗
|
SketchCow |
Along with the warnings of the guy, i.e. they know we're going to do this but they can be over capacity easy by us doing massive grabs |
00:36
🔗
|
|
Martle_ has quit IRC (Read error: Operation timed out) |
00:39
🔗
|
SketchCow |
Also |
00:39
🔗
|
SketchCow |
From John Gilmore: archiveteam.org uses an invalid security certificate. The certificate is only valid for the following names: breeze.tqhosting.com, www.breeze.tqhosting.com Error code: SSL_ERROR_BAD_CERT_DOMAIN |
00:39
🔗
|
SketchCow |
I'll happily work with someone to fix this |
00:48
🔗
|
|
Soni has quit IRC (Ping timeout: 264 seconds) |
01:02
🔗
|
|
Ctrl has joined #archiveteam |
01:04
🔗
|
|
icedice2 has quit IRC (Quit: Leaving) |
01:07
🔗
|
|
Soni has joined #archiveteam |
01:13
🔗
|
|
ZexaronS has joined #archiveteam |
01:32
🔗
|
SketchCow |
Also |
01:32
🔗
|
SketchCow |
Hey Jason - I wonder if it's worth having the Archive Team spider |
01:32
🔗
|
SketchCow |
FamilySearch.org? It looks like their proprietary "partners" are |
01:32
🔗
|
SketchCow |
forcing them to put it behind a login-wall starting Dec 13. And of |
01:32
🔗
|
SketchCow |
course the first thing a login-wall does is to turn off any account |
01:32
🔗
|
SketchCow |
that starts doing bulk downloads... |
01:32
🔗
|
SketchCow |
And if you're talking about "history going offline", this has some of |
01:33
🔗
|
SketchCow |
the best most detailed history of human ancestry ever collected. I |
01:33
🔗
|
SketchCow |
have discovered and researched my ancestors back to the early 1800s in |
01:33
🔗
|
SketchCow |
their data -- all without logging in. Church baptism records from the 1500s. |
01:33
🔗
|
SketchCow |
Government census records from the very beginning. Etc. |
01:48
🔗
|
|
nertzy2 has joined #archiveteam |
01:55
🔗
|
|
nertzy has quit IRC (Read error: Operation timed out) |
02:07
🔗
|
|
ZexaronS has quit IRC (Read error: Operation timed out) |
02:43
🔗
|
|
pizzaiolo has quit IRC (Remote host closed the connection) |
02:44
🔗
|
|
kristian_ has joined #archiveteam |
02:46
🔗
|
|
j08nY has quit IRC (Remote host closed the connection) |
02:59
🔗
|
|
Valentine has joined #archiveteam |
03:00
🔗
|
|
Valentin- has quit IRC (Ping timeout: 506 seconds) |
03:44
🔗
|
|
superkuh has quit IRC (Quit: the neuronal action potential is an electrical manipulation of reversible abrupt phase changes in the lipid bilaye) |
03:47
🔗
|
SketchCow |
----------------------------- |
03:47
🔗
|
SketchCow |
FOS UPDATE |
03:47
🔗
|
SketchCow |
The new FOS should basically have taken everything over from old FOS |
03:47
🔗
|
SketchCow |
There's a few hundred gigs of this and that I'll nail down this week |
03:47
🔗
|
SketchCow |
Some things might not run right, let me know if you see them |
03:47
🔗
|
SketchCow |
----------------------------- |
04:00
🔗
|
|
odemg has quit IRC (Ping timeout: 245 seconds) |
04:14
🔗
|
|
odemg has joined #archiveteam |
04:19
🔗
|
DFJustin |
ffs the one genealogy site on the internet that isn't ruined |
04:23
🔗
|
|
ranavalon has quit IRC (Read error: Connection reset by peer) |
04:52
🔗
|
|
ZexaronS has joined #archiveteam |
04:55
🔗
|
|
qw3rty110 has joined #archiveteam |
04:58
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
05:01
🔗
|
|
qw3rty19 has quit IRC (Read error: Operation timed out) |
05:42
🔗
|
|
ZexaronS has quit IRC (Quit: Leaving) |
05:43
🔗
|
hook54321 |
I'm in the FamilySearch Yammer chat, if anyone has any question I can probably ask them there. |
05:45
🔗
|
hook54321 |
https://media.familysearch.org/familysearch-free-sign-in-offers-greater-subscriber-experiences-and-benefits/ |
06:21
🔗
|
|
dboard2 is now known as dboard |
09:12
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 260 seconds) |
09:12
🔗
|
|
Mateon1 has joined #archiveteam |
09:59
🔗
|
|
pizzaiolo has joined #archiveteam |
10:30
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
10:41
🔗
|
|
j08nY has joined #archiveteam |
10:51
🔗
|
|
luk has joined #archiveteam |
10:58
🔗
|
|
luk has quit IRC (Ping timeout: 260 seconds) |
11:00
🔗
|
|
Fusl has joined #archiveteam |
11:06
🔗
|
|
schbirid has joined #archiveteam |
11:33
🔗
|
|
zino has quit IRC (Ping timeout: 255 seconds) |
11:41
🔗
|
|
zino has joined #archiveteam |
11:53
🔗
|
|
zino has quit IRC (Remote host closed the connection) |
12:31
🔗
|
|
kristian_ has joined #archiveteam |
12:58
🔗
|
schbirid |
so uh, dont ask me why but i set up some automatic grabbing of (selected by format) cinemageddon uploads with no actual plan but soothing my hoarding mind |
12:59
🔗
|
schbirid |
if someone (i know and trust from here) wants to upload&dark them to IA, i could rsync to you. finished torrent contents only, not the torrent file or the metadata or anything, sorry |
13:25
🔗
|
|
j08nY has quit IRC (Quit: Leaving) |
13:45
🔗
|
|
ZexaronS has joined #archiveteam |
14:23
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
14:27
🔗
|
|
ranavalon has joined #archiveteam |
14:27
🔗
|
|
ranavalon has quit IRC (Remote host closed the connection) |
14:28
🔗
|
|
ranavalon has joined #archiveteam |
14:33
🔗
|
|
ZexaronS has quit IRC (Quit: Leaving) |
14:54
🔗
|
|
Stilett0 has joined #archiveteam |
14:56
🔗
|
|
justaj has joined #archiveteam |
14:57
🔗
|
justaj |
hi, I was wondering how I could best save an entire Reddit thread. I've read on the AT wiki that there was a partial archive of Reddit but I want to save threads just one by one if that's possible. I made a thread asking just that - https://redd.it/7e0xm6 |
14:57
🔗
|
justaj |
I'd appreciate if anyone could help out. |
15:09
🔗
|
|
superkuh has joined #archiveteam |
15:20
🔗
|
|
balrog has quit IRC (Read error: Operation timed out) |
15:28
🔗
|
JAA |
justaj: That's a bit tricky. As soon as a thread grows too large, you can't easily access all child comments but have to retrieve what those "load more comments" links do as well. I'm not aware of any straightforward archiving solution for Reddit threads. |
15:30
🔗
|
JAA |
However, there is an archive of all Reddit comments at https://files.pushshift.io/reddit/comments/ (IA mirror at https://archive.org/details/reddit-data-comments ), and it should be possible to extract all comments for a particular thread from there. |
15:36
🔗
|
JAA |
justaj: One way to archive an entire thread would be to use warcprox with any browser, then go to the relevant thread and click on all the "load more comments" and "continue thread" links manually. That would save all relevant data to a WARC file, which can later be played back e.g. with pywb. It's all manual though. |
15:52
🔗
|
schbirid |
http://www.pagetable.com/?p=904 |
16:28
🔗
|
justaj |
JAA: I see. One trick (if you want to see a maximum of 500 comments) is to append ?limit=1000 to the URL and then archive that way. However, that still doesn't solve the issue with archiving long comment threads that are behind the "Continue this thread --->" parts. I have the Wayback Machine browser extension and I don't really mind the manual work, so I think I'll try to archive the links leading to those "hidden" parts as well using that.
16:28
🔗
|
justaj |
I'll try messing around with warcprox, but I'm so far a noob with python and certainly messing around with certificates and MITM. |
16:33
🔗
|
JAA |
Yeah, and that also doesn't help with comments which received tons of replies because some of those will be hidden behind "load more comments". I think warcprox (or a similar software) is probably the only way to really capture everything. |
16:34
🔗
|
JAA |
You don't need to know Python at all to use warcprox, and the certificate thing should be fairly straightforward. |
16:34
🔗
|
JAA |
If you want to discuss this further, please come to #archiveteam-bs. This channel is mainly for announcements. |
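JAA's suggestion above, pulling one thread's comments out of the full Reddit comment dump, can be sketched roughly as follows. This is only a minimal sketch, assuming the dump is newline-delimited JSON in which each comment carries a `link_id` field of the form `t3_<thread id>`; the function name `comments_for_thread` and the sample records are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def comments_for_thread(lines: Iterable[str], thread_id: str) -> Iterator[dict]:
    """Yield every comment whose link_id points at the given thread.

    Assumes one JSON object per line and a "t3_"-prefixed link_id field.
    """
    wanted = "t3_" + thread_id
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        comment = json.loads(line)
        if comment.get("link_id") == wanted:
            yield comment


# Hypothetical sample records standing in for lines of a real dump file.
sample = [
    '{"id": "c1", "link_id": "t3_7e0xm6", "body": "first"}',
    '{"id": "c2", "link_id": "t3_other", "body": "unrelated"}',
    '{"id": "c3", "link_id": "t3_7e0xm6", "body": "second"}',
]
print([c["id"] for c in comments_for_thread(sample, "7e0xm6")])
```

For a real monthly dump the same function could be fed a file object instead of a list, for example one opened with `bz2.open(path, "rt")` if the file is bzip2-compressed. The dumps are large, so streaming line by line like this avoids loading a whole month into memory.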
16:46
🔗
|
|
odemg has quit IRC (Quit: Leaving) |
17:04
🔗
|
|
pizzaiolo has quit IRC (Read error: Operation timed out) |
18:12
🔗
|
|
SirCmpwn has quit IRC (Read error: Operation timed out) |
18:12
🔗
|
|
Zialus has quit IRC (Read error: Operation timed out) |
18:13
🔗
|
|
Fusl_ has joined #archiveteam |
18:13
🔗
|
|
Martle has joined #archiveteam |
18:13
🔗
|
|
Stiletto has joined #archiveteam |
18:13
🔗
|
|
liam has quit IRC (Read error: Operation timed out) |
18:13
🔗
|
|
lukeman has quit IRC (Read error: Operation timed out) |
18:13
🔗
|
|
squires has quit IRC (Read error: Operation timed out) |
18:13
🔗
|
|
beardicus has quit IRC (Read error: Operation timed out) |
18:13
🔗
|
|
MMovie has quit IRC (Read error: Operation timed out) |
18:13
🔗
|
|
justaj has quit IRC (Read error: Operation timed out) |
18:13
🔗
|
|
Fusl has quit IRC (Read error: Operation timed out) |
18:13
🔗
|
|
lukeman has joined #archiveteam |
18:14
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
18:14
🔗
|
|
C4K3 has quit IRC (Read error: Operation timed out) |
18:15
🔗
|
|
REiN^ has quit IRC (Read error: Operation timed out) |
18:15
🔗
|
|
PotcFdk has quit IRC (Read error: Operation timed out) |
18:15
🔗
|
|
Martle__ has quit IRC (Read error: Operation timed out) |
18:16
🔗
|
|
Dimtree has quit IRC (Read error: Operation timed out) |
18:17
🔗
|
|
c4rc4s has quit IRC (Ping timeout: 600 seconds) |
18:17
🔗
|
|
nwf_ has quit IRC (Read error: Operation timed out) |
18:17
🔗
|
|
qw3rty110 has quit IRC (Read error: Operation timed out) |
18:17
🔗
|
|
oli_ has joined #archiveteam |
18:18
🔗
|
|
c4rc4s has joined #archiveteam |
18:18
🔗
|
|
oli has quit IRC (Read error: Operation timed out) |
18:18
🔗
|
|
oli_ is now known as oli |
18:25
🔗
|
|
SirCmpwn has joined #archiveteam |
18:37
🔗
|
wp494 |
Weather Underground is tossing out webcams now: http://help.wunderground.com/knowledgebase/articles/1821811 |
18:37
🔗
|
wp494 |
"After 10 years of proudly displaying your webcam footage across our website and apps, we sadly have to remove this functionality as we no longer have the necessary resources to maintain it. On December 15, 2017, we’ll remove the webcam feeds from our website, mobile apps, and within our API – meaning uploading and accessing webcam footage will no longer be available." |
18:38
🔗
|
wp494 |
"Q: Can I download my existing webcam footage? |
18:38
🔗
|
wp494 |
Unfortunately, we do not have download functionality for webcam footage." |
18:38
🔗
|
JAA |
Ugh |
18:39
🔗
|
wp494 |
I thought IBM "liberating" WU from NBC/Comcast would be a good thing but so far it really hasn't been |
18:43
🔗
|
|
Dimtree has joined #archiveteam |
18:50
🔗
|
|
Harzilein has joined #archiveteam |
18:50
🔗
|
Harzilein |
hi |
18:51
🔗
|
|
qw3rty110 has joined #archiveteam |
18:51
🔗
|
wp494 |
Yes, hello |
18:51
🔗
|
|
liam has joined #archiveteam |
18:52
🔗
|
|
beardicus has joined #archiveteam |
18:55
🔗
|
|
REiN^ has joined #archiveteam |
18:55
🔗
|
|
squires has joined #archiveteam |
18:55
🔗
|
|
MMovie has joined #archiveteam |
18:55
🔗
|
|
C4K3 has joined #archiveteam |
18:56
🔗
|
arkiver |
we can archive the webcam footage from wunderground.com |
18:56
🔗
|
arkiver |
https://www.wunderground.com/webcams/ |
19:00
🔗
|
|
Zialus has joined #archiveteam |
19:07
🔗
|
|
nwf_ has joined #archiveteam |
19:15
🔗
|
|
PotcFdk has joined #archiveteam |
19:23
🔗
|
|
pizzaiolo has joined #archiveteam |
19:27
🔗
|
Fusl_ |
does someone know if there's a docker image available for the warrior that doesn't require manual configuration on container boot? |
19:27
🔗
|
|
Fusl_ is now known as Fusl |
19:30
🔗
|
|
Pixi` has quit IRC (Quit: Pixi`) |
19:31
🔗
|
|
Pixi has joined #archiveteam |
19:31
🔗
|
antomatic |
Huh! IBM are short of disc space? Who knew. |
19:39
🔗
|
|
jschwart has joined #archiveteam |
19:42
🔗
|
|
bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…) |
19:45
🔗
|
|
odemg has joined #archiveteam |
19:53
🔗
|
hook54321 |
arkiver: http://icons.wunderground.com/webcamarchive/u/t/utdot/246/2016/09/20160911.mp4 |
19:54
🔗
|
arkiver |
yeah |
19:55
🔗
|
arkiver |
we just need a list of uploaders |
19:55
🔗
|
arkiver |
like kydot in https://www.wunderground.com/webcams/kydot/ |
19:56
🔗
|
arkiver |
can maybe get that from the map, will have a look |
20:00
🔗
|
hook54321 |
why do we need a list of uploaders? |
20:09
🔗
|
antomatic |
so we know what to archive |
20:10
🔗
|
antomatic |
(or at least where to start) |
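Why a list of uploaders is the starting point can be illustrated with the one archive URL hook54321 posted. The sketch below assumes, from that single example only, that the path is built from the first two letters of the uploader handle, the handle itself, a camera number, and the date; the function name `webcam_archive_url` is hypothetical and the pattern may not hold for every camera.

```python
from datetime import date


def webcam_archive_url(handle: str, camera_id: int, day: date) -> str:
    """Guess the daily archive video URL for one camera.

    The layout is inferred from a single example URL, not from any
    documentation, so treat it as an assumption.
    """
    return (
        "http://icons.wunderground.com/webcamarchive/"
        f"{handle[0]}/{handle[1]}/{handle}/{camera_id}/"
        f"{day:%Y}/{day:%m}/{day:%Y%m%d}.mp4"
    )


# Reproduces the URL posted in the channel for uploader "utdot", camera 246.
print(webcam_archive_url("utdot", 246, date(2016, 9, 11)))
```

If the pattern holds, every video URL depends on the uploader handle, which is why the handles (such as kydot) have to be collected before anything can be queued.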
20:13
🔗
|
|
j08nY has joined #archiveteam |
20:24
🔗
|
|
ZexaronS has joined #archiveteam |
20:28
🔗
|
|
bithippo has joined #archiveteam |
20:57
🔗
|
|
balrog has joined #archiveteam |
21:27
🔗
|
|
trvz has quit IRC (Ping timeout: 260 seconds) |
21:42
🔗
|
|
icedice has joined #archiveteam |
22:43
🔗
|
|
matt_ has joined #archiveteam |
22:43
🔗
|
|
matt_ is now known as Igloo_ |
22:46
🔗
|
|
achip has joined #archiveteam |
22:47
🔗
|
|
Igloo_ has quit IRC (Client Quit) |
22:49
🔗
|
|
jschwart has quit IRC (Quit: Konversation terminated!) |
22:52
🔗
|
|
Igloo_ has joined #archiveteam |
22:54
🔗
|
|
Igloo has quit IRC (Quit: leaving) |
22:54
🔗
|
|
Igloo_ is now known as Igloo |
23:06
🔗
|
|
Rondom_ has joined #archiveteam |
23:09
🔗
|
|
yuitimoth has quit IRC (Read error: Connection reset by peer) |
23:09
🔗
|
|
Rondom has quit IRC (Read error: Network is unreachable) |
23:09
🔗
|
|
atluxity has quit IRC (Remote host closed the connection) |
23:09
🔗
|
|
yuitimoth has joined #archiveteam |
23:09
🔗
|
|
atluxity has joined #archiveteam |
23:09
🔗
|
|
kcaj has quit IRC (Ping timeout: 506 seconds) |
23:11
🔗
|
|
kcaj has joined #archiveteam |
23:40
🔗
|
|
trvz has joined #archiveteam |
23:44
🔗
|
|
BlueMaxim has joined #archiveteam |