#archiveteam 2017-11-19,Sun

Time Nickname Message
00:07 🔗 Ctrl has joined #archiveteam
00:07 🔗 DFJustin https://archive.org/details/jstor_ejc
00:10 🔗 DFJustin dunno about the open access ebooks part though
00:20 🔗 Martle_ has joined #archiveteam
00:22 🔗 Martle has quit IRC (Read error: Operation timed out)
00:26 🔗 j08nY has quit IRC (Read error: Operation timed out)
00:26 🔗 j08nY has joined #archiveteam
00:29 🔗 Martle__ has joined #archiveteam
00:33 🔗 Ctrl has quit IRC (Remote host closed the connection)
00:33 🔗 Ctrl has joined #archiveteam
00:33 🔗 Ctrl has quit IRC (Excess Flood)
00:35 🔗 SketchCow The SILK guy got back to me with a .csv of subdomains.
00:35 🔗 SketchCow I've forwarded the list to arkiver to process.
00:36 🔗 SketchCow Along with the warnings of the guy, i.e. they know we're going to do this but they can be over capacity easy by us doing massive grabs
00:36 🔗 Martle_ has quit IRC (Read error: Operation timed out)
00:39 🔗 SketchCow Also
00:39 🔗 SketchCow From John Gilmore: archiveteam.org uses an invalid security certificate. The certificate is only valid for the following names: breeze.tqhosting.com, www.breeze.tqhosting.com Error code: SSL_ERROR_BAD_CERT_DOMAIN
00:39 🔗 SketchCow I'll happily work with someone to fix this
00:48 🔗 Soni has quit IRC (Ping timeout: 264 seconds)
01:02 🔗 Ctrl has joined #archiveteam
01:04 🔗 icedice2 has quit IRC (Quit: Leaving)
01:07 🔗 Soni has joined #archiveteam
01:13 🔗 ZexaronS has joined #archiveteam
01:32 🔗 SketchCow Also
01:32 🔗 SketchCow Hey Jason - I wonder if it's worth having the Archive Team spider
01:32 🔗 SketchCow FamilySearch.org? It looks like their proprietary "partners" are
01:32 🔗 SketchCow forcing them to put it behind a login-wall starting Dec 13. And of
01:32 🔗 SketchCow course the first thing a login-wall does is to turn off any account
01:32 🔗 SketchCow that starts doing bulk downloads...
01:32 🔗 SketchCow And if you're talking about "history going offline", this has some of
01:33 🔗 SketchCow the best most detailed history of human ancestry ever collected. I
01:33 🔗 SketchCow have discovered and researched my ancestors back to the early 1800s in
01:33 🔗 SketchCow their data -- all without logging in. Church baptism records from the 1500s.
01:33 🔗 SketchCow Government census records from the very beginning. Etc.
01:48 🔗 nertzy2 has joined #archiveteam
01:55 🔗 nertzy has quit IRC (Read error: Operation timed out)
02:07 🔗 ZexaronS has quit IRC (Read error: Operation timed out)
02:43 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
02:44 🔗 kristian_ has joined #archiveteam
02:46 🔗 j08nY has quit IRC (Remote host closed the connection)
02:59 🔗 Valentine has joined #archiveteam
03:00 🔗 Valentin- has quit IRC (Ping timeout: 506 seconds)
03:44 🔗 superkuh has quit IRC (Quit: the neuronal action potential is an electrical manipulation of reversible abrupt phase changes in the lipid bilaye)
03:47 🔗 SketchCow -----------------------------
03:47 🔗 SketchCow FOS UPDATE
03:47 🔗 SketchCow The new FOS should basically have taken everything over from old FOS
03:47 🔗 SketchCow There's a few hundred gigs of this and that I'll nail down this week
03:47 🔗 SketchCow Some things might not run right, let me know if you see them
03:47 🔗 SketchCow -----------------------------
04:00 🔗 odemg has quit IRC (Ping timeout: 245 seconds)
04:14 🔗 odemg has joined #archiveteam
04:19 🔗 DFJustin ffs the one genealogy site on the internet that isn't ruined
04:23 🔗 ranavalon has quit IRC (Read error: Connection reset by peer)
04:52 🔗 ZexaronS has joined #archiveteam
04:55 🔗 qw3rty110 has joined #archiveteam
04:58 🔗 kristian_ has quit IRC (Quit: Leaving)
05:01 🔗 qw3rty19 has quit IRC (Read error: Operation timed out)
05:42 🔗 ZexaronS has quit IRC (Quit: Leaving)
05:43 🔗 hook54321 I'm in the FamilySearch Yammer chat, if anyone has any question I can probably ask them there.
05:45 🔗 hook54321 https://media.familysearch.org/familysearch-free-sign-in-offers-greater-subscriber-experiences-and-benefits/
06:21 🔗 dboard2 is now known as dboard
09:12 🔗 Mateon1 has quit IRC (Ping timeout: 260 seconds)
09:12 🔗 Mateon1 has joined #archiveteam
09:59 🔗 pizzaiolo has joined #archiveteam
10:30 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
10:41 🔗 j08nY has joined #archiveteam
10:51 🔗 luk has joined #archiveteam
10:58 🔗 luk has quit IRC (Ping timeout: 260 seconds)
11:00 🔗 Fusl has joined #archiveteam
11:06 🔗 schbirid has joined #archiveteam
11:33 🔗 zino has quit IRC (Ping timeout: 255 seconds)
11:41 🔗 zino has joined #archiveteam
11:53 🔗 zino has quit IRC (Remote host closed the connection)
12:31 🔗 kristian_ has joined #archiveteam
12:58 🔗 schbirid so uh, dont ask me why but i set up some automatic grabbing of (selected by format) cinemageddon uploads with no actual plan but soothing my hoarding mind
12:59 🔗 schbirid if someone (i know and trust from here) wants to upload&dark them to IA, i could rsync to you. finished torrent contents only, not the torrent file or the metadata or anything, sorry
13:25 🔗 j08nY has quit IRC (Quit: Leaving)
13:45 🔗 ZexaronS has joined #archiveteam
14:23 🔗 kristian_ has quit IRC (Quit: Leaving)
14:27 🔗 ranavalon has joined #archiveteam
14:27 🔗 ranavalon has quit IRC (Remote host closed the connection)
14:28 🔗 ranavalon has joined #archiveteam
14:33 🔗 ZexaronS has quit IRC (Quit: Leaving)
14:54 🔗 Stilett0 has joined #archiveteam
14:56 🔗 justaj has joined #archiveteam
14:57 🔗 justaj hi, I was wondering how I could best save an entire Reddit thread. I've read on the AT wiki that there was a partial archive of Reddit but I want to save threads just one by one if that's possible. I made a thread asking just that - https://redd.it/7e0xm6
14:57 🔗 justaj I'd appreciate if anyone could help out.
15:09 🔗 superkuh has joined #archiveteam
15:20 🔗 balrog has quit IRC (Read error: Operation timed out)
15:28 🔗 JAA justaj: That's a bit tricky. As soon as a thread grows too large, you can't easily access all child comments but have to retrieve what those "load more comments" links do as well. I'm not aware of any straightforward archiving solution for Reddit threads.
15:30 🔗 JAA However, there is an archive of all Reddit comments at https://files.pushshift.io/reddit/comments/ (IA mirror at https://archive.org/details/reddit-data-comments ), and it should be possible to extract all comments for a particular thread from there.
15:36 🔗 JAA justaj: One way to archive an entire thread would be to use warcprox with any browser, then go to the relevant thread and click on all the "load more comments" and "continue thread" links manually. That would save all relevant data to a WARC file, which can later be played back e.g. with pywb. It's all manual though.
15:52 🔗 schbirid http://www.pagetable.com/?p=904
16:28 🔗 justaj JAA: I see. One trick (if you want to see a maximum of 500 comments) is to append ?limit=1000 to the URL and then archive that way. However, that still doesn't solve the issue with archiving long comment threads that are behind the "Continue this thread --->" parts. I have the Wayback Machine browser extension and I don't really mind the manual work, so I think I'll try to archive the links leading to those "hidden" parts as well using that.
16:28 🔗 justaj I'll try messing around with warcprox, but I'm so far a noob with python and certainly messing around with certificates and MITM.
16:33 🔗 JAA Yeah, and that also doesn't help with comments which received tons of replies because some of those will be hidden behind "load more comments". I think warcprox (or a similar software) is probably the only way to really capture everything.
16:34 🔗 JAA You don't need to know Python at all to use warcprox, and the certificate thing should be fairly straightforward.
16:34 🔗 JAA If you want to discuss this further, please come to #archiveteam-bs. This channel is mainly for announcements.
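
[Editor's sketch] JAA's suggestion above of pulling one thread's comments out of the pushshift dump could look roughly like the following. It assumes the dump files are newline-delimited JSON with one comment object per line, and that each object carries a `link_id` field of the form `t3_<thread id>`; the function name `comments_for_thread` and the sample records are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def comments_for_thread(lines: Iterable[str], thread_id: str) -> Iterator[dict]:
    """Yield comment objects whose link_id points at the given thread."""
    wanted = f"t3_{thread_id}"  # assumption: submissions are referenced as "t3_<id>"
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        comment = json.loads(line)
        if comment.get("link_id") == wanted:
            yield comment


# Tiny in-memory stand-in for one dump file (made-up sample records).
sample = [
    '{"id": "c1", "link_id": "t3_7e0xm6", "body": "first"}',
    '{"id": "c2", "link_id": "t3_other", "body": "unrelated"}',
    '{"id": "c3", "link_id": "t3_7e0xm6", "body": "second"}',
]
matched = [c["id"] for c in comments_for_thread(sample, "7e0xm6")]
print(matched)
```

For a real dump file, any iterable of text lines works as the first argument; if the file is bz2-compressed, `bz2.open(path, "rt", encoding="utf-8")` from the standard library would be one way to supply it. The monthly files are large, so streaming line by line as above avoids loading a whole file into memory.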
16:46 🔗 odemg has quit IRC (Quit: Leaving)
17:04 🔗 pizzaiolo has quit IRC (Read error: Operation timed out)
18:12 🔗 SirCmpwn has quit IRC (Read error: Operation timed out)
18:12 🔗 Zialus has quit IRC (Read error: Operation timed out)
18:13 🔗 Fusl_ has joined #archiveteam
18:13 🔗 Martle has joined #archiveteam
18:13 🔗 Stiletto has joined #archiveteam
18:13 🔗 liam has quit IRC (Read error: Operation timed out)
18:13 🔗 lukeman has quit IRC (Read error: Operation timed out)
18:13 🔗 squires has quit IRC (Read error: Operation timed out)
18:13 🔗 beardicus has quit IRC (Read error: Operation timed out)
18:13 🔗 MMovie has quit IRC (Read error: Operation timed out)
18:13 🔗 justaj has quit IRC (Read error: Operation timed out)
18:13 🔗 Fusl has quit IRC (Read error: Operation timed out)
18:13 🔗 lukeman has joined #archiveteam
18:14 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
18:14 🔗 C4K3 has quit IRC (Read error: Operation timed out)
18:15 🔗 REiN^ has quit IRC (Read error: Operation timed out)
18:15 🔗 PotcFdk has quit IRC (Read error: Operation timed out)
18:15 🔗 Martle__ has quit IRC (Read error: Operation timed out)
18:16 🔗 Dimtree has quit IRC (Read error: Operation timed out)
18:17 🔗 c4rc4s has quit IRC (Ping timeout: 600 seconds)
18:17 🔗 nwf_ has quit IRC (Read error: Operation timed out)
18:17 🔗 qw3rty110 has quit IRC (Read error: Operation timed out)
18:17 🔗 oli_ has joined #archiveteam
18:18 🔗 c4rc4s has joined #archiveteam
18:18 🔗 oli has quit IRC (Read error: Operation timed out)
18:18 🔗 oli_ is now known as oli
18:25 🔗 SirCmpwn has joined #archiveteam
18:37 🔗 wp494 Weather Underground is tossing out webcams now: http://help.wunderground.com/knowledgebase/articles/1821811
18:37 🔗 wp494 "After 10 years of proudly displaying your webcam footage across our website and apps, we sadly have to remove this functionality as we no longer have the necessary resources to maintain it. On December 15, 2017, we’ll remove the webcam feeds from our website, mobile apps, and within our API – meaning uploading and accessing webcam footage will no longer be available."
18:38 🔗 wp494 "Q: Can I download my existing webcam footage?
18:38 🔗 wp494 Unfortunately, we do not have download functionality for webcam footage."
18:38 🔗 JAA Ugh
18:39 🔗 wp494 I thought IBM "liberating" WU from NBC/Comcast would be a good thing but so far it really hasn't been
18:43 🔗 Dimtree has joined #archiveteam
18:50 🔗 Harzilein has joined #archiveteam
18:50 🔗 Harzilein hi
18:51 🔗 qw3rty110 has joined #archiveteam
18:51 🔗 wp494 Yes, hello
18:51 🔗 liam has joined #archiveteam
18:52 🔗 beardicus has joined #archiveteam
18:55 🔗 REiN^ has joined #archiveteam
18:55 🔗 squires has joined #archiveteam
18:55 🔗 MMovie has joined #archiveteam
18:55 🔗 C4K3 has joined #archiveteam
18:56 🔗 arkiver we can archive the webcam footage from wunderground.com
18:56 🔗 arkiver https://www.wunderground.com/webcams/
19:00 🔗 Zialus has joined #archiveteam
19:07 🔗 nwf_ has joined #archiveteam
19:15 🔗 PotcFdk has joined #archiveteam
19:23 🔗 pizzaiolo has joined #archiveteam
19:27 🔗 Fusl_ does someone know if there's a docker image available for the warrior that doesn't require manual configuration on container boot?
19:27 🔗 Fusl_ is now known as Fusl
19:30 🔗 Pixi` has quit IRC (Quit: Pixi`)
19:31 🔗 Pixi has joined #archiveteam
19:31 🔗 antomatic Huh! IBM are short of disc space? Who knew.
19:39 🔗 jschwart has joined #archiveteam
19:42 🔗 bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…)
19:45 🔗 odemg has joined #archiveteam
19:53 🔗 hook54321 arkiver: http://icons.wunderground.com/webcamarchive/u/t/utdot/246/2016/09/20160911.mp4
19:54 🔗 arkiver yeah
19:55 🔗 arkiver we just need a list of uploaders
19:55 🔗 arkiver like kydot in https://www.wunderground.com/webcams/kydot/
19:56 🔗 arkiver can maybe get that from the map, will have a look
20:00 🔗 hook54321 why do we need a list of uploaders?
20:09 🔗 antomatic so we know what to archive
20:10 🔗 antomatic (or at least where to start)
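
[Editor's sketch] Once a list of uploaders exists, candidate archive URLs could be generated along these lines. The path layout is only inferred from the single example URL hook54321 posted (first letter / second letter / uploader handle / a number assumed to be a camera ID / year / month / date.mp4); it is not confirmed, and the names `archive_url` and `daily_urls` are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator

# Base path taken from the example URL in the log.
BASE = "http://icons.wunderground.com/webcamarchive"


def archive_url(handle: str, camera_id: int, day: date) -> str:
    """Build one daily video URL, assuming the layout seen in the example URL."""
    if len(handle) < 2:
        raise ValueError("handle needs at least two characters for the path prefix")
    return (
        f"{BASE}/{handle[0]}/{handle[1]}/{handle}/{camera_id}/"
        f"{day:%Y}/{day:%m}/{day:%Y%m%d}.mp4"
    )


def daily_urls(handle: str, camera_id: int, start: date, end: date) -> Iterator[str]:
    """Yield one candidate URL per day from start to end, inclusive."""
    day = start
    while day <= end:
        yield archive_url(handle, camera_id, day)
        day += timedelta(days=1)


print(archive_url("utdot", 246, date(2016, 9, 11)))
```

Any URL produced this way is only a guess until it has been checked against the site, and given the capacity concerns that come up with large grabs, requests would want to be rate-limited.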
20:13 🔗 j08nY has joined #archiveteam
20:24 🔗 ZexaronS has joined #archiveteam
20:28 🔗 bithippo has joined #archiveteam
20:57 🔗 balrog has joined #archiveteam
21:27 🔗 trvz has quit IRC (Ping timeout: 260 seconds)
21:42 🔗 icedice has joined #archiveteam
22:43 🔗 matt_ has joined #archiveteam
22:43 🔗 matt_ is now known as Igloo_
22:46 🔗 achip has joined #archiveteam
22:47 🔗 Igloo_ has quit IRC (Client Quit)
22:49 🔗 jschwart has quit IRC (Quit: Konversation terminated!)
22:52 🔗 Igloo_ has joined #archiveteam
22:54 🔗 Igloo has quit IRC (Quit: leaving)
22:54 🔗 Igloo_ is now known as Igloo
23:06 🔗 Rondom_ has joined #archiveteam
23:09 🔗 yuitimoth has quit IRC (Read error: Connection reset by peer)
23:09 🔗 Rondom has quit IRC (Read error: Network is unreachable)
23:09 🔗 atluxity has quit IRC (Remote host closed the connection)
23:09 🔗 yuitimoth has joined #archiveteam
23:09 🔗 atluxity has joined #archiveteam
23:09 🔗 kcaj has quit IRC (Ping timeout: 506 seconds)
23:11 🔗 kcaj has joined #archiveteam
23:40 🔗 trvz has joined #archiveteam
23:44 🔗 BlueMaxim has joined #archiveteam
