#archiveteam-bs 2018-02-09,Fri

Time Nickname Message
01:43 pizzaiolo has quit IRC (Remote host closed the connection)
01:47 dashcloud has quit IRC (Ping timeout: 492 seconds)
01:51 dashcloud has joined #archiveteam-bs
02:01 RichardG has quit IRC (Read error: Connection reset by peer)
02:01 RichardG has joined #archiveteam-bs
03:56 phuzion Do we have a technique to archive an entire subreddit?
04:01 Stilett0 has joined #archiveteam-bs
04:18 BlueMax has joined #archiveteam-bs
04:20 qw3rty115 has joined #archiveteam-bs
04:23 qw3rty114 has quit IRC (Read error: Operation timed out)
05:24 godane SketchCow: i'm having trouble connecting to FOS
05:30 godane btw i found out i got a complete copy of Body Mind And Soul The Mystery And The Magic on another tape
05:30 godane it was missing over an hour on the first tape with it
05:38 godane nevermind about FOS having problems
05:38 godane it's fine now
05:44 godane SketchCow: btw since i mailed those tapes you can send me more tapes
05:45 godane i also need more shipping labels to mail the rest
07:39 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
07:39 dashcloud has joined #archiveteam-bs
07:41 schbirid has joined #archiveteam-bs
08:57 JAA phuzion: Not really, no. Currently, it's still possible in theory to do that with the search API, but that will be gone soon as well.
08:57 JAA Otherwise, you only get the 1000 newest/top/... threads.
08:58 JAA Large threads also cause various issues since you have to follow the "load more comments" links etc.
08:58 JAA (Which are handled with JS, no less.)
09:05 BlueMaxim has joined #archiveteam-bs
09:11 BlueMax has quit IRC (Read error: Operation timed out)
09:17 RichardG has quit IRC (se.hub irc.efnet.nl)
09:17 will has quit IRC (se.hub irc.efnet.nl)
09:17 Smiley has quit IRC (se.hub irc.efnet.nl)
09:17 BnAboyZ has quit IRC (se.hub irc.efnet.nl)
09:17 kisspunch has quit IRC (se.hub irc.efnet.nl)
09:17 Zebranky has quit IRC (se.hub irc.efnet.nl)
09:17 MrRadar2 has quit IRC (se.hub irc.efnet.nl)
09:17 BnARobin has quit IRC (se.hub irc.efnet.nl)
09:17 jtn2 has quit IRC (se.hub irc.efnet.nl)
09:17 Tenebrae has quit IRC (se.hub irc.efnet.nl)
09:17 Fusl has quit IRC (se.hub irc.efnet.nl)
09:17 hook54321 has quit IRC (se.hub irc.efnet.nl)
09:17 ez has quit IRC (se.hub irc.efnet.nl)
09:17 Polylith has quit IRC (se.hub irc.efnet.nl)
09:25 PurpleSym JAA: I could easily add support for clicking these links to chromebot.
09:27 JAA Yeah, that part is solvable. I'm more worried about the search API change and not being able to find all threads in a subreddit anymore.
09:29 PurpleSym However replay won’t work, since it uses POST requests.
09:37 BlueMaxim has quit IRC (Leaving)
10:10 JAA The Pushshift API could be a workaround for Reddit crippling the search API. https://redditsearch.io/
10:11 JAA Also, TIL that the Pushshift archives do not contain deleted comments.
10:12 JAA They have a realtime component for the API which should have everything that was available on Reddit for more than a second or so.
10:12 JAA But the archives are independent monthly crawls. Comments deleted between the posting time and the time of the monthly crawl are lost.
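Because Reddit's own listings stop at 1000 threads, a full subreddit crawl needs a search endpoint that can be walked backwards in time, page by page. The following is a minimal Python sketch of that paging pattern. It assumes a Pushshift-style submission search endpoint that accepts `subreddit`, `size`, `sort` and `before` (a Unix-timestamp cursor) and returns posts carrying a `created_utc` field. The endpoint URL, those parameter names, and the `fetch_page` callable are assumptions, not details confirmed in this log, which only links the redditsearch.io front end.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional
from urllib.parse import urlencode

# Assumed Pushshift-style endpoint; the log itself only links https://redditsearch.io/
SEARCH_URL = "https://api.pushshift.io/reddit/search/submission/"

Post = Dict[str, Any]


def page_params(subreddit: str, before: Optional[int], size: int = 100) -> Dict[str, Any]:
    """Query parameters for one page; `before` is an assumed Unix-timestamp cursor."""
    params: Dict[str, Any] = {"subreddit": subreddit, "size": size, "sort": "desc"}
    if before is not None:
        params["before"] = before
    return params


def page_url(params: Dict[str, Any]) -> str:
    """Turn the parameters into a GET URL for whatever HTTP client does the fetching."""
    return f"{SEARCH_URL}?{urlencode(params)}"


def iter_submissions(
    subreddit: str,
    fetch_page: Callable[[Dict[str, Any]], List[Post]],
    size: int = 100,
) -> Iterator[Post]:
    """Walk a subreddit newest-to-oldest until a page comes back empty."""
    before: Optional[int] = None
    while True:
        page = fetch_page(page_params(subreddit, before, size))
        if not page:
            return
        yield from page
        # The oldest timestamp on this page becomes the cursor for the next page.
        # Caveat: with a strict `before`, other posts from that same second may be skipped.
        before = min(int(post["created_utc"]) for post in page)
```

In real use, `fetch_page` would GET `page_url(params)` and return the decoded list of posts. Keeping the HTTP call outside the pager makes the cursor logic easy to check against a fake data source. As JAA notes, this still only returns what the archive captured, so comments deleted before Pushshift's crawl would not appear.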
10:12 JAA Is anyone aware of other Reddit archives?
11:20 pizzaiolo has joined #archiveteam-bs
11:22 mabynogy has joined #archiveteam-bs
11:37 RichardG has joined #archiveteam-bs
11:37 will has joined #archiveteam-bs
11:37 Smiley has joined #archiveteam-bs
11:37 BnAboyZ has joined #archiveteam-bs
11:37 kisspunch has joined #archiveteam-bs
11:37 Zebranky has joined #archiveteam-bs
11:37 MrRadar2 has joined #archiveteam-bs
11:37 BnARobin has joined #archiveteam-bs
11:37 jtn2 has joined #archiveteam-bs
11:37 Tenebrae has joined #archiveteam-bs
11:37 Fusl has joined #archiveteam-bs
11:37 hook54321 has joined #archiveteam-bs
11:37 ez has joined #archiveteam-bs
11:37 Polylith has joined #archiveteam-bs
12:45 ranavalon has joined #archiveteam-bs
12:47 bitspill has quit IRC ()
12:48 bitspill has joined #archiveteam-bs
13:10 midas has quit IRC ()
13:11 midas has joined #archiveteam-bs
13:40 DrasticAc has quit IRC ()
13:40 DrasticAc has joined #archiveteam-bs
14:12 godane so i'm over 12k items this month
14:12 godane 49,128 items so far this year
14:31 Mateon1 has quit IRC (Ping timeout: 252 seconds)
14:31 Mateon1 has joined #archiveteam-bs
14:56 riking has quit IRC ()
14:56 riking has joined #archiveteam-bs
14:57 ThisAsYou has quit IRC ()
14:57 ThisAsYou has joined #archiveteam-bs
15:08 dogsrcool has joined #archiveteam-bs
15:15 VerifiedJ has joined #archiveteam-bs
15:17 mundus JAA that's why he created it
15:18 mundus Has a project for oddshot.tv been proposed?
15:18 mundus https://oddshot.tv/
15:18 mundus https://medium.com/the-oddshot-loop/end-of-an-era-aefeca0420bf
15:19 mundus "Oddshot.tv will shutdown it's servers and applications on Monday February 12th Video files will not be accessible after this time."
15:19 mabynogy I've started archiving 4chan /g board - I have a json file per day - I plan to add an image scraper soon to collect the memes - I'd like to put that somewhere where anybody could download it - any idea about that?
15:20 Jon fuck. the scanner I'm using for these old SF mags is generating TIFFs as required.... but the compression scheme inside the TIFF is "JPEG"
15:22 Jon i dont think the scanner gives me the option of controlling the compression type either
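Jon's problem is that a file can be a valid TIFF while its image data is still JPEG-compressed. TIFF records this in the Compression tag (tag 259) of each image file directory (IFD). Common values are 1 for uncompressed, 5 for LZW, 6 for old-style JPEG, 7 for JPEG (per TIFF Technical Note 2), 8 for Deflate and 32773 for PackBits. The following is a minimal Python sketch that reads that tag from the first IFD so scans can be checked before upload. It assumes a classic (non-BigTIFF) file, and `tiff_compression` is a hypothetical helper name.

```python
import struct

# TIFF Compression tag and some common values (TIFF 6.0; 7 per TIFF Technical Note 2).
COMPRESSION_TAG = 259
COMPRESSION_NAMES = {
    1: "none",
    5: "LZW",
    6: "JPEG (old-style)",
    7: "JPEG",
    8: "Deflate",
    32773: "PackBits",
}


def tiff_compression(data: bytes) -> int:
    """Return the Compression value from the first IFD of a classic (non-BigTIFF) TIFF."""
    byte_order = data[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif byte_order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF: missing II/MM byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError(f"unsupported TIFF variant (magic {magic})")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        (tag,) = struct.unpack_from(endian + "H", data, entry)
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits in the first 2 bytes of the 4-byte value field.
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return value
    return 1  # tag absent: the spec's default is "no compression"
```

For a file on disk, `tiff_compression(pathlib.Path("scan.tif").read_bytes())` would do, where `scan.tif` is a placeholder name. A result of 6 or 7 means the scanner wrapped JPEG data in the TIFF container.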
15:22 mundus API documentation for oddshot is here: https://api.oddshot.tv/docs/
15:23 mundus This seems like a perfect AT project
15:29 JAA mundus: Hm?
15:30 mundus the pushshift api
15:31 JAA Yes
15:31 JAA Ah, you were talking about the search part?
15:31 JAA I thought this was in reference to the deleted comments.
15:32 mundus yeah, the search
15:32 mundus The deleted comments exist before he started live archiving
15:33 mundus I think he started live archiving like 2 years ago
15:33 mundus https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/
15:39 JAA mundus: https://www.reddit.com/r/datasets/comments/71lkf1/pushshift_dataset_has_at_least_11_days_delay_in/dnbrl8t/?context=10000
15:39 mundus hmm
16:00 atrocity has joined #archiveteam-bs
16:29 SketchCow godane: I will tell my folks!
16:29 ZexaronS has joined #archiveteam-bs
16:31 JAA Oddshot has about 15 million video views per month: https://www.reddit.com/r/GlobalOffensive/comments/7wd4qh/psa_oddshot_is_shutting_down/dtzeyav/?context=1
16:31 JAA Maybe that helps for estimating the size.
16:31 godane SketchCow: i will also need labels to mail my own vhs tapes that i have digitized
16:55 godane so the tape i'm capturing gave me some trouble
16:56 godane like almost static in middle of picture
16:56 godane i started capturing in the part of the tape where i had no picture and got a picture
16:57 godane so i'm going to see about recapturing it
16:58 godane nevermind going to skip this part of tape
17:24 godane anyways i skipped that tape cause it kept giving me no video after awhil
17:24 godane *awhile
17:30 godane so i got HBO First Look at Nine Months
17:30 godane and On The Set Judge Dredd videos
17:47 godane i'm at 124k for DTIC Archive
17:59 icedice has joined #archiveteam-bs
18:00 ZexaronS has quit IRC (Quit: Leaving)
18:07 godane i'm running a recheck of dtic archive pdfs to see that i uploaded everything in older areas
18:07 godane already found one number that doesn't have an item page
19:06 pizzaiolo has quit IRC (Remote host closed the connection)
19:09 pizzaiolo has joined #archiveteam-bs
19:14 w0rp has quit IRC (Read error: Operation timed out)
19:16 w0rp has joined #archiveteam-bs
19:22 REiN^ has quit IRC (Read error: Operation timed out)
19:24 icedice2 has joined #archiveteam-bs
19:27 icedice2 has quit IRC (Client Quit)
19:27 icedice has quit IRC (Read error: Operation timed out)
19:28 Ravenloft has quit IRC (Read error: Operation timed out)
20:15 godane my latest digitized tapes: https://www.patreon.com/posts/digitize-tapes-16904851
20:19 schbirid has quit IRC (Quit: Leaving)
20:20 ola_norsk has joined #archiveteam-bs
20:21 ola_norsk hi. I'm wondering if someone could help me out with using S3 at IA, e.g. there's a setting in the app i'm using: "S3_ROOT=s3://bucket/path/"
20:24 ola_norsk i don't want to mess anything up or cause clutter, so while making sense of https://archive.org/help/abouts3.txt , i'd prefer some input
20:26 ola_norsk e.g. what's an "S3 bucket"?
20:26 astrid s3 bucket => ia item
20:27 ola_norsk ty. So ideally i could actually have one single item to WARC to?
20:29 ola_norsk e.g. one "Ola_Norsk_WARCS" item etc
20:29 jschwart has joined #archiveteam-bs
20:32 RichardG has quit IRC (Read error: Connection reset by peer)
20:34 ola_norsk the software is webrecorder btw https://github.com/webrecorder/webrecorder ..and I'm guessing "path" would be the collection then
20:34 RichardG has joined #archiveteam-bs
20:34 ola_norsk e.g. "Ola_Norsk_WARCS/twitter_hashtag_netneutrality"..
20:34 schbirid has joined #archiveteam-bs
20:36 ola_norsk or "Ola_Norsk_WARCS/some_collection_name/"
20:42 ola_norsk or e.g., since i focus on twitter for the time being: "S3_ROOT=s3://twitter_hashtags_<year>/<hashtag named collection>/"
20:45 ola_norsk i guess it will be safer if i try it out on a test item first :]
20:51 schbirid has quit IRC (Quit: Leaving)
20:55 ola_norsk e.g. "S3_ROOT=s3://s3.us.archive.org/ola_norsk_warcs_test" would be correct for a single item?
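astrid's answer means that in an `s3://bucket/path/` URL the bucket part should be the Internet Archive item identifier, and the path is a key prefix inside that item. On that reading, `s3://s3.us.archive.org/ola_norsk_warcs_test` would put the endpoint hostname where the item name belongs. The following is a minimal Python sketch of the mapping. It assumes IA's path-style layout `s3.us.archive.org/<item>/<file>` as described in abouts3.txt (HTTPS assumed). Whether webrecorder takes the S3 endpoint as a separate setting is not confirmed here, and `split_s3_root` and `ia_s3_url` are hypothetical helpers.

```python
from typing import Tuple
from urllib.parse import quote, urlsplit

# Path-style IA S3 endpoint (layout as in archive.org/help/abouts3.txt; HTTPS assumed).
IA_S3_ENDPOINT = "https://s3.us.archive.org"
IA_S3_HOST = urlsplit(IA_S3_ENDPOINT).netloc


def split_s3_root(s3_root: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/' into (bucket, key_prefix); at IA, bucket = item identifier."""
    parts = urlsplit(s3_root)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"expected s3://bucket/..., got {s3_root!r}")
    if parts.netloc.lower() == IA_S3_HOST:
        # The endpoint host ended up where the item identifier belongs.
        raise ValueError("bucket should be the item identifier, not the S3 endpoint host")
    return parts.netloc, parts.path.strip("/")


def ia_s3_url(s3_root: str, filename: str) -> str:
    """URL a file (e.g. one WARC) would be PUT to inside the item."""
    item, prefix = split_s3_root(s3_root)
    key = "/".join(p for p in (prefix, filename) if p)
    return f"{IA_S3_ENDPOINT}/{item}/{quote(key)}"
```

With this mapping, `s3://Ola_Norsk_WARCS/twitter_hashtag_netneutrality/` would place each WARC under a `twitter_hashtag_netneutrality/` folder inside the single `Ola_Norsk_WARCS` item. The actual upload also needs the `authorization: LOW accesskey:secret` header described in abouts3.txt. Trying it on a throwaway test item first, as ola_norsk suggests, is the safe route.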
21:35 mabynogy has quit IRC (Quit: dpt.slasheva.com)
21:42 REiN^ has joined #archiveteam-bs
21:44 ZexaronS has joined #archiveteam-bs
21:45 mabynogy has joined #archiveteam-bs
21:50 BlueMax has joined #archiveteam-bs
22:12 mabynogy has quit IRC (Quit: dpt.slasheva.com)
22:12 VerifiedJ has left
22:33 pizzaiolo has quit IRC (pizzaiolo)
22:33 pizzaiolo has joined #archiveteam-bs
22:37 pizzaiolo has quit IRC (Client Quit)
22:38 godane has quit IRC (Read error: Operation timed out)
22:38 pizzaiolo has joined #archiveteam-bs
22:48 odemg arkiver, JAA SketchCow DFJustin do we know about this, https://medium.com/the-oddshot-loop/end-of-an-era-aefeca0420bf closing shop monday....
22:49 odemg "Over the last year, since introducing upload to the platform, we noticed more and more unsavory NSFW content being uploaded, which quickly became almost impossible to moderate. The frontpage in which we used to love everyday, seeing the best daily / weekly / monthly video highlights became a page we avoided and hated."
22:49 odemg ....always the fappers that ruin everything...
22:49 godane has joined #archiveteam-bs
22:49 godane has quit IRC (Client Quit)
22:53 JAA odemg: Yes, we know about it.
22:56 odemg JAA, are we already getting it or is it too late?
22:58 JAA I threw it into ArchiveBot earlier, which grabbed some of it, but I'm not aware of any systematic efforts.
22:59 JAA The COO is active on Reddit. Could be worth trying to contact them for more information (how large in total, possibility of a bulk export, etc.)
23:01 JAA UnfunMid is the username if you want to give that a shot (heh).
23:04 jschwart has quit IRC (Quit: Konversation terminated!)
23:07 riking fucking photobucket. is there any way to get the images out?
23:07 riking http://pic.photobucket.com/bwe.png
23:09 JAA Need to send the right referrer etc.
23:09 JAA Yep, they suck.
23:09 riking http://photobucket.com/gallery/user/beitstuck/media/cGF0aDovQm9va0ZvcnQucG5n/?ref=
23:09 riking uhhh
23:09 riking i'm seeing bwe, on photobucket.
23:09 riking do you see an image there?
23:10 riking HOLY SHIT THE ADS, i opened incognito
23:10 JAA lol, that's an interesting one.
23:10 JAA Well yeah, image hosting is not exactly a profitable business.
23:10 ola_norsk could be a data limit?
23:11 riking bwe = bandwidth exceeded
23:11 JAA Is there a bandwidth limit now as well?
23:11 JAA I thought that was only for external requests.
23:11 riking I hit the "random" button on the site i'm writing an archive viewer for
23:12 ola_norsk bandwidth limit is server-side, is it not?
23:12 JAA https://support.photobucket.com/hc/en-us/articles/200724504-Storage-Vs-Bandwidth-What-s-The-Difference-
23:13 JAA Sounds like they still only limit bandwidth that comes from third party sites.
23:13 JAA Which makes sense, since they plaster their own website with ads.
23:13 JAA So they already make money from those visitors.
23:13 riking going to bet: that's the policy but it's not the implementation.
23:14 riking actually i think it's an account-wide flag?
23:15 JAA That's definitely possible. Photobucket being a piece of shit wouldn't exactly be breaking news...
23:16 riking i clicked around and noticed load times, as if they weren't sure whether they wanted to say "fuck you" or not
23:16 JAA Or their servers just suck...
23:16 riking * to clarify: as if nobody had asked for that image in years
23:16 riking completely cold cache
23:20 JAA riking: On that link you mentioned, it might be a broken image gallery or so. I clicked the right arrow once and started seeing images and couldn't get back to the bwe.png afterwards.
23:20 riking Right so if you're investigating, here's the image source https://mspfa.com/?s=540&p=1
23:21 riking source has not been audited for quality :P
23:21 JAA Also, interestingly, I can download the first image in that gallery just fine with curl without a referrer or proper user agent.
23:21 JAA http://i996.photobucket.com/albums/af84/sharonitzhaki/icons/65s-1.jpg is the link for that.
23:22 riking i'm seeing a different username
23:22 JAA Huh
23:22 JAA Oh yeah, I somehow got redirected to a different gallery, dafuq?
23:25 JAA I guess that was the right arrow click then.
23:28 BlueMax has quit IRC (Leaving)
23:32 JAA I found a few projects on GitHub which try to work around this problem. Looks like the method used by https://github.com/nicinabox/fixpb still works, for example.
23:33 JAA curl -v -H 'Referer: http://photobucket.com/gallery/user/kk251/media/NORTON%20TANK%20on%20V11_zpsibge9erf.jpg' 'https://i282.photobucket.com/albums/kk251/BKL93908/NORTON%20TANK%20on%20V11_zpsibge9erf.jpg' >/dev/null
23:33 JAA Returns a ~70 kB JPEG
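JAA's curl workaround sends a `Referer` header pointing at a photobucket.com gallery page while requesting the direct `i*.photobucket.com` image URL. The following is a minimal Python sketch of the same request using the standard library. The assumption that a photobucket.com gallery URL is an acceptable referrer comes from that single curl test, and the browser-like User-Agent is an extra assumption. `photobucket_request`, `is_bandwidth_error` and `fetch_image` are hypothetical helpers.

```python
import urllib.request


def photobucket_request(image_url: str, referer: str) -> urllib.request.Request:
    """Build (but do not send) a GET for a direct image URL with a photobucket.com Referer,
    mirroring JAA's `curl -H 'Referer: ...'` test."""
    return urllib.request.Request(
        image_url,
        headers={
            "Referer": referer,
            # Assumed browser-like UA; JAA's tests suggest it may not be required.
            "User-Agent": "Mozilla/5.0",
        },
    )


def is_bandwidth_error(final_url: str) -> bool:
    """True if the request ended on Photobucket's bwe.png ("bandwidth exceeded") placeholder."""
    return final_url.split("?", 1)[0].endswith("/bwe.png")


def fetch_image(req: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request; urlopen follows redirects, so check where it ended up."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if is_bandwidth_error(resp.geturl()):
            raise RuntimeError("redirected to bwe.png; that redirect may itself be cached")
        return resp.read()
```

Checking the final URL rather than the status code matters here, because the placeholder is served after a normal redirect, so a naive downloader would save `bwe.png` as if it were the real image.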
23:34 REiN^ has quit IRC (Read error: Operation timed out)
23:35 JAA Hm, now I also get it without specifying the referrer. Might be in some server-side cache now or something.
23:36 JAA Huh, I guess the redirects to bwe.png might also get cached. lol
23:37 JAA That would also explain why your original link got the error, riking.
23:38 riking Oh nice.
23:38 JAA So once you trigger the redirect, you need to wait six hours (according to the Expires header) until you can access it again.
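The six-hour wait comes from the `Expires` header on the cached redirect. A cache may keep serving that response until the `Expires` time, measured against the server's `Date` header. The following is a minimal Python sketch that turns those two headers into a wait time. The header values in any example here are made up for illustration, not captured from Photobucket, and `cache_lifetime` is a hypothetical helper.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def cache_lifetime(date_header: str, expires_header: str) -> timedelta:
    """Freshness window from HTTP Date and Expires headers (both in HTTP-date format).

    Note: a Cache-Control max-age directive, if present, takes precedence over
    Expires in HTTP/1.1 caches; this sketch ignores it.
    """
    served_at = parsedate_to_datetime(date_header)
    expires_at = parsedate_to_datetime(expires_header)
    # An Expires at or before Date means the response is already stale.
    return max(expires_at - served_at, timedelta(0))
```

For example, a response dated `Fri, 09 Feb 2018 23:38:00 GMT` with `Expires: Sat, 10 Feb 2018 05:38:00 GMT` gives `timedelta(hours=6)`, which matches the wait JAA describes.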
23:38 JAA Fuck Photobucket.
23:39 riking hoooo wee
23:43 icedice has joined #archiveteam-bs
23:45 JAA Regarding the size of Oddshot: https://www.reddit.com/r/DataHoarder/comments/7wdcb8/oddshottv_the_stream_clip_hosting_service_is/du0b7ri/
23:45 JAA No estimate yet, but the COO will likely post it there.
23:45 JAA Also, there's an API, but you can't just get access immediately: https://www.reddit.com/r/Oddshot/comments/4szqxe/oddshot_api/
23:45 JAA Well, maybe that information is outdated, but I can't find anything else about it.
23:47 JAA I could've sworn I saw a post about it on Reddit somewhere earlier today, but I can't find it anymore.
23:51 JAA Everything on the website happens through POST requests with GraphQL.
23:55 riking there's a tool to get graphql docs
23:55 JAA Oh hey, there we go: api.oddshot.tv
23:56 JAA "Our API is public and you can get a key from your account profile on Oddshot, there is a 'show key' button."
23:56 JAA https://www.reddit.com/r/GlobalOffensive/comments/5pg0wh/female_cs_elegiggle/dcrfwru/
23:56 riking graphdoc -e https://gql.twitch.tv/gql -x Client-Id: kimne78kx3ncx6brgo4mv6wki5h1ko -x Authorization: OAuth 2v6zpwzz1ghjb3ujv0gzuve98ue7db -o graphql-docs/ -f
23:56 riking ... those weren't important
23:58 riking and key revoked, don't bother trying to use that oauth.
23:58 JAA The API doesn't expose all data though, as far as I can tell.
23:58 riking well yeah, that's why you use graphdoc with a Cookie: header
23:58 riking or whatever the live site is actually using
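Since the site talks to its backend through GraphQL POST requests, a documentation tool like graphdoc essentially asks the endpoint to describe itself with a GraphQL introspection query and renders the schema that comes back. The following is a minimal Python sketch of building such a request, with an optional `Cookie` header reusing the browser session as riking suggests. The endpoint URL and cookie value are placeholders, `graphql_request` is a hypothetical helper, and the log does not establish whether Oddshot's endpoint allows introspection.

```python
import json
import urllib.request
from typing import Optional

# A trimmed introspection query: asks the server to list its schema's types and their fields.
INTROSPECTION_QUERY = """
query {
  __schema {
    queryType { name }
    types { name kind fields { name } }
  }
}
"""


def graphql_request(endpoint: str, query: str, cookie: Optional[str] = None) -> urllib.request.Request:
    """Build a GraphQL-over-HTTP POST: a JSON body with a "query" field."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if cookie:
        # Reuse the session the live site uses, which may unlock more than the public API key.
        headers["Cookie"] = cookie
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
```

Sending it with `urllib.request.urlopen(req)` and decoding the JSON response would give the type list that graphdoc turns into HTML pages. The same POST-only design is why PurpleSym expects plain WARC replay of such sites to break.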
