[00:03] *** Stilett0 has joined #archiveteam-bs [00:04] JAA: aye..trying to archive a twitter hashtag has taught me that :/ "There was a problem loading..(retry button)") [00:05] Yeah, Twitter's also pretty good at not letting you grab everything. [00:05] Reddit as well. [00:05] (We were having a discussion about that earlier in #archivebot.) [00:05] At least you can iterate over all thread IDs in a reasonable amount of time on Reddit though. [00:07] So it appears that you can get 10k results from the vid.me API. [00:07] i feel naughty doing curl requests to https://web.archive.org/save/https://twitter.com/hashtag/netneutrality?f=tweets , currently every 3rd minute :/ [00:07] You can do that for different categories, new/hot, and probably search terms (didn't try). [00:09] There are 17 categories plus hot, new, and team picks. In the ideal case, that means 20 sections times 10k results, which is still only about 1/7th of the whole site. [00:09] This is only about how to gather lists of videos and their metadata (uploader, description, etc.), not the actual videos. [00:09] (Videos are available as Dash and HLS streams.) [00:10] There are also tags, and of course you can retrieve all (?) of an uploader's videos. [00:10] JAA: As for twitter, i think one problem is that they would easily present an archive of ANYTHING, as long as they get paid for it. [00:10] For each tag, you get hot, new, and top videos. [00:11] ola_norsk: Yeah, probably. [00:11] JAA: most definitely [00:14] There's a "random video" link. We could hammer that to get videos. I don't want to do the math how many times we need to retrieve it to discover the vast majority of all videos right now though. [00:14] JAA: for a legal warrant, or a slump of money, they could present all tweets with any hastag, since the dawn of ti..twitter [00:14] Ah, I thought you were talking about vid.me now. [00:15] Yeah, there is a company which has an entire archive of Twitter, I believe. [00:15] ah, sorry, that was just a link regarding GOG Connect [00:16] Ah, you're not in #archiveteam. vid.me is shutting down on Dec 14. [00:16] That's why I'm looking into them. [00:16] really? that soon? [00:16] https://medium.com/vidme/goodbye-for-now-120b40becafa [00:16] wow, that's going to piss off alot of germans :D [00:17] ola_norsk: I was thinking about Gnip, by the way. Looks like Twitter bought them a few years ago. [00:18] "We’re building something new." .. [00:19] a.k.a "Trust us, we're not completely destroying this shit..We're building something new!".. [00:20] free image/video host "couldn't find a path to sustainability" [00:20] man, i actually thought vid.me had something good going [00:20] what a surprise :p [00:21] https://archive.org/details/jscott_geocities [00:24] wow, there's actually people who cancelled their youtube accounts after having used vid.me's easy export solution [00:24] and as far as i know, that shit might not be such easy to export back, since i don't think YT does import by url.. [00:25] og well [00:28] why not upload to both? <.< [00:28] aye [00:30] omglolbah: according to "SidAlpha", if you know that youtuber, he would'nt because it would mean he'd have to interact on several platforms.. [00:30] If only he had moved to vidme [00:31] that was his response to the request for that, not move, but upload there as well [00:31] no, I'm saying I wished he had moved so that he would be gone :p [00:31] oh [00:34] where does shit go if Youtube goes though? I mean, Google Video went to Youtube.. [00:36] Where did Yahoo Video go? 
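For the record, the math skipped above ("how many times we need to retrieve the random-video link") has a standard form: assuming each hit returns a uniformly random video out of n — an assumption, since the endpoint's actual behaviour isn't stated — discovering nearly all videos is the coupon collector's problem:

    E[\text{requests}] = n H_n \approx n \ln n, \qquad n \approx 1.4\times10^{6} \;\Rightarrow\; E \approx 1.4\times10^{6} \cdot 14.2 \approx 2\times10^{7}

Even settling for ~99% coverage still costs roughly n ln 100 ≈ 6.4 million requests, which is why enumerating videos through the API (as discussed below) is the far cheaper route.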
[00:36] Justin.tv became Twitch right? [00:37] Justin.tv created Twitch and then closed down. Nothing was automatically moved. I don't know if Justin had vods though. [00:37] aye [00:38] I was wrong about the vid.me API not returning all results. [00:38] The actual API does return everything, or at least nearly everything. [00:38] The "API" used by the website doesn't. [00:38] I just didn't find the real API docs previously. [00:38] https://docs.vid.me/#api-Videos-List [00:38] No auth required either. [00:38] You can get chunks of 100 videos per request. [00:38] \o/ [00:39] Do we have a death date? [00:39] It gets quite slow for large offsets, indicating that they don't know how to use offsets. [00:39] 14 Dec [00:39] :-/ [00:39] how to use indices* [00:39] indexes? [00:40] where would youtube go? nowhere. it's too big :p [00:40] I never know which plural's correct. [00:40] Frogging; aye [00:40] Frodding: we'll just show up with a tractor trailer "Load it all in back y'all" [00:40] The real API returns a bit more videos, by the way: 1360532. [00:40] (About 11k more, specifically.) [00:41] Might be the NSFW/unmoderated/private filter stuff. [00:41] bithippo: YouTube is around 1 exabyte. Have fun with that. [00:41] Well, at least that order of magnitude. [00:41] I used to manage hundreds of petabytes :-P [00:41] * ola_norsk shoves in in his usb stick and applies youtube-dl ! [00:41] lol [00:42] I'm sure someone from China will sell you a 1 EB USB stick if you ask them. [00:42] Well, "1 EB". [00:42] Which will quickly err out once a few GB have been written .... :( [00:42] Yep [00:42] i'll just save it all in /dev/null [00:42] Or not error out, just overwrite the previous data etc. [00:43] Depends. Some of them are cyclical, so you can write all you want as long as you don't try to read it. :) [00:43] Yep [00:43] I'm a fan of S4. [00:43] The Super Simple Storage Service. [00:43] http://www.supersimplestorageservice.com/ [00:44] That pricing is a bargain. [00:44] bithippo: Interesting. What did you work with that included 100s of PiBs? I deal in 10s of them. [00:44] Data taking for LHC detector [00:44] Ooh, nice! [00:45] Only a couple hundred TB of spinning disk on storage arrays, the rest were tape archive libraries. [00:45] bithippo: Ah. I sort of do that on the sly. Part of our storage is for the Nordic LHC grid. [00:45] #TeamCMS [00:46] I deal mostly with crimate data though. Have a few petabytes of that. [00:46] That's awesome. [00:46] I <3 big data sets [00:47] Indeed :) [00:47] @ola_norsk If you're interested in how to make something be emulated on IA, here's some pages that lay it out for you- http://digitize.archiveteam.org/index.php/Internet_Archive_Emulation http://digitize.archiveteam.org/index.php/Making_Software_Emulate_on_IA [00:49] dashcloud: ty, i'm thinking there must be ways. If there's dosbox, there's e.g Frodo that could run in that.. [00:50] I've done a bunch of DOSBOX games, and there's a whole collection of emulated DOS/Win31/Mac Classic stuff up [00:51] ola_norsk: What, the C64 emu? [00:51] yes [00:51] No, nonononono. Go helt the jsmess people get Vice running instead. [00:52] i was hoping that was already done [00:52] I know it's started. [00:52] good stuff [00:52] But it might be stalled forever for all I know. [00:54] i have no idea about these things, but it would be cool to see C64 on Internet Arcade [00:54] JAA: I'll have very little time to do anything before the 9th, and probably not much after either, but ping me if storage is needed for vid.me. 
[00:55] dashcloud, ill try to make an item per that, using dosbox [00:55] ty for info [00:56] zino: Will do. I'll set up a scrape of the API first to get all the relevant information about the platform. Then we'll see. [00:57] if your software needs installation or configuration before the first run, you'll want to do that ahead of time [00:57] scrape/archive, whatever. That's the information we can save for sure. [00:57] Unless they ban us... [00:58] Using minVideoId and maxVideoId might be faster than the offset/limit method, especially for the later pages. [00:59] Current video IDs are slightly above 19 million, so that's around 190k requests (to be sure no videos are missed). [01:03] attending my first 2600 meeting [01:05] so the thing with vidme, there's a bunch of original stuff [01:06] there's a little bit of lewd stuff (they ban outright porn, but they do permit "artistic" nsfw) [01:06] and then there's a bit of it that consists of reuploads of copyrighted stuff [01:06] OK [01:06] not that I think it'll be a big deal since IA can just dark the affected stuff if someone does come yelling, but something to keep in mind [01:07] Sounds more or less what I'd expect. [01:07] like* [01:09] I can't find any information about API rate limits, except this Reddit thread: https://redd.it/6acvg5 [01:11] *** icedice has quit IRC (Quit: Leaving) [01:18] *** Ceryn has quit IRC (Connection closed) [01:26] "The Internet is Living on Borrowed Time" .. https://vid.me/1LriY (ironically on vid.me) ..That's pretty dark title, for being Lunduke :d [01:33] To be fair, it's also available on YouTube: https://www.youtube.com/watch?v=1VD_pJOFnZ0 [01:54] thats not fair :D [01:54] i think most of his vids are also on IA :d [01:55] but yeah [01:57] seriously though. I imagine there's a shitload of german vidme'ers currently bewildered as to what to do.. [01:59] a lot of people used the url importing at vidme, thinking they would simple move their entire channels.. [02:00] from what i've heard tales of, germany youtube is not the same youtube as everywhere elsetube [02:07] GEMA blocks a fuckton of music there [02:07] aye [02:09] ranma: is that the only reason though? There were so many germans coming to vid.me it was made a video about it.. [02:12] JAA: how do you get your data OUT of S4? [02:12] and what are the costs? [02:12] s4? [02:13] ranma: "German INVASION"...100k creators..https://vid.me/JjNaH [02:13] oh, it's a joke :'( [02:14] i hate slow internet. ml [02:14] *fml [02:14] *** phuzion has joined #archiveteam-bs [02:18] ranma: does it simply block ALL music? i can't see any other reason for such a noticable influx and flight of users [02:19] ranma: It's actually hard to browse vidme because of it at times, since often 1 in 2 videos on the feed is german [02:22] kinda wish some site could ZIP/7z another site [02:22] just noticed archivebot slurped down https://ftp.modland.com/ [02:23] did it *completely* slurp modland? [02:23] dd -i http://google.com -o http://bing.com [02:23] Muad-Dib: Your job for https://ftp.modland.com/ has finished. [02:25] actually, not that i'd have the space for it, tho [02:25] *** ola_norsk has quit IRC (its the beer talking) [02:26] does anyone have a great upload script for ia? their docs are too much for me to understand and uploading 1 by 1 is painful [02:34] for anyone wanting to mirror vid.me, its possible to page everything there: https://api.vid.me/videos/list?minVideoId=100&maxVideoId=1000 [02:34] just step the min/max (its easier on the db). [02:34] ..... 
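A minimal sketch of the min/max paging approach just described: step https://api.vid.me/videos/list in windows of 100 IDs up to the ~19 million mark, which works out to roughly 190k requests. Whether minVideoId/maxVideoId are inclusive is not confirmed in the chat, so the window boundaries are an assumption, as is the one-second politeness delay (no published rate limit was found).

    #!/bin/bash
    # Sketch only: boundaries may be off by one depending on whether
    # minVideoId/maxVideoId are inclusive -- check against the API docs.
    MAX_ID=19000000   # current video IDs are "slightly above 19 million"
    STEP=100          # the API returns at most 100 videos per request

    for ((lo = 1; lo <= MAX_ID; lo += STEP)); do
        hi=$((lo + STEP - 1))
        curl -s "https://api.vid.me/videos/list?minVideoId=${lo}&maxVideoId=${hi}&limit=${STEP}" \
            > "videos_${lo}_${hi}.json"
        sleep 1   # be gentle with their DB, per the "easier on the db" note above
    done

In practice the responses are being recorded to WARC rather than loose JSON files; this loop only illustrates the ID windowing.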
https://usercontent.irccloud-cdn.com/file/PZalOsZ6/image.png [02:35] JAA: ^ [02:35] our wiki is more stable than this beta-like system :P [02:39] CoolCanuk: What do you mean by "upload script for ia"? [02:40] Such as https://github.com/jjjake/internetarchive ? [02:40] an easier way [02:41] eg I can loop is for 100s of files in a folder, but upload as 100 items. [02:41] That repo is your best bet for that sort of operation. [02:42] What sort of files and metadata? [02:42] pdf [02:43] currently, newspapers [02:43] and sears crap [02:44] Hmm [02:46] The two routes would be "web interface", which gives you a nice interface and shouldn't be too painful if you're putting up each folder as an item (with all of the files contained within that folder attributed to the item). Failing that, you'd need some light python or bash scripting skills to pickup up files per item, associate metadata with each item, and upload. [02:46] I could be wrong of course! But that's my interpretation based on working with the IA interfaces. [02:47] tbh, IA interface is just plain atrocious to use [02:47] Indeed. [02:47] i suppose thats artificial barrier of entry on purpose to avoid people uploading crap [02:48] I guess [02:48] I'm uploading stuff I know will probably not be found anywhere else [02:49] yea, the commitment to jump the hoops is paired with commitment to curate content [02:50] Only think I'm worried about is repetitive strain injury [02:52] *** wp494_ has joined #archiveteam-bs [02:59] *** wp494 has quit IRC (Read error: Operation timed out) [03:03] *** ld1 has quit IRC (Quit: ~) [03:06] *** ld1 has joined #archiveteam-bs [03:19] why does IA have a difficult time using the FIRST page of a pdf as the COVER >:( [03:41] *** wp494_ is now known as wp494 [03:47] X-posting from #archiveteam: if you're using youtube-dl to grab vid.me content, be aware of this issue: https://github.com/rg3/youtube-dl/issues/14199 [03:47] tl;dr: their HLS streams return a data format youtube-dl doesn't fully handle resulting in corrupted output files [03:48] Use a workaround in the 2nd to last comment to force youtube-dl to grab from the DASH endpoints instead [03:54] posting highlights of https://www.youtube.com/watch?v=KMaWSinw4MI&t=41m33s here [03:55] first one being that linus has significant disagreements with senior management, and especially NCIX's owner [03:55] which seems to be a very common theme [03:56] he also left NCIX because the people he mentored departed [03:56] says he thinks some were forced out because of extraordinarily poor management decisions [03:57] (in his opinion) [03:57] I'm reading this https://np.reddit.com/r/bapcsalescanada/comments/77h771/for_anyone_that_purchased_a_8700k_from_ncix/domm2ca/?context=3 [04:09] *** josho493 has joined #archiveteam-bs [04:09] linus pitched what sounded like a pretty good idea, try and get bought, but how? 
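On the "upload script for ia" question above: the jjjake/internetarchive repo linked earlier ships an `ia` command-line tool, which is probably the least painful route for turning each folder of PDFs into one item. A rough sketch, assuming one item per folder, folder names that are valid IA identifiers (ASCII, no spaces, globally unique), and placeholder metadata that would need to be filled in for the actual newspapers:

    #!/bin/bash
    # Requires: pip install internetarchive && ia configure   (stores IA credentials)
    # Each subdirectory becomes one item; all PDFs inside it are attached to that item.
    for dir in */; do
        identifier="${dir%/}"              # e.g. "some-newspaper-1954-06-01" (placeholder)
        ia upload "$identifier" "$dir"*.pdf \
            --metadata="mediatype:texts" \
            --metadata="title:${identifier}"   # placeholder metadata; adjust per item
    done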
his solution was to open "NCIX Lite"s across the country which would be really small pickup places that you could ship to since shipping direct to your home sometimes killed the deal [04:10] said that the writing was on the wall as early as 7 years ago (before Amazon was doing pickup) to anyone actively paying attention, so the idea would've been to attract someone similar to Amazon if not Amazon themselves using that infrastructure when they wanted to gobble someone Whole Foods style [04:12] linus said when management didn't do that, he said it became obvious that he had to GTFO [04:12] he says he hasn't been screwed over personally by Steve (the owner) and his wife unlike some of the other horror stories going out [04:16] he wound up signing a non-compete for 2 years (which got extended by 1) [04:17] when he left he took the LTT assets, and did it on paper (and was glad he did), because even though he wouldn't think Steve would do anything untoward to him, creditors are sharks looking for their next kill [04:19] and that's about it [04:27] why do people bother with fairly standard eshop drama, was ncix the canadian amazon or something? [04:27] More like Canadian Newegg... before Newegg moved into Canada [04:28] It was *the* place to go for computer parts online, from what I understand [04:28] ah [04:28] razor thin margins, yea we have that locally too [04:28] all with fake "in stock" stickers where you wait 2 weeks and everything [04:30] *** qw3rty115 has joined #archiveteam-bs [04:34] *** qw3rty114 has quit IRC (Read error: Operation timed out) [04:38] they had a location here in Ottawa. I used to shop there until they closed it [05:07] *** josho493 has quit IRC (Quit: Page closed) [05:09] defunct as of today :o [05:09] *yesterday [05:11] *** Mateon1 has quit IRC (Ping timeout: 245 seconds) [05:12] *** Mateon1 has joined #archiveteam-bs [05:15] am I the only one who doesnt really see the big deal of google home/mini or amazon alexa? [05:29] *** ranavalon has quit IRC (Read error: Connection reset by peer) [05:36] there's a big deal? [05:38] *** shindakun has joined #archiveteam-bs [05:38] *** Jcc10 has joined #archiveteam-bs [05:42] CoolCanuk: we're all waiting for amazon to give access to alexa transcripts to app devs [05:42] so we can start archiving every little embarassing thing anyone has ever said [05:42] which of vidme's logos should I use for the article, the wordmark or their "astro" mascot [05:42] https://vid.me/media [05:48] the one on the main page (red) [05:48] wordmark it is [05:48] sadly cant be eps or svg :( [05:49] gonna resize it a little otherwise it'll appear about as big in a warrior project [05:49] or we could fix the template [05:49] wait what do you mean [05:50] lemme go dig through the spuf logs to show you [05:50] (come to think of it I'm not even sure if I took an image, I might have just pull requested and moved on) [05:51] our {{Template project}} should be fixed to a larger logo size [05:51] using it online is not an issue, because we can dynamicly resize [05:51] http://tracker.archiveteam.org/ [05:52] yeah there it could benefit from being a touch bigger at least for logos that are rectangles instead of squares [05:52] apparently we can't... :| [05:52] (it seems to like squares the best) [05:52] "benefit"? [05:52] distortion? 
[05:52] and yeah, I was about to say, our copy of mediawiki isn't quite as flexible as wikimedia's where you can stuff in any number and it'll spit it out for you [05:52] even ridiculously large ones like 10000px [05:52] I just noticed that. that's too bad [05:53] another reason to use SVG. [05:54] even SVGs too [05:54] no. SVGs are not raster [05:55] you can blow them up to 1000000000px and it will never distort unless you have embedded rasters [05:57] https://upload.wikimedia.org/wikipedia/commons/3/35/Tux.svg [06:02] ok I was gonna recreate an example with SPUF but there's a live one that I can get you right now [06:03] see how the miiverse logo goes a bit out of its bounds and pushes content downwards: https://i.imgur.com/P3Wcfbp.png [06:03] ew [06:03] logo should be within that white div, not yellow [06:04] (within, not overlaid) [06:04] now take the version of the steam icon we had stored on the wiki and stuffed into the project code (http://www.archiveteam.org/images/4/48/Steam_Icon_2014.png) and it wound up being a bit worse than that example [06:04] luckily a 100px version that mediawiki gracefully generated more or less solved things: https://github.com/ArchiveTeam/spuf-grab/pull/2/commits/1c319d3d144cc13599f1fe571e699ca8b3d79e60 [06:04] not the image's fault, it's the tracker ;) [06:05] afaik tracker main page was ok [06:05] how could it be ok [06:05] note how it looks like it's fine on http://tracker.archiveteam.org/ [06:05] simply use max-width for img in css [06:05] *height [06:06] but with that said scroll bars do appear [06:06] then you need to [06:06] overflow: hidden [06:06] but it's nothing near as annoying as the in-warrior example, though still a nuisance albeit very minor [06:07] I will fix it [06:08] k so a 600 x 148 version will go up on the wiki [06:08] and then if it causes problems we can grab a 100px url [06:08] for project code [06:09] we have or [06:09] **or [06:09] just use max-height: 100px [06:09] ;) [06:09] ok project page is going up [06:09] lol how did it let you upload file name with a space :P [06:09] it makes me use _ [06:10] it does insert a _ [06:10] the recent changes bot treats it as a space though [06:11] but for actually using the filename you're going to need to use underscores [06:11] o [06:11] aw crap I'm getting spam filtered and I don't even get a prompt to put in the secret phrase [06:12] oh well let's see if this workaround of inserting a space in the url works [06:12] heh [06:12] SHHHH that's supposed to be a secret :x [06:13] ok wow that apparently worked [06:15] i'll fix it for ya [06:15] gl with the filter [06:15] oh you fixed it [06:15] I was surprised I was even able to toss such a tiny little stone at that goliath [06:18] ok that's a solid foundation I think [06:19] huh [06:20] I have a workaround :P [06:21] *** slyphic has quit IRC (Read error: Operation timed out) [06:21] I got a 508 clicking that purplebot link [06:21] godane: What does "WOC" mean with the MPGs? [06:21] resource limit reached [06:22] this 208 error will be the death of me [06:22] *508 [06:23] connection timed out now... [06:23] same here ughhhh [06:24] ffff [06:24] impossible to eidt [06:25] oh finally [06:25] there must be more than just "shared hosting" being the problem [06:28] can the topic in #archiveteam changed from Compuserve to vidme? 
lmfao [06:28] *be changed [06:32] if it gets pointed out a few times like with compuserve then someone will probably do it [06:32] if it's just once or twice more then it's no big deal just say "yeah we're on it" [06:33] fair [07:06] making up a tag for vidme is going to be tricky. it's so short.. hard to come with a spinoff [07:42] *** Pixi has quit IRC (Ping timeout: 255 seconds) [07:42] *** Pixi has joined #archiveteam-bs [07:44] *** BlueMaxim has quit IRC (Ping timeout: 633 seconds) [07:45] *** BlueMaxim has joined #archiveteam-bs [09:05] *** Dimtree has quit IRC (Peace) [09:11] *** Dimtree has joined #archiveteam-bs [09:55] *** fie has quit IRC (Ping timeout: 245 seconds) [10:11] *** fie has joined #archiveteam-bs [10:16] *** CoolCanuk has quit IRC (Quit: Connection closed for inactivity) [10:27] *** BlueMaxim has quit IRC (Read error: Connection reset by peer) [10:35] *** schbirid has joined #archiveteam-bs [11:08] *** fie has quit IRC (Ping timeout: 246 seconds) [11:21] *** fie has joined #archiveteam-bs [11:33] *** bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…) [11:44] *** jschwart has joined #archiveteam-bs [12:11] ez: Yep, that's what I came up with yesterday as well. You can either iterate min/maxVideoId in blocks of 100 with limit=100 or implement pagination. I'd probably go for the former, i.e. retrieve video IDs 1 to 100, 101 to 200, etc. (need to figure out whether these parameters are exclusive or not though). [12:17] *** MangoTec has joined #archiveteam-bs [12:41] my god [12:41] the best thing I've ever heard just got tweeted [12:41] @ElonMusk: Payload will be my midnight cherry Tesla Roadster playing Space Oddity. Destination is Mars orbit. Will be in deep space for a billion years or so if it doesn’t blow up on ascent. [12:57] Elon knows how to put on a show. [12:57] Yep [12:58] I mean, he thinks its going to blow, they didn't want to make a real payload... so fuck it send a Car [13:01] At this time I'll recommand the old Top Gear episode where they convert a car to a space shuttle and blast it off with rockets. [13:01] recommend* [13:24] *** MangoTec has quit IRC (Quit: Page closed) [13:43] hetzner's auctions seem to have dropped in price a lot, 1/3 aka -10€ for what i have [13:43] https://www.hetzner.com/sb [13:44] nvm, had fucking US version without VAT =( [13:49] https://medium.com/vidme/goodbye-for-now-120b40becafa [13:49] https://medium.com/vidme/goodbye-for-now-120b40becafa [13:49] https://medium.com/vidme/goodbye-for-now-120b40becafa [13:49] What the fuck!! [13:49] Okay you know about it [13:49] but what the actual fuck! [14:10] people are finding out its REALLY hard to make a video website [14:16] It's easy to make a video site, it's just hard to monetise it, mediacru.sh was the best in terms of technology in my opinion but they didn't manage to monetise either. [14:16] *** ranavalon has joined #archiveteam-bs [14:17] *** ranavalon has quit IRC (Remote host closed the connection) [14:17] I'm collecting video ids from reddit anyways, heads up the bulk of the older urls (and possibly new ones) are going to be reddit porn related. [14:17] *** ranavalon has joined #archiveteam-bs [14:18] wait, youtube is still operating at loss [14:18] why the FUCK are people making so much money on their ad share then? [14:18] *? 
[14:21] *** voidsta has joined #archiveteam-bs [14:22] Google isn't operating at a loss, so they can keep YouTube afloat and keep trying new things to pump up their bottom line, which is why we see a new yt related shit storm every other week, yt may as well be called YouTube[beta] or YouTube[this is an experiment] [14:23] YouTube{incredible journey] [14:24] Though because it's Google ad because there is no real competition for them making any real headway we can talk like yt is 'never' going to close doors, or turn their service off, but it'll come, maybe not today, maybe not in 5 years, but it'll come when we're 'what the fucking' at a Google blog post announcing there coming plans to phase out YouTube or just turn it off. [14:26] Hopefully that comes at a time 500PB* is nothing and something we can grab in a few months [14:26] ... except YouTube will be 10 EB by then. [14:27] wait what [14:27] vimeo is dead? [14:28] vid.me I though [14:28] vid.me [14:28] ffs it looked very close to vimeo [14:28] It's an odd time we're living in when we first started 10TB was insane to think we could get, now we're doing sites nearing 300TB without a great deal of thought, we're scaling pretty well with the times I suppose, but how long before ia close doors and we have to find somewhere to put that? (I know we're talking about it...) [14:29] Ugh [14:29] if IA ever goes bust [14:29] so [14:29] we have 2 weeks for vidme [14:29] yup [14:29] I'm setting up an API scrape right now. [14:30] probably needs a channel, not sure how big it is [14:30] #vidmeh [14:30] 1.3x million videos [14:30] vidwithoutme, vidnee, vidmeh [14:31] vidmeh will do [14:31] This will almost need to be a warrior project. We can probably fix storage, but there is no way we can download this in time using a script-solution unless someone buys up Amazon nodes to do it. [14:32] JAA: Any idea what the average size of a video is? [14:32] zino: I haven't looked at the videos themselves at all yet, only the metadata. [14:32] The API returns a link to download the videos as an MP4, by the way. [14:32] The website uses Dash/HLS. [14:35] Those MP4s are hosted on CloudFront, by the way, i.e. Amazon. That could be annoying. [14:51] wiki is slow as balls [15:07] *** voidsta has left [15:08] zino: I've been scraping a few channels and here's what I've seen so far. Their highest quality is 2 mbps video (at 1080p or 720p depending on the original resolution) with audio between 128kbps and 320 kbps(!) [15:09] SD-quality video is around 1200 kbps [15:09] Ugh [15:10] thats not too bad overall [15:10] And I'm grabbing with youtube-dl's "bestvideo+bestaudio" option, if storage/bandwidth becomes an issue they have lower-quality versions we could grab instead [15:11] Na [15:11] We have da powerrrr [15:11] right now I'm working on the grabber, mostly just going to mod eroshare-grab [15:12] Some files are randomly capped at 150 KB/s download while others will saturate my 50 mbit connection [15:12] the channel pages are going to be interesting since they scroll load type [15:12] As long as the URLs for those follow a pattern that shouldn't be too hard [15:13] Oh, I just noticed there's a channel, #vidmeh [15:13] ya [15:23] Nothing is bloody working [15:24] for what [15:24] I've spent all day trying to get my proxmox cluster sorted [15:30] dat CDN [15:35] *** Jcc10 has quit IRC (Ping timeout: 260 seconds) [15:39] hay JAA your pulling all the APIs, are you saving all the reposes so we can get the raw URL for the videos? 
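For reference, the "bestvideo+bestaudio" grab mentioned above looks roughly like the following. The output template and the info-JSON/thumbnail flags are standard youtube-dl options added here for completeness, not something specified in the chat; and per the youtube-dl issue cross-posted earlier, the HLS formats can come out corrupted, so the DASH/progressive formats are the ones to prefer.

    # Sketch: grab one vid.me video at the highest quality, keeping metadata alongside.
    youtube-dl \
        -f "bestvideo+bestaudio" \
        --write-info-json --write-thumbnail \
        -o "vidme/%(uploader)s/%(id)s - %(title)s.%(ext)s" \
        "https://vid.me/1LriY"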
[15:39] reponses* [16:01] Yeah, of course I save them. To WARC, specifically. [16:04] *** kristian_ has joined #archiveteam-bs [16:29] *** CoolCanuk has joined #archiveteam-bs [16:33] *** fie has quit IRC (Ping timeout: 360 seconds) [16:43] *** shin has joined #archiveteam-bs [16:44] *** fie has joined #archiveteam-bs [17:06] don't know if it will help but i was made a brute force video/metadata downloader for vidme https://github.com/shindakun/vidme i don't really have the bandwidth or storage to let it run though [17:07] you guys already have a lot of tooling though [17:07] No need to bruteforce, we can get a list of all videos through their API. [17:07] shindakun: /join #vidmeh [17:07] (I'm doing that currently.) [17:08] that's basically what it does sort of... i found some seemed to be unlisted so i request details for every videoid [17:08] off to vidmeh lol [17:08] Right. There's an API endpoint for getting lists of videos though, so you don't have to run through all ~19M IDs. [17:09] You can do it with 190k requests. With further optimisation, it might be possible to decrease that even further, but that's a bit more complex. [17:09] *** ola_norsk has joined #archiveteam-bs [17:11] made a test C64/dosbox emulator item (https://archive.org/details/iaCSS64_test) , but it seems very slow. At least on my potato pc. [17:13] unfortunatly i'm no ms-dos guru. But might there be a way to optimize speed trough some dos utilites/settings that could reside in the zip file? [17:14] You are emulating in two layers. It's not going to be fast, or accurate. [17:16] yeah it's kind of emu-inception :d But, could fastER be done perhaps? [17:17] i did try it in Brave browser as well as Chromium, and Brave seemed to run it a bit better. [17:17] and my pc is kind of shit [17:21] /join #vidmeh [17:21] ahem [17:27] *** Stilett0 has quit IRC (Ping timeout: 246 seconds) [17:29] ahhhhh. CLEVER [17:32] *** Pixi has quit IRC (Quit: Pixi) [17:38] *** kristian_ has quit IRC (Quit: Leaving) [17:45] *** mundus201 is now known as mundus [17:46] *** Pixi has joined #archiveteam-bs [18:08] How can I automatically save links from an RSS feed onto the wayback machine? [18:08] *** pizzaiolo has joined #archiveteam-bs [18:14] i'd use something like this http://xmlgrid.net/xml2text.html . then get rid of the non urls in excel/google sheets. [18:15] Ew [18:15] then upload your list of urls to pastebin, get the raw link. in #archivebot , use !ao < PASTEBINrawLINK [18:15] you got a better idea, JAA ? :P [18:15] if you have the links in a list; curl --silent --max-time 120 --connect-timeout 30 'https://web.archive.org/save/THE_LINK_TO_SAVE' > /dev/null , is a way to save them i think [18:15] Grab the feed, extract the links (by parsing the XML), throw them into wpull, upload WARC to IA. Throw everything into a cronjob, done. [18:16] o ok [18:16] I suspect he's looking for something that doesn't require writing code though. [18:16] most users are :P [18:16] also why curl? cant we just use HTTP GET? [18:17] that's what curl does [18:17] That's what curl does. You could also use wget, wpull, or whatever else. [18:17] Hell, you could do it with openssl s_client if you really wanted to. [18:18] And yeah, you can obviously replace the "throw them into wpull, upload WARC to IA" with that. [18:18] oh.. I thought curl downloads the web.archive.org page as well [18:18] It wouldn't grab the requisites though, I think. [18:18] CoolCanuk: That's exactly what it does, and it triggers a server-side archiving. 
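A minimal sketch of the RSS answer given above (grab the feed, extract the links, fire each at the /save/ endpoint, run the whole thing from cron). The link extraction is quick and dirty and assumes a plain RSS 2.0 feed with `<link>` elements; the feed URL and the five-second delay are placeholders. As noted just below, /save/ used this way archives only the single URL, not its page requisites, so wpull + WARC upload remains the better archival route.

    #!/bin/bash
    # Sketch: push every <link> from an RSS feed to the Wayback Machine's /save/ endpoint.
    FEED_URL="https://example.org/feed.xml"   # placeholder feed

    curl -s "$FEED_URL" \
        | grep -oE '<link>[^<]+</link>' \
        | sed 's/<[^>]*>//g' \
        | while read -r url; do
            curl -s --max-time 120 --connect-timeout 30 "https://web.archive.org/save/${url}" > /dev/null
            sleep 5   # don't hammer the endpoint
          done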
[18:19] unhelpful if you have a bad internet connection and don't want to download the archive.org page every request :P [18:19] idk :d i just use that as cronjobs to save tweets https://pastebin.com/raw/ZE4udKTi [18:20] no page requisites are saved when you use /save/ like that [18:20] only the one URL you have after /save/ [18:20] no images, or other stuff from the page is saved [18:20] doh [18:21] (which is probably fine for net neutrality.. it should mostly be text/links to othe rsites) [18:21] if there are any images, it's likely already been posted before [18:22] you can't see what picture is on a page if it's not saved [18:22] no matter how many times the picture might have been saved in other places acros the web [18:23] you can't see pictures that are still online? [18:23] twitter also uses their damn tc.co url shortening [18:23] I think we save things in case they go offline [18:24] <3 [18:25] (I hope that wasn't passive aggressive) :( [18:26] * arkiver isn't an aggressive person :) [18:27] aggressive at archiving :P [18:27] hehe [18:27] :) [18:28] i've been running those cronjobs since the 26th (i think). Should i perhaps just halt that idea then, or might it be useful data for someone else to dig trough? At least the text and links are there i guess.. [18:29] was planning to run them until the netneutrality voting stuff is over on the 14th(?) [18:29] text is always useful [18:30] Definitely better than nothing. [18:30] I believe the data from Alexa on IA also does not include pictures [18:30] but I'm not totally sure about that [18:34] i'm just going to let it run then [18:34] What does the /save/ URL return exactly? Are the URLs for page requisites also replaced with /save/ URLs? [18:35] If so, it might be possible to use wget --page-requisites to grab them. [18:35] one sec [18:38] https://pastebin.com/raw/dJrVbnpr [18:39] that's what i get when running: curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.89 Chrome/62.0.3202.89 Safari/537.36" --silent --max-time 120 --connect-timeout 30 'https://web.archive.org/save/https://twitter.com/hashtag/netneutrality?f=tweets' [18:40] Yep, everything there is also replaced with /save/ URLs. [18:40] So give wget --page-requisites a shot if you want. [18:40] (Plus a bunch of other options, obviously.) [18:40] JAA: yes [18:41] ok [18:41] I believe embed are replace with a /save/_embed/ URL and links with a /save/ URL [18:41] Yep [18:42] by 'other options' do you mean just to make it run quiet? [18:44] Yeah, and making it not write the files to disk. [18:44] ok [18:45] Not sure what else you'd need for this. [18:45] me neither unfortunatly, i had to browse a bit just to learn that much curl :d [18:45] but i'll check it out [18:52] i did ask info@archive.org if it's ok to do the curl commands so frequent (every 3-5 minute), but no response back yet. [18:53] i just hope they won't suddenly go 'wtf is this!?' and block me :d [19:04] *** ZexaronS has joined #archiveteam-bs [19:08] no [19:08] it's just one URL that's saved per curl command [19:08] https://archive.org/details/liveweb?sort=-publicdate [19:09] the number of URLs per item in there is a lot higher than how many you are saving in a day [19:12] as long it's fine with IA i'm good [19:13] arkiver: could there be a way to 'retro-crawl' the tweets i've already saved? 
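For completeness, the "every 3-5 minutes" cron setup being described boils down to a crontab entry along these lines. The pastebin contents aren't reproduced here, so treat this as a reconstruction from the curl command quoted above, with the caveat already discussed: called this way, /save/ captures only the one URL, no images.

    # crontab -e
    # m h dom mon dow  command
    */5 * * * *  curl --silent --max-time 120 --connect-timeout 30 'https://web.archive.org/save/https://twitter.com/hashtag/netneutrality?f=tweets' > /dev/null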
[19:14] to get the images to load into the saves, i mean [19:14] *** Stilett0 has joined #archiveteam-bs [19:15] this is the mail i wrote on the 27th btw: https://pastebin.com/AV1vbKUr [19:16] I'm sure they're fine with it [19:16] good stuff [19:16] let me know if anything goes wrong [19:16] ok [19:17] with the 'retro-crawl', I guess you could get the older captures, get the URLs for the pictures from those and save those [19:17] but you can't really /save/ an old page again [19:17] or continue a /save/ or something [19:19] ok. I'm guessing at least some number of the tweets are bound to have become deleted by the users themselves (or banned user accounts). [19:19] If you visit the pages, it should grab any images that aren't in the archives already. [19:20] So I guess you could make your browser go through all those old crawls. [19:20] ouch [19:20] but yeah, that is what i meant :d [19:20] Or perhaps it would work with wget --page-requisites as well, not sure. [19:21] i'll rather try that than sit scrolling in my browser :D [19:32] opening a capture in the browser does not seem to work to pull the images https://web.archive.org/web/20171130120002/https:/twitter.com/hashtag/netneutrality?f=tweets [19:32] only user avatars etc seems to be present [19:36] and those f*cking t.co links...pissing me off :/ [19:51] *** dashcloud has quit IRC (Read error: Operation timed out) [19:52] *** dashcloud has joined #archiveteam-bs [19:58] I'm not American and articles aren't helping... how fast is Cumulus Media declining ? [19:58] This looks like quite the "portfolio" https://en.wikipedia.org/wiki/List_of_radio_stations_owned_by_Cumulus_Media [20:01] does gdrive use some kind of incremental throttling for uploads? i am down to 1.5MB/s now :( [20:01] and it seems quite linear over time [20:01] *** bitspill has joined #archiveteam-bs [20:02] CoolCanuk: https://www.marketwatch.com/investing/stock/cmlsq ..Not sure if it's really indicative though [20:02] omg [20:02] 0.095?! [20:03] iHeartRadio also seems troubled [20:04] however, iHeartRadio in Canada is likely not impacted, since I'm pretty sure Bell purchased rights to use it and it's a crappy radio streaming app for Bell Media radio stations- not true iHeartRadio [20:06] CoolCanuk: All i see is the slope going down :d https://www.marketwatch.com/investing/stock/cmlsq/charts That's basically the max of my knowledge about stocks and shit :d [20:07] same here [20:14] *** SimpBrain has quit IRC (Remote host closed the connection) [20:18] CoolCanuk: a friend of mine who unfortunitaly passed away in 2015 once showed me daytrading thingy software. If i remember correctly the only thing that differed from the free API testing was that all the data was delayed [20:20] CoolCanuk: it wouldn't be useful for trading, but perhaps for alerting about online services going to hell [20:28] schbirid: i think there's a limit of 750GB/day uploaded? [20:29] if you're close to that, could explain things [20:29] ah, maybe [20:30] nope... today is just at "Transferred: 104.014 GBytes (1.540 MBytes/s)" [20:31] schbirid, any packet loss? [20:31] no idea, how do i check? [20:31] ping maybe [20:32] Well, step one: Be on linux (and run the upload from the same machine), step two: run "mtr hostname.here" [20:32] no idea what the hostnames for gdrive are [20:32] oh yeah mtr that's better [20:32] mtr rules [20:32] Step 0: Install iftop and check what address all your data is going too. :) [20:32] to* [20:33] duh, i feel dumb [20:35] Don't. 
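On the "retro-crawl" idea discussed above: the Wayback CDX API isn't brought up in the chat, but it is one way to enumerate the captures that already exist for the hashtag page so each can be replayed with --page-requisites. Whether replaying a capture actually triggers live fetches of missing images is the assumption stated earlier in the log, not something verified here; a sketch under that assumption:

    #!/bin/bash
    # Sketch: list existing captures of the hashtag page via the Wayback CDX API,
    # then replay each one with --page-requisites so missing embeds get requested.
    TARGET='https://twitter.com/hashtag/netneutrality?f=tweets'

    curl -s -G 'https://web.archive.org/cdx/search/cdx' \
            --data-urlencode "url=${TARGET}" \
            --data-urlencode 'fl=timestamp' \
            --data-urlencode 'collapse=digest' \
        | while read -r ts; do
            wget --quiet --page-requisites -e robots=off "https://web.archive.org/web/${ts}/${TARGET}"
            sleep 10
          done

The downloaded files pile up locally, so this combines well with the throwaway-directory pattern sketched further down.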
There are many ways to do this, and today you learned a new one. [20:35] relearned [20:36] There will be a test on what all flags to tar are and what they do tomorrow! [20:38] i use longform [20:38] :P [20:38] tar is easy [20:38] looks like there is notraffic at all and rclone is doing some crap instead. makes sense to have the "speed" die down linearly then [20:54] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) [20:55] *** SmileyG has joined #archiveteam-bs [20:56] *** Smiley has quit IRC (Read error: Operation timed out) [20:57] CoolCanuk: that cumulus media thing made my brain conjure up some silly idea https://pastebin.com/raw/32k6st0E [21:04] *** SmileyG has quit IRC (Ping timeout: 260 seconds) [21:04] *** dashcloud has joined #archiveteam-bs [21:07] *** Smiley has joined #archiveteam-bs [21:19] schbirid: any cpu activity from rclone? [21:20] *** BlueMaxim has joined #archiveteam-bs [21:20] i just straced it and it has connection time outs all over [21:57] *** schbirid has quit IRC (Quit: Leaving) [21:58] should Wikia be moved to Fandom, or is it okay to redirect Fandom to Wikia? [22:00] JAA: i tried this wget command, wget -O /dev/null --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" --quiet --page-requisites "https://web.archive.org/save/https://twitter.com/hashtag/bogus?f=tweets" ..it's 100% quiet, though it doesn't seem to return more than using curl did. [22:01] JAA: i won't know until the captures show up on wayback though [22:09] ola_norsk: You might want to write a log file to figure out what it's doing exactly. -o is the option, I think. [22:13] JAA: without -O it does make a directory structure. but it doesn't seem to contain image data [22:14] JAA: It seems to be just the same data, only then in e.g web.archive.org/save/https\:/twitter.com/hashtag/bogus\?f=tweets [22:15] in folders, i mean, instead of the (same?) data going to -O [22:17] Hm [22:19] JAA: https://pastebin.com/FKu3mHbh this showes the structure of what it does [22:20] JAA: the 'hashtag/bogus?f=tweets' is the only file apart from robots.txt [22:20] Right [22:20] *** noirscape has joined #archiveteam-bs [22:21] could Lynx browser be tricked into acting like a 'real' browser perhaps? [22:22] *** noirscape has quit IRC (Client Quit) [22:22] I doubt it. [22:23] *** fie has quit IRC (Ping timeout: 633 seconds) [22:23] Not sure why your command doesn't work. [22:23] But yeah, a log file would help. [22:23] Maybe with -v or -d even. [22:23] one sec [22:25] *** MrDignity has joined #archiveteam-bs [22:27] JAA: 'default output is verbose.' ..and there's quite little there i'm afraid :/ [22:28] ill see if there's some options that give it better [22:28] *** fie has joined #archiveteam-bs [22:30] ola_norsk: "Not following https://web.archive.org/save/_embed/https://pbs.twimg.com/profile_images/848200666199629824/ZwvxQIzP_bigger.jpg because robots.txt forbids it." [22:30] Fucking robots.txt [22:30] It breaks everything. :-P [22:30] try setting --user-agent [22:31] I did. [22:31] maybe it's a javascript thingy, that loads all the shit? :/ [22:31] I used your exact command. [22:31] -e robots=off [22:31] hmm [22:35] *** shin has quit IRC (Quit: Connection closed for inactivity) [22:36] JAA: here is output from me running the command (Note, it's in norwegian :/ ) https://pastebin.com/awJ9j4D8 [22:37] (please correct me if i'm wrong) [22:37] JAA: could it be i'm using older wget or something? [22:37] ola_norsk: With -e robots=off? 
[22:37] Maybe, what version are you using? [22:37] I'm on 1.18. [22:38] I don't think it should matter too much though. [22:39] JAA: GNU Wget 1.17.1 [22:39] sry, didn't notice the robots=off [22:40] Hmm [22:40] It seems that it doesn't work with -O /dev/null, interesting. [22:41] robots=off did something else indeed, but i'm guessing it didn't do much better than when you ran it [22:41] a slew of 404 errors appeared [22:42] Yeah, I got a bunch of 404s as well, but not all requests were 404s. [22:44] --2017-12-02 23:41:17-- https://web.archive.org/save/_embed/https://abs.twimg.com/a/1512085154/css/t1/images/ui-icons_2e83ff_256x240.png [22:44] Kobler til web.archive.org (web.archive.org)|207.241.225.186|:443 … tilkoblet. [22:44] HTTP-forespørsel sendt. Venter på svar … 404 Not Found [22:44] 2017-12-02 23:41:18 PROGRAMFEIL 404: Not Found. [22:44] is one png [22:44] Yeah, that doesn't exist. [22:45] But my command earlier grabbed https://pbs.twimg.com/profile_images/848200666199629824/ZwvxQIzP_bigger.jpg for example. [22:45] so, it's robots.txt on the endpoints that causes the failures? [22:45] robots.txt at web.archive.org, yes. [22:45] Ah, no. [22:46] That's what causes wget not to retrieve the page requisites without -e robots=off. [22:46] no, i mean at e.g : abs.twimg.com ? [22:46] Those 404s, not sure. Might just be broken links or misparsing. [22:47] damn internet, it's a broken big fat mess [22:48] cloudflare and shit [22:48] which website are you trying to access that cloudflare wont let you [22:48] I can possibly help get the true IP [22:49] it's to get waybackmachine to capture webpages, including images, with doing just request [22:50] oh :/ [22:51] HTML is a huge clusterfuck. Well, to be precise, HTML is fine, but the parsing engines' forgiveness is awful. [22:51] And don't get me started on JavaScript. [22:51] CoolCanuk: i've messed up, thinking it would actually do captures by doing just that with automatic requests..but turns out it wasn't that easy :/ [22:52] JAA: aye. Is it possible that twitter uses javascript to put in the images, AFTER the page is loaded? [22:52] Definitely possible. [22:52] JAA: if so, i'm giving up even trying :d [22:53] But at least part of it is not scripted. [22:55] My test earlier grabbed https://pbs.twimg.com/media/DQDHMryX4AEseEo.jpg for example, which is an image from a post most likely (though I'm not going to try and figure out which one). [22:55] I think i'll just let the curl stuff run until the 14th, and let someone brigther than me figure it out in the future. [22:55] Sometimes, I hate the WM interface. "3 captures" *click* only lists one. [22:56] one thing is images, but another is that basically all links on twitter are shorterened links [22:57] Yeah, but if you want to follow those, you'll definitely need more than that. [22:57] I mean, it might work with --recursive and --level 1 or something like that. [22:57] But it would really be better to just write WARCs locally and upload those to IA. [22:57] the t.co links do come with the actual link the ALT= tag i think , not sure though [22:58] property i mean [22:58] Never looked into them. [22:59] What you're describing is more or less what I'm doing from time to time with webcams. [22:59] I did that during the eclipse in the US in August, and I'm currently retrieving images from cams across Catalonia every 5 minutes. [23:00] It's just a script which runs wpull in the background + sleep 300 in a loop. [23:01] A cronjob might be cleaner, but whatever. 
[23:02] with --recursive it does seem to take a hell of a lot longer.. [23:02] and that's maybe a good sign [23:02] Yeah, it's now retrieving all of Twitter. [23:02] Well, maybe not all of it, but a ton. [23:03] * ola_norsk suddenly archive all of internets [23:03] Solving IA's problems. Genius! [23:03] aye [23:03] maybe that level thing is not a bad idea :d [23:05] :-P [23:05] any way i could limit it to let's say 1-2 "hops" away from twitter? :D [23:05] ...seriously, it's still going [23:06] it went from #bogus hashtag to shotting #MAGA.. [23:07] Yep, and it'll retrieve every other hashtag it can find. [23:07] aye [23:07] It's the best recursion. Believe me! [23:07] 'recurse all the things!' lol [23:09] at the very least i think it needs some pause between these requests :d [23:09] stack exausted, core dumped. [23:10] it's doing bloddy mobile.twitter.com now .. [23:10] nobody needs that [23:12] it's brilliant though :D , i just hope it did the images :D [23:12] It did exactly what you told it to. :-P [23:13] that just proves computes are stupid :d [23:13] Yeah, that or... :-P [23:14] the Illuminati did it [23:16] but, i'm thinking if was limited to just 1-2 hops, even 1, that would be enough to get most images. Or? [23:17] --page-requisites gets the images already. [23:17] (But apparently only if you actually write the files to disk. My tests with -O /dev/null did not work.) [23:17] You only need recursion with a level limit if you also want to follow links on the page. [23:17] Which might make sense, retrieving the individual tweets for example. [23:18] could you pastebin the command you did that does image capture? [23:18] But if you want to have any control over what it grabs (for example, not 100 copies of the support and ToS sites), it'll get complex... [23:18] Uh [23:18] Closed the window already, hold on. [23:19] the --recursion is violent :d [23:19] It's awesome, you just need to know how to control it. :-) [23:19] *** jschwart has quit IRC (Quit: Konversation terminated!) [23:20] aye [23:21] as for any output, if i can't put in /dev/null it'll go in a ramdisk that cleared quicly [23:21] that's [23:22] Uhm, dafuq? https://web.archive.org/web/20171202231923/https:/twitter.com/hashtag/bogus?f=tweets [23:23] That's my grab from a few minutes ago. [23:23] Well, it did grab the CSS etc. [23:23] I didn't specify the UA though. That might have something to do with it. [23:24] i'm not sure how they distrubute the requests between 'nodes' [23:24] The command was wget --page-requisites -e robots=off 'https://web.archive.org/save/https://twitter.com/hashtag/bogus?f=tweets' [23:24] ty [23:24] Regarding the temporary files: mktemp -d, then cd into it, run wget, cd out, rm -rf the directory. [23:24] Five-line bash script. :-) [23:28] JAA: sometimes i notice twitter.com requires login for anything. Maybe it varies by country. I'm not sure. [23:29] ty, gold stuff [23:29] Yeah, Twitter's quite annoying to do anything with it at all. [23:29] We still don't have a solution for archiving an entire account or hashtag. [23:30] they make money of off doing that [23:30] so they will not make it easy [23:31] if you're from a research institution, they would easily hand over hashtag archive from day0. For a slump of money, of course [23:32] Yeah [23:36] is there a mirror of the wiki we can use until it's stable? [23:36] No, I don't think so. [23:37] There's a snapshot from a few months ago in the Wayback Machine, I believe. 
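The "five-line bash script" described at the end there, written out as a sketch. A temporary directory is used because, as noted above, -O /dev/null apparently breaks --page-requisites; the recursion flags for "1 hop" are left commented out because, as the test above showed, unrestricted recursion happily wanders off into the rest of Twitter.

    #!/bin/bash
    # Grab a page plus its requisites into a throwaway directory, then clean up.
    tmpdir="$(mktemp -d)"
    cd "$tmpdir" || exit 1
    wget --page-requisites -e robots=off 'https://web.archive.org/save/https://twitter.com/hashtag/bogus?f=tweets'
    # For one hop of links as well (individual tweets etc.), something like:
    #   wget --recursive --level=1 --page-requisites -e robots=off '...'
    # but expect far more than intended without tight --accept-regex/--reject-regex rules.
    cd - > /dev/null
    rm -rf "$tmpdir"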
[23:46] that command entails 1.7 megabytes of data :D what is the internet coming to?? lol [23:48] mankind doesn't deserve it :d [23:49] "The average website is now larger than the original DOOM." was a headline a few years ago... [23:49] web page* I guess [23:49] aye, i think just the fucking front page of my online bank is ~10MB :/ [23:52] no wonder dolphins are dying from space radiation and ozone