#archiveteam-bs 2019-10-09,Wed


Time Nickname Message
00:01 🔗 godane has joined #archiveteam-bs
01:17 🔗 DFJustin has quit IRC (Remote host closed the connection)
01:20 🔗 DFJustin has joined #archiveteam-bs
02:25 🔗 ReimuHaku has joined #archiveteam-bs
02:40 🔗 af10b3e5e has quit IRC (Read error: Connection reset by peer)
02:41 🔗 af10b3e5e has joined #archiveteam-bs
02:49 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
03:02 🔗 ntntn has joined #archiveteam-bs
03:03 🔗 ReimuHaku has quit IRC (Read error: Operation timed out)
03:04 🔗 odemgi_ has joined #archiveteam-bs
03:07 🔗 odemgi has quit IRC (Ping timeout: 252 seconds)
03:15 🔗 qw3rty has joined #archiveteam-bs
03:24 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
03:49 🔗 ranma neat https://www.reddit.com/r/emulation/comments/dexcil/compact_disc_structure_preliminary_proposal_of_a/
03:50 🔗 ranma especially how SecuROM works
04:16 🔗 af10b3e5e has quit IRC (Quit: https://i.imgur.com/xacQ09F.mp4)
04:16 🔗 d5f4a3622 has joined #archiveteam-bs
04:20 🔗 markedL will WBM replay pagination done by POST ?
04:53 🔗 markedL I can probably create warcs with POST content but if the WBM doesn't like that, it would be easier to just paginate it via GET
04:57 🔗 SmileyG has joined #archiveteam-bs
04:59 🔗 Smiley has quit IRC (Ping timeout: 258 seconds)
05:08 🔗 RichardG_ has quit IRC (Quit: Keyboard not found, press F1 to continue)
05:11 🔗 RichardG has joined #archiveteam-bs
05:23 🔗 ntntn markedL, there was a page i was looking at recently which paginated with POSTs. i can almost but not quite remember what it was. can you pls tell me an example site so i can go test the WM
05:25 🔗 ntntn it's possible that viewing an archived page and then triggering some intrapage POST will work
05:26 🔗 ntntn i'd rather have just tested than type out the theory here though
05:29 🔗 markedL if we wait a bit, someone will know
05:41 🔗 wabu has joined #archiveteam-bs
05:42 🔗 markedL I'll start a test crawl for IDs, maybe it'll be informative
05:43 🔗 ntntn everything else at DiscApps has nice urls. they are all static resources though. pages of topics are dynamic
05:43 🔗 ntntn i am doing some investigation now too. but that idea of yours is nice
05:46 🔗 ntntn actually, yes that's where i would have ended up at my next step. making an index. i wasn't actually prepared to get into working on this immediately tonight. i'll make up my mind whether to do now or tomorrow
05:53 🔗 markedL the site is very fast, so this is probably doable minus the WBM part. and no idea how many disc IDs there are out there
05:58 🔗 ntntn so you mean archiving to somewhere other than WM
05:58 🔗 ntntn thanks for lending your opinion/verdict
05:59 🔗 ntntn it accords with the impression i had got
06:00 🔗 markedL it can be in the WBM, but the next button wouldn't work; for example, by "making up" URLs like http://disc.yourwebapps.com/discussion.cgi?disc=1&pagemark=20, which is the same as the 2nd page
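A minimal sketch of the GET-based pagination markedL describes, for building a URL list that the WBM could replay. It assumes, hypothetically, that each index page holds 20 messages (inferred only from `pagemark=20` matching the 2nd page) and that `disc` is the forum ID; `page_urls` is an illustrative helper name, not part of the site.

```python
from urllib.parse import urlencode

BASE = "http://disc.yourwebapps.com/discussion.cgi"
PAGE_SIZE = 20  # assumption: inferred from pagemark=20 being the 2nd page


def page_urls(disc_id: int, num_pages: int, page_size: int = PAGE_SIZE) -> list[str]:
    """Build GET URLs for pages 1..num_pages of a DiscApp's message index.

    Page 1 uses pagemark=0 here; whether the site expects pagemark=0 or no
    pagemark at all for the first page is an untested assumption.
    """
    urls = []
    for page in range(num_pages):
        query = urlencode({"disc": disc_id, "pagemark": page * page_size})
        urls.append(f"{BASE}?{query}")
    return urls


if __name__ == "__main__":
    for url in page_urls(1, 3):
        print(url)
```

If new posts reorder the index (the open question a few lines below), a list like this would need to be fetched quickly or rechecked for shifted pages.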
06:00 🔗 ntntn so now to investigate what forums/topics/posts are there
06:01 🔗 ntntn oh really
06:01 🔗 ntntn so it is
06:03 🔗 markedL do new comments in a thread reorder the messages in the index?
06:03 🔗 ntntn is it going to then be as simple as changing those two numbers, to access the whole site
06:04 🔗 markedL if the order changes, it needs more caution about the order of hitting them
06:06 🔗 ntntn yes, it would then need caution indeed, but hopefully it's not like that. can make a test post or two somewhere. i think it allows anon posting
06:06 🔗 ntntn i'm really not up to speed right now to help out as well as i could
06:08 🔗 markedL just collecting/documenting the data of how it works, the ID ranges, types of pages, is plenty helpful
06:13 🔗 ntntn i am checking on that last question you asked. i'm saying that i wasn't really prepared to - it's gotten late - but fortunately, as you have found, the site is nicely simple
06:15 🔗 ntntn i have humorously found this forum which i will test for staticness/dynamism of posts. http://disc.yourwebapps.com/Indices/203069.html
06:17 🔗 ntntn (i didn't expect that someone would immediately get stuck into it. no doubt, it's nice that you have though)
06:17 🔗 markedL lol. late here too, can continue after rest.
06:18 🔗 ntntn look, i have written multiple sentences to make one point, clear sign of tiredness and not focussed
06:18 🔗 ntntn good, i'll do the proposed thing tomorrow
06:23 🔗 Mateon1 has quit IRC (Remote host closed the connection)
06:23 🔗 Mateon1 has joined #archiveteam-bs
06:25 🔗 ntntn just correct myself: can create my own forum to test in
06:28 🔗 d5f4a3622 has quit IRC (Read error: Connection reset by peer)
06:28 🔗 d5f4a3622 has joined #archiveteam-bs
07:18 🔗 dxrt has joined #archiveteam-bs
07:18 🔗 Fusl____ sets mode: +o dxrt
07:18 🔗 Fusl sets mode: +o dxrt
07:18 🔗 Fusl_ sets mode: +o dxrt
07:24 🔗 systwi_ is now known as systwi
08:43 🔗 odemgi has joined #archiveteam-bs
08:45 🔗 odemgi_ has quit IRC (Ping timeout: 252 seconds)
08:48 🔗 JAA markedL: No, the WBM cannot play back POST requests. I think the responses will be available, but all requests with the same URL will be mixed together of course.
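To illustrate why POST pagination does not replay, here is a deliberately simplified model of a replay index keyed by URL alone. This is an assumption for illustration, not the Wayback Machine's actual indexing code; `Capture`, `build_url_index`, and the example endpoint are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    method: str
    url: str
    body: bytes    # request body (empty for GET)
    response: str  # stand-in for the archived response


def build_url_index(captures: list[Capture]) -> dict[str, list[Capture]]:
    """Group captures by URL only, ignoring method and request body.

    Simplified model of URL-keyed replay; not the WBM's real implementation.
    """
    index: dict[str, list[Capture]] = {}
    for cap in captures:
        index.setdefault(cap.url, []).append(cap)
    return index


if __name__ == "__main__":
    url = "http://example.com/list.cgi"  # hypothetical POST-paginated endpoint
    caps = [
        Capture("POST", url, b"page=1", "page one"),
        Capture("POST", url, b"page=2", "page two"),
    ]
    index = build_url_index(caps)
    # Both pages land under one key, so a lookup by URL cannot tell them apart.
    print(len(index), [c.response for c in index[url]])
```

Encoding the page number in a GET query string instead gives each page its own key, which is why paginating via GET is the easier route.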
09:34 🔗 JAA Soo, I'm getting rate-limited on picosong since a couple hours.
09:35 🔗 JAA Stopping my crawl for now and investigating.
09:40 🔗 JAA The rate limiting happens on /cdn/HEX.mp3 URLs. It doesn't tell me the rate limit though. I'll analyse my logs to figure that out.
09:40 🔗 d5f4a3622 has quit IRC (Read error: Connection reset by peer)
09:40 🔗 d5f4a3622 has joined #archiveteam-bs
09:41 🔗 Fusl JAA: throw the urls into a tracker project or amqp and crawl it distributed across hundreds of instances?
09:42 🔗 Igloo ^
10:24 🔗 JAA Soo, not sure what the rate limit is. The number of successful retrievals within a minute varied between 3 and 20 across the last few minutes before I stopped it.
10:26 🔗 JAA Before the rate limiting, I was making 400-ish requests per minute. So yeah, need to change how this works.
10:26 🔗 JAA Somewhere between 20 and 100 IPs needed I guess.
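JAA's estimate can be checked with back-of-the-envelope arithmetic, assuming the limit is per IP and successful fetches per minute stay within the observed 3 to 20 range. At 20 per minute, 400 requests per minute need 20 IPs; at the worst observed rate of 3 per minute the figure would rise to about 134. `ips_needed` is a hypothetical helper.

```python
import math


def ips_needed(target_per_min: float, allowed_per_ip_per_min: float) -> int:
    """IPs needed to sustain a target request rate, assuming a per-IP limit."""
    return math.ceil(target_per_min / allowed_per_ip_per_min)


if __name__ == "__main__":
    target = 400  # approximate pre-limit request rate from the log
    for observed in (20, 3):  # best and worst observed successes per minute
        print(observed, ips_needed(target, observed))
```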
10:34 🔗 killsushi has quit IRC (Quit: Leaving)
10:36 🔗 ShellyRol has quit IRC (Read error: Connection reset by peer)
10:40 🔗 qw3rty2 has joined #archiveteam-bs
10:44 🔗 ShellyRol has joined #archiveteam-bs
10:46 🔗 qw3rty has quit IRC (Ping timeout: 745 seconds)
10:57 🔗 markedL ls
10:58 🔗 ntntn hi
10:58 🔗 ntntn i just got back too. why've you typed ls
11:00 🔗 markedL that was a human error. I meant to say, I saw a comment about disc that was disconcerting. something like old messages were falling off the index
11:00 🔗 ntntn i saw that too
11:01 🔗 ntntn that kind of thing i just take as par for the course that is the Internet, now
11:01 🔗 ntntn i wouldn't fret mate
11:03 🔗 ntntn and here's why i wouldn't fret
11:03 🔗 ntntn observe at the bottom of the faq,
11:04 🔗 ntntn https://disc.yourwebapps.com/faq.html
11:04 🔗 ntntn How do I delete a DiscApp? DiscApps are removed from the server after 12 weeks of inactivity (no new articles and no administration).
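The FAQ's 12-week rule gives a simple lower bound on how long an idle DiscApp survives. This sketch assumes inactivity is counted from the last article or admin action and that removal can happen as soon as the window ends; the actual removal may come later. `removal_date` is a hypothetical helper.

```python
from datetime import date, timedelta

INACTIVITY_LIMIT = timedelta(weeks=12)  # from the DiscApps FAQ


def removal_date(last_activity: date) -> date:
    """Earliest date an idle DiscApp could be removed under the 12-week rule."""
    return last_activity + INACTIVITY_LIMIT


if __name__ == "__main__":
    print(removal_date(date(2019, 10, 9)))
```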
11:08 🔗 markedL this site is impressive in its non-modernity
11:09 🔗 ntntn (it took quite some time to get used to the jarring feeling that disconcerts you. gradually from when i first read that the IMDb randomly deletes posts. i mean that's still shocking if i give it any thought)
11:10 🔗 ntntn yes. did you see similar thoughts expressed in those few topics i linked? this is exactly my motivation (obviously) for this
11:11 🔗 ntntn anyway right now i just have to mainly say
11:12 🔗 ntntn and clear up, that when i said 'humorously' above, it wasn't meant as mockery, but as joviality
11:18 🔗 markedL I'll start on this, should have something done in an hour
11:28 🔗 ntntn there are some few other pages on the site too, not just the forums, and it's worth getting a complete current list
11:29 🔗 ntntn for example https://yourwebapps.com/About/, and maybe ihm
11:29 🔗 ntntn ..., and maybe it's only the ones linked on that page
11:31 🔗 ntntn [h is next to t on my keyboard, single apostrophe is long-press m, and backspace is next to enter - typo expl]
11:32 🔗 Ivy Dvorak?
11:33 🔗 ntntn it's not, i've never used that. it's workman fwiw
11:34 🔗 ntntn i do recommend trying it, if that kind of thing interests you
11:35 🔗 Ivy ahh ok
11:37 🔗 ntntn the guy has done good work, i'd say. have long been aware of many researched layouts, but this one i felt able to just go ahead and use
11:37 🔗 ntntn that sums it up for me i reckon
11:38 🔗 ntntn lol, you might now see how this nick was selected too
11:38 🔗 Ivy lolol
11:39 🔗 Ivy a lot of layouts have some obvious relationship between n and t
11:40 🔗 ntntn ok
11:40 🔗 Ivy in dvorak they're next to each other and in colemak they have the same position as workman even
11:44 🔗 coderobe has quit IRC (Remote host closed the connection)
11:57 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:57 🔗 BlueMax has joined #archiveteam-bs
12:05 🔗 JAA Fusl, Igloo: So how to move forward with picosong? It's not possible to just distribute the /cdn/HEX.mp3 URLs since those are IP-restricted. So basically we'd have to distribute the entire thing (those URLs are generated when you access the stream or download page). That's possible with qwarc, but I haven't done it before, so it'll require some fiddling. Or would it be possible with mips maybe?
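Because the /cdn/HEX.mp3 links are generated per page view and tied to the requesting IP, each distributed item would have to fetch the stream or download page and then the extracted link from the same IP. The extraction step might look like this, assuming (hypothetically) that the page embeds the path literally, as in the example /cdn/ URL posted later in this log; `extract_cdn_links` is an illustrative name.

```python
import re

# Assumption: the stream/download page contains a relative path of the form
# /cdn/<lowercase hex>.mp3; the real page markup has not been checked here.
CDN_LINK = re.compile(r"/cdn/[0-9a-f]+\.mp3")


def extract_cdn_links(html: str) -> list[str]:
    """Return unique /cdn/HEX.mp3 paths in order of first appearance."""
    seen: dict[str, None] = {}
    for match in CDN_LINK.finditer(html):
        seen.setdefault(match.group(0), None)
    return list(seen)


if __name__ == "__main__":
    page = '<audio src="/cdn/36dca710680fae932906739e3fa37c28.mp3"></audio>'
    print(extract_cdn_links(page))
```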
12:07 🔗 ntntn_ has joined #archiveteam-bs
12:08 🔗 ntntn has quit IRC (Ping timeout: 260 seconds)
12:08 🔗 ntntn_ is now known as ntntn
12:10 🔗 schbirid has quit IRC (Remote host closed the connection)
12:10 🔗 JAA Actually no, mips won't work reliably either due to how connections are distributed to IP addresses. There'd have to be a guarantee that the connections used by one item always use the same IP.
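One way to get the guarantee JAA mentions is to pin each item to a source IP deterministically, for example by hashing the item ID. This is a hypothetical sketch, not how mips actually assigns connections; the addresses below are from the 192.0.2.0/24 documentation range and the item IDs are illustrative.

```python
import hashlib


def source_ip_for_item(item_id: str, ips: list[str]) -> str:
    """Deterministically map an item to one source IP so all its requests share it."""
    if not ips:
        raise ValueError("need at least one source IP")
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return ips[int.from_bytes(digest[:8], "big") % len(ips)]


if __name__ == "__main__":
    pool = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # hypothetical address pool
    for item in ("WDf2", "mgze"):
        print(item, source_ip_for_item(item, pool))
```

Actually binding outgoing connections to the chosen address is left out; in Python, `socket.create_connection` accepts a `source_address` argument for that purpose.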
12:11 🔗 JAA (On the plus side, I've drained nearly all my upload backlog in the last couple hours.)
12:13 🔗 ntntn Ivy: i took a screenshot. also, pls, was anything relevant typed while i crashed out? https://i.postimg.cc/xdJVXpWM/Screenshot-20191009-124246.png
12:14 🔗 ntntn that is 'AnySoftKeyboard'
12:14 🔗 JAA ntntn: Logs of this channel and a few others are at http://archive.fart.website/bin/irclogger_logs
12:15 🔗 ntntn i should have known
12:15 🔗 ntntn thx
12:16 🔗 Ivy ^^^
12:16 🔗 Ivy also yes anysoftkeyboard is nice
12:16 🔗 JAA Can you move this keyboard discussion to #archiveteam-ot please?
12:17 🔗 ntntn sorry about that JAA. noted for future too
12:19 🔗 coderobe has joined #archiveteam-bs
12:28 🔗 markedL picosong seems to fit with tracker if you just pregenerate the IDs into job files
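Pregenerating job files essentially means splitting a known list of IDs into fixed-size batches, one per tracker item. This sketch assumes the IDs are already known (for example the short path segments such as `WDf2` that appear in the signed URLs later in this log); the batch size and one-batch-per-item layout are hypothetical, not the tracker's actual item format.

```python
from itertools import islice
from typing import Iterable, Iterator


def batch_ids(ids: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` IDs (one batch per job item)."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch


if __name__ == "__main__":
    hypothetical_ids = ["WDf2", "mgze", "aaaa", "aaab", "aaac"]
    for n, batch in enumerate(batch_ids(hypothetical_ids, 2)):
        print(n, batch)
```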
12:29 🔗 ntntn has quit IRC ()
12:30 🔗 markedL is it clear the rate limit is per IP or just a uniform slowdown put into every request?
12:30 🔗 JAA Yeah, it could of course be done with the tracker. Effectively, qwarc is kind of like the tracker, just locally. However, I'm not going to write Lua code for it, sorry.
12:32 🔗 JAA I'm pretty sure the rate limit is per IP. It's Cloudflare, and the error message is this: "The owner of this website (picosong.com) has banned you temporarily from accessing this website."
12:33 🔗 JAA I could also try emailing them and asking whether they could whitelist my IP or something.
12:34 🔗 JAA I'm probably not the only one to try to grab a full copy of it, and that gets expensive quickly since it's S3.
12:38 🔗 bluefoo_ has quit IRC (Ping timeout: 745 seconds)
12:40 🔗 markedL how many IPs does mips have?
12:43 🔗 JAA Many. I don't know the number, but it'd be enough. But as mentioned, it won't work, at least not directly/without changes.
12:45 🔗 kiska Many /24s if I remember
13:05 🔗 JAA Yeah, something like that. One /24 would be enough already, see numbers above. Assuming they don't ban the entire range or something.
13:06 🔗 JAA But yeah, won't work directly.
13:15 🔗 markedL my script is getting 429 also, very quick, didn't even reach 100 urls tried
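Since the server does not state its limit, a crawler hitting 429s can at least back off instead of retrying immediately, which may also help avoid the temporary Cloudflare ban. The sketch below honours a numeric `Retry-After` header when present and otherwise uses capped exponential backoff; `base` and `cap` are hypothetical values, and HTTP-date forms of `Retry-After` are simply treated as absent. Backoff alone does not raise throughput above a per-IP limit.

```python
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 5.0, cap: float = 600.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after an HTTP 429.

    Uses a numeric Retry-After value if given, else base * 2**attempt, both capped.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after.strip()), cap)
    return min(base * (2 ** attempt), cap)


if __name__ == "__main__":
    for attempt in range(4):
        print(attempt, backoff_delay(attempt))
```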
13:16 🔗 JAA I'll send them an email.
13:17 🔗 Fusl glhf
13:19 🔗 markedL the /cdn/HEX.mp3 urls seem migratable to me. and that's the one getting 492
13:19 🔗 markedL ^429
13:23 🔗 markedL also, I'm testing them via Tor
13:24 🔗 markedL this work for everyone? https://picosong.com/cdn/36dca710680fae932906739e3fa37c28.mp3
13:27 🔗 JAA Yes, works. Weird, I'm sure I got 403s before when trying to access a generated link from another IP.
13:28 🔗 benjins markedL It looks like it works for me (in the US), downloaded the whole file just fine
13:29 🔗 bluefoo has joined #archiveteam-bs
13:29 🔗 HashbangI has quit IRC (Remote host closed the connection)
13:34 🔗 markedL It's possible both are true if that was on an S3 URI, but I don't know how to generate that test case reliably
13:39 🔗 HashbangI has joined #archiveteam-bs
13:53 🔗 markedL here's an example: http://picosong.s3.amazonaws.com/WDf2/fetch%20me%20the%20microphone.mp3?Signature=8%2FvRquUfTWhcWmzqpNBfnDntLYg%3D&Expires=1570629845&AWSAccessKeyId=AKIAIVYGJY7GGRJY2Y3A
13:56 🔗 markedL and in between http://cdn.picosong.com/mgze/Fool%20On%20The%20Hill%20-%20Me%20%26%20The%20Beatles.mp3?Signature=TcPnu53my6mu75wzAa3WOW5Qbmw%3D&Expires=1570630204&AWSAccessKeyId=AKIAIVYGJY7GGRJY2Y3A
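Both example URLs carry an `Expires` query parameter, which in AWS's legacy query-string authentication is a Unix timestamp after which the signature stops working. A fetcher would therefore need to download each generated link before that time. The helpers below are a hypothetical sketch that reads the parameter and checks it against a given time.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url: str) -> Optional[int]:
    """Return the Expires value (Unix seconds) of a signed URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("Expires")
    return int(values[0]) if values else None


def is_expired(url: str, now: Optional[float] = None) -> bool:
    """True if the URL's Expires time has passed (URLs without one never expire here)."""
    expires = signed_url_expiry(url)
    if expires is None:
        return False
    return (time.time() if now is None else now) >= expires
```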
14:18 🔗 DogsRNice has joined #archiveteam-bs
14:21 🔗 DogsRNice some stuff going on with gamepedia and fandom
14:21 🔗 DogsRNice https://community.fandom.com/wiki/User_blog:MisterWoodhouse/Our_first_update_on_the_new_platform
14:22 🔗 DogsRNice has quit IRC (Remote host closed the connection)
14:22 🔗 DogsRNice has joined #archiveteam-bs
14:24 🔗 DogsRNice has quit IRC (Remote host closed the connection)
14:24 🔗 DogsRNice has joined #archiveteam-bs
15:07 🔗 Yurume has quit IRC (No Ping reply in 180 seconds.)
15:09 🔗 Yurume has joined #archiveteam-bs
15:32 🔗 Yurume has quit IRC (No Ping reply in 180 seconds.)
15:34 🔗 Yurume has joined #archiveteam-bs
15:39 🔗 deevious has quit IRC (Quit: deevious)
15:51 🔗 Yurume has quit IRC (No Ping reply in 180 seconds.)
15:53 🔗 Yurume has joined #archiveteam-bs
16:21 🔗 Raccoon has quit IRC (Ping timeout: 258 seconds)
16:39 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
16:40 🔗 Mayonaise has joined #archiveteam-bs
16:40 🔗 icedice has joined #archiveteam-bs
16:47 🔗 ats has quit IRC (leaving)
17:03 🔗 ats has joined #archiveteam-bs
17:14 🔗 VADemon has quit IRC (Quit: left4dead)
17:18 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
17:22 🔗 chirlu has joined #archiveteam-bs
17:47 🔗 JAA picosong backlog is cleared. At least that's good.
18:04 🔗 markedL I figure someone will write the lua, if you say that's the best option
18:05 🔗 JAA The best option would be them whitelisting me so that I can just continue where I stopped since that requires zero work.
18:06 🔗 JAA If they don't reply within a reasonable amount of time, we can take the DPoS route.
18:13 🔗 markedL True
18:14 🔗 markedL ntntn : test warc for disc> https://transfer.sh/j8rgz/disc1-alpha.warc
18:24 🔗 d5f4a3622 has quit IRC (Ping timeout: 255 seconds)
18:39 🔗 d5f4a3622 has joined #archiveteam-bs
19:01 🔗 bluefoo has quit IRC (Ping timeout: 496 seconds)
19:51 🔗 bluefoo has joined #archiveteam-bs
20:14 🔗 Meroje has quit IRC (Quit: bye!)
20:23 🔗 Meroje has joined #archiveteam-bs
20:26 🔗 bluefoo has quit IRC (Ping timeout: 252 seconds)
20:32 🔗 bluefoo has joined #archiveteam-bs
21:57 🔗 Meroje has quit IRC (Quit: bye!)
21:58 🔗 Meroje has joined #archiveteam-bs
22:02 🔗 Meroje has quit IRC (Client Quit)
22:02 🔗 Meroje has joined #archiveteam-bs
22:06 🔗 Meroje has quit IRC (Client Quit)
22:06 🔗 Meroje has joined #archiveteam-bs
22:10 🔗 Meroje has quit IRC (Client Quit)
22:10 🔗 Meroje has joined #archiveteam-bs
22:14 🔗 Meroje has quit IRC (Client Quit)
22:14 🔗 Meroje has joined #archiveteam-bs
22:28 🔗 ats_ has joined #archiveteam-bs
22:29 🔗 ats has quit IRC (Read error: Operation timed out)
22:32 🔗 ats_ has quit IRC (Read error: Operation timed out)
22:59 🔗 phillipsj has joined #archiveteam-bs
23:06 🔗 ScruffyB has quit IRC (Read error: Operation timed out)
23:12 🔗 BlueMax has joined #archiveteam-bs
23:27 🔗 ats has joined #archiveteam-bs
23:28 🔗 VoltZero has joined #archiveteam-bs
